4.6.3. Ultrasonic Reference Block Study

Plot of Data

The first step in fitting a nonlinear function is simply to plot the data. This plot shows an exponentially decaying pattern, which suggests that some type of exponential function might be an appropriate model for the data.

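Initial Model Selection

There are two issues to address in the initial model selection when fitting a nonlinear model: determining an appropriate functional form for the model, and determining appropriate starting values for the iterative fit.
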
Determining an Appropriate Functional Form for the Model

Because infinitely many functions could serve as a nonlinear model, the choice of an appropriate model is not always obvious. Some guidelines for selecting an appropriate model were given in the analysis chapter.

The plot of the data will often suggest a commonly known function. We also often use scientific and engineering knowledge to determine an appropriate model. In scientific studies, we are frequently interested in fitting a theoretical model to the data. Previous studies, using either our own data or published results, may also show which functions have fit similar data well. Without a theoretical model or experience with prior data sets, selecting an appropriate function will often require some trial and error. Whether or not scientific knowledge guided the choice, model validation is still critical in deciding whether the selected model is adequate.

Determining Appropriate Starting Values

Nonlinear models are fit with iterative methods that require starting values. In some cases, inappropriate starting values can produce parameter estimates that converge to a local minimum or maximum rather than the global minimum or maximum. Some models are relatively insensitive to the choice of starting values, while others are extremely sensitive.

If prior data sets were fit with similar models, their results can often guide the choice of starting values. We can also sometimes make educated guesses from the functional form of the model. Some models have specific methods for determining starting values. For example, sinusoidal models, which are common in time series analysis, are quite sensitive to the choice of starting values. The beam deflection case study shows how to obtain starting values for a sinusoidal model.

If good starting values are not known, one approach is to create a grid of values for each parameter of the model. We then compute a goodness-of-fit measure, such as the residual standard deviation, at each point on the grid. The grid should be broad enough to enclose reasonable values for each parameter. However, we typically keep the number of grid points for each parameter small to limit the computational burden, particularly as the number of parameters in the model increases. The goal is to get into the right neighborhood, not to find the optimal fit. The grid point with the smallest residual standard deviation then supplies the starting values.

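The grid approach just described can be sketched in a few lines of Python. This is a minimal illustration, not Dataplot's PREFIT implementation. The helper names `residual_sd` and `grid_start` are hypothetical. The demonstration uses a made-up, noise-free exponential decay rather than the ultrasonic data.

```python
"""Grid search for starting values of a nonlinear least-squares fit (sketch)."""
import itertools
import math


def residual_sd(model, params, x, y):
    """Residual standard deviation sqrt(SSE / (n - p)) for one parameter vector."""
    sse = sum((yi - model(xi, params)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (len(y) - len(params)))


def grid_start(model, grids, x, y):
    """Return (best_params, best_sd) over the Cartesian product of the grids.

    `grids` holds one short sequence of candidate values per parameter; the
    number of model evaluations is the product of the grid lengths.
    """
    best, best_sd = None, math.inf
    for params in itertools.product(*grids):
        try:
            sd = residual_sd(model, params, x, y)
        except (OverflowError, ZeroDivisionError):
            continue  # skip grid points where the model cannot be evaluated
        if sd < best_sd:
            best, best_sd = params, sd
    return best, best_sd


if __name__ == "__main__":
    # Hypothetical example: y = 2 * exp(-0.5 * x), no noise.
    decay = lambda x, p: p[0] * math.exp(-p[1] * x)
    xs = [0.5 * i for i in range(1, 13)]
    ys = [2.0 * math.exp(-0.5 * xi) for xi in xs]
    grid = [[0.5 * k for k in range(1, 9)],   # candidate amplitudes 0.5 .. 4.0
            [0.1 * k for k in range(1, 11)]]  # candidate decay rates 0.1 .. 1.0
    print(grid_start(decay, grid, xs, ys))
```

In this toy case the true values lie exactly on the grid, so the search recovers them. With real data, the best grid point is only a starting value that the iterative fit then refines.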
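Fitting Data to a Theoretical Model

For this particular data set, the scientist was trying to fit the following theoretical model. Here y is the ultrasonic response (ULTRASON in the Dataplot output), x is the metal distance (METAL), β1, β2, β3 correspond to B1, B2, B3, and ε is the random error:

$$ y = \frac{\exp(-\beta_1 x)}{\beta_2 + \beta_3 x} + \varepsilon $$
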
Prefit to Obtain Starting Values

We used the Dataplot PREFIT command to determine starting values from a grid of parameter values. Here, the grid for each of the three parameters ran from 0.1 to 1.0 in increments of 0.1, giving 10 × 10 × 10 = 1000 lattice points. The output has been edited slightly for display.
```
LEAST SQUARES NON-LINEAR PRE-FIT
SAMPLE SIZE N = 214
MODEL--ULTRASON =(EXP(-B1*METAL)/(B2+B3*METAL))
REPLICATION CASE
REPLICATION STANDARD DEVIATION = 0.3281762600D+01
REPLICATION DEGREES OF FREEDOM = 192
NUMBER OF DISTINCT SUBSETS     = 22
NUMBER OF LATTICE POINTS       = 1000

  STEP         RESIDUAL           *  PARAMETER
 NUMBER        STANDARD           *  ESTIMATES
               DEVIATION          *
----------------------------------*---------------------------------------
   1--        0.35271E+02         *  0.10000E+00  0.10000E+00  0.10000E+00

FINAL PARAMETER ESTIMATES
  1  B1        0.100000
  2  B2        0.100000
  3  B3        0.100000

RESIDUAL STANDARD DEVIATION    = 35.2706031799
RESIDUAL DEGREES OF FREEDOM    = 211
REPLICATION STANDARD DEVIATION = 3.2817625999
REPLICATION DEGREES OF FREEDOM = 192
```
Based on this grid, the best starting values are obtained by setting all three parameters to 0.1.

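The counts in the PREFIT header follow from simple bookkeeping, and the short Python check below recomputes them from numbers in the listing. It reads NUMBER OF DISTINCT SUBSETS as the number of distinct METAL values, so that the replication degrees of freedom equal n minus that count. That reading is our interpretation; the listing does not state it.

```python
# Numbers taken from the PREFIT listing.
n = 214                 # SAMPLE SIZE N
p = 3                   # parameters B1, B2, B3
distinct_subsets = 22   # NUMBER OF DISTINCT SUBSETS (assumed: distinct METAL values)

# Grid used for each parameter: 0.1, 0.2, ..., 1.0.
grid = [round(0.1 * k, 1) for k in range(1, 11)]

lattice_points = len(grid) ** p          # every combination of the three grids
residual_df = n - p                      # one degree of freedom per estimated parameter
replication_df = n - distinct_subsets    # pure-error df from replicated x values
lack_of_fit_df = residual_df - replication_df

print(lattice_points, residual_df, replication_df, lack_of_fit_df)
```

This prints `1000 211 192 19`. The last value is the numerator degrees of freedom reported with the lack-of-fit F ratio in the full fit.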
Nonlinear Fit Output

The following fit output was generated by Dataplot (it has been edited for display).
```
LEAST SQUARES NON-LINEAR FIT
SAMPLE SIZE N = 214
MODEL--ULTRASON =EXP(-B1*METAL)/(B2+B3*METAL)
REPLICATION CASE
REPLICATION STANDARD DEVIATION = 0.3281762600D+01
REPLICATION DEGREES OF FREEDOM = 192
NUMBER OF DISTINCT SUBSETS     = 22

     FINAL PARAMETER ESTIMATES     (APPROX. ST. DEV.)    T VALUE
  1  B1        0.190404             (0.2206E-01)           8.6
  2  B2        0.613300E-02         (0.3493E-03)           18.
  3  B3        0.105266E-01         (0.8027E-03)           13.

RESIDUAL STANDARD DEVIATION    = 3.3616721630
RESIDUAL DEGREES OF FREEDOM    = 211
REPLICATION STANDARD DEVIATION = 3.2817625999
REPLICATION DEGREES OF FREEDOM = 192
LACK OF FIT F RATIO = 1.5474 = THE 92.6461% POINT OF THE
F DISTRIBUTION WITH 19 AND 192 DEGREES OF FREEDOM
```
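Two of the summary numbers in this listing can be reproduced from the others, which is a useful check when reading nonlinear-fit output. The plain-Python sketch below uses hypothetical function names. It computes each T VALUE as the estimate divided by its approximate standard deviation. It computes the lack-of-fit F ratio by splitting the residual sum of squares into pure-error and lack-of-fit parts.

```python
"""Recompute t values and the lack-of-fit F ratio from the printed listing."""

# (estimate, approximate standard deviation) copied from the Dataplot output.
estimates = {
    "B1": (0.190404, 0.2206e-01),
    "B2": (0.613300e-02, 0.3493e-03),
    "B3": (0.105266e-01, 0.8027e-03),
}


def t_values(est):
    """t value = estimate / approximate standard deviation."""
    return {name: value / se for name, (value, se) in est.items()}


def lack_of_fit_f(resid_sd, resid_df, rep_sd, rep_df):
    """Return (F ratio, lack-of-fit df) from residual and replication SDs.

    SSE  = resid_sd**2 * resid_df  is the total residual sum of squares;
    SSPE = rep_sd**2 * rep_df      is the pure-error part from replicates;
    SSE - SSPE is the lack-of-fit part on resid_df - rep_df degrees of freedom.
    """
    sse = resid_sd ** 2 * resid_df
    sspe = rep_sd ** 2 * rep_df
    lof_df = resid_df - rep_df
    return ((sse - sspe) / lof_df) / (sspe / rep_df), lof_df


if __name__ == "__main__":
    for name, t in t_values(estimates).items():
        print(f"{name}: t = {t:.2f}")
    f_ratio, df1 = lack_of_fit_f(3.3616721630, 211, 3.2817625999, 192)
    print(f"lack-of-fit F = {f_ratio:.4f} on {df1} and 192 df")
```

The t values come out at about 8.63, 17.56 and 13.11, which the listing rounds to 8.6, 18. and 13. The F ratio agrees with the printed 1.5474 on 19 and 192 degrees of freedom.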
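Substituting the final parameter estimates, the estimated model is

$$ \hat{y} = \frac{\exp(-0.190404\, x)}{0.006133 + 0.0105266\, x} $$
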
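The estimated model translates directly into code for computing predicted values and residuals. Below is a minimal Python sketch with the parameter estimates copied from the Dataplot output. `predicted_response` and `residuals` are illustrative names. No data are included because the measurements are not reproduced in this section.

```python
import math

# Final parameter estimates from the Dataplot fit (B1, B2, B3).
B1, B2, B3 = 0.190404, 0.613300e-02, 0.105266e-01


def predicted_response(metal):
    """Fitted ultrasonic response exp(-B1*x) / (B2 + B3*x) at metal distance x."""
    return math.exp(-B1 * metal) / (B2 + B3 * metal)


def residuals(metal_values, responses):
    """Observed minus predicted response, the quantity examined in residual plots."""
    return [y - predicted_response(x) for x, y in zip(metal_values, responses)]


if __name__ == "__main__":
    for x in (0.5, 1.0, 2.0, 4.0, 6.0):
        print(f"metal = {x:3.1f}  predicted = {predicted_response(x):7.3f}")
```

The numerator decreases and the denominator increases with x, so the fitted curve decreases monotonically, matching the decaying pattern in the data plot.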
6-Plot for Model Validation

When there is a single independent variable, the 6-plot provides a convenient method for initial model validation.

The basic assumptions for regression models are that the residuals are random observations from a common distribution with constant mean and constant standard deviation (or variance). The 6-plot shows that the variance of the residuals is not constant. To see this more clearly, we generate full-size plots of the predicted values with the data and of the residuals against the independent variable.

Plot of Predicted Values with Original Data

This plot shows a reasonably good fit. It is difficult to detect any violations of the fit assumptions from this plot.

Plot of Residual Values Against Independent Variable

This plot shows that the residuals have greater variance for metal distances less than one. That is, the assumption of homogeneous variances for the residuals is violated.

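A numerical companion to this plot compares the residual standard deviation on either side of metal distance 1. The Python sketch below is illustrative only. `split_residual_sd` is a hypothetical helper, the cutoff of 1.0 comes from the plot description, and the example residuals are invented. The result is a descriptive summary, not a formal test for unequal variances.

```python
import statistics


def split_residual_sd(metal, resid, cutoff=1.0):
    """Return (sd below cutoff, sd at or above cutoff, ratio of the two)."""
    low = [r for x, r in zip(metal, resid) if x < cutoff]
    high = [r for x, r in zip(metal, resid) if x >= cutoff]
    # statistics.stdev uses the n - 1 divisor and needs at least two values per group.
    sd_low = statistics.stdev(low)
    sd_high = statistics.stdev(high)
    return sd_low, sd_high, sd_low / sd_high


if __name__ == "__main__":
    # Invented residuals that mimic larger scatter at small metal distances.
    metal = [0.5, 0.5, 0.75, 0.75, 1.5, 2.0, 3.0, 4.0]
    resid = [4.0, -4.0, 3.0, -3.0, 1.0, -1.0, 1.0, -1.0]
    print(split_residual_sd(metal, resid))
```

A ratio well above 1 points the same way as the plot. With real residuals, the group sizes and the choice of cutoff both affect how much weight the ratio deserves.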
Non-Homogeneous Variances

Although this is a mild rather than a gross violation of the assumptions, we can try to improve the quality of the fit by addressing the non-homogeneous variances of the residuals. We will use transformations and weighted fits to see if we can improve on the current model.
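A weighted fit multiplies each squared residual by the reciprocal of its estimated variance, so that the noisier region no longer dominates the criterion. The Python sketch below assumes, purely for illustration, a two-group variance model: one variance below metal distance 1, another at or above it, both estimated from residuals. The helpers are hypothetical, and this need not be the weighting the later analysis uses. In an actual weighted fit, the weighted sum of squares would be minimized over the model parameters.

```python
import statistics


def group_weights(metal, resid, cutoff=1.0):
    """Weights 1/variance, with one variance estimated per side of the cutoff."""
    low = [r for x, r in zip(metal, resid) if x < cutoff]
    high = [r for x, r in zip(metal, resid) if x >= cutoff]
    w_low = 1.0 / statistics.variance(low)
    w_high = 1.0 / statistics.variance(high)
    return [w_low if x < cutoff else w_high for x in metal]


def weighted_sse(resid, weights):
    """Weighted residual sum of squares: sum of w_i * r_i**2."""
    return sum(w * r * r for w, r in zip(weights, resid))


if __name__ == "__main__":
    metal = [0.5, 0.5, 0.75, 0.75, 1.5, 2.0, 3.0, 4.0]
    resid = [4.0, -4.0, 3.0, -3.0, 1.0, -1.0, 1.0, -1.0]  # invented residuals
    w = group_weights(metal, resid)
    print(weighted_sse(resid, [1.0] * len(resid)), weighted_sse(resid, w))
```

With these invented residuals, the two groups contribute equally to the weighted sum (3 + 3). The unweighted sum of 54 is dominated by the points at small metal distances.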