4. Process Modeling
4.3. Data Collection for Process Modeling
|||
| Six Principles of Experiment Design Applied to Process Modeling |
There are six principles of experiment design as applied to
process modeling:
|
||
| Capacity for Primary Model | For your best-guess model, make sure that the design has the capacity to estimate the coefficients of that model. As a simple example, if you are fitting a quadratic, make sure you have at least three distinct horizontal-axis points. | ||
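As a minimal sketch of this capacity check for a one-factor polynomial, the following Python function tests whether a design has enough distinct X values. The function name `has_capacity` and the sample designs are illustrative assumptions, not part of the handbook.

```python
def has_capacity(x_values, degree):
    """Return True if a one-factor design can estimate all coefficients
    of a polynomial of the given degree.

    A degree-d polynomial has d + 1 coefficients, so the design needs
    at least d + 1 distinct X values (replicates do not add capacity).
    """
    return len(set(x_values)) >= degree + 1


# Hypothetical designs for illustration.
two_level_design = [0.0, 0.0, 0.0, 10.0, 10.0, 10.0]   # 2 distinct X values
three_level_design = [0.0, 0.0, 5.0, 5.0, 10.0, 10.0]  # 3 distinct X values

quadratic_ok_two = has_capacity(two_level_design, degree=2)
quadratic_ok_three = has_capacity(three_level_design, degree=2)
```

The two-level design can still support a straight line (degree 1), but only the three-level design can support the quadratic.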
| Capacity for Alternative Model | If your best-guess model happens to be incorrect, make sure that the design has the capacity to estimate the coefficients of your back-up alternative model (which implies that you should already have identified such a model). For a simple example, if you suspect (but are not positive) that a linear model is appropriate, then it is best to employ a globally robust design (say, four points at each extreme and two points in the middle, for a ten-point design) as opposed to the locally optimal design (such as five points at each extreme). The locally optimal design will provide the best fit to the line but, with only two distinct X values, has no capacity to fit a quadratic. The globally robust design will provide a good (though not optimal) fit to the line and, in addition, a good (though not optimal) fit to the quadratic. | ||
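The trade-off between the two designs can be made concrete with a rough sketch. The code below assumes X is coded so that the extremes sit at -1 and +1, assumes independent errors with constant variance, and uses the standard result that the slope variance of a straight-line fit is proportional to 1 / sum((x - mean)^2). The function names and the choice of two center points are assumptions for illustration.

```python
def slope_variance_factor(x_values):
    """Var(slope) / sigma^2 for a least squares straight-line fit,
    which equals 1 / sum((x - mean(x))^2)."""
    n = len(x_values)
    x_bar = sum(x_values) / n
    sxx = sum((x - x_bar) ** 2 for x in x_values)
    return 1.0 / sxx


def can_fit_quadratic(x_values):
    """A quadratic needs at least three distinct X values."""
    return len(set(x_values)) >= 3


# X coded so the extremes are -1 and +1 (an assumption for illustration).
locally_optimal = [-1.0] * 5 + [1.0] * 5
# Four points at each extreme plus center points; the center count (two,
# giving ten runs in total) is an assumption chosen to match the run budget.
globally_robust = [-1.0] * 4 + [0.0] * 2 + [1.0] * 4

local_factor = slope_variance_factor(locally_optimal)    # 1/10
robust_factor = slope_variance_factor(globally_robust)   # 1/8
```

Under these assumptions the robust design's slope variance is 25% larger (1/8 versus 1/10), which is the price paid for being able to fit a quadratic at all.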
| Minimal Variance of Coefficients |
For a given model, make sure the design has the property of
minimizing the variation of the least squares estimated
coefficients. This is a general principle which is always in
effect but which in practice is hard to put in place for many
models beyond the simpler 1-factor
models. For more complicated 1-factor models, and for most
multi-factor models, the expressions for
the variance of the least squares estimates, although available,
are complicated and assume more than the analyst typically knows.
The net result is that this principle, though important, is harder
to apply beyond the simple cases.
|
||
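For reference, the variance expressions this principle refers to can be written down for the linear least squares case. The block below assumes independent errors with a common variance and a model that is linear in its coefficients; the symbols (design matrix X, coefficient estimates, error variance) are notation introduced here, not taken from the text above.

```latex
% General linear least squares model y = X\beta + \varepsilon,
% with independent errors of common variance \sigma^2:
\operatorname{Var}(\hat{\beta}) = \sigma^{2}\,(X^{\mathsf{T}} X)^{-1}

% Special case: straight line y = \beta_0 + \beta_1 x with n design points
\operatorname{Var}(\hat{\beta}_1) = \frac{\sigma^{2}}{\sum_{i=1}^{n} (x_i - \bar{x})^{2}}
```

For the straight line, the slope variance is smallest when the X values are pushed to the extremes of the range, which is why the locally optimal design for a line places all points at the two ends.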
| Sample Where the Variation Is (Non-Constant Variance Case) |
Regardless of the simplicity or complexity of the model, there are
situations where certain regions of the curve are noisier than
others. A simple case is where there is a linear relationship
between X and Y but the recording device
is proportional rather than absolute and so larger values of
Y are intrinsically noisier than smaller values of
Y. In such cases, "sampling where the variation is"
means placing more replicated points in the noisier regions.
In practice, the number of replicated points to take in a region
depends on the standard deviation of Y in that region. |
||
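The handbook's own allocation formula is not reproduced in this text. As a hedged sketch of one common heuristic (an assumption, not the handbook's rule), replicates can be allocated in proportion to the guessed local variance, so that the standard error of the mean, sd / sqrt(n), is roughly equal across regions. The function name and the a-priori standard deviation guesses below are hypothetical.

```python
def allocate_replicates(local_sds, total_runs):
    """Split total_runs across regions in proportion to local variance.

    Assumed heuristic: n_i proportional to sd_i^2, which roughly equalizes
    the standard error of the mean (sd_i / sqrt(n_i)) across regions.
    Fractional counts are rounded by the largest-remainder method so the
    counts always sum to total_runs.
    """
    variances = [sd * sd for sd in local_sds]
    total_variance = sum(variances)
    raw = [total_runs * v / total_variance for v in variances]
    counts = [int(r) for r in raw]  # floor of each share
    leftover = total_runs - sum(counts)
    # Hand the remaining runs to the regions with the largest fractions.
    by_fraction = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i],
                         reverse=True)
    for i in by_fraction[:leftover]:
        counts[i] += 1
    return counts


# Hypothetical a-priori guesses: the noisiest region has 3x the sd of the quietest.
replicates = allocate_replicates([1.0, 2.0, 3.0], total_runs=28)
```

A very quiet region can receive zero runs under this rule, so in practice a minimum count per region would be enforced separately.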
| Sample Where the Variation Is (Steep Curve Case) |
A common occurrence for non-linear models is for some regions of the
curve to be steeper than others. For example, in fitting an
exponential model (small X yielding large
Y, and large X yielding small
Y) it is often the case that the Y data
in the steep region is intrinsically noisier than the
Y data in the relatively flat regions. The reason for
this is that commonly the X values themselves have
a bit of noise and this X-noise gets translated into
larger Y-noise in the steep sections than in the
shallow sections. In such cases, where we know the shape of the
response curve well enough to identify steep-versus-shallow
regions, it is often a good idea to sample more heavily in the steep
regions than in shallow regions. A practical rule of thumb for where
to position the X values in such situations is to
The above rough procedure for an exponentially decreasing curve would thus yield a logarithmic preponderance of points in the steep region of the curve and relatively few points in the flatter part of the curve. |
||
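The individual steps of the rule of thumb are not present in this text. One procedure consistent with the stated outcome (a logarithmic concentration of points in the steep region of a decreasing exponential) is to space the target Y values equally and read the corresponding X values off the best-guess curve. The sketch below assumes that procedure and a curve of the form y = a * exp(-b * x) with a > 0 and b > 0; the function name and parameter values are hypothetical.

```python
import math


def equal_y_design(a, b, x_min, x_max, n_points):
    """X positions for the curve y = a * exp(-b * x), chosen so that the
    corresponding Y values are equally spaced between y(x_min) and y(x_max)."""
    y_high = a * math.exp(-b * x_min)
    y_low = a * math.exp(-b * x_max)
    step = (y_high - y_low) / (n_points - 1)
    y_targets = [y_high - i * step for i in range(n_points)]
    # Invert y = a * exp(-b * x) to get x = -ln(y / a) / b.
    return [-math.log(y / a) / b for y in y_targets]


# Hypothetical best-guess curve: y = 10 * exp(-0.5 * x) on 0 <= x <= 6.
x_design = equal_y_design(a=10.0, b=0.5, x_min=0.0, x_max=6.0, n_points=8)
gaps = [x2 - x1 for x1, x2 in zip(x_design, x_design[1:])]
```

The gaps between successive X values grow as X increases, so most of the points fall in the steep early part of the curve and only a few in the flat tail.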
| Replication | If affordable, replication should be part of every design. Replication allows us to compute a model-free estimate of the process standard deviation. Such an estimate may then be used as a criterion in an objective goodness-of-fit test to assess whether a given model is adequate: it makes no sense to have a model so complicated that the model-dependent residual standard deviation is smaller than the model-free residual standard deviation. We want to fit signal, not noise. Such an objective goodness-of-fit F test can be employed only if the design has built-in replication. Some replication is essential; replication at every point is ideal. | ||
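One standard form of such a test is the lack-of-fit F test, sketched below for a straight-line model. It compares the model-dependent scatter (replicate means about the fitted line) with the model-free scatter (replicates about their own means). The function name and the data are made up for illustration, and the sketch assumes at least three distinct X levels with at least one of them replicated.

```python
from collections import defaultdict


def lack_of_fit_f(x_values, y_values):
    """Lack-of-fit F statistic for a straight-line fit to replicated data.

    Returns (f_statistic, df_lack_of_fit, df_pure_error).
    """
    n = len(x_values)
    # Least squares straight line.
    x_bar = sum(x_values) / n
    y_bar = sum(y_values) / n
    sxx = sum((x - x_bar) ** 2 for x in x_values)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(x_values, y_values))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Group the responses by X level to use the replicates.
    groups = defaultdict(list)
    for x, y in zip(x_values, y_values):
        groups[x].append(y)

    ss_pure_error = 0.0   # model-free: replicates about their own mean
    ss_lack_of_fit = 0.0  # model-dependent: group means versus fitted line
    for x, ys in groups.items():
        group_mean = sum(ys) / len(ys)
        ss_pure_error += sum((y - group_mean) ** 2 for y in ys)
        fitted = intercept + slope * x
        ss_lack_of_fit += len(ys) * (group_mean - fitted) ** 2

    df_lack_of_fit = len(groups) - 2   # distinct X levels minus 2 coefficients
    df_pure_error = n - len(groups)    # total runs minus distinct X levels
    f_statistic = (ss_lack_of_fit / df_lack_of_fit) / (ss_pure_error / df_pure_error)
    return f_statistic, df_lack_of_fit, df_pure_error


# Made-up data: three X levels, two replicates each, with some curvature.
x = [0, 0, 1, 1, 2, 2]
y = [-1.0, 1.0, 0.0, 2.0, 3.0, 5.0]
f_stat, df_lof, df_pe = lack_of_fit_f(x, y)
```

The statistic is compared against an F distribution with (df_lof, df_pe) degrees of freedom; a large value suggests the straight line is inadequate. Without replication, df_pe is zero and the test cannot be computed.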
| Randomization | The fact that the X's have some natural ordering does not mean that the data should be collected in that order. Some aspect of randomization should enter into every experiment, and experiments for process modeling are no exception. Thus if you are sampling ten points of a curve, the ten Y values should not be collected by sequentially stepping through the X values from the smallest to the largest. If you do so, and if some extraneous drifting or wear occurs in the machine, the operator, the environment, the measuring device, etc., then that drift will unwittingly contaminate the Y values and in turn contaminate the final fit. To minimize the effect of such potential drift, it is best to randomize the sequence of the X values (for example, with random number tables). This will not make the drift go away, but it will spread the contaminating drift effect fairly over the entire curve, realistically inflating the variation of the fitted values, and it provides a mechanism for uncovering such a drift after the fact (at the residual analysis stage of model validation). If you do not randomize the run sequence, you give up your ability to detect such a drift if it occurs. | ||
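A minimal sketch of randomizing the run sequence follows. Here a seeded pseudorandom generator from Python's standard library stands in for random number tables; the function name, the seed, and the ten X settings are hypothetical.

```python
import random


def randomized_run_order(x_values, seed=None):
    """Return the X settings in a random run order.

    A seed makes the order reproducible so it can be recorded with the data.
    The input list is left unchanged.
    """
    rng = random.Random(seed)
    run_order = list(x_values)
    rng.shuffle(run_order)
    return run_order


# Ten hypothetical X settings in their natural (sorted) order.
x_settings = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
run_order = randomized_run_order(x_settings, seed=2024)
```

Recording the run order alongside the data allows the residuals to be plotted against run sequence later, which is how a drift would be detected.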