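Assumption Needed for Parameter Estimation

As discussed earlier in this section, the random errors (the $\varepsilon$'s) in the basic model,
$$ y = f(\vec{x};\vec{\beta}) + \varepsilon , $$
must have a mean of zero at each combination of explanatory variable values to obtain valid estimates of the parameters in the functional part of the process model (the $\beta$'s). Some of the more obvious sources of random errors with non-zero means include

1. drift in the process,
2. drift in the measurement system used to obtain the process data, and
3. use of a misspecified function to model the data.

However, the presence of random errors in the measured values of the explanatory variables is another, more subtle, source of $\varepsilon$'s with non-zero means.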
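
Explanatory Variables Observed with Random Error Add Terms to $\varepsilon$

The values of explanatory variables observed with independent, normally distributed random errors, $\vec{\delta}$, can be differentiated from their true values using the definition
$$ \vec{x}_{obs} = \vec{x}_{true} + \vec{\delta} . $$
Then applying the mean value theorem from multivariable calculus shows that the random errors in a model based on $\vec{x}_{obs}$,
$$ y = f(\vec{x}_{obs};\vec{\beta}) + \varepsilon , $$
are [Seber (1989)]
$$ \varepsilon = \varepsilon_{y|\vec{x}_{true}} - \vec{\delta} \cdot \frac{\partial f(\vec{x}^{*};\vec{\beta})}{\partial \vec{x}} , $$
where $\varepsilon_{y|\vec{x}_{true}}$ is the random error associated with the basic form of the model,
$$ y = f(\vec{x}_{true};\vec{\beta}) + \varepsilon_{y|\vec{x}_{true}} , $$
under all of the usual assumptions (denoted here more carefully than is usually necessary), and $\vec{x}^{*}$ is a value between $\vec{x}_{true}$ and $\vec{x}_{obs}$.
This extra term in the expression of the random error, $-\vec{\delta} \cdot \partial f(\vec{x}^{*};\vec{\beta})/\partial \vec{x}$, complicates matters because $\partial f(\vec{x}^{*};\vec{\beta})/\partial \vec{x}$ is typically not a constant. For most functions, $\partial f(\vec{x}^{*};\vec{\beta})/\partial \vec{x}$ will depend on the explanatory variable values and, more importantly, on $\vec{\delta}$. This is the source of the problem with observing the explanatory variable values with random error.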
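
The decomposition can be checked numerically in a case where $\vec{x}^{*}$ is known exactly. The sketch below assumes a single explanatory variable and the hypothetical quadratic $f(x;\beta) = \beta x^2$, for which the mean-value point is exactly the midpoint $x^{*} = (x_{true} + x_{obs})/2$; the values of `BETA`, `X_TRUE`, `SIGMA_DELTA`, and `SIGMA_Y` are illustrative assumptions, not values from the text. It confirms the identity observation by observation and shows that the average of $\varepsilon$ sits near $-\beta\sigma_\delta^2$ rather than zero.

```python
import random

# Hypothetical settings (assumptions for illustration only).
BETA = 2.0          # parameter of f(x; beta) = beta * x**2
X_TRUE = 1.5        # true value of the explanatory variable
SIGMA_DELTA = 0.3   # sd of the error in observing x
SIGMA_Y = 0.1       # sd of the error in the basic model
N = 100_000

def f(x):
    return BETA * x ** 2

def df_dx(x):
    return 2.0 * BETA * x

rng = random.Random(1)
max_gap = 0.0
total = 0.0
for _ in range(N):
    delta = rng.gauss(0.0, SIGMA_DELTA)
    eps_y = rng.gauss(0.0, SIGMA_Y)
    x_obs = X_TRUE + delta
    y = f(X_TRUE) + eps_y               # basic model, written in the true x
    eps = y - f(x_obs)                  # error of the model written in x_obs
    x_star = 0.5 * (X_TRUE + x_obs)     # exact mean-value point for a quadratic
    decomposed = eps_y - delta * df_dx(x_star)
    max_gap = max(max_gap, abs(eps - decomposed))
    total += eps

mean_eps = total / N
print(f"largest |eps - decomposition| = {max_gap:.1e}")
print(f"mean of eps = {mean_eps:.3f}; -beta*sigma_delta**2 = {-BETA * SIGMA_DELTA**2:.3f}")
```

The first printed gap should be at the level of floating-point rounding, while the mean of $\varepsilon$ lands near $-0.18$ even though both $\delta$ and $\varepsilon_{y|x_{true}}$ were drawn with mean zero.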
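
$\vec{\delta}$ Correlated with $\partial f(\vec{x}^{*};\vec{\beta})/\partial \vec{x}$

Because each of the components of $\vec{x}^{*}$, denoted by $x^{*}_{i}$, is a function of the components of $\vec{\delta}$, similarly denoted by $\delta_{i}$, whenever any of the components of $\partial f(\vec{x}^{*};\vec{\beta})/\partial \vec{x}$ simplify to expressions that are not constant, the random variables $\delta_{i}$ and $\partial f(\vec{x}^{*};\vec{\beta})/\partial x_{i}$ will be correlated with one another. This correlation will then usually induce a non-zero mean in the product $\vec{\delta} \cdot \partial f(\vec{x}^{*};\vec{\beta})/\partial \vec{x}$.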
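
For the same hypothetical quadratic $f(x;\beta) = \beta x^2$, the derivative at the mean-value point is $\partial f(x^{*};\beta)/\partial x = 2\beta x_{true} + \beta\delta$, an increasing function of $\delta$ when $\beta > 0$ and a decreasing one when $\beta < 0$. The sketch below (illustrative parameter values, standard library only) estimates the correlation between $\delta$ and this derivative and the mean of their product, which should be close to $\beta\sigma_\delta^2$ rather than zero.

```python
import math
import random

def mean(values):
    return sum(values) / len(values)

def correlation(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    var_a = sum((ai - ma) ** 2 for ai in a)
    var_b = sum((bi - mb) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)

def simulate(beta, x_true=1.5, sigma_delta=0.3, n=100_000, seed=2):
    """Return (corr(delta, df/dx at x*), mean of delta * df/dx at x*) for the
    hypothetical model f(x; beta) = beta * x**2."""
    rng = random.Random(seed)
    deltas = [rng.gauss(0.0, sigma_delta) for _ in range(n)]
    # For a quadratic, x* = x_true + delta / 2 exactly.
    grads = [2.0 * beta * (x_true + d / 2.0) for d in deltas]
    products = [d * g for d, g in zip(deltas, grads)]
    return correlation(deltas, grads), mean(products)

for beta in (2.0, -2.0):
    r, m = simulate(beta)
    print(f"beta = {beta:+.1f}: corr = {r:+.3f}, mean of product = {m:+.3f} "
          f"(beta*sigma^2 = {beta * 0.3**2:+.3f})")
```

Here the correlation is essentially perfect because the derivative is linear in $\delta$, and the sign of the product's mean follows the sign of $\beta$.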
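
For example, a positive correlation between $\delta_{i}$ and $\partial f(\vec{x}^{*};\vec{\beta})/\partial x_{i}$ means that when $\delta_{i}$ is large, $\partial f(\vec{x}^{*};\vec{\beta})/\partial x_{i}$ will also tend to be large. Similarly, when $\delta_{i}$ is small, $\partial f(\vec{x}^{*};\vec{\beta})/\partial x_{i}$ will also tend to be small. This could cause $\delta_{i}$ and $\partial f(\vec{x}^{*};\vec{\beta})/\partial x_{i}$ to always have the same sign, which would preclude their product from having a mean of zero, since all of the values of $\delta_{i} \, \partial f(\vec{x}^{*};\vec{\beta})/\partial x_{i}$ would be greater than or equal to zero. A negative correlation, on the other hand, could mean that the signs of these two random variables would always be opposite one another, resulting in a negative mean for $\delta_{i} \, \partial f(\vec{x}^{*};\vec{\beta})/\partial x_{i}$. These examples are extreme, but they illustrate how correlation can cause trouble even if both $\delta_{i}$ and $\partial f(\vec{x}^{*};\vec{\beta})/\partial x_{i}$ have zero means individually. What will happen in any particular modeling situation will depend on the variability of the $\delta$'s, the form of the function, the true values of the $\beta$'s, and the values of the explanatory variables.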
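
The dependence on these factors can be made concrete with a model in which the mean of the induced error has a closed form. Assuming, for illustration only, one explanatory variable, $f(x;\beta) = e^{\beta x}$, and normal $\delta$ with standard deviation $\sigma_\delta$, the mean of $\varepsilon = \varepsilon_{y|x_{true}} + f(x_{true};\beta) - f(x_{obs};\beta)$ is $e^{\beta x_{true}}\left(1 - e^{\beta^2\sigma_\delta^2/2}\right)$, which changes with $\sigma_\delta$, $\beta$, and $x_{true}$. The sketch compares that expression with a simulation; $\varepsilon_{y|x_{true}}$ is left out of the simulation because it has mean zero and would only add noise. The parameter combinations are hypothetical.

```python
import math
import random

def mean_error_exact(beta, x_true, sigma_delta):
    """E[f(x_true) - f(x_obs)] for f(x) = exp(beta*x) and normal delta,
    using E[exp(beta*delta)] = exp(beta**2 * sigma_delta**2 / 2)."""
    return math.exp(beta * x_true) * (1.0 - math.exp(0.5 * (beta * sigma_delta) ** 2))

def mean_error_simulated(beta, x_true, sigma_delta, n=100_000, seed=0):
    """Monte Carlo estimate of the same mean (eps_y left out: it has mean zero)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        delta = rng.gauss(0.0, sigma_delta)
        total += math.exp(beta * x_true) - math.exp(beta * (x_true + delta))
    return total / n

# Hypothetical (beta, x_true, sigma_delta) combinations.
cases = [(0.5, 1.0, 0.2), (0.5, 1.0, 0.5), (1.0, 2.0, 0.5)]
for beta, x_true, sigma_delta in cases:
    exact = mean_error_exact(beta, x_true, sigma_delta)
    sim = mean_error_simulated(beta, x_true, sigma_delta)
    print(f"beta={beta}, x_true={x_true}, sigma_delta={sigma_delta}: "
          f"exact {exact:.4f}, simulated {sim:.4f}")
```

Increasing $\sigma_\delta$ or $\beta$ increases the magnitude of the mean, and the factor $e^{\beta x_{true}}$ makes it depend on where the true $x$ lies.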

Biases Can Affect Parameter Estimates When Means of $\varepsilon$'s are 0

Even if the $\varepsilon$'s have zero means, observation of the explanatory variables with random error can still bias the parameter estimates. Depending on the method used to estimate the parameters, the explanatory variables can be used in the computation of the parameter estimates in ways that keep the $\delta$'s from canceling out. One unfortunate example of this phenomenon is the use of least squares to estimate the parameters of a straight line. In this case, because of the simplicity of the model,
$$ y = \beta_0 + \beta_1 x + \varepsilon , $$
the term $\partial f(\vec{x}^{*};\vec{\beta})/\partial \vec{x}$ simplifies to $\beta_1$. Because this term does not involve $\vec{\delta}$, it does not induce non-zero means in the $\varepsilon$'s. Because of the way the explanatory variables enter into the formulas for the estimates of the $\beta$'s, however, the random errors in the explanatory variables do not cancel out on average. This results in parameter estimates that are biased and will not approach the true parameter values no matter how much data is collected.
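
The straight-line case can be illustrated with a short simulation. It assumes a classical errors-in-variables setup that the text does not spell out: the true $x$ values are themselves normal with standard deviation $\sigma_x$, the observation errors $\delta$ are independent normal with standard deviation $\sigma_\delta$, and the slope is estimated by ordinary least squares of $y$ on $x_{obs}$. Under those assumptions the estimated slope tends to $\beta_1\sigma_x^2/(\sigma_x^2+\sigma_\delta^2)$, a shrinkage often called attenuation or regression dilution, rather than to $\beta_1$, however large the sample. All numeric values are hypothetical.

```python
import random

def ols_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def fitted_slope(n, beta0=1.0, beta1=2.0, sigma_x=1.0, sigma_delta=0.5,
                 sigma_y=0.2, seed=0):
    rng = random.Random(seed)
    x_obs, y = [], []
    for _ in range(n):
        x_true = rng.gauss(0.0, sigma_x)                 # assumed random true x
        x_obs.append(x_true + rng.gauss(0.0, sigma_delta))
        y.append(beta0 + beta1 * x_true + rng.gauss(0.0, sigma_y))
    return ols_slope(x_obs, y)

limit = 2.0 * 1.0 ** 2 / (1.0 ** 2 + 0.5 ** 2)   # attenuated slope, 1.6
slopes = {n: fitted_slope(n) for n in (100, 10_000, 200_000)}
for n, b1 in slopes.items():
    print(f"n = {n:>7}: slope = {b1:.3f}  (true 2.0, limit {limit:.1f})")
```

As `n` grows, the fitted slope settles near 1.6 instead of the true value 2.0, which is the non-vanishing bias described above.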

Berkson Model Does Not Depend on this Assumption

There is one type of model in which errors in the measurement of the explanatory variables do not bias the parameter estimates. The Berkson model [Berkson (1950)] is a model in which the observed values of the explanatory variables are directly controlled by the experimenter while the true values of each explanatory variable vary from one observation to the next. The differences between the observed and true values for each explanatory variable are assumed to be independent random variables from a normal distribution with a mean of zero. In addition, the errors associated with each explanatory variable must be independent of the errors associated with all of the other explanatory variables and independent of the observed values of each explanatory variable. Finally, the Berkson model requires the functional part of the model to be a straight line, a plane, or a higher-dimensional first-order model in the explanatory variables. When these conditions are all met, the errors in the explanatory variables can be ignored.

Applications for which the Berkson model correctly describes the data are most often situations where the experimenter can adjust equipment settings so that the observed values of the explanatory variables will be known ahead of time. For example, in a study of the relationship between the temperature used to dry a sample for chemical analysis and the resulting concentration of a volatile constituent, an oven might be used to prepare samples at temperatures of 300 to 500 degrees in 50-degree increments. In reality, however, the true temperature inside the oven will probably not exactly equal 450 degrees each time that setting is used (or 300 when that setting is used, etc.). The Berkson model would apply, though, as long as the errors in measuring the temperature randomly differed from one another each time an observed value of 450 degrees was used and the mean of the true temperatures over many repeated runs at an oven setting of 450 degrees really was 450 degrees. Then, as long as the model was also a straight line relating the concentration to the observed values of temperature, the errors in the measurement of temperature would not bias the estimates of the parameters.
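
The oven example can be mimicked with a simulation. The sketch assumes hypothetical values that are not in the text: a straight-line relation between true temperature and concentration with coefficients `BETA0` and `BETA1`, true temperatures that scatter normally about each setting with standard deviation `SIGMA_T` independently of the setting, and normal measurement error in the concentration. The line is fitted by least squares to the oven settings, not to the unknown true temperatures. Because the scatter then enters only as an additional zero-mean error term $\beta_1\delta$ that is independent of the settings, the fitted coefficients should land close to the true ones.

```python
import random

def ols_fit(x, y):
    """Least-squares intercept and slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical values chosen only for illustration.
BETA0, BETA1 = 10.0, -0.01   # concentration = BETA0 + BETA1 * true temperature
SIGMA_T = 5.0                # scatter of true oven temperature about the setting
SIGMA_Y = 0.05               # measurement error in the concentration
SETTINGS = [300, 350, 400, 450, 500]
RUNS_PER_SETTING = 4000

rng = random.Random(3)
x_setting, conc = [], []
for s in SETTINGS:
    for _ in range(RUNS_PER_SETTING):
        t_true = s + rng.gauss(0.0, SIGMA_T)   # Berkson: true value varies around the setting
        conc.append(BETA0 + BETA1 * t_true + rng.gauss(0.0, SIGMA_Y))
        x_setting.append(s)                    # the regression only sees the setting

b0_hat, b1_hat = ols_fit(x_setting, conc)
print(f"intercept {b0_hat:.3f} (true {BETA0}), slope {b1_hat:.5f} (true {BETA1})")
```

If the true temperatures were instead centered away from the settings (for example, `s + 2.0 + rng.gauss(0.0, SIGMA_T)`), the intercept would absorb the offset `2.0 * BETA1`, which is why the mean of the true temperatures at each setting has to equal the setting.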

Assumption Validity Requires Careful Consideration

The validity of this assumption requires careful consideration in scientific and engineering applications. In these types of applications it is most often the case that the response variable and the explanatory variables will both actually be measured with some random error. Fortunately, however, there is also usually some knowledge of the relative amount of information in the observed values of each variable. This allows a rough assessment of how much bias there will be in the estimated values of the parameters. As long as the biases in the parameter estimates have a negligible effect on the intended use of the model, this assumption can be considered valid from a practical point of view. Section 4.4.4, which covers model validation, points to a discussion of a practical method for checking the validity of this assumption.
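
One rough way to make such an assessment for a straight line, assuming the classical errors-in-variables setup (independent, zero-mean observation errors in $x$ with a known standard deviation $\sigma_\delta$, and least-squares fitting), is to compare $\sigma_\delta^2$ with the spread of the observed $x$ values: the slope estimate is expected to shrink by roughly the factor $(s_{x,obs}^2 - \sigma_\delta^2)/s_{x,obs}^2$. The helper name `attenuation_factor` and the data are hypothetical, and this is only a back-of-the-envelope check, not the validation method that Section 4.4.4 refers to.

```python
import statistics

def attenuation_factor(x_observed, sigma_delta):
    """Approximate ratio (expected least-squares slope) / (true slope) for a
    straight line whose x values carry independent N(0, sigma_delta**2) errors.
    Assumes the classical errors-in-variables setup; a rough guide only."""
    var_obs = statistics.variance(x_observed)          # spread of observed x
    var_true = max(var_obs - sigma_delta ** 2, 0.0)    # estimated spread of true x
    return var_true / var_obs

# Hypothetical observed x values and observation-error standard deviation.
x_obs = [10.2, 19.7, 30.4, 39.8, 50.1, 59.6, 70.3, 80.0]
lam = attenuation_factor(x_obs, sigma_delta=0.5)
print(f"slope expected to shrink by about {100.0 * (1.0 - lam):.3f}%")
```

A factor this close to 1 suggests the bias would be negligible for most purposes; a factor well below 1 signals that the errors in $x$ deserve a closer look.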