$$(\mathrm{TOL})_k = 1 - R_k^2, \qquad k = 1, 2, \ldots, p-1$$
where $R_k^2$ is the R square obtained when $X_k$ is regressed on the other independent variables in the model, including a constant.
$$(\mathrm{VIF})_k = \frac{1}{(\mathrm{TOL})_k}$$
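The two definitions translate directly into code. The sketch below assumes NumPy and a data matrix `X` whose columns are the p-1 independent variables (no intercept column). It regresses each $X_k$ on the remaining columns plus a constant and returns the tolerance and VIF. The function name `tolerance_and_vif` and the simulated data are illustrative only.

```python
import numpy as np


def tolerance_and_vif(X):
    """Return (TOL, VIF) arrays, one entry per column of X.

    X is an n x (p-1) array of independent variables without an
    intercept column. Each X_k is regressed on the remaining columns
    plus a constant; TOL_k = 1 - R_k^2 and VIF_k = 1 / TOL_k.
    """
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    tol = np.empty(m)
    for k in range(m):
        xk = X[:, k]
        # Design matrix: a constant plus every other independent variable.
        A = np.column_stack([np.ones(n), np.delete(X, k, axis=1)])
        coef, *_ = np.linalg.lstsq(A, xk, rcond=None)
        resid = xk - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((xk - xk.mean()) ** 2)
        tol[k] = 1.0 - r2
    return tol, 1.0 / tol


# Illustrative data: x3 is almost an exact linear combination of x1 and x2.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + x2 + rng.normal(scale=0.05, size=200)
tol, vif = tolerance_and_vif(np.column_stack([x1, x2, x3]))
print("TOL:", np.round(tol, 5))
print("VIF:", np.round(vif, 1))
```

Because each variable in the simulated data is nearly determined by the other two, all three VIFs come out very large. With mutually independent columns they would all be close to 1.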
Why the expression "variance inflation factor"?
The standardized regression model is a regression model in which all the variables (X and Y) are standardized into z-scores with mean 0 and standard deviation 1 and then divided by $(n - 1)^{1/2}$ (see NKNW Section 7.5, pp. 277-284).
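The normal equations of the standardized model are
$$r_{XX}\, b^* = r_{YX}$$
where $r_{XX}$ is the matrix of correlations among the X variables, $r_{YX}$ is the vector of correlations of Y with each X, and $b^*$ is the vector of standardized regression coefficients. One can show that $(\mathrm{VIF})_k$ is the $k$th diagonal element of $r_{XX}^{-1}$, so that
$$\sigma^2\{b_k^*\} = (\sigma^*)^2 (\mathrm{VIF})_k = \frac{(\sigma^*)^2}{1 - R_k^2}$$
where $(\sigma^*)^2$ is the error variance of the standardized model. Thus $(\mathrm{VIF})_k$ measures how much the variance of the standardized regression coefficient $b_k^*$ is inflated by collinearity: it equals 1 when $X_k$ is uncorrelated with the other independent variables ($R_k^2 = 0$) and grows without bound as $R_k^2$ approaches 1.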
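The matrix route to the VIFs can be checked numerically. This sketch assumes NumPy; `correlation_transform` and `standardized_fit` are hypothetical helper names. After the correlation transformation, `Xs.T @ Xs` is exactly $r_{XX}$ and `Xs.T @ ys` is $r_{YX}$, so $b^*$ solves the normal equations and the VIFs are the diagonal of $r_{XX}^{-1}$.

```python
import numpy as np


def correlation_transform(v):
    """Center each column, then divide by its sample SD and by sqrt(n - 1)."""
    v = np.asarray(v, dtype=float)
    n = v.shape[0]
    return (v - v.mean(axis=0)) / (v.std(axis=0, ddof=1) * np.sqrt(n - 1))


def standardized_fit(X, y):
    """Return (b_star, vif) for the standardized regression model.

    With the correlation transformation, Xs.T @ Xs is r_XX and
    Xs.T @ ys is r_YX, so b* solves r_XX b* = r_YX and the VIFs are
    the diagonal elements of r_XX^{-1}.
    """
    Xs = correlation_transform(X)
    ys = correlation_transform(y)
    r_xx = Xs.T @ Xs
    r_yx = Xs.T @ ys
    b_star = np.linalg.solve(r_xx, r_yx)
    vif = np.diag(np.linalg.inv(r_xx))
    return b_star, vif


# Illustrative data with X3 strongly collinear with X1.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
X[:, 2] = X[:, 0] + 0.1 * rng.normal(size=100)
y = X @ np.array([1.0, 2.0, 3.0]) + rng.normal(size=100)
b_star, vif = standardized_fit(X, y)
print("b*: ", np.round(b_star, 3))
print("VIF:", np.round(vif, 1))
```

Computing the VIFs through auxiliary regressions (TOL = 1 - R_k^2) on the same data gives the same numbers, which is the identity stated above.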
Tolerance and VIF values for the three independent variables of the body fat example:
Independent variable | TOL | VIF
TRICEPS | .001411 | 708.717
THIGH | .001772 | 564.334
MIDARM | .00956 | 104.603
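These values indicate severe collinearity: every VIF far exceeds 10, the value commonly taken (e.g., in NKNW) as a sign that multicollinearity may be unduly influencing the least squares estimates.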
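Similarly, the average VIF
$$\overline{(\mathrm{VIF})} = \frac{\sum_{k=1}^{p-1} (\mathrm{VIF})_k}{p-1}$$
is an indicator of collinearity for the entire model; an average considerably larger than 1 indicates serious multicollinearity.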
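As a quick arithmetic check on the body fat table, the sketch below (plain Python, no libraries) recovers each tolerance as the reciprocal of its VIF and averages the three VIFs over p - 1 = 3 variables.

```python
# VIF values from the body fat table.
vif = {"TRICEPS": 708.717, "THIGH": 564.334, "MIDARM": 104.603}

# Tolerance is the reciprocal of the VIF.
tol = {name: 1.0 / v for name, v in vif.items()}

# Average VIF over the p - 1 = 3 independent variables.
mean_vif = sum(vif.values()) / len(vif)

for name in vif:
    print(f"{name:8s} TOL = {tol[name]:.6f}  VIF = {vif[name]:.3f}")
print(f"mean VIF = {mean_vif:.3f}")  # mean VIF = 459.218
```

An average VIF of about 459 is hundreds of times larger than 1, consistent with the severe collinearity seen in the individual values.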
(See the body fat exhibit for the diagnostics produced for this example.)
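For the ridge estimator $b^R$ of a parameter $\beta$, the mean squared error decomposes as
$$E\{(b^R - \beta)^2\} = \sigma^2\{b^R\} + (E\{b^R\} - \beta)^2$$
that is, the mean squared error is the sum of the variance of the estimate and its squared bias. A biased estimator can therefore have a smaller mean squared error than the unbiased least squares estimator if it reduces the variance by more than the squared bias it introduces.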
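The decomposition can be seen in a small simulation. This is not ridge regression itself: as an assumption for illustration, it compares the sample mean (unbiased) with a hypothetical shrunken version `0.8 * mean` (biased toward 0). The true value `beta = 1.0`, the noise scale, and the sample sizes are arbitrary choices. The identity holds exactly for the empirical moments when the variance uses `ddof=0`.

```python
import numpy as np

rng = np.random.default_rng(42)
beta = 1.0                      # true parameter (illustrative)
n, reps = 20, 10_000
samples = rng.normal(loc=beta, scale=3.0, size=(reps, n))


def mse_decomposition(estimates, beta):
    """Empirical MSE, variance and squared bias of a set of estimates."""
    mse = np.mean((estimates - beta) ** 2)
    variance = np.var(estimates)                 # ddof=0, matching the identity
    bias_sq = (np.mean(estimates) - beta) ** 2
    return mse, variance, bias_sq


unbiased = samples.mean(axis=1)     # sample mean: unbiased for beta
shrunk = 0.8 * unbiased             # hypothetical shrinkage toward 0: biased

for label, est in [("unbiased", unbiased), ("shrunk", shrunk)]:
    mse, var, b2 = mse_decomposition(est, beta)
    print(f"{label:8s} MSE={mse:.4f} variance={var:.4f} bias^2={b2:.4f}")
```

Here the shrunken estimator trades a small squared bias for a larger drop in variance and ends up with the lower mean squared error. Ridge regression exploits the same trade-off for regression coefficients.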
Ridge regression is based on the standardized regression model, with normal equations
$$r_{XX}\, b^* = r_{YX}$$
where $r_{XX}$ is the matrix of correlations of X, $r_{YX}$ is the vector of correlations of Y with the X, and $b^*$ is the vector of standardized regression coefficients.
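The ridge estimator modifies these normal equations by adding a biasing constant $c \ge 0$ to the diagonal of $r_{XX}$:
$$(r_{XX} + cI)\, b^R = r_{YX}$$
where $b^R$ contains the standardized ridge coefficient estimates, so that
$$b^R = (r_{XX} + cI)^{-1} r_{YX}$$
With $c = 0$ this reduces to ordinary least squares on the standardized variables. The strategy is to try several successive values of $c$, starting from zero, and choose the smallest value for which the ridge coefficients first become stable in the ridge trace (a plot of the $b_k^R$ against $c$) and the VIFs have become sufficiently small.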
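A ridge trace can be tabulated with a few lines of NumPy. In this sketch the helper names, the grid of `c` values, and the simulated data are assumptions for illustration (they are not the body fat data). The ridge VIFs use the expression commonly given for ridge regression (e.g., in NKNW), the diagonal of $(r_{XX} + cI)^{-1} r_{XX} (r_{XX} + cI)^{-1}$. The code only prints the trace; choosing $c$ remains a judgment made by inspecting it.

```python
import numpy as np


def correlation_matrices(X, y):
    """Return r_XX and r_YX computed from the sample correlations."""
    R = np.corrcoef(np.column_stack([X, y]), rowvar=False)
    return R[:-1, :-1], R[:-1, -1]


def ridge_coefficients(r_xx, r_yx, c):
    """Standardized ridge estimates: solve (r_XX + cI) b^R = r_YX."""
    return np.linalg.solve(r_xx + c * np.eye(r_xx.shape[0]), r_yx)


def ridge_vif(r_xx, c):
    """Ridge VIFs: diagonal of (r_XX + cI)^-1 r_XX (r_XX + cI)^-1."""
    inv = np.linalg.inv(r_xx + c * np.eye(r_xx.shape[0]))
    return np.diag(inv @ r_xx @ inv)


# Illustrative collinear data: x2 is nearly a copy of x1.
rng = np.random.default_rng(2)
x1 = rng.normal(size=50)
x2 = x1 + 0.05 * rng.normal(size=50)
x3 = rng.normal(size=50)
X = np.column_stack([x1, x2, x3])
y = x1 + x2 + x3 + rng.normal(size=50)

r_xx, r_yx = correlation_matrices(X, y)
cs = [0.0, 0.002, 0.005, 0.01, 0.02, 0.05, 0.1]
# Ridge trace: inspect how b^R and the VIFs change as c grows.
for c in cs:
    b_r = ridge_coefficients(r_xx, r_yx, c)
    print(f"c={c:<6} b^R={np.round(b_r, 3)}  VIF={np.round(ridge_vif(r_xx, c), 2)}")
```

At `c = 0` the output reproduces the ordinary standardized least squares fit and its VIFs. As `c` grows, every ridge VIF decreases, and the coefficients of the two nearly duplicate predictors typically settle toward similar values.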
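The standardized ridge coefficients are then transformed back to the original units:
$$b_k = \frac{s_Y}{s_k}\, b_k^R, \qquad k = 1, \ldots, p-1$$
where $s_Y$ and $s_k$ are the sample standard deviations of $Y$ and $X_k$, respectively, and the intercept is
$$b_0 = \bar{Y} - b_1 \bar{X}_1 - \cdots - b_{p-1} \bar{X}_{p-1}$$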
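The back-transformation is a one-liner per coefficient. The sketch below assumes NumPy; `to_original_units` is a hypothetical helper name and the data are simulated. To make the result checkable it feeds in the $c = 0$ solution, for which the back-transformed coefficients should match an ordinary least squares fit with an intercept. For an actual ridge fit, pass $b^R$ for the chosen $c$ instead.

```python
import numpy as np


def to_original_units(b_std, X, y):
    """Back-transform standardized coefficients to the original units.

    b_k = (s_Y / s_k) * b_k(std) and b_0 = Ybar - sum_k b_k * Xbar_k.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    s_y = y.std(ddof=1)
    s_x = X.std(axis=0, ddof=1)
    b = (s_y / s_x) * np.asarray(b_std, dtype=float)
    b0 = y.mean() - X.mean(axis=0) @ b
    return b0, b


# Illustrative data with variables on very different scales.
rng = np.random.default_rng(3)
X = rng.normal(loc=[10.0, 50.0, 5.0], scale=[2.0, 8.0, 1.0], size=(60, 3))
y = 3.0 + X @ np.array([0.5, -0.2, 1.5]) + rng.normal(size=60)

# Standardized coefficients from the correlation matrices (c = 0 here).
R = np.corrcoef(np.column_stack([X, y]), rowvar=False)
b_std = np.linalg.solve(R[:-1, :-1], R[:-1, -1])
b0, b = to_original_units(b_std, X, y)
print("b0 =", round(b0, 4), " b =", np.round(b, 4))
```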