SOCI 209/HPA 332 - LINEAR REGRESSION MODELS - Spring 2003
Professor François Nielsen
Assignment 4 - Released Tue 15 April (should have been Tue 8 April;
my mistake!)
DUE Thu 24 April
From Neter, Kutner, Nachtsheim, and Wasserman (NKNW):
1. 9.3 p. 392 (just discard influential cases?)
2. This problem uses the Yule data set. It focuses on diagnostics and remedial measures for outliers and influential cases.
a. Estimate the full model paup = constant + outratio + propold + pop for the 32 unions and save the regression diagnostics.
b. Use the studentized deleted residuals (STUDENT in SYSTAT) to identify outliers in the Y dimension, using the Bonferroni procedure with an initial α = .01 level. State the decision rule and conclusion.
c. Identify any X-outlying (high-leverage) observation using the appropriate diagnostic and rule of thumb.
d. Identify any influential observation by looking at an index plot of Cook's distance (COOK in SYSTAT) and calculating the corresponding percentiles of the appropriate F distribution for cases with high values of COOK; compare the percentiles with the cutoffs suggested in NKNW.
e. Use the Hadi procedure for robust outlier detection. (Make sure you specify the print = long option in SYSTAT to get the list of outliers.) Are the results of the Hadi procedure consistent with those of the other diagnostics? Why are these particular unions deviant? (How would you find out more about the various neighborhoods of metropolitan London in the late 19th century?) On what grounds could one justify removing these deviant cases?
f. After selecting out the outliers identified by the Hadi procedure, estimate the following 3 models. (In SYSTAT specify print = short if you don't want all the collinearity diagnostics.) If appropriate estimate a final (4th) trimmed model (if you do, disregard any new warnings concerning outliers).
paup = constant + outratio
paup = constant + outratio + propold
paup = constant + outratio + propold + pop
paup = ?
Present the regression results in a tabular form suitable for publication.
g. Reestimate the full model with the 32 cases using robust regression (IRLS) with the bisquare weight function with parameter 3.5. (See NKNW Figure 10.4 p. 419 for the shape of the bisquare weight function.) Look at the example with the GRAD data on the web for using SYSTAT's nonlin module for robust regression; the commands for this problem will look like
>nonlin
>model paup = b0 + b1*outratio+b2*propold+b3*pop
>robust bisquare=3.5
>estimate
How do these estimates compare to OLS with the 32 cases and OLS with the outliers removed?
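To get a feel for what the bisquare weight function does during IRLS, here is a minimal Python sketch of the usual textbook form, w(u) = [1 - (u/c)^2]^2 for |u| <= c and 0 otherwise, where u is a scaled residual. The function name bisquare_weight is hypothetical, and treating the SYSTAT parameter 3.5 as the tuning constant c is an assumption; SYSTAT may scale the residuals differently.

```python
def bisquare_weight(u, c=3.5):
    """Bisquare weight for a scaled residual u.

    c is the tuning constant; using 3.5 here assumes it plays the same
    role as the 'bisquare=3.5' parameter in the SYSTAT commands above.
    """
    if abs(u) > c:
        # Cases with scaled residuals beyond c get zero weight.
        return 0.0
    return (1.0 - (u / c) ** 2) ** 2


# Weights shrink smoothly from 1 at u = 0 to 0 at |u| = c.
for u in (0.0, 1.75, 3.5, 5.0):
    print(u, bisquare_weight(u))
```

Cases with small residuals keep a weight near 1, while cases whose scaled residuals exceed c are effectively dropped from the fit, which is why the robust estimates can be compared with OLS after removing outliers.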
3. 9.13 p. 395 (cosmetics sales; clues of collinearity)
To do part d. with SYSTAT you need to go to the CORR module and enter the
command pearson x1 x2 x3
4. 9.14 p. 395 (cosmetics sales; interpreting VIF, advantage of experiment) Note that VIF_k = 1/TOL_k, and conversely TOL_k = 1/VIF_k, where TOL_k = (1 - R_k^2) and R_k^2 is the coefficient of multiple determination when X_k is regressed on the other independent variables in the model. SYSTAT outputs TOL instead of VIF.
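The conversion between tolerance and VIF stated in the note can be sketched in a few lines of Python. The function names and the example value of R_k^2 are hypothetical and are not taken from the cosmetics sales data.

```python
def tol_from_r2(r2_k):
    """Tolerance of X_k, given R_k^2 from regressing X_k on the other predictors."""
    return 1.0 - r2_k


def vif_from_tol(tol_k):
    """Variance inflation factor as the reciprocal of the tolerance."""
    return 1.0 / tol_k


# Illustrative value only: R_k^2 = 0.75 gives TOL_k = 0.25 and VIF_k = 4.
r2_k = 0.75
tol_k = tol_from_r2(r2_k)
print(tol_k, vif_from_tol(tol_k))
```

A small TOL value in the SYSTAT output therefore corresponds to a large VIF.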
5. 10.6 p. 445 (computer-assisted learning; handling heteroscedasticity) This is a small but complete paradigm for handling heteroscedasticity. Disregard the detailed instructions in NKNW. Instead use the data set and do the following steps
7. 12.14 p. 523 (advertising agency; Cochrane-Orcutt procedure) Omit parts f and g.