soci209 module 9 - functional form & partial regression plots

Do not use -- Materials in this module now part of Module 10

Module 9 - FUNCTIONAL FORM & PARTIAL REGRESSION PLOTS

1. USES OF PARTIAL REGRESSION PLOTS

Various diagnostic tools and remedial measures are available to address possible violations of the classical assumptions of multiple regression analysis. The most common tool is the residual plot, in which the residuals e_i are plotted against the fitted (predicted) values ^y_i. From the appearance of the plot we may be able to diagnose a variety of problems with the model (such as heteroskedasticity, nonlinearity, outlying observations, etc.) in the same way as for simple regression models (see Module 3).

Partial regression plots are another diagnostic tool that permits evaluation of the role of individual variables within the multiple regression model. They are used to assess visually

whether a variable should be included or not in the model
the presence of outliers and influential cases that affect the coefficients of individual X variables in the model
the possibility of a nonlinear relationship between Y and individual X variables in the model

A partial regression plot is a way to look at the marginal role of a variable X_k in the model, given that the other independent variables are already in the model.

2. CONSTRUCTION OF A PARTIAL REGRESSION PLOT

Assume the multiple regression model (omitting the i subscript)

Y = b₀ + b₁X₁ + b₂X₂+ b₃X₃ + e

There is a partial regression plot for each one of the X variables.
To draw the partial regression plot of Y on X₁ "the hard way", for example, one proceeds as follows:
1. Regress Y on X₂ and X₃ and a constant, and calculate the predicted values and residuals

^Y_i(X₂, X₃) = b₀ + b₂X_i2 + b₃X_i3
e_i(Y|X₂, X₃) = Y_i - ^Y_i(X₂, X₃)

2. Regress X₁ on X₂ and X₃ and a constant, and calculate the predicted values and residuals

^X_i1(X₂, X₃) = b₀⁺ + b₂⁺X_i2 + b₃⁺X_i3
e_i(X₁|X₂, X₃) = X_i1 - ^X_i1(X₂, X₃)

3. The partial regression plot for X₁ is the plot of

e_i(Y|X₂, X₃) against e_i(X₁|X₂, X₃)

In practice, statistical programs such as SYSTAT have options to save the partial residuals e_i(Y|X₂, X₃) and e_i(X₁|X₂, X₃) when estimating the regression model, so one does not need to run these auxiliary regressions separately.
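For readers without SYSTAT, the three steps above can be sketched "the hard way" in Python with NumPy. This is only an illustration under stated assumptions: the helper names (residuals, partial_regression_residuals) and the simulated data are hypothetical and are not part of the course materials.

```python
import numpy as np

def residuals(target, others):
    """Residuals from an OLS regression of `target` on `others` plus a constant."""
    design = np.column_stack([np.ones(len(target)), others])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ coef

def partial_regression_residuals(y, X, k):
    """Return (e_x, e_y) for column k of X.

    e_y: residuals of Y regressed on the other X columns (step 1)
    e_x: residuals of X_k regressed on the other X columns (step 2)
    """
    others = np.delete(X, k, axis=1)
    e_y = residuals(y, others)
    e_x = residuals(X[:, k], others)
    return e_x, e_y

# Simulated data, for illustration only (not the course data).
rng = np.random.default_rng(0)
n = 50
X = rng.normal(size=(n, 3))
X[:, 0] += 0.5 * X[:, 1]          # make X1 correlated with X2
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(size=n)

# Step 3: the partial regression plot for X1 is e_y plotted against e_x.
e_x, e_y = partial_regression_residuals(y, X, k=0)
```

The pair (e_x, e_y) can then be passed to any scatterplot routine, with e_x on the horizontal axis.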

3. INTERPRETATION OF A PARTIAL REGRESSION PLOT & AN EXAMPLE

It can be shown that the slope of the partial regression of e_i(Y|X₂, X₃) on e_i(X₁|X₂, X₃) is equal to the estimated regression coefficient b₁ of X₁ in the multiple regression model Y = b₀ + b₁X₁ + b₂X₂ + b₃X₃ + e. Thus the partial regression plot allows us to isolate the role of a specific independent variable in the multiple regression model. In practice one scrutinizes the plot for patterns such as the ones shown in the next exhibit.
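The equality of the two slopes can be checked numerically. The following is a minimal sketch on simulated data; the function ols and all variable names are assumptions made for this illustration, and the data are not the course data.

```python
import numpy as np

def ols(target, regressors):
    """OLS with a constant; returns (coefficients, residuals)."""
    design = np.column_stack([np.ones(len(target)), regressors])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef, target - design @ coef

# Simulated data, for illustration only.
rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 3))
y = 2.0 + 1.5 * X[:, 0] - 0.7 * X[:, 1] + 0.3 * X[:, 2] + rng.normal(size=n)

# Full model: b_full[0] is the constant, b_full[1] is b1, and so on.
b_full, _ = ols(y, X)

# Auxiliary regressions of Y and of X1 on X2 and X3.
_, e_y = ols(y, X[:, 1:])
_, e_x = ols(X[:, 0], X[:, 1:])

# Slope of e_y on e_x; both residual series have mean zero,
# so the fitted line passes through the origin.
slope = (e_x @ e_y) / (e_x @ e_x)
```

Up to floating-point error, slope should agree with b_full[1], the coefficient of X₁ in the full model.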

Exhibit: Prototype partial regression plots (NKNW Figure 9.1 p. 363)

The patterns mean

pattern a, which shows no apparent relationship, suggests that X₁ does not add to the explanatory power of the model, when X₂ and X₃ are already included
pattern b suggests that a linear relationship between Y and X₁ exists, when X₂ and X₃ are already present in the model. The slope of the partial regression line is the same as the coefficient of X₁ in the multiple regression model
pattern c suggests that the partial relationship of Y with X₁ is curvilinear; one may try to model this curvilinearity with a transformation of X₁ or with a polynomial function of X₁
the plot may also reveal observations that are outliers with respect to the partial relationship of Y with X₁

As an example we look at the Graduation Rates file GRAD.SYD used in a previous assignment. The dependent variable is GRAD, the state rate of graduation from high school. We estimate the model

GRAD = CONSTANT + INC + PBLA + PHIS + EDEXP + URB.

The data and the regression results are shown in the next 2 exhibits.

Exhibit: Graduation rates data.
Exhibit: Regression of GRAD on INC, PBLA, PHIS, EDEXP, and URB (all cases included)

We look at the partial regression plots for INC, PBLA, and PHIS.

The following link shows how to do the same partial regression plots in STATA.

Exhibit: Partial regression plots for the GRAD data [m9008.htm]

As an extra refinement these partial regression plots use the INFLUENCE option of SYSTAT. In an influence plot the size of the symbol is proportional to the amount that the Pearson correlation between Y and X would change if that point were deleted. Large symbols therefore correspond to observations that are influential. The plots allow us to identify cases that are problematic with respect to specific independent variables in the model. For example, two observations stand out: DC in the plot for PBLA, and NM in the plot for PHIS.
The style of the symbol indicates the direction of the influence:

an open symbol indicates an observation that decreases (i.e., whose removal would increase) the magnitude (absolute value) of the correlation; for example, removing NM (case 32) in the partial regression plot for PHIS would increase the magnitude of the correlation between YPARTIAL(3) and XPARTIAL(3) from -.461 to -.558.
a filled symbol indicates an observation that increases (i.e., whose removal would decrease) the magnitude (absolute value) of the correlation; for example, removing DC (case 9) in the partial regression plot for PBLA would decrease the magnitude of the correlation between YPARTIAL(2) and XPARTIAL(2) from -.703 to -.641.

As I was not sure of the statistic used to determine the size of symbols with the INFLUENCE option I sent questions to the SYSTAT users list, which were answered by SYSTAT's founder Leland Wilkinson. It turns out that SYSTAT calculates the influence of an observation very simply as the absolute value of the difference in the correlation coefficient of Y and X with and without that observation.
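The influence statistic as described here is easy to reproduce by hand. The sketch below follows the verbal description only and is not SYSTAT's own code; the function name correlation_influence and the small data set are hypothetical.

```python
import numpy as np

def correlation_influence(x, y):
    """Leave-one-out influence on the Pearson correlation.

    Returns (r_all, influence, raises_magnitude), where
    influence[i] = |r_all - r_without_i| and raises_magnitude[i] is True
    when deleting case i would lower |r| (a filled symbol in the text).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    r_all = np.corrcoef(x, y)[0, 1]
    influence = np.empty(n)
    raises_magnitude = np.empty(n, dtype=bool)
    keep = np.ones(n, dtype=bool)
    for i in range(n):
        keep[i] = False                      # drop case i
        r_without = np.corrcoef(x[keep], y[keep])[0, 1]
        influence[i] = abs(r_all - r_without)
        raises_magnitude[i] = abs(r_all) > abs(r_without)
        keep[i] = True                       # restore case i
    return r_all, influence, raises_magnitude

# Hypothetical data: five points on a line plus one outlying case.
x = [1, 2, 3, 4, 5, 10]
y = [1, 2, 3, 4, 5, 0]
r_all, influence, raises_magnitude = correlation_influence(x, y)
```

In this toy example the last case has the largest influence, and deleting it would raise the magnitude of the correlation, so it would be drawn as a large open symbol.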

Exhibit: Leland Wilkinson's answer to Nielsen's questions about INFLUENCE (17 March 1999)

Last modified 8 Apr 2003