soci209 m9 - Wilkinson's answer re: INFLUENCE

From leland@spss.com Wed Mar 17 10:42:21 1999
Date: Wed, 17 Mar 1999 08:11:40 -0600
From: "Wilkinson, Leland" <leland@spss.com>
Reply-To: systat-l@spss.com
To: "'systat-l@spss.com'" <systat-l@spss.com>
Subject: RE: INFLUENCE Plots

The manual describes the procedure exactly. The word INFLUENCE has so many
meanings that it is of no help here. The computation is very simple. The
size of a symbol is computed as follows:

1. Compute the Pearson correlation coefficient for all the cases.
2. Drop one case.
3. Compute the Pearson correlation coefficient again without that case.
4. The absolute value of the difference between the two coefficients is
   proportional to the size of the symbol.
5. The sign of the difference determines whether the symbol is filled.

This procedure is done for all points in the scatterplot.
The computations are fast because the algorithm uses a drop-out/insert
method for the calculations.
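
The steps above can be sketched in a few lines of Python. This is only an
illustration, not the SYSTAT code: the function names are made up, the
"drop-out" step is assumed to mean subtracting one case's contribution from
running sums, and the direction of the difference (all cases minus
case-dropped) is also an assumption, since the message does not say which
sign gives a filled symbol.

```python
import math

def pearson_from_sums(n, sx, sy, sxx, syy, sxy):
    """Pearson r from running sums (assumes neither variable is constant)."""
    cov = n * sxy - sx * sy
    var_x = n * sxx - sx * sx
    var_y = n * syy - sy * sy
    return cov / math.sqrt(var_x * var_y)

def correlation_influence(x, y):
    """Return (r_all, diffs), where diffs[i] = r_all - r with case i dropped.

    Hypothetical helper: the sums are accumulated once, and each case is
    "dropped out" by subtracting its contribution, so no case requires a
    full pass over the data.
    """
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    sxy = sum(a * b for a, b in zip(x, y))
    r_all = pearson_from_sums(n, sx, sy, sxx, syy, sxy)

    diffs = []
    for a, b in zip(x, y):
        r_drop = pearson_from_sums(
            n - 1, sx - a, sy - b, sxx - a * a, syy - b * b, sxy - a * b
        )
        diffs.append(r_all - r_drop)  # sign convention is an assumption
    return r_all, diffs

# Four points on a line plus one point far off it.
r_all, diffs = correlation_influence([1, 2, 3, 4, 5], [1, 2, 3, 4, -5])
sizes = [abs(d) for d in diffs]   # symbol size is proportional to |difference|
filled = [d < 0 for d in diffs]   # which sign is "filled" is an assumption
```

In this small example the last case has by far the largest absolute
difference, so it would be drawn with the largest symbol.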

Thus, the size of the symbol reflects a particular kind of influence on the
Pearson correlation. Under several assumptions, Cook's D and other
regression influence measures are related to this statistic. The main
assumptions involve scaling of the variables. The Pearson influence is
computed after standardizing both X and Y. It therefore is of less use in a
regression context. I originally put this one into SYSTAT because Thissen
and Wainer showed its use for preliminary scatterplot diagnostics where
correlations were being computed. (Psychological Bulletin, 1981, 90,
179-184). This is particularly helpful in a SPLOM.

Also, some of the regression influence measures are amalgams of leverage and
error, so the relations are a bit more complicated. For regression
diagnostics, I would advise using Cook's D or the leverage h or other measures
to determine the size of the symbols in a residuals plot. Blank and I discuss
this in our book, Desktop Data Analysis.
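
For comparison, here is a rough sketch of the two regression measures named
above, for a simple regression of Y on X with an intercept. It uses the
standard textbook formulas for leverage h and Cook's D; the function name is
hypothetical and nothing here is taken from SYSTAT itself.

```python
def leverage_and_cooks_d(x, y):
    """Leverage h_i and Cook's D_i for a simple regression of y on x.

    Hypothetical helper using the usual formulas with p = 2 parameters
    (intercept and slope). Assumes x is not constant and the fit is not
    perfect, so that the residual variance is nonzero.
    """
    n = len(x)
    p = 2
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))

    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    resid = [b - (intercept + slope * a) for a, b in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)  # residual variance estimate

    # Leverage: diagonal of the hat matrix for a one-predictor model.
    h = [1.0 / n + (a - x_bar) ** 2 / s_xx for a in x]
    # Cook's D combines the squared residual (error) with the leverage.
    d = [e * e / (p * s2) * hi / (1.0 - hi) ** 2 for e, hi in zip(resid, h)]
    return h, d

h, cooks_d = leverage_and_cooks_d([1, 2, 3, 4, 5], [1, 2, 3, 4, -5])
```

Either list could then be used to size the symbols in a residuals plot.
Because Cook's D multiplies a residual term by a leverage term, it is not a
simple function of the change in the correlation coefficient.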

LW

> -----Original Message-----
> From: Francois Nielsen [SMTP:nielsen@email.unc.edu]
> Sent: Tuesday, March 16, 1999 1:38 PM
> To:   systat-l@spss.com
> Subject:      INFLUENCE Plots
> 
> I am using INFLUENCE plots, in which the size of the symbol denotes the
> influence of an observation on the Pearson correlation of Y and X, in my
> linear regression models course.  I would like more information than the
> STATISTICS manual (v6/v7) provides on two points: 
>   1.  When used in a regular bivariate scatterplot of Y against X, what
> exactly is the measure of influence used to size the symbols?  Is it related
> to the COOK distance and if so how?
>   2.  When used in the context of a partial regression plot, say of
> YPARTIAL(1) plotted against XPARTIAL(1), is the measure of influence
> related to the DFBETAS statistic?
>   I am still using v7.0 so I may have missed any related changes in v8.0.
> 
> FN.
> 
> ______________________________________________________________________
> 
> Francois Nielsen, Professor     919-962-5064  (office)
> Department of Sociology         919-962-1007  (sociology department)
> University of North Carolina    919-962-7568  (departmental fax)
> Chapel Hill, NC 27599-3210      919-968-0245  (home)
> E-mail: francois_nielsen@unc.edu (alias for nielsen@email.unc.edu)
> ______________________________________________________________________
>