HADI Robust Outlier Detection & Covariance Matrix Estimation Example

SAT 3/06/99 3:09:02 PM
SYSTAT VERSION 7.0.1
COPYRIGHT (C) 1997, SPSS INC.

Welcome to SYSTAT!

>USE 'C:\SYSTAT7\S209\GRAD.SYD'
SYSTAT Rectangular file C:\SYSTAT7\S209\GRAD.SYD,
created Wed Feb 17, 1999 at 08:32:46, contains variables:
 ID           STATE$       GRAD         INC          PWHI         PBLA
 PHIS         EDEXP        URB

>corr
HADI robust outlier detection and covariance matrix estimation are part of
SYSTAT's CORR module.

>print=long
>idvar=state$
>save gradhad
>cova grad inc pbla phis edexp urb/hadi
 
These 6 outliers are identified:
     Case      Distance
------------ ------------
AZ                5.02053
CA                5.67307
AK                5.70352
TX                6.72038
DC                7.06904
NM               13.90230
 
Means of variables of non-outlying cases
                      GRAD          INC         PBLA         PHIS        EDEXP
                    74.558    12071.178        9.549        2.644     3340.200
                       URB
                    62.596
 
HADI estimated covariance matrix
 
                      GRAD          INC         PBLA         PHIS        EDEXP
 GRAD               56.921
 INC              2555.267  2640730.604
 PBLA              -49.626    -3045.791       91.413
 PHIS               -4.847     2027.335       -2.655        7.217
 EDEXP            1123.927   900395.850    -1949.574      807.632   571965.618
 URB               -45.306    24011.094       40.903       27.858     6608.983
                       URB
 URB               499.580
 
Number of observations: 51
 
Matrix has been saved.

>mglh
>USE 'C:\SYSTAT7\S209\GRADHAD.SYD'
SYSTAT Covariance file C:\SYSTAT7\S209\GRADHAD.SYD,
created Sat Mar 06, 1999 at 15:11:32, contains variables:
 GRAD         INC          PBLA         PHIS         EDEXP        URB

>model grad=inc+pbla+phis+edexp+urb/n=51
>estimate

Because the model is estimated from a variance-covariance matrix, the number of cases
has to be specified (N=51) and the model does not include a constant. The output includes
the detailed collinearity diagnostics (discussed later) as a by-product, because
PRINT=LONG was specified earlier.

Eigenvalues of unit scaled X'X
 
                         1           2           3           4           5
                         2.598       1.178       0.643       0.409       0.172
 
Condition indices
 
                         1           2           3           4           5
                         1.000       1.485       2.010       2.520       3.891
 
Variance proportions
 
                         1           2           3           4           5
   INC                   0.034       0.000       0.039       0.014       0.913
   PBLA                  0.005       0.464       0.026       0.393       0.112
   PHIS                  0.052       0.004       0.828       0.098       0.018
   EDEXP                 0.042       0.022       0.091       0.397       0.448
   URB                   0.035       0.074       0.004       0.340       0.546
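
The condition indices printed above can be checked by hand from the eigenvalues. The sketch below assumes the usual definition (square root of the largest eigenvalue divided by each eigenvalue); the handout does not state which formula SYSTAT uses, so treat this as a plausibility check rather than a description of the program. The variable names are illustrative only.

```python
import math

# Eigenvalues of the unit-scaled X'X, copied from the output above.
eigenvalues = [2.598, 1.178, 0.643, 0.409, 0.172]

# Assumed definition: condition index k = sqrt(largest eigenvalue / eigenvalue k).
largest = max(eigenvalues)
condition_indices = [math.sqrt(largest / ev) for ev in eigenvalues]

# With unit scaling the eigenvalues should sum to the number of predictors (5).
eigenvalue_sum = sum(eigenvalues)

for k, ci in enumerate(condition_indices, start=1):
    print(f"dimension {k}: condition index {ci:.3f}")
print(f"sum of eigenvalues: {eigenvalue_sum:.3f}")
```

The first four values match the printed indices; the last comes out at about 3.89, with the small difference from 3.891 attributable to the eigenvalues being rounded to three decimals in the output.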
 
 
Dep Var: GRAD   N: 51   Multiple R: 0.816   Squared multiple R: 0.667
Adjusted squared multiple R: 0.630   Standard error of estimate: 4.404
 
Effect         Coefficient    Std Error     Std Coef Tolerance     t   P(2 Tail)
INC                  0.002        0.001        0.518     0.277    3.165    0.003
PBLA                -0.465        0.078       -0.590     0.756   -5.956    0.000
PHIS                -1.047        0.286       -0.373     0.715   -3.663    0.001
EDEXP               -0.001        0.001       -0.078     0.433   -0.597    0.553
URB                 -0.099        0.045       -0.295     0.414   -2.203    0.033
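
The coefficients above can be reproduced, to rounding, directly from the Hadi covariance matrix printed earlier. The following is a minimal sketch, assuming the standard covariance-based least-squares solution (solve Sxx b = sxy); it is not SYSTAT's own code, and the helper `solve` is written here only for illustration.

```python
# Hadi covariance matrix from the CORR output (lower triangle), in the order
# GRAD, INC, PBLA, PHIS, EDEXP, URB.
names = ["GRAD", "INC", "PBLA", "PHIS", "EDEXP", "URB"]
lower = [
    [56.921],
    [2555.267, 2640730.604],
    [-49.626, -3045.791, 91.413],
    [-4.847, 2027.335, -2.655, 7.217],
    [1123.927, 900395.850, -1949.574, 807.632, 571965.618],
    [-45.306, 24011.094, 40.903, 27.858, 6608.983, 499.580],
]
n = len(names)
# Expand the lower triangle into the full symmetric matrix.
S = [[lower[max(i, j)][min(i, j)] for j in range(n)] for i in range(n)]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, m):
            factor = M[r][col] / M[col][col]
            for c in range(col, m + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, m))
        x[r] = (M[r][m] - tail) / M[r][r]
    return x


Sxx = [row[1:] for row in S[1:]]   # covariances among the five predictors
sxy = [row[0] for row in S[1:]]    # covariances of each predictor with GRAD
coef = solve(Sxx, sxy)             # unstandardized slopes

# Squared multiple R and standardized coefficients from the same quantities.
r_squared = sum(b * s for b, s in zip(coef, sxy)) / S[0][0]
std_coef = [b * (Sxx[i][i] / S[0][0]) ** 0.5 for i, b in enumerate(coef)]

for name, b, beta in zip(names[1:], coef, std_coef):
    print(f"{name:6s} coefficient {b:10.4f}   std coef {beta:7.3f}")
print(f"Squared multiple R: {r_squared:.3f}")
```

Because the covariance matrix is already centered, no constant is needed to obtain the slopes, which is consistent with the note above that the model does not include one.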
 
Effect         Coefficient    Lower   < 95%>   Upper
INC                  0.002        0.001        0.004                            
PBLA                -0.465       -0.623       -0.308                            
PHIS                -1.047       -1.623       -0.471                            
EDEXP               -0.001       -0.003        0.002                            
URB                 -0.099       -0.190       -0.009
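
The 95% limits above follow from the coefficient and its standard error. A small sketch, assuming the limits are coefficient plus or minus t times the standard error with 45 residual degrees of freedom; the critical value 2.014 is taken from a t table and is an input here, not something printed by SYSTAT. INC and EDEXP are left out because their coefficients are printed with too few significant digits to check.

```python
# (coefficient, standard error) pairs copied from the coefficient table.
estimates = {
    "PBLA": (-0.465, 0.078),
    "PHIS": (-1.047, 0.286),
    "URB": (-0.099, 0.045),
}

# Two-sided 95% critical value of Student's t with 45 df (tabulated, assumed).
T_CRIT = 2.014

intervals = {
    name: (b - T_CRIT * se, b + T_CRIT * se)
    for name, (b, se) in estimates.items()
}

for name, (lo, hi) in intervals.items():
    print(f"{name:5s} lower {lo:7.3f}   upper {hi:7.3f}")
```

The computed limits agree with the printed ones to within about 0.001, the remaining difference coming from the three-decimal rounding of the inputs.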
 
Correlation matrix of regression coefficients
                           INC        PBLA        PHIS       EDEXP         URB
   INC                   1.000
   PBLA                  0.255       1.000
   PHIS                 -0.046       0.118       1.000
   EDEXP                -0.613       0.106      -0.130       1.000
   URB                  -0.594      -0.430      -0.283       0.145       1.000
 
                             Analysis of Variance


Source             Sum-of-Squares   df  Mean-Square     F-ratio       P
Regression              1745.384     5      349.077      17.994       0.000
Residual                 872.968    45       19.399
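
The summary statistics reported with the coefficient table can be recovered from this ANOVA table. The sketch below uses only the sums of squares and degrees of freedom printed above; the adjusted R-squared formula (total df taken as 5 + 45 = 50) is an assumption that happens to reproduce the printed value.

```python
import math

# Values copied from the Analysis of Variance table.
ss_regression, df_regression = 1745.384, 5
ss_residual, df_residual = 872.968, 45

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_ratio = ms_regression / ms_residual

# Squared multiple R as the regression share of the total sum of squares.
r_squared = ss_regression / (ss_regression + ss_residual)

# Assumed adjustment: total df = regression df + residual df.
df_total = df_regression + df_residual
adj_r_squared = 1 - (1 - r_squared) * df_total / df_residual

# Standard error of estimate is the root of the residual mean square.
std_error_estimate = math.sqrt(ms_residual)

print(f"F-ratio: {f_ratio:.3f}")
print(f"Squared multiple R: {r_squared:.3f}")
print(f"Adjusted squared multiple R: {adj_r_squared:.3f}")
print(f"Standard error of estimate: {std_error_estimate:.3f}")
```

These agree with the printed F-ratio (17.994), squared multiple R (0.667), adjusted squared multiple R (0.630) and standard error of estimate (4.404).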

An alternative to estimating the regression model from the Hadi covariance matrix saved by the CORR module (as shown above) is to estimate the model from the original data after excluding the 6 cases flagged as outliers by the Hadi procedure (AZ, CA, AK, TX, DC and NM). The results are the same.
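
Why the two routes agree on the slopes can be seen in a toy example. The data below are made up for illustration (they are not from GRAD.SYD), there is only one predictor, and the "flagged" case is chosen by hand rather than by the Hadi procedure. The sketch only shows that a slope computed from the covariances of the retained cases equals the slope from an ordinary regression with a constant on the same retained cases; any overall rescaling of the covariance matrix leaves the slope unchanged.

```python
from statistics import mean

# Hypothetical toy data: (case id, x, y). Case "E" plays the role of an outlier.
cases = [
    ("A", 1.0, 2.1),
    ("B", 2.0, 3.9),
    ("C", 3.0, 6.2),
    ("D", 4.0, 7.8),
    ("E", 5.0, 30.0),
]
flagged = {"E"}  # assumed to have been flagged by some outlier procedure

kept = [(x, y) for case_id, x, y in cases if case_id not in flagged]
xs = [x for x, _ in kept]
ys = [y for _, y in kept]
n = len(kept)

# Route 1: slope from the variance and covariance of the retained cases.
mx, my = mean(xs), mean(ys)
var_x = sum((x - mx) ** 2 for x in xs) / (n - 1)
cov_xy = sum((x - mx) * (y - my) for x, y in kept) / (n - 1)
slope_from_cov = cov_xy / var_x

# Route 2: least squares with a constant on the retained raw data.
sum_x, sum_y = sum(xs), sum(ys)
sum_xx = sum(x * x for x in xs)
sum_xy = sum(x * y for x, y in kept)
slope_from_raw = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
intercept = (sum_y - slope_from_raw * sum_x) / n

print(f"slope from covariances: {slope_from_cov:.4f}")
print(f"slope from raw data:    {slope_from_raw:.4f}")
print(f"intercept (raw route):  {intercept:.4f}")
```

The raw-data route also yields the constant, which the covariance route cannot provide on its own.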




Last modified 5 April 2000