ROBUST REGRESSION EXAMPLES (See next exhibit for summary results)
1. Estimate regular regression using OLS, for comparison purposes;
first with all 51 cases and then excluding AK, DC, and NM
  use 'c:\systat\grad.sys'
  output robust01.prn
  mglh
  model grad=constant+inc+pbla+phis+edexp+urb
  estimate
  select state$<>"AK" AND state$<>"DC" AND state$<>"NM"
  model grad = constant+inc+pbla+phis+edexp+urb
  estimate

Dep Var: GRAD   N: 51   Multiple R: 0.790   Squared multiple R: 0.624
Adjusted squared multiple R: 0.582   Standard error of estimate: 5.139

Effect      Coefficient  Std Error  Std Coef  Tolerance        t  P(2 Tail)
CONSTANT         69.042      5.373     0.0        .       12.851      0.000
INC               0.002      0.001     0.416     0.240     2.225      0.031
PBLA             -0.414      0.063    -0.652     0.861    -6.622      0.000
PHIS             -0.407      0.117    -0.334     0.906    -3.483      0.001
EDEXP            -0.001      0.001    -0.147     0.342    -0.944      0.350
URB              -0.108      0.046    -0.307     0.491    -2.353      0.023

                        Analysis of Variance
Source        Sum-of-Squares   df  Mean-Square   F-ratio       P
Regression          1973.308    5      394.662    14.941   0.000
Residual            1188.641   45       26.414
-------------------------------------------------------------------------------
*** WARNING ***
Case  2 has large leverage (Leverage = 0.409)
Case  9 has large leverage (Leverage = 0.587)
Case 32 has large leverage (Leverage = 0.605)

Durbin-Watson D Statistic      2.014
First Order Autocorrelation   -0.020

Dep Var: GRAD   N: 48   Multiple R: 0.827   Squared multiple R: 0.684
Adjusted squared multiple R: 0.646   Standard error of estimate: 4.599

Effect      Coefficient  Std Error  Std Coef  Tolerance        t  P(2 Tail)
CONSTANT         62.613      6.210     0.0        .       10.082      0.000
INC               0.002      0.001     0.487     0.275     2.944      0.005
PBLA             -0.452      0.082    -0.544     0.768    -5.494      0.000
PHIS             -0.759      0.161    -0.467     0.769    -4.718      0.000
EDEXP            -0.001      0.001    -0.086     0.458    -0.674      0.504
URB              -0.110      0.047    -0.318     0.402    -2.324      0.025

                        Analysis of Variance
Source        Sum-of-Squares   df  Mean-Square   F-ratio       P
Regression          1923.195    5      384.639    18.187   0.000
Residual             888.253   42       21.149
-------------------------------------------------------------------------------
Durbin-Watson D Statistic      2.071
First Order Autocorrelation   -0.050
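
As a quick check on the two OLS runs, the summary statistics can be recomputed from the sums of squares in the Analysis of Variance tables. The following is a minimal sketch in Python (not SYSTAT); the function name fit_summary is illustrative only, and it assumes the usual textbook formulas for R-square, adjusted R-square, and the F-ratio with 5 predictors plus a constant.

```python
def fit_summary(ss_reg, ss_res, n, k):
    """Return (R^2, adjusted R^2, F) for a model with k predictors plus a constant."""
    ss_tot = ss_reg + ss_res              # mean-corrected total sum of squares
    df_res = n - k - 1                    # residual degrees of freedom
    r2 = ss_reg / ss_tot
    adj_r2 = 1.0 - (ss_res / df_res) / (ss_tot / (n - 1))
    f_ratio = (ss_reg / k) / (ss_res / df_res)
    return r2, adj_r2, f_ratio

# Sums of squares copied from the two ANOVA tables above.
all_cases = fit_summary(1973.308, 1188.641, n=51, k=5)
subset = fit_summary(1923.195, 888.253, n=48, k=5)

print([round(v, 3) for v in all_cases])
print([round(v, 3) for v in subset])
```

Rounded to three decimals, these agree with the printed Squared multiple R, Adjusted squared multiple R, and F-ratio for both runs.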
2. Estimate same model using NONLIN (nonlinear regression); results should be the
same as regular OLS regression
  nonlin
  model grad = b0 + b1*inc + b2*pbla + b3*phis + b4*edexp + b5*urb
  estimate

Iteration
   No.     Loss         B0          B1          B2          B3          B4          B5
    0   .438131D+06  .101000D-01  .102000D-01  .103000D-01  .104000D-01  .105000D-01  .106000D-01
    1   .425439D+06  .102010D+01  .100769D-01  .409040D-02  .428860D-02  .103271D-01  .886020D-02
    2   .139249D+04  .675509D+02  .196676D-02 -.404949D+00 -.398282D+00 -.106339D-02 -.105744D+00
    3   .118864D+04  .690420D+02  .178500D-02 -.414116D+00 -.407305D+00 -.131866D-02 -.108312D+00
    4   .118864D+04  .690420D+02  .178500D-02 -.414116D+00 -.407305D+00 -.131866D-02 -.108312D+00
    5   .118864D+04  .690420D+02  .178500D-02 -.414116D+00 -.407305D+00 -.131866D-02 -.108312D+00

Dependent variable is GRAD

Source           Sum-of-Squares   df  Mean-Square
Regression           277473.359    6    46245.560
Residual               1188.641   45       26.414
Total                278662.000   51
Mean corrected         3161.950   50

Raw R-square (1-Residual/Total)                 = 0.996
Mean corrected R-square (1-Residual/Corrected)  = 0.624
R(observed vs predicted) square                 = 0.624

                                             Wald Confidence Interval
Parameter   Estimate    A.S.E.   Param/ASE     Lower  < 95%>   Upper
B0            69.042     5.373      12.851    58.221          79.863
B1             0.002     0.001       2.225     0.000           0.003
B2            -0.414     0.063      -6.622    -0.540          -0.288
B3            -0.407     0.117      -3.483    -0.643          -0.172
B4            -0.001     0.001      -0.944    -0.004           0.001
B5            -0.108     0.046      -2.353    -0.201          -0.016
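
The NONLIN output reports two R-square figures that differ only in the total sum of squares used as the baseline. A minimal Python sketch of the two definitions printed above, using the NONLIN sums of squares (the function name r_square_variants is illustrative):

```python
def r_square_variants(ss_res, ss_total_raw, ss_total_corrected):
    """Return (raw R^2, mean-corrected R^2) as labelled in the NONLIN output."""
    raw = 1.0 - ss_res / ss_total_raw              # baseline: sum of y^2
    corrected = 1.0 - ss_res / ss_total_corrected  # baseline: sum of (y - mean)^2
    return raw, corrected

# Residual, Total, and Mean corrected sums of squares from the table above.
ols_raw, ols_corrected = r_square_variants(1188.641, 278662.000, 3161.950)
print(round(ols_raw, 3), round(ols_corrected, 3))
```

The mean-corrected figure is the one comparable to the Squared multiple R from the regular regression; the raw figure is close to 1 mainly because GRAD has a mean far from zero.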
3. Now estimate model with robust regression using the ABSOLUTE option (minimize sum of
absolute values of residuals); using estimate (rather than estimate/start) causes
NONLIN to use estimates from previous run as starting values
  robust absolute
  estimate

Iteration
   No.     Loss         B0          B1          B2          B3          B4          B5
    0   .118864D+04  .690420D+02  .178500D-02 -.414116D+00 -.407305D+00 -.131866D-02 -.108312D+00
    1   .208544D+03  .690420D+02  .178500D-02 -.414116D+00 -.407305D+00 -.131866D-02 -.108312D+00
   <lines omitted to save space>
   21   .199849D+03  .650830D+02  .297313D-02 -.373308D+00 -.428259D+00 -.285335D-02 -.189517D+00
   22   .199848D+03  .650815D+02  .297293D-02 -.373197D+00 -.428261D+00 -.285221D-02 -.189531D+00
   23   .199846D+03  .650803D+02  .297279D-02 -.373114D+00 -.428262D+00 -.285135D-02 -.189541D+00
   24   .199845D+03  .650794D+02  .297268D-02 -.373052D+00 -.428263D+00 -.285071D-02 -.189548D+00
   25   .199844D+03  .650787D+02  .297260D-02 -.373005D+00 -.428264D+00 -.285023D-02 -.189553D+00

ABSOLUTE robust regression: 51 cases have positive psi-weights
The average psi-weight is 100180976520.95010

Dependent variable is GRAD

Source           Sum-of-Squares   df  Mean-Square
Regression           277354.834    5    55470.967
Residual               1307.166   46       28.417
Total                278662.000   51
Mean corrected         3161.950   50

Raw R-square (1-Residual/Total)                 = 0.995
Mean corrected R-square (1-Residual/Corrected)  = 0.587
R(observed vs predicted) square                 = 0.597

                                             Wald Confidence Interval
Parameter   Estimate    A.S.E.   Param/ASE     Lower  < 95%>   Upper
B0            65.079      .           .         .               .
B1             0.003      .           .         .               .
B2            -0.373      .           .         .               .
B3            -0.428      .           .         .               .
B4            -0.003      .           .         .               .
B5            -0.190      .           .         .               .
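
To see why minimizing the sum of absolute residuals resists outliers, it may help to look at the simplest case, a model with a constant only. The Python sketch below uses made-up data (not the GRAD file) and hypothetical function names: the least-squares solution is the mean, while the least-absolute solution is a median.

```python
def sum_abs(values, center):
    """Sum of absolute residuals around a candidate constant."""
    return sum(abs(v - center) for v in values)

def lad_location(values):
    """Constant minimizing the sum of absolute residuals.

    The minimum is always attained at one of the data points,
    so it is enough to search over them.
    """
    return min(sorted(values), key=lambda c: sum_abs(values, c))

def ols_location(values):
    """Constant minimizing the sum of squared residuals (the mean)."""
    return sum(values) / len(values)

data = [1.0, 2.0, 3.0, 4.0, 100.0]   # hypothetical data with one outlier
print(lad_location(data), ols_location(data))
```

The single outlier pulls the least-squares constant far from the bulk of the data but leaves the least-absolute constant where it was.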
4. Robust regression - HUBER method
  robust huber=1.7
  estimate/start

Iteration
   No.     Loss         B0          B1          B2          B3          B4          B5
    0   .125607D+09  .101000D+00  .102000D+00  .103000D+00  .104000D+00  .105000D+00  .106000D+00
    1   .914998D+08  .102010D+02  .873183D-01  .272414D-01  .290928D-01  .894241D-01  .746028D-01
    2   .118864D+04  .690420D+02  .178501D-02 -.414116D+00 -.407304D+00 -.131865D-02 -.108312D+00
    3   .105045D+04  .690420D+02  .178500D-02 -.414116D+00 -.407305D+00 -.131866D-02 -.108312D+00
   <lines omitted to save space>
   19   .983334D+03  .675216D+02  .203140D-02 -.416370D+00 -.485886D+00 -.151955D-02 -.117518D+00
   20   .983335D+03  .675216D+02  .203139D-02 -.416370D+00 -.485886D+00 -.151955D-02 -.117517D+00
   21   .983336D+03  .675217D+02  .203139D-02 -.416370D+00 -.485885D+00 -.151955D-02 -.117517D+00

HUBER robust regression: 51 cases have positive psi-weights
The average psi-weight is 0.94287

Dependent variable is GRAD

Source           Sum-of-Squares   df  Mean-Square
Regression           277455.501    6    46242.583
Residual               1206.499   45       26.811
Total                278662.000   51
Mean corrected         3161.950   50

Raw R-square (1-Residual/Total)                 = 0.996
Mean corrected R-square (1-Residual/Corrected)  = 0.618
R(observed vs predicted) square                 = 0.620

                                             Wald Confidence Interval
Parameter   Estimate    A.S.E.   Param/ASE     Lower  < 95%>   Upper
B0            67.522     5.453      12.382    56.538          78.505
B1             0.002     0.001       2.519     0.000           0.004
B2            -0.416     0.062      -6.705    -0.541          -0.291
B3            -0.486     0.135      -3.604    -0.757          -0.214
B4            -0.002     0.001      -1.066    -0.004           0.001
B5            -0.118     0.047      -2.520    -0.211          -0.024
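
The psi-weights reported for the HUBER run can be pictured with the textbook Huber weight function. The Python sketch below is an assumption about the general form, not SYSTAT's documented internals; the residuals are hypothetical and are taken to be already divided by a scale estimate.

```python
def huber_weight(r, c=1.7):
    """Textbook Huber weight for a scaled residual r with tuning constant c.

    Residuals within c of zero keep full weight; larger ones are
    downweighted in proportion to c/|r| but never dropped entirely.
    """
    size = abs(r)
    return 1.0 if size <= c else c / size

scaled = [0.5, -1.0, 1.7, 3.4, -6.8]          # hypothetical scaled residuals
weights = [huber_weight(r) for r in scaled]
average_weight = sum(weights) / len(weights)
print(weights, average_weight)
```

Under this form every case keeps a positive weight, which is consistent with the message that all 51 cases have positive psi-weights, and an average weight near 1 indicates that few cases were downweighted much.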
5. Robust regression - HAMPEL method
  robust hampel=1.7,3.4,8.5
  estimate/start

Iteration
   No.     Loss         B0          B1          B2          B3          B4          B5
    0   .125607D+09  .101000D+00  .102000D+00  .103000D+00  .104000D+00  .105000D+00  .106000D+00
    1   .914998D+08  .102010D+02  .873183D-01  .272414D-01  .290928D-01  .894241D-01  .746028D-01
    2   .118864D+04  .690420D+02  .178501D-02 -.414116D+00 -.407304D+00 -.131865D-02 -.108312D+00
   <lines omitted to save space>
   19   .983334D+03  .675216D+02  .203140D-02 -.416370D+00 -.485886D+00 -.151955D-02 -.117518D+00
   20   .983335D+03  .675216D+02  .203139D-02 -.416370D+00 -.485886D+00 -.151955D-02 -.117517D+00
   21   .983336D+03  .675217D+02  .203139D-02 -.416370D+00 -.485885D+00 -.151955D-02 -.117517D+00

HAMPEL robust regression: 51 cases have positive psi-weights
The average psi-weight is 0.94287

Dependent variable is GRAD

Source           Sum-of-Squares   df  Mean-Square
Regression           277455.501    6    46242.583
Residual               1206.499   45       26.811
Total                278662.000   51
Mean corrected         3161.950   50

Raw R-square (1-Residual/Total)                 = 0.996
Mean corrected R-square (1-Residual/Corrected)  = 0.618
R(observed vs predicted) square                 = 0.620

                                             Wald Confidence Interval
Parameter   Estimate    A.S.E.   Param/ASE     Lower  < 95%>   Upper
B0            67.522     5.453      12.382    56.538          78.505
B1             0.002     0.001       2.519     0.000           0.004
B2            -0.416     0.062      -6.705    -0.541          -0.291
B3            -0.486     0.135      -3.604    -0.757          -0.214
B4            -0.002     0.001      -1.066    -0.004           0.001
B5            -0.118     0.047      -2.520    -0.211          -0.024
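
The HAMPEL results above are identical to the HUBER results. One possible explanation, offered here as an assumption rather than something the output states, is that no scaled residual exceeded the second Hampel constant (3.4); below that point the textbook three-part Hampel weight coincides with the Huber weight for c = 1.7. A Python sketch of the textbook forms (not taken from SYSTAT documentation):

```python
def huber_weight(r, c=1.7):
    """Textbook Huber weight for a scaled residual r."""
    size = abs(r)
    return 1.0 if size <= c else c / size

def hampel_weight(r, a=1.7, b=3.4, c=8.5):
    """Textbook three-part redescending Hampel weight for a scaled residual r."""
    size = abs(r)
    if size <= a:
        return 1.0                                # full weight
    if size <= b:
        return a / size                           # same as Huber with c = a
    if size <= c:
        return a * (c - size) / ((c - b) * size)  # descends linearly to zero
    return 0.0                                    # rejected outright

for r in [0.5, 1.7, 2.5, 3.4, 5.95, 9.0]:         # hypothetical scaled residuals
    print(r, huber_weight(r), hampel_weight(r))
```

The two functions agree for every residual up to 3.4 in absolute value and part ways only beyond it, where Hampel weights fall to zero at 8.5.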
6. Robust regression - BISQUARE method
  robust bisquare=7
  estimate/start

Iteration
   No.     Loss         B0          B1          B2          B3          B4          B5
    0   .125607D+09  .101000D+00  .102000D+00  .103000D+00  .104000D+00  .105000D+00  .106000D+00
    1   .883111D+08  .102010D+02  .873183D-01  .272414D-01  .290928D-01  .894241D-01  .746028D-01
    2   .118871D+04  .689837D+02  .178594D-02 -.415258D+00 -.405827D+00 -.128777D-02 -.109111D+00
    3   .103273D+04  .690420D+02  .178500D-02 -.414116D+00 -.407305D+00 -.131866D-02 -.108313D+00
    4   .102758D+04  .685665D+02  .186363D-02 -.412174D+00 -.418392D+00 -.137597D-02 -.113036D+00
   <lines omitted to save space>
   13   .101944D+04  .683667D+02  .189238D-02 -.412276D+00 -.426970D+00 -.139400D-02 -.114079D+00
   14   .101943D+04  .683666D+02  .189239D-02 -.412277D+00 -.426974D+00 -.139401D-02 -.114079D+00
   15   .101943D+04  .683666D+02  .189239D-02 -.412277D+00 -.426976D+00 -.139401D-02 -.114079D+00
   16   .101943D+04  .683666D+02  .189239D-02 -.412277D+00 -.426977D+00 -.139401D-02 -.114079D+00

BISQUARE robust regression: 51 cases have positive psi-weights
The average psi-weight is 0.93080

Dependent variable is GRAD

Source           Sum-of-Squares   df  Mean-Square
Regression           277471.483    6    46245.247
Residual               1190.517   45       26.456
Total                278662.000   51
Mean corrected         3161.950   50

Raw R-square (1-Residual/Total)                 = 0.996
Mean corrected R-square (1-Residual/Corrected)  = 0.623
R(observed vs predicted) square                 = 0.624

                                             Wald Confidence Interval
Parameter   Estimate    A.S.E.   Param/ASE     Lower  < 95%>   Upper
B0            68.367     5.380      12.709    57.532          79.201
B1             0.002     0.001       2.353     0.000           0.004
B2            -0.412     0.062      -6.666    -0.537          -0.288
B3            -0.427     0.123      -3.462    -0.675          -0.179
B4            -0.001     0.001      -0.985    -0.004           0.001
B5            -0.114     0.046      -2.454    -0.208          -0.020
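
For comparison with the Huber weights, the textbook (Tukey) bisquare weight downweights every nonzero residual a little and gives zero weight beyond the tuning constant. The Python sketch below assumes that standard form with the constant 7 used above; it is not taken from SYSTAT documentation, and the residuals are hypothetical scaled values.

```python
def bisquare_weight(r, c=7.0):
    """Textbook Tukey bisquare weight for a scaled residual r."""
    u = r / c
    if abs(u) >= 1.0:
        return 0.0                  # residuals at or beyond c are rejected
    return (1.0 - u * u) ** 2       # smooth decline from 1 at r = 0

for r in [0.0, 1.0, 3.5, 6.0, 7.0, 10.0]:   # hypothetical scaled residuals
    print(r, round(bisquare_weight(r), 4))
```

Under this form, the message that all 51 cases have positive psi-weights would mean that no scaled residual reached 7.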
7. Robust regression - TRIM method
  robust trim=0.05
  estimate/start

Iteration
   No.     Loss         B0          B1          B2          B3          B4          B5
    0   .125607D+09  .101000D+00  .102000D+00  .103000D+00  .104000D+00  .105000D+00  .106000D+00
    1   .914998D+08  .102010D+02  .873183D-01  .272414D-01  .290928D-01  .894241D-01  .746028D-01
    2   .118864D+04  .690420D+02  .178501D-02 -.414116D+00 -.407304D+00 -.131865D-02 -.108312D+00
    3   .102403D+04  .690420D+02  .178500D-02 -.414116D+00 -.407305D+00 -.131866D-02 -.108312D+00
    4   .100909D+04  .705726D+02  .160999D-02 -.407914D+00 -.367187D+00 -.115956D-02 -.116915D+00
    5   .100909D+04  .705726D+02  .160999D-02 -.407914D+00 -.367187D+00 -.115956D-02 -.116915D+00
    6   .100909D+04  .705726D+02  .160999D-02 -.407914D+00 -.367187D+00 -.115956D-02 -.116915D+00

TRIM robust regression: 49 cases have positive psi-weights
The average psi-weight is 1.00000

Dependent variable is GRAD
Zero weights, missing data or estimates reduced degrees of freedom

Source           Sum-of-Squares   df  Mean-Square
Regression           277457.056    6    46242.843
Residual               1204.944   43       28.022
Total                278662.000   49
Mean corrected         3161.950   48

Raw R-square (1-Residual/Total)                 = 0.996
Mean corrected R-square (1-Residual/Corrected)  = 0.619
R(observed vs predicted) square                 = 0.621

                                             Wald Confidence Interval
Parameter   Estimate    A.S.E.   Param/ASE     Lower  < 95%>   Upper
B0            70.573     5.568      12.676    59.345          81.801
B1             0.002     0.001       1.929    -0.000           0.003
B2            -0.408     0.065      -6.256    -0.539          -0.276
B3            -0.367     0.121      -3.022    -0.612          -0.122
B4            -0.001     0.001      -0.798    -0.004           0.002
B5            -0.117     0.048      -2.444    -0.213          -0.020
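
The TRIM run leaves 49 of the 51 cases with positive weights. That count is what one would get if the procedure gave zero weight to the whole number of cases in 5 percent of 51 (that is, 2) with the largest absolute residuals and weight 1 to the rest. This reading is an assumption; the Python sketch below illustrates it with hypothetical residuals and an illustrative function name.

```python
def trim_weights(residuals, fraction):
    """Weight 0 for the int(fraction * n) largest absolute residuals, else 1."""
    n = len(residuals)
    n_drop = int(fraction * n)                    # whole cases to trim
    by_size = sorted(range(n), key=lambda i: abs(residuals[i]), reverse=True)
    dropped = set(by_size[:n_drop])
    return [0.0 if i in dropped else 1.0 for i in range(n)]

# 51 hypothetical residuals, one per case.
residuals = [((-1) ** i) * (i % 7 + 0.1 * i) for i in range(51)]
weights = trim_weights(residuals, 0.05)
positive = [w for w in weights if w > 0]
print(len(positive), sum(positive) / len(positive))
```

Because the retained cases all carry weight 1, the average positive weight is exactly 1, as in the output above, and the degrees of freedom fall by the number of trimmed cases.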
8. Robust regression - T method (weights based on Student t distribution)
  robust t=5
  estimate/start
  output *

Iteration
   No.     Loss         B0          B1          B2          B3          B4          B5
    0   .125607D+09  .101000D+00  .102000D+00  .103000D+00  .104000D+00  .105000D+00  .106000D+00
    1   .782650D+08  .102010D+02  .873183D-01  .272414D-01  .290928D-01  .894241D-01  .746028D-01
    2   .118960D+04  .688358D+02  .178572D-02 -.418723D+00 -.401193D+00 -.119928D-02 -.111142D+00
    3   .737287D+03  .690420D+02  .178500D-02 -.414116D+00 -.407305D+00 -.131866D-02 -.108313D+00
   <lines omitted to save space>
   22   .705950D+03  .660164D+02  .228639D-02 -.410960D+00 -.514123D+00 -.177926D-02 -.128484D+00
   23   .705960D+03  .660166D+02  .228634D-02 -.410960D+00 -.514115D+00 -.177923D-02 -.128483D+00
   24   .705967D+03  .660168D+02  .228632D-02 -.410960D+00 -.514108D+00 -.177920D-02 -.128481D+00
   25   .705971D+03  .660169D+02  .228630D-02 -.410960D+00 -.514103D+00 -.177918D-02 -.128481D+00

T robust regression: 51 cases have positive psi-weights
The average psi-weight is 0.79320

Dependent variable is GRAD

Source           Sum-of-Squares   df  Mean-Square
Regression           277432.307    6    46238.718
Residual               1229.693   45       27.327
Total                278662.000   51
Mean corrected         3161.950   50

Raw R-square (1-Residual/Total)                 = 0.996
Mean corrected R-square (1-Residual/Corrected)  = 0.611
R(observed vs predicted) square                 = 0.615

                                             Wald Confidence Interval
Parameter   Estimate    A.S.E.   Param/ASE     Lower  < 95%>   Upper
B0            66.017     5.507      11.988    54.926          77.108
B1             0.002     0.001       2.769     0.001           0.004
B2            -0.411     0.061      -6.730    -0.534          -0.288
B3            -0.514     0.145      -3.539    -0.807          -0.222
B4            -0.002     0.001      -1.198    -0.005           0.001
B5            -0.128     0.049      -2.634    -0.227          -0.030
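
Weights based on the Student t distribution decline smoothly with the size of the residual and never reach zero. The Python sketch below uses one common form, w = 1/(1 + r^2/df), with df = 5 as in the command above; the exact scaling SYSTAT applies is not shown in the output, so the form and the constant factor are assumptions, and the residuals are hypothetical scaled values.

```python
def t_weight(r, df=5.0):
    """Weight implied by a Student t error model, scaled so that w(0) = 1."""
    return 1.0 / (1.0 + (r * r) / df)

for r in [0.0, 1.0, 5.0 ** 0.5, 5.0, 20.0]:   # hypothetical scaled residuals
    print(round(r, 3), round(t_weight(r), 4))
```

Since the weight is positive for any finite residual, all 51 cases retain positive psi-weights under this form, and the lower average weight compared with the HUBER and BISQUARE runs reflects that even moderate residuals are downweighted.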
Last modified 7 April 1999