The least-squares estimates of the regression coefficients are

b1 = Σ(Xi - X̄)(Yi - Ȳ) / Σ(Xi - X̄)²
b0 = Ȳ - b1X̄

Just like other statistics (such as the sample mean or variance), the estimates b1 and b0 are functions of the observed values Yi, which are in turn functions of the random errors εi. Thus b1 and b0 are themselves random variables, and each has a probability distribution, called its sampling distribution. The sampling distribution of b1 (respectively b0) refers to "the different values of b1 (b0) that would be obtained with repeated sampling when the levels of the independent variables X are held constant from sample to sample" (ALSM5e p. 41; ALSM4e p. 45). Statistical inference concerning a population parameter such as β1 or β0 consists of testing hypotheses about that parameter and constructing confidence intervals for it. Inference concerning a parameter is based on the sampling distribution of its estimator.
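For concreteness, here is a minimal sketch (hypothetical data, NumPy assumed) of computing b1 and b0 from these formulas:

```python
# A minimal sketch of the least-squares estimates b1 and b0.
# The X and Y values are hypothetical; NumPy is assumed available.
import numpy as np

X = np.array([5.0, 10.0, 15.0, 20.0, 25.0])   # hypothetical predictor values
Y = np.array([12.0, 11.0, 9.5, 8.0, 7.0])     # hypothetical responses

b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
print(b1, b0)   # slope and intercept estimates
```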
For (simple or multiple) linear regression models, statistical inference is commonly carried out for the regression coefficients (β1, β0), the mean response E{Yh}, and the prediction of a new observation Yh(new).
[Figure: sampling distribution of b1]
STEP 1: Sampling Distribution of b1

Assuming that the errors are normally distributed,

b1 ~ N(β1, σ²{b1})    (~ means "is distributed as")

where σ² denotes the variance of εi and σ²{b1} = σ² / Σ(Xi - X̄)². In words: b1 is normally distributed with mean E{b1} = β1 and variance σ²{b1} = σ² / Σ(Xi - X̄)² (figure above). The sampling distribution of b1 is normal because b1 is a linear combination of the observations Yi, which are independent normally distributed random variables (ALSM5e: Appendix A, Theorem A.40). Note the denominator Σ(Xi - X̄)² in the expression for σ²{b1}: the greater the variance of X, the smaller the variance σ²{b1} of the sampling distribution, and the more precise the estimation of β1. (In experimental research, where the experimenter sets the values of X, one may be able to space the X values optimally to achieve the smallest σ²{b1} possible.)
STEP 2: Distribution of (b1 - β1)/σ{b1}

(b1 - β1)/σ{b1} ~ N(0, 1)

In words: the deviation of b1 from its mean β1, divided by the standard deviation σ{b1} of the sampling distribution, is normally distributed with mean 0 and standard deviation 1 (i.e., according to the standard normal distribution). This follows from STEP 1 by applying to b1 the equivalent of the z transformation z = (X - μ)/σ. σ{b1} denotes the standard deviation of the sampling distribution of b1; its sample estimate s{b1} (STEP 3) is called the standard error of b1.
[Figure: distribution of (b1 - β1)/σ{b1} (blue line) compared to distribution of (b1 - β1)/s{b1} (red dashed line)]
STEP 3: Distribution of (b1 - β1)/s{b1}

In practice σ{b1} is not known (because σ is not known), so one replaces σ{b1} with the sample estimate s{b1} given by

s{b1} = [MSE / Σ(Xi - X̄)²]^(1/2)

where MSE is the error mean square. When σ{b1} is replaced by the sample estimate s{b1}, the sampling distribution is no longer normal; it becomes a Student t distribution with (n - 2) df:

t* = (b1 - β1)/s{b1} ~ t(n - 2)

The distribution is no longer normal because s{b1} (being a sample estimate) is a random variable rather than a fixed constant. This "extra randomness" produces the thicker tails of the t distribution compared to the normal (figure above). The (n - 2) df correspond to the (n - 2) df of MSE (see Module 1).
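The three steps can be illustrated by simulation: holding the X levels fixed and redrawing the errors again and again shows b1 varying from sample to sample, with the studentized statistic following t(n - 2). A sketch with assumed (hypothetical) parameter values, NumPy assumed:

```python
# Simulation sketch of STEPs 1-3 with assumed true values beta0, beta1, sigma.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([5.0, 10.0, 15.0, 20.0, 25.0, 30.0])  # fixed X levels
beta0, beta1, sigma = 2.0, 0.5, 1.0                # assumed true values
n = len(X)
Sxx = np.sum((X - X.mean()) ** 2)

t_stats = []
for _ in range(10_000):
    Y = beta0 + beta1 * X + rng.normal(0.0, sigma, n)   # redraw the errors
    b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / Sxx
    b0 = Y.mean() - b1 * X.mean()
    mse = np.sum((Y - b0 - b1 * X) ** 2) / (n - 2)      # MSE with n-2 df
    t_stats.append((b1 - beta1) / np.sqrt(mse / Sxx))

# The empirical 2.5th/97.5th percentiles approximate +/- t(.975; 4) = +/- 2.776,
# wider than the normal +/- 1.96 because of the thicker t tails.
print(np.percentile(t_stats, [2.5, 97.5]))
```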
Inference on the regression coefficients uses the following formulas.
| Slope β1 | |
| Estimated standard error of b1 | s{b1} = (MSE / Σ(Xi - X̄)²)^(1/2) |
| Estimated sampling distribution of b1 | (b1 - β1)/s{b1} ~ t(n - 2) |
| Confidence limits for CI on β1 | b1 ± t(1 - α/2; n - 2) s{b1} |
| Test statistic for H0: β1 = β10 | t* = (b1 - β10)/s{b1} (1) |
| Intercept β0 | |
| Estimated standard error of b0 | s{b0} = (MSE (1/n + X̄² / Σ(Xi - X̄)²))^(1/2) |
| Estimated sampling distribution of b0 | (b0 - β0)/s{b0} ~ t(n - 2) |
| Confidence limits for CI on β0 | b0 ± t(1 - α/2; n - 2) s{b0} |
| Test statistic for H0: β0 = β00 | t* = (b0 - β00)/s{b0} (1) |
Notes: (1) The standard regression printout reports t* for the default null hypothesis β10 = 0 (respectively β00 = 0).
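As a quick check of these formulas, here is a sketch (scipy assumed; slope_inference is a hypothetical helper of this module, not a library function) that computes the CI and test statistic for the slope, using the numeric values from the printout in Table 2 below:

```python
# Sketch of the slope formulas: (1-alpha) CI for beta1 and t* for H0: beta1 = beta10.
from scipy import stats

def slope_inference(b1, s_b1, n, beta10=0.0, alpha=0.05):
    tcrit = stats.t.ppf(1 - alpha / 2, n - 2)     # t(1 - alpha/2; n - 2)
    return (b1 - tcrit * s_b1, b1 + tcrit * s_b1), (b1 - beta10) / s_b1

# Values from the printout in Table 2 (b1 = -0.169, s{b1} = 0.039, n = 9):
ci, tstar = slope_inference(-0.169, 0.039, 9)
print(ci, tstar)
# CI ~ (-0.261, -0.077); t* ~ -4.33 here, -4.311 in the printout,
# which computes t* from the unrounded coefficient and standard error.
```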
The estimated standard error s{b1} is typically provided in the standard regression printout (labeled Std Error in Table 2).
Table 2 repeats the regression printout used for illustration.
Table 2. Printout of Regression of % Clerks on Seasonality Index

```
Adjusted squared multiple R: 0.687   Standard error of estimate: 2.103

Effect       Coefficient   Std Error   Std Coef   Tolerance        t   P(2 Tail)
CONSTANT          14.631       1.700      0.000           .    8.607       0.000
SEASON            -0.169       0.039     -0.852       1.000   -4.311       0.004

                        Analysis of Variance
Source       Sum-of-Squares   df   Mean-Square    F-ratio        P
Regression           82.199    1        82.199     18.583    0.004
Residual             30.963    7         4.423
```
The (1 - α) confidence interval for β1 is

b1 ± t(1 - α/2; n - 2) s{b1}

where t(1 - α/2; n - 2) refers to the (1 - α/2)100th percentile of the Student t distribution with (n - 2) df. In the example:

b1 = -.169 (from the regression printout under Coefficient)
s{b1} = .039 (from the regression printout under Std Error)
Choose α = .05; then t(1 - α/2; n - 2) = t(.975; 7) = 2.365 (from a statistical program or from a table).

Therefore the 95% CI for β1 has bounds

L = -.169 - (2.365)(.039) = -0.261 (lower bound of CI)
U = -.169 + (2.365)(.039) = -0.077 (upper bound of CI)

One can say that with 95% confidence

-0.261 ≤ β1 ≤ -0.077
To test H0: β1 = 0 against H1: β1 ≠ 0, the test statistic is

t* = (b1 - 0)/s{b1} = b1/s{b1}

(provided in the regression printout under t). Choose a significance level α = .05.

Using the P-value approach, the test statistic is

t* = (-.169)/(.039) = -4.311 (also from the regression printout under t)

Find the two-tailed P-value

P{|t(n - 2)| > |t*|} = P{|t(7)| > 4.311} = .004 (from the regression printout under P(2 Tail)).

Since P-value = .004 < α = .05, one concludes H1: "there is a significant linear association between % clerks and seasonality", or "the coefficient of seasonality is significant at the .05 level". (One would actually say "at the .01 level" in this case, choosing the lowest "round" significance level greater than the P-value .004.)
Using the decision-theory method, the decision rule is

if |t*| ≤ t(1 - α/2; n - 2), conclude H0
if |t*| > t(1 - α/2; n - 2), conclude H1

With α = .05, t(1 - α/2; n - 2) is t(0.975; 7) = 2.365. Since |t*| = 4.311 > 2.365, conclude H1 (β1 ≠ 0) at the .05 level with this method also.
For the one-sided test of H0: β1 ≥ 0 against H1: β1 < 0, the one-tailed P-value is

P{t(n - 2) < t*} = (.004)/2 = .002

Since P-value = .002 < .05 = α, conclude H1: β1 < 0.

Using the decision-theory method, the decision rule for this lower-tailed test is

if t* ≥ -t(1 - α; n - 2), conclude H0
if t* < -t(1 - α; n - 2), conclude H1

In the example t(1 - α; n - 2) = t(0.95; 7) = 1.895. Since t* = -4.311 < -1.895, conclude H1: β1 < 0 by this method also.
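The one-tailed P-value can also be checked directly, as in this sketch (scipy assumed; values from the printout):

```python
# One-sided P-value for H1: beta1 < 0, with t* = -4.311 and n - 2 = 7 df.
from scipy import stats

p_one_sided = stats.t.cdf(-4.311, 7)   # P{t(7) < t*}
print(round(p_one_sided, 3))           # ~ 0.002, as computed above
```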
Inference on the intercept β0 proceeds in the same way. The point estimator of β0 is

b0 = Ȳ - b1X̄

The estimated standard error of b0 is

s{b0} = (MSE (1/n + X̄² / Σ(Xi - X̄)²))^(1/2)

The standardized statistic

t* = (b0 - β0)/s{b0}

is distributed as t(n - 2). (See ALSM5e pp. 48-49; ALSM4e pp. <>.)
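A quick check of the intercept CI, as a sketch (scipy assumed) using the printout values b0 = 14.631 and s{b0} = 1.700 with n = 9:

```python
# 95% CI for the intercept beta0 from the Table 2 printout values.
from scipy import stats

b0, s_b0, n = 14.631, 1.700, 9
tcrit = stats.t.ppf(0.975, n - 2)              # t(.975; 7) = 2.365
print(b0 - tcrit * s_b0, b0 + tcrit * s_b0)    # ~ (10.61, 18.65)
```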
Inference on the mean response E{Yh} at a given level Xh of the predictor uses the following formulas.

| Point estimator of E{Yh} | Ŷh = b0 + b1Xh |
| Estimated standard error of Ŷh | s{Ŷh} = (MSE (1/n + (Xh - X̄)² / Σ(Xi - X̄)²))^(1/2) |
| Estimated distribution of Ŷh | (Ŷh - E{Yh}) / s{Ŷh} ~ t(n - 2) |
| Confidence limits for CI on E{Yh} | Ŷh ± t(1 - α/2; n - 2) s{Ŷh} |
| Test statistic | Not often used |
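A sketch (NumPy assumed; mean_response is a hypothetical helper, and the usage data are made up) implementing the table above; applied to the % clerks data, it should reproduce the ESTIMATE and SEPRED columns listed in the SYSTAT session at the end of this module:

```python
# Point estimate Yh-hat and estimated standard error s{Yh-hat} at a given Xh.
import numpy as np

def mean_response(X, Y, Xh):
    n = len(X)
    Sxx = np.sum((X - X.mean()) ** 2)
    b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / Sxx
    b0 = Y.mean() - b1 * X.mean()
    mse = np.sum((Y - b0 - b1 * X) ** 2) / (n - 2)   # error mean square
    yh = b0 + b1 * Xh
    s_yh = np.sqrt(mse * (1.0 / n + (Xh - X.mean()) ** 2 / Sxx))
    return yh, s_yh

# Hypothetical usage:
X = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
Y = np.array([12.0, 10.5, 9.0, 8.5, 6.0])
print(mean_response(X, Y, Xh=35.0))
```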
In the % clerks example, suppose one wants a 95% CI for the mean response at Xh = 35. With Ŷh = 8.716 and s{Ŷh} = 0.7233628, the bounds are

L = 8.716 - (2.365)(0.7233628) = 7.005 (lower bound)
U = 8.716 + (2.365)(0.7233628) = 10.427 (upper bound)

so that with 95% confidence

7.005 ≤ E{Yh | Xh = 35} ≤ 10.427
For a second example, consider a regression of home sale price (Y) on square feet of finished area (X) for n = 522 home sales in a Midwestern town, with fitted equation

Ŷ = -81432.9464 + 158.9502X    n = 522    R² = .6715

The tax collector wants to estimate the average sale price (i.e., E{Yh}) of homes with 2,500 square feet of finished area and obtain a 95% CI for that average sale price. To calculate both Ŷh and s{Ŷh} quickly, the tax collector adds a 523rd "dummy" case to the data set, with only the value 2500 for X and a missing value for Y, and re-runs the regression. He then obtains Ŷh and s{Ŷh} for all observations, including the dummy one (STATA: predict yhat, xb; predict seyhat, stdp; SYSTAT: save resid, yielding the variables ESTIMATE and SEPRED). For the dummy case:

Ŷh = 315942.6306
s{Ŷh} = 3654.4390

Thus, using the formula above with t(.975; 520) = 1.9645365, the 95% CI for the average sale price of homes with 2,500 square feet of finished space is

L = 315942.6306 - (1.9645365)(3654.4390) = 308763.35 (lower bound)
U = 315942.6306 + (1.9645365)(3654.4390) = 323121.91 (upper bound)

Note that this is a relatively narrow interval, indicating that the average sale price of 2,500-square-foot homes can be estimated quite precisely.
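These bounds can be verified directly, as in this sketch (scipy assumed; values from the dummy-case run above):

```python
# Reproduce the tax collector's 95% CI for the mean sale price.
from scipy import stats

yh, s_yh, n = 315942.6306, 3654.4390, 522
tcrit = stats.t.ppf(0.975, n - 2)              # t(.975; 520) ~ 1.9645365
print(yh - tcrit * s_yh, yh + tcrit * s_yh)    # ~ (308763.35, 323121.91)
```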
To obtain a confidence band for the entire regression line (the Working-Hotelling band), use

Ŷh ± W s{Ŷh}

where

W = [2F(1 - α; 2, n - 2)]^(1/2)

(See ALSM5e pp. 61-63.) Here s{Ŷh} is the same estimated standard error as before,

s{Ŷh} = (MSE (1/n + (Xh - X̄)² / Σ(Xi - X̄)²))^(1/2)

and (Ŷh - E{Yh})/s{Ŷh} is distributed as a t distribution with (n - 2) df. (See ALSM5e pp. 52-54.)
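A sketch (scipy assumed) of the multiple W for the % clerks example (α = .05, n = 9):

```python
# Working-Hotelling multiple W = sqrt(2 * F(1-alpha; 2, n-2)).
from scipy import stats

alpha, n = 0.05, 9
W = (2 * stats.f.ppf(1 - alpha, 2, n - 2)) ** 0.5
print(W)   # ~ 3.08, larger than t(.975; 7) = 2.365: the band is wider
           # than the pointwise CI because it covers the whole line
```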
Prediction of a new observation Yh(new) at level Xh must account for two sources of variability: the variance σ² of the new observation around its mean, and the variance σ²{Ŷh} of the estimated mean itself. Hence

σ²{Yh(new)} = σ² + σ²{Ŷh}

Thus the sampling distribution of Yh(new) is described as in the following table.
| Point estimator of Yh(new) | Ŷh = b0 + b1Xh |
| Estimated standard error s{Yh(new)} (also written s{pred}) | s{pred} = (MSE (1 + 1/n + (Xh - X̄)² / Σ(Xi - X̄)²))^(1/2) |
| Estimated distribution of Yh(new) | (Yh(new) - Ŷh) / s{pred} ~ t(n - 2) |
| Prediction limits for Yh(new) | Ŷh ± t(1 - α/2; n - 2) s{pred} |
| Test statistic | Not often used |
Returning to the home-sales regression

Ŷ = -81432.9464 + 158.9502X    n = 522    R² = .6715

imagine this time that someone plans to sell their home in this Midwestern town. They ask a real-estate agent to estimate the price the home might fetch on the market, given that it has 2,500 square feet of finished space, and to give them a 95% CI for that price. The real-estate agent uses the same trick as the tax collector, adding a "dummy" 523rd case with X = 2500 to obtain both Ŷh and s{Ŷh}; she also obtains MSE from the regression printout, so she has the following information:

Ŷh = 315942.6306
s{Ŷh} = 3654.4390
MSE = 6.26043E+09

From this information she calculates s²{Yh(new)} as

s²{Yh(new)} = MSE + s²{Ŷh} = MSE + (SEPRED)² = 6.26043E+09 + (3654.4390)² = 6273784924.40

Hence s{Yh(new)} = (6273784924.40)^(1/2) = 79207.23, and the 95% prediction limits are

L = 315942.6306 - (1.9645365)(79207.23) = 160337.14 (lower bound)
U = 315942.6306 + (1.9645365)(79207.23) = 471548.12 (upper bound)

Note that the 95% CI for Yh(new) (160K to 472K) is considerably wider than the CI for the mean response E{Yh} (309K to 323K). (Q - Why is that?)
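Her calculation can be verified directly, as in this sketch (scipy assumed; values from above):

```python
# Reproduce the real-estate agent's 95% prediction interval.
from scipy import stats

yh, s_yh, mse, n = 315942.6306, 3654.4390, 6.26043e9, 522
s_pred = (mse + s_yh ** 2) ** 0.5                 # ~ 79207.23
tcrit = stats.t.ppf(0.975, n - 2)                 # ~ 1.9645365
print(yh - tcrit * s_pred, yh + tcrit * s_pred)   # ~ (160337.14, 471548.12)
```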
The ANOVA approach provides an alternative test of β1 = 0, based on the expected mean squares

E{MSE} = σ²
E{MSR} = σ² + β1² Σ(Xi - X̄)²

(ALSM5e pp. 68-71; ALSM4e pp. 75-76). Note that if β1 = 0, then E{MSR} = E{MSE} = σ². Therefore, an alternative to the two-sided hypothesis test for β1

H0: β1 = 0
H1: β1 ≠ 0

is the F test based on the statistic

F* = MSR/MSE

If β1 = 0, MSR and MSE estimate the same quantity σ², so F* should be close to 1; if β1 ≠ 0, F* tends to be large. Under H0,

F* = MSR/MSE ~ (χ²(1)/1) / (χ²(n - 2)/(n - 2)) = F(1, n - 2)

(ALSM5e p. 70; ALSM4e pp. 76-77). In the example,

F* = MSR/MSE = (82.199)/(4.423) = 18.583

The P-value is P{F(1, 7) > 18.583} = 0.004. Using the decision-theory method, the decision rule is

if F* ≤ F(1 - α; 1, n - 2), conclude H0
if F* > F(1 - α; 1, n - 2), conclude H1

With α = .05, F(1 - α; 1, n - 2) = F(0.95; 1, 7) = 5.59, so conclude H1. In the simple regression model the F test is equivalent to the two-sided t test, since

(t*)² = (-4.311)² = 18.583 = F*

This is no longer true in the multiple regression model.
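A sketch (scipy assumed) reproducing the F test from the ANOVA table in Table 2, and its equivalence with the two-sided t test:

```python
# F test from the printout values, and the (t*)^2 = F* equivalence.
from scipy import stats

msr, mse, n = 82.199, 4.423, 9
fstar = msr / mse                      # ~ 18.58
print(fstar)
print(stats.f.sf(fstar, 1, n - 2))     # P{F(1,7) > F*} ~ 0.004
print(stats.f.ppf(0.95, 1, n - 2))     # F(.95; 1, 7) ~ 5.59
print((-4.311) ** 2)                   # (t*)^2 ~ 18.58 = F*
```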
As an illustration of maximum likelihood estimation, the likelihood L(μ) of a candidate value μ is the product, over the sample observations, of their densities evaluated under that value. For the example's three observations, if μ = 230,

L(μ = 230) = (0.053991)(0.000873)(0.005953) = 0.279 × 10⁻⁶

Similarly, if μ = 259, then

L(μ = 259) = (0.333225)(0.398942)(0.266085) = 0.035373

Thus the likelihood of μ = 259 is greater than the likelihood of μ = 230.
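A sketch (scipy assumed) of this likelihood comparison. The factors shown above are standard normal ordinates φ(z); they are consistent with assumed observations Y = 250, 265, 259 and σ = 10 (the NKNW example), used here as placeholder values:

```python
# Compare likelihoods for two candidate means mu.
import numpy as np
from scipy import stats

y = np.array([250.0, 265.0, 259.0])   # assumed observations
sigma = 10.0                          # assumed standard deviation

def likelihood(mu):
    # Product of standard normal ordinates phi((y - mu)/sigma), matching the
    # factors in the text (a full density would add a 1/sigma per term).
    return np.prod(stats.norm.pdf((y - mu) / sigma))

print(likelihood(230.0))   # ~ 0.279e-6
print(likelihood(259.0))   # ~ 0.035373
# In practice one maximizes log L(mu) instead; see the NOTE below.
```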
NOTE: maximum likelihood derivations often use the logarithm (base e) of L rather than L itself. See NKNW Equation 1.29 p. 35.
>USE "Z:\mydocs\S208\craft.syd"
SYSTAT Rectangular file
Z:\mydocs\S208\craft.syd,
created Thu Nov 07,
2002 at 08:24:57, contains variables:
TYPE$
SIZE SEASON
CLERKS
>regress
>rem can also use
command MGLH instead of REGRESS
>model clerks=constant+season
>estimate
Dep Var: CLERKS N: 9 Multiple R: 0.852 Squared multiple R: 0.726
Adjusted squared multiple R: 0.687 Standard error of estimate: 2.103
Effect Coefficient Std Error Std Coef Tolerance t P(2 Tail)
CONSTANT
14.631 1.700
0.000 .
8.607 0.000
SEASON
-0.169 0.039
-0.852 1.000 -4.311
0.004
Analysis of Variance
Source
Sum-of-Squares df Mean-Square
F-ratio P
Regression
82.199 1 82.199
18.583 0.004
Residual
30.963 7
4.423
-------------------------------------------------------------------------------
Durbin-Watson D Statistic
1.995
First Order Autocorrelation
-0.107
>rem show a scatterplot
with the regression line
>rem and the Working-Hotelling
95% confidence band
>plot clerks*season/stick=out
smooth=linear short confi=0.95
>rem now add a case
with Xh=35, missing Y, using menus
>rem redo the estimation
and save residuals to get SEPRED
>model clerks=constant+season
>save clerkres
>estimate
1 case(s) deleted due to missing data.
<...repeat output deleted...>
Residuals have been saved.
>use clerkres
SYSTAT Rectangular file
Z:\mydocs\s208\clerkres.SYD,
created Fri Nov 15,
2002 at 10:20:07, contains variables:
ESTIMATE
RESIDUAL LEVERAGE COOK
STUDENT SEPRED
>list estimate residual
sepred
Case number
ESTIMATE RESIDUAL
SEPRED
1 2.311
2.489 1.485
2 7.374
0.226 0.714
3 9.737
1.963 0.814
4 6.699
-3.399 0.759
5 7.374
-2.174 0.714
6 9.737
1.963 0.814
7 11.256
-0.356 1.038
8 12.437
0.063 1.254
9 4.674
-0.774 1.035
10 8.724
. 0.723
>rem SEPRED = 0.723 for SEASON=35 is same as calculated by hand earlier
```
  Source |       SS         df       MS             Number of obs =      16
---------+------------------------------           F(  1,    14) =   10.28
   Model |  2260.31012      1   2260.31012         Prob > F      =  0.0063
Residual |  3078.52104     14    219.89436         R-squared     =  0.4234
---------+------------------------------           Adj R-squared =  0.3822
   Total |  5338.83116     15   355.922077         Root MSE      =  14.829

------------------------------------------------------------------------------
 density |      Coef.   Std. Err.       t     P>|t|     [95% Conf. Interval]
---------+--------------------------------------------------------------------
 lntlabf |  -9.889076   3.084457    -3.206   0.006      -16.50458  -3.273573
   _cons |   135.7141    28.3748     4.783   0.000       74.85625    196.572
------------------------------------------------------------------------------
```
```
. predict yhat
[option xb assumed; fitted values]
. label variable yhat "predicted mean density"
. predict e, resid
. label variable e "residual"
. generate yhat0=_b[_cons] + _b[lntlabf]*lntlabf
. generate e0=density-yhat0
. predict new, stdp
. graph density yhat lntlabf, connect(.s) symbol(Oi) ylabel xlabel
. graph e yhat, twoway box yline(0) ylabel xlabel
. display invttail(14,.025)
2.1447867

[If you'd rather use the F-value here, use the command invFtail(dfn,dfd,p)]

. generate low1=yhat-2.1448*new
. generate high1=yhat+2.1448*new
. graph density yhat low1 high1 lntlabf, connect(.sss) symbol(Oiii) ylabel xlabel
```
Q - Do you know the origin of the name "Student"?