The line or curve of statistical relationship refers to the tendency of y to vary systematically as a function of x.
The second exhibit shows a situation where the regression function is linear.

Exhibit: Pictorial representation of general regression model (ALSM5e F1.4 p. 7) [m1004.gif]
The regression model is the formalization of the idea of a statistical relation; it translates the idea into two components:

Exhibit: Pictorial representation of simple linear regression model (ALSM5e F1.6 p. 12) [m1005.gif]
Yi = β0 + β1Xi + εi,  i = 1, 2, ..., n  (1)

where Yi is the value of the response variable for the ith observation, Xi is the (known) value of the independent variable for the ith observation, β0 and β1 are parameters, and εi is a random error term with mean E{εi} = 0 and constant variance σ².
The assumption of normality of the errors is necessary to theoretically justify statistical inference (see Module 2), especially in small samples. But most properties of the least squares estimators of the model parameters do not depend on the normality assumption.
The expected value of Yi is

E{Yi} = E{β0 + β1Xi + εi} = β0 + β1Xi + E{εi} = β0 + β1Xi

since by assumption E{εi} = 0.
The parameters β0 and β1 are called regression coefficients or regression parameters.
The meaning of each coefficient is as follows: β1 is the slope of the regression line, i.e., the change in the mean of Y associated with a one-unit increase in X; β0 is the Y intercept, i.e., the mean of Y when X = 0 (meaningful only if X = 0 is within the scope of the model).
The variance of Yi is

σ²{Yi} = σ²{β0 + β1Xi + εi} = σ²{εi} = σ²

where σ² is the variance of εi. This is because εi is the only random variable in the expression, and the variance of the error εi is assumed to be the same, equal to σ², regardless of the value of X. (This definition of the population variance of Y as σ², i.e., the variance of εi, may be confusing, as it does not correspond directly to the sample variance of Y, sY². It is helpful to consider that σ²{Yi} actually means "variance of Y around the regression line", so an equivalent expression for σ²{Yi} is σ²{Y|X}, or "variance of Y (around the regression line) given a certain value of X".)
The quantities β0, β1, and σ² are the parameters of the regression model; they have to be estimated from the data. (In practice one estimates β0 and β1, and the estimate of σ² is then obtained as a by-product.)
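To make the model concrete, here is a minimal simulation sketch in Python (numpy; the parameter values and sample size are made up for illustration) that generates data according to Yi = β0 + β1Xi + εi:

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative (made-up) parameter values
beta0, beta1, sigma = 14.6, -0.17, 2.1

x = rng.uniform(10, 75, size=9)        # X values treated as fixed constants
eps = rng.normal(0.0, sigma, size=9)   # errors: E{eps} = 0, constant variance sigma^2
y = beta0 + beta1 * x + eps            # the simple linear regression model

# E{Y} lies on the line beta0 + beta1*X; observed Y values scatter around it
print(np.column_stack([x, beta0 + beta1 * x, y]))
```

Each simulated point deviates from the line β0 + β1X only through its error term, which is exactly the two-component structure described above.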
Table 1. Sums of squares and cross-products for the construction industry data

 | Sector | Xi | Yi | (Xi - X̄) | (Yi - Ȳ) | (Xi - X̄)² | (Yi - Ȳ)² | (Xi - X̄)(Yi - Ȳ) |
1 | STRSEW | 73 | 4.8 | 33.444 | -3.156 | 1118.531 | 9.958 | -105.536 |
2 | SAND | 43 | 7.6 | 3.444 | -0.356 | 11.864 | 0.126 | -1.225 |
3 | VENT | 29 | 11.7 | -10.556 | 3.744 | 111.420 | 14.021 | -39.525 |
4 | BRICK | 47 | 3.3 | 7.444 | -4.656 | 55.420 | 21.674 | -34.658 |
5 | GENCON | 43 | 5.2 | 3.444 | -2.756 | 11.864 | 7.593 | -9.491 |
6 | SHEET | 29 | 11.7 | -10.556 | 3.744 | 111.420 | 14.021 | -39.525 |
7 | PLUMB | 20 | 10.9 | -19.556 | 2.944 | 382.420 | 8.670 | -57.580 |
8 | ELEC | 13 | 12.5 | -26.556 | 4.544 | 705.198 | 20.652 | -120.680 |
9 | PAINT | 59 | 3.9 | 19.444 | -4.056 | 378.086 | 16.448 | -78.858 |
 | Total | 356 | 71.6 | -0.000 | 0.000 | 2886.222 | 113.162 | -487.078 |
 | Mean | 39.556 | 7.956 | | | | | |
The next exhibit shows a scatterplot in which each point corresponds to the pair of values (yi, xi), with y measured on the vertical axis and x measured on the horizontal axis.
Exhibit: Graph of simple linear regression of % clerks (Y) on seasonality (X) [m15001.jpg]

The solid line is the estimated regression line. It is represented by the equation
ŷ = b0 + b1x

where ŷ (called "y hat") represents the vertical coordinate of a point on the regression line corresponding to horizontal coordinate x. The coefficients b0 and b1 are calculated by the method of least squares (explained below). The model implies that for each observation in the sample the vertical coordinate yi of a point is given by the formula
yi = ŷi + ei

or

yi = b0 + b1xi + ei  (i = 1, ..., 9)

where ei is the vertical deviation between the observed value yi and the value ŷi (called the fitted value or predictor of yi) implied by the regression line; ei is called the residual for observation i.
Note that the simple regression model establishes an asymmetry between the dependent variable y and the independent variable x, because deviations are measured along the dependent variable dimension (usually the vertical axis). In general a different regression line is obtained if one exchanges the roles of y and x. The choice of one variable as dependent and the other as independent is a substantive choice. (Correlational models do not assume this asymmetry.)
The method of least squares finds the values of b0 and b1 that minimize the sum of squared vertical deviations

Q = Σi=1 to n (yi - b0 - b1xi)²

The following exhibit shows the vertical deviations ei that are squared and summed to evaluate Q. To minimize Q one could (1) use a "brute force" numerical search over a grid of values of b0 and b1 (this is essentially what computers do in some situations), or (2) take advantage of the analytical solution originally discovered by the French mathematician Legendre, who showed that the values of b0 and b1 that minimize Q are given by the formulas
b1 = Σ(Xi - X̄)(Yi - Ȳ) / Σ(Xi - X̄)²

b0 = Ȳ - b1X̄

These formulas are the solution of the normal equations (derived below). All the sums are over all observations (from i = 1 to n). Table 1 shows how one can organize the calculations for the construction industry data. In the table the units are sectors of the construction industry, yi stands for % clerks, and xi for the index of employment seasonality. One calculates b1 first, then b0. Thus, having calculated the sums of squares and cross-products, one calculates b1 and b0 using the formulas above as
b1 = (-487.078)/(2886.222) = -0.169

b0 = (7.956) - (-0.169)(39.556) = 14.631

Note that the slope b1 could also be calculated as the ratio of the same two numbers, each divided by n - 1, i.e., as the sample covariance of x and y, denoted sXY, divided by the sample variance of x, denoted sX²:

b1 = sXY/sX²

b0 = Ȳ - b1X̄

In practice one uses a computer program to carry out the calculations. The next exhibit shows a typical simple regression output.
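As a check on the hand calculations, the following minimal Python sketch (numpy; the variable names are mine) applies the formulas above to the construction industry data:

```python
import numpy as np

# Construction industry data: x = employment seasonality, y = % clerks
x = np.array([73, 43, 29, 47, 43, 29, 20, 13, 59], dtype=float)
y = np.array([4.8, 7.6, 11.7, 3.3, 5.2, 11.7, 10.9, 12.5, 3.9])

sxx = np.sum((x - x.mean()) ** 2)              # Σ(Xi - X̄)² = 2886.222
sxy = np.sum((x - x.mean()) * (y - y.mean()))  # Σ(Xi - X̄)(Yi - Ȳ) = -487.078

b1 = sxy / sxx                 # -0.169
b0 = y.mean() - b1 * x.mean()  # 14.631

# Equivalently, via the sample covariance and variance (each sum / (n - 1))
b1_alt = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

print(b1, b0, b1_alt)
```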
Aspect of the Population Model | Symbolic Form | Estimate |
Regression coefficients | β0, β1 | b0, b1 |
Mean response (regression function for given value Xh of X, whether or not Xh is represented in the sample) | E{Yh} = β0 + β1Xh | Ŷh = b0 + b1Xh |
Estimate, predictor or fitted value of Y (for Xi in the sample) | E{Yi} = β0 + β1Xi | Ŷi = b0 + b1Xi |
Predicted value of Y for known value Xh of X (Xh not necessarily in the sample) | Yh = β0 + β1Xh + ε | Ŷh = b0 + b1Xh |
Residual (Xi in the sample) | εi | ei = Yi - Ŷi |
Residual variance (variance of εi) | σ² | MSE = SSE/(n-2) |
Standard error of estimate (standard deviation of εi) | σ | √MSE |
To see where these formulas come from, note that the criterion

Q = Σi=1 to n (Yi - b0 - b1Xi)²

can be viewed as a function of two variables, b0 and b1. To find the values of b0 and b1 that minimize Q one differentiates the function in turn with respect to b0 and with respect to b1, obtaining
dQ/db0 = -2 Σi=1 to n (Yi - b0 - b1Xi)

dQ/db1 = -2 Σi=1 to n Xi(Yi - b0 - b1Xi)

The values b0 and b1 that minimize Q are found by setting the derivatives to zero,

-2 Σi=1 to n (Yi - b0 - b1Xi) = 0

-2 Σi=1 to n Xi(Yi - b0 - b1Xi) = 0

and solving for b0 and b1. Solving is done by simplifying and expanding these equations and rearranging the terms to produce the normal equations

Σ Yi = n b0 + b1 Σ Xi

Σ XiYi = b0 Σ Xi + b1 Σ Xi²

One can also derive the normal equations (although not demonstrate that their solution provides the values of b0 and b1 that minimize the sum of squared residuals) by multiplying through the equation Y = b0 + b1X in turn by 1 and by X, and summing the products over all observations. This observation presages the multiple regression model seen later.
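The derivation can also be checked symbolically. Here is a small sketch (Python with sympy; the symbol names are mine) that builds Q for a sample of n = 3, differentiates it with respect to b0 and b1, and solves the resulting normal equations:

```python
import sympy as sp

n = 3
b0, b1 = sp.symbols('b0 b1')
X = sp.symbols('x1:4')  # x1, x2, x3 (fixed constants)
Y = sp.symbols('y1:4')  # y1, y2, y3

# Least squares criterion Q as a function of b0 and b1
Q = sum((Y[i] - b0 - b1 * X[i]) ** 2 for i in range(n))

# Setting both partial derivatives to zero gives the normal equations
normal_eqs = [sp.Eq(sp.diff(Q, b), 0) for b in (b0, b1)]
sol = sp.solve(normal_eqs, (b0, b1))

# The solution for b1 simplifies to Σ(Xi - X̄)(Yi - Ȳ) / Σ(Xi - X̄)²
print(sp.simplify(sol[b1]))
```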
Once b0 and b1 are calculated, each observation can be decomposed as

Yi = Ŷi + ei,  i = 1, ..., n

where

Ŷi = b0 + b1Xi

is the predictor (or fitted value, or estimate) of Yi given Xi, and

ei = Yi - Ŷi

is called the residual.
The deviation of each observation Yi from the mean Ȳ can be decomposed into two parts:

Yi - Ȳ = (Ŷi - Ȳ) + (Yi - Ŷi)

total deviation = deviation explained by the regression + residual
Next take the sum of the squares of each deviation over all observations in the sample.
Sum of squares: | Σ(Yi - Ȳ)² | Σ(Ŷi - Ȳ)² | Σ(Yi - Ŷi)² |
Name: | SSTO (total sum of squares) | SSR (regression sum of squares) | SSE (error sum of squares) |
Meaning: | total variation of Y around its mean | variation of Y explained by the regression line | residual variation of Y around the regression line |
SSE is also called the residual sum of squares. The basic ANOVA result (or theorem) is that the sums of squared deviations stand in the same relation as the (unsquared) deviations, so that:
Σ(Yi - Ȳ)² = Σ(Ŷi - Ȳ)² + Σ(Yi - Ŷi)²

that is,

SSTO = SSR + SSE
This is actually a remarkable and non-obvious property that must be proven! (See optional proof in ALSM5e p. <>.) Table 3 shows the calculations of ANOVA sums of squares for the regression of % clerks (Y) on employment seasonality (X).
Table 3.

 | Sector | Xi | Yi | Ŷi | ei = Yi - Ŷi | (Yi - Ȳ)² | (Ŷi - Ȳ)² | (Yi - Ŷi)² |
1 | STRSEW | 73 | 4.8 | 2.311 | 2.489 | 9.958 | 31.856 | 6.193 |
2 | SAND | 43 | 7.6 | 7.374 | 0.226 | 0.126 | 0.338 | 0.051 |
3 | VENT | 29 | 11.7 | 9.737 | 1.963 | 14.021 | 3.173 | 3.854 |
4 | BRICK | 47 | 3.3 | 6.699 | -3.399 | 21.674 | 1.578 | 11.555 |
5 | GENCON | 43 | 5.2 | 7.374 | -2.174 | 7.593 | 0.338 | 4.727 |
6 | SHEET | 29 | 11.7 | 9.737 | 1.963 | 14.021 | 3.173 | 3.854 |
7 | PLUMB | 20 | 10.9 | 11.256 | -0.356 | 8.670 | 10.891 | 0.127 |
8 | ELEC | 13 | 12.5 | 12.437 | 0.063 | 20.652 | 20.084 | 0.004 |
9 | PAINT | 59 | 3.9 | 4.674 | -0.774 | 16.448 | 10.768 | 0.599 |
 | Total | 356 | 71.6 | | | 113.162 | 82.199 | 30.963 |
 | Mean | 39.556 | 7.956 | | | | | |
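A minimal numerical check of the decomposition, reusing the construction data in Python (variable names are mine):

```python
import numpy as np

x = np.array([73, 43, 29, 47, 43, 29, 20, 13, 59], dtype=float)
y = np.array([4.8, 7.6, 11.7, 3.3, 5.2, 11.7, 10.9, 12.5, 3.9])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x                    # fitted values Ŷi

ssto = np.sum((y - y.mean()) ** 2)    # 113.162
ssr = np.sum((yhat - y.mean()) ** 2)  # 82.199
sse = np.sum((y - yhat) ** 2)         # 30.963

# The ANOVA identity: SSTO = SSR + SSE
print(ssto, ssr + sse)                # equal up to rounding
```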
Alternative computational formulas are:

SSTO = Σ Yi² - (Σ Yi)²/n

SSR = b1² Σ(Xi - X̄)²

SSE = SSTO - SSR
Each sum of squares has an associated number of degrees of freedom (df): n - 1 for SSTO (the same divisor as in the sample variance of y), n - 2 for SSE (the number of observations minus the number of estimated parameters, b0 and b1), and 1 for SSR. Dividing a sum of squares by its df gives the corresponding mean square.

Mean squares total is simply the sample variance of Y. MSE is an estimate of the variance of the residuals σ².
Source | Sum of Squares | df | Mean Squares | F-ratio |
Regression | SSR | 1 | MSR=SSR/1 | F*=MSR/MSE |
Error | SSE | n-2 | MSE=SSE/(n-2) | |
Total | SSTO | n-1 | s²{Y}=SSTO/(n-1) | |
Table 3b shows the ANOVA table for the regression of % clerks on employment seasonality.
Source | Sum of Squares | df | Mean Squares | F-ratio |
Regression | 82.199 | 1 | 82.199 | 18.583 |
Error | 30.963 | 7 | 4.423 | |
Total | 113.162 | 8 | 14.145 |
The F-ratio is calculated as the ratio F* = MSR/MSE (here 82.199/4.423 = 18.583); the meaning of F* is discussed in Module 2.
Q - What are the meanings of the quantities 4.423 and 14.145 in Table 3b?
The ANOVA table is part of the usual regression output.
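The following few lines of Python (a sketch, with values taken from Table 3b) show how the mean squares and F* are obtained from the sums of squares and their degrees of freedom:

```python
n = 9
ssr, sse, ssto = 82.199, 30.963, 113.162

msr = ssr / 1          # MSR = 82.199
mse = sse / (n - 2)    # MSE = 4.423, the estimate of sigma^2
msto = ssto / (n - 1)  # 14.145, the sample variance of Y

f_star = msr / mse     # 18.583
print(msr, mse, msto, f_star)
```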
The coefficient of determination r² measures the proportion of the total variation in Y that is explained by the regression:

r² = (SSTO - SSE)/SSTO = SSR/SSTO = 1 - SSE/SSTO

where 0 ≤ r² ≤ 1.
Example: In the regression of % clerks on seasonality, r² can be calculated equivalently as

(113.162 - 30.963)/113.162 = 82.199/113.162 = 1 - (30.963/113.162) = 0.726

Limiting cases: r² = 1 when all the observations fall exactly on the regression line (SSE = 0), and r² = 0 when the estimated regression line is horizontal (b1 = 0, so SSR = 0 and the regression explains none of the variation in Y).
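A two-line check of the equivalent forms (plain Python, values from Table 3b):

```python
ssto, ssr, sse = 113.162, 82.199, 30.963

r2 = 1 - sse / ssto                         # 0.726
print(r2, ssr / ssto, (ssto - sse) / ssto)  # all three forms agree
```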
The correlation coefficient r is the square root of r², given the sign of b1. Since 0 ≤ r² ≤ 1,

|r| ≥ r²

so that the absolute value of r is always at least as large as r². Thus r suggests a stronger relationship and has a greater psychological impact than r². An example is the regression of % clerks on seasonality, where r² is 0.726 and the correlation coefficient r is a more impressive -.852.
Examples of the degree of association corresponding to various values of r are shown in the next exhibit.
The correlation coefficient r alone can give a misleading idea of the nature of a statistical relationship, so it is important to always look at the scatterplot of the relationship.

The standardized regression coefficient b1* is defined as

b1* = b1(sX/sY)

i.e., b1* is equal to b1 multiplied by the standard deviation of X and divided by the standard deviation of Y. Thus in the simple linear regression model the standardized regression coefficient is the same as the correlation coefficient:
b1* = (sXY/sX²)(sX/sY) = sXY/(sX sY) = r

but this is no longer true in the multiple regression model.
Conversely, the unstandardized coefficient can be recovered from the standardized one as

b1 = b1*(sY/sX)  (= r(sY/sX), in simple linear regression only)

where sX and sY are the sample standard deviations of X and Y, respectively.
In the construction industry study the regression coefficient b1 is -.169; the standard deviations of X and Y are 18.994 and 3.761, respectively. Thus the standardized coefficient of seasonality is -.169(18.994/3.761) = -.852: an increase of one SD in X is associated with a decrease of .852 SD in Y. (The standardized coefficient can also be computed automatically by the statistical program.)
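A quick Python check (same data; variable names are mine) that the standardized coefficient equals the correlation coefficient in simple regression:

```python
import numpy as np

x = np.array([73, 43, 29, 47, 43, 29, 20, 13, 59], dtype=float)
y = np.array([4.8, 7.6, 11.7, 3.3, 5.2, 11.7, 10.9, 12.5, 3.9])

sx, sy = x.std(ddof=1), y.std(ddof=1)  # 18.994, 3.761
sxy = np.cov(x, y, ddof=1)[0, 1]       # sample covariance

b1 = sxy / sx**2             # -0.169
b1_star = b1 * sx / sy       # standardized coefficient
r = np.corrcoef(x, y)[0, 1]  # correlation coefficient

print(b1_star, r)            # both -0.852
```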
Standardized coefficients are found in many statistical contexts, including multiple regression models and structural equations models. Calculating the standardized coefficient from the unstandardized coefficient, and vice-versa, is always done the same way. Suppose the unstandardized coefficient b of the regression of a variable Y on a variable X is represented as
X -- b --> Y

Then the standardized coefficient is b* = b(sX/sY) and, conversely, b = b*(sY/sX).
Example: Brody (1992: 253) reports a correlation of .57 between 6th grade IQ test score and the number of years of education that a person obtained. One can interpret this correlation as a standardized regression coefficient: b* = .57 means that an individual with a 6th grade IQ score 1 SD above the mean would be expected to obtain a number of years of education .57 SD above the mean.

The following picture shows an interpretation of b1* as a shift along the distribution of years of education caused by a positive shift of 1 SD of IQ from the mean. Standardized coefficients are especially useful in the multiple regression model, where they permit comparing the relative magnitudes of the coefficients of independent variables measured in different units (such as a variable measured in years and another measured in thousands of dollars).
Data for regression analysis come from two kinds of sources: observational studies, in which the values of X and Y are observed as they occur, and experimental studies, in which the values of X are controlled by the researcher.
Compare the following two studies with respect to the strength of causal inference.
Example of observational data: Regression of female life expectancy on literacy rate for countries
Estimated regression: Ŷ = 36.212 + .377X  (R² = .844, N = 131)
Example of experimental data: Shepard's experiment
"The data are from a perceptual experiment in which subjects viewed pairs of objects differing only by rotational angle. [...] The rt variable is reaction time (delay in saying "same" for a pair). [...] Shepard's remarkable discovery in this and other experiments was that the rotational angle is linearly related to reaction time. The February 19, 1971 cover of Science magazine displayed five of Shepard's computer-generated images under various rotations. This research has been replicated by psychologists and neuroscientists studying spatial processing in humans and other primates. Shepard received the National Medal of Science for this and other work in cognitive psychology." (Wilkinson 1999, p. 337.)
Estimated regression: RT = 1.916 + (.021)ANGLE  (R² = .949, N = 10)
>rem relationship between sex dimorphism (length ratio male to female) and
>rem mean harem size in primates, a measure of sexual competition among males
>rem ask me for the whole bizarre story
>regress
>model lengthdi=constant+meanhare
>estimate

Dep Var: LENGTHDI   N: 22   Multiple R: 0.403   Squared multiple R: 0.162
Adjusted squared multiple R: 0.120   Standard error of estimate: 0.115

Effect      Coefficient   Std Error   Std Coef   Tolerance   t        P(2 Tail)
CONSTANT    1.055         0.035       0.000      .           29.949   0.000
MEANHARE    0.014         0.007       0.403      1.000       1.967    0.063

Analysis of Variance
Source      Sum-of-Squares   df   Mean-Square   F-ratio   P
Regression  0.051            1    0.051         3.870     0.063
Residual    0.266            20   0.013
-------------------------------------------------------------------------------
*** WARNING ***
Case 17 has large leverage (Leverage = 0.487)
Case 19 is an outlier (Studentized Residual = 3.821)

Durbin-Watson D Statistic     1.949
First Order Autocorrelation  -0.026

>plot lengthdi*meanhare/stick=out smooth=linear short
>USE "Z:\mydocs\ys209\yule.syd"
SYSTAT Rectangular file
Z:\mydocs\ys209\yule.syd,
created Wed Feb 17,
1999 at 09:34:32, contains variables:
UNION$
PAUP OUTRATIO
PROPOLD POP
>model paup=constant+outratio
>estimate
Dep Var: PAUP
N: 32 Multiple R: 0.594 Squared multiple R: 0.353
Adjusted squared multiple
R: 0.331 Standard error of estimate: 13.483
Effect
Coefficient Std Error Std Coef
Tolerance t P(2 Tail)
CONSTANT
31.089 5.324
0.000 .
5.840 0.000
OUTRATIO
0.765 0.189
0.594 1.000 4.045
0.000
Analysis of Variance
Source
Sum-of-Squares df Mean-Square
F-ratio P
Regression
2973.751 1 2973.751
16.359 0.000
Residual
5453.468 30 181.782
-------------------------------------------------------------------------------
*** WARNING ***
Case
15 has large leverage (Leverage =
0.328)
Durbin-Watson D Statistic
1.853
First Order Autocorrelation
-0.018
>plot paup*outratio/stick=out smooth=linear short
. use "Z:\mydocs\S208\gss98.dta", clear
. su income
Variable
| Obs
Mean Std. Dev. Min
Max
-------------+-----------------------------------------------------
income | 2699 10.85624
2.429604 1
13
. su educ
Variable
| Obs
Mean Std. Dev. Min
Max
-------------+-----------------------------------------------------
educ | 2820 13.25071 2.927512
0 20
. regress income educ
Source | SS
df MS
Number of obs = 2688
-------------+------------------------------
F( 1, 2686) = 235.66
Model | 1269.91329 1 1269.91329
Prob > F = 0.0000
Residual
| 14474.0495 2686 5.38870049
R-squared = 0.0807
-------------+------------------------------
Adj R-squared = 0.0803
Total | 15743.9628 2687 5.85930882
Root MSE = 2.3214
------------------------------------------------------------------------------
income | Coef. Std. Err.
t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
educ | .2359977 .0153731 15.35
0.000 .2058533 .2661421
_cons | 7.71967 .2094621
36.85 0.000 7.308947
8.130393
------------------------------------------------------------------------------