soci208 - appendix 1

Appendix 1 - Using Statistical Functions

1. PRINCIPLES OF USE OF STATISTICAL FUNCTIONS

Exhibit: Probability density, cumulative, and inverse cumulative functions [a1003.gif]

You can find calculators on the web at:

F function: http://dostat.stat.sc.edu/prototype/calculators/index.php3?dist=F
T function: http://dostat.stat.sc.edu/prototype/calculators/index.php3?dist=T

2. SYSTAT and STATA

1. SYSTAT

SYSTAT provides a comprehensive set of statistical functions that can be used with a calc statement in the interactive window.
Before using calc one should make sure that the default number of decimal places is set high enough (so the result is sufficiently precise) using a format statement.
Example: to set the decimals to 6 digits and then calculate the value of the standard normal corresponding to the 97.5th percentile, use the commands
>format 6
>calc zif(0.975)
1.959964

The number of decimals needs to be set only once before a series of calculations.

SYSTAT provides cumulative, density, inverse, and random generator functions for the 13 distributions listed in the table below. The functions are named systematically with three-letter names ending in -CF, -DF, -IF, or -RN, according to the type of function.

Cumulative distribution functions (suffix -CF) compute the probability that a random value from the specified distribution is less than or equal to the given value, i.e., F(x) = P(X <= x), where X is the RV and x is a specific value.
Density functions (suffix -DF) return the height at x of the density curve of the specified distribution, i.e., the p.d.f. f(x) of X.
Inverse (cumulative) distribution functions (suffix -IF) take a specified alpha (a probability value between zero and one) and return the critical value below which lies that proportion of the specified distribution, i.e., F^-1(alpha) = {x | P(X <= x) = alpha}.
Random generator functions (suffix -RN) generate pseudo-random numbers from the specified distribution.
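
For readers without SYSTAT, the four types of function can be illustrated with a minimal Python sketch for the standard normal distribution. It uses the standard library class statistics.NormalDist; the variable names (z, x, p, height, x_back, draws) and the seed value are illustrative choices, not part of SYSTAT or STATA.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, sd 1

x = 1.0
p = z.cdf(x)                  # cumulative: P(X <= x), the role of a -CF function
height = z.pdf(x)             # density: height of the curve at x, the role of -DF
x_back = z.inv_cdf(p)         # inverse cumulative: recovers x from p, the role of -IF
draws = z.samples(3, seed=1)  # pseudo-random values, the role of -RN

print(round(p, 6), round(height, 6), round(x_back, 6))
print(draws)
```

The inverse function undoes the cumulative function, so x_back equals x up to rounding error.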

Exhibit: Table of SYSTAT distribution functions (SYSTAT 6.0 DATA p. 143)

2. STATA

In STATA the statistical functions are used with the display command.
Example: to calculate the value of the standard normal corresponding to the 97.5th percentile, use the command
.display invnorm(0.975)
1.959964

The statistical functions in STATA are neither as systematically named nor as numerous as those in SYSTAT. Existing STATA functions (V. 6) are shown in the following table, arranged in the same order as the one above for SYSTAT.

Exhibit: Table of STATA distribution functions (STATA V7.0)

3. EXAMPLES OF USE OF STATISTICAL FUNCTIONS

1. Cumulative Distribution Functions

Use the -CF distributions to obtain probabilities (i.e., p-values) associated with observed sample statistics.

EX: calculate the 2-sided p-value of the slope of a simple regression model (i.e., with 1 independent variable plus a constant) with t* = 1.79 and n=27. Use the Student t distribution with n-2 = 25 df:

calc 2*(1-tcf(1.79,25))
0.085575

. display 2*(ttail(25,1.79))
.08557458

This calculates the 2-sided p-value as twice the area under the curve above 1.79.
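
The same p-value can be cross-checked without either package. The following is a rough Python sketch, assuming only the standard library: it integrates the Student t density numerically with Simpson's rule. The helper names t_pdf and t_cdf and the number of integration steps are assumptions made for this illustration; they are not SYSTAT or STATA functions.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x), using symmetry about 0 and Simpson's rule on [0, |x|]."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area under the density between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area

# Two-sided p-value for t* = 1.79 with 25 df
p_two_sided = 2 * (1 - t_cdf(1.79, 25))
print(round(p_two_sided, 6))
```

The result should agree with the SYSTAT and STATA values above to the displayed precision.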

EX: calculate the 2-sided p-value of a regression coefficient in a multiple regression model with p-1 = 5 independent variables plus a constant (so that p=6), with t* = 1.79 and n=27. Use the Student t distribution with n-p = 27-6 = 21 df:

calc 2*(1-tcf(1.79,21))
0.087885

. display 2*(ttail(21,1.79))
.08788488

Note this is slightly larger than in the previous example because of fewer df (21 versus 25).

EX: calculate the 2-sided p-value of a regression coefficient in a regression model (simple or multiple, it doesn't matter) when n is large (say n>= 100), with t* = 1.79. Use the standard normal distribution:

calc 2*(1-zcf(1.79))
0.073454

. display 2*(1-norm(1.79))
.07345391

Note that you can use the t distribution tcf or ttail with any n. The tcf result will automatically converge to the zcf result when n becomes large.

EX: calculate the p-value for an F test. F* is 4.14; the F distribution has 3 and 7 df.

calc 1 - fcf(4.14,3,7)
0.055480

. display 1 - F(3,7,4.14)
.05548043

Here the result is not multiplied by 2 because the F test is one-sided.
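
As a cross-check, the upper-tail area of the F distribution can be approximated in plain Python. This sketch assumes integer degrees of freedom and integrates the F density with Simpson's rule after the substitution u = sqrt(x), which keeps the integrand smooth near zero. The helper name f_cdf and the step count are assumptions for this illustration only.

```python
import math

def f_cdf(x, d1, d2, steps=2000):
    """P(F <= x) for an F distribution with d1 and d2 df (integer df assumed)."""
    if x <= 0:
        return 0.0
    log_beta = (math.lgamma(d1 / 2) + math.lgamma(d2 / 2)
                - math.lgamma((d1 + d2) / 2))
    const = 2 * math.exp(d1 / 2 * math.log(d1 / d2) - log_beta)

    def g(u):
        # Density of U = sqrt(F), i.e. 2u * f(u^2)
        return const * u ** (d1 - 1) * (1 + d1 * u * u / d2) ** (-(d1 + d2) / 2)

    b = math.sqrt(x)
    h = b / steps  # steps must be even for Simpson's rule
    total = g(0.0) + g(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(i * h)
    return total * h / 3

# One-sided p-value for F* = 4.14 with 3 and 7 df
p_value = 1 - f_cdf(4.14, 3, 7)
print(round(p_value, 6))
```

Note that the argument order here (value first, then the two df) follows the SYSTAT convention, not the STATA one.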

EX: calculate the one-sided p-value of a regression coefficient from a multiple regression when n is large (say above 100), so one can assume the sampling distribution is normal, t* = -2.033, and the research hypothesis is that the regression coefficient is negative:

calc zcf(-2.033)
0.021026

. display norm(-2.033)
.02102626

Here the result is not multiplied by 2, because one wants the one-sided p-value. Note also that (unlike normal tables) the zcf (SYSTAT) or norm (STATA) functions return the probability for negative values of the sample statistic.
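
The two normal-based p-values above (two-sided for t* = 1.79, one-sided for t* = -2.033) can be reproduced with Python's standard library. This is a minimal sketch; the variable names are illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution

# Two-sided p-value: twice the area above 1.79
p_two_sided = 2 * (1 - z.cdf(1.79))

# One-sided p-value for a negative research hypothesis: area below -2.033
p_lower = z.cdf(-2.033)

print(round(p_two_sided, 6), round(p_lower, 6))
```

As with zcf and norm, the cdf method accepts negative arguments directly.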

2. Density Functions

Use the -DF density functions to calculate the probability density at x and to plot graphs of statistical functions.

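EX: NKNW (pp. 30-32) illustrate the calculation of the likelihood of a sample of 3 observations given values of the mean mu, assuming normally distributed errors. Y₁ = 250, sigma = 10, mu = 259. The standard normal density at the standardized value of Y₁ is

calc zdf((250-259)/10)
0.266085

. display normden((250-259)/10)
.26608525

Note that zdf and normden return the height of the standard normal curve at z = -0.9. The density of Y₁ on its original scale, which is the likelihood used by NKNW (p. 31) and KNN (p. 28), is this value divided by sigma = 10, i.e., 0.0266085. This is why the textbook values are one decimal place smaller than the ones shown here.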
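
The relation between the standard normal height at the standardized value and the density of Y₁ on its own scale can be checked with a short Python sketch. It assumes only the standard library; the names y, mu, sigma, std_height, and orig_height are labels chosen for this illustration.

```python
from statistics import NormalDist

y, mu, sigma = 250, 259, 10

z = (y - mu) / sigma                        # standardized value, -0.9
std_height = NormalDist().pdf(z)            # what zdf / normden compute
orig_height = NormalDist(mu, sigma).pdf(y)  # density of Y on its original scale

print(round(std_height, 6), round(orig_height, 7))
# The two heights differ by the factor 1/sigma
print(math_check := abs(orig_height - std_height / sigma) < 1e-12)
```

The density of a normal variable with standard deviation sigma is the standard normal height divided by sigma, so the two values differ by one decimal place when sigma = 10.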

3. Inverse (Cumulative) Distribution Functions

Use the -IF inverse cumulative distribution functions to calculate critical values given alpha and to construct confidence intervals.

EX: In a simple regression model with n = 17 and alpha = 0.05, what is the critical value of the t-ratio such that t* greater than this value (in absolute value) indicates that the regression coefficient is significantly different from zero at the .05 level (2-tailed)? Use the inverse t distribution with n-2 = 15 df.

calc tif(0.975,15)
2.131450
OR
calc tif(0.025,15)
-2.131450

. display invttail(15,0.025)
2.1314495
OR
. display invttail(15,0.975)
-2.1314495

Note how STATA's invttail function refers to the upper tail probability.

EX: in a multiple regression analysis a regression coefficient is 3.77 with s.e. 0.23. Calculate the 95% CI assuming that n is large, so the sampling distribution can be assumed normal.

calc 3.77 + zif(0.975)*0.23
4.220792
calc 3.77 - zif(0.975)*0.23
3.319208

. display 3.77 + invnorm(0.975)*0.23
4.2207917
. display 3.77 - invnorm(0.975)*0.23
3.3192083
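
The same large-sample confidence interval can be sketched in Python with the standard library. The variable names (b, se, z_crit, lower, upper) are illustrative.

```python
from statistics import NormalDist

b, se = 3.77, 0.23  # regression coefficient and its standard error

# Critical value leaving 2.5% in each tail of the standard normal
z_crit = NormalDist().inv_cdf(0.975)

lower = b - z_crit * se
upper = b + z_crit * se
print(round(lower, 6), round(upper, 6))
```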

EX: same thing, but now n = 24, and there are p-1 = 3 independent variables plus a constant term, so that p = 4. You now need the t distribution with n-p = 24-4 = 20 df:

calc 3.77 + tif(0.975,20)*0.23
4.249772
calc 3.77 - tif(0.975,20)*0.23
3.290228

. display 3.77 + invttail(20,0.025)*0.23
4.2497716
. display 3.77 - invttail(20,0.025)*0.23
3.2902284

The CI is a bit wider, as one would expect. Note again that the STATA function invttail refers to areas to the right of the value of t.
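
A t-based critical value can also be approximated without either package. The sketch below, which assumes only the Python standard library, integrates the t density with Simpson's rule and then inverts the cumulative function by bisection. The helper names t_pdf, t_cdf, and t_inv, the search interval, and the step counts are assumptions made for this illustration.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x), using symmetry about 0 and Simpson's rule on [0, |x|]."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_inv(p, df, lo=-50.0, hi=50.0):
    """Value x with P(T <= x) = p (lower-tail area, as in tif), by bisection."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid  # the target lies above mid
        else:
            hi = mid  # the target lies at or below mid
    return (lo + hi) / 2

b, se = 3.77, 0.23
t_crit = t_inv(0.975, 20)  # 24 - 4 = 20 df
lower, upper = b - t_crit * se, b + t_crit * se
print(round(t_crit, 6), round(lower, 6), round(upper, 6))
```

Like tif, the t_inv helper takes the area to the left of the critical value; STATA's invttail takes the area to the right.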

4. Random Generator Functions

The -RN functions generate pseudo-random numbers distributed according to the particular distribution. They are mostly useful to generate large samples of random observations to do Monte-Carlo studies, using SYSTAT's programming language. However, there may be situations when you want to get single random values.

EX: pick a uniformly distributed random number between 0 and 1

calc urn(0,1)
0.179755

. display uniform()
.13698408

NOTE: You do need the parentheses in the STATA function uniform().

EX: assign yourself a random IQ score (with mean = 100 and sd = 15)

calc zrn (100, 15)
95.609537 (ouch!)

. display 100 + 15*invnorm(uniform())
105.50621 (better, but still...)

NOTE: STATA does not have random generators other than uniform, so you have to use uniform() in combination with an inverse distribution function, as shown here.

EX: flip a coin

calc nrn (1,0.5)
1.000000
(nrn(1,0.5) is the binomial function with 1 trial and p=0.5)

. display int(2*uniform())
1
. display int(2*uniform())
0

NOTE: In STATA, to generate pseudo-random integers over the interval [a,b], use a+int((b-a+1)*uniform()), as shown here for a=0, b=1.
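
The random-number recipes in this section can be mirrored in Python with the standard library. This is a sketch only: the seed value 208, the variable names, and the die example with a=1, b=6 are illustrative choices that do not appear in SYSTAT or STATA.

```python
import random
from statistics import NormalDist

rng = random.Random(208)  # fixed, arbitrary seed so the run is repeatable

u = rng.random()         # uniform on [0, 1), like urn(0,1) or uniform()
iq = rng.gauss(100, 15)  # normal with mean 100 and sd 15, like zrn(100,15)

# Inverse-CDF method, as in the STATA example: feed a uniform value to the
# inverse normal (inv_cdf rejects exactly 0, which random() can rarely return)
iq_inv = 100 + 15 * NormalDist().inv_cdf(rng.random())

coin = int(2 * rng.random())  # 0 or 1 with equal probability

# Random integer on [a, b] using a + int((b - a + 1) * uniform)
a, b = 1, 6  # hypothetical example: a six-sided die
die = a + int((b - a + 1) * rng.random())

print(u, iq, iq_inv, coin, die)
```

Because the generators are pseudo-random, fixing the seed reproduces the same sequence on every run; omit the seed to get different values each time.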

Last modified 3 Feb 2004