Exhibit: Probability density, cumulative, and inverse cumulative functions [a1003.gif]
You can find calculators on the web at:
The number of decimals needs to be set only once before a series of calculations.
SYSTAT provides cumulative, density, inverse, and random generator functions for the 13 distributions listed in the table below. The functions have systematic three-letter names ending in the suffix -CF, -DF, -IF, or -RN, according to the type of function.
Exhibit: Table of SYSTAT distribution functions (SYSTAT 6.0 DATA p. 143)
The statistical functions in STATA are neither as systematically labelled nor as numerous as those in SYSTAT. Existing STATA functions (V. 6) are shown in the following table, arranged in the same order as the one above for SYSTAT.
Exhibit: Table of STATA distribution functions (STATA V7.0)
EX: calculate the 2-sided p-value of
the slope of a simple regression model (i.e., with 1 independent
variable plus a constant) with t* = 1.79 and n=27. Use the Student
t distribution with n-2 = 25 df:
calc 2*(1-tcf(1.79,25))
0.085575
. display 2*(ttail(25,1.79))
.08557458
This calculates the 2-sided p-value as twice the area under the curve above 1.79.
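For readers without SYSTAT or STATA, the same tail area can be approximated in Python using only the standard library. This is a minimal sketch, not either package's algorithm: the helper names `t_pdf` and `t_upper_tail` are made up for illustration, and the tail area is obtained by integrating the Student t density numerically with Simpson's rule.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: one half minus the area between 0 and t.

    The area is computed with Simpson's rule; steps must be even.
    """
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Two-sided p-value: twice the area above t* = 1.79 with 25 df.
p_two_sided = 2 * t_upper_tail(1.79, 25)
print(round(p_two_sided, 6))  # compare with the SYSTAT and STATA output above
```

The helper plays the role of STATA's ttail(df,t); SYSTAT's tcf corresponds to one minus this value.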
EX: calculate the 2-sided p-value of
a regression coefficient in a multiple regression model with p-1 = 5 independent
variables plus a constant (so that p=6), with t* = 1.79 and n=27.
Use the Student t distribution with n-p = 27-6 = 21 df:
calc 2*(1-tcf(1.79,21))
0.087885
. display 2*(ttail(21,1.79))
.08788488
Note this is slightly larger than in the previous example because of fewer df (21 versus 25).
EX: calculate the 2-sided p-value of
a regression coefficient in a regression model (simple or multiple, it
doesn't matter) when n is large (say n>= 100), with t* = 1.79. Use
the standard normal distribution:
calc 2*(1-zcf(1.79))
0.073454
. display 2*(1-norm(1.79))
.07345391
Note that you can use the t distribution tcf or ttail with any n. The tcf result will automatically converge to the zcf result when n becomes large.
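The convergence of the t result to the normal result can be checked numerically. The sketch below is an illustration in Python (standard library only), not part of SYSTAT or STATA; `t_upper_tail` is a hypothetical helper that integrates the t density with Simpson's rule, and the list of df values is arbitrary.

```python
import math
from statistics import NormalDist

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, by Simpson's rule on the t density over [0, t]."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3

# Two-sided p-value from the standard normal distribution.
p_normal = 2 * (1 - NormalDist().cdf(1.79))

# Gap between the t-based and normal-based p-values as df grows.
gaps = {}
for df in (25, 100, 1000, 10000):
    p_t = 2 * t_upper_tail(1.79, df)
    gaps[df] = p_t - p_normal
    print(df, round(p_t, 6), round(gaps[df], 6))
print("normal", round(p_normal, 6))
```

The printed gap shrinks toward zero as df increases, which is the sense in which the t result converges to the normal one.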
EX: calculate the p-value for an F test.
F* is 4.14; the F distribution has 3 and 7 df.
calc 1 - fcf(4.14,3,7)
0.055480
. display 1 - F(3,7,4.14)
.05548043
Here the result is not multiplied by 2 because the F test is one-sided.
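The same upper-tail area can be approximated in Python without a statistics package. This is a rough sketch under stated assumptions: `f_upper_tail` is a hypothetical helper, and it integrates the F density numerically (Simpson's rule after the substitution x = u**2) rather than using the incomplete beta function that statistical packages typically rely on.

```python
import math

def f_upper_tail(f, d1, d2, steps=2000):
    """P(F > f) for an F distribution with d1 and d2 df, f >= 0.

    Integrates the F density over [0, f] with Simpson's rule. The
    substitution x = u**2 keeps the integrand smooth at 0 for integer d1.
    """
    log_c = (math.lgamma((d1 + d2) / 2) - math.lgamma(d1 / 2)
             - math.lgamma(d2 / 2) + (d1 / 2) * math.log(d1 / d2))
    c = math.exp(log_c)

    def g(u):
        # F density evaluated at x = u**2, times dx/du = 2u
        return 2 * c * u ** (d1 - 1) * (1 + d1 * u * u / d2) ** (-(d1 + d2) / 2)

    b = math.sqrt(f)
    h = b / steps
    total = g(0.0) + g(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(i * h)
    return 1 - total * h / 3

# p-value for F* = 4.14 with 3 and 7 df (upper tail only, no doubling).
p_value = f_upper_tail(4.14, 3, 7)
print(round(p_value, 6))  # compare with the SYSTAT and STATA output above
```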
EX: calculate the one-sided p-value of
a regression coefficient from a multiple regression when n is large (say
above 100), so one can assume the sampling distribution is normal,
t* = -2.033, and the research hypothesis is that the regression coefficient
is negative:
calc zcf(-2.033)
0.021026
. display norm(-2.033)
.02102626
Here the result is not multiplied by 2, because one wants the one-sided p-value. Note also that (unlike normal tables) the zcf (SYSTAT) or norm (STATA) functions return the probability for negative values of the sample statistic.
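As a cross-check outside SYSTAT and STATA, Python's standard library class `statistics.NormalDist` also accepts negative arguments directly. The sketch below compares the lower-tail area with the value a printed table of positive z values would give by symmetry.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, sd 1

# One-sided p-value: area below t* = -2.033.
p_lower = z.cdf(-2.033)

# With a table of positive z values only, one would use symmetry instead.
p_by_symmetry = 1 - z.cdf(2.033)

print(round(p_lower, 6), round(p_by_symmetry, 6))
```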
EX: NKNW (pp. 30-32) illustrate the calculation
of the likelihood of a sample of 3 observations given values of the mean μ,
assuming normally distributed errors. Y1 = 250, σ = 10, μ = 259.
Thus the likelihood of Y1 is
calc zdf((250-259)/10)
0.266085
. display normden((250-259)/10)
.26608525
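Note that zdf (SYSTAT) and normden (STATA) return the standard normal density of the z-score. The density of Y1 on its original scale is this value divided by the standard deviation (10), that is, 0.0266085. This factor of 1/10 is why the values given by NKNW (p. 31) and KNN (p. 28) are all shifted one decimal place to the right relative to the output above; they are not incorrect.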
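One point worth checking here is the scale of the density. The standard normal density of the z-score and the density of Y1 on its original scale differ by a factor of 1/σ. The sketch below, in Python with the standard library's `statistics.NormalDist`, computes both; the variable names are illustrative only.

```python
from statistics import NormalDist

y, mu, sigma = 250, 259, 10

# Standard normal density of the z-score (what zdf and normden return).
z_density = NormalDist().pdf((y - mu) / sigma)

# Density of Y1 itself under a normal distribution with mean mu and sd sigma.
y_density = NormalDist(mu, sigma).pdf(y)

print(round(z_density, 6), round(y_density, 7))
```

The second value equals the first divided by sigma, so with σ = 10 the two differ by exactly one decimal place.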
EX: In a simple regression model with n = 17 and α = 0.05,
what is the critical value of the t-ratio t* such that a t* greater
than this value in absolute value indicates that the regression coefficient
is significantly different from zero at the .05 level (2-tailed)? Use the inverse
t distribution with n-2 = 15 df.
calc tif(0.975,15)
2.131450
OR
calc tif(0.025,15)
-2.131450
. display invttail(15,0.025)
2.1314495
OR
. display invttail(15,0.975)
-2.1314495
Note how STATA's invttail function refers to the upper tail probability.
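The difference between the two conventions (cumulative probability in SYSTAT's tif, upper-tail probability in STATA's invttail) can be made concrete with a small Python sketch. Everything here is an assumption for illustration: `t_cdf`, `t_inv`, and `inv_t_upper` are hypothetical helpers, the CDF is computed by Simpson's rule, and the quantile is found by bisection on the interval [-50, 50], which is wide enough for this example but not for extreme probabilities with very few df.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(t, df, steps=2000):
    """P(T <= t), using Simpson's rule on [0, |t|] and symmetry."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

def t_inv(p, df, lo=-50.0, hi=50.0, tol=1e-9):
    """Quantile for cumulative probability p (same convention as tif)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def inv_t_upper(df, p):
    """Quantile for upper-tail probability p (same convention as invttail)."""
    return t_inv(1 - p, df)

print(round(t_inv(0.975, 15), 6))        # cumulative convention
print(round(inv_t_upper(15, 0.025), 6))  # upper-tail convention, same value
```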
EX: in a multiple regression analysis
a regression coefficient is 3.77 with s.e. 0.23. Calculate the 95%
CI assuming that n is large, so the sampling distribution can be assumed
normal.
calc 3.77 + zif(0.975)*0.23
4.220792
calc 3.77 - zif(0.975)*0.23
3.319208
. display 3.77 + invnorm(0.975)*0.23
4.2207917
. display 3.77 - invnorm(0.975)*0.23
3.3192083
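The same interval can be computed in one step in Python, where `NormalDist().inv_cdf` from the standard library plays the role of zif or invnorm. This is only a sketch; the variable names are illustrative.

```python
from statistics import NormalDist

b, se = 3.77, 0.23  # regression coefficient and its standard error

# Critical value leaving 2.5% in each tail of the standard normal.
z_crit = NormalDist().inv_cdf(0.975)

lower = b - z_crit * se
upper = b + z_crit * se
print(round(lower, 6), round(upper, 6))
```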
EX: same thing, but now n = 24, and there
are p-1 = 3 independent variables plus a constant term, so that p = 4.
You now need the t distribution with 24-p = 20 df
calc 3.77 + tif(0.975,20)*0.23
4.249772
calc 3.77 - tif(0.975,20)*0.23
3.290228
. display 3.77 + invttail(20,0.025)*0.23
4.2497716
. display 3.77 - invttail(20,0.025)*0.23
3.2902284
The CI is a bit wider, as one would expect. Note again that the STATA function invttail refers to areas to the right of the value of t.
EX: pick a uniformly distributed random
number between 0 and 1
calc urn(0,1)
0.179755
. display uniform()
.13698408
NOTE: You do need the parentheses in the STATA function uniform().
EX: assign yourself a random IQ score (with
mean = 100 and sd = 15)
calc zrn (100, 15)
95.609537 (ouch!)
. display 100 + 15*invnorm(uniform())
105.50621 (better, but still...)
NOTE: STATA does not have random generators other than uniform, so you have to use uniform() in combination with an inverse distribution function, as shown here.
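The trick of feeding a uniform random number into an inverse distribution function (the inverse transform method) works in any language. The sketch below mirrors the STATA expression in Python; `random_iq` is a hypothetical helper, and the seed is arbitrary and fixed only so that runs are reproducible. Python also offers `random.gauss` directly, so this is for illustration only.

```python
import random
from statistics import NormalDist

def random_iq(rng, mean=100, sd=15):
    """Draw one normal score as mean + sd * inverse_normal(uniform)."""
    u = rng.random()          # uniform on [0, 1)
    while u == 0.0:           # inv_cdf is undefined at exactly 0
        u = rng.random()
    return mean + sd * NormalDist().inv_cdf(u)

rng = random.Random(12345)    # arbitrary seed for reproducibility
draws = [random_iq(rng) for _ in range(10000)]

sample_mean = sum(draws) / len(draws)
sample_sd = (sum((x - sample_mean) ** 2 for x in draws) / (len(draws) - 1)) ** 0.5
print(round(sample_mean, 2), round(sample_sd, 2))  # should be near 100 and 15
```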
EX: flip a coin
calc nrn (1,0.5)
1.000000 (nrn(1,0.5) is the binomial function with 1 trial and p=0.5)
. display int(2*uniform())
1
. display int(2*uniform())
0
NOTE: In STATA, to generate pseudo-random
integers over the interval [a,b], use a+int((b-a+1)*uniform()), as
shown here for a=0, b=1.
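The formula a+int((b-a+1)*uniform()) can be checked with a short Python sketch. `rand_int` is a hypothetical helper that copies the formula literally, and the seed is arbitrary; in practice Python's own `random.randint(a, b)` does the same job.

```python
import random

def rand_int(a, b, rng):
    """Random integer in [a, b], inclusive: a + int((b - a + 1) * U)."""
    # rng.random() is uniform on [0, 1), so (b - a + 1) * U lies in
    # [0, b - a + 1) and truncation yields 0, 1, ..., b - a.
    return a + int((b - a + 1) * rng.random())

rng = random.Random(2024)  # arbitrary seed for reproducibility

flips = [rand_int(0, 1, rng) for _ in range(1000)]  # coin flips, a=0, b=1
rolls = [rand_int(1, 6, rng) for _ in range(1000)]  # die rolls, a=1, b=6

print(sum(flips) / len(flips))  # proportion of 1s, should be near 0.5
print(sorted(set(rolls)))       # the distinct faces that appeared
```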