University of North Carolina at Chapel Hill
SOCI 208 - STATISTICS FOR SOCIOLOGISTS - Fall 2002
Professor François Nielsen
Assignment 1 - Released Tue 3 Sep
DUE Thu 19 Sep
MATERIALS
The problems in this assignment cover Modules 1 to 4.
PROBLEMS TO DO BY HAND
Do the problems from Chapter 3 by hand with a calculator, or using a spreadsheet,
to get a feel for how these statistics are calculated.
From Neter, Wasserman, and Whitmore (NWW):
1. 3.2 p. 98 (mean)
2. 3.10 p. 99 (trimmed mean; when NWW say "80 percent trimmed
mean" they mean the mean of the 80 percent of observations that remain after
trimming 20 percent. Other authors' usage may differ.)
3. 3.12 p. 99 (median)
4. 3.24 p. 100 (percentile; be sure to look first at how they
do this in Figure 3.4a)
5. 3.25 p. 100 (range and IQR)
6. 3.32 p. 101 (standard deviation)
7. 3.35 p. 101 (coefficient of variation)
8. 3.44 p. 102 (box plot by hand; do a box plot following
the conventions shown in class and in the notes, defining inner and outer
"fences" for outliers, NOT the simplified conventions used in NWW, who
do not use fences)
9. 4.5 p. 134 (sample space) Hint: construct a table similar to
the one shown in Figure 4.4 p. 113.
10. 4.6 p. 134 (events, complements, etc.)
11. 4.15 p. 135 (intersection, union, union of complements)
12. 4.20 p. 135 (objective and subjective probability)
13. 4.23 p. 136 (probability and odds)
14. 4.34 p. 138 (joint and conditional probability distribution; reconstructing
joint probability distribution from partial information)
15. 4.37 p. 138 (multiplication, complementation, addition theorems)
16. 4.51 p. 140 (dependence of two variables)
17. 4.55 p. 141 (inequalities involving probabilities; this one is
a bit more abstract; do your best)
18. 4.59 p. 141 Optional (two events cannot be both mutually exclusive
and independent)
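To check hand calculations for problems 2 and 8, the two conventions noted in the list above can be sketched in Python. This is only an illustration under stated assumptions: the function names and the sample numbers are made up, the trimming is assumed to be split equally between the two tails, and the fences are assumed to lie 1.5 and 3 times the IQR beyond the quartiles. The quartile rule in Python's statistics module may differ from the one shown in class, so follow the class notes where they disagree.

```python
import statistics


def nww_trimmed_mean(values, percent_kept):
    """Mean of the central percent_kept percent of the sorted values.

    Assumption: the trimmed share is split equally between the two tails,
    and the count dropped from each tail is rounded down.
    """
    data = sorted(values)
    dropped_per_tail = len(data) * (100 - percent_kept) // 200
    kept = data[dropped_per_tail:len(data) - dropped_per_tail]
    return statistics.mean(kept)


def box_plot_fences(values):
    """Inner and outer fences, assuming the 1.5 x IQR and 3 x IQR rule."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return {
        "inner": (q1 - 1.5 * iqr, q3 + 1.5 * iqr),
        "outer": (q1 - 3 * iqr, q3 + 3 * iqr),
    }


def outlier_kind(x, fences):
    """Return 'extreme' beyond the outer fences, 'mild' beyond the inner ones."""
    low_outer, high_outer = fences["outer"]
    low_inner, high_inner = fences["inner"]
    if x < low_outer or x > high_outer:
        return "extreme"
    if x < low_inner or x > high_inner:
        return "mild"
    return "none"


# Made-up data, not taken from NWW.
trim_example = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
box_example = [2, 4, 5, 6, 7, 8, 9, 10, 11, 13, 40]

trimmed = nww_trimmed_mean(trim_example, 80)  # drops 1 and 100 before averaging
fences = box_plot_fences(box_example)
print(trimmed, fences, outlier_kind(40, fences))
```

With ten observations, an "80 percent trimmed mean" in the NWW sense drops one value from each end and averages the remaining eight.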
BY COMPUTER
19. To do problem 19 use the file world209.syd or world209.dta.
Refer to the WORLD HANDBOOK list of variables (on the web in Data) as necessary
to find the definitions of variables.
19.1 For each of the following variables find the mean, the standard
deviation, the variance, the minimum, the maximum, the range, and the coefficient
of skewness.
- v156 Gini: income inequality (The Gini coefficient is a measure of
inequality of a distribution that varies between 0, perfect equality,
and 1, absolute inequality. Values of Gini in the World Handbook
are expressed as percentages, so that 41.7, for example, represents a Gini
of 0.417.)
- v189 ?
- v195 Life expectancy, females
- v111 GNP/Capita
- l10v111 Log 10 GNP/Capita
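The two transformations mentioned in this list can be checked with a few lines of Python. This is a small sketch, not part of the assignment: the function names are made up, and it assumes that l10v111 is simply the base-10 logarithm of v111 rounded to three decimals, which is consistent with the Appendix 2 listing.

```python
import math


def gini_as_proportion(gini_percent):
    """Convert a Gini value given in percent (e.g. 41.7) to a proportion."""
    return gini_percent / 100


def log10_gnp(gnp_per_capita):
    """Base-10 log of GNP per capita, rounded to three decimals as in the listing."""
    return round(math.log10(gnp_per_capita), 3)


# Values for the USA as listed in Appendix 2: v156 = 41.7, v111 = 7120.
usa_gini = gini_as_proportion(41.7)
usa_log_gnp = log10_gnp(7120)

# A tenfold difference in GNP per capita is a difference of 1 on the log scale.
tenfold_gap = math.log10(1000) - math.log10(100)
print(usa_gini, usa_log_gnp, tenfold_gap)
```

On the log scale, equal distances correspond to equal ratios of GNP per capita rather than equal dollar differences.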
19.2 For each of the variables, produce a stem and leaf display and
a box plot. Identify each outlier in a box plot using the listing
in Appendix 2 and write the four-letter code of the country next to the
symbol. Explain briefly why that country might be an outlier for
that variable. If you don't know why, write "I don't know".
Describe laconically the distribution of each variable:
- Using the box plot and stem and leaf plot, describe the shape of the distribution,
noting any tail and assessing its length.
- Compare the values of the sample mean and median (calculated in problem
19.1), and try to relate any difference in these quantities to the shape of
the distribution.
- Are the sign and value of the coefficient of skewness (G1) consistent with
the visual appearance of the distribution and the comparison of the mean
and median? Explain.
- Compare the distributions of v111 and l10v111. Which one is the more
"compact" looking?
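The link between skewness and the mean-median comparison can be illustrated with a short Python sketch. It assumes the plain moment-based coefficient m3 / m2^1.5; the G1 reported by the statistical package may include a small-sample adjustment, so the values need not match exactly. The function name is made up, and the data are only the v111 values of the first nine countries in Appendix 2, used for illustration and not as an answer to the problem.

```python
import statistics


def moment_skewness(values):
    """Moment-based skewness m3 / m2**1.5 (no small-sample adjustment)."""
    n = len(values)
    mean = statistics.mean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


# v111 (GNP/Capita) for records 1 to 9 of Appendix 2, illustration only.
gnp_subset = [7120, 2300, 6930, 3110, 800, 190, 720, 1110, 2000]

skew = moment_skewness(gnp_subset)
mean_gnp = statistics.mean(gnp_subset)
median_gnp = statistics.median(gnp_subset)

# A long right tail pulls the mean above the median and makes the skewness positive.
print(skew > 0, mean_gnp > median_gnp)
```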
19.3 Do a SPLOM (scatterplot matrix) of l10v111, v111, v189,
v195, v156 (in this order) with little histograms in the diagonal using
SYSTAT. Use the HALF option by clicking it in the dialog box.
Comment on the following:
- Why does the relationship between l10v111 and v111 look the way it does?
- Describe the relationship between v189 and v195 (after figuring out what
v189 is from the WORLD HANDBOOK variable list) and explain why it is not
perfect.
20. For this set of problems you will use the 1998 General Social
Survey data in the file gss98.dta.
Suppose you have had a heated discussion about whether women (being
from Venus) tend to have more friends than men (who are from Mars).
You will test your belief with the following steps.
20.1 (2 pts.) You will be using the variable
numfrend. What does this variable mean? (Try using the GSS on-line
codebook or the description of variables.) What is its level of measurement
(ratio, interval, ordinal, categorical)?
20.2 (2 pts.) To check out this variable,
do two frequencies, one including missing data, one not.
20.3 (1 pt.) Why do you think there are so many missing
data? (Try looking at the GSS codebook again.)
20.4 (1 pt.) Recode this variable to exclude the
codes that are missing data ("no number given", "many", "96+"). Do another
frequency distribution to make sure you did this correctly (comparing the
N without these values with that in the frequency you did before that included
them).
20.5 (7 pts.) Now, what are the mean, mode, first
quartile, coefficient of variation, standard deviation, and variance of
this variable? Is it skewed?
20.6 (2 pts.) What is the standardized value corresponding
to having 6 friends? What does this mean?
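As a reminder of the arithmetic, a standardized value is the distance from the mean measured in standard deviations. The sketch below uses a made-up mean and standard deviation, not the GSS results, and the function name is only for illustration.

```python
def standardized_value(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


# Hypothetical summary statistics, NOT the values from gss98.dta.
hypothetical_mean = 8.0
hypothetical_sd = 4.0

z = standardized_value(6, hypothetical_mean, hypothetical_sd)
print(z)  # negative: 6 is below the hypothetical mean
```

With these made-up numbers, 6 friends lies half a standard deviation below the mean.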
20.7 (1 pt.) Do a graph of the distribution of numfrend,
with an appropriate scale.
20.8 (7 pts.) Do a box plot of numfrend and explain
what it means.
20.9 (2 pts.) Now create a graph that shows gender
differences in the distribution of number of friends.
20.10 (4 pts.) What is the mean number of friends
for women and for men? Which group has greater variation in number of friends?
20.11 (3 pts.) You are tired of using so many numbers,
so you decide that you can better describe number of friends using 3 categories--low,
medium, high. You look at the frequencies and graphs to decide where
to make the cutoffs for the categories, which of course must be exhaustive
and mutually exclusive. You are very careful to make everything
explicit and to deal with missing data. (Be sure to show your Stata programming
for this.) Run a frequency distribution for this new variable, with
labels for the categories. Check it against the original frequency distribution.
Do you have the same number of missing observations as before? Do the categories
have the right numbers of observations in them?
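The logic of an exhaustive, mutually exclusive recode with explicit handling of missing data can be sketched in Python. This is only an illustration: your work for this problem must still be shown in Stata. The cutoffs (under 4, 4 to 10, 11 and over, with values above 75 treated as missing) are the ones used in the Appendix 1 listing; the function name and the responses are made up.

```python
from collections import Counter


def friend_category(numfrend):
    """Map a count of friends to 1 (low), 2 (medium), 3 (high) or None (missing)."""
    if numfrend is None or numfrend > 75:
        return None  # missing stays missing; codes above 75 are treated as missing
    if numfrend < 4:
        return 1
    if numfrend <= 10:
        return 2
    return 3


# Made-up responses, not GSS data; None and 98 stand in for missing-data codes.
responses = [0, 2, 3, 4, 7, 10, 11, 25, 75, 98, None]
categories = [friend_category(r) for r in responses]

# Check the recode: category counts plus missing must add up to the original N.
counts = Counter(categories)
print(counts)
```

Every response falls in exactly one branch, which is what makes the categories exhaustive and mutually exclusive, and the count of None values should equal the number of missing observations in the original variable.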
20.12 (2 pts.) Using the new variable, crosstabulate
sex and number of good friends. Be sure to show percentages
in a way that answers your question about gender differences in friends.
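To see which percentages answer a question about gender differences, it helps to compute them within each sex, so that each sex adds up to 100 percent. The Python sketch below shows the idea with made-up records; the function name and the data are hypothetical and are not GSS results.

```python
from collections import Counter


def percent_within_sex(pairs):
    """Percentage of each friend category within each sex (each sex sums to 100)."""
    cell_counts = Counter(pairs)
    sex_totals = Counter(sex for sex, _category in pairs)
    return {
        (sex, category): 100 * n / sex_totals[sex]
        for (sex, category), n in cell_counts.items()
    }


# Made-up (sex, category) records, not GSS data.
records = [
    ("female", "low"), ("female", "medium"), ("female", "medium"), ("female", "high"),
    ("male", "low"), ("male", "low"), ("male", "medium"), ("male", "high"),
]

percentages = percent_within_sex(records)
for key in sorted(percentages):
    print(key, percentages[key])
```

Comparing the same category across the two sexes (for example, the percentage "low" among women versus among men) is then a direct comparison, even when the two groups differ in size.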
20.13 (1 pt.) Using the new variable, do a graph
that you feel best shows gender differences in number of good friends.
20.14 (3 pts.) What do you conclude about gender
differences in friends? Refer back to appropriate parts of your analysis,
including graphs. (You do not need to summarize everything
-- just what is relevant to reaching your conclusion.)
Appendix 1 - STATA Commands for Problem 20
[Expressions in brackets are comments by the instructors; they are not
part of the Stata program.]
set mem 32000k
use "\\\Asnt1\Users\Sociology\sharenew\soc208\GSS1998\gss98.dta", clear
set more 1
ta numfrend
ta numfrend, missing
ta numfrend, nolabel
replace numfrend=. if numfrend>75 [assigning missing values]
ta numfrend, missing
ta numfrend
summ numfrend, detail
gr numfrend, bin(25)
gr numfrend, box
sort sex
gr numfrend, by (sex) bin(25)
ta sex, summ (numfrend)
generate frendcat=3 if numfrend>10 & numfrend!=. [creating a variable with 3 categories]
replace frendcat=1 if numfrend<4 & numfrend!=.
replace frendcat=2 if numfrend>3 & numfrend<11 & numfrend!=.
ta frendcat
ta frendcat, missing
ta sex frendcat, row col cell
sort sex
gr frendcat, by (sex) bin(3)
label define cat 1 "1-3 friends" 2 "4-10 friends" 3 "11-75 friends"
label values frendcat cat [applying defined labels to values of a variable]
ta frendcat
exit, clear
Appendix 2 - Selected variables from the WORLD HANDBOOK (world209.syd or
world209.dta)
ID V3$ V156 V189 V195 V280 V111 L10V111
(Missing values are blank, so some records list fewer than six values.)
1 USA 41.7 69 77 72 7120 3.852
2 PRTR 45.3 69 76 37 2300 3.362
3 CNDA 33.3 69 76 55 6930 3.841
4 BHMS 46.7 64 67 60 3110 3.493
5 CUBA 69 72 31 800 2.903
6 HATI 49 51 15 190 2.279
7 DMNR 49.3 57 59 26 720 2.857
8 JMCA 57.7 63 67 25 1110 3.045
9 TRNT 64 68 0 2000 3.301
10 BRBD 36.9 63 67 0 1410 3.149
11 GRND 60 66 0 390 2.591
12 MXCO 58.3 63 67 33 1050 3.021
13 GTML 30.0 48 50 13 570 2.756
14 HNDS 61.9 52 55 16 360 2.556
15 ELSL 46.5 57 60 9 460 2.663
16 NCRG 51 55 21 700 2.845
17 CRCA 44.5 62 65 23 960 2.982
18 PNMA 42.6 64 68 33 1290 3.111
19 CLMB 56.2 59 63 40 580 2.763
20 VNZL 54.5 66 66 50 2280 3.358
21 GYNA 41.9 59 63 23 510 2.708
22 SRNM 32.4 63 67 0 1370 3.137
23 ECDR 68.3 55 58 22 590 2.771
24 PERU 59.4 53 56 30 760 2.881
25 BRZL 57.4 58 61 39 1030 3.013
26 BOLV 53.0 46 48 23 360 2.556
27 PRGY 60 64 22 580 2.763
28 CHLE 50.7 61 66 49 990 2.996
29 ARGN 43.8 65 71 51 1550 3.190
30 URGY 42.8 66 72 38 1300 3.114
31 UK 33.9 73 73 63 3780 3.577
32 IRLD 69 74 23 2390 3.378
33 NTHL 44.9 71 77 28 5750 3.760
34 BLGM 68 74 28 6270 3.797
35 LXBG 67 74 0 6020 3.780
36 FRNC 51.8 69 77 45 5950 3.775
37 SWTZ 70 76 30 8410 3.925
38 SPAN 39.3 70 75 40 2750 3.439
39 PRTG 65 72 13 1570 3.196
40 FRG 39.4 68 75 35 6670 3.824
41 GDR 20.4 69 74 24 3910 3.592
42 PLND 26.4 67 75 20 2600 3.415
43 AUST 68 75 31 4870 3.688
44 HNGR 24.4 67 72 28 2150 3.332
45 CZCH 19.4 67 74 17 3610 3.558
46 ITLY 69 75 29 2810 3.449
47 MLTA 68 73 0 1390 3.143
48 ALBN 65 67 8 510 2.708
49 YGSL 34.7 65 70 13 1550 3.190
50 GRCE 38.1 68 71 37 2340 3.369
51 CYPR 31.8 70 73 18 1240 3.093
52 BLGR 21.2 69 74 24 2110 3.324
53 RMNA 67 72 25 1240 3.093
54 USSR 64 74 36 2550 3.407
55 FNLD 47.3 67 76 23 5420 3.734
56 SWDN 38.7 72 78 28 8150 3.911
57 NRWY 36.2 72 78 20 6760 3.830
58 DNMK 36.7 71 77 27 6810 3.833
59 ICLD 73 79 54 5930 3.773
60 CVRD 48 52 260 2.415
61 STPR 0 460 2.663
62 GNBS 37 40 120 2.079
63 EQGN 42 45 320 2.505
64 GMBA 39 42 0 180 2.255
65 MALI 37 40 7 90 1.954
66 SNGL 58.7 39 42 20 360 2.556
67 BNIN 46.8 39 9 130 2.114
68 MRTN 37 40 9 320 2.505
69 NGER 37 40 3 130 2.114
70 IVCT 53.4 42 45 540 2.732
71 GNEA 39 42 11 130 2.114
72 UPVL 32 31 0 110 2.041
73 LBRA 46 44 0 410 2.613
74 SRLE 61.2 42 45 8 200 2.301
75 GHNA 37 14 590 2.771
76 TOGO 32 39 8 250 2.398
77 CMRN 39 43 7 280 2.447
78 NGRA 37 37 11 340 2.531
79 GBON 64.4 25 45 0 2540 3.405
80 CAFR 33 36 8 220 2.342
81 CHAD 36.9 29 35 5 120 2.079
82 CNGO 42 45 510 2.708
83 ZAIR 42 45 21 140 2.146
84 UGND 40.1 48 52 4 230 2.362
85 KNYA 63.7 47 51 8 220 2.342
86 TNZN 50.3 40 3 170 2.230
87 BRND 40 43 0 110 2.041
88 RWND 39 43 0 100 2.000
89 SMLA 39 43 8 110 2.041
90 ETHP 37 40 5 100 2.000
91 ANGL 37 40 8 370 2.568
92 MZBQ 42 45 5 180 2.255
93 ZMBA 52.3 43 46 36 420 2.623
94 ZIMB 66.3 50 53 14 550 2.740
95 MLWI 47.0 41 44 4 130 2.114
96 SAFR 58.1 50 53 20 1270 3.104
97 LSTO 44 48 0 160 2.204
98 BTSN 57.4 42 45 0 350 2.544
99 SWAZ 0 440 2.643
100 MDGS 56.2 38 38 6 200 2.301
101 CMRS 41 44 0 200 2.301
102 MRTS 61 65 17 610 2.785
103 SYCH 62 68 0 580 2.763
104 MRCO 50.0 51 55 35 470 2.672
105 ALGR 52 55 13 870 2.940
106 TNSA 50.2 53 56 18 730 2.863
107 LBYA 26.7 51 55 5530 3.743
108 SDAN 44.6 47 50 5 270 2.431
109 IRAN 50.2 51 57 30 1660 3.220
110 TRKY 56.8 61 61 6 900 2.954
111 IRAQ 62.9 51 54 33 1250 3.097
112 EGPT 43.4 52 54 34 260 2.415
113 SYRA 55 59 34 720 2.857
114 LBNN 53.7 61 65 40 1070 3.029
115 JRDN 53 52 38 460 2.663
116 ISRL 38.4 70 74 41 3790 3.579
117 SDAR 44 47 26 4010 3.603
118 YMNS 44 46 3 200 2.301
119 YMNA 44 46 18 250 2.398
120 KWAT 66 66 44 15190 4.182
121 BHRN 0 2210 3.344
122 QTAR 10970 4.040
123 UAR 0 13600 4.134
124 OMAN 0 2300 3.362
125 AFGN 40 41 8 150 2.176
126 CHNA 60 63 380 2.580
127 MNGL 52 62 23 860 2.934
128 TWAN 28.4 70 70 930 2.968
129 HGKG 43.0 67 75 100 1760 3.246
130 KORN 59 63 450 2.653
131 KORS 36.0 63 67 46 560 2.748
132 JPAN 39.3 72 77 58 4450 3.648
133 INDA 47.8 42 41 10 140 2.146
134 BHTN 42 45 70 1.845
135 PKST 33.0 54 49 17 160 2.204
136 BNGL 34.2 36 36 4 90 1.954
137 BRMA 38.1 49 52 10 110 2.041
138 SRLK 35.3 65 67 7 190 2.279
139 MLDV 0 110 2.041
140 NPAL 42 45 1 110 2.041
141 TLND 51.0 54 59 11 350 2.544
142 KMPC 44 47
143 LAOS 39 42 5 90 1.954
144 VNM 43 46 10
145 VNMN 47
146 VNMS 34.0 39
147 MLYS 51.8 65 71 12 760 2.881
148 SNGP 65 70 100 2450 3.389
149 PHLP 49.4 57 60 15 380 2.580
150 INDS 46.3 48 48 12 220 2.342
151 AUSL 31.9 68 74 69 5700 3.756
152 PPNG 48 48 4 470 2.672
153 NZLD 35.6 69 75 68 4280 3.631
154 FIJI 42.3 20 1090 3.037
155 WSMA 61 65 0 320 2.505