University of North Carolina at Chapel Hill

SOCI 208 - STATISTICS FOR SOCIOLOGISTS - Fall 2002

Professor François Nielsen

Assignment 1 - Released Tue 3 Sep
DUE Thu 19 Sep

MATERIALS

The problems in this assignment cover Modules 1 to 4.

PROBLEMS TO DO BY HAND

Do the problems from Chapter 3 by hand with a calculator, or using a spreadsheet, to get a feel for how these statistics are calculated.

From Neter, Wasserman, and Whitmore (NWW):

1.  3.2 p. 98 (mean)

2.  3.10 p. 99 (trimmed mean; when NWW say "80 percent trimmed mean" they mean the mean of the remaining 80 percent of the observations, after trimming 20 percent.  Usage of other authors may differ.)

3.  3.12 p. 99 (median)

4.  3.24 p. 100 (percentile; be sure to look first at how they do this in Figure 3.4a)

5.  3.25 p. 100 (range and IQR)

6.  3.32 p. 101 (standard deviation)

7.  3.35 p. 101 (coefficient of variation)

8.  3.44 p. 102 (box plot by hand)  Do a box plot following the conventions shown in class (and in the notes), defining inner and outer "fences" for outliers, NOT the simplified conventions used in NWW, who do not use fences.

9. 4.5 p. 134 (sample space)  Hint: construct a table similar to the one shown in Figure 4.4 p. 113.

10. 4.6 p. 134 (events, complements, etc.)

11. 4.15 p. 135 (intersection, union, union of complements)

12. 4.20 p. 135 (objective and subjective probability)

13. 4.23 p. 136 (probability and odds)

14. 4.34 p. 138 (joint and conditional probability distribution; reconstructing joint probability distribution from partial information)

15. 4.37 p. 138 (multiplication, complementation, addition theorems)

16. 4.51 p. 140 (dependence of two variables)

17. 4.55 p. 141 (inequalities involving probabilities; this one is a bit more abstract; do your best)

18. 4.59 p. 141 Optional (two events cannot be both mutually exclusive and independent)
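
To check a hand-drawn box plot (problem 8), the fence positions can be computed in Stata. The following is only a minimal sketch. It assumes the common convention of inner fences at 1.5 times the IQR and outer fences at 3 times the IQR beyond the quartiles; use the convention from the class notes if it differs. The data values are made up for illustration and are not taken from NWW, and Stata's rule for locating quartiles may differ slightly from the method shown in NWW.

```stata
* Hypothetical data, for illustration only (not from NWW)
clear
input x
2
4
5
7
8
9
11
12
14
40
end

* Quartiles from summarize, saved as scalars before anything overwrites r()
quietly summarize x, detail
scalar q1  = r(p25)
scalar q3  = r(p75)
scalar iqr = q3 - q1

* Assumed convention: inner fences 1.5*IQR, outer fences 3*IQR from the quartiles
scalar inlo  = q1 - 1.5*iqr
scalar inhi  = q3 + 1.5*iqr
scalar outlo = q1 - 3*iqr
scalar outhi = q3 + 3*iqr

display "Q1 = " q1 "   Q3 = " q3 "   IQR = " iqr
display "inner fences: " inlo " to " inhi
display "outer fences: " outlo " to " outhi
```

Observations between the inner and outer fences would be marked as outliers, and those beyond the outer fences as extreme outliers.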
 

BY COMPUTER

19.  For problem 19, use the file world209.syd or world209.dta.  Refer to the WORLD HANDBOOK list of variables (on the web in Data) as necessary to find the definitions of variables.

19.1  For each of the following variables find the mean, the standard deviation, the variance, the minimum, the maximum, the range, and the coefficient of skewness.
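
In Stata, the statistics requested in 19.1 can be produced in one table with tabstat. The following is a minimal sketch, assuming that world209.dta is in the current working directory and that the variable names are stored in lower case. The variable list shown is simply the set of variables in Appendix 2, used as a stand-in; substitute the variables actually assigned.

```stata
* Sketch only: the file location and the variable list are assumptions
use world209.dta, clear

* One row per variable, one column per statistic
tabstat v156 v189 v195 v280 v111 l10v111, statistics(mean sd variance min max range skewness) columns(statistics)
```

The same numbers, one variable at a time, are also available from summarize with the detail option.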

19.2  For each of the variables, produce a stem and leaf display and a box plot.  Identify each outlier in a box plot using the listing in Appendix 2 and write the four-letter code of the country next to the symbol.  Explain briefly why that country might be an outlier for that variable.  If you don't know why, write "I don't know".  Describe laconically the distribution of each variable:


19.3  Do a SPLOM (scatterplot matrix) of l10v111, v111, v189, v195, v156 (in this order) with little histograms on the diagonal using SYSTAT.  Use the HALF option by clicking it in the dialog box.  Comment on the following:

20.  For this set of problems you will use the 1998 General Social Survey data in the file gss98.dta.
Suppose you have had a heated discussion about whether women (being from Venus) tend to have more friends than men (who are from Mars).  You will test your belief with the following steps.

20.1  (2 pts.)  You will be using the variable numfrend. What does this variable mean? (Try using the GSS on-line codebook or the description of variables.)  What is its level of measurement (ratio, interval, ordinal, categorical)?

20.2  (2 pts.)  To check out this variable, do two frequencies, one including missing data, one not.

20.3  (1 pt.)  Why do you think there are so many missing data? (Try looking at the GSS codebook again.)

20.4  (1 pt.)  Recode this variable to exclude the codes that are missing data ("no number given", "many", "96+"). Do another frequency distribution to make sure you did this correctly (comparing the N without these values with that in the frequency you did before that included them).

20.5  (7 pts.)  Now, what are the mean, mode, first quartile, coefficient of variation, standard deviation, and variance of this variable? Is it skewed?

20.6  (2 pts.)  What is the standardized value corresponding to having 6 friends?  What does this mean?

20.7  (1 pt.)  Do a graph of the distribution of numfrend, with an appropriate scale.

20.8  (7 pts.)  Do a box plot of numfrend and explain what it means.

20.9  (2 pts.)  Now create a graph that shows gender differences in the distribution of number of friends.

20.10  (4 pts.)  What is the mean number of friends for women and for men? Which group has greater variation in number of friends?

20.11  (3 pts.)  You are tired of using so many numbers, so you decide that you can better describe number of friends using 3 categories--low, medium, high.  You look at the frequencies and graphs to decide where to make the cutoffs for the categories, which of course must be exhaustive and mutually exclusive.  You are very careful to make everything explicit and to deal with missing data. (Be sure to show your Stata programming for this.)  Run a frequency distribution for this new variable, with labels for the categories. Check it against the original frequency distribution. Do you have the same number of missing observations as before? Do the categories have the right numbers of observations in them?

20.12  (2 pts.)  Using the new variable, crosstabulate sex and number of good friends.  Be sure to show percentages in a way that answers your question about gender differences in friends.

20.13  (1 pt.)  Using the new variable, do a graph that you feel best shows gender differences in number of good friends.

20.14  (3 pts.)  What do you conclude about gender differences in friends?  Refer back to appropriate parts of your analysis, including graphs.  (You do not need to summarize everything -- just what is relevant to reaching your conclusion.)
 

Appendix 1 - STATA Commands for Problem 20

[Expressions in brackets are comments by the instructors; they are not part of the Stata program.]

set mem 32000k

use "\\\Asnt1\Users\Sociology\sharenew\soc208\GSS1998\gss98.dta", clear

set more 1

ta numfrend

ta numfrend, missing

ta numfrend, nolabel

replace numfrend=. if numfrend>75 [assigning missing values]

ta numfrend, missing

ta numfrend

summ numfrend, detail

gr numfrend, bin(25)

gr numfrend, box

sort sex

gr numfrend, by(sex) bin(25)

ta sex, summ(numfrend)

generate frendcat=3 if numfrend>10 & numfrend!=. [creating a variable with 3 categories]

replace frendcat=1 if numfrend<4 & numfrend!=.

replace frendcat=2 if numfrend>3 & numfrend<11 & numfrend!=.

ta frendcat

ta frendcat, missing

ta sex frendcat, row col cell

sort sex

gr frendcat, by(sex) bin(3)

label define cat 1 "0-3 friends" 2 "4-10 friends" 3 "11-75 friends"

label values frendcat cat [applying defined labels to values of a variable]

ta frendcat

exit, clear
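
The listing above does not print the coefficient of variation (20.5) or the standardized value for 6 friends (20.6) directly. Both can be computed from the results that summarize leaves behind. The following is a minimal sketch, assuming gss98.dta is in memory and numfrend has already been recoded as above; it would have to be run before the exit command. The scalar names m and s are arbitrary.

```stata
* Sketch only: assumes numfrend has been recoded so that codes above 75 are missing
quietly summarize numfrend
scalar m = r(mean)
scalar s = r(sd)

* Coefficient of variation: standard deviation relative to the mean
display "coefficient of variation = " s/m

* Standardized value (z-score) for a respondent reporting 6 friends
display "z for 6 friends = " (6-m)/s
```

The standardized value expresses how many standard deviations 6 friends lies above or below the sample mean.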
 

Appendix 2 - Selected variables from the WORLD HANDBOOK (world209.syd or world209.dta)

      ID   V3$      V156   V189   V195   V280   V111  L10V111
       1   USA      41.7     69     77     72   7120    3.852
       2   PRTR     45.3     69     76     37   2300    3.362
       3   CNDA     33.3     69     76     55   6930    3.841
       4   BHMS     46.7     64     67     60   3110    3.493
       5   CUBA              69     72     31    800    2.903
       6   HATI              49     51     15    190    2.279
       7   DMNR     49.3     57     59     26    720    2.857
       8   JMCA     57.7     63     67     25   1110    3.045
       9   TRNT              64     68      0   2000    3.301
      10   BRBD     36.9     63     67      0   1410    3.149
      11   GRND              60     66      0    390    2.591
      12   MXCO     58.3     63     67     33   1050    3.021
      13   GTML     30.0     48     50     13    570    2.756
      14   HNDS     61.9     52     55     16    360    2.556
      15   ELSL     46.5     57     60      9    460    2.663
      16   NCRG              51     55     21    700    2.845
      17   CRCA     44.5     62     65     23    960    2.982
      18   PNMA     42.6     64     68     33   1290    3.111
      19   CLMB     56.2     59     63     40    580    2.763
      20   VNZL     54.5     66     66     50   2280    3.358
      21   GYNA     41.9     59     63     23    510    2.708
      22   SRNM     32.4     63     67      0   1370    3.137
      23   ECDR     68.3     55     58     22    590    2.771
      24   PERU     59.4     53     56     30    760    2.881
      25   BRZL     57.4     58     61     39   1030    3.013
      26   BOLV     53.0     46     48     23    360    2.556
      27   PRGY              60     64     22    580    2.763
      28   CHLE     50.7     61     66     49    990    2.996
      29   ARGN     43.8     65     71     51   1550    3.190
      30   URGY     42.8     66     72     38   1300    3.114
      31   UK       33.9     73     73     63   3780    3.577
      32   IRLD              69     74     23   2390    3.378
      33   NTHL     44.9     71     77     28   5750    3.760
      34   BLGM              68     74     28   6270    3.797
      35   LXBG              67     74      0   6020    3.780
      36   FRNC     51.8     69     77     45   5950    3.775
      37   SWTZ              70     76     30   8410    3.925
      38   SPAN     39.3     70     75     40   2750    3.439
      39   PRTG              65     72     13   1570    3.196
      40   FRG      39.4     68     75     35   6670    3.824
      41   GDR      20.4     69     74     24   3910    3.592
      42   PLND     26.4     67     75     20   2600    3.415
      43   AUST              68     75     31   4870    3.688
      44   HNGR     24.4     67     72     28   2150    3.332
      45   CZCH     19.4     67     74     17   3610    3.558
      46   ITLY              69     75     29   2810    3.449
      47   MLTA              68     73      0   1390    3.143
      48   ALBN              65     67      8    510    2.708
      49   YGSL     34.7     65     70     13   1550    3.190
      50   GRCE     38.1     68     71     37   2340    3.369
      51   CYPR     31.8     70     73     18   1240    3.093
      52   BLGR     21.2     69     74     24   2110    3.324
      53   RMNA              67     72     25   1240    3.093
      54   USSR              64     74     36   2550    3.407
      55   FNLD     47.3     67     76     23   5420    3.734
      56   SWDN     38.7     72     78     28   8150    3.911
      57   NRWY     36.2     72     78     20   6760    3.830
      58   DNMK     36.7     71     77     27   6810    3.833
      59   ICLD              73     79     54   5930    3.773
      60   CVRD              48     52           260    2.415
      61   STPR                             0    460    2.663
      62   GNBS              37     40           120    2.079
      63   EQGN              42     45           320    2.505
      64   GMBA              39     42      0    180    2.255
      65   MALI              37     40      7     90    1.954
      66   SNGL     58.7     39     42     20    360    2.556
      67   BNIN     46.8     39             9    130    2.114
      68   MRTN              37     40      9    320    2.505
      69   NGER              37     40      3    130    2.114
      70   IVCT     53.4     42     45           540    2.732
      71   GNEA              39     42     11    130    2.114
      72   UPVL              32     31      0    110    2.041
      73   LBRA              46     44      0    410    2.613
      74   SRLE     61.2     42     45      8    200    2.301
      75   GHNA              37            14    590    2.771
      76   TOGO              32     39      8    250    2.398
      77   CMRN              39     43      7    280    2.447
      78   NGRA              37     37     11    340    2.531
      79   GBON     64.4     25     45      0   2540    3.405
      80   CAFR              33     36      8    220    2.342
      81   CHAD     36.9     29     35      5    120    2.079
      82   CNGO              42     45           510    2.708
      83   ZAIR              42     45     21    140    2.146
      84   UGND     40.1     48     52      4    230    2.362
      85   KNYA     63.7     47     51      8    220    2.342
      86   TNZN     50.3     40             3    170    2.230
      87   BRND              40     43      0    110    2.041
      88   RWND              39     43      0    100    2.000
      89   SMLA              39     43      8    110    2.041
      90   ETHP              37     40      5    100    2.000
      91   ANGL              37     40      8    370    2.568
      92   MZBQ              42     45      5    180    2.255
      93   ZMBA     52.3     43     46     36    420    2.623
      94   ZIMB     66.3     50     53     14    550    2.740
      95   MLWI     47.0     41     44      4    130    2.114
      96   SAFR     58.1     50     53     20   1270    3.104
      97   LSTO              44     48      0    160    2.204
      98   BTSN     57.4     42     45      0    350    2.544
      99   SWAZ                             0    440    2.643
     100   MDGS     56.2     38     38      6    200    2.301
     101   CMRS              41     44      0    200    2.301
     102   MRTS              61     65     17    610    2.785
     103   SYCH              62     68      0    580    2.763
     104   MRCO     50.0     51     55     35    470    2.672
     105   ALGR              52     55     13    870    2.940
     106   TNSA     50.2     53     56     18    730    2.863
     107   LBYA     26.7     51     55          5530    3.743
     108   SDAN     44.6     47     50      5    270    2.431
     109   IRAN     50.2     51     57     30   1660    3.220
     110   TRKY     56.8     61     61      6    900    2.954
     111   IRAQ     62.9     51     54     33   1250    3.097
     112   EGPT     43.4     52     54     34    260    2.415
     113   SYRA              55     59     34    720    2.857
     114   LBNN     53.7     61     65     40   1070    3.029
     115   JRDN              53     52     38    460    2.663
     116   ISRL     38.4     70     74     41   3790    3.579
     117   SDAR              44     47     26   4010    3.603
     118   YMNS              44     46      3    200    2.301
     119   YMNA              44     46     18    250    2.398
     120   KWAT              66     66     44  15190    4.182
     121   BHRN                             0   2210    3.344
     122   QTAR                                10970    4.040
     123   UAR                              0  13600    4.134
     124   OMAN                             0   2300    3.362
     125   AFGN              40     41      8    150    2.176
     126   CHNA              60     63           380    2.580
     127   MNGL              52     62     23    860    2.934
     128   TWAN     28.4     70     70           930    2.968
     129   HGKG     43.0     67     75    100   1760    3.246
     130   KORN              59     63           450    2.653
     131   KORS     36.0     63     67     46    560    2.748
     132   JPAN     39.3     72     77     58   4450    3.648
     133   INDA     47.8     42     41     10    140    2.146
     134   BHTN              42     45            70    1.845
     135   PKST     33.0     54     49     17    160    2.204
     136   BNGL     34.2     36     36      4     90    1.954
     137   BRMA     38.1     49     52     10    110    2.041
     138   SRLK     35.3     65     67      7    190    2.279
     139   MLDV                             0    110    2.041
     140   NPAL              42     45      1    110    2.041
     141   TLND     51.0     54     59     11    350    2.544
     142   KMPC              44     47
     143   LAOS              39     42      5     90    1.954
     144   VNM               43     46     10
     145   VNMN              47
     146   VNMS     34.0     39
     147   MLYS     51.8     65     71     12    760    2.881
     148   SNGP              65     70    100   2450    3.389
     149   PHLP     49.4     57     60     15    380    2.580
     150   INDS     46.3     48     48     12    220    2.342
     151   AUSL     31.9     68     74     69   5700    3.756
     152   PPNG              48     48      4    470    2.672
     153   NZLD     35.6     69     75     68   4280    3.631
     154   FIJI     42.3                   20   1090    3.037
     155   WSMA              61     65      0    320    2.505