STAT6128 Key Topics in Social Science: Measurement and Data

Computer Workshop 4 - Social Mobility

The data

The data we shall be using today comes from the 2006 Programme for International Student Assessment (PISA). This data is designed to be cross-nationally comparable across a wide selection of developed nations. Today we shall focus on occupations. Recall from the lectures that this is the primary outcome of interest for sociologists. However, in PISA we cannot measure social mobility itself: PISA is cross-sectional data, and therefore we do not have any information on children’s eventual outcomes. Instead, we shall investigate the relationship between parental occupation and 15-year-old children’s occupational expectations (what job they expect to have when they are 30 years old). So just for today, think of these expectations as if they were actual outcomes. (As an aside, there has been some work by sociologists and economists who claim that expectations mediate the link between social background and attainment during adulthood. So this type of analysis could in fact be quite interesting for our understanding of intergenerational mobility.)

Start Stata. Create a do file like last week (use ‘version’ to tell Stata which version you use, use the ‘cd’ command to tell Stata from where to open data files and where to save the do file, use the ‘use’ command to open the Stata dataset PISA_IM, which you first need to download from Blackboard into the folder you name behind the ‘cd’ command.)

Like last week, write the bold command lines into your do file and the italic ones into the command window.

Country code and sample size

Once you have opened your data set type

label list Country

You receive the error message ‘value label Country not found’. This means the data set does not contain any information on which value refers to which country. Since the country coding is not stored in the data set, I give it to you here:

Country code    Country name
208             Denmark
276             Germany
352             Iceland
380             Italy
410             Korea
442             Luxembourg
554             New Zealand
616             Poland
620             Portugal
792             Turkey

Type

tab Country

You see a table giving the 3-digit country code. Each number in this first column represents one country. The second column gives you the sample size per country, the third column the percentage of the sample per country.
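Because the data set carries no value label for Country, one option is to attach the names from the table above yourself. The following is only a minimal sketch on a small made-up data set: the label name Country_lbl is a hypothetical name chosen here, and with the real data you would skip the toy-data part and run only the label commands after opening PISA_IM.

```stata
* Toy data: a few of the country codes from the table (made up for illustration)
clear
input Country
208
208
410
616
end

* Define a value label holding the code-to-name mapping from the table.
* Country_lbl is a hypothetical label name, not something stored in PISA_IM.
label define Country_lbl 208 "Denmark" 276 "Germany" 352 "Iceland"  ///
    380 "Italy" 410 "Korea" 442 "Luxembourg" 554 "New Zealand"      ///
    616 "Poland" 620 "Portugal" 792 "Turkey"

* Attach the label to the variable; tab now shows names instead of codes
label values Country Country_lbl
tab Country
```

Typing tab Country, nolabel afterwards shows the numeric codes again, which is useful because commands such as if Country==410 still refer to the codes.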

Measurement of Occupation (Ganzeboom Index)

As mentioned in the lecture, there are many different ways one can “measure” (or rank) occupations. The main method PISA uses is the Ganzeboom ISEI index of social class. This is a “continuous” measure of occupational prestige, and basically ranks occupations through their impact on people’s income.

To begin, we use this as our measure of occupation. The three variables of interest are:

Father’s occupation is labelled BFMJ

Mother’s occupation is labelled BMMJ

Child’s (expected) occupation is labelled BSMJ

Let us investigate BSMJ first. To find out more about the distribution of the variable BSMJ, type:

sum BSMJ, d

Something is wrong: more than the top 10% of the data is coded at a single value (“99”).

Normally missing values in Stata are coded as “.”. As such, they would be excluded in all commands. However, the original data was coded in SPSS, where missing values were given the value 99. Because the transfer from SPSS to Stata was not done properly, the value 99 has been carried over as if it were a valid data point.

Type

label list BSMJ

You see that the values 97 and 99 of this variable are labelled as missing values.

If the SPSS data had been transferred properly into Stata format, the missing values would be coded ‘.’

We will do that now ourselves.

Type

gen bsmj=BSMJ

(you generate a variable that has exactly the same values as your original BSMJ variable)

replace bsmj=. if BSMJ>96

Now type

sum bsmj, d

Compare this with the sum command beforehand. You see that if missing values are properly coded in Stata (with a ‘.’) then Stata does not show them.

Sometimes you might want to see them though. In this case you can type

tab bsmj, m

The option m (short for missing) tells Stata you want to see the missing values. You see that 17% of values are missing for children’s expected occupation.
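The recoding steps above can be tried out on a tiny made-up data set before running them on the real data. This is only a sketch: the five values are invented for illustration, and mvdecode is shown as one possible alternative to gen and replace, not as the method the workshop requires.

```stata
* Toy data standing in for BSMJ (values are made up)
clear
input BSMJ
30
55
97
99
71
end

* The workshop's approach: copy the variable, then blank out codes above 96
gen bsmj = BSMJ
replace bsmj = . if BSMJ > 96

* An alternative: mvdecode turns the listed codes into "." in place
gen bsmj_alt = BSMJ
mvdecode bsmj_alt, mv(97 99)

* Both versions should now hide the same observations from sum
* and show them as "." in tab with the m option
count if missing(bsmj)
tab bsmj, m
```

Working on a copy (bsmj) rather than overwriting BSMJ keeps the original codes available in case you later need to tell 97 and 99 apart.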

The variables BFMJ and BMMJ also use the values 97 and 99 for missing values. Please try independently to create variables bfmj and bmmj that have the missing values coded properly as ‘.’. The solution is given below.

gen bfmj=BFMJ

replace bfmj=. if BFMJ>96

gen bmmj=BMMJ

replace bmmj=. if BMMJ>96

We now want to see how children’s expected occupation is associated with their parents’ occupation. As our measures are “continuous”, we shall use OLS regression.

Firstly, we need to take into account PISA’s complex sampling design. We covered this last week. The PISA survey design uses clustered sampling: first schools are selected and then students within schools. Clustering increases the standard error. We therefore need to tell Stata to take clustering into account.

Type:

svyset SCHOOLID [pw=W_FSTUWT]

This has set up the complex survey design. Now let us perform a regression, relating fathers’ occupation to the child’s expectation. We will estimate this model using all observations from all countries. Type:

svy: regress bsmj bfmj i.Country

The prefix i. before the variable Country indicates that this is a categorical variable. In this case, we have 10 countries (10 categories) in the variable Country. Hence Stata will create 9 dummy variables.

You should get something like the following output:

The table shows you that there are 788 schools in your data (Number of PSUs) and that the total sample size is 37,560 students.

Now interpret this table. Which country is the reference country? (Tip: look at the table with the country codes given beforehand)

The coefficient of interest is the one associated with bfmj. It is positive and statistically significant. This suggests that a 1 point increase in the father’s Ganzeboom index is associated with a 0.234 point increase in the child’s Ganzeboom index.

Remember, last week we talked in the lecture briefly about how to interpret regression results. The Ganzeboom index lacks a natural metric (scale). How could we give some more meaning to our results here? We could express the change in the Ganzeboom index in terms of standard deviations.

Find the standard deviations of bfmj and bsmj by typing:

svy: mean bfmj

estat sd

svy:mean bsmj

estat sd

You will receive the following results:

          Mean     Standard deviation
bfmj      42.73    15.86
bsmj      60.59    16.81

Question:

If the father’s Ganzeboom index increases by one standard deviation, by how many standard deviations will the child’s index increase? You know that a 1 point increase in the father’s index is associated with a 0.234 point increase in the child’s index.

0.234*15.86=3.71

Hence if the father’s index increases by one standard deviation (15.86 points), the child’s index increases by 3.71 points. We can express the 3.71 points in standard deviations of the child’s index:

3.71/16.81=0.22

Result:

If the father’s Ganzeboom index increases by one standard deviation, the child’s index increases by about 0.22 standard deviations.
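The same calculation can be done in Stata with the display command. This is just a sketch of the arithmetic: the scalar names b, sd_father and sd_child are hypothetical names chosen here, and the numbers are the coefficient and standard deviations quoted above.

```stata
* Coefficient on bfmj and the two standard deviations reported above
scalar b = 0.234
scalar sd_father = 15.86
scalar sd_child = 16.81

* Change in the child's index (in points) for a one-SD rise in the father's index
display %5.2f b*sd_father

* The same change expressed in standard deviations of the child's index
display %5.2f b*sd_father/sd_child
```

The second line of output is the standardised coefficient, that is, the regression coefficient multiplied by the ratio of the two standard deviations.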

In conclusion our regression results show that from an intergenerational mobility perspective, we can say that children of fathers with higher ranking occupations enter (or at least “expect to enter”) better jobs.

How does this vary across developed nations? To get a rough idea (and only this time ignoring the complex sampling design), type:

bysort Country: regress bsmj bfmj

To run the same regressions while taking the complex sampling design into account, type:

tab Country,gen(C)

forval i=1(1)10{

svy, subpop(C`i'): regress bsmj bfmj

}

This generates one dummy variable for each country (named C1-C10), then uses a loop to execute a svy: regress command for each of these countries in turn.
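To see how the dummy variables and the loop fit together, here is a minimal sketch on simulated data. Everything in it is made up for illustration: the variables school, country, w, x and y, the two countries, and the true slope of 0.5 are assumptions and have nothing to do with the PISA results.

```stata
* Simulated data: 20 schools of 10 students each, split over 2 countries
clear
set seed 1
set obs 200
gen school = ceil(_n/10)
gen country = cond(school <= 10, 1, 2)
gen w = 1                      // equal weights, purely for illustration
gen x = rnormal()
gen y = 0.5*x + rnormal()

* Declare the design: schools are the clusters, w is the sampling weight
svyset school [pw=w]

* One dummy per country (C1, C2), then one survey regression per country
tab country, gen(C)
forvalues i = 1/2 {
    svy, subpop(C`i'): regress y x
}
```

A common reason for using subpop() rather than an if condition is that it keeps the full survey design available when the standard errors are computed, while restricting the estimates to the chosen country.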

This has reproduced the analysis for each individual country. Notice the relationship is weakest in Turkey (country 792) and Korea (country 410). It seems that the jobs children “expect” to enter in these countries are not strongly associated with their father’s occupation. On the other hand, in Poland (country 616) the relationship is particularly strong.

Alternative measure of occupation

Perhaps in this case another way of measuring occupation may also be suitable.

The PISA dataset contains an alternative measure of occupation: 4-digit ISCO codes. This is the ILO classification of occupations. For details, look at the following webpage:

http://www.ilo.org/public/english/bureau/stat/isco/index.htm

This data is very interesting because of its detail. Occupations are classified into over 300 categories. However, for today we will convert this into a binary measurement (“Professional” and “Non-Professional” jobs). In other words, we will examine the relationship between whether a child is expecting to enter a professional job and whether the child’s parents have a professional job. (We could go further by using logistic regression to investigate this relationship. We will examine logistic regression in a later workshop.)

Let us start with this conversion. Create a variable called Student_Pro, which has the value 1 if the variable Student_Occ_ICSO is below 3000 (that means the student aims to become a “Professional”) and the value 0 if Student_Occ_ICSO is 3000 or above. In addition, give the newly created variable Student_Pro a missing value ‘.’ if the value of Student_Occ_ICSO is 9999. First, try to create this variable Student_Pro yourself. If you do not manage, the code is given below.

gen Student_Pro=.

replace Student_Pro=0 if Student_Occ_ICSO>2999

replace Student_Pro=1 if Student_Occ_ICSO<3000

replace Student_Pro=. if Student_Occ_ICSO==9999
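One thing to keep in mind with this kind of recoding is that Stata treats a missing value ‘.’ as larger than any number, so a condition such as >2999 is also true for missing values. The sketch below shows this on four made-up observations. It is an illustration only: whether the real Student_Occ_ICSO contains any ‘.’ values is not stated here, and Student_Pro2 is a hypothetical variable name.

```stata
* Toy data: a professional code, a non-professional code, the 9999 code, and "."
clear
input Student_Occ_ICSO
2221
5122
9999
.
end

* The workshop's four steps
gen Student_Pro = .
replace Student_Pro = 0 if Student_Occ_ICSO > 2999
replace Student_Pro = 1 if Student_Occ_ICSO < 3000
replace Student_Pro = . if Student_Occ_ICSO == 9999

* A one-line variant that also leaves "." as missing:
* the logical expression gives 1/0, and the if clause skips 9999 and "."
gen Student_Pro2 = Student_Occ_ICSO < 3000 ///
    if Student_Occ_ICSO != 9999 & !missing(Student_Occ_ICSO)

list
```

In the listing, the last observation is coded 0 by the four-step version but stays missing in the one-line version. If the raw variable has no ‘.’ values, the two versions agree.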

Now create the variable Father_Pro and Mother_Pro using the same specification:

gen Father_Pro=.

replace Father_Pro=0 if Father_Occ_ICSO>2999

replace Father_Pro=1 if Father_Occ_ICSO<3000

replace Father_Pro=. if Father_Occ_ICSO==9999

gen Mother_Pro=.

replace Mother_Pro=0 if Mother_Occ_ICSO>2999

replace Mother_Pro=1 if Mother_Occ_ICSO<3000

replace Mother_Pro=. if Mother_Occ_ICSO==9999

Now type the following:

svy:tabulate Father_Pro Student_Pro , row

svy:tabulate Mother_Pro Student_Pro , row

What do these results show?

Up to now, we have looked at all countries together. Now let’s examine Poland and Korea separately.

Start with Korea. Type:

svy:tabulate Father_Pro Student_Pro  if Country==410, row

svy:tabulate Mother_Pro Student_Pro  if Country==410, row

Then do the same for Poland (code 616).

What results do you find? Compare the tables.

Measurement Error

We shall finish this part of the workshop by briefly considering the role of measurement error. Firstly, recall from the lectures that children act as proxy respondents for their parents. That is, it is children who report their parents’ education and occupation (not the parents themselves). Children may not always report this correctly.

For this set of countries, however, data has been collected from both the parent and the child (note this was not done for all countries, and was not done in the PISA 2000 or 2003 waves). We can therefore investigate how well children report their parents’ occupation. In particular,

Parent_Report_Father_Occ_ICSO is fathers’ reports of their own occupation

Parent_Report_Father_Pro is fathers’ reports about whether they are a professional

Parent_Report_Mother_Occ_ICSO is mothers’ reports of their own occupation

Parent_Report_Mother_Pro is mothers’ reports about whether they are a professional

Let’s consider whether children can accurately report if their mother or father is a professional. Type (ALL ON ONE LINE):

tab Parent_Report_Father_Pro Father_Pro if Parent_Report_Father_Pro!=. & Father_Pro!=., col

Look at the main diagonal (top left to bottom right). If there were no measurement error, all observations would be in these cells. Instead, we can see some misclassification: children report their father to be a professional when he is not (and vice versa). This is of course assuming that parents accurately report their own occupation …
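One way to summarise the main diagonal in a single number is the share of cases where the child’s report and the parent’s report agree. The following is a rough sketch on six made-up observations: the values are invented, and agree is a hypothetical variable name.

```stata
* Toy data: father's own report and the child's report (values are made up)
clear
input Parent_Report_Father_Pro Father_Pro
1 1
1 0
0 0
0 0
0 1
1 1
end

* agree = 1 when both reports match, 0 when they differ,
* and missing when either report is missing
gen agree = Parent_Report_Father_Pro == Father_Pro ///
    if !missing(Parent_Report_Father_Pro, Father_Pro)

* The mean of agree is the proportion of observations on the main diagonal
summarize agree

* The cross-tabulation shows where the disagreements fall
tab Parent_Report_Father_Pro Father_Pro, col
```

Here four of the six pairs agree, so the mean of agree is about 0.67. With no measurement error it would be 1.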
