PROBLEM SET 1 - EC551 (A1)
Spring 2024
Due: no later than Thursday, 2/8/2024 at 11:59pm Eastern time
The problem set must be uploaded to Blackboard in the appropriate section.
INSTRUCTIONS: Please submit your problem set as a single PDF file titled PS1_[LastName][Firstname].pdf. Example: PS1_PasermanDaniele.pdf. This is an individual problem set. You are encouraged to discuss the problems with each other, but you must turn in separate work demonstrating independent thought and investigation.
Total: 100 points (25 points each section). Considerable weight will be given to effort, even if the final answer is not 100% correct.
Please TYPE your answers to the extent possible.
1. Omitted variable bias in returns to schooling. Assume that you have individual level data on wages (y) and years of schooling (x) for a representative sample of the US population. An economist hypothesizes that individuals learn in school various skills that employers value. Therefore, the economist hypothesizes that there is a positive relationship between years of schooling and wages.
a) Draw a sketch of a scatter plot, with years of schooling on the horizontal axis and wages on the vertical axis, if the economist’s hypothesis is correct. Add to this scatter plot the line of best fit (you don’t have any data, so this will only be a sketch of what the actual data might look like).
b) Now assume that there are in the population two types: individuals with high cognitive ability (HighAbility = 1) and individuals with low cognitive ability (HighAbility = 0).
i. On average, do you expect individuals with high cognitive ability to have more years of schooling, fewer years of schooling or about the same, relative to individuals with low cognitive ability?
ii. On average, do you expect individuals with high cognitive ability to have higher wages, lower wages, or about the same, relative to individuals with low cognitive ability?
c) Sketch on a single new figure a scatter plot of years of schooling and wages for individuals with high and low cognitive ability (use separate markers or colors to indicate the two different types). The figure should reflect your answers to parts b.i and b.ii.
d) Now, assume that the true model is:
$$wage_i = \gamma_0 + \gamma_1 YrsSchooling_i + \gamma_2 HighAbility_i + v_i$$
Instead, you estimate the regression:
$$wage_i = \beta_0 + \beta_1 YrsSchooling_i + u_i$$
Will the estimate of $\beta_1$ be an unbiased estimate of $\gamma_1$, or will it be biased? If biased, is it biased upward or downward? Explain, using the scatter plots you drew in part c).
e) Can you think of any variable Z that, if omitted from a regression of wages on years of schooling, would bias the estimate of the slope coefficient downwards? How should this variable be correlated with wages and with years of schooling?
2. More on omitted variable bias. For the following two examples, assume that the true model is
$$Y_i = \gamma_0 + \gamma_1 x_i + \gamma_2 D_i + v_i$$
Using scatter plots as in question 1, explain whether the estimate of the slope coefficient in a regression of $Y$ on $x$ alone is an upward biased estimate, a downward biased estimate, or an unbiased estimate of $\gamma_1$. Make sure to state clearly your assumptions about the signs of $\gamma_1$ and $\gamma_2$ and the sign of the relationship between $x_i$ and $D_i$. [Careful: when the slope is negative, a downward biased estimate means that the estimated regression line is steeper (i.e., the slope is more negative) than the true regression line, and an upward biased estimate means that the estimated regression line is flatter (i.e., the slope is less negative, or possibly even positive) than the true regression line.]
a) $Y_i$ is the weekly hours worked by individual i, $x_i$ is the hourly wage of individual i, and $D_i$ is a dummy variable equal to 1 if individual i is highly motivated, and equal to 0 if individual i is not highly motivated.
b) $Y_i$ is the average wage growth of US-born workers in city i between 2010 and 2019, $x_i$ is the number of new immigrants who settled in city i between 2010 and 2019, and $D_i$ is a dummy variable equal to 1 if city i had above-average wage growth between 2000 and 2010, and equal to 0 if city i had below-average wage growth between 2000 and 2010.
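The direction-of-bias logic in questions 1 and 2 can also be checked numerically. The sketch below simulates the true model with made-up coefficients (the coefficients on x and D, written γ1 and γ2 here), runs the short regression of Y on x alone, and compares its slope with the standard omitted-variable-bias result, plim b1 = γ1 + γ2 · Cov(x, D) / Var(x). It assumes a standard normal x and a dummy D whose probability of equaling 1 moves linearly with x through a hypothetical parameter `link`; the function names are illustrative only and belong to no package. It is a supplement to, not a substitute for, the scatter-plot explanations the questions ask for.

```python
import math
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma = math.fsum(a) / len(a)
    mb = math.fsum(b) / len(b)
    return math.fsum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


def simulate_ovb(gamma0, gamma1, gamma2, link, n=100_000, seed=0):
    """Simulate Y = gamma0 + gamma1*x + gamma2*D + v, then regress Y on x alone.

    `link` (hypothetical) sets how the dummy D moves with x:
    P(D = 1) = 0.5 + link * x, clipped to [0, 1], so link > 0 means
    D = 1 is more common at high x and link < 0 means at low x.
    Returns (short-regression slope, slope predicted by the OVB formula).
    """
    rng = random.Random(seed)
    xs, ds, ys = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        p = min(1.0, max(0.0, 0.5 + link * x))
        d = 1.0 if rng.random() < p else 0.0
        v = rng.gauss(0.0, 1.0)  # error term, independent of x and D
        xs.append(x)
        ds.append(d)
        ys.append(gamma0 + gamma1 * x + gamma2 * d + v)
    var_x = cov(xs, xs)
    short_slope = cov(xs, ys) / var_x  # OLS slope of Y on x (with intercept)
    predicted = gamma1 + gamma2 * cov(xs, ds) / var_x  # omitted-variable-bias formula
    return short_slope, predicted


if __name__ == "__main__":
    # gamma1 > 0, gamma2 > 0, D positively related to x: slope lands above gamma1.
    print(simulate_ovb(0.0, 1.0, 2.0, link=0.2))
    # gamma1 < 0, gamma2 > 0, D negatively related to x: slope is more negative
    # (steeper), i.e. "downward biased" in the sense of the note in question 2.
    print(simulate_ovb(0.0, -1.0, 2.0, link=-0.2))
```

Changing the signs of `gamma2` and `link` one at a time is a quick way to see which combinations push the short-regression slope up or down; with `link=0`, D is unrelated to x and the slope should sit close to `gamma1`.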
3. Labor Force Participation Rates and Wages.
Go to the data section of the website of the Bureau of Labor Statistics, https://www.bls.gov/data/home.htm. Use the “One Screen” option, and select the following series:
- From the “Employment/Labor Force Statistics (CPS)” section: the civilian labor force participation rate, separately for men and women aged 25 and over, between 1979 and 2019 (you will be able to choose the date range on the second screen).
Select the options All Races, All Origins, All educational levels, Marital Status N/A, Seasonally Adjusted, and Quarterly Periodicity. (Two series in total)
- From the “Pay & Benefits/Weekly & Hourly Earnings (CPS)” section: Median usual weekly earnings - in current dollars (second quartile), separately for men and women aged 25 and over, between 1979 and 2019. Select the options All Industries, All Occupations, All Races, All Origins, All educational levels, Wage and salary workers, excluding incorporated self-employed, and Employed full time. (Two series in total).
- From the “Inflation & Prices/All Urban Consumers (Current Series)” section: the consumer price index for all urban consumers (CPI-U), U.S. city average, All Items, between 1979 and 2019, seasonally adjusted. On the second screen, choose the option “include annual averages”. (One series in total)
Download all the series (you should have a total of five series). Before starting the analysis, transform the weekly earnings into real 1999 dollars using the CPI series.
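Converting to real 1999 dollars means rescaling each nominal value by the ratio of the 1999 price level to the price level in the same period: real_t = nominal_t × CPI_1999 / CPI_t, where CPI_1999 is the 1999 annual average of the CPI. A minimal Python sketch follows. It assumes the earnings series is quarterly while the CPI is monthly, so the CPI is first averaged within each calendar quarter; the dictionary layout keyed by (year, period) and the function names are hypothetical choices, not the format of the BLS download.

```python
def quarterly_cpi(monthly_cpi):
    """Average monthly CPI levels into calendar quarters.

    monthly_cpi maps (year, month) -> index level;
    returns a dict mapping (year, quarter) -> mean level for that quarter.
    """
    buckets = {}
    for (year, month), level in monthly_cpi.items():
        quarter = (month - 1) // 3 + 1  # months 1-3 -> Q1, 4-6 -> Q2, ...
        buckets.setdefault((year, quarter), []).append(level)
    return {key: sum(levels) / len(levels) for key, levels in buckets.items()}


def to_real_dollars(nominal, cpi, base_cpi):
    """Express nominal values in base-year dollars: real_t = nominal_t * base_cpi / cpi_t."""
    return {key: value * base_cpi / cpi[key] for key, value in nominal.items()}


if __name__ == "__main__":
    # Toy numbers for illustration only; these are NOT actual BLS values.
    monthly = {(2000, 1): 100.0, (2000, 2): 101.0, (2000, 3): 102.0}
    cpi_q = quarterly_cpi(monthly)
    print(cpi_q)  # {(2000, 1): 101.0}
    weekly_earnings = {(2000, 1): 505.0}
    print(to_real_dollars(weekly_earnings, cpi_q, base_cpi=50.5))  # {(2000, 1): 252.5}
```

Averaging the CPI to quarters keeps the price index on the same frequency as the earnings data; deflating annual averages of both series would be an equally simple alternative.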
a) Using your preferred software (Excel, Stata, or any other statistical software program), create a figure depicting the time series of labor force participation rates (a separate graph for men and women). Comment briefly on the results.
b) Using your preferred software (Excel, Stata, or any other statistical software program), create a figure depicting the time series of median weekly earnings (a separate graph for men and women). Comment briefly on the results.
c) The model of labor supply we studied in class predicts that male labor force participation increases with male wages (assuming that the substitution effect dominates the income effect) and, for married men, decreases with female wages (assuming that female wages represent unearned income from the perspective of the husband). Similarly, female labor force participation should increase with female wages and, for married women, decrease with male wages (assuming that male wages represent unearned income from the perspective of the wife). In (very) broad strokes, are the patterns of male and female labor supply consistent with the predictions of the model?
d) What are some possible limitations of the analysis in part c)? [Hint: think both in terms of data limitations – how would you improve the analysis if you were not constrained by the data available on the BLS website; and in terms of econometric analysis – can you interpret the relationships as causal?]
4. Exploring heterogeneity in labor force participation rates. Using the Labor Force Statistics (CPS) “One Screen” option of the BLS website, pick one breakdown of the population of your choosing, and plot on one graph the evolution of labor force participation rates, separately for each of the groups that you have chosen. For example, you could plot on one graph the LFP rates of men aged 16-24, 25-54, 55-64, and 65+.
Alternatively, you could plot the LFP rates of women who are married, divorced, widowed, and single.
a) Explain briefly why you chose to focus on your particular breakdown, and what you expect to see.
b) Comment briefly on what you found.