UN3412
Fall 2020
Problem Set 4
Introduction to Econometrics
for both sections
1. (20p) In this problem we will work toward understanding whether government-subsidized savings accounts help people save for retirement and, if so, by how much. We’ll do this using the 401ksubs.dta dataset attached to this problem set.¹ The dataset is a cross-section of individuals and includes information on basic demographics, income and wealth, and whether each individual participates in a 401(k) plan.
(a) (4p) Start by running a naïve regression. Regress net total assets on the dummy variable indicating whether the respondent has a 401(k) account. Interpret the sign and magnitude of the coefficient. Can you give this estimate a causal interpretation? Why (not)?
(b) (4p) Now add in the dummy for eligibility for a 401(k) account and interpret the coefficient [Hint: Can you have a 401(k) account if you are not eligible?]. Does the coefficient on eligibility imply that being eligible for a 401(k) lowers savings? Why (not)? What omitted factors do you think are being picked up here?
(c) (4p) Now drop eligibility from the regression and add a set of controls: age, age squared, family size, income, income squared, the male dummy, and the marriage dummy. Interpret five of the coefficients. How does the coefficient on p401k change? Do you now think you can interpret the coefficient on p401k as causal? Why (not)?
(d) (4p) Let’s explore the possibility that the controls matter differently for men than for women. Regress net total assets on the dummy for 401(k) participation, all the controls, and the interactions of the controls with the male dummy (but do not interact p401k with the male dummy). Interpret the coefficient on p401k, and interpret the interaction with the male dummy for two of the controls. Test whether the interactions of the controls with the male dummy are jointly significant. How does this change whether you think the coefficient on p401k is causal?
(e) (4p) Finally, let’s see whether 401(k) participation affects savings differently for men and women. Run the regression from part (d), but also interact the 401(k) participation dummy with the male dummy. What does this regression imply is the effect of 401(k) participation on savings for women? For men? Test whether the effect for men equals zero. Test whether the effect for women equals zero. Test whether the effect is the same for men as for women.
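For part (a), it helps to see what OLS does with a single dummy regressor: the intercept equals the mean of the dependent variable for the group with the dummy equal to zero, and the slope equals the difference in group means. The naive coefficient is therefore just a comparison of participants with non-participants, and any other difference between those groups ends up in it. The Python sketch below checks this on made-up numbers; the names `p401k` and `nettfa` (net total assets) are assumed to match the variables in 401ksubs.dta, and the data are purely hypothetical.

```python
from statistics import mean

def ols_one_regressor(x, y):
    """Return (intercept, slope) from an OLS regression of y on a constant and x."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Hypothetical toy data (NOT from 401ksubs.dta): participation dummy and net assets.
p401k = [0, 0, 0, 1, 1]
nettfa = [10.0, 4.0, 7.0, 30.0, 20.0]

b0, b1 = ols_one_regressor(p401k, nettfa)
mean_part = mean([y for d, y in zip(p401k, nettfa) if d == 1])
mean_nonpart = mean([y for d, y in zip(p401k, nettfa) if d == 0])

# Intercept vs. non-participant mean, slope vs. difference in means: each pair agrees.
print(round(b0, 6), round(mean_nonpart, 6))
print(round(b1, 6), round(mean_part - mean_nonpart, 6))
```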
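For part (d), note that the male dummy cannot be interacted with itself (male × male is just male), so the joint test covers the interactions of male with the other six controls. The sketch below builds the regressor list for one observation. The column names (`age`, `agesq`, `fsize`, `inc`, `incsq`, `marr`, `male`) are assumptions about how 401ksubs.dta labels the controls; check them against your copy. The respondent is invented for illustration.

```python
# Assumed column names for the controls listed in part (c); verify against the data.
CONTROLS = ["age", "agesq", "fsize", "inc", "incsq", "marr", "male"]

def design_row(obs: dict) -> dict:
    """Regressors for part (d): p401k, the controls, and each control times male.

    male*male equals male, so male is not interacted with itself, and p401k is
    left uninteracted, as part (d) specifies.
    """
    row = {"const": 1.0, "p401k": obs["p401k"]}
    for c in CONTROLS:
        row[c] = obs[c]
    for c in CONTROLS:
        if c != "male":
            row[c + "_x_male"] = obs[c] * obs["male"]
    return row

# One made-up respondent, for illustration only.
example = {"p401k": 1, "age": 40, "agesq": 1600, "fsize": 3,
           "inc": 50.0, "incsq": 2500.0, "marr": 1, "male": 1}
row = design_row(example)
interaction_names = [k for k in row if k.endswith("_x_male")]
print(len(interaction_names))  # number of restrictions q in the joint test
```

Under this variable list the joint test of part (d) has q = 6 restrictions.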
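For part (e), write the participation terms as b1·p401k + b2·(p401k × male). The effect for women is b1, the effect for men is b1 + b2, and "same effect for men and women" means b2 = 0. Testing b1 + b2 = 0 needs the covariance of the two estimates, since Var(b1 + b2) = Var(b1) + Var(b2) + 2·Cov(b1, b2); in Stata, `lincom` computes such a combination and its standard error. The sketch below uses hypothetical coefficients and (co)variances, not estimates from 401ksubs.dta, and uses the large-sample 5% critical value 1.96.

```python
import math

def t_stat_sum(b1, b2, var1, var2, cov12):
    """t-statistic for H0: beta1 + beta2 = 0, given the estimates and their covariance."""
    se = math.sqrt(var1 + var2 + 2.0 * cov12)
    return (b1 + b2) / se

# Hypothetical numbers, NOT estimates from 401ksubs.dta.
b_p401k, b_p401k_male = 10.0, 4.0
v_p401k, v_p401k_male, c = 4.0, 9.0, -1.5

effect_women = b_p401k                   # p401k coefficient alone
effect_men = b_p401k + b_p401k_male      # p401k plus its interaction with male
t_women = b_p401k / math.sqrt(v_p401k)
t_men = t_stat_sum(b_p401k, b_p401k_male, v_p401k, v_p401k_male, c)
t_equal = b_p401k_male / math.sqrt(v_p401k_male)  # H0: same effect for men and women

for name, t in [("women", t_women), ("men", t_men), ("equal", t_equal)]:
    print(name, round(t, 2), "reject at 5%" if abs(t) > 1.96 else "do not reject")
```

Use whichever covariance matrix your regression reports (typically the heteroskedasticity-robust one).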
2. (10p) What are the root causes of terrorism? Poverty? Repressive political regimes? Religious or ethnic conflicts arising from heterogeneous populations?
In this problem set you will take a look at some empirical evidence on cross-country sources of terrorism. Variables in the data set, terrorism.dta, are defined in Table 1. Note that, to do this problem set, you will need to create (generate) some new variables, which are functions of the variables in terrorism.dta.
Preliminary data analysis:
(a) (2p) Produce the scatterplot of ftmpop vs. gdppc.
(b) (2p) Generate the variables lnftmpop = log(ftmpop) and lngdppc = log(gdppc). Produce the scatterplot of lnftmpop vs. lngdppc.
(c) (2p) Produce the scatterplot of lnftmpop vs. lackpf.
(d) (2p) Using the scatterplots from (a) and (b), would you suggest using the variables (i) ftmpop and gdppc or (ii) lnftmpop and lngdppc for modeling using linear regression?
(e) (2p) Using the scatterplot from (c), does the relation between lnftmpop and lackpf appear to be linear or nonlinear? If nonlinear, what sort of nonlinear curve might you want to explore (briefly explain)?
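For part (b), keep in mind that log(0) is undefined, so countries with ftmpop = 0 get a missing value for lnftmpop. The sketch below generates the log variables, plus the squared term lngdppc2 used in question 4, on a few invented rows; the country labels and values are hypothetical, and the real data come from terrorism.dta. Like Stata's `log()`, Python's `math.log` is the natural logarithm.

```python
import math

# Hypothetical rows; the real values come from terrorism.dta.
rows = [
    {"country": "A", "ftmpop": 0.5, "gdppc": 1200.0},
    {"country": "B", "ftmpop": 0.0, "gdppc": 30000.0},
    {"country": "C", "ftmpop": 2.0, "gdppc": 800.0},
]

for r in rows:
    # log(0) is undefined, so lnftmpop is left missing (None) when ftmpop == 0.
    r["lnftmpop"] = math.log(r["ftmpop"]) if r["ftmpop"] > 0 else None
    r["lngdppc"] = math.log(r["gdppc"])
    r["lngdppc2"] = r["lngdppc"] ** 2  # squared term for the quadratic in lngdppc

usable = [r for r in rows if r["lnftmpop"] is not None]
print([r["country"] for r in usable])
```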
3. (8p) Estimate the regressions in Table 2 and fill in the empty entries. You may write in the entries by hand or type them into the .doc electronic version of the table on the course Web site. Note: for these regressions, use only the countries that have nonzero values of ftmpop.
4. (24p) Use the results in Table 2 to answer the following questions. Once you have filled in the table in question 3, little further algebra is needed.
(a) (2p) Using regression (1), test the hypothesis that the coefficient on lngdppc is zero, against the alternative that it is nonzero, at the 5% significance level. Explain in words what the coefficient means.
(b) (2p) Using regression (3), test the hypothesis that the coefficients on lngdppc and lngdppc2 are both zero, against the alternative that one or the other coefficient is nonzero, at the 5% significance level.
(c) (2p) Explain why the conclusions in (a) and (b) differ.
(d) (2p) Using regression (3), is there evidence that the relationship between lnftmpop and lngdppc is nonlinear?
(e) (2p) Using regression (3), is there evidence that the relationship between lnftmpop and lackpf is nonlinear?
(f) (2p) Using regression (5), test the null hypothesis (at the 5% significance level) that the coefficients on the “other regional dummies” are all zero, against the alternative hypothesis that at least one is nonzero. What is the number of restrictions q in your test? What is the critical value of your test?
(g) (2p) Using regression (4), discuss the evidence that ethnic diversity is associated with increases in terrorism, holding constant GDP per capita, religious diversity, and a measure of political freedoms. Interpret the sign of the coefficient on ethnic.
(h) (2p) Using regression (4), discuss the evidence that religious diversity is associated with increases in terrorism, holding constant GDP per capita, ethnic diversity, and a measure of political freedoms. Interpret the sign of the coefficient on religious.
(i) (2p) Using regression (4), test the hypothesis that the population coefficients on ethnic and religion are both zero, against the alternative that one or the other coefficient is nonzero. Explain in words what hypothesis you have tested, and what your conclusion is.
(j) (2p) Can you use regressions (3) and (4) to test the same hypothesis as in (i), this time using the R² formula for the homoskedasticity-only F statistic? Explain.
(k) (2p) Using regression (4), estimate the effect on lnftmpop of changing from lackpf = 7 (extremely limited political freedoms) to lackpf = 5 (some political freedoms), holding constant the values of the other regressors in regression (4).
(l) (1p) Using regression (4), at approximately what value of lackpf is this relationship maximized?
The quadratic is maximized at lackpf = -1.466/(2 × (-0.170)) ≈ 4.31.
(m) (1p) In words, briefly describe the relationship you found in part (l).
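For part (f), the F statistic for q restrictions is, in large samples, compared with the F(q, ∞) distribution, whose 5% critical value is the 95th percentile of a chi-squared(q) variable divided by q. The number of "other regional dummies" is not stated here, so q has to be read off Table 2. The sketch below computes the critical value with only the standard library, using the chi-squared upper-tail recurrence Q(x | ν + 2) = Q(x | ν) + (x/2)^(ν/2) e^(-x/2) / Γ(ν/2 + 1) and bisection; it is an illustration, not a replacement for the F table or Stata's own p-values.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for X ~ chi-squared(df), df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        sf, nu = 0.0, 0                            # start from degenerate chi-squared(0)
    else:
        sf, nu = math.erfc(math.sqrt(half)), 1     # chi-squared(1) tail
    while nu < df:
        # Q(x | nu + 2) = Q(x | nu) + (x/2)^(nu/2) e^(-x/2) / Gamma(nu/2 + 1)
        sf += math.exp((nu / 2.0) * math.log(half) - half - math.lgamma(nu / 2.0 + 1.0))
        nu += 2
    return sf

def f_critical_large_n(q: int, alpha: float = 0.05) -> float:
    """Critical value of F(q, inf): the (1 - alpha) chi-squared(q) quantile divided by q."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, q) > alpha:      # widen the bracket until the tail drops below alpha
        hi *= 2.0
    for _ in range(200):               # bisection; chi2_sf is decreasing in x
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, q) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0 / q

# Compare with the F(q, inf) row of a standard F table.
print([round(f_critical_large_n(q), 2) for q in range(1, 6)])
```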
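For part (j), the homoskedasticity-only F statistic can be written in terms of the R² of the unrestricted and restricted regressions, where q is the number of restrictions, n is the number of observations, and k_unrestricted is the number of regressors in the unrestricted regression:

```latex
F = \frac{\left(R^2_{\text{unrestricted}} - R^2_{\text{restricted}}\right)/q}
         {\left(1 - R^2_{\text{unrestricted}}\right)/\left(n - k_{\text{unrestricted}} - 1\right)}
```

The formula uses the ordinary R², not the adjusted R², and it is only valid if the restricted regression is exactly the unrestricted one with the q tested coefficients set to zero, both are estimated on the same observations, and the errors are homoskedastic.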
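For part (k), the effect of changing lackpf in a model that is quadratic in lackpf is not the linear coefficient times the change, because the squared term moves too: Δ = b1·(x_to - x_from) + b2·(x_to² - x_from²), with the other regressors held fixed. The sketch below assumes that b1 = 1.466 and b2 = -0.170, the two numbers in the worked answer to part (l), are the regression (4) coefficients on lackpf and lackpf squared; replace them with your own Table 2 entries. It gives a point estimate only; a standard error would also need the covariance of the two coefficients.

```python
def quadratic_change(b1: float, b2: float, x_from: float, x_to: float) -> float:
    """Change in the prediction from b1*x + b2*x**2 when x moves from x_from to x_to.

    All other regressors are held fixed, so their terms cancel.
    """
    return b1 * (x_to - x_from) + b2 * (x_to ** 2 - x_from ** 2)

# Assumed coefficients on lackpf and lackpf squared (from the part (l) answer).
B_LACKPF, B_LACKPF_SQ = 1.466, -0.170
delta = quadratic_change(B_LACKPF, B_LACKPF_SQ, x_from=7, x_to=5)
print(round(delta, 3))
```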
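For part (l), the worked answer follows from setting the derivative of the fitted quadratic to zero. Writing β̂1 and β̂2 for the regression (4) coefficients on lackpf and its square (labels assumed here, with the values taken from the worked answer):

```latex
\frac{\partial\, \widehat{\text{lnftmpop}}}{\partial\, \text{lackpf}}
  = \hat\beta_1 + 2\hat\beta_2\,\text{lackpf} = 0
  \quad\Longrightarrow\quad
  \text{lackpf}^{*} = -\frac{\hat\beta_1}{2\hat\beta_2}
  = -\frac{1.466}{2 \times (-0.170)} \approx 4.31
```

Because the second derivative, 2β̂2 = -0.34, is negative, this stationary point is a maximum rather than a minimum.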