UN3412 Introduction to Econometrics



Department of Economics

UN3412

Fall 2023

Problem Set 1

Introduction to Econometrics

(Erden - Section 1)

Please make sure to select the page number for each question while you are uploading your solutions to Gradescope. Otherwise, it is tough to grade your answers, and you may lose points.

“Calculator” was once a job description.  This problem set gives you an opportunity to do some calculations on the relation between smoking and lung cancer, using a (very) small sample of five countries. The purpose of this exercise is to illustrate the mechanics of ordinary least squares (OLS) regression. You will calculate the regression “by hand” using formulas from class and the textbook. For these calculations, you may relive history and use long multiplication, long division, and tables of square roots and logarithms; or you may use an electronic calculator or a spreadsheet.

The data are summarized in the following table. The variables are per capita cigarette consumption in 1930 (the independent variable, “X”) and the death rate from lung cancer in 1950 (the dependent variable, “Y”). The cancer rates are shown for a later time period because it takes time for lung cancer to develop and be diagnosed.

| Observation # | Country | Cigarettes consumed per capita in 1930 (X) | Lung cancer deaths per million people in 1950 (Y) |
|---|---|---|---|
| 1 | Switzerland | 530 | 250 |
| 2 | Finland | 1115 | 350 |
| 3 | Great Britain | 1145 | 465 |
| 4 | Canada | 510 | 150 |
| 5 | Denmark | 380 | 165 |

Source: Edward R. Tufte, Data Analysis for Politics and Management, Table 3.3.

1. (21p) Use a calculator, a spreadsheet, or “by hand” methods to compute the following; refer to the textbook for the necessary formulas. (Note: if you use a spreadsheet, attach a printout.)

(a) (3p) The sample means of X and Y, $\bar{X}$ and $\bar{Y}$.

(b) (3p) The standard deviations of X and Y, $s_X$ and $s_Y$.

(c) (3p) The correlation coefficient, r, between X and Y.

(d) (3p) $\hat{\beta}_1$, the OLS estimated slope coefficient from the regression $Y_i = \beta_0 + \beta_1 X_i + u_i$.

(e) (3p) $\hat{\beta}_0$, the OLS estimated intercept term from the same regression.

(f) (3p) $\hat{Y}_i$, $i = 1, \dots, n$, the predicted values for each country from the regression.

(g) (3p) $\hat{u}_i$, the OLS residual for each country.
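If you would rather check your hand calculations in software, the following is a minimal Python sketch of the standard textbook formulas for the quantities in parts (a) through (g). The data in it are made up for illustration and are not the smoking data; the function name `ols_by_hand` and the dictionary keys are our own choices, not notation from the textbook.

```python
import math

def ols_by_hand(x, y):
    """Compute sample statistics and OLS estimates from the textbook formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of squared deviations and of cross-products
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Sample standard deviations use the n - 1 divisor
    s_x = math.sqrt(sxx / (n - 1))
    s_y = math.sqrt(syy / (n - 1))
    r = sxy / math.sqrt(sxx * syy)      # correlation coefficient
    beta1 = sxy / sxx                   # OLS slope
    beta0 = y_bar - beta1 * x_bar       # OLS intercept
    fitted = [beta0 + beta1 * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    return {
        "x_bar": x_bar, "y_bar": y_bar, "s_x": s_x, "s_y": s_y, "r": r,
        "beta1": beta1, "beta0": beta0,
        "fitted": fitted, "residuals": residuals,
    }

# Hypothetical data, chosen only so the arithmetic is easy to follow
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
result = ols_by_hand(x, y)
print(result["beta0"], result["beta1"])
```

A useful sanity check on any OLS fit with an intercept is that the residuals sum to zero, up to rounding.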

2. (4p) On graph paper or using a spreadsheet, graph the scatterplot of the five data points and the regression line. Be sure to label the axes and clearly show the data points.

3. (15p) You are hired by the governor to study whether a tax on liquor has decreased average liquor consumption in New York. From a random sample of n individuals in New York, you obtain each person’s liquor consumption both for the year before and for the year after the introduction of the tax. From this data, you compute $Y_i$ = "change in liquor consumption" for individual i = 1, …, n. $Y_i$ is measured in ounces so if, for example, $Y_i = 10$, then individual i increased his liquor consumption by 10 ounces. Let the parameters $\mu_Y$ and $\sigma_Y^2$ denote the population mean and variance of Y.

(a) (3p) You are interested in testing the hypothesis H0 that there was no change in liquor consumption due to the tax. State this formally in terms of the population parameters.

(b) (3p) The alternative, H1, is that there was a decline in liquor consumption; state the alternative in terms of the population parameters.

(c) (3p) Suppose that your sample size is n = 900 and you obtain estimates $\bar{Y} = -32.8$ and $s_Y = 466.4$. Report the t-statistic for testing H0 against H1. Obtain the p-value for the test [use Table 1 in Stock and Watson, pp. 749-750]. Do you reject at the 5% level? At the 1% level?

(d) (3p) Would you say that the estimated fall in consumption is large in magnitude? Comment on the practical versus statistical significance of this estimate.

(e) (3p) In your analysis, what has been implicitly assumed about other determinants of liquor consumption over the two-year period in order to infer causality from the tax  change to liquor consumption?
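As a check on the mechanics of part (c), here is a minimal Python sketch of a t-statistic for a population mean with large-sample (standard normal) p-values. The inputs are hypothetical and are not the numbers in this question; the helper names `normal_cdf` and `t_test_mean` are our own. The normal CDF is computed from the error function, so the result should agree with the printed table up to rounding.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def t_test_mean(y_bar, s_y, n, mu_0=0.0):
    """t-statistic for H0: mu = mu_0, with large-sample normal p-values."""
    se = s_y / math.sqrt(n)                    # standard error of the sample mean
    t = (y_bar - mu_0) / se
    p_left = normal_cdf(t)                     # one-sided alternative, H1: mu < mu_0
    p_two_sided = 2.0 * normal_cdf(-abs(t))    # two-sided alternative, H1: mu != mu_0
    return t, p_left, p_two_sided

# Hypothetical inputs: sample mean -1.5, sample standard deviation 10, n = 400
t, p_left, p_two_sided = t_test_mean(y_bar=-1.5, s_y=10.0, n=400)
print(t, p_left, p_two_sided)
```

Here the standard error is 10/20 = 0.5, so the t-statistic is -3, and the one-sided p-value is half of the two-sided one.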

4. (6p) Let Y be a Bernoulli random variable with success probability Pr(Y=1) = p, and let $Y_1, \dots, Y_n$ be i.i.d. draws from this distribution. Let $\hat{p}$ be the fraction of successes (1s) in this sample.

(a) (2p) Show that $\hat{p} = \bar{Y}$.

(b) (2p) Show that $\hat{p}$ is an unbiased estimator of p.

(c) (2p) Show that $\mathrm{var}(\hat{p}) = p(1-p)/n$.

5. (8p) Let $Y_1, Y_2, Y_3, Y_4$ be independently and identically distributed random variables from a population with mean $\mu$ and variance $\sigma^2$. Let $\bar{Y} = (1/4)(Y_1 + Y_2 + Y_3 + Y_4)$ denote the average of these four random variables.

(a) (2p) What are the expected value and variance of $\bar{Y}$ in terms of $\mu$ and $\sigma^2$?

(b) (2p) Now, consider a different estimator of $\mu$: $\tilde{Y} = (1/8)Y_1 + (1/8)Y_2 + (1/4)Y_3 + (1/2)Y_4$. This is an example of a weighted average of the $Y_i$’s. Show that $\tilde{Y}$ is also an unbiased estimator of $\mu$. Find the variance of $\tilde{Y}$.

(c) (2p) Based on your answers to parts (a) and (b), which estimator of $\mu$ do you prefer, $\bar{Y}$ or $\tilde{Y}$?

(d) (2p) Suppose $Y_1, Y_2, Y_3, Y_4$ follow a Normal distribution with mean $\mu = 5$ and variance $\sigma^2 = 3$. What are the distributions of $\bar{Y}$ and $\tilde{Y}$?


6. (6p) Suppose at Columbia University, grade point average (GPA) and SAT scores are related by the conditional expectation E(GPA|SAT) = .90 + .001 SAT.

(a) (2p) Find the expected GPA when SAT = 1600. 

(b) (2p) Find E(GPA|SAT=2200)

(c) (2p) If the average SAT in the university is 2000, what is the average GPA?

7. (12p) Suppose that X is randomly drawn from a uniform distribution on the interval [0, 3]. Also, suppose that after the value X = x has been observed (0 < x < 3), Y is randomly drawn from a uniform distribution on the interval [x, 3].

(a) (3p) For any given value of x (0 < x < 3), obtain E[Y |X = x].

(b) (3p) In view of part (a), obtain E[Y|X].

(c) (3p) What is the difference between E[Y|X = x] and E[Y |X]?

(d) (3p) Obtain E[Y].

8. (18p) Adult males are taller, on average, than adult females. Visiting two recent American Youth Soccer Organization (AYSO) under-12-years-old (U12) soccer matches on a Saturday, you do not observe an obvious difference in the height of boys and girls of that age. You suggest to your little sister that she collect data on height and gender of children in 4th to 6th grades as part of her science project. The accompanying table shows her findings.

Height of Young Boys and Girls, Grades 4-6, in inches

| $\bar{Y}_{Boys}$ | $s_{Boys}$ | $n_{Boys}$ | $\bar{Y}_{Girls}$ | $s_{Girls}$ | $n_{Girls}$ |
|---|---|---|---|---|---|
| 57.8 | 3.9 | 55 | 58.4 | 4.2 | 57 |

Here $\bar{Y}_{Boys}$ is the sample average height for boys, $n_{Boys}$ is the number of boys in the sample, and $s^2_{Boys}$ is the sample variance of the height of boys.

(a) (3p) Let your null hypothesis be that there is no difference in the height of females and males at this age level. Specify the alternative hypothesis.

(b) (3p) What is the unbiased estimate of the difference in height between boys and girls? Provide a formula and check the unbiasedness. Calculate the value of this estimate for the given sample.

(c) (3p) Derive the formula for the variance of the estimate from (b). Calculate the estimate of the variance for the given sample.

(d) (3p) Create a statistic for testing the hypothesis in (a) using the Central Limit Theorem and the Law of Large Numbers.

(e) (3p) Calculate the t-statistic for comparing the two means. Is the difference statistically significant at the 1% level? Which critical value did you use? Why would this number be smaller if you had assumed a one-sided alternative hypothesis? What is the intuition behind this?

(f) (3p) Generate a 95% confidence interval for the difference in height.
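For parts (b) through (f), a minimal Python sketch of the usual large-sample difference-in-means calculation is given below. It assumes the two samples are independent, so that the variance of the difference is the sum of the two variances of the sample means, and it uses hypothetical inputs, not the heights in the table. The function name `diff_in_means` is our own.

```python
import math

def diff_in_means(mean_a, s_a, n_a, mean_b, s_b, n_b, z_crit=1.96):
    """Difference in means, its standard error, t-statistic, and confidence interval.

    Assumes independent samples; z_crit = 1.96 gives a 95% interval
    under the large-sample normal approximation.
    """
    diff = mean_a - mean_b
    se = math.sqrt(s_a ** 2 / n_a + s_b ** 2 / n_b)
    t = diff / se                                   # tests H0: no difference in means
    ci = (diff - z_crit * se, diff + z_crit * se)
    return diff, se, t, ci

# Hypothetical inputs: group A has mean 10, s = 3, n = 36; group B has mean 9, s = 4, n = 64
diff, se, t, ci = diff_in_means(10.0, 3.0, 36, 9.0, 4.0, 64)
print(diff, se, t, ci)
```

In this made-up example the 95% interval contains zero, which matches a t-statistic smaller than 1.96 in absolute value.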


9. (10p) Use the following data to show the Law of Iterated Expectations (i.e., show that E(M) = E[E(M|A)]).
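To see what the Law of Iterated Expectations claims in a discrete setting, the following is a minimal Python sketch. The joint distribution in it is entirely hypothetical and is not the data for this question; it only illustrates that averaging the conditional means E(M|A = a) with weights Pr(A = a) reproduces E(M).

```python
# Hypothetical joint distribution Pr(A = a, M = m); the probabilities sum to 1.
joint = {
    (0, 1): 0.10, (0, 2): 0.30,
    (1, 1): 0.40, (1, 2): 0.20,
}

# Marginal distribution of A
pr_a = {}
for (a, m), p in joint.items():
    pr_a[a] = pr_a.get(a, 0.0) + p

# Conditional means E(M | A = a)
cond_mean = {
    a: sum(m * p for (a2, m), p in joint.items() if a2 == a) / pr_a[a]
    for a in pr_a
}

# E(M) computed directly, and as the weighted average of the conditional means
direct = sum(m * p for (_, m), p in joint.items())
iterated = sum(cond_mean[a] * pr_a[a] for a in pr_a)
print(direct, iterated)
```

Both numbers come out to 1.5 here, up to floating-point rounding.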

The following questions will not be graded; they are for practice and will be discussed at the recitation:

10. [Practice question, not graded] SW 2.3

| | Rain (X=0) | No Rain (X=1) | Total |
|---|---|---|---|
| Long Commute (Y=0) | 0.15 | 0.07 | 0.22 |
| Short Commute (Y=1) | 0.15 | 0.63 | 0.78 |
| Total | 0.30 | 0.70 | 1.00 |

Using the random variables X and Y from Table 2.2 (given above), consider two new random variables W = 3 + 6X and V = 20 – 7Y.  Compute:

(a) E(W) and E(V).

(b) $\sigma_W^2$ and $\sigma_V^2$.

(c) $\sigma_{WV}$ and Corr(W, V).


11. [Practice question, not graded] SW 2.6

The following table gives the joint probability distribution between employment status and college graduation among those either employed or looking for work (unemployed) in the   working age US population, based on the 1990 US Census.

| | Unemployed (Y=0) | Employed (Y=1) | Total |
|---|---|---|---|
| Non-college grads (X=0) | 0.045 | 0.709 | 0.754 |
| College grads (X=1) | 0.005 | 0.241 | 0.246 |
| Total | 0.050 | 0.950 | 1.000 |

(a) Compute E(Y).

(b) The unemployment rate is the fraction of the labor force that is unemployed.  Show that the unemployment rate is given by 1-E(Y).

(c) Calculate the E(Y|X=1) and E(Y|X=0).

(d) Calculate the unemployment rate for (i) college graduates and (ii) non-college graduates.

(e) A randomly selected member of this population reports being unemployed. What is the probability that this worker is a college graduate? A non-college graduate?

(f) Are educational achievement and employment status independent? Explain.

12. [Practice question, not graded] SW 2.14 [Hint: Use SW Appendix Table 1.]

In a population E[Y] = 100 and Var(Y) = 43. Use the central limit theorem to answer the following questions:

(a) In a random sample of size n = 100, find $\Pr(\bar{Y} \le 101)$.

(b) In a random sample of size n = 165, find $\Pr(\bar{Y} > 98)$.

(c) In a random sample of size n = 64, find $\Pr(101 \le \bar{Y} \le 103)$.
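The central limit theorem approximation used in this question can be sketched in a few lines of Python. The sample mean is treated as approximately normal with mean E[Y] and variance Var(Y)/n. The population values below are hypothetical and are not the ones in this question; the helper names are our own.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_sample_mean_at_most(c, mu, var, n):
    """Approximate Pr(sample mean <= c) by the CLT: sample mean ~ N(mu, var / n)."""
    z = (c - mu) / math.sqrt(var / n)
    return normal_cdf(z)

# Hypothetical population: mean 50, variance 25, sample size 100,
# so the standard deviation of the sample mean is 0.5
p_below = prob_sample_mean_at_most(51.0, 50.0, 25.0, 100)           # z = 2
p_between = p_below - prob_sample_mean_at_most(49.5, 50.0, 25.0, 100)  # z from -1 to 2
print(p_below, p_between)
```

Probabilities of the form Pr(a ≤ sample mean ≤ b) are differences of two CDF values, and Pr(sample mean > c) is one minus the CDF at c.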

13. [Practice question, not graded] SW 3.12

To investigate possible gender discrimination in a firm, a sample of 100 men and 64 women with similar job descriptions are selected at random. The resulting monthly salaries are summarized below:

| | Average Salary ($\bar{Y}$) | Standard Deviation ($s_Y$) | n |
|---|---|---|---|
| Men | $3100 | $200 | 100 |
| Women | $2900 | $320 | 64 |

(a) What do these data suggest about wage differences in the firm? Do they represent statistically significant evidence that wages of men and women are different? (To answer this question, first state the null and alternative hypotheses; second, compute the relevant t-statistic; and finally, use the p-value to answer the question.)

(b) Do these data suggest that the firm is guilty of gender discrimination in its compensation policies? Explain.

14. [Practice question, not graded] SW 2.10 [Hint: Use SW Appendix Table 1.]

Compute the following probabilities:

(a) If Y is distributed N(1,4), find Pr(Y≤3). 

(b) If Y is distributed N(3,9), find Pr(Y>0).

(c) If Y is distributed N(50,25), find Pr(40≤Y≤52).

(d) If Y is distributed N(5,2), find Pr(6≤Y≤8)

15. [Practice question, not graded]  SW 3.3

In a survey of 400 likely voters, 215 responded that they would vote for the incumbent and 185 responded that they would vote for the challenger. Let p denote the fraction of all likely voters that preferred the incumbent at the time of the survey, and let $\hat{p}$ be the fraction of survey respondents that preferred the incumbent.

(a) Use the survey results to estimate p.

(b) Use the estimator of the variance of $\hat{p}$, $\hat{p}(1 - \hat{p})/n$, to calculate the standard error of your estimator.

(c) What is the p-value for the test H0: p = 0.5 vs. H1: p ≠ 0.5?

(d) What is the p-value for the test H0: p = 0.5 vs. H1: p > 0.5?

(e) Why do the results from (c) and (d) differ?

(f) Did the survey contain statistically significant evidence that the incumbent was ahead of the challenger at the time of the survey? Explain.


16. [Practice question, not graded] Consider two events A and B with Pr(A) = 0.5 and Pr(B) = 0.9. Determine the maximum and minimum values of Pr(A ∪ B).

17. [Practice question, not graded] Assume that events A and $B^c$ are independent. That is, Pr(A ∩ $B^c$) = Pr(A)Pr($B^c$). Are events A and B also independent?

18. [Practice question, not graded] Let X and Y denote two random variables.

(i) Show that if at least one of X or Y has expectation equal to zero, then cov(X, Y) = E[XY].

19. [Practice question, not graded]  The following admission data are for the graduate program in the six largest majors at the University of California at Berkeley for the fall 1973 quarter.

(a) What is the overall probability of being admitted for males? For females? What is the standard deviation for males and for females?

(b) How would you write down the null and alternative hypotheses in order to test that the overall probability of admission is higher for men than for women?

(c) Conduct a t-test of the hypothesis from part (b) and report the p-value.

(d) Is the result significant at the 5% level? Does it provide evidence of discrimination?

(e) Committee chairpersons claim they are more likely to admit women than men. Is this claim true? Compute acceptance rates for men and women by graduate program.

(f) Do these data suggest that the university is guilty of gender discrimination in its admission policy? Explain briefly.
