Applied Econometrics (Semester 1, 2023/2024) -- Assignment 2
To be submitted to the TA or teacher in hard copy before 5:00 pm on Friday, 17 November 2023.
Using AI tools in doing this assignment is strictly prohibited!!!
This assignment paper has a total of 100 marks and contributes 25% to the course's overall assessment. For C4 and C5, do ONLY the REQUIRED parts, according to whether your Student Number is odd or even.
CLEARLY write down your answers/solutions to each question on clean paper, together with your Name and Student Number. Necessary steps/formulas/calculations/arguments MUST be included in your answers as good practice. Keep FOUR (4) decimal places for all calculations/results for relatively higher accuracy, unless clearly unnecessary.
For each estimated regression model, the number in parentheses below each estimated coefficient is its standard error, unless otherwise indicated. In testing hypotheses and/or constructing confidence intervals, use the critical values from the t-table or F-table (included at the end of this assignment paper for your convenience) corresponding to the closest degrees of freedom.
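As a minimal sketch of the testing procedure just described, the t statistic and the rejection rule can be written as below. It assumes the estimate, its standard error and a positive critical value c (read by hand from the t-table at the closest degrees of freedom) are supplied; the function names are hypothetical and the numbers in the example are made up, not taken from any model in this paper.

```python
def t_stat(beta_hat: float, se: float, null_value: float = 0.0) -> float:
    """t statistic for H0: beta = null_value."""
    return (beta_hat - null_value) / se


def reject_h0(t: float, c: float, alternative: str) -> bool:
    """Rejection rule, given a positive critical value c from the t-table.

    alternative: "greater" (H1: beta > a), "less" (H1: beta < a)
    or "two-sided" (H1: beta != a).
    """
    if alternative == "greater":
        return t > c
    if alternative == "less":
        return t < -c
    if alternative == "two-sided":
        return abs(t) > c
    raise ValueError(f"unknown alternative: {alternative!r}")


# Made-up example: estimate 2.0, standard error 0.8,
# one-sided critical value 1.645 (5% level, large degrees of freedom)
t = t_stat(2.0, 0.8)
print(round(t, 4), reject_h0(t, 1.645, "greater"))
```

Note that c differs between the rules: for a one-sided test at level α it is the upper α point of the t distribution, while for a two-sided test it is the upper α/2 point.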
In the Unbiased Income and Comfort (UIC) Kingdom, there are about 1,000 finance schools providing college-level education. Graduates of these finance schools need to pass the Advanced Investment (AI) test in order to get relevant jobs, and their starting wages naturally depend on their AI test scores, among other determining factors. To better understand how the starting wages of these finance schools' graduates are affected by AI test scores and other factors, the following population regression model (1) is proposed:
$$wage = \beta_0 + \beta_1 AI + \beta_2 GPA + \beta_3 book + \beta_4 fee + \beta_5 rank + \beta_6 stut + u \tag{1}$$
Here in model (1), wage is the average starting wage (in dollars per year) for new finance school graduates, AI is the average AI test score (in marks from 0 to 100) of the finance school's graduating class, GPA is the average GPA (in points from 0 to 4) of the finance school's graduating class, book is the number of books in the finance school's library (in 1,000s), fee is the average annual cost (in dollars) of attending the finance school, rank is the finance school's ranking in the UIC Kingdom (with rank = 1 being the best), and stut is the finance school's student-teacher ratio (i.e., the number of students per teacher).
Part A: Basic Concepts (25 marks)
A1 (4 marks): In the population model (1), indicate whether each of the seven (7) variables is a dependent or an independent variable, and whether each of the seven (7) regression coefficients is a slope or an intercept parameter.
A2 (5 marks): Do you expect β1 > 0 or β1 < 0? Do you expect β5 > 0 or β5 < 0? Briefly explain.
A3 (3 marks): State the normality assumption about the population error u.
A4 (13 marks): For this question only, suppose you want to test the null hypothesis H0 : β2 = 0 against the alternative hypothesis H1 : β2 > 0 at the commonly used 5% significance level.
A4.1 (4 marks): Explain in words the meanings of H0 : β2 = 0 and H1 : β2 > 0.
A4.2 (4 marks): IF you can (or cannot) reject H0 : β2 = 0 against H1 : β2 > 0 at the 5% significance level, can you reach the same conclusion at the 10% and 1% levels?
A4.3 (5 marks): What does it mean if β2 is said to be statistically significant at the 5% (significance) level? IF you can reject H0 : β2 = 0 against H1 : β2 > 0 at the 5% significance level, can you say β2 is statistically significant at the 5% level? On the other hand, IF β2 is statistically significant at the 5% level, can you reject H0 : β2 = 0 against H1 : β2 > 0 at the 5% level?
Part B: Multiple Regression Estimation (35 marks)
B1 (12 marks): The above population model (1) is estimated using the ordinary least squares (OLS) method, giving the sample regression model (2):
$$\widehat{wage} = -14{,}670.3073 + 112.3640\,AI + 14{,}088.5910\,GPA + 11.3908\,book + 0.3942\,fee - 121.6621\,rank + 316.8550\,stut \tag{2}$$
Standard errors (in the same order as the coefficients, intercept first): (14,547.1916) (---.----) (4,599.6347) (3.5092) (0.1560) (17.0798) (544.1768)
(n = 126, R² = 0.8088, SSR = 3,766,326,130.5648)
B1.1 (2 marks): What are the population, sample, and sample size in this study?
B1.2 (2 marks): What is the practical meaning of R2 of the estimated sample regression model (2)?
B1.3 (4 marks): Find the total sum of squares (SST) and the explained sum of squares (SSE). When more independent variables are added to the regression model, will the SSE/SST ratio increase or decrease?
B1.4 (2 marks): For two finance schools (Schools 1 and 2), what is the predicted difference in their graduates' average starting wages IF School 1 has an average AI test score 10 marks higher and an average GPA 0.3 points higher than School 2 (other factors fixed)?
B1.5 (2 marks): For two finance schools (Schools 3 and 4), what is the predicted difference in their graduates' average starting wages IF School 3 is ranked no. 20 and its library has 30,000 more books than School 4, which is ranked no. 30 (other factors fixed)?
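Two pieces of arithmetic in B1 follow from standard OLS identities: with an intercept, R² = SSE/SST = 1 - SSR/SST, so SST = SSR/(1 - R²) and SSE = SST - SSR; and, holding other factors fixed, the predicted difference in the dependent variable is the sum of each estimated slope times the change in its regressor, with every change measured in that variable's own units. A minimal sketch with hypothetical function names and made-up inputs (not the estimates in model (2)):

```python
def sums_of_squares(ssr: float, r2: float) -> tuple[float, float]:
    """Return (SST, SSE) from SSR and R-squared, using SST = SSR / (1 - R^2)
    and SSE = SST - SSR (OLS with an intercept)."""
    if not 0.0 <= r2 < 1.0:
        raise ValueError("R-squared must lie in [0, 1)")
    sst = ssr / (1.0 - r2)
    return sst, sst - ssr


def predicted_difference(coefs: dict[str, float], changes: dict[str, float]) -> float:
    """Predicted difference in y between two units that differ only in the
    regressors listed in `changes` (all others held fixed):
    sum of beta_hat_j * delta_x_j."""
    missing = set(changes) - set(coefs)
    if missing:
        raise KeyError(f"no coefficient for: {sorted(missing)}")
    return sum(coefs[name] * delta for name, delta in changes.items())


# Made-up inputs for illustration only
sst, sse = sums_of_squares(ssr=20.0, r2=0.8)
diff = predicted_difference({"x1": 2.0, "x2": -0.5}, {"x1": 3.0, "x2": 4.0})
print(round(sst, 4), round(sse, 4), round(diff, 4))
```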
B2 (4 marks): For this question only, suppose you are interested in the causal relationship between AI test score and the average starting wage for new finance school graduates. Let β̃1 be the slope estimate from regressing wage on AI only, and let β̂1 be AI's coefficient estimate from estimating the population model (1). Which of β̃1 and β̂1 would you expect to have a bigger variance? Briefly explain.
B3 (14 marks): To examine the relationship between AI test score and other independent variables, the following auxiliary regression model
$$AI = a_0 + a_1 GPA + a_2 book + a_3 fee + a_4 rank + a_5 stut + v \tag{3}$$
is estimated using the same sample data as model (2), giving an R-squared (R²) of 0.7265 and an SSR of 2,077.7150.
B3.1 (3 marks): Calculate the variance inflation factor (VIF1) of AI in the estimated regression model (2). Is there any multicollinearity in regression model (2) according to VIF1 and the rule of thumb?
B3.2 (4 marks): For this question only, IF VIF1 were larger than 10, should you delete AI from regression model (2) or use other strategies to solve this serious multicollinearity problem in regression model (2)? Briefly explain.
B3.3 (7 marks): For the estimated regression model (2), calculate the standard error of the (sample) regression (i.e., σ̂) and the standard error of the sample regression coefficient β̂1 on AI, based on model (2) and the estimation results of model (3). Is β1 statistically significant at the 5% level?
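The quantities in B3 are linked by standard formulas: VIF_j = 1/(1 - R_j²); σ̂ = sqrt(SSR/(n - k - 1)), where k is the number of independent variables; and se(β̂_j) = σ̂ / sqrt(SST_j(1 - R_j²)), where SST_j is the total sum of squares of x_j and R_j² comes from the auxiliary regression of x_j on the other regressors. Since SST_j(1 - R_j²) is exactly the SSR of that auxiliary regression, the auxiliary SSR can be used directly. A minimal sketch with hypothetical function names and made-up numbers:

```python
import math


def vif(r2_aux: float) -> float:
    """Variance inflation factor: 1 / (1 - R_j^2) from the auxiliary regression."""
    if not 0.0 <= r2_aux < 1.0:
        raise ValueError("auxiliary R-squared must lie in [0, 1)")
    return 1.0 / (1.0 - r2_aux)


def sigma_hat(ssr: float, n: int, k: int) -> float:
    """Standard error of the regression, sqrt(SSR / (n - k - 1)),
    with k independent variables plus an intercept."""
    df = n - k - 1
    if df <= 0:
        raise ValueError("need n > k + 1")
    return math.sqrt(ssr / df)


def se_coef(sigma: float, ssr_aux: float) -> float:
    """se(beta_hat_j) = sigma / sqrt(SST_j * (1 - R_j^2)), where
    SST_j * (1 - R_j^2) equals the SSR of the auxiliary regression of x_j."""
    return sigma / math.sqrt(ssr_aux)


# Made-up numbers: SSR = 100, n = 29, k = 3 gives df = 25 and sigma_hat = 2
s = sigma_hat(ssr=100.0, n=29, k=3)
print(round(vif(0.75), 4), round(s, 4), round(se_coef(s, ssr_aux=25.0), 4))
```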
B4 (5 marks): For this question only, IF each of the following situations were true for the population regression model (1), indicate which of the five assumptions (e.g., MLR.1) of the Gauss-Markov theorem would be violated.
a) The finance schools that are closer to the researcher are more likely to be included in the sample.
b) The numbers of international/national award-winning professors at finance schools strongly influence the schools' ranks.
c) The sample correlation coefficient between stut and fee is as high as 0.95.
d) The variance of the numbers of international/national award-winning professors at finance schools decreases clearly as the schools' ranks get better.
e) Surprisingly, it is found that there is a nice relationship in the sample: AI = 15 + 20 GPA.
Part C: Multiple Regression Inference (40 marks)
C1 (10 marks): Based on the population model (1) and the estimated sample model (2):
C1.1 (5 marks): Test the null hypothesis that rank has no effect on the dependent variable wage against the alternative that rank has a negative effect at the 5% significance level. You should clearly write down the null and alternative hypotheses first.
C1.2 (5 marks): Calculate the 99% confidence interval (CI) for β5. Based on this CI, can you reject H0 : β5 = -50 and H0 : β5 = -100 against the corresponding two-tailed alternative hypotheses at the 1% significance level?
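A minimal sketch of a confidence interval and its link to two-sided testing, assuming c is the two-tailed critical value read from the t-table (for a 99% CI, c leaves 0.5% in each tail). H0: β = a is rejected against β ≠ a at the 1% level exactly when a lies outside the 99% CI. Function names and numbers are hypothetical:

```python
def conf_interval(beta_hat: float, se: float, c: float) -> tuple[float, float]:
    """Two-sided CI: from beta_hat - c*se to beta_hat + c*se."""
    return beta_hat - c * se, beta_hat + c * se


def rejects_two_sided(ci: tuple[float, float], null_value: float) -> bool:
    """H0: beta = null_value is rejected (two-sided, at the matching level)
    iff null_value lies outside the CI."""
    lower, upper = ci
    return not (lower <= null_value <= upper)


# Made-up example: estimate 1.0, standard error 0.5, critical value 2.0
ci = conf_interval(1.0, 0.5, 2.0)
print(ci, rejects_two_sided(ci, 1.5), rejects_two_sided(ci, 3.0))
```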
C2 (8 marks): For this question only, since β1 and β6 are not (sufficiently) statistically significant according to the estimated model (2), AI and stut are dropped to form a restricted model. This restricted model is estimated using the same sample data as model (2), giving an R-squared (R²) of 0.8072 and an SSR of 3,797,142,116.3451.
C2.1 (4 marks): In terms of the parameters of the population model (1), write down the null hypothesis H0 about the exclusion restrictions. What is the alternative hypothesis H1 and what is its meaning?
C2.2 (4 marks): Test the above null hypothesis H0 (in C2.1) against the alternative hypothesis at the 5% significance level. Interpret your test result.
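For exclusion restrictions, the usual F statistic compares the restricted and unrestricted SSRs: F = [(SSR_r - SSR_ur)/q] / [SSR_ur/(n - k - 1)], where q is the number of restrictions and k the number of independent variables in the unrestricted model. Under H0 and the classical linear model assumptions it follows an F(q, n - k - 1) distribution, and H0 is rejected when F exceeds the critical value from the F-table. A minimal sketch with hypothetical function names and made-up numbers; the R-squared form gives the same value only when both models have the same dependent variable:

```python
def f_stat_ssr(ssr_r: float, ssr_ur: float, q: int, n: int, k: int) -> float:
    """F statistic for q exclusion restrictions (SSR form).

    k counts the independent variables in the UNRESTRICTED model.
    """
    df_denom = n - k - 1
    if q <= 0 or df_denom <= 0:
        raise ValueError("need q >= 1 and n > k + 1")
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / df_denom)


def f_stat_r2(r2_ur: float, r2_r: float, q: int, n: int, k: int) -> float:
    """Equivalent R-squared form (same dependent variable in both models)."""
    return ((r2_ur - r2_r) / q) / ((1.0 - r2_ur) / (n - k - 1))


# Made-up example: SSR_r = 120, SSR_ur = 100, q = 2, n = 53, k = 2 (df = 50)
print(round(f_stat_ssr(120.0, 100.0, q=2, n=53, k=2), 4))
```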
C3 (6 marks): Test the overall significance of the population model (1) at the 5% significance level based on the estimated sample model (2). You should clearly write down the null and alternative hypotheses first.
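An overall significance test is the special case in which all k slope coefficients are restricted to zero (q = k, and the restricted model has R² = 0), so the F statistic can be computed from R² alone: F = (R²/k) / [(1 - R²)/(n - k - 1)], compared with the F(k, n - k - 1) critical value. A minimal sketch with a hypothetical function name and a made-up example:

```python
def f_overall(r2: float, n: int, k: int) -> float:
    """F statistic for H0: all k slope coefficients equal zero."""
    df_denom = n - k - 1
    if k <= 0 or df_denom <= 0:
        raise ValueError("need k >= 1 and n > k + 1")
    return (r2 / k) / ((1.0 - r2) / df_denom)


# Made-up example: R^2 = 0.5 with k = 2 regressors and n = 53 observations
print(round(f_overall(0.5, n=53, k=2), 4))
```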
C4-odd (9 marks) - for students with odd Student Numbers only: Let θ3 = β3 - β4. Show that the population model (1) can also be written as
$$wage = \beta_0 + \beta_1 AI + \beta_2 GPA + \theta_3 book + \beta_4 (fee + book) + \beta_5 rank + \beta_6 stut + u \tag{4}$$
What is the practical meaning of the null hypothesis H0 : θ3 = 0? How would you test H0 : θ3 = 0 (vs. H1 : θ3 ≠ 0) at the 5% significance level? Do you think it is more likely or less likely that H0 : θ3 = 0 will be rejected based on the available sample results? Briefly explain.
C4-even (9 marks) - for students with even Student Numbers only: Let θ5 = β3 + β5. Show that the population model (1) can also be written as
$$wage = \beta_0 + \beta_1 AI + \beta_2 GPA + \beta_3 (book - rank) + \beta_4 fee + \theta_5 rank + \beta_6 stut + u \tag{4}$$
What is the practical meaning of the null hypothesis H0 : θ5 = 0? How would you test H0 : θ5 = 0 (vs. H1 : θ5 ≠ 0) at the 5% significance level? Do you think it is more likely or less likely that H0 : θ5 = 0 will be rejected based on the available sample results? Briefly explain.
C5-odd (7 marks) - for students with odd Student Numbers only: For this question only, suppose a restricted model for the population model (1) is
$$wage - 2000\,GPA = Q_0 + Q_1 (AI + 3\,stut) + e, \tag{5}$$
where Q0 and Q1 are regression coefficients and e is the random error. In terms of the parameters of the population model (1), write down the null hypothesis H0 about the restrictions and interpret its meaning. What is q (the number of restrictions) used to determine the degrees of freedom of the F-test?
C5-even (7 marks) - for students with even Student Numbers only: For this question only, suppose the restricted model for the population model (1) is
$$wage - 100\,AI = Q_0 + Q_1 (rank - 20\,GPA) + e, \tag{5}$$
where Q0 and Q1 are regression coefficients and e is the random error. In terms of the parameters of the population model (1), write down the null hypothesis H0 about the restrictions and interpret its meaning. What is q (the number of restrictions) used to determine the degrees of freedom of the F-test?