Introduction to Regression 2023/24
Assessment - Part I
1. The questions in this assessment add up to 100 marks. Your total mark contributes 40% of the overall grade for this module.
2. To achieve full marks, no additional references/bibliography are required or expected. However, if your answers rely on them, do list them at the end of the assessment following an accepted referencing style (check these UCL guidelines).
3. For questions 2 and 3, use the data set MCS_testscore.dta that we have been using in the seminars. Be mindful of showing all your recoding/cleaning of the data: we will not be able to recognise any work that you have done on the data prior to this assessment.
4. For the main part of the assessment: Read the questions carefully and make sure that you answer concisely what the question asks. Regarding Stata results, you can copy and paste graphs. However, if you need to report output, do so in an informative table.
5. Collate all your workings in Stata or R in either a .do file or an R script. This file does not need to be formatted; however, please add some headings/signposting (e.g., “Answer to question 2.e”).
6. Submit your answers, converted to PDF, in two files: (1) the answers to the assessment questions, which need to be concise and clearly labelled, and (2) the appendix (either the .do file or R script of your relevant results). Submit via Wise Flow before 13:00 on 14th November 2023.
7. GOOD LUCK
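As an illustration of the signposting requested in instruction 5, a .do file might be laid out roughly as follows. This is only a minimal sketch: it assumes that MCS_testscore.dta sits in the current working directory, and the choice of commands under each heading is hypothetical rather than a required structure.

```stata
* ---- Set-up ----
* Load the seminar data set (assumes the file is in the working directory)
use "MCS_testscore.dta", clear

* ---- Answer to question 2: inspect the dependent variable ----
* Show the coding and distribution of mtotscor before any recoding
codebook mtotscor
summarize mtotscor, detail

* ---- Answer to question 2.e ----
* (commands for this answer go here)
```

Keeping every recoding or cleaning step under a heading of this kind makes it visible to the marker, in line with instruction 3.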
1. (20% of total marks) Univariate linear regression model. Assume that you are working as a researcher for a national political party. Your party wants to understand voting shares in the national elections held at the end of last year. Specifically, the party is interested in how its vote share in each of the country’s municipalities, i, is associated with its advertising expenditure earlier last year in each of the municipalities. Answer the following, making sure that your answers are directly related to this case (i.e., do not include generic theoretical explanations).
a. Write down the population regression line that assumes a linear association between both variables
b. What are the coefficients in your equation and what do they represent?
c. In your equation above, do you expect that the estimated coefficients will capture this association exactly? How will this show in your estimation results?
d. Suppose that you are quite sure that there is no omitted variable bias in your analysis. How would you check that this could be true?
2. (60% of total marks) Estimation of the multivariate model. Using the data set that we explored in the seminars, MCS_testscore.dta, you are going to research the maths skills of 7-year-olds in the UK using the variable mtotscor. Make sure that you inspect (and recode, if necessary) this variable and any other variable that you choose for your analysis.
a. Formulate a hypothesis that has mtotscor as the dependent variable and a main independent variable from the data set (make sure that it makes sense). Write down its corresponding null hypothesis too.
b. Assess the possible linear association between the two variables using at least two methods. Is there a linear association between the variables?
c. Add two additional continuous variables to your model. Provide a table of summary statistics of all your variables and comment briefly (make sure that your table is not Stata output).
d. Estimate the model using OLS:
i. Present the results clearly in an informative table (not Stata output) and write down the equation that you have estimated.
ii. Interpret each of the parameters, including the intercept, in terms of its statistical and substantive significance.
iii. What do the results show in relation to your hypotheses in question 2.a?
e. Make predictions using your model:
i. Using the three independent variables in your model, what are the characteristics of the child that receives the highest score in maths skills?
ii. Do the same for the child that scores the lowest in maths skills.
iii. What is the predicted maths score for the average child according to your model (i.e., the ‘average child’ is defined as the child that has the mean value on each of your three independent variables)?
3. (20% of total marks) Assess your model
a. Using the statistics that Stata provides in the regression output, evaluate your model.
b. Show the residuals from your estimation. What do they tell you about your model?