STAT2008/STAT2014/STAT6014 Regression Modelling Assignment 1

RESEARCH SCHOOL OF FINANCE, ACTUARIAL STUDIES AND STATISTICS
REGRESSION MODELLING
(STAT2008/STAT2014/STAT6014)
Assignment 1 for Semester 2, 2024

INSTRUCTIONS:

  • This assignment is marked out of 60 and is worth 10% of your overall grade for this course.
  • Please submit your assignment in the Assignment section on Wattle using the Turnitin submission link. When uploading to Wattle you must submit the following, combined into a single PDF document:
    1. Your assignment/report as a PDF document.
    2. All the R code you have used for the assignment, added as an Appendix at the end of the report. Failure to upload the R code will result in a penalty.
  • Assignment solutions should be typed. Your assignment may include some carefully edited R output (e.g. graphs, tables) showing the results of your data analysis and a discussion of these results, as well as some carefully selected code. Please be selective about what you present and only include as much R output as necessary to justify your solution. It is important to be concise in your discussion of the results. Clearly label each part of your report with the part of the question that it refers to.
  • Unless otherwise advised, use a significance level of 5%.
  • Marks may be deducted if these instructions are not strictly adhered to, and marks will certainly be deducted if the report is of an unreasonable length, i.e. more than 10 pages including graphs and tables. You must also include an appendix containing all the R code; the appendix does not count towards the page limit. The appendix will not be marked, but marks will be deducted if the R code is not provided. The R code is required in case the markers have any questions about the work you have submitted.
  • You may ask me (Abhinav Mehta) questions about this assignment up to 24 hours before the submission time. This will allow me enough time to respond to your questions. The tutors will not entertain any questions about the assignment other than troubleshooting R code.
  • Late submissions are not allowed. If the assignment is not submitted by the due date then you will receive a 0.
  • Extensions will usually be granted on medical or compassionate grounds on production of appropriate evidence. You must have applied for an extension before the submission deadline for it to be considered. All extension requests must be made via the extensions portal available on the Wattle page for this course.

Question 1 [60 Marks]

The small island country of Eadrax mandates that every homeowner have building insurance. Homeowners are free to approach any private insurance company to buy their insurance, but if all of these companies deny coverage then the homeowner can approach the Eadrax government for guaranteed coverage. The Eadrax government wishes to study the impact of various factors on the denial of insurance by private companies, in order to provide better regulations. Data were collected for each zipcode of Eadrax on various factors (income, race, fire, theft, age, side) and on the primary response variable, insured. In this report, we will focus on only one of the predictors, income, which represents the median family income ($’000s) in the given zipcode. We use this covariate to understand the uptake of insurance offered by the Eadrax government, as measured by the response variable, insured. This variable records the percentage of households in a given zipcode which are insured by the Eadrax government rather than by a private insurance company. You are tasked with undertaking this analysis. The dataset insurance.csv contains information on all 47 zipcodes of Eadrax for all the variables mentioned above. Write a report answering the questions asked in the subparts below.

(a) [10 marks] Conduct an exploratory data analysis to assess whether the two variables, insured and income, are associated with each other. Is there a statistically significant correlation between the variables? Use the cor.test() function to conduct a suitable hypothesis test. Clearly specify the hypotheses you are testing and interpret the results.
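As a reminder of how cor.test() is called, here is a minimal sketch on simulated data. The numbers are made up and are not the insurance.csv values; the two vectors only reuse the variable names from the question. It shows a two-sided Pearson test of H0: rho = 0 against H1: rho != 0, which may or may not be the most suitable test for the real data.

```r
# Simulated stand-in data (assumption): NOT the values in insurance.csv.
set.seed(1)
n       <- 47
income  <- runif(n, min = 5, max = 25)            # made-up incomes ($'000s)
insured <- 3 - 0.1 * income + rnorm(n, sd = 0.5)  # made-up response values

# Scatter plot as a first look at the association
plot(income, insured, xlab = "income ($'000s)", ylab = "insured")

# Two-sided Pearson correlation test: H0: rho = 0 vs H1: rho != 0
ct <- cor.test(income, insured, alternative = "two.sided",
               method = "pearson", conf.level = 0.95)

ct$estimate   # sample correlation r
ct$statistic  # t statistic, referred to a t distribution on n - 2 df
ct$parameter  # degrees of freedom (n - 2)
ct$p.value    # compare with the 5% significance level
ct$conf.int   # 95% confidence interval for rho
```

The method argument also accepts "kendall" and "spearman" if a rank-based test is judged more appropriate after the exploratory analysis.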

(b) [5 marks] Based on the exploratory analysis only, are the variables suited for a normal error regression model? If not, what transformations would you apply to our set of variables to meet the assumptions? Provide sufficient evidence to support your argument.

(c) [20 marks] Fit a simple linear regression (SLR) model with insured as the response variable and income as the predictor, using your chosen transformations from part (b). Construct a plot of the residuals against the fitted values, a normal Q-Q plot of the residuals, a bar plot of the leverages for each observation and a bar plot of Cook’s distances for each observation. Use these plots (and any other statistics necessary) to comment on the model assumptions and the presence of any unusual observations.
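The four plots requested here can be produced with base R, as in the following sketch. It again uses simulated data (an assumption, not insurance.csv) and fits the model to untransformed variables purely for illustration; any transformation chosen in part (b) would be applied in the model formula.

```r
# Simulated stand-in data (assumption): NOT the values in insurance.csv.
set.seed(1)
dat <- data.frame(income = runif(47, min = 5, max = 25))
dat$insured <- 3 - 0.1 * dat$income + rnorm(47, sd = 0.5)

# Simple linear regression, no transformation applied in this sketch
fit <- lm(insured ~ income, data = dat)

lev <- hatvalues(fit)        # leverage of each observation
cd  <- cooks.distance(fit)   # Cook's distance of each observation

op <- par(mfrow = c(2, 2))   # 2 x 2 grid of plots
plot(fitted(fit), resid(fit), xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs fitted")
abline(h = 0, lty = 2)
qqnorm(resid(fit))
qqline(resid(fit))
barplot(lev, names.arg = seq_along(lev), xlab = "Observation",
        ylab = "Leverage", main = "Leverages")
barplot(cd, names.arg = seq_along(cd), xlab = "Observation",
        ylab = "Cook's distance", main = "Cook's distances")
par(op)                      # restore the previous plotting layout
```

In a model with an intercept and one predictor the leverages sum to 2, which is a quick sanity check on the hatvalues() output.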

(d) [10 marks] What are the estimated coefficients of the SLR model in part (c) and the standard errors associated with these coefficients? Interpret the values of these estimated coefficients in relation to the response variable. Perform t-tests to test whether or not these coefficients differ significantly from zero. What do you conclude as a result of these t-tests?
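The estimates, standard errors and t-tests are all available from summary() of a fitted lm object. The sketch below, on simulated data (an assumption, not insurance.csv), extracts them and recomputes the t statistic and two-sided p-value by hand to show where the reported values come from.

```r
# Simulated stand-in data (assumption): NOT the values in insurance.csv.
set.seed(1)
dat <- data.frame(income = runif(47, min = 5, max = 25))
dat$insured <- 3 - 0.1 * dat$income + rnorm(47, sd = 0.5)
fit <- lm(insured ~ income, data = dat)

# Matrix with columns: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- summary(fit)$coefficients
coefs

# Each t statistic is estimate / standard error, on n - 2 degrees of freedom
t_manual <- coefs[, "Estimate"] / coefs[, "Std. Error"]
p_manual <- 2 * pt(-abs(t_manual), df = df.residual(fit))
```

If the model is fitted to transformed variables, the coefficients are interpreted on the transformed scale.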

(e) [10 marks] Produce the ANOVA (Analysis of Variance) table for the SLR model and interpret the results of the F-test. What is the coefficient of determination for this model and how should you interpret this summary measure?
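For reference, anova() applied to a fitted lm object returns the ANOVA table, and the coefficient of determination can be read from summary() or rebuilt from the sums of squares. The sketch below uses simulated data (an assumption, not insurance.csv).

```r
# Simulated stand-in data (assumption): NOT the values in insurance.csv.
set.seed(1)
dat <- data.frame(income = runif(47, min = 5, max = 25))
dat$insured <- 3 - 0.1 * dat$income + rnorm(47, sd = 0.5)
fit <- lm(insured ~ income, data = dat)

# ANOVA table with columns: Df, Sum Sq, Mean Sq, F value, Pr(>F)
tab <- anova(fit)
tab

ssr <- tab["income", "Sum Sq"]      # regression sum of squares
sse <- tab["Residuals", "Sum Sq"]   # error (residual) sum of squares

# Coefficient of determination: share of total variation explained
r2 <- ssr / (ssr + sse)
r2
summary(fit)$r.squared              # same quantity as reported by summary()
```

In simple linear regression the F statistic equals the square of the slope's t statistic, so the F-test and the two-sided t-test on the slope give the same p-value.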

(f) [5 marks] What is the expected insured percentage for a zipcode where the median family income is $10,000? Construct an appropriate 99% interval estimate for this prediction.
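Interval estimates at a new predictor value can be obtained with predict(). The sketch below, on simulated data (an assumption, not insurance.csv), shows both interval types that predict() offers; deciding which one is appropriate is part of the question. Because income is recorded in $'000s, a median income of $10,000 corresponds to income = 10.

```r
# Simulated stand-in data (assumption): NOT the values in insurance.csv.
set.seed(1)
dat <- data.frame(income = runif(47, min = 5, max = 25))
dat$insured <- 3 - 0.1 * dat$income + rnorm(47, sd = 0.5)
fit <- lm(insured ~ income, data = dat)

# income is in $'000s, so $10,000 is entered as 10
new <- data.frame(income = 10)

# 99% confidence interval for the mean response at income = 10
ci <- predict(fit, newdata = new, interval = "confidence", level = 0.99)

# 99% prediction interval for a single new observation at income = 10
pi99 <- predict(fit, newdata = new, interval = "prediction", level = 0.99)

ci     # columns: fit, lwr, upr
pi99   # same point estimate, wider interval
```

If the model was fitted to transformed variables, the new value must be supplied on the scale the formula expects, and the fitted value and interval limits are back-transformed to the original scale before reporting.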
