MTHM506 - Statistical Data Modelling
Individual assessment sheet
Marks achieved in this assignment will contribute towards 50% of the final module mark. You should attempt all questions on this sheet. Note that the questions are organised in the order we covered the topics, and not in order of difficulty. Therefore it is advised that you read through the questions first, and start working on those that you feel more comfortable with.
Deadline: Noon (12pm), on 28th February 2025
You should submit one PDF via ELE containing your solutions. It should be written up using word processing software (e.g. LaTeX, R Markdown, or Word). Solutions are expected to be concise, well structured and well presented. Commented R code (e.g. ‘model <- glm(...)’) and the outcomes/plots should form part of your solutions. Do not display too much raw R output (e.g. don’t display the full output of ‘summary(model)’); edit it down to the essentials. Be sure to justify each step of your analyses, provide comments alongside your R code to explain what you are doing, and add appropriate titles and labelled axes to your plots.
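As an illustration of editing output down to the essentials, the sketch below fits a GLM to R's built-in cars data and reports only the coefficient table and two fit statistics. The dataset, the model formula and the object name essentials are stand-ins chosen for illustration; none of them is part of the assignment.

```r
# Illustration only: the built-in 'cars' data is a stand-in, not assignment data.
model <- glm(dist ~ speed, family = gaussian, data = cars)

# Keep just the coefficient table instead of printing summary(model) in full.
essentials <- round(coef(summary(model)), 3)
print(essentials)

# Individual fit statistics can be quoted on their own.
fit_stats <- c(AIC = AIC(model), deviance = deviance(model))
print(round(fit_stats, 2))
```

Reporting a rounded coefficient table and a handful of named statistics keeps the write-up short while still showing where each quoted number comes from.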
You are expected to work independently. Strict disciplinary action will be taken for any plagiarism. Late submissions will also be penalised.
The data required for this assignment are contained in datasets.RData, which can be downloaded from the ELE page and loaded into R using the load() function.
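A minimal sketch of loading the file is shown below. It assumes datasets.RData has been saved in the current working directory; the path and the variable names data_file and loaded are illustrative, so adjust them to your own setup.

```r
# Assumed location: the working directory. Change the path if the file is elsewhere.
data_file <- "datasets.RData"

# load() restores the saved objects and returns their names as a character vector.
loaded <- if (file.exists(data_file)) load(data_file) else character(0)
print(loaded)

# Inspect a data frame only if it was actually restored.
if ("nlmodel" %in% loaded) str(nlmodel)
```

Printing the returned names is a quick way to confirm which data frames are now available in the workspace.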
Question 1 [25 marks]
The dataframe nlmodel contains data on a response variable y and a single explanatory variable x. A scatter plot of y versus x suggests a strong non-linear relationship:
Suppose for these data we wish to consider the model
(a) [2 marks] Why can’t this model be fitted using a linear (regression) model?
(b) [2 marks] Write down the likelihood L(θ1, θ2, σ²; y, x) and the log-likelihood ℓ(θ1, θ2, σ²; y, x).
(c) [1 mark] Write an R function mylike() which evaluates the negative log-likelihood (i.e. −ℓ(θ1, θ2, σ²; y, x)) for any values of the three parameters.
(d) [7 marks] Use the R function nlm() in association with your function mylike() to numerically minimise the negative log-likelihood. Provide some evidence of how you chose sensible starting values. Report the maximum likelihood estimates of the parameters and superimpose a plot of the associated mean relationship on a scatter plot of y versus x.
(e) [6 marks] Report the standard errors for θ1 and θ2, and use those to construct 95% confidence intervals.
(f) [3 marks] Test the hypothesis that θ2 = 0.08 at the 5% significance level (not using the confidence interval) and compute the associated p-value of the test.
(g) [4 marks] Use plug-in prediction to construct and plot 95% prediction intervals.
Question 2 [30 marks]
The dataframe aids data relates to the number of quarterly AIDS cases in the UK, yi, from January 1983 to March 1994. The variable cases is yi and date is time, symbolised here as xi. In this question we consider two competing models to describe the trend in the number of cases. Model 1 is
and Model 2 is
(a) [3 marks] Plot yi against xi and comment on whether the two proposed models are sensible in terms of the distribution and the relationship of x with the mean.
(b) [5 marks] Fit the two models in R. Plot the estimated trends from each model (λ̂i and μ̂i) on top of the data with approximate 95% confidence intervals around the mean. Comment on the validity of each model (based on the plot). Obtain the AIC for each model and thus comment on which model is preferable.
(c) [3 marks] Produce the deviance residuals vs fitted values (λ̂i and μ̂i) plot for each model, comment appropriately and thus propose a way that the two models might be extended to improve the fit.
(d) [4 marks] Implement the proposed extensions to each model, to arrive at a final version for each of them (justified by appropriate hypothesis tests).
(e) [11 marks] On the basis of your answer to (a), plots analogous to those in (b) and (c), and arguments of model fit based on the deviance and the AIC, comment on which (if any) of the two final models in (d) you would choose as the best. Mention at least one reason why either model is not ideal.
(f) [4 marks] Further extend your final Poisson model to a Negative Binomial model and comment on whether this model is preferable to the other two, on the basis of all the criteria used for comparison so far.