STATS 4C03/6C03:
Assignment #2
Due Tuesday, February 13 at 6:00pm.
Instructions: Please write neatly and when asked, make sure to include your R code and output (in the appropriate places in your solution). Do not wait until the last minute to begin uploading your solutions and do not forget to click Submit. There is a total of 50 marks and a late penalty of 1% per minute.
1. Consider the Gamma distribution parametrized by α and β with probability density function
for α,β > 0 and where Γ(α) is the gamma function.
(a) Based on a random sample X1, ..., Xn, there are no closed-form expressions for the maximum likelihood estimators. Derive the normal equations.
Hint: Γ′ (z)/Γ(z) is the digamma function.
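As a rough guide to the shape of the derivation, the block below sketches the log-likelihood and its two first-order conditions. The density itself is missing from the text above, so the shape-rate form used here is an assumption; if the course uses the scale form, replace β by 1/β throughout. The symbol ψ for the digamma function is also introduced here for illustration.

```latex
\begin{align*}
f(x;\alpha,\beta) &= \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\beta x},
  \quad x > 0 \qquad \text{(assumed shape-rate form)} \\
\ell(\alpha,\beta) &= n\alpha\log\beta - n\log\Gamma(\alpha)
  + (\alpha-1)\sum_{i=1}^{n}\log X_i - \beta\sum_{i=1}^{n} X_i \\
\frac{\partial \ell}{\partial \alpha} &= n\log\beta - n\,\psi(\alpha)
  + \sum_{i=1}^{n}\log X_i = 0,
  \qquad \psi(z) = \Gamma'(z)/\Gamma(z) \\
\frac{\partial \ell}{\partial \beta} &= \frac{n\alpha}{\beta} - \sum_{i=1}^{n} X_i = 0
\end{align*}
```

Under this assumption the second equation gives β = α / X̄, and substituting into the first leaves log α − ψ(α) = log X̄ − (1/n) Σ log Xi, which has no closed-form solution in α and must be solved numerically.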
(b) Consider a sample of size n = 20 from Gamma(α, β) as follows (posted as Assgt2Q1.csv):
0.091 0.848 0.810 1.224 0.440 1.334 1.943 0.482 0.341 0.017 0.457 0.126 0.146 0.059
0.772 0.201 0.307 0.010 0.150 0.157
(i) For the data given, use R’s function optim to find the maximum likelihood estimates of α and β; include your code and output.
(ii) For the data given, write your own code to implement the Newton-Raphson algorithm to find the maximum likelihood estimates. Include your code and a table showing the iterates and final estimates in your solution. (15 marks)
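The two numerical approaches in part (b) might be set up along the following lines. This is a minimal sketch, not a model solution. It assumes β is a rate parameter (R's `dgamma(shape, rate)`), optimizes on the log scale so that both parameters stay positive, and starts from method-of-moments values. The names `negloglik`, `score`, `hessian`, and `iterates` are chosen here for illustration.

```r
# Data as listed in part (b)
x <- c(0.091, 0.848, 0.810, 1.224, 0.440, 1.334, 1.943, 0.482, 0.341, 0.017,
       0.457, 0.126, 0.146, 0.059, 0.772, 0.201, 0.307, 0.010, 0.150, 0.157)
n <- length(x)

# Method-of-moments starting values (an assumed, but common, choice)
start <- c(alpha = mean(x)^2 / var(x), beta = mean(x) / var(x))

# --- (i) optim ---------------------------------------------------------------
# Negative log-likelihood in theta = (log alpha, log beta).
# ASSUMPTION: beta is a rate parameter.
negloglik <- function(theta) {
  -sum(dgamma(x, shape = exp(theta[1]), rate = exp(theta[2]), log = TRUE))
}
fit <- optim(log(start), negloglik, method = "BFGS")
mle_optim <- exp(fit$par)   # back-transform to (alpha, beta)
print(mle_optim)

# --- (ii) Newton-Raphson on (alpha, beta) ------------------------------------
# Score vector (gradient of the log-likelihood)
score <- function(theta) {
  a <- theta[1]; b <- theta[2]
  c(n * log(b) - n * digamma(a) + sum(log(x)),
    n * a / b - sum(x))
}
# Hessian of the log-likelihood
hessian <- function(theta) {
  a <- theta[1]; b <- theta[2]
  matrix(c(-n * trigamma(a), n / b,
           n / b,            -n * a / b^2), nrow = 2, byrow = TRUE)
}

theta <- unname(start)
iterates <- matrix(theta, nrow = 1, dimnames = list(NULL, c("alpha", "beta")))
for (k in 1:50) {
  step <- solve(hessian(theta), score(theta))   # solves H %*% step = U
  theta <- theta - step                         # Newton-Raphson update
  iterates <- rbind(iterates, theta)
  if (max(abs(step)) < 1e-8) break              # stop once the step is tiny
}
rownames(iterates) <- seq_len(nrow(iterates)) - 1   # iteration number, 0 = start
print(iterates)
```

The rows of `iterates` form the table of iterates requested in (ii), and the last row can be compared against `mle_optim` as a consistency check.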
2. A car manufacturer instructed a market research company to analyze which families are going to buy a new car next year using a logistic regression model. The data stem from a random sample of 33 families. The variables assessed included the yearly household income (in thousands of USD) and the age of the oldest car in the family (in years). Twelve months later, interviewers assessed which families had bought a new car in the meantime. The data are available in the posted file car.csv.
(a) Perform a logistic regression, where the response is whether they bought a new car (purchase) and the covariates are yearly income (income) and age of oldest car in the family (age). Report the fitted regression equation. Include your code.
(b) Estimate $\exp(\hat{\beta}_{\text{income}})$ and $\exp(\hat{\beta}_{\text{age}})$, where exp denotes the exponential function.
(c) Estimate the probability that a family with a yearly household income of $50,000 USD and whose oldest car is 3 years old will buy a new car.
(d) Is the variable age required in the model?
(e) Is there a significant interaction between income and age? (15 marks)
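The workflow for this question could look roughly like the sketch below. Since car.csv is not reproduced here, the data frame is simulated and purely hypothetical, so its numbers say nothing about the real data. Only the column names `purchase`, `income`, and `age` come from the question. The sketch also assumes `purchase` is coded 0/1.

```r
set.seed(1)
# HYPOTHETICAL stand-in for car.csv (simulated, NOT the assignment data).
# With the real file one would use: car <- read.csv("car.csv")
car <- data.frame(income = round(runif(33, 20, 100)),       # thousands of USD
                  age    = sample(1:8, 33, replace = TRUE)) # years
car$purchase <- rbinom(33, 1, plogis(-5 + 0.05 * car$income + 0.5 * car$age))

# (a) Main-effects logistic regression
fit <- glm(purchase ~ income + age, family = binomial, data = car)
print(summary(fit))

# (b) Exponentiated coefficients (odds ratios per unit increase)
print(exp(coef(fit)))

# (c) Predicted probability; income is in thousands, so $50,000 is income = 50
p_hat <- predict(fit, newdata = data.frame(income = 50, age = 3),
                 type = "response")
print(p_hat)

# (d) Likelihood ratio test for dropping age from the model
fit_no_age <- glm(purchase ~ income, family = binomial, data = car)
print(anova(fit_no_age, fit, test = "Chisq"))

# (e) Likelihood ratio test for the income-by-age interaction
fit_int <- glm(purchase ~ income * age, family = binomial, data = car)
print(anova(fit, fit_int, test = "Chisq"))
```

The Wald z-tests in `summary(fit)` and `summary(fit_int)` offer an alternative to the likelihood ratio tests in (d) and (e); the two need not agree in small samples.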
3. Kyphosis is a curvature of the upper spine resulting in physical deformity. The dataset in Kyphosis.csv consists of data on 81 children who had spinal surgery and indicates their age (in months), the number of vertebrae involved and the number of the start (topmost) vertebra operated on. The response is whether or not the child had kyphosis after the surgery.
(a) (i) Fit a logistic regression model to this data using all three covariates, with the formula Kyphosis ~ Age + Number + Start.
(ii) Give the estimated parameters and 95% confidence intervals for each. Comment on the interpretation of the intervals.
(iii) Based on the confidence intervals in (ii), which of the covariates do you believe has the strongest relationship with the response based on this model? Justify your answer.
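One way to obtain the estimates and intervals in part (a) is sketched below. It uses the `kyphosis` data frame from the rpart package as a stand-in, on the assumption (not confirmed by the assignment) that it matches the posted Kyphosis.csv. If the CSV is read directly, the response may first need converting to a factor or to 0/1.

```r
# ASSUMPTION: rpart's kyphosis data (81 rows; Kyphosis, Age, Number, Start)
# matches the posted Kyphosis.csv.
data(kyphosis, package = "rpart")

fit <- glm(Kyphosis ~ Age + Number + Start, family = binomial, data = kyphosis)
est <- coef(fit)

# Wald intervals: estimate +/- 1.96 * standard error
ci_wald <- confint.default(fit, level = 0.95)

# Profile-likelihood intervals (can differ noticeably in small samples)
ci_prof <- confint(fit, level = 0.95)

print(cbind(estimate = est, ci_wald))
print(cbind(estimate = est, ci_prof))

# The same quantities on the odds-ratio scale
print(exp(cbind(odds_ratio = est, ci_wald)))
```

Intervals on the coefficient scale are in log-odds per unit of each covariate, so the units (months, vertebrae count, vertebra number) matter when comparing covariates.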
(b) For each of the three covariates, conduct Wald and likelihood ratio tests of the hypotheses that the corresponding β=0 and comment on the results.
Hint: You may want to look into the drop1 function.
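A sketch of how the two kinds of test in part (b) can be put side by side follows. As before, it assumes that the `kyphosis` data frame shipped with the rpart package stands in for Kyphosis.csv. The object names `wald` and `lrt` are illustrative.

```r
# ASSUMPTION: rpart's kyphosis data stands in for the posted Kyphosis.csv.
data(kyphosis, package = "rpart")
fit <- glm(Kyphosis ~ Age + Number + Start, family = binomial, data = kyphosis)

# Wald tests: z = estimate / standard error, taken from the summary table
wald <- coef(summary(fit))
print(wald[, c("Estimate", "Std. Error", "z value", "Pr(>|z|)")])

# Likelihood ratio tests: drop each term in turn from the full model
lrt <- drop1(fit, test = "Chisq")
print(lrt)

# Side-by-side p-values for the three covariates
p_compare <- cbind(wald = wald[c("Age", "Number", "Start"), "Pr(>|z|)"],
                   lrt  = lrt[c("Age", "Number", "Start"), "Pr(>Chi)"])
print(p_compare)
```

Unlike sequential tests, each row of the `drop1` output compares the full model against the model without that one term, so the order of the terms in the formula does not matter.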
(c) (i) The anova function conducts sequential likelihood ratio tests. Do this for the model fitted in (a) and comment on the results.
(ii) Repeat the sequential analysis for the model with the formula Kyphosis ~ Start + Number + Age (that is, with the order of the covariates reversed) and comment on any differences.
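The sequential analyses in part (c) could be run as in the sketch below, again under the assumption that rpart's `kyphosis` data frame matches Kyphosis.csv. `test = "Chisq"` is passed explicitly because older R versions print no p-values for `anova()` on a glm without it.

```r
# ASSUMPTION: rpart's kyphosis data stands in for the posted Kyphosis.csv.
data(kyphosis, package = "rpart")

fit_fwd <- glm(Kyphosis ~ Age + Number + Start, family = binomial,
               data = kyphosis)
fit_rev <- glm(Kyphosis ~ Start + Number + Age, family = binomial,
               data = kyphosis)

# Each row tests the term added to the model containing the rows above it
aov_fwd <- anova(fit_fwd, test = "Chisq")
aov_rev <- anova(fit_rev, test = "Chisq")
print(aov_fwd)
print(aov_rev)

# Same fitted model either way: only the decomposition of deviance changes
print(all.equal(deviance(fit_fwd), deviance(fit_rev)))
```

Because each term is tested after adjusting only for the terms before it, the deviance attributed to a covariate generally changes with its position, even though the final residual deviance is the same.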
(d) Conduct a likelihood ratio test comparing the model with 3 covariates to a model in which only the start vertebra number is included. (20 marks)
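The nested-model comparison in part (d) can be sketched in two equivalent ways, via `anova()` and by hand from the deviances. The data source is again an assumption (rpart's `kyphosis` as a stand-in for Kyphosis.csv). The names `full`, `reduced`, `lr_stat`, and `df_diff` are illustrative.

```r
# ASSUMPTION: rpart's kyphosis data stands in for the posted Kyphosis.csv.
data(kyphosis, package = "rpart")

full    <- glm(Kyphosis ~ Age + Number + Start, family = binomial,
               data = kyphosis)
reduced <- glm(Kyphosis ~ Start, family = binomial, data = kyphosis)

# Built-in comparison of the nested models
print(anova(reduced, full, test = "Chisq"))

# The same test by hand: difference in deviances, referred to a chi-square
lr_stat <- deviance(reduced) - deviance(full)
df_diff <- df.residual(reduced) - df.residual(full)  # parameters dropped
p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)
print(c(LR = lr_stat, df = df_diff, p = p_value))
```

Here the null hypothesis is that the coefficients of Age and Number are both zero, so the reference distribution has 2 degrees of freedom.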