Assignment 3
STATS 3860B/9155B
Winter 2024
• Assignment 3 is due Friday, March 22, 2024, at 11:55 pm.
• You must write your answers and R code using Rmarkdown (template provided with Assignment 1) and generate a single PDF file. Submissions not generated by Rmarkdown will not be graded and will receive zero marks.
• Submissions must be done via Gradescope. You must carefully assign questions to their corresponding pages. Submissions without questions assigned to pages will not be graded. Questions with no pages assigned to them will receive zero marks.
• Always show all your work and add comments to your code explaining what you are doing.
Question 1
The dataset melanoma gives data on a sample of patients suffering from melanoma (skin cancer) cross-classified by the type of cancer and the location on the body.
suppressMessages(library(faraway))
str(melanoma)
## 'data.frame': 12 obs. of 3 variables:
## $ count: num 22 16 19 11 2 54 33 17 10 115 ...
## $ tumor: Factor w/ 4 levels "freckle","indeterminate",..: 1 4 3 2 1 4 3 2 1 4 ...
## $ site : Factor w/ 3 levels "extremity","head",..: 2 2 2 2 3 3 3 3 1 1 ...
a) Display the data in a two-way table. Make a mosaic plot and comment on the evidence of independence.
b) Check for independence between site and tumour type using a Chi-squared test.
c) Fit a Poisson GLM and use it to check for independence.
d) Make a two-way table of the deviance residuals from the last model. Comment on your results.
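Hint: the workflow in parts a) to d) can be sketched on a small made-up table. The data frame `toy`, its counts, and the factor names `row` and `col` below are hypothetical and are not the melanoma data; this is only a minimal illustration using base R, not a model answer.

```r
# Hypothetical 2 x 3 table of counts (NOT the melanoma data)
toy <- data.frame(
  count = c(20, 15, 30, 10, 25, 40),
  row   = factor(rep(c("a", "b"), each = 3)),
  col   = factor(rep(c("x", "y", "z"), times = 2))
)

# Two-way table and mosaic plot of the counts
tab <- xtabs(count ~ row + col, data = toy)
mosaicplot(tab, main = "Toy counts", color = TRUE)

# Pearson chi-squared test of independence
pearson <- chisq.test(tab, correct = FALSE)

# Poisson GLM with main effects only: this is the independence model
fit <- glm(count ~ row + col, family = poisson, data = toy)

# Compare the residual deviance with a chi-squared distribution
# on (rows - 1) * (cols - 1) degrees of freedom
p_dev <- pchisq(deviance(fit), df.residual(fit), lower.tail = FALSE)

# Deviance residuals arranged in the same layout as the table
res_tab <- xtabs(residuals(fit, type = "deviance") ~ row + col, data = toy)
```

The Pearson statistic and the residual deviance are two different measures of the same lack of fit, so they usually lead to similar conclusions; cells with large residuals in `res_tab` indicate where the departure from independence is concentrated.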
Question 2
The hsb data was collected as a subset of the “High School and Beyond” study conducted by the National Education Longitudinal Studies program of the National Center for Education Statistics. The variables are gender, race, socioeconomic status (SES), school type, chosen high school program type, and scores on reading, writing, math, science and social studies. The response variable is the chosen high school program type (prog), which is multinomial with 3 levels.
library(faraway)
library(nnet)
data("hsb")
hsb <- hsb[,-1] ## removing first column corresponding to student ID
str(hsb)
## 'data.frame': 200 obs. of 10 variables:
## $ gender : Factor w/ 2 levels "female","male": 2 1 2 2 2 2 2 2 2 2 ...
## $ race : Factor w/ 4 levels "african-amer",..: 4 4 4 4 4 4 1 3 4 1 ...
## $ ses : Factor w/ 3 levels "high","low","middle": 2 3 1 1 3 3 3 3 3 3 ...
## $ schtyp : Factor w/ 2 levels "private","public": 2 2 2 2 2 2 2 2 2 2 ...
## $ prog : Factor w/ 3 levels "academic","general",..: 2 3 2 3 1 1 2 1 2 1 ...
## $ read : int 57 68 44 63 47 44 50 34 63 57 ...
## $ write : int 52 59 33 44 52 52 59 46 57 55 ...
## $ math : int 41 53 54 47 57 51 42 45 54 52 ...
## $ science: int 47 63 58 53 53 63 53 39 58 50 ...
## $ socst : int 57 61 31 56 61 61 61 36 51 51 ...
a) Fit a multinomial regression model for prog (with baseline level academic) and all nine predictors.
b) Interpret the coefficients corresponding to the five subjects (scores on reading, writing, math, science and social studies) in terms of odds.
c) Regarding part b), identify which one of the five subjects gives unexpected results and suggest an explanation for this behavior. Any reasonable explanation will be accepted.
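Hint: the mechanics of fitting a multinomial model with a chosen baseline and reading the coefficients as odds multipliers can be sketched on simulated data. The data frame `toy` and its variables `score`, `group` and `choice` are hypothetical stand-ins, not the hsb data, and the simulated response is pure noise, so the fitted values carry no meaning.

```r
library(nnet)  # provides multinom()

set.seed(1)

# Simulated stand-in data (hypothetical, NOT hsb)
n <- 150
toy <- data.frame(
  score  = round(rnorm(n, mean = 50, sd = 10)),
  group  = factor(sample(c("a", "b"), n, replace = TRUE)),
  choice = factor(sample(c("one", "two", "three"), n, replace = TRUE))
)

# Set the baseline (reference) level of the response explicitly
toy$choice <- relevel(toy$choice, ref = "one")

# Multinomial logit model; trace = FALSE silences the optimizer log
fit <- multinom(choice ~ score + group, data = toy, trace = FALSE)

# One row of coefficients per non-baseline level.
# exp(coefficient) is the multiplicative change in the odds of that
# level versus the baseline for a one-unit increase in the predictor,
# holding the other predictors fixed.
odds_mult <- exp(coef(fit))
```

A value of `odds_mult` above 1 means the odds of that level relative to the baseline increase with the predictor, and a value below 1 means they decrease.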
Question 3
This question refers to Exercise 1 of Chapter 8 of the Faraway textbook (page 171). Work on all parts - a) to e).
Question 4
This question refers to Exercise 4 of Chapter 8 of the Faraway textbook (page 172). Work on all parts - a) to g).
Question 5
The denim dataset concerns the amount of waste in material cutting for a jeans manufacturer due to five suppliers. The code below first removes two outliers from the dataset.
library(faraway)
data(denim)
denim <- denim[-which(denim$waste == max(denim$waste)), ] # remove the largest waste value (first outlier)
denim <- denim[-which(denim$waste == max(denim$waste)), ] # remove the new largest value (second outlier)
str(denim)
## 'data.frame': 93 obs. of 2 variables:
## $ waste : num 1.2 16.4 12.1 11.5 24 10.1 -6 9.7 10.2 -3.7 ...
## $ supplier: Factor w/ 5 levels "1","2","3","4",..: 1 2 3 4 5 1 2 3 4 5 ...
## - attr(*, "na.action")= 'omit' Named int [1:15] 70 75 80 85 90 95 98 99 100 103 ...
## ..- attr(*, "names")= chr [1:15] "70" "75" "80" "85" ...
a) Plot the data and comment.
b) Fit the linear fixed effects model. Is the supplier significant?
c) Analyze the data with supplier as a random effect. What are the estimated standard deviations of the effects?
d) Regarding the model fitted in c), test the significance of the supplier term. Compare with the results in b).
e) Compute confidence intervals for the random effects standard deviations. Compare with the results in d).
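Hint: the difference between treating a grouping factor as fixed and as random can be sketched in base R on a simulated, balanced one-way layout. Everything below (the data frame `toy`, the variables `y` and `grp`, and the chosen standard deviations) is hypothetical and is not the denim data. The variance components are obtained here with the simple ANOVA (method of moments) estimator, which is a stand-in technique that assumes balanced groups; a likelihood-based mixed model fit would normally be used for the actual analysis.

```r
set.seed(2)

# Simulated balanced one-way layout (hypothetical, NOT denim)
a <- 5                               # number of groups
n <- 10                              # observations per group
grp   <- factor(rep(seq_len(a), each = n))
alpha <- rnorm(a, sd = 3)            # group effects drawn at random
y     <- 10 + alpha[grp] + rnorm(a * n, sd = 2)
toy   <- data.frame(y = y, grp = grp)

# Fixed effects view: one-way ANOVA F-test for the grouping factor
fixed <- lm(y ~ grp, data = toy)
tab   <- anova(fixed)

# Random effects view via the ANOVA estimator (balanced case only):
# sigma_alpha^2 = (MSA - MSE) / n, truncated at zero if negative
msa <- tab["grp", "Mean Sq"]
mse <- tab["Residuals", "Mean Sq"]
sigma2_alpha <- max((msa - mse) / n, 0)

# Estimated standard deviations of the group and residual effects
sd_est <- c(grp = sqrt(sigma2_alpha), residual = sqrt(mse))
```

The fixed effects F-test asks whether these particular group means differ, while `sd_est` describes how much variation is attributable to groups in general; the two views answer related but distinct questions.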