STAT 4620/5620 Data Analysis

STAT 4620/5620 WINTER 2024

Assignment 2: Due Thursday, February 8, 2024

1. (10 points) Your Research Question 1 (RQ1) involved three variables and focused on the relationship between one response variable and two explanatory variables; at least one of the three variables had to be categorical. Using R, prepare a graph that can be used to answer RQ1.

2. (4 points) Your RQ2 had a descriptive goal. Calculate a confidence interval to answer this question.

3. (6 points) Your RQ3 had a causative goal and involved an explanatory variable and a response variable, where the response variable had to be quantitative. Use a hypothesis test to answer RQ3.

4. (9 points) Describe the linear model estimators implemented in the lmRob function (in R). Contrast these with what is implemented by the lm function. Explain how to proceed with data analysis when you have good reason to believe that a linear model is reasonable for your data but that there may well be data recording errors. (250 words)

5. (4 points) Explain how the Akaike information criterion (AIC) is computed for a generalized linear model and how it is commonly utilized for model (or variable) selection purposes. (250 words)

6. (8 points) Let’s explore the negative binomial distribution.

(a) Write down its density function.

(b) Plot the density curves from a negative binomial distribution for a range of values of its two parameters using the dnbinom function in R.

(c) Suppose you elect to use the alternative parametrization (often used in ecology) where the mean is μ and the variance is μ + μ²/k. What distribution do you arrive at as the k parameter gets larger and larger? (150 words)
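
As a starting point for part (c), the sketch below shows how the alternative parametrization can be requested from dnbinom, where the size argument plays the role of k and the mu argument is the mean. The values of mu and k are arbitrary illustrations and not part of the assignment; the check confirms only that the two parametrizations agree and that the variance matches μ + μ²/k.

```r
# Illustrative values only (assumptions, not prescribed by the assignment)
mu <- 4   # mean of the distribution
k  <- 2   # dispersion parameter, passed to dnbinom as `size`

x <- 0:30

# Alternative (mean, dispersion) parametrization
p_mu <- dnbinom(x, size = k, mu = mu)

# Same distribution in the default (size, prob) parametrization,
# using prob = size / (size + mu)
p_prob <- dnbinom(x, size = k, prob = k / (k + mu))

# The two sets of probabilities should agree
same <- isTRUE(all.equal(p_mu, p_prob))

# Numerical mean and variance over a long grid (the tail beyond 2000 is negligible here)
x_big <- 0:2000
w <- dnbinom(x_big, size = k, mu = mu)
m <- sum(x_big * w)             # should be close to mu
v <- sum((x_big - m)^2 * w)     # should be close to mu + mu^2 / k

print(c(same = same, mean = m, variance = v, target_variance = mu + mu^2 / k))
```

Repeating the calculation for a sequence of increasing k values is one way to explore the limiting behaviour asked about in part (c).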

7. (3 points) Residual checking for GLMs is not always as straightforward as for linear models, and the problems are particularly acute in the case of binary responses. Explain why. (100 words)

8. (16 points) We are interested in a study concerning lung function in patients with cystic fibrosis (Altman, 1991, p. 338). The data are in the ISwR package and can be loaded into the workspace with data(cystfibr).

(a) Fit a model relating maximum expiratory pressure (pemax) to the explanatory variables contained in the dataset.

(b) Interpret results for the sex variable.

(c) Try using the step function and interpret results.

(d) Perform a complete examination of diagnostics.

(e) What can you reasonably conclude from your analysis?

GUIDELINES FOR SUBMISSION:

Submit the R Markdown file (.Rmd), the .csv file containing your dataset, AND the resulting knitted .pdf file to BrightSpace Assignments under Assignment 2.
