MTH 542/642 Homework # 5

MTH 542/642 Fall 2021
Homework # 5

Problem 1: Problem 7.8, page 182. Follow the instructions below:

7.8.1:  Draw both of the plots indicated next to each other using the command par(mfrow=c(1,2)) (also see handout on scatterplots).

7.8.2. There is a typo in the problem. Here is the correction: Var(Weight | Age) = SD²/n. You need to use WLS (see the handout on Chapter 5). Remember that Var(Y | X) depends on X, with Var(Y | X = x_i) = σ²/w_i, where the w_i are called weights. In R you need to specify the “weights” argument. For your problem you can set σ² = 1, so you can easily work out the weights. “Fit the model” means writing out the fitted model using the coefficient estimates reported by R.
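
As an illustration of the weights argument, here is a minimal sketch on simulated grouped data. The data frame demo and its columns (x, n, sd, ybar, w) are hypothetical stand-ins, not the textbook data. It assumes each response is a group mean with Var = SD²/n, so that with σ² = 1 the weights are w_i = n_i/SD_i².

```r
# Simulated grouped data (hypothetical; NOT the textbook data set)
set.seed(1)
demo <- data.frame(
  x  = 1:6,                                # predictor (one row per group)
  n  = c(20, 25, 30, 30, 25, 20),          # group sizes
  sd = c(0.5, 0.6, 0.8, 1.0, 1.2, 1.5)     # within-group standard deviations
)
demo$ybar <- 10 - 0.5 * demo$x + rnorm(6, sd = demo$sd / sqrt(demo$n))

# Var(ybar | x) = SD^2 / n = sigma^2 / w with sigma^2 = 1, hence w = n / SD^2
demo$w <- demo$n / demo$sd^2

# Weighted least squares fit
m_wls <- lm(ybar ~ x, data = demo, weights = w)
coef(m_wls)
```

The fitted model is then written as ybar-hat = b0 + b1 * x, using the two values returned by coef(m_wls).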

7.8.3. This part refers to the fitted value for a new coin, that is, at age = 0, which is the estimated intercept. Calculate a 95% confidence interval for the intercept in the model above and see where the “standard weight” falls. Use the R command confint.
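
A minimal sketch of the confint step, again on simulated data with hypothetical names (demo, m_wls, ref_value). The reference value 10 is made up for the illustration and is not the standard weight from the textbook.

```r
# Simulated grouped data (hypothetical; NOT the textbook data set)
set.seed(1)
demo <- data.frame(x = 1:6,
                   n = c(20, 25, 30, 30, 25, 20),
                   sd = c(0.5, 0.6, 0.8, 1.0, 1.2, 1.5))
demo$ybar <- 10 - 0.5 * demo$x + rnorm(6, sd = demo$sd / sqrt(demo$n))
m_wls <- lm(ybar ~ x, data = demo, weights = n / sd^2)

# 95% confidence interval for the intercept only
ci_int <- confint(m_wls, parm = "(Intercept)", level = 0.95)
ci_int

# Does a reference value (made up here) fall inside the interval?
ref_value <- 10
inside <- ref_value >= ci_int[1, 1] && ref_value <= ci_int[1, 2]
inside
```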
7.8.4. We want to estimate P(Wnew < 7.9379 | Age), where Wnew denotes the weight of an unsampled coin.
We need the conditional mean of the weight and its standard deviation. Use the fitted model to estimate each of these. The estimated conditional mean weight is the predicted value of weight at each age value; obtain the standard error of prediction as indicated in the hint of the problem.
To obtain the predicted values and the standard errors of the fitted values from a model m1, use the R command predict(m1, se.fit=TRUE). Then calculate the corresponding z-values and finally the desired probabilities using the R command pnorm.
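
The sequence predict, z-values, pnorm can be sketched as follows on simulated data. All names (demo, m1, cutoff, se_pred) are hypothetical. The sketch assumes that the standard error of prediction combines the known per-group SD of a single new observation with the standard error of the fitted value; check this against the hint in the textbook before relying on it.

```r
# Simulated grouped data (hypothetical; NOT the textbook data set)
set.seed(2)
demo <- data.frame(x = 1:6,
                   n = c(20, 25, 30, 30, 25, 20),
                   sd = c(0.5, 0.6, 0.8, 1.0, 1.2, 1.5))
demo$ybar <- 10 - 0.5 * demo$x + rnorm(6, sd = demo$sd / sqrt(demo$n))
m1 <- lm(ybar ~ x, data = demo, weights = n / sd^2)

# Fitted values and their standard errors at the observed x values
p <- predict(m1, se.fit = TRUE)

# ASSUMPTION: prediction variance = variance of one new observation (SD^2)
#             + squared standard error of the fitted value
se_pred <- sqrt(demo$sd^2 + p$se.fit^2)

# z-values and P(new observation < cutoff | x); the cutoff is made up
cutoff <- 8
z <- (cutoff - p$fit) / se_pred
prob_below <- pnorm(z)
data.frame(x = demo$x, fit = p$fit, se_pred = se_pred, prob_below = prob_below)
```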
7.8.5. Skip this part. Instead, conduct a test for lack of fit of the model in 7.8.2.

That is, H0: the model fits,  Ha: the model does not fit

Follow the handout on Chapter 7 WLS where the test for lack of fit is explained for the Forbes data.

Remember that the RSS needed in this test for your model is the weighted residual sum of squares (see page 157).
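
When σ² is treated as known (here σ² = 1), one standard form of the lack-of-fit test compares the weighted RSS with a chi-squared distribution on the residual degrees of freedom. The sketch below shows that computation on simulated data with hypothetical names; the course handout may organize the test differently.

```r
# Simulated grouped data (hypothetical; NOT the textbook data set)
set.seed(3)
demo <- data.frame(x = 1:6,
                   n = c(20, 25, 30, 30, 25, 20),
                   sd = c(0.5, 0.6, 0.8, 1.0, 1.2, 1.5))
demo$ybar <- 10 - 0.5 * demo$x + rnorm(6, sd = demo$sd / sqrt(demo$n))
demo$w <- demo$n / demo$sd^2
m_wls <- lm(ybar ~ x, data = demo, weights = w)

# Weighted residual sum of squares: sum of w_i * e_i^2
rss_w <- sum(demo$w * residuals(m_wls)^2)

# With sigma^2 = 1 known, RSS / sigma^2 is referred to chi-squared(n - p)
df_res  <- df.residual(m_wls)
p_value <- pchisq(rss_w / 1, df = df_res, lower.tail = FALSE)
c(RSS = rss_w, df = df_res, p.value = p_value)
```

A large p-value gives no evidence against H0 (the model fits); a small one suggests lack of fit.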

Problem 2: Problem 7.10, page 183.

Problem 3: The following problem uses the data on National Football League 1976 Team Performance:

y = Games won (per 14-game season)

x1 = Rushing yards (season)

x2 = Passing yards

x3 = Punting average(yards/punt)

x4 = Field goal percentage (FGs made/FGs attempted)

x5 = Turnover differential (turnovers acquired – turnovers lost)

x6 = Penalty yards (season)

x7 = Percent rushing (rushing plays/total plays)

x8 = Opponents’ rushing yards (season)

x9 = Opponents’ passing yards (season)

You can load the data with the commands below:

install.packages("MPV", dependencies=TRUE)

library(MPV)

attach(table.b1)

1. Obtain a scatterplot matrix and also the matrix of sample correlations for these data (You may use the R commands below). Then comment on the results obtained.

pairs(table.b1,gap=0.4,cex.labels=1.5)
cor(table.b1)

2. a) Fit a multiple linear regression model relating the number of games won to all 9 regressors given in the data above. Then fit a multiple linear regression model relating the number of games won to the team’s passing yardage x2, the percentage of rushing plays x7, and the opponents’ rushing yards x8. Obtain the R summary for each of these two models.

R hint: to obtain the model with all 9 predictors, use

m1_9 <- lm(y ~ ., data = table.b1)

For the three-predictor model you may use

m278 <- lm(y ~ x2 + x7 + x8)

b) Discuss and compare the significance of the regressors in the two fitted models. What essential differences do you see? How can you explain them?

c)  What percent of the total variability in y is explained by the regression on all nine predictors? What percent of the total variability in y is explained by the regression on x2, x7, and x8?

d) For the multiple linear regression model of y on x2, x7 and x8 (m278), show numerically that the square of the correlation coefficient between the observed values of y and the fitted values equals the coefficient of determination of the model.
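
The identity in part d) can be checked numerically for any least-squares fit with an intercept. The sketch below uses the built-in mtcars data as a stand-in (the model m_demo and its predictors are illustrative, not the NFL model); the same two lines apply to m278.

```r
# Stand-in model on a built-in data set (not the NFL data)
m_demo <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Coefficient of determination reported by summary()
r2_summary <- summary(m_demo)$r.squared

# Squared correlation between observed and fitted responses
r2_cor <- cor(mtcars$mpg, fitted(m_demo))^2

c(r2_summary = r2_summary, r2_cor = r2_cor)
all.equal(r2_summary, r2_cor)

# Percent of total variability explained by the regression
100 * r2_summary
```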

3. Referring to the model m278, find:

a) A 95% confidence interval for the coefficient of x7. Then explain its meaning.

b) A 95% confidence interval for the mean number of games won by a team when x2 = 2120, x7 = 58, x8 = 2110. Explain your results to someone who is not familiar with statistical language.

c) A 95% prediction interval for the number of games won by a team when x2 = 2120, x7 = 58, x8 = 2110. Explain your results to someone who is not familiar with statistical language.

If you were to repeat part b) for x2 = 2300, x7 = 56, x8 = 2100, for which set of values would you obtain the narrower interval? Explain your answer (you may need to perform additional computations to answer this question). Then check your answer by computing this second interval.
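
Confidence and prediction intervals at a new point both come from predict with the interval argument, and the width of the confidence interval for the mean grows with the leverage-type quantity h0 = x0' (X'X)^(-1) x0. The sketch below illustrates this on the built-in mtcars data; the model and the points new_a and new_b are hypothetical stand-ins, not the NFL values.

```r
# Stand-in model on a built-in data set (not the NFL data)
m_demo <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Two hypothetical new points
new_a <- data.frame(wt = 3.0, hp = 150, qsec = 18)
new_b <- data.frame(wt = 5.0, hp = 300, qsec = 15)

ci_a <- predict(m_demo, newdata = new_a, interval = "confidence", level = 0.95)
pi_a <- predict(m_demo, newdata = new_a, interval = "prediction", level = 0.95)
ci_b <- predict(m_demo, newdata = new_b, interval = "confidence", level = 0.95)

# Interval width helper
width <- function(int) unname(int[, "upr"] - int[, "lwr"])

# h0 = x0' (X'X)^{-1} x0; x0 follows the column order of the model matrix
X <- model.matrix(m_demo)
XtX_inv <- solve(crossprod(X))
h0 <- function(nd) {
  x0 <- c(1, nd$wt, nd$hp, nd$qsec)
  drop(t(x0) %*% XtX_inv %*% x0)
}

c(h_a = h0(new_a), h_b = h0(new_b))
c(ci_width_a = width(ci_a), ci_width_b = width(ci_b), pi_width_a = width(pi_a))
```

The point with the smaller h0 gets the narrower confidence interval, so comparing h0 for the two points predicts the answer before either interval is computed.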

d) Obtain a normal probability plot of the residuals. What model assumption is this plot addressing? What conclusion can you draw based on the plot?

R command:

qqnorm(rstandard(m278))

e) Obtain residual plots (residuals versus predicted response, residuals versus each of the regressors). What can you conclude based on these plots? What model assumptions are addressed by these plots?

f) Use various graphs to check the mean function assumption for model m278. Use lowess. You may look over the handout on Chapter 9, on Leverage, page 9, for guidance. Explain every step of your work.

Comment on the plots obtained.
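
A minimal sketch of residual plots with a lowess smooth, using the built-in mtcars data as a stand-in for the NFL data (the model m_demo is illustrative only). If the mean function is adequate, the smooth should stay close to the horizontal line at zero.

```r
# Stand-in model on a built-in data set (not the NFL data)
m_demo <- lm(mpg ~ wt + hp + qsec, data = mtcars)
res <- rstandard(m_demo)   # standardized residuals
fit <- fitted(m_demo)      # predicted responses

# Lowess smooths of the residuals against fitted values and one regressor
smooth_fit <- lowess(fit, res)
smooth_wt  <- lowess(mtcars$wt, res)

op <- par(mfrow = c(1, 2))
plot(fit, res, xlab = "Fitted values", ylab = "Standardized residuals")
abline(h = 0, lty = 2)
lines(smooth_fit, col = "red")

plot(mtcars$wt, res, xlab = "wt", ylab = "Standardized residuals")
abline(h = 0, lty = 2)
lines(smooth_wt, col = "red")
par(op)                    # restore the previous plotting layout
```

The same pattern is repeated for each remaining regressor; a clear curve in any smooth points to a misspecified mean function.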

g) Are there any influential cases? Use the corresponding R command  influence.measures to answer this question. Explain briefly for each point detected by R why it is influential and what it means.
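
A short sketch of how the output of influence.measures can be inspected, using the built-in mtcars data as a stand-in (the model m_demo is illustrative only). The is.inf component records which measure flagged each case.

```r
# Stand-in model on a built-in data set (not the NFL data)
m_demo <- lm(mpg ~ wt + hp + qsec, data = mtcars)

infl <- influence.measures(m_demo)

# Prints only the cases flagged as potentially influential by some measure
summary(infl)

# Rows flagged on at least one measure, with the values of every measure
# (DFBETAS per coefficient, DFFIT, covariance ratio, Cook's distance, hat value)
flagged <- which(apply(infl$is.inf, 1, any))
infl$infmat[flagged, , drop = FALSE]

# Which measure(s) triggered the flag for each such case
infl$is.inf[flagged, , drop = FALSE]
```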

4. Referring back to the whole data:

a) Use the backward elimination method (as presented in the notes for chapter 10) to select a subset regression model.

b) Use the backward elimination algorithm to select a subset regression model. Use both criterion-based procedures AIC and BIC.  Do they yield the same result?

R command for AIC procedure: step(m1_9,k=2)

R command for BIC procedure: step(m1_9, k=log(28))

If these procedures do not yield the same result, how do you decide which model to use? Find a graphical way (an added-variable plot may help). Also compare the summaries of the fitted models.
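
The sketch below runs backward elimination under both penalties and builds an added-variable plot by hand, using the built-in mtcars data as a stand-in (the models and the choice of hp as the disputed predictor are illustrative only). In step, k = 2 gives AIC and k = log(n) gives BIC; the k = log(28) above corresponds to n = 28 observations.

```r
# Stand-in full model on a built-in data set (not the NFL data)
full <- lm(mpg ~ wt + hp + qsec + drat + disp, data = mtcars)
n <- nrow(mtcars)

# Backward elimination with AIC (k = 2) and BIC (k = log(n)); trace = 0 is quiet
m_aic <- step(full, direction = "backward", k = 2, trace = 0)
m_bic <- step(full, direction = "backward", k = log(n), trace = 0)
formula(m_aic)
formula(m_bic)

# Added-variable plot for a disputed predictor (hp here), given wt and qsec:
# residuals of the response vs residuals of hp, each adjusted for the others
e_y <- residuals(lm(mpg ~ wt + qsec, data = mtcars))
e_x <- residuals(lm(hp ~ wt + qsec, data = mtcars))
av_fit <- lm(e_y ~ e_x)
plot(e_x, e_y, xlab = "hp | others", ylab = "mpg | others")
abline(av_fit)

# The slope of the added-variable line equals the coefficient of hp
# in the model that includes it
m_with <- lm(mpg ~ wt + qsec + hp, data = mtcars)
c(av_slope = unname(coef(av_fit)["e_x"]), coef_hp = unname(coef(m_with)["hp"]))
```

A clear linear trend in the added-variable plot suggests the extra predictor carries information beyond the others; a flat cloud suggests the smaller model is adequate.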
