BU.510.650 Data Analytics Assignment #2

All assignments should be submitted through our Canvas site. Please submit two files: one file (in .pdf format) containing your answers to all questions, and another file (in .R format) containing your R commands. File names should follow the pattern LastName_FirstName_AssignmentNumber; e.g., for Assignment #1 your files should be named “Thayaparan_Leann_1.docx” and “Thayaparan_Leann_1.R”.

1. Grade point average of 12 graduating MBA students, GPA, and their GMAT scores taken before entering the MBA program are given below. Use the GMAT scores as a predictor of GPA, and conduct a regression of GPA on GMAT scores.

x = GMAT	y = GPA
560	3.20
540	3.44
520	3.70
580	3.10
520	3.00
620	4.00
660	3.38
630	3.83
550	2.67
550	2.75
600	2.33
537	3.75

(a) Obtain and interpret the coefficient of determination R².

(b) Calculate the fitted value for the second person.
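
A minimal sketch of how parts (a) and (b) might be set up in R, assuming the twelve GMAT and GPA values pair up in the order they are listed. The variable names `gmat`, `gpa`, `fit`, `r2`, and `fitted_2` are illustrative and not part of the assignment.

```r
# GMAT and GPA values from the table, assumed to pair up in the order listed
gmat <- c(560, 540, 520, 580, 520, 620, 660, 630, 550, 550, 600, 537)
gpa  <- c(3.20, 3.44, 3.70, 3.10, 3.00, 4.00, 3.38, 3.83, 2.67, 2.75, 2.33, 3.75)

# Simple linear regression of GPA on GMAT
fit <- lm(gpa ~ gmat)

# (a) Coefficient of determination: share of the variance in GPA explained by GMAT
r2 <- summary(fit)$r.squared

# (b) Fitted value for the second person (GMAT = 540)
fitted_2 <- unname(fitted(fit)[2])

print(r2)
print(fitted_2)
```

In simple linear regression, R² equals the squared sample correlation between the predictor and the response, which gives a quick way to check the value reported by `summary()`.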

2. Suppose we have a data set with five predictors, X1 = GPA, X2 = IQ, X3 = Gender (1 for Female and 0 for Male), X4 = Interaction between GPA and IQ, and X5 = Interaction between GPA and Gender. The response is starting salary after graduation (in thousands of dollars). Suppose we use least squares to fit the model, and get .

(a) Which answer is correct, and why?

i. For a fixed value of IQ and GPA, males earn more on average than females.

ii. For a fixed value of IQ and GPA, females earn more on average than males.

iii. For a fixed value of IQ and GPA, males earn more on average than females provided that the GPA is high enough.

iv. For a fixed value of IQ and GPA, females earn more on average than males provided that the GPA is high enough.

(b) Predict the salary of a female with IQ of 110 and a GPA of 4.0.

(c) True or false: Since the coefficient for the GPA/IQ interaction term is very small, there is very little evidence of an interaction effect. Justify your answer.

3. This question involves the use of simple linear regression on the Auto data set (please download it from Canvas).

(a) Use the lm() function to perform a simple linear regression with mpg as the response and horsepower as the predictor. Use the summary() function to print the results. Comment on the output. For example:

i. Is there a relationship between the predictor and the response?

ii. How strong is the relationship between the predictor and the response?

iii. Is the relationship between the predictor and the response positive or negative?

iv. What is the predicted mpg associated with a horsepower of 98? What are the associated 95% confidence and prediction intervals?
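
The workflow in part (a) can be sketched as follows. Because the Auto file from Canvas is not available here, this sketch uses R's built-in `mtcars` data (columns `mpg` and `hp`) as a stand-in; with the Auto data loaded into a data frame, the formula would presumably be `mpg ~ horsepower` instead. The names `fit`, `new_obs`, `conf_int`, and `pred_int` are illustrative.

```r
# Stand-in data: built-in mtcars (mpg, hp), NOT the course's Auto data set
fit <- lm(mpg ~ hp, data = mtcars)

# Coefficient table, p-values, residual standard error, and R^2
print(summary(fit))

# Predicted mpg at a horsepower of 98, with 95% intervals
new_obs  <- data.frame(hp = 98)
conf_int <- predict(fit, newdata = new_obs, interval = "confidence", level = 0.95)
pred_int <- predict(fit, newdata = new_obs, interval = "prediction", level = 0.95)

print(conf_int)  # interval for the mean response at hp = 98
print(pred_int)  # interval for a single new observation at hp = 98
```

The prediction interval is always wider than the confidence interval at the same point, because it also accounts for the irreducible error of an individual observation.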

(b) Plot the response and the predictor. Use the abline() function to display the least squares regression line.

(c) Use the plot() function to produce diagnostic plots of the least squares regression fit. Comment on any problems you see with the fit.
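
A possible sketch of parts (b) and (c), again using the built-in `mtcars` data as a stand-in for the Auto data set (an assumption; swap in the Auto data frame and its `horsepower` column when working on the assignment).

```r
# Stand-in data: built-in mtcars (mpg, hp), NOT the course's Auto data set
fit <- lm(mpg ~ hp, data = mtcars)

# (b) Scatterplot of response against predictor, with the least squares line
plot(mtcars$hp, mtcars$mpg, xlab = "horsepower", ylab = "mpg")
abline(fit, col = "red", lwd = 2)

# (c) The four standard diagnostic plots, arranged in a 2 x 2 grid
old_par <- par(mfrow = c(2, 2))
plot(fit)
par(old_par)  # restore the previous plotting layout
```

In the diagnostic plots, a curved pattern in the residuals-versus-fitted panel suggests non-linearity, and points far to the right in the leverage panel deserve a closer look.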

4. In this exercise you will create some simulated data and will fit simple linear regression models to it. Make sure to use command set.seed(1) prior to starting part (a) to ensure consistent results. (Hint: rnorm(n, mean = a, sd = b) generates n random variables with mean a, standard deviation b, e.g., rnorm(100, mean = 10, sd = 5) returns a vector with 100 values, each of which follows a normal distribution with mean 10 and standard deviation 5.)
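
The hint can be tried out directly. The sketch below only illustrates how `set.seed()` and `rnorm()` behave; the vector names `z` and `z2` are illustrative.

```r
set.seed(1)                          # fix the random number generator state
z <- rnorm(100, mean = 10, sd = 5)   # 100 draws from a normal with mean 10, sd 5

print(length(z))  # number of draws
print(mean(z))    # sample mean, close to (but not exactly) 10
print(sd(z))      # sample standard deviation, close to (but not exactly) 5

# Resetting the seed reproduces exactly the same draws
set.seed(1)
z2 <- rnorm(100, mean = 10, sd = 5)
print(identical(z, z2))
```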

(a) Using the rnorm() function, create a vector, x, containing 100 observations drawn from a N(0, 1) distribution. This represents a feature, X.

(b) Using the rnorm() function, create a vector, ϵ, containing 100 observations drawn from a N(0, 0.25) distribution, i.e., a normal distribution with mean zero and variance 0.25.
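
One point that is easy to miss: `rnorm()` takes a standard deviation, while part (b) specifies a variance. A minimal sketch, assuming the error vector is stored under the name `eps` (R does not require this name):

```r
set.seed(1)

# (a) Feature vector: 100 draws from N(0, 1)
x <- rnorm(100)

# (b) Error vector: variance 0.25 means standard deviation sqrt(0.25) = 0.5
eps_sd <- sqrt(0.25)
eps <- rnorm(100, mean = 0, sd = eps_sd)

print(var(eps))  # sample variance, close to (but not exactly) 0.25
```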

(c) Using x and ϵ, generate a vector y according to the model

Y = −1 + 0.5X + ϵ (1)

What is the length of the vector y? What are the values of β0 and β1 in this linear model?

(d) Create a scatterplot displaying the relationship between x and y. Comment on what you observe.

(e) Fit a least squares linear model to predict y using x. Comment on the model obtained. How do β̂0 and β̂1 compare to β0 and β1?
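
A sketch of how the data from model (1) might be generated and fitted, so that the estimates can be placed next to the true coefficients. The names `eps`, `fit`, `true_beta`, and `comparison` are illustrative.

```r
set.seed(1)
x   <- rnorm(100)                              # feature, N(0, 1)
eps <- rnorm(100, mean = 0, sd = sqrt(0.25))   # noise with variance 0.25
y   <- -1 + 0.5 * x + eps                      # response from model (1)

# Least squares fit of y on x
fit <- lm(y ~ x)
print(summary(fit))

# True coefficients next to their least squares estimates
true_beta  <- c(-1, 0.5)
comparison <- cbind(true = true_beta, estimate = unname(coef(fit)))
rownames(comparison) <- c("beta0", "beta1")
print(comparison)
```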

(f) Now fit a polynomial regression model that predicts y using x and x². Is there evidence that the quadratic term improves the model fit? Explain your answer.
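
One way to approach part (f) is to fit both models and compare them. In an R formula the squared term must be wrapped in `I()`, otherwise `^` is read as a formula operator. This is a sketch under the same data-generating assumptions as model (1); the names `fit_lin` and `fit_quad` are illustrative.

```r
set.seed(1)
x   <- rnorm(100)
eps <- rnorm(100, mean = 0, sd = sqrt(0.25))
y   <- -1 + 0.5 * x + eps

fit_lin  <- lm(y ~ x)             # linear model
fit_quad <- lm(y ~ x + I(x^2))    # adds the quadratic term

# t-test on the quadratic coefficient
print(summary(fit_quad)$coefficients)

# F-test comparing the nested models
print(anova(fit_lin, fit_quad))
```

A large p-value for the `I(x^2)` row, or for the F-test, would indicate little evidence that the quadratic term improves the fit.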

(g) Repeat (a)-(f) after modifying the data generation process in such a way that there is less noise in the data. The model (1) should remain the same. You can do this by decreasing the variance of the normal distribution used to generate the error term ϵ in (b). Describe your results.

(h) Repeat (a)-(f) after modifying the data generation process in such a way that there is more noise in the data. The model (1) should remain the same. You can do this by increasing the variance of the normal distribution used to generate the error term ϵ in (b). Describe your results.

(i) What are the confidence intervals for β0 and β1 based on the original data set, the noisier data set, and the less noisy data set? Comment on your results.
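
The comparison in part (i) could be organized with a small helper that regenerates the data at a given noise level and returns the confidence intervals. The function name `simulate_ci` and the noise standard deviations 0.25 and 1 are assumptions chosen for illustration; only the original value 0.5 (variance 0.25) comes from the assignment.

```r
# Hypothetical helper: simulate model (1) with a given noise sd and
# return 95% confidence intervals for beta0 and beta1
simulate_ci <- function(noise_sd) {
  set.seed(1)
  x   <- rnorm(100)
  eps <- rnorm(100, mean = 0, sd = noise_sd)
  y   <- -1 + 0.5 * x + eps
  confint(lm(y ~ x), level = 0.95)
}

ci_less <- simulate_ci(0.25)  # less noise (assumed value)
ci_orig <- simulate_ci(0.5)   # original: variance 0.25
ci_more <- simulate_ci(1)     # more noise (assumed value)

# Width of each interval (upper bound minus lower bound)
ci_width <- function(ci) ci[, 2] - ci[, 1]

print(ci_less)
print(ci_orig)
print(ci_more)
print(rbind(less = ci_width(ci_less),
            original = ci_width(ci_orig),
            more = ci_width(ci_more)))
```

Resetting the seed inside the helper keeps x identical across noise levels, so differences in interval width come from the error variance alone.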
