BU.510.650 Data Analytics Assignment #2
1. The grade point averages (GPA) of 12 graduating MBA students and the GMAT scores they earned before entering the MBA program are given below. Use the GMAT score as a predictor of GPA, and run a regression of GPA on GMAT score.
x = GMAT | y = GPA
560 | 3.20
540 | 3.44
520 | 3.70
580 | 3.10
520 | 3.00
620 | 4.00
660 | 3.38
630 | 3.83
550 | 2.67
550 | 2.75
600 | 2.33
537 | 3.75
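As a starting point for Problem 1, the regression can be sketched in R as follows. This is a minimal sketch, not a model answer: the vector names `gmat` and `gpa` are chosen here for illustration, and the values are paired in the order they are listed above, which is an assumption about how the table is laid out.

```r
# Data transcribed from Problem 1, paired in listed order (an assumption)
gmat <- c(560, 540, 520, 580, 520, 620, 660, 630, 550, 550, 600, 537)
gpa  <- c(3.20, 3.44, 3.70, 3.10, 3.00, 4.00, 3.38, 3.83, 2.67, 2.75, 2.33, 3.75)

# Fit GPA = b0 + b1 * GMAT by ordinary least squares
fit <- lm(gpa ~ gmat)

# Coefficient estimates, standard errors, t-tests, R-squared, and F-test
print(summary(fit))

# Cross-check: the least-squares slope equals cov(x, y) / var(x)
slope_by_hand <- cov(gmat, gpa) / var(gmat)
print(slope_by_hand)
```

The `summary()` output is where the slope, its p-value, and R-squared can be read off. The hand computation is included only to show what `lm()` is estimating.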
2. Suppose we have a data set with five predictors, X1 = GPA, X2 = IQ, X3 = Gender (1 for Female and 0 for Male), X4 = Interaction between GPA and IQ, and X5 = Interaction between GPA and Gender. The response is starting salary after graduation (in thousands of dollars). Suppose we use least squares to fit the model, and get .
i. For a fixed value of IQ and GPA, males earn more on average than females.
ii. For a fixed value of IQ and GPA, females earn more on average than males.
iii. For a fixed value of IQ and GPA, males earn more on average than females provided that the GPA is high enough.
iv. For a fixed value of IQ and GPA, females earn more on average than males provided that the GPA is high enough.
(c) True or false: Since the coefficient for the GPA/IQ interaction term is very small, there is very little evidence of an interaction effect. Justify your answer.
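For reasoning about Problem 2, it may help to write the fitted model out symbolically. The sketch below uses generic coefficient symbols, since the fitted values are not reproduced here. The second line follows by setting X3 = 1 and X3 = 0 while holding X1 (GPA) and X2 (IQ) fixed, then subtracting.

```latex
\hat{y} = \hat\beta_0 + \hat\beta_1 X_1 + \hat\beta_2 X_2 + \hat\beta_3 X_3
          + \hat\beta_4 X_1 X_2 + \hat\beta_5 X_1 X_3

\hat{y}_{\text{female}} - \hat{y}_{\text{male}} = \hat\beta_3 + \hat\beta_5 X_1
```

The sign of the female-minus-male difference can therefore depend on GPA. Note also that the size of a coefficient depends on the units of its predictor, so evidence for an interaction is judged from its standard error and p-value, not from its magnitude alone.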
3. This question involves the use of simple linear regression on the Auto data set (please download it from Canvas).
i. Is there a relationship between the predictor and the response?
ii. How strong is the relationship between the predictor and the response?
iii. Is the relationship between the predictor and the response positive or negative?
iv. What is the predicted mpg associated with a horsepower of 98? What are the associated 95% confidence and prediction intervals?
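The mechanics of Problem 3 can be sketched as below. Because the Auto file from Canvas is not available here, the sketch uses `mtcars`, which ships with base R, as a stand-in. Its `hp` column plays the role of horsepower. The object names `fit`, `new_obs`, `conf_int`, and `pred_int` are illustrative, and the numbers it produces are not the answers for the Auto data.

```r
# Stand-in data: mtcars is NOT the Auto data set. With the Canvas file,
# swap in the Auto data frame and its horsepower column.
fit <- lm(mpg ~ hp, data = mtcars)

# i-iii: slope sign, its t-test p-value, R-squared, and residual std. error
print(summary(fit))

# iv: prediction at horsepower = 98 with 95% intervals
new_obs <- data.frame(hp = 98)
conf_int <- predict(fit, newdata = new_obs, interval = "confidence", level = 0.95)
pred_int <- predict(fit, newdata = new_obs, interval = "prediction", level = 0.95)

print(conf_int)  # interval for the mean response at hp = 98
print(pred_int)  # interval for a single new observation at hp = 98
```

Both intervals share the same point prediction. The prediction interval is wider because it also accounts for the noise in an individual observation.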
4. In this exercise you will create some simulated data and fit simple linear regression models to it. Make sure to call set.seed(1) before starting part (a) to ensure consistent results. (Hint: rnorm(n, mean = a, sd = b) generates n random values from a normal distribution with mean a and standard deviation b; e.g., rnorm(100, mean = 10, sd = 5) returns a vector of 100 values, each drawn from a normal distribution with mean 10 and standard deviation 5.)
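The workflow that the hint describes can be sketched as follows. All parameter values here (sample size, means, standard deviations, and the intercept 2 and slope 3 of the true line) are hypothetical, chosen only for illustration. They are not the values the exercise's parts ask for.

```r
set.seed(1)  # fix the random number stream so results are reproducible

# Illustrative parameters only (hypothetical, not from the assignment)
n   <- 100
x   <- rnorm(n, mean = 0, sd = 1)    # predictor
eps <- rnorm(n, mean = 0, sd = 0.5)  # noise term
y   <- 2 + 3 * x + eps               # hypothetical true line

# Fit the simple linear regression and compare estimates to the true values
fit <- lm(y ~ x)
print(coef(fit))
```

Re-running the script from `set.seed(1)` onward reproduces the same `x`, `eps`, and fitted coefficients, which is why the seed is set before any random draws.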