STA 108 Spring 2025
Homework 1 - Due Friday, April 11th
“Handwritten” Homework
These problems may be completed without the use of R
1. Assume you are working with the normal linear regression model,
Yi = β0 + β1Xi + ϵi
Assume that ϵi ∼ N(0, σ²). Further, assume the population parameters are known, with β0 = 150, β1 = 5, and σ² = 225.
(a) Find E{Y} at the value X = 10.
(b) Find σ²{Y} at the value X = 10.
(c) Find the probability that a value for Y at X = 20 falls between 223 and 235.
(d) What is the probability we observe a value at least 2 standard deviations above the average (at any fixed level of Xi)?
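As a reminder, the standard facts behind problem 1 can be written out as below. This is a sketch in general notation only: σ² denotes the error variance, and Φ (a symbol assumed here, not used in the handout) denotes the standard normal CDF. The bounds a and b are placeholders.

```latex
% At a fixed level X_i, the normal error model gives
Y_i \sim N\!\left(\beta_0 + \beta_1 X_i,\; \sigma^2\right),
\qquad
E\{Y_i\} = \beta_0 + \beta_1 X_i,
\qquad
\sigma^2\{Y_i\} = \sigma^2 .

% Probabilities follow by standardizing with Z = (Y_i - E\{Y_i\}) / \sigma:
P(a < Y_i < b)
  = \Phi\!\left(\frac{b - E\{Y_i\}}{\sigma}\right)
  - \Phi\!\left(\frac{a - E\{Y_i\}}{\sigma}\right).
```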
2. An estimated regression line was calculated for 15 firms in the food industry, where Y is the firm's percentage increase in earnings per share and X is the firm's age in years. The estimated regression line is:
Ŷ = −17 + 0.617X
where SSE = 412.602, the minimum X value is 38.2 years, and the maximum is 51.6 years.
Data Source: “Statistical Methods and Data Analysis”, 7th edition, Cengage Learning, Ott & Longnecker
(a) Interpret the slope in terms of the problem (referencing the units of the problem).
(b) Interpret the intercept in terms of the problem (if appropriate).
(c) A new firm aged 45 approaches you and asks for an estimate of its percentage increase in earnings per share. What is your estimate?
(d) If the firm from (c) had an actual percentage increase of 11.786, calculate the error associated with your prediction in (c). Did you overestimate or underestimate its percentage increase in earnings per share?
(e) Another new firm aged 10 years approaches you for an estimate. Should you use the given model to give an estimate? Why or why not (explain).
(f) What is your estimate for the standard deviation of the errors? Interpret it in terms of the problem.
3. A small college selected 120 students at random from the new freshman class and measured their GPA at the end of their freshman year (Y), and their ACT score (X). The estimated regression line was:
Ŷ = 2.114 + 0.03883X
Where SSE = 45.818. Data Source: “Applied Linear Statistical Models”, Kutner, Nachtsheim, Neter, & Li.
(a) Calculate s_e, the estimated standard deviation of the errors.
(b) Interpret the slope in terms of the problem.
(c) For a student who had an ACT score of 26 and a GPA of 3.084, calculate your estimated error.
(d) How many estimated standard deviations of the errors was your prediction in (c) away from the actual data point yi? Would you consider this error large, or small? Explain.
(e) A researcher wants to use this model on students at a large university. Would you recommend this? Why or why not (explain).
4. Answer the following questions with TRUE or FALSE. It is good practice to explain your answers.
(a) For a particular dataset, if we add data points whose error from the regression line is zero, the equation of the line will not change.
(b) If the sample estimate of β1 is zero, this suggests there is a strong linear relationship between X and Y.
(c) A calculated error ei has the units corresponding to X.
(d) If the sample data lies exactly on a straight line, then the value of SSE will be 0.
5. Show the following properties (Note: A review of the properties of summation has been added online, under “Files”):
(a) ∑ ei = 0
(b) If you plug X̄ (the sample mean) into the estimated regression line, the resulting estimate for Y will be Ȳ (the sample mean).
(c) Using the equation ei = Yi − Ŷi, find E{ei}.
(d) Show that ∑ Yi = ∑ Ŷi
R Homework
These problems must be completed using R. Attach an appendix of your commands that you used to get your results.
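The basic R workflow these problems rely on can be sketched as follows. The data here are simulated and purely hypothetical (the names `dat`, `x`, and `y` and the generating values are assumptions for illustration, not the course datasets); only the sequence of calls is the point.

```r
# Hypothetical data for illustration only; not one of the course datasets.
set.seed(108)
x <- runif(30, min = 0, max = 10)
y <- 2 + 0.5 * x + rnorm(30, mean = 0, sd = 1)
dat <- data.frame(x = x, y = y)

# Fit the simple linear regression of y on x.
fit <- lm(y ~ x, data = dat)

# Scatterplot with the estimated regression line overlaid.
plot(dat$x, dat$y, xlab = "x", ylab = "y")
abline(fit)

# Estimated intercept and slope.
b <- coef(fit)

# Residuals, SSE, and the estimated standard deviation of the errors.
e   <- residuals(fit)
sse <- sum(e^2)
s_e <- sqrt(sse / df.residual(fit))  # df.residual(fit) is n - 2 here
```

For a real dataset stored as a CSV file, `read.csv()` would replace the simulation step, and the column names in the formula would change accordingly.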
I. Online you will find the dataset Lung.csv, which has two columns:
Column 1: age: The child's age (X) in years.
Column 2: FEV: A measure of lung capacity (Forced Exhalation Volume) in liters (Y).
We believe that FEV has a linear relationship with a child's age.
Data Source: Kahn, Michael (2005). “An Exhalent Problem for Teaching Statistics”, Journal of Statistics Education, 13(2)
(a) Plot a scatterplot, as well as the estimated regression line.
(b) What is a notable feature of the plot from (a) based on the units of X?
(c) Does it appear that the variance of the data remains constant? Explain.
(d) Write down the estimated regression line.
(e) Find the value of s_e.
II. Continue with the Lung dataset.
(a) What is the expected average difference in FEV for a child aged 10 and a child aged 15?
(b) Estimate the FEV for a child aged 8.
(c) Find the rows of the dataset which result in the highest absolute errors (the top five).
(d) Someone states that according to regression, the average error of many new predictions should be zero. Do you agree or disagree? Explain.
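Two of the operations above, estimating the response at a given X and ranking rows by absolute error, might look like the sketch below. It again uses simulated, hypothetical data (`dat`, `x`, `y`, and the value 4 are assumptions chosen for illustration), so it shows the technique rather than any answer.

```r
# Hypothetical data for illustration only; not one of the course datasets.
set.seed(108)
dat <- data.frame(x = runif(30, min = 0, max = 10))
dat$y <- 2 + 0.5 * dat$x + rnorm(30)
fit <- lm(y ~ x, data = dat)

# Estimated response at a new x value (4 is an arbitrary choice).
new_x <- data.frame(x = 4)
y_hat <- predict(fit, newdata = new_x)

# Row indices of the five largest absolute residuals, largest first.
abs_e <- abs(residuals(fit))
top5  <- order(abs_e, decreasing = TRUE)[1:5]

# The corresponding rows of the data, with their residuals attached.
cbind(dat[top5, ], residual = residuals(fit)[top5])
```

The column name inside `newdata` has to match the predictor name used in the model formula; otherwise `predict()` cannot find the variable.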
III. Online you will find the dataset fitness.csv, which has two columns:
Column 1: Tread: The typical amount of time training at high intensity on the treadmill (X).
Column 2: Run: The time it took to complete a 10-kilometer run (in minutes) (Y).
The goal is to see if an athlete's treadmill time has a linear relationship with their 10-kilometer run time. Data Source: “Statistical Methods and Data Analysis”, 7th edition, Cengage Learning, Ott & Longnecker
(a) Plot a scatterplot, as well as the estimated regression line.
(b) Write down the equation for the estimated regression line.
(c) Interpret the slope in terms of the problem.
(d) Based on your plot from (a), would it make sense to interpret the intercept in this case? Explain.
(e) Find the value of SSE.
IV. Continue with the Fitness dataset.
(a) If a subject's high-intensity treadmill time decreased by two minutes, by how much would their 10-kilometer run time be expected to increase?
(b) Find the value of
(c) Find the estimated error associated with the first row of your dataset.
(d) Estimate the average value of Y, when X = 8.5.