STA442H1: Methods of Applied Statistics I

Homework 3

Due March 8, 2024 4:59pm ET

Please submit your assignment as a PDF through Crowdmark. Ideally, make an R Markdown document and output a PDF that shows your work. Please include your code (with comments!) with your assignment. Please list any references you used with a full citation. A friendly reminder that AI should not be used to help with this assignment. Feel free to discuss the homework questions with others, but your work and write-up should be your own. Ask questions about the homework on Piazza and/or in the TA and instructor office hours.

Download the (astronomy!) data file M sigma.csv that is posted on Quercus in the homework section (this example data set was compiled by Hible et al 2017). Each row in the data set is an observation of a galaxy, and each galaxy is measured only once. The columns are as follows:

logsigma → log σ/σ0, where σ is the central velocity dispersion of stars in the central region of the galaxy and σ0 = 200 km/s

errlogsigma → measurement error for above

logMbh → log M•/M⊙, where M• is the mass of the galaxy’s central supermassive black hole and M⊙ is one solar mass unit

errlogMbh → measurement error for above

Type → categorical variable for the type of galaxy

1. For the following questions, define the covariate as x = log σ/σ0 and the response as y = log M•/M⊙. Ignore the measurement uncertainties in x and ignore the type of galaxy.

(a) Fit a linear regression of y on x using lm. Make a plot that shows the fit and the 90% confidence region, and show the output from the summary statistics. You may use the errors in y, but this is not required. (3 marks)

(b) Set up a linear regression using a Bayesian approach that will estimate the posterior distribution for the intercept and slope parameters (β0, β1). Modify the Metropolis code we wrote in class on Feb. 14 to accept a vector of parameters. Write a target function for the linear regression. Define a Gaussian likelihood assuming that the standard deviations of the errors in y are known. Define the prior distribution and justify your choice. If you would like to define an informative prior, then see seminal papers on the M•-σ relation, e.g., Ferrarese et al 2000, Gebhardt et al 2000, among others. (5 marks)
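
As a hedged sketch of what the target in (b) can look like: assume independent Gaussian errors with known standard deviations s_i (taken here to be the errlogMbh values) and a prior p(β0, β1) of your choosing. The symbols s_i, n, and θ = (β0, β1) are notation introduced for this sketch, not part of the assignment. Under those assumptions, the unnormalized log posterior and the Metropolis acceptance probability for a symmetric proposal θ' are:

```latex
\log p(\beta_0, \beta_1 \mid y)
  = \text{const}
  - \frac{1}{2} \sum_{i=1}^{n} \frac{\left(y_i - \beta_0 - \beta_1 x_i\right)^2}{s_i^2}
  + \log p(\beta_0, \beta_1),
\qquad
\alpha(\theta \to \theta') = \min\!\left(1,\; \frac{p(\theta' \mid y)}{p(\theta \mid y)}\right)
```

Because the s_i are treated as known, the terms involving only s_i and 2π are absorbed into the constant and cancel in the acceptance ratio.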

(c) Run the Bayesian analysis using your code from the previous question. Ensure that you sample at least n_eff = 500. Report the summary statistics for (β0, β1), and compare to the fit using lm. Discuss whether you think the chains have converged to the target distribution using diagnostics and traceplots. (6 marks)
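
For reference, one common definition of the effective sample size of a single chain is given below. Here N (the number of retained draws) and ρ_k (the lag-k autocorrelation of the chain) are notation assumed for this sketch; the assignment does not specify which estimator to use, and software packages differ in how they truncate the sum.

```latex
n_{\mathrm{eff}} = \frac{N}{1 + 2 \sum_{k=1}^{\infty} \rho_k}
```

A strongly autocorrelated chain therefore needs many more than 500 draws to reach n_eff = 500, and the value should be checked separately for β0 and β1.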

(d) Plot the inferred linear relationship from your Bayesian analysis using the mean of the posterior samples. Add to this figure 100 lines (use a semi-transparent colour) from the posterior samples, to give some indication of the uncertainty in the fit. (2 marks)

(e) Plot again the mean linear relationship along with the inferred 90% credible region. The credible region for a function (e.g., a line) is often defined pointwise, that is, for any x, what is the credible region for the predicted y. (3 marks)
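
The pointwise definition can be written out as follows. This is a sketch that assumes S posterior draws indexed by s and an equal-tailed interval; a highest-density interval would be another valid choice. The symbols μ^(s)(x), S, and Q_p are notation introduced here.

```latex
\mu^{(s)}(x) = \beta_0^{(s)} + \beta_1^{(s)} x, \quad s = 1, \dots, S,
\qquad
\mathrm{CR}_{90\%}(x) = \left[\, Q_{0.05}\!\left\{\mu^{(s)}(x)\right\},\; Q_{0.95}\!\left\{\mu^{(s)}(x)\right\} \,\right]
```

Here Q_p denotes the empirical p-quantile over the S draws at a fixed x; evaluating the interval on a grid of x values traces out the band.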
