Department of Mathematics and Statistics STAT3405/STAT4066
Important: This assignment is assessed. Your work for this assignment must be submitted by 9:00pm on Sunday, 3 November 2024.
The expectations for a submission are:
- The questions are answered in complete sentences.
- The answers to the questions are submitted via LMS.
- Numerical answers are rounded to an appropriate number of digits.
- Code used to answer the questions is submitted as an attached R notebook (with extension .Rmd) in your LMS submission.
Marks will be awarded for:
- the correctness of the answers and that they are given in complete sentences;
- the correctness of the code, i.e. how easy it is to read it; and
- how easy it is to run your code, i.e. to turn your R notebook into a PDF file.
AI: Be reminded that the use of AI is not permitted for this assessment.
Plagiarism: You are encouraged to discuss assignments with other students and to solve problems together. However, the work that you submit must be your sole effort (i.e. not copied from anyone else). If you are found guilty of plagiarism you may be penalised. You are reminded of the University’s policy on ‘Academic Conduct’ and ‘Academic Misconduct’ (including plagiarism):
http://www.student.uwa.edu.au/learning/resources/ace/conduct
https://www.uwa.edu.au/students/study-success/studysmarter
Task 1. Here we revisit Task 1 from the second assignment.
Recall, the file Golf.csv, available from LMS, contains the number of attempts (m) and successes (y) of golf putts, by distance from the hole in feet (distance), for a sample of professional golfers.
After downloading this file to the directory in which your R notebook is, you should be able to read the file using the following command:
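A minimal sketch of such a command, assuming Golf.csv is an ordinary comma-separated file with a header row. The data frame name `golf` and the variable `golf_file` are hypothetical choices made for this illustration.

```r
# Assumption: Golf.csv is comma-separated, has a header row and sits in
# the same directory as the R notebook.
golf_file <- "Golf.csv"

if (file.exists(golf_file)) {
  # 'golf' is a hypothetical name for the resulting data frame.
  golf <- read.csv(golf_file)
  # The three variables described above should be present.
  stopifnot(all(c("distance", "m", "y") %in% names(golf)))
  str(golf)
} else {
  message("Golf.csv was not found in ", getwd())
}
```

The `file.exists()` guard only makes a wrong working directory easier to spot when the notebook is knitted; it is not required for reading the file.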
For this exercise we will model the observed yi as realisations of independent binomially distributed random variables Yi, i = 1, . . . , 19, where the success probability depends on the distance from the hole. We will denote this distance by xi below, but it is the variable distance in the data file.
In this exercise we will consider the following model for these data:
Here β0 and β1 are two regression parameters and we will refer to them jointly as β.
Hint: Consider how the odds change when the distance to the hole doubles.
(b) Implement the above model in your preferred probabilistic programming language. In the answer that you write into the submission window you should clearly state the priors that you put on β0 and β1. The code must be contained within the R notebook that you submit.
Comment in a sentence or two whether you think the model is adequate.
(e) Use the test quantity
- a replicate data set y^rep from the posterior predictive distribution is drawn,
- T(y, β) is evaluated; and
- T(y^rep, β) is evaluated.
The relevant code must be contained within the R notebook that you submit.
Based on this posterior predictive check, do you think the model is suitable for these data? Discuss in a sentence or two.
Task 2. The complete data set on the survey that was done on bicycle and other vehicular traffic in the neighbourhood of the campus of the University of California, Berkeley, is available on LMS in the file bicycles.csv.
Remember, these data are counts of bicycles and other vehicles in one hour in each of 10 city blocks in each of six categories. That is, sixty city blocks were selected at random; each block was observed for one hour, and the numbers of bicycles and other vehicles travelling along that block were recorded. The sampling was stratified into six types of city blocks: busy, fairly busy and residential streets (streets were classified before the data were gathered), and with and without bike routes. The data for two of the residential blocks were lost.
After downloading this file to the directory in which your R notebook is, you should be able to read the file using the following command:
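A minimal sketch of such a command, assuming bicycles.csv is an ordinary comma-separated file with a header row. The data frame name `dat` follows the text; the variable `bicycles_file` is a hypothetical helper for this illustration.

```r
# Assumption: bicycles.csv is comma-separated, has a header row and sits
# in the same directory as the R notebook.
bicycles_file <- "bicycles.csv"

if (file.exists(bicycles_file)) {
  dat <- read.csv(bicycles_file)
  print(names(dat))  # variable names in the data frame
  print(dim(dat))    # number of rows and columns
} else {
  message("bicycles.csv was not found in ", getwd())
}
```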
The data frame dat should now contain the following variables:
(a) Define the following indicator variables:
Write down R commands that calculate the vectors of observed x1, x2 and x3. Also write down a command that determines mi, the total number of observed vehicles in each street.
(2) What are your Bayesian estimates for β0, β1, β2, β3, β4, β5 and σα?
(3) Looking at the signs of the estimates β1, β2, β3, β4 and β5, are they what you would have expected? Do these estimates make sense? Comment briefly.
Task 3. Here we revisit Task 1 from the second set of computer lab problems and Task 2 from the fourth set of computer lab problems.
Recall, the file pregnancies.csv, available from LMS, contains information on women whose pregnancies were planned. The women were classified as smokers and nonsmokers, and the cycle in which each woman fell pregnant was recorded. The data file contains the tabulated data.
After downloading this file to the directory in which your R notebook is, you should be able to read the file using the following command:
The aim of this task is to explore whether a beta-geometric model is appropriate for these data. To do so, and to handle the issue of censoring more easily, we will treat the data as following a multinomial distribution Mult_13(π, N) where the probability vector π is determined by a geometric model.
Then we will model y^S and y^NS as realisations of random vectors Y^S and Y^NS that follow multinomial distributions and are independent of each other. The full model specification is:
As each component of a multinomial random vector has marginally a binomial distribution, we might consider using a Pearson’s χ²-style statistic to test whether our model is adequate. Specifically, consider the test quantity: