STA220H1 The Practice of Statistics I (Fall 2024)
Assignment 2 Instructions
Due Date: Nov. 22 at 11:59pm on Crowdmark
Instructions
Submission Format and Instructions
Your PDF file will need to show (1) R code, (2) R output/figures, and (3) your written answers.
Here are some suggested ways you can create your final submission:
- Use Microsoft Word to type out your answers. Screenshot your R output and place these images throughout the document. For the R code, either copy/paste as text or screenshot.
- Use an app like Notability, OneNote, etc., where you can write/type your answers and include screenshots of your R code and output.
- Use RMarkdown and knit to a PDF. Alternatively, you can knit to an HTML file and then save it as a PDF.
How you create the final file is up to you, as long as it is clear and organized. You don’t want the TA to be frustrated while marking your work!
Use of Built-In Functions in R
Late Penalty
As described in the course syllabus, late work will be penalized 20% per day.
Data for this Assignment
The following variables are provided in the data:
In this assignment, you may wish to convert the ‘price’ variable to numeric format first, so that it can be used in further analysis. For example, you can create a new variable ‘price_numeric’ by removing the dollar sign and commas from the ‘price’ variable and converting the result to numeric format.
The calculations in the following questions are based on the dataset after removing missing values (i.e., the ‘Airbnb_data_cleaned’ dataset).
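A minimal sketch of this conversion, assuming the prices are stored as character strings such as "$1,250.00". The data frame `listings` and its values are made up for illustration; only the column names `price` and `price_numeric` come from the text above.

```r
# Hypothetical toy data; the real dataset's name and contents are not shown here.
listings <- data.frame(
  price = c("$85.00", "$1,250.00", "$40.00"),
  stringsAsFactors = FALSE
)

# Remove every dollar sign and comma, then convert the remaining text to numeric.
# Inside a character class ([...]), "$" is matched literally.
listings$price_numeric <- as.numeric(gsub("[$,]", "", listings$price))

print(listings$price_numeric)
```

If a price string contains any other non-numeric character, `as.numeric()` returns `NA` with a warning, so it may be worth checking for new missing values after the conversion.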
Question 1 (10 marks)
We are interested in the proportion of listings that are instant bookable. Answer the following questions using the “Airbnb_data_cleaned” dataset. For calculations that you complete in R, show your code and output. (Please report your answers to 3 decimal places for this question.)
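As a rough illustration of the kind of calculation involved, the sketch below computes a sample proportion and its estimated standard error, rounded to 3 decimal places. The data frame `toy`, the column name `instant_bookable`, and its TRUE/FALSE coding are assumptions for illustration and may not match the variable in the actual dataset.

```r
# Hypothetical toy data; the column name and its coding are assumptions,
# not taken from the assignment's variable list.
toy <- data.frame(
  instant_bookable = c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE)
)

n <- nrow(toy)

# The mean of a logical vector is the proportion of TRUE values.
p_hat <- mean(toy$instant_bookable)

# Estimated standard error of the sample proportion.
se_hat <- sqrt(p_hat * (1 - p_hat) / n)

print(round(p_hat, 3))
print(round(se_hat, 3))
```

If the real variable is stored as text (for example "t" and "f"), it would first need to be compared against the "yes" value to produce a logical vector.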
Question 2 (10 marks)
State the hypotheses, calculate the test statistic and p-value using R, and state the relevant conclusion. (4 marks)
c) Construct a 90% confidence interval for the average price of a listing per night and interpret the interval. Show your intermediate calculation results, including the standard error and the critical values, using R. (4 marks)
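The intermediate steps for an interval of this kind can be sketched as follows. This example assumes a t-based interval for a mean and uses a made-up vector `prices`; the course may expect a different critical value or method, so treat it as an illustration rather than the required approach.

```r
# Made-up nightly prices for illustration only.
prices <- c(80, 120, 95, 150, 60, 110, 200, 75)

n     <- length(prices)
x_bar <- mean(prices)            # sample mean
se    <- sd(prices) / sqrt(n)    # standard error of the mean

# A 90% two-sided interval leaves 5% in each tail, so use the 0.95 quantile.
t_crit <- qt(0.95, df = n - 1)

ci <- c(lower = x_bar - t_crit * se, upper = x_bar + t_crit * se)
print(se)
print(t_crit)
print(ci)

# Cross-check against the built-in function.
print(t.test(prices, conf.level = 0.90)$conf.int)
```

Computing the standard error and critical value by hand makes the intermediate results visible, while `t.test()` serves as a check that the manual interval is correct.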
Question 3 (30 marks)
In this question, you are going to write up a short analysis based on the Airbnb dataset. The analysis should target a statistical question that you raise about the dataset.
• You are encouraged to use headings to organize your work.
Question 3 Rubric
| Criterion | Inadequate | Fair | Good | Excellent |
|---|---|---|---|---|
| Writing Quality (10 marks) | 0-4 marks: Some written components are not included. Writing is unclear. | 5-6 marks: Most written components are provided. Written components contain major issues. The descriptions do not accurately describe the methods. Writing is somewhat unclear. | 7-8 marks: All the written components are provided and show that the student is able to properly communicate statistical concepts. Writing is generally clear. | 9-10 marks: All the written components are provided. Student exceeds expectations in statistical communication. Writing is clear and compelling. |
| Plots (5 marks) | 0-2 marks: Does not meet the requirement of 1+ plots. | 3 marks: Required plots are provided, but plots do not highlight the important information related to parameters of interest. | 4 marks: Required plots are provided and mostly show that the student is able to create a plot relevant for the situation. Plots are labelled properly. | 5 marks: Required plots are provided, and a lot of thought was put into creating the plot. Plots are interesting, compelling, and communicate well to the viewer. |
| Hypothesis Tests and Confidence Intervals (5 marks) | 0-2 marks: Very few of the required hypothesis tests and confidence intervals are provided. Contains major errors. | 3 marks: Some of the required hypothesis tests and confidence intervals are provided. Errors with the set-up, calculations, and/or interpretations. | 4 marks: The required hypothesis tests and confidence intervals are provided. Interpretations are provided and correct. | 5 marks: The required hypothesis tests and confidence intervals are provided. Interpretations are provided. Conclusions are well written and provide an interesting discussion to the analysis. |
| Appendix, R code (5 marks) | 0-2 marks: R code is not shown or has many major errors. | 3 marks: R code is somewhat provided but is difficult to follow. | 4 marks: R code is provided but contains errors or is hard to follow. | 5 marks: R code is provided. Appropriate functions and/or calculations are used. Useful comments are used to make them easy to read. |
| Formatting and Organization (5 marks) | 0-2 marks: Poorly organized and difficult to follow. | 3 marks: Sometimes difficult to follow. Code may appear in body of the text. | 4 marks: Organized and formatted well. Code does not appear in the body of the text. | 5 marks: Very well organized and presentable. Code does not appear in the body of the text. Proper headings are used. |