STATS 2DA3 Fall 2024 ASSIGNMENT 1


Submit through Crowdmark.
Due before 5pm on Tuesday, September 24th.

Assignments submitted up to 24 hours late will incur a 30% penalty.
Assignments submitted more than 24 hours late will receive a zero grade.
Answer all questions (read the “Assignment Standards” at the end of the assignment).
Not all questions carry equal marks.
All graphs must be labelled (including axes).

Plagiarism-detection software, which checks for similarities to other students' work as well as to online sources, may be used.

Assignments should be done individually, i.e. do not collaborate with other students on your assignment.

1. (10 MARKS) Using the iris dataset which is available in R, answer the following questions:

  • Use one or two lines of R code to display how many rows and columns are in the dataset (i.e. do not just output all observations in the dataset; write some code that will output the required information).
  • Which variables are categorical and which are continuous?
  • Graph 1: Using the ggplot function, make a scatterplot of “Petal.Length” against “Sepal.Length” (putting “Sepal.Length” on the x-axis).
    • Make the data points blue.
    • Label the x-axis Sepal Length.
    • Label the y-axis Petal Length.
    • Label the graph Iris Data.
  • Graph 2: Use ggplot to make a bar chart (geom_bar) displaying “Species”. Set “fill” to “Species” (i.e. each species of iris should be a different colour on the graph).
  • Display graphs 1 and 2 in one image using R code (i.e. do not just screen grab the 2 images and combine them).
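
The steps in Question 1 might be sketched in R roughly as follows. This is a minimal illustration, not a model answer. It assumes the ggplot2 and gridExtra packages are installed; using gridExtra's grid.arrange to combine the plots is an assumption, since the assignment does not name a package for that step (patchwork or cowplot would serve equally well). The axis labels and title on the bar chart are illustrative choices.

```r
library(ggplot2)
library(gridExtra)  # assumption: one of several packages that can combine plots

# Number of rows and columns, without printing the observations
dim(iris)

# Class of each variable (numeric columns are continuous, factors are categorical)
sapply(iris, class)

# Graph 1: scatterplot with Sepal.Length on the x-axis and blue points
g1 <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point(colour = "blue") +
  labs(x = "Sepal Length", y = "Petal Length", title = "Iris Data")

# Graph 2: bar chart of Species, one fill colour per species
# (labels here are illustrative, not specified by the assignment)
g2 <- ggplot(iris, aes(x = Species, fill = Species)) +
  geom_bar() +
  labs(x = "Species", y = "Count", title = "Iris Species Counts")

# Display both graphs side by side in a single image
grid.arrange(g1, g2, ncol = 2)
```

Setting the colour outside aes() applies a fixed colour to every point, whereas mapping fill inside aes() lets ggplot assign one colour per level of Species.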
2. (3 MARKS) Consider the plot below; it displays information on Vehicle Type and on the associated drive train. There are 3 different types of drive train: 4 = four wheel drive, f = front wheel drive, r = rear wheel drive.

  • Which Vehicle Type has the fewest observations associated with it in the dataset?
  • For “suv” vehicles, what is the majority drive train type?
  • For “compact” vehicles, which of the 3 drive train types occurs least often?
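
The plot referred to in Question 2 is not reproduced in this copy. The vehicle types named in the questions (“suv”, “compact”) and the drive train codes (4, f, r) match the class and drv variables of the mpg dataset shipped with ggplot2, so a comparable plot can be sketched under that assumption. Both the choice of dataset and the choice of a stacked bar chart are guesses; the original figure may have looked different.

```r
library(ggplot2)

# Assumption: the plot is based on ggplot2's mpg dataset, where
# `class` is the vehicle type and `drv` is the drive train (4, f, r)
p <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar() +
  labs(x = "Vehicle Type", y = "Count", fill = "Drive Train",
       title = "Vehicle Type by Drive Train")
print(p)

# Cross-tabulation of the same counts, useful for reading exact values
counts <- table(mpg$class, mpg$drv)
counts
```

The cross-tabulation gives the exact counts behind each bar segment, which is easier to read than the graph when two segments are close in height.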

3. (7 MARKS)

For the Arthritis dataset in the vcd package [there are 3 different levels of improvement (None, Some or Marked) that a patient can experience after receiving 1 of 2 medical treatments (Placebo or Treated)], perform the following tasks:

  • Create a Double Decker plot, displaying “Improved” as a function of “Treatment” and “Sex”. (“Treatment” should be on the lowest x-axis.) Colour the “Improved” variable so that each level is a different colour. 
  • For female patients in the Treated group, what was the most reported level of improvement?
  • For male patients in the Treated group, what was the least reported level of improvement?
  • Using ggplot, make a bar chart (geom_bar) displaying “Treatment”. Colour (“fill”) the “Treatment” variable with respect to the “Improved” variable.
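
The plotting tasks in Question 3 might be sketched as below. This is a minimal illustration under stated assumptions: the colours are arbitrary, the plot labels are illustrative, and the cross-tabulation is added only as a convenient way to check counts read from the plot. The order of the terms on the right-hand side of the formula controls how the label rows are nested, so swap Treatment and Sex if “Treatment” does not end up on the lowest row.

```r
library(grid)     # gpar(), used to set the fill colours
library(vcd)      # Arthritis dataset and doubledecker()
library(ggplot2)

# Illustrative colours, one per level of Improved (None, Some, Marked)
improved_cols <- c("grey80", "skyblue", "navy")

# Doubledecker plot of Improved as a function of Treatment and Sex
doubledecker(Improved ~ Treatment + Sex, data = Arthritis,
             gp = gpar(fill = improved_cols))

# Counts behind the plot, for checking the most and least reported levels
counts <- xtabs(~ Treatment + Sex + Improved, data = Arthritis)
ftable(counts)

# Bar chart of Treatment, filled by level of Improved
p <- ggplot(Arthritis, aes(x = Treatment, fill = Improved)) +
  geom_bar() +
  labs(x = "Treatment", y = "Count", title = "Arthritis Data")
print(p)
```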

Assignment Standards

  • Answer each question. Do not just provide code. Any graphs must be rendered and reproduced in the report.
  • LaTeX is strongly recommended but not strictly required. The use of R Markdown in RStudio is also recommended.
  • Submit your assignment as one .pdf document. All R code should be included and organized either at the end of the assignment or inline (if using R Markdown).
  • Approximately eleven-point font (Times or similar) must be used, with around 1.5 line spacing and margins of at least 1 inch all around.
  • Do not include a title page. The title and your name should be printed at the top of the first page.
  • Various tools, including publicly available internet tools, may be used by the instructor to check the originality of submitted work.
  • Students are not permitted to use generative AI in this course. In alignment with McMaster academic integrity policy, it “shall be an offence knowingly to . . . submit academic work for assessment that was purchased or acquired from another source”. This includes work created by generative AI tools. The policy also states: “Contract Cheating is the act of ‘outsourcing of student work to third parties’ (Lancaster & Clarke, 2016, p. 639) with or without payment.” Using generative AI tools is a form of contract cheating. Charges of academic dishonesty will be brought forward to the Office of Academic Integrity.
