STATS 220 Data Technologies

SEMESTER ONE, 2023

Anna obtained data about the Panopto lecture recordings for STATS 220 during Semester One 2023. She used R code to produce the output below with the data frame panopto_data.

Use the output above to answer the following questions:

The function used to produce the output above was

(view(), tibble(), output(), print(), glimpse()).

How many of the variables in the data frame panopto_data are numeric?

Maximum marks: 2

2 Describe the goal of the code below and how the functions used achieve this goal.

Maximum marks: 2

Anna was interested in identifying the top ten recordings, in terms of the total minutes delivered. She used the data frame panopto_data to create a new data frame called total_minutes_delivered, which is shown below.

The code below produces the data frame shown above, but some parts of the code have been replaced with numbers, e.g. {1}.

Use the boxes below to select or enter the missing function, operator, argument name or value.

{1}

(panopto_data, total_minutes_delivered, total_minutes, session_name, data)

{2}

(<-, %>%, ->, +, ==)

{3}

(total_minutes, panopto_data, session_name, data, total_minutes_delivered)

{4}

(count, select, group_by, summarise, filter)

{5}

(session_name, timestamp, minutes_delivered, student_num, lecture_recording)

{6}

(summarise, filter, count, slice, mutate)

{7}

(median, count, mean, max, sum)

{8}

(session_name, minutes_delivered, student_num, timestamp, total_minutes_delivered)

{9}

(filter, sort, reorder, arrange, order)

{10}

(min, desc, sort, max, asc)

{11}

(mutate, select, slice, filter, rename)

{12}

Maximum marks: 6

The data frame panopto_data was manipulated to create a new data frame called students_per_recording, which is shown below.

The code used to create students_per_recording is shown below.

The data frame students_per_recording was then used to create the visualisation below.

The code below was used to create the visualisation above, but some parts of the code have been replaced with numbers.

Use the boxes below to select or enter the missing function, operator, argument name or value.

{1}

(<-, +, %>%, ->, ==)

{2}

(tidyverse, facet_wrap, plot, ggplot, geom_bar)

{3}

(students_per_recording, panopto_data, num_students, session_name, lecture_num)

{4}

(panopto_data, session_name, num_students, students_per_recording, lecture_num)

{5}

(point, line, bar, boxplot, col)

{6}

(shade, size, colour, fill, length)

{7}

(mapping, aes, ggplot, filter, layer)

{8}

(ggtitle, main, heading, title, caption)

{9}

{10}

{11}

(guides, sidebox, fill, scales, legends)

{12}

(colour, length, fill, shade, size)

Maximum marks: 6

5 Making specific references to the code provided, explain how the names of the lectures were able to be displayed in order from first (top bar) to last (bottom bar) in the visualisation from 1.4.

Maximum marks: 2

6 Using specific examples from Project 4, discuss TWO data-related responsibilities or issues you needed to consider in Project 4, due to the nature of the data used.

Maximum marks: 2

7 Data was sourced from the iTunes API for tracks using the search term "number".

R code and functions from {jsonlite} and {dplyr} were then used to create a new data frame track_data.

The first 15 rows of track_data are shown below.

The code below was used to create track_data, but some parts of the code have been replaced with numbers.

Use the boxes below to enter the missing function, operator, argument name or value.

Maximum marks: 6

Which function from {lubridate} can be used to convert the variable releaseDate to dttm?

(ymd(), mdy_hms(), dmy_hms(), ymd_hms(), hms())

Which function can be used to find the number of values in track_data$wrapperType?

Maximum marks: 2

9 Describe two different questions you could answer that require manipulating the data frame track_data.

Maximum marks: 2

10 Briefly describe the data manipulations required to answer each question described in 2.3. Be careful to clearly explain which of your two questions the data manipulations relate to.

Maximum marks: 2

11 Anna has the two tables of data sourced from CANVAS and Ed Discussion: tbl_students and tbl_ed.

The diagram below shows the structure and just a couple of rows from each table (there are hundreds more rows in both tables).

Briefly discuss if the column/variable student_email can be considered the primary key for both tbl_students and tbl_ed.

Maximum marks: 2

12 Anna used the SQL code shown below.

Briefly describe the output of this code and what {tidyverse} functions could produce a similar output.

Maximum marks: 2

13 Briefly describe one thing Anna could learn about STATS 220 students by using an inner join for tbl_students and tbl_ed. Be clear in your answer why an inner join is helpful/necessary.

Maximum marks: 2

14 Briefly describe one thing Anna could learn about STATS 220 students by using a left join for tbl_students and tbl_ed. Be clear in your answer why a left join is helpful/necessary.

Maximum marks: 2

15 The University of Auckland shares profiles about staff employed on a public website. Below is a screenshot of some of the profiles that are displayed when the word "statistics" is used to search for profiles.

Before attempting to scrape data about profiles, a STATS 220 student checked the following file.

In no more than two short sentences, and with specific reference to the screenshots shown above, discuss whether it is appropriate or responsible to scrape personal information about University of Auckland staff from their public profile pages.

Maximum marks: 2

16 Suppose Anna wanted to use functions from the package {rvest} to obtain the profile pictures of all staff from the Department of Statistics.

Which HTML element is used to display the profile picture?

(img, div, h1, p, a)

What HTML attribute contains the link/URL to the profile picture?

(url, link, href, pic, src)

Maximum marks: 2

17 The University of Auckland makes digital course outlines available on the DCO website: https://courseoutline.auckland.ac.nz/dco.

A search for all courses offered by the Faculty of Science during Semester One 2023 returned 354 matches over 15 different pages, with up to 25 courses displayed on each page.

The URLs for the first three pages are shown below:

The relevant HTML for each course is shown below:

Anna used functions from the packages {tidyverse} and {rvest} and examples demonstrated in the STATS 220 lectures and labs to scrape the course prescriptions (descriptions) for all 354 courses, to create the data frame course_data.

The code below was used to create course_data, but some parts of the code have been replaced with numbers.

Use the boxes below to enter the missing function, operator, argument name or value.

Maximum marks: 10

18 Identify TWO additions you would make to the code for scraping data about courses from the DCO website. Briefly discuss why you would make each code addition.

Maximum marks: 2

19 Anna wants to use the course prescriptions (descriptions) to create "fake" course prescriptions for Science courses.

Briefly discuss how both a for() loop and a while() loop could be used as part of this process. Do not write any code for your answer.

Maximum marks: 2

20 Briefly discuss ONE issue you faced with completing Project 5 and describe how you resolved this issue. Be specific about what the issue was and what skills or understanding had to be applied to resolve the issue.

Maximum marks: 2

