STATS 101/108 - Chapter 5 Estimation|Whakatau tata: Task
Introduction
The internet changed the way most people get their news. Before the internet was a thing, many people had a newspaper subscription and regularly read articles in the sections they were interested in. Now, media companies compete for the attention of potential readers by carefully crafting headlines. Watch this short video to learn some of the basics of successful headlines.
In this task, we will explore how to communicate uncertainty when using estimates based on random samples of headlines.
Q1
For this question, you need to write a research question about a population proportion.
The New Zealand Herald (often abbreviated as “NZ Herald”) is the largest newspaper in this country and its website, nzherald.co.nz, is one of the most popular news websites in the country. In this task, you will investigate features of headlines on the NZ Herald website.
Later in this investigation, you will use an app to get a random sample of headlines from the NZ Herald website. You will be able to select headlines based on the section (Business, Entertainment, Lifestyle, NZ, Sport, World) and year (2017, 2018, 2019, 2020, 2021, 2022) they were published.
On nzherald.co.nz, click on the Menu button and have a look at headlines from the different sections outlined above.
Select one of the sections for your investigation. Write it down. Also write down the year you will take the sample from.
Scroll down the page of the section and have a look at the headlines. You need to identify a text feature that some of the headlines in this section have and some don’t have. This could be the use of specific words, punctuation, etc. Write down this feature. Note: The feature needs to be a categorical variable with two levels. Something like “the number of words” will not work.
Explain how the headlines are different based on this feature. Make sure that your explanation includes the two different levels of this feature (e.g. “with/include …”, “without/does not include …”).
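As a rough illustration of a categorical variable with two levels, the sketch below labels each headline by whether it contains a question mark. The feature, the function name label_headline and the three headlines are all made up for illustration; they are not real NZ Herald data, and your own feature will probably differ.

```python
def label_headline(headline: str) -> str:
    """Return one of exactly two levels, based on the headline text only."""
    if "?" in headline:
        return "with question mark"
    return "without question mark"


# Hypothetical headlines, invented for this example (not real NZ Herald data).
example_headlines = [
    "Is this the best pie in town?",
    "Council approves new cycleway",
    "Who will win the final?",
]

labels = [label_headline(h) for h in example_headlines]
for headline, label in zip(example_headlines, labels):
    print(f"{label:>22} | {headline}")
```

Every headline receives exactly one of the two levels, which is what makes the feature usable for a question about a proportion. A count such as the number of words would not fit this pattern.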
In this investigation, you will use a random sample of headlines from your selected section and year of the Herald to make an inference about all the headlines from that section and year.
A possible research question for the investigation is: “For all headlines from the [your chosen section] section of the New Zealand Herald in [your chosen year], what proportion have … [the feature you identified] in their headline?”. Write down your research question.
Construct your answer using the following structure. Copy and paste it into the template and complete your answers accordingly.
The section and year you will investigate:
The sentence about the feature that some headlines have and others do not have:
The research question:
Q2
For this question, you need to create a sample data set of headlines from the New Zealand Herald website.
Copy the feature, the section and the year you have chosen for your investigation in Q1.
Head to the Questionable headlines app and select this section and year. You will get a random sample of 40 headlines. Take a screenshot (snip) of the first few rows (at least 3) of your data and copy it into your answers.
This part of the investigation requires some work with Google Sheets. You can find explainer slides for the different skills needed in the support units of the course book.
In your Google Drive, create a new sheet and copy in your data from the Questionable headlines app (use the Copy button). Make sure that you copy the data into cell A1.
You need to click the cell A1 once. If you double-click it before pasting in the data, the Google sheet will not work properly.
Add a new variable called headline_feature to your sheet that contains the feature you have identified in Q1. Fill in the levels of this variable for all headlines in the sample. Note: While the app adds additional variables to each headline (sentiment_score, num_words and num_chars), you need to use text features based on the variable headline.
When you have allocated a level of headline_feature for all headlines, publish the sheet as a CSV document and copy the link into your answers. Note: This will only work in Google Sheets, not in other apps, such as Excel.
Import the data into iNZight Lite and create a plot for the variable you created, headline_feature.
Write a sentence stating the sample proportion for the level of headline_feature that you are focusing on for your research question. The information is in the Summary tab of iNZight Lite.
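The sample proportion reported by iNZight Lite is the number of headlines at your focus level divided by the sample size. The sketch below shows that arithmetic on a hypothetical sample of 40 headlines; the counts (12 "with", 28 "without") and the function name sample_proportion are assumptions for illustration, not results from the app.

```python
def sample_proportion(levels, focus_level):
    """Proportion of observations whose level equals focus_level."""
    if not levels:
        raise ValueError("the sample must contain at least one headline")
    matches = sum(1 for level in levels if level == focus_level)
    return matches / len(levels)


# Hypothetical headline_feature column: 12 "with" and 28 "without" (made up).
levels = ["with"] * 12 + ["without"] * 28

p_hat = sample_proportion(levels, "with")
print(f"Sample proportion 'with': {p_hat:.1%} ({levels.count('with')} of {len(levels)})")
```

This is only a way to check your understanding of the number; the task still asks you to read the sample proportion from the Summary tab.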
Construct your answer using the following structure. Copy and paste it into the template and complete your answers accordingly.
The feature, section and year:
The screenshot of the first few rows of the sample:
The link to the published data:
The plot:
The sentence stating the sample proportion:
Q3
For this question, you need to construct and interpret a bootstrap confidence interval.
Use the Bootstrap module of the VIT tab in iNZight Lite to construct a confidence interval using the variable headline_feature. Make sure the category you choose to focus on matches your research question. There are step-by-step instructions in the Making inferences with iNZight Lite support unit of the course book.
Paste the screenshot of all three plots (Data, Re-Sample and Bootstrap Distribution) into your answers. Make sure the plots include the confidence interval.
State your research question from Q1 again.
Answer your research question by interpreting the confidence interval in one sentence.
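To see what the Data, Re-Sample and Bootstrap Distribution plots represent, the sketch below re-samples a hypothetical sample with replacement many times and takes the central 95% of the re-sample proportions. It assumes a percentile-style interval, 1000 re-samples and a made-up sample of 12 "with" and 28 "without"; the exact method and settings iNZight Lite uses are not stated here, so its interval may differ.

```python
import random


def bootstrap_ci(levels, focus_level, n_resamples=1000, confidence=0.95, seed=1):
    """Percentile bootstrap interval for a proportion (an illustrative sketch)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(levels)
    proportions = []
    for _ in range(n_resamples):
        # Re-sample: draw n headlines from the original sample, with replacement.
        resample = rng.choices(levels, k=n)
        proportions.append(sum(1 for level in resample if level == focus_level) / n)
    proportions.sort()
    # Keep the central `confidence` share of the bootstrap distribution.
    tail = (1 - confidence) / 2
    lower = proportions[round(tail * n_resamples)]
    upper = proportions[round((1 - tail) * n_resamples) - 1]
    return lower, upper


# Hypothetical headline_feature column (made up, not from the app).
levels = ["with"] * 12 + ["without"] * 28

lower, upper = bootstrap_ci(levels, "with")
print(f"Bootstrap confidence interval: ({lower:.1%}, {upper:.1%})")
```

The interval describes plausible values for the proportion among all headlines in the chosen section and year, not just the 40 in the sample, which is why the interpretation sentence should refer to the population.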
Construct your answer using the following structure. Copy and paste it into the template and complete your answers accordingly.
The screenshot of the three plots including the confidence interval:
The research question from Q1:
The interpretation of the confidence interval:
Q4
For this question, you need to evaluate whether a majority claim can be made.
Write down your confidence interval from Q3 in the form (a%, b%).
In one sentence, evaluate the claim that the majority of all headlines in the [your section] section of the NZ Herald website in [your year] have [the feature identified in your research question], by determining whether the claim is, or is not, supported by the confidence interval.
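One common way to make this judgement, assuming "majority" means more than 50%, is to check whether the whole confidence interval lies above 50%. The sketch below encodes that rule; the function name majority_claim_supported and the example intervals are hypothetical, and your course notes may word the rule differently.

```python
def majority_claim_supported(lower: float, upper: float) -> bool:
    """True only if every value in the interval (as proportions) exceeds 0.5."""
    if not 0.0 <= lower <= upper <= 1.0:
        raise ValueError("expected 0 <= lower <= upper <= 1")
    return lower > 0.5


# Hypothetical intervals, made up for illustration.
examples = [(0.55, 0.80), (0.40, 0.65), (0.20, 0.45)]
for low, high in examples:
    verdict = "supported" if majority_claim_supported(low, high) else "not supported"
    print(f"({low:.0%}, {high:.0%}): majority claim {verdict}")
```

An interval that straddles 50% contains plausible values both below and above one half, so under this rule it does not support a majority claim.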
Construct your answer using the following structure. Copy and paste it into the template and complete your answers accordingly.
The confidence interval:
The sentence evaluation of a majority claim:
Q5
For this question, you need to explain the need for random sampling.
In Q1, you clicked on different sections of the NZ Herald website to get a first impression of what features headlines in a section might have. Explain why it was important to use the Questionable headlines app to generate a random sample of headlines, rather than just taking the first few headlines shown in your section on the website. Write two to three sentences.
Construct your answer using the following structure. Copy and paste it into the template and complete your answers accordingly.
Your explanation:
Q6
For this question, you need to reflect on the learning focus for this chapter (Estimation).
Describe in your own words ONE important idea from this topic. Do not just copy one of the learning objectives or something from the notes or other learning resources. One sentence is enough, but it must be your own personal reflection.
Construct your answer using the following structure. Copy and paste it into the template and complete your answers accordingly.