MAST10010: Data Analysis Assignment 2

Due Date: Friday September 20th, 11.59pm.

• Your assignment must be submitted to Gradescope by 11.59pm Friday 20th September.

• Assignments submitted late will incur a penalty of 5% per hour (or part thereof). If you need an extension then you must apply through the link on the LMS.

• Tutors may not help you directly with assignment questions. They may, however, provide some appropriate guidance.

• Please ask on the discussion board if you need clarification on the wording of questions. Do not include partial answers on the discussion board.

• It is recommended to produce a single Word document which includes all the relevant graphs, statistics and comments. You will then need to Export as a PDF to upload to Gradescope. If you need to include formulas or calculations, you may include photos of handwritten notes (or use equation editor, or any other method).

This assignment consists of four (4) questions worth a total of 39 marks. It contributes 5% towards your final grade.

Instructions

Software:

You must use Minitab to produce any graphs, tables and descriptive statistics.

Graphs:

• must include your name/student number, which can be added by Editing the graph, right-clicking and selecting Add → Footnote or Add → Subtitle.

• must be relevant. You may look at many graphs, but you should only include the most relevant graph for each question.

• should be clear: ensure that labels and titles are correct and appropriate; you can add gridlines/change symbols/colour as appropriate to make the graph clearer. There are some marks awarded for improving upon the default from Minitab.

• Mac Users: you will need to use myUniApps in order to edit the graphs as required above.

Statistics:

Must be relevant: you will be penalised for including statistics which are not relevant to the questions asked.

Comments:

• must be in the context of the data.

• should be supported by relevant statistics where possible.

• should be concise and informative. Word limits, where given, must be strictly adhered to (all word limits are a maximum, you will be penalised for going over this limit!). You may use dot-points.

Question 1: Paired/unpaired study designs [3 + 3 + 3 = 9 marks]

For each of the following proposed studies, you should EITHER:

• Explain how a paired design could be used to study this question (you need to include sufficient detail about the pairing); OR

• Explain why this cannot be studied with a paired design, and should have an unpaired (independent samples) design instead (you should include sufficient detail about the reasoning why this is not possible).

Your answer for each of (a), (b) and (c) must be less than 60 words (there is no minimum).

(a). An environmental agency is planning on monitoring water quality in an area where a new mine has been approved. They plan to take samples in 30 locations prior to the construction of the mine. Once the mine is operational, they will again take samples in 30 locations. They will measure the levels of pollutants, with a particular focus on heavy metals (e.g. lead, mercury).

(b). An Australian researcher is interested in measuring the additional cognitive load when studying in a different language. The researcher has colleagues in Germany, China and Japan (at institutions teaching in German, Mandarin and Japanese, respectively) who will carry out the same study on their own university students. In each case, they will compare students who speak the teaching language natively with those who speak it as an additional language. The response measured will be time spent studying in a week.

(c). A large hospital is investigating the potential benefit of a new post-surgery regime, for a range of common surgeries. The proposal is to increase the temperature (to 25°C, by additional heating to the bed and room), in an effort to promote faster recovery after surgery. The surgery itself will not change, and this will be compared to standard recovery protocols (which include temperature being maintained at 20°C). 60% of surgeries performed at this hospital are for three common conditions.

Question 2: Tetrahydrocannabinol (THC) and Multiple Sclerosis [1 + 2 + 4 + 3 + 3 = 13 marks]

This question is based on simulated data for the study by Guido van Amerongen, Kawita Kanhai, Anne Catrien Baakman, Jules Heuberger, Erica Klaassen, Tim L. Beumer, Rob LM Strijers et al. (2018) ‘Effects on Spasticity and Neuropathic Pain of an Oral Formulation of ∆9-tetrahydrocannabinol in Patients With Progressive Multiple Sclerosis’ Clin Ther., Sep;40(9):1467–1482. You can find this article at:

https://www.clinicaltherapeutics.com/article/S0149-2918(17)30054-1/fulltext, also linked on the LMS.

You DO NOT need information from this article to answer the questions; it is provided for interest only.

Multiple sclerosis is a progressive illness which weakens the nerves, eventually leading to loss of sensation/muscle control. This can be both painful and a cause of spasticity (e.g. muscle stiffness, uncontrolled muscle movements). The study investigated a particular formulation of cannabis and measured

This study, in part, examined two treatments:

Cannabis: a dose of ∆9-THC administered as a tablet taken three times per day.

Placebo: a pill which is identical to the cannabis pill in appearance, but containing no active ingredient, also taken three times per day.

The 24 patients took one of the treatments for four weeks, and then the other treatment for four weeks. The order of the treatments was randomised. The response variable we are considering is LTW25 (lower numbers are better), measured at the end of each treatment.

The data is available as Asst2 2024 data.csv on the LMS Assignment 2 page.

(a). Explain why the researchers chose to randomise the order of the treatments.

(b). Produce an appropriate graph showing LTW25.

(c). Conduct a Hypothesis Test (using α = 0.01) to determine if there is a difference in LTW25 between placebo and cannabis. Show all of your calculations and steps.

Your answer needs to (the 5 step process meets these requirements):

• State the hypotheses in terms of the parameter(s) of interest.

• Calculate se(estimator): you need to show how this is calculated, but may use Minitab to obtain summary statistics (means, standard deviations).

• Calculate the test statistic, and give its distribution under the null hypothesis.

• Give the P-value for the test, using Minitab (you should not use Minitab for other parts of this question).

• State your conclusion in the context of the data.
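As an illustration of the "show how se(estimator) is calculated" step, here is a minimal Python sketch of the arithmetic behind a paired t statistic, assuming the analysis is carried out on within-patient differences. The function name and the two short lists are hypothetical and are not taken from the assignment data; the assignment itself still requires Minitab for summary statistics and for the P-value.

```python
import math
import statistics


def paired_t_statistic(first, second):
    """Return (mean difference, standard error, t statistic, degrees of freedom).

    Works on the differences first[i] - second[i], one per subject.
    """
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference: sample sd of differences / sqrt(n).
    se = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff, se, mean_diff / se, n - 1


# Hypothetical measurements for five subjects (NOT the assignment data).
treatment_a = [10.0, 12.0, 9.0, 11.0, 13.0]
treatment_b = [9.0, 10.0, 9.5, 10.0, 11.0]

mean_diff, se, t_stat, df = paired_t_statistic(treatment_a, treatment_b)
print(f"mean difference = {mean_diff:.3f}, se = {se:.3f}, t = {t_stat:.3f}, df = {df}")
```

Under the null hypothesis of no mean difference, and assuming the differences are approximately normal, the statistic is compared with a t distribution on n − 1 degrees of freedom.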

(d). What assumptions have you made in conducting this hypothesis test? Were they satisfied? (You need to provide evidence as part of your answer.)

(e). Write the results of your hypothesis test conducted in 2(c), in the style of a research paper.

Your comments must be less than 40 words.

Question 3: Identifying AI Images [2 + 5 + 3 = 10 marks]

This question is inspired by Zeyu Lu, Di Huang, Lei Bai, Jingjing Qu, Chengyue Wu, Xihui Liu, and Wanli Ouyang (2024) ‘Seeing is not always believing: benchmarking human and model perception of AI-generated images’, Advances in Neural Information Processing Systems, 36.

You can find this article at:

https://proceedings.neurips.cc/paper_files/paper/2023/file/505df5ea30f630661074145149274af0-Paper-Datasets_and_Benchmarks.pdf, also linked on the LMS.

You DO NOT need information from this article to answer the questions; it is provided for context only.

Fake images are a common source of misinformation, particularly on social media. Lu et al. trained and tested a machine-learning algorithm to identify AI-generated photorealistic images. This was able to correctly identify images as genuine 86.3% of the time.

We are interested in whether humans are able to match this accuracy, in particular the proportion of people who are able to correctly identify at least 5 (out of 6) images as genuine/AI-generated.

(a). From a sample of 100 adults living in the Melbourne metropolitan area, 71 were able to correctly identify at least 5 of the 6 images they were presented with. Calculate an approximate 90% confidence interval for the true proportion of adults within Melbourne who would be able to correctly identify most images as genuine/AI-generated.
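For reference, an approximate interval of this kind is usually the normal-approximation (Wald) interval, p̂ ± z × sqrt(p̂(1 − p̂)/n). The Python sketch below shows that arithmetic; it is not Minitab output, the function name is hypothetical, and the counts in the usage line are illustrative rather than the values in the question.

```python
import math
from statistics import NormalDist


def wald_interval(successes, n, confidence):
    """Normal-approximation confidence interval for a single proportion."""
    p_hat = successes / n
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Hypothetical example: 40 successes out of 80, 95% confidence.
low, high = wald_interval(40, 80, 0.95)
print(f"({low:.4f}, {high:.4f})")
```

The approximation is generally considered reasonable only when both the expected number of successes and of failures are not small.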

(b). After reading about this research, someone decided to try a similar test in a rural township (Kaniva, Victoria). They were curious if people who used technology less frequently might have a different ability to distinguish AI-generated images.

From a sample of 5 people, 2 were able to correctly identify at least 5 out of 6 images. Conduct a hypothesis test to determine if the proportion is statistically different from 0.7 (you may use Minitab for this part). Your answer needs to (the 5 step process meets these requirements):

• State the hypotheses in terms of the parameter(s) of interest.

• Give the P-value for the test using Minitab.

• State your conclusion in the context of the data.

You will need to justify any choices with regards to using approximate tests.
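When a sample is very small, a normal approximation may not be appropriate and an exact binomial test is the usual alternative. The Python sketch below illustrates both ideas under stated assumptions: the threshold of 5 in the rule-of-thumb check is one common convention (some courses use 10), and the two-sided P-value sums the probabilities of all outcomes no more likely than the observed one, which is one of several conventions and may not match the method Minitab uses. The function names and example numbers are hypothetical.

```python
import math


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def normal_approx_reasonable(n, p0, threshold=5):
    """Rule-of-thumb check; the threshold value is an assumed convention."""
    return n * p0 >= threshold and n * (1 - p0) >= threshold


def exact_two_sided_p(k_obs, n, p0):
    """Sum the probabilities of outcomes at most as likely as the observed one."""
    observed = binom_pmf(k_obs, n, p0)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        binom_pmf(k, n, p0)
        for k in range(n + 1)
        if binom_pmf(k, n, p0) <= observed * (1 + 1e-9)
    )


# Hypothetical example: 1 success in 8 trials, null proportion 0.5.
approx_ok = normal_approx_reasonable(8, 0.5)
p_value = exact_two_sided_p(1, 8, 0.5)
print(approx_ok, round(p_value, 4))
```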

(c). It is believed that the true proportion of adults within Melbourne who would be able to correctly identify most images as genuine/AI-generated will be no less than 60%. Researchers would like to estimate this proportion using a 99% confidence interval based on a normal approximation, with a maximum margin of error of 0.05. What sample size would be required to achieve this? Show your calculations as well as your answer.
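Sample-size calculations of this type usually rearrange the margin of error of the normal-approximation interval, m = z × sqrt(p(1 − p)/n), to give n ≥ z² p(1 − p)/m², rounded up. Because p(1 − p) is largest at p = 0.5, the planning value is normally the plausible proportion closest to 0.5. The Python sketch below shows the arithmetic; the function name is hypothetical and the inputs in the usage line are illustrative, not those in the question.

```python
import math
from statistics import NormalDist


def required_sample_size(p_planning, margin, confidence):
    """Smallest n whose normal-approximation margin of error is at most `margin`."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Round up: a fractional sample size must be increased to the next integer.
    return math.ceil(z**2 * p_planning * (1 - p_planning) / margin**2)


# Hypothetical example: planning value 0.5, margin 0.03, 95% confidence.
n_required = required_sample_size(0.5, 0.03, 0.95)
print(n_required)
```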

Question 4: Interpreting Research [3 + 2 = 5 marks]

This question requires you to interpret the following small section of the article: Moa Jederström, Sara Agnafors, Christina L. Ekegren, Kristina Fagher, Håkan Gauffin, Laura Korhonen, Jennifer Park, Armin Spreco, and Toomas Timpka (2023) ‘A cross-sectional study of anxiety and depression caseness in female competitive figure skaters in Sweden’ BMJ Open Sport & Exercise Medicine 9, no. 1, e001491. You can find this article at:

https://bmjopensem.bmj.com/content/bmjosem/9/1/e001491.full.pdf, also linked on the LMS.

You DO NOT need information from this article to answer the questions; it is provided for context only.

The study examined Swedish figure skaters, and whether they were diagnosed with anxiety (“anxiety caseness”), depression (“depression caseness”) or neither (“no caseness”).

“Skaters reporting no caseness were younger than those reporting only anxiety caseness (mean age difference −1.9 years; 95% CI −3.1 to −0.7; p=0.001)...”

(a). There is a P-value in the quote. Clearly state the null and alternative hypotheses being tested. You may need to define appropriate parameter(s).

(b). Identify two good aspects of the way the results are reported in the quote.

Relevance, Formatting & Submission [2 marks]

You can gain an additional 2 marks by:

• only including relevant material;

• submitting a clearly legible assignment (e.g. all pages in the correct orientation);

• selecting correct page(s) for each part of each question (when you upload your assignment to Gradescope, it will ask you to select pages: you can select multiple pages for a question part, you can also select the same page for multiple parts).
