MAST20031 Analysis of Biological Data - Assignment 2



Instructions

• This assignment contains 3 problems worth a total of 32 marks.
• Your assignment must be submitted to GradeScope by 11.59pm Friday 9th May.
• You will need to submit a .pdf. You can either “Knit to PDF” (this doesn’t always work); or “Knit to Word” and then convert your Word doc to PDF; or “Knit to HTML”, open the .html file (e.g. in Safari) and then “Print to PDF”.
• Assignments submitted in formats other than PDF (images, docx, etc) will not be marked.
• Assignments must include both inputs (code) and outputs (results, numbers, stats, plots).
• Assignments submitted late will incur a penalty of 1% per hour (or part thereof). If you have exceptional circumstances that prevent you from meeting the deadline, please email MAST20031-info@unimelb.edu.au, and we may be able to grant an extension.
• Tutors may not help you directly with assignment questions. They may, however, provide some appropriate guidance.
• There is a discussion board if you need clarification on the wording of questions.

General advice

• Please show your working/code. If you show your working, we can see when you’re using the right process even if you end up with the wrong numerical answer. We like awarding marks; help us to help you.
• We recommend using an R Markdown document (we’ve included a template file) to keep your code, results and answers nicely formatted. You can use any word processing program instead, but in either case your code and output from R should be part of your document.
• No graph is complete without appropriate labels, units and axes. Please do not submit hand drawn graphs. You can save or copy-paste graphs from R.
• You are encouraged to use internet resources (e.g. Google for “how to do X using R”) but you need to submit your own work (don’t directly copy something from ChatGPT, for example).

Dexterity data

Problem 1 uses the data the class collected in Data Collection Exercise 2, available on the LMS. The data consist of 726 rows, with each row containing the results of a single dexterity trial. Each trial consists of a hand (left/right) and a number of grains of rice.
dce2 <- read.csv(file = "DCE2_2025.csv")

The rationale for collecting the data was to test the idea that manual dexterity is related to handedness. Higher values should indicate greater dexterity.

The following commands will reorganise the data in a way helpful to answering Problem 1.

# Average nGrains across replicates for each user and each "dominant hand" / "hand used" combination
dce2.agg <- aggregate(. ~ UserID + Hand + dominantHand, data = dce2, FUN = mean, na.action = na.omit)
# Select the right-hand and left-hand trials, each sorted by UserID so the rows line up
rTrials <- subset(dce2.agg, Hand == "R")
rTrials <- rTrials[order(rTrials$UserID), ]
lTrials <- subset(dce2.agg, Hand == "L")
lTrials <- lTrials[order(lTrials$UserID), ]
# Within-student difference in average nGrains (right - left)
RL <- rTrials$nGrains - lTrials$nGrains
# Dominant hand of each person
dom.hand <- rTrials$dominantHand
# Create a new data frame (dexterity.data) with the relevant variables calculated above
dexterity.data <- data.frame(user = rTrials$UserID, rHand = rTrials$nGrains,
                             lHand = lTrials$nGrains, difference = RL, dominantHand = dom.hand)

You will now have 110 rows of data giving within-student averages and the within-student difference between these averages.
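To see what the reorganisation does, here is a minimal sketch that applies the same steps to a small made-up data set. The data frame `toy` and its values are hypothetical and are not taken from DCE2; only the column names (UserID, Hand, dominantHand, nGrains) follow the code above. The `stopifnot()` line is an optional extra check, not part of the supplied commands: it confirms that row i refers to the same user in both halves before subtracting.

```r
# Hypothetical toy data: 2 users, each with 2 replicate trials per hand.
toy <- data.frame(
  UserID       = rep(c("u1", "u2"), each = 4),
  Hand         = rep(c("L", "L", "R", "R"), times = 2),
  dominantHand = rep(c("R", "L"), each = 4),
  nGrains      = c(10, 12, 15, 17, 20, 22, 18, 16)
)

# Average nGrains over replicates within each user/hand combination.
toy.agg <- aggregate(nGrains ~ UserID + Hand + dominantHand, data = toy, FUN = mean)

# Split by hand used and sort both halves by user so the rows line up.
r <- subset(toy.agg, Hand == "R")
r <- r[order(r$UserID), ]
l <- subset(toy.agg, Hand == "L")
l <- l[order(l$UserID), ]

# Optional check: the subtraction is only meaningful if the users match row by row.
stopifnot(identical(as.character(r$UserID), as.character(l$UserID)))

# One row per user: the two within-user averages and their difference (right - left).
toy.diff <- data.frame(user = r$UserID, rHand = r$nGrains, lHand = l$nGrains,
                       difference = r$nGrains - l$nGrains,
                       dominantHand = r$dominantHand)
print(toy.diff)
```

Each user ends up with exactly one row, so the differences can be treated as one observation per student.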

Problem 1: Analysing Dexterity Differences [14 marks]

a. [3 marks] Use an appropriate plot to examine if the variable difference (calculated in the code above) is normally distributed or not. Examine the difference separately for both right and left dominant hand groups. (NOTE: You should make two plots and label them appropriately.)

b. [6 marks] Considering only the students who are Right-hand dominant, carry out a suitable hypothesis test at the 0.05 level of significance to test if there is a difference between their right and left hands. You should clearly state your hypotheses, test statistic, degrees of freedom (if relevant), p-value and a precise conclusion in the context of the question.

c. [2 marks] Explain why it is reasonable to conduct a hypothesis test (t-test) for the mean difference for Right-hand dominant individuals, even if the differences are not normally distributed.

d. [3 marks] Conduct a suitable hypothesis test for the students who are left-hand dominant. In your analysis, carefully justify your choice of test, including a detailed examination of the assumptions required for the selected test. After conducting the test, you need to write a clear conclusion that is supported by relevant evidence. This conclusion should summarise the findings and explain how the evidence supports these outcomes.

Problem 2: Testing DCE1 Data [9 marks]

In this problem you will use the ‘demographics’ data that we collected from each of you right at the start of semester (DCE1), available on the LMS. The file name for this dataset is “DCE1_2025.csv”. The data set has 179 rows, one for each student who responded to the quiz. PredictFinal is the mark the student nominated at the beginning of the course as their likely mark; Languages is the number of languages the student speaks; DominantHand is the student’s dominant hand; Student is whether the student is domestic or international, based on their country of origin.

demographics <- read.csv(file = "DCE1_2025.csv")

a. [5 marks] We are interested in whether there is an association between the number of languages spoken and where people are from (metropolitan or regional). Treating Languages as a categorical variable, carry out an appropriate hypothesis test. You will need to ensure that you meet the assumptions for this test. In your answer:
• state your null hypothesis and your alternative hypothesis;
• list the assumptions and show they are met;
• state the test statistic and its distribution, and report the p-value;
• describe in plain English what the results mean.
b. [4 marks] We are interested in whether there is a difference between the predicted final marks of domestic and international students. Run an analysis of variance to test whether there is a significant difference between the mean predicted final mark for international and domestic students. Use the PredictFinal and Student columns in the DCE1_2025 data set to run this analysis. Include your code and the outputs.
(i) What were the hypotheses being tested here?
(ii) Explain in detail what these results indicate.

Problem 3: Chi-Squared Hypothesis Test [7 marks]

A researcher is investigating whether there is an association between study method and exam performance among university students. The study surveyed 120 students and recorded their preferred study method and whether they passed or failed the final exam.

The results are summarised in the following contingency table:

Table 1: Contingency Table: Study Method and Exam Performance

Study Method       Passed   Failed   Total
Group Study            30       10      40
Solo Study             25       15      40
Online Resources       35        5      40
Total                  90       30     120

a. [1 mark] State the hypotheses: clearly state the null and alternative hypotheses for the chi-squared test of association.

b. [6 marks] Perform the chi-squared test.
• Calculate the expected counts for each cell.
• Compute the Chi-squared test statistic.
• Determine the degrees of freedom.
• At a significance level of alpha = 0.05, determine the critical value or compute the p-value.
• State whether you reject or fail to reject the null hypothesis. Why?

Problem 0: Organised Submission & Participation in DCE2 [2 marks]

Your participation in DCE2 has already been recorded. You only need to submit a clearly legible assignment (pages correct way up, sensible font size, etc) with pages selected for each question part to get the additional mark.
