CDS533 Assignment 2
(Statistics for Data Science)
The assignment may consist of either two parts (a problem set and an R component) or just one part. Please follow the guidelines below for each.
Problem Set:
1. Manually solve each question and clearly demonstrate all necessary steps.
2. You can either type out your answers in MS Word or provide scanned images of your written answers.
3. Ensure that your handwriting is clear and legible, and that the quality of scanned files is sufficient. Unclear submissions will not be accepted.
R Component:
1. Provide screenshots of R console outputs or insert R plots as required by the questions.
2. For questions involving analysis based on R outputs, please provide detailed explanations to showcase your understanding.
On final submission
Personal Information:
Ensure that your full name and student ID (SID) are written at the very beginning of the document.
Final Submission Format:
● Round your final answers to 3 decimal places.
● Your final submission, including the answers from the problem set and R component, should be merged into a single Microsoft Word document (.docx).
Submission Deadline:
Please upload your work to Moodle by 1:30 pm on 24 October 2024.
NOTE: USE THE P-VALUE METHOD FOR HYPOTHESIS TESTING PROBLEMS.
1. Lengths of 9 randomly sampled oak seedlings from a given plantation are listed below:
2.58 2.43 1.98 2.62 2.40 2.96 2.36 2.77 2.54
Assume that the population of oak seedling lengths follows a normal distribution; let μ be the mean length for oak seedlings from this plantation and let σ² be the variance.
(a) Construct a 90% confidence interval for μ and interpret the result.
(b) Construct a 95% confidence interval for σ² and interpret the result.
(c) Suppose you obtained data on 36 seedlings. Suppose that the sample mean and variance are exactly the same as in (a). Construct a 95% confidence interval for the mean length of oak seedlings in that case. How does it compare to your answer for part (a)?
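The interval calculations in Problem 1 can be cross-checked in R after working them by hand. The following is a minimal sketch, not a model answer: `mean_ci` and `var_ci` are hypothetical helper names introduced here, and the sketch assumes the standard t interval for a normal mean and the equal-tailed chi-square interval for a normal variance.

```r
# Seedling lengths as listed in Problem 1
x <- c(2.58, 2.43, 1.98, 2.62, 2.40, 2.96, 2.36, 2.77, 2.54)

# Hypothetical helper: two-sided t interval for a mean from summary statistics
mean_ci <- function(xbar, s2, n, level) {
  alpha <- 1 - level
  half <- qt(1 - alpha / 2, df = n - 1) * sqrt(s2 / n)
  c(lower = xbar - half, upper = xbar + half)
}

# Hypothetical helper: equal-tailed chi-square interval for a variance
var_ci <- function(s2, n, level) {
  alpha <- 1 - level
  df <- n - 1
  c(lower = df * s2 / qchisq(1 - alpha / 2, df),
    upper = df * s2 / qchisq(alpha / 2, df))
}

xbar <- mean(x)
s2 <- var(x)      # sample variance (divisor n - 1)
n <- length(x)

round(mean_ci(xbar, s2, n, 0.90), 3)    # part (a)
round(var_ci(s2, n, 0.95), 3)           # part (b)
round(mean_ci(xbar, s2, 36, 0.95), 3)   # part (c): same mean and variance, n = 36
```

Passing the summary statistics rather than the raw data lets the same helper cover part (c), where only the sample size changes.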
2. A veterinary researcher claims that a new drug will be 70% effective in improving the condition of sheep suffering from a particular illness. To test this claim, a veterinary clinic tries the drug on 80 sheep suffering from the illness.
(a) The results indicate that there was improvement in the condition of 50 sheep. Is there any evidence against the claim at α = 0.05?
(b) Suppose now the experiment had been conducted with 320 sheep and improvement was noted in the condition of 200 sheep. Test the claim in this circumstance at α = 0.05.
(c) Let p be the “true” effective rate of the drug. Using the data from part (a), find a 90% CI for p.
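A short R sketch for checking the hand calculations in Problem 2 follows. It assumes the large-sample z-test for a single proportion (standard error computed under H0) and the Wald interval; the course may prefer a different interval, so treat this as one possible check. The function names `prop_z_test` and `prop_wald_ci` are hypothetical.

```r
# Hypothetical helper: two-sided large-sample z-test of H0: p = p0
prop_z_test <- function(successes, n, p0) {
  p_hat <- successes / n
  se0 <- sqrt(p0 * (1 - p0) / n)   # standard error under H0
  z <- (p_hat - p0) / se0
  c(p_hat = p_hat, z = z, p_value = 2 * pnorm(-abs(z)))
}

# Hypothetical helper: Wald confidence interval for p
prop_wald_ci <- function(successes, n, level) {
  p_hat <- successes / n
  half <- qnorm(1 - (1 - level) / 2) * sqrt(p_hat * (1 - p_hat) / n)
  c(lower = p_hat - half, upper = p_hat + half)
}

round(prop_z_test(50, 80, 0.70), 3)     # part (a)
round(prop_z_test(200, 320, 0.70), 3)   # part (b): same sample proportion, larger n
round(prop_wald_ci(50, 80, 0.90), 3)    # part (c)
```

Both samples have the same observed proportion, so any difference between the two test results comes from the sample size alone.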
3. (a) Suppose we are sampling from a N(μ, 16) distribution. How large must n be so that a 90% CI for μ has length equal to 0.5?
(b) Suppose you have a random sample from a N(μ, σ²) distribution with σ² unknown. Let n = 10. Consider testing H0: μ = 22 versus Ha: μ ≠ 22. Suppose you observe x̄ = 20.7 and S² = 4.17. Test this hypothesis by using confidence intervals. Do you reject H0 at α = 0.10? At α = 0.05?
(c) Using the data in part (b) of this problem, perform the t-test in the usual fashion. Use the pt command to find the exact p-value in R. Is this consistent with your results in part (b)?
(d) Using the data in part (b) of this problem, use the qt command to find a 99.5% confidence interval for μ.
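The pt and qt commands mentioned in Problem 3 can be used as sketched below. This is an illustration under stated assumptions, not a model answer: the variable names and the helper `t_ci` are introduced here, and for part (a) the sketch reports the smallest integer n whose interval length does not exceed 0.5, since an exact length of 0.5 is generally not attainable with an integer n.

```r
# Part (a): known variance 16, 90% interval of total length 2 * z * sigma / sqrt(n)
sigma <- sqrt(16)
target_length <- 0.5
z <- qnorm(0.95)                                   # upper 5% point for a 90% interval
n_min <- ceiling((2 * z * sigma / target_length)^2)
n_min

# Parts (b)-(d): summary statistics given in the problem
n <- 10
xbar <- 20.7
s2 <- 4.17
mu0 <- 22
se <- sqrt(s2 / n)

# Two-sided t-test: pt gives the lower-tail probability of the t distribution
t_stat <- (xbar - mu0) / se
p_value <- 2 * pt(-abs(t_stat), df = n - 1)

# Hypothetical helper: t interval at a given confidence level, using qt
t_ci <- function(level) {
  xbar + c(-1, 1) * qt(1 - (1 - level) / 2, df = n - 1) * se
}

round(c(t = t_stat, p_value = p_value), 3)
round(t_ci(0.90), 3)
round(t_ci(0.95), 3)
round(t_ci(0.995), 3)
```

A two-sided test at level α rejects H0 exactly when μ0 falls outside the (1 − α) confidence interval, which is the consistency that part (c) asks about.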
4. The following data are paired yields (in bushels) of two varieties of wheat grown on standard-sized plots. Each pair of plots was in a different location. The plots within a pair were immediately adjacent to one another.
Location | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Variety I | 42.1 | 36.8 | 49.4 | 28.5 | 51.0 | 32.9 | 39.4 | 43.7 | 37.5 | 27.6
Variety II | 44.3 | 38.1 | 49.4 | 30.5 | 52.8 | 33.7 | 38.2 | 47.8 | 39.1 | 28.5
(a) State (briefly) the assumptions you must make to proceed with an analysis of data of this form.
(b) Perform (without using the computer) a test to determine whether the mean yield of Variety II differs from the mean yield of Variety I. (State hypotheses, give p-value, etc.)
(c) Find a 99% CI for the difference between mean yields.
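Parts (b) and (c) of Problem 4 are to be done by hand, but the arithmetic can be verified afterwards in R. The sketch below assumes a paired t analysis on the within-location differences (Variety II minus Variety I); the variable names are introduced here for illustration.

```r
# Yields as listed in Problem 4, in location order 1 to 10
variety1 <- c(42.1, 36.8, 49.4, 28.5, 51.0, 32.9, 39.4, 43.7, 37.5, 27.6)
variety2 <- c(44.3, 38.1, 49.4, 30.5, 52.8, 33.7, 38.2, 47.8, 39.1, 28.5)

# Work with the paired differences
d <- variety2 - variety1
n <- length(d)
se <- sd(d) / sqrt(n)

# Two-sided test of H0: mean difference = 0
t_stat <- mean(d) / se
p_value <- 2 * pt(-abs(t_stat), df = n - 1)

# 99% confidence interval for the mean difference
ci99 <- mean(d) + c(-1, 1) * qt(0.995, df = n - 1) * se

round(c(mean_diff = mean(d), t = t_stat, p_value = p_value), 3)
round(ci99, 3)

# Cross-check with the built-in paired t-test
check <- t.test(variety2, variety1, paired = TRUE, conf.level = 0.99)
check
```

The built-in t.test call should reproduce the same statistic, p-value, and interval as the step-by-step version.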
5. A researcher wishes to compare the mean egg weight of two related species of laboratory birds. Nine randomly selected eggs are obtained from birds of each species with data given below.
Species A | 4.25 | 4.87 | 5.13 | 4.85 | 3.95 | 5.09 | 4.36 | 5.57 | 4.81
Species B | 4.32 | 4.48 | 5.05 | 3.27 | 4.23 | 4.41 | 4.77 | 3.75 | 3.90
(a) State (briefly) the assumptions you must make to proceed with an analysis of this problem. Define all terms.
(b) Perform (without using the computer) a hypothesis test of the claim that the two species have the same mean egg weight (versus the two-sided alternative). (State the hypotheses, give p-value, etc.)
(c) Compute a 95% CI for the difference in mean egg weight between the two species.
(d) Test the hypothesis that the mean egg weight of Species B eggs equals the mean weight of Species A eggs plus 0.5 (versus the two-sided alternative).
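For checking the hand calculations in Problem 5, one possible R sketch is shown below. It assumes the pooled (equal-variance) two-sample t procedure, which is an assumption made here rather than something the problem states; if the course uses the Welch procedure instead, the standard error and degrees of freedom differ. The helper name `pooled_t` is hypothetical.

```r
# Egg weights as listed in Problem 5
species_a <- c(4.25, 4.87, 5.13, 4.85, 3.95, 5.09, 4.36, 5.57, 4.81)
species_b <- c(4.32, 4.48, 5.05, 3.27, 4.23, 4.41, 4.77, 3.75, 3.90)

# Hypothetical helper: pooled two-sample t-test of H0: mean(x) - mean(y) = delta0
pooled_t <- function(x, y, delta0 = 0, level = 0.95) {
  nx <- length(x)
  ny <- length(y)
  df <- nx + ny - 2
  sp2 <- ((nx - 1) * var(x) + (ny - 1) * var(y)) / df   # pooled variance
  se <- sqrt(sp2 * (1 / nx + 1 / ny))
  est <- mean(x) - mean(y)
  t_stat <- (est - delta0) / se
  list(estimate = est, t = t_stat, df = df,
       p_value = 2 * pt(-abs(t_stat), df),
       ci = est + c(-1, 1) * qt(1 - (1 - level) / 2, df) * se)
}

res_b <- pooled_t(species_a, species_b)                # parts (b) and (c)
res_d <- pooled_t(species_b, species_a, delta0 = 0.5)  # part (d): H0 is mu_B - mu_A = 0.5

round(unlist(res_b), 3)
round(unlist(res_d), 3)
```

Part (d) reuses the same helper by writing the hypothesis as a statement about the difference of means, μB − μA = 0.5.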
6. The data tobacco.csv uploaded in Moodle contains a data set on a genetics experiment for lengths of tobacco leaves.
(a) Use R to make a plot that effectively compares the distribution of flower lengths between the F1 and F2 generations.
(b) Use R to construct a 95% confidence interval for the difference in population mean flower lengths between the F2 and F1 generations.
(c) Interpret this interval in the context of the problem.
(d) Use R to test the hypothesis that mean flower length is equal for the F1 and F2 generations versus the alternative that they are different.
(e) Interpret this test in the context of the problem.
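A possible starting point for the R work in Problem 6 is sketched below. The structure of tobacco.csv is not described in this document, so the column names `generation` and `length` and the labels "F1" and "F2" are assumptions; inspect the file with names() and head() and adjust accordingly. The function name `compare_generations` is hypothetical, and the sketch uses R's default Welch two-sample t-test.

```r
# Hypothetical helper: plot and compare flower lengths for two generations.
# gen_col and len_col are placeholder column names, not taken from tobacco.csv.
compare_generations <- function(dat, gen_col = "generation", len_col = "length") {
  f1 <- dat[[len_col]][dat[[gen_col]] == "F1"]
  f2 <- dat[[len_col]][dat[[gen_col]] == "F2"]

  # Side-by-side boxplots to compare the two distributions
  boxplot(list(F1 = f1, F2 = f2),
          ylab = "Flower length",
          main = "Flower length by generation")

  # Welch two-sample t-test; the interval is for mean(F2) - mean(F1)
  t.test(f2, f1, conf.level = 0.95)
}

# Usage, assuming the column names above match the file:
# tobacco <- read.csv("tobacco.csv")
# names(tobacco)
# compare_generations(tobacco)
```

The returned object carries both the 95% confidence interval and the two-sided p-value, so a single call covers the plot, the interval, and the test.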