STAT3621 (2023-2024 Semester 2)
Assignment 2
Please attach R code and the output together with your answers, and make sure your results are reproducible. Please combine the answers, R code and output into one file, preferably using R Markdown + knitr to produce a single HTML or PDF file.
1. Specifications are given for 387 new vehicles for the 2004 year. The variables recorded include price, measurements relating to the size of the vehicle, and fuel efficiency (cars.csv).
VARIABLE DESCRIPTIONS: Vehicle Name; Sports Car? (1=yes, 0=no); Sport Utility Vehicle? (1=yes, 0=no); Wagon? (1=yes, 0=no); Minivan? (1=yes, 0=no); Pickup? (1=yes, 0=no); All-Wheel Drive? (1=yes, 0=no); Rear-Wheel Drive? (1=yes, 0=no); Suggested Retail Price, what the manufacturer thinks the vehicle is worth, including adequate profit for the automaker and the dealer (U.S. Dollars); Dealer Cost (or "invoice price"), what the dealership pays the manufacturer (U.S. Dollars); Engine Size (liters); Number of Cylinders (=-1 if rotary engine); Horsepower; City Miles Per Gallon; Highway Miles Per Gallon; Weight (Pounds); Wheel Base (inches); Length (inches); Width (inches).
(a) Obtain a boxplot and a histogram for each of suggested retail price and dealer cost. Comment on your observations.
(b) Test whether the median (or distribution) of suggested retail price differs from that of dealer cost. State the null and alternative hypotheses, the test statistic, the p-value and your conclusion clearly. Use α = 0.05.
2. A study was carried out on post-menopausal women in City A. Cases of women with endometrial cancer were identified from this city. A control group was selected, with each control matched to a case on age and length of residence in City A. The medical question was whether endometrial cancer was related to estrogen use. Answer this question using both a hypothesis test (state the null and alternative hypotheses, the name of the test, the test statistic, and the p-value) and a confidence interval (use α = 0.05, i.e., a 95% confidence level).
3. The file lbw.csv involves the low birth weight (lbw) data. This data set contains 189 observations on the following 10 columns:
low: a binary variable indicating whether the birth weight of a baby is under 2500 g (low=1) or normal (low=0).
smoke: 1=history of mother smoking; 0=mother nonsmoker
age: age of mother: 14-45
race: categorical 1-3: 1=white; 2=black; 3=other
lwt: mother weight (lbs) at last menstrual period: 80-250 lbs
ptl: number of previous premature labors: 0-3
ht: 1=history of hypertension for mother; 0=no hypertension for mother
ui: 1=uterine irritability for mother; 0=no irritability for mother
ftv: number of physician visits in 1st trimester: 0-6
bwt: birth weight in grams: 709-4990 g
(a) Among these 189 individuals, how many mothers have a history of smoking (smoke=1)? How many are nonsmokers (smoke=0)?
(b) Among the mothers with a history of smoking, how many had babies with low birth weight (low=1)? Among the mothers who are nonsmokers, how many had babies with low birth weight (low=1)?
(c) Let p1 denote the proportion of babies with low birth weight (low=1) among mothers with a history of smoking, and let p2 denote the proportion of babies with low birth weight (low=1) among mothers who are nonsmokers. Use a two-sample z-test to test the hypothesis:
H0: p1 = p2 vs. H1: p1 ≠ p2.
(d) Find the 95% confidence interval for p1 − p2.
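As a reminder of the mechanics behind parts (c) and (d), the following is a minimal sketch of a two-sample z-test for proportions and a Wald-type confidence interval for the difference. The counts x1, n1, x2, n2 are hypothetical values chosen for illustration only; they are not taken from lbw.csv. The sketch assumes the pooled standard error is used for the test and the unpooled standard error for the interval, which is one common convention.

```r
# Hypothetical counts for illustration only (NOT from lbw.csv)
x1 <- 30; n1 <- 80    # group 1: number of "successes", group size
x2 <- 25; n2 <- 100   # group 2: number of "successes", group size

p1_hat <- x1 / n1
p2_hat <- x2 / n2

# Test of H0: p1 = p2 uses the pooled proportion in the standard error
p_pool  <- (x1 + x2) / (n1 + n2)
se_pool <- sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z       <- (p1_hat - p2_hat) / se_pool
p_value <- 2 * pnorm(-abs(z))   # two-sided p-value

# 95% Wald confidence interval for p1 - p2 uses the unpooled standard error
se_unpooled <- sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
ci <- (p1_hat - p2_hat) + c(-1, 1) * qnorm(0.975) * se_unpooled

# Cross-check: prop.test() without continuity correction reports z^2
pt <- prop.test(c(x1, x2), c(n1, n2), correct = FALSE)

print(c(z = z, p_value = p_value))
print(ci)
```

With correct = FALSE, the X-squared statistic from prop.test() should equal z squared, which gives a quick way to check a hand computation.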
4. The motor trend car road test data comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models). In R, please type data(mtcars) to load the data file. In “mtcars”, there are 32 observations on the following 11 (numeric) variables:
mpg Miles/(US) gallon
cyl Number of cylinders
disp Displacement (cu.in.)
hp Gross horsepower
drat Rear axle ratio
wt Weight (1000 lbs)
qsec 1/4 mile time
vs Engine (0 = V-shaped, 1 = straight)
am Transmission (0 = automatic, 1 = manual)
gear Number of forward gears
carb Number of carburetors
(a) Draw a scatter plot of mpg against wt. Report the Pearson correlation between mpg and wt.
(b) Check the normality of mpg by drawing the QQ plot. Then check the normality of mpg by performing Shapiro-Wilk’s test. Report the computed test statistic, p-value and your conclusion.
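The general workflow for a normality check can be sketched as follows. The vector x is simulated data used purely for illustration (an assumption, not part of the assignment data), so the output says nothing about mpg.

```r
# Simulated, deliberately skewed sample for illustration only
set.seed(1)
x <- rexp(50)

# Normal QQ plot: points far from the reference line suggest non-normality
qqnorm(x)
qqline(x)

# Shapiro-Wilk test: H0 is that the sample comes from a normal distribution
sw <- shapiro.test(x)
print(sw$statistic)  # test statistic W
print(sw$p.value)    # p-value
```

A small p-value leads to rejecting the hypothesis of normality; a large one means the test finds no evidence against it.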
(c) Test whether the means of mpg are equal between the two Engine groups (i.e., vs = 0 and vs = 1). Report the computed test statistic, p-value and conclusion.
(d) Conduct the hypothesis test in (c) under the framework of a one-way ANOVA model. Rewrite the hypotheses H0 and H1 in terms of the ANOVA model parameters. Report the computed test statistic, p-value and conclusion.
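The link between a two-group comparison of means and one-way ANOVA can be sketched as below. To avoid working the question itself, the illustration groups by am (transmission) rather than vs; that choice is an assumption made only for the example. With two groups and the equal-variance t-test, the ANOVA F statistic equals the square of the t statistic and the p-values coincide.

```r
data(mtcars)

# Pooled-variance two-sample t-test (grouping by am, for illustration only)
tt <- t.test(mpg ~ am, data = mtcars, var.equal = TRUE)

# One-way ANOVA on the same grouping, fitted as a linear model
tab     <- anova(lm(mpg ~ factor(am), data = mtcars))
f_stat  <- tab[["F value"]][1]
p_anova <- tab[["Pr(>F)"]][1]

# With two groups, F = t^2 and the two p-values agree
print(c(t_squared = unname(tt$statistic)^2, F = f_stat))
print(c(p_t = tt$p.value, p_anova = p_anova))
```

Note that t.test() defaults to the Welch test (var.equal = FALSE), whose statistic does not match the ANOVA F exactly.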