STAT 473 (Winter 2020)
Final Take-Home Examination
This final exam is due on April 17 by 9:30 pm EST (Kingston, Ontario time).
Please submit a PDF file of your work to the OnQ Final Exam Dropbox.
You must work on the exam independently!
Please list any books, articles and online resources that you have cited as references.
The department and I take the academic integrity regulations seriously. The steps and reasoning in your answers, analysis, writing and code will be checked carefully for evidence of independent work. Any violations noticed will be investigated formally.
Part I. A Problem
1. [20 marks] The data in Table 1 contain numbers of insurance policies, n, and numbers of claims, y, for cars in various insurance categories, CAR, tabulated by age of policy holder, AGE, and district where the policy holder lived (DIST=1, for major cities, and DIST=0, otherwise). Find a suitable log-linear model to explain the number of claims in terms of the AGE, DIST and/or CAR variables. Describe your models and analysis, present the important results in tables and/or figures, explain and interpret your findings and clearly state your conclusions.
Table 1: Car insurance data.
              DIST=0          DIST=1
CAR  AGE     y      n        y     n
  1    1    65    317        2    20
  1    2    65    476        5    33
  1    3    52    486        4    40
  1    4   310   3259       36   316
  2    1    98    486        7    31
  2    2   159   1004       10    81
  2    3   175   1355       22   122
  2    4   877   7660      102   724
  3    1    41    223        5    18
  3    2   117    539        7    39
  3    3   137    697       16    68
  3    4   477   3442       63   344
  4    1    11     40        0     3
  4    2    35    148        6    16
  4    3    39    214        8    25
  4    4   167   1019       33   114
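For data entry only, Table 1 can be rearranged into long format, with one record per CAR × AGE × DIST cell. The sketch below is a minimal illustration in Python, not a model fit. The names `TABLE1`, `to_long`, `rate` and `log_n` are assumptions introduced here. Carrying log(n) along as a candidate offset is likewise an assumption about how exposure might enter a log-linear model. The same layout carries over to a data frame in R.

```python
import math

# Table 1 as (CAR, AGE, y0, n0, y1, n1): y = claims, n = policies,
# suffix 0 for DIST=0 and suffix 1 for DIST=1.
TABLE1 = [
    (1, 1, 65, 317, 2, 20),
    (1, 2, 65, 476, 5, 33),
    (1, 3, 52, 486, 4, 40),
    (1, 4, 310, 3259, 36, 316),
    (2, 1, 98, 486, 7, 31),
    (2, 2, 159, 1004, 10, 81),
    (2, 3, 175, 1355, 22, 122),
    (2, 4, 877, 7660, 102, 724),
    (3, 1, 41, 223, 5, 18),
    (3, 2, 117, 539, 7, 39),
    (3, 3, 137, 697, 16, 68),
    (3, 4, 477, 3442, 63, 344),
    (4, 1, 11, 40, 0, 3),
    (4, 2, 35, 148, 6, 16),
    (4, 3, 39, 214, 8, 25),
    (4, 4, 167, 1019, 33, 114),
]


def to_long(table):
    """Return one record per CAR x AGE x DIST cell."""
    rows = []
    for car, age, y0, n0, y1, n1 in table:
        for dist, y, n in ((0, y0, n0), (1, y1, n1)):
            rows.append({
                "CAR": car, "AGE": age, "DIST": dist,
                "y": y, "n": n,
                "rate": y / n,         # crude claim rate per policy
                "log_n": math.log(n),  # candidate offset (an assumption)
            })
    return rows


rows = to_long(TABLE1)
for r in rows:
    print(r["CAR"], r["AGE"], r["DIST"], r["y"], r["n"], round(r["rate"], 3))
```

Note that the cell CAR=4, AGE=1, DIST=1 has y = 0 claims from only 3 policies, which is worth keeping in mind when assessing model fit.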
Part II. Project and Report
Requirements:
• Analyze the data and write a concise but clear report. The suggested length is no more than 3 pages (text).
• Exploration of different types of models is encouraged. Demonstrate that you understand and are able to use the relevant methods and models from this course. Present selected important results and evidence leading to your conclusions.
• Good scientific writing and clear explanation are highly valued. You can describe the data and variables; describe your initial exploration and the models you consider; include selected results and figures for model fit, model assessment and comparison; and interpret your final model (or models) and explain what knowledge is gained from your analysis in the application context.
• Write your report as an article that explains your thoughts; it should not look like patches of analysis output. Your report should NOT include any code or output directly from R. Include tables in the report to summarize the analysis and results if necessary. Attach the R code and output at the end of your report as a record and proof of your independent work. The attached R code and output will not be considered for marks.
• Please type your report.
Marking Scheme:
Total marks: 80;
40 marks on statistical analysis;
40 marks on report writing.
Table 2 was obtained in a study of the relationship between a particular type of heart disease, cholesterol level and blood pressure. The total number of individuals was fixed in the study. The individuals are cross-classified according to three variables: heart disease status (Present, Absent), serum cholesterol (mg/100 cc) at 4 levels, and systolic blood pressure (mm Hg) at 4 levels. Use the methods and models in this course to analyze the data.
Table 2: Study of a type of heart disease.
Heart     Serum cholesterol   Systolic blood pressure (mm Hg)
disease   (mg/100 cc)         < 127   127-146   147-166   167+
Present   < 200                   2         3         3      4
          200-219                 3         2         0      3
          220-259                 8        11         6      6
          ≥ 260                   7        12        11     11
Absent    < 200                 117       121        47     22
          200-219                85        98        43     20
          220-259               119       209        68     43
          ≥ 260                  67        99        46     33
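For data entry only, Table 2 can be stored as two 4 × 4 arrays (cholesterol level by blood pressure level), one for each heart disease status. The sketch below is a minimal illustration in Python, not an analysis. The names `CHOL`, `BP`, `PRESENT`, `ABSENT` and `cell_summary` are assumptions introduced here. It computes only the cell totals and the observed proportion with the disease in each cell.

```python
# Level labels, in the order used in Table 2.
CHOL = ["<200", "200-219", "220-259", ">=260"]   # serum cholesterol, mg/100 cc
BP = ["<127", "127-146", "147-166", "167+"]      # systolic blood pressure, mm Hg

# Counts indexed as [cholesterol level][blood pressure level].
PRESENT = [
    [2, 3, 3, 4],
    [3, 2, 0, 3],
    [8, 11, 6, 6],
    [7, 12, 11, 11],
]
ABSENT = [
    [117, 121, 47, 22],
    [85, 98, 43, 20],
    [119, 209, 68, 43],
    [67, 99, 46, 33],
]


def cell_summary(present, absent):
    """Return one record per cholesterol x blood pressure cell."""
    cells = []
    for i, chol in enumerate(CHOL):
        for j, bp in enumerate(BP):
            p, a = present[i][j], absent[i][j]
            cells.append({
                "chol": chol, "bp": bp,
                "present": p, "absent": a,
                "total": p + a,
                "prop_present": p / (p + a),  # observed proportion with disease
            })
    return cells


cells = cell_summary(PRESENT, ABSENT)
grand_total = sum(c["total"] for c in cells)
for c in cells:
    print(c["chol"], c["bp"], c["present"], c["total"], round(c["prop_present"], 3))
print("total individuals:", grand_total)
```

Both classifying variables are ordered, so keeping the levels in table order, as above, preserves that ordering for any model that makes use of it.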