BIOS 525 - 2023 Final Exam
Instructions:
1. Submit report by noon (12 pm) on Tuesday, December 12th at 5:00pm
2. Include all analytic code in an appendix.
3. Do not include any output from statistical software directly in the report.
4. All work must be COMPLETELY independent.
5. Email all questions to the instructor.
6. Reports will be evaluated on presentation and clarity.
Question 1: Diarrheal Prevalence
The dataset diarrhea.csv contains a cross-sectional survey of children at different schools. The objective is to estimate the difference in diarrhea prevalence between schools with and without programs for safe water, adequate sanitation, and improved hygiene. We are particularly interested in whether associations between hygiene programs and diarrhea prevalence were modified by age and sex.
Variable Codebook:
1. School School ID
2. Sex Student’s sex: 0 = boy, 1 = girl
3. Age Student’s age at the time of survey
4. Z Indicator variable for whether the child had diarrhea over the last three days
5. treat Indicator variable for the presence of a school-level program to improve sanitation and hygiene.
Address the scientific questions using a generalized estimating equation (GEE) approach. Describe only the single statistical model you decide to use.
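As an illustration only, one specification consistent with these instructions is a marginal logistic model fit by GEE, with i indexing schools and j indexing children within a school. The treatment-by-age and treatment-by-sex interactions, the logit link, and the exchangeable working correlation are assumptions of this sketch; an identity link would target the prevalence difference directly, and a log link the prevalence ratio.

```latex
\mu_{ij} = E\left(Z_{ij} \mid \mathrm{treat}_i, \mathrm{Age}_{ij}, \mathrm{Sex}_{ij}\right), \\
\operatorname{logit}(\mu_{ij}) = \beta_0 + \beta_1\,\mathrm{treat}_i + \beta_2\,\mathrm{Age}_{ij} + \beta_3\,\mathrm{Sex}_{ij}
  + \beta_4\,\mathrm{treat}_i\,\mathrm{Age}_{ij} + \beta_5\,\mathrm{treat}_i\,\mathrm{Sex}_{ij}, \\
\operatorname{Corr}(Z_{ij}, Z_{ik}) = \rho \quad (j \neq k,\ \text{same school } i).
```

Under this sketch, the treatment log odds ratio for a child of age a and sex s is β1 + β4 a + β5 s, so β4 and β5 carry the effect modification by age and sex. With robust (sandwich) standard errors, inference on the β coefficients remains approximately valid with enough schools even if the exchangeable working correlation is misspecified.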
a) Report relevant descriptive statistics (1 paragraph, tables/figure optional).
b) Write down your statistical model. Justify why and how the model is useful for answering the scientific questions.
c) Summarize your findings and relevant model parameters (1-2 paragraphs, tables/figure optional).
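For the descriptive statistics in part a), crude prevalence by program status (and by sex) is a natural starting point. The plain-Python sketch below assumes diarrhea.csv has a header row whose column names match the codebook (School, Sex, Age, Z, treat), that Z and treat are coded 0/1, and that there are no missing values; the helper names are hypothetical. The crude difference ignores clustering by school, so it is descriptive only and is not a substitute for the GEE estimate.

```python
import csv
from collections import defaultdict


def prevalence_by_group(rows, group_col="treat", outcome_col="Z"):
    """Return {group code: (n children, proportion with outcome == 1)}.

    Assumes rows are dicts as produced by csv.DictReader, that column names
    match the codebook, and that there are no missing values.
    """
    tallies = defaultdict(lambda: [0, 0])  # group code -> [n, n with diarrhea]
    for row in rows:
        group = int(float(row[group_col]))  # tolerate "1" or "1.0"
        tallies[group][0] += 1
        tallies[group][1] += int(float(row[outcome_col]))
    return {g: (n, cases / n) for g, (n, cases) in sorted(tallies.items())}


def crude_prevalence_difference(rows):
    """Unadjusted prevalence with a program (treat == 1) minus without (treat == 0).

    Ignores clustering by school, so it is descriptive only.
    """
    prev = prevalence_by_group(rows)
    return prev[1][1] - prev[0][1]


def load_rows(path="diarrhea.csv"):
    """Read the survey file into a list of dicts (header names assumed)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))
```

Typical use would be `rows = load_rows()`, then `prevalence_by_group(rows)` for program status and `prevalence_by_group(rows, group_col="Sex")` for sex; the number of schools is `len({r["School"] for r in rows})`.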
Question 2: Chlamydia Incidence
The dataset (CDC.RData) contains the annual number of chlamydia cases reported at the county level during the years 2003 to 2010. Here we will analyze only data from counties with a population greater than 500,000. Several variables describing county-level population characteristics were also obtained from Census 2000. We are interested in identifying population characteristics that are associated with chlamydia incidence rates.
Variable Codebook:
1. FIPS Unique county identifier (Federal Information Processing Standards)
2. Area County name
3. State Two-letter state abbreviation
4. Year Reporting year (see note*)
5. Population At-risk population size
6. Cases Reported cases of chlamydia
7. HHIncome2000 Median household income (in $1,000) from Census 2000
8. PBlack2000 Percent black population from Census 2000
9. PHisp2000 Percent Hispanic population from Census 2000
*Using the year variable directly may cause convergence problems because of scaling (i.e., the coefficients become very small). Consider reparameterizing the time variable (e.g., centering it or subtracting a reference year) when fitting the regression model.
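The reference-year shift suggested in the note, together with the population restriction and a crude rate for the descriptive summary, can be sketched in plain Python. This assumes the records have already been loaded from CDC.RData into dicts keyed by the codebook names with numeric Year, Population, and Cases. The helper name, the new fields YearC and Rate100k, and the inclusion rule (population above 500,000 in every reported year) are hypothetical choices; another rule, such as population in a single baseline year, is equally defensible. Subtracting 2003 makes the intercept refer to 2003; centering at the midpoint (2006.5) is an alternative.

```python
from collections import defaultdict


def prepare_counties(rows, min_population=500_000, ref_year=2003):
    """Restrict to large counties, shift Year, and add a crude rate.

    rows: list of dicts with FIPS, Year, Population, Cases (numeric values).
    Hypothetical inclusion rule: keep a county only if its at-risk
    Population exceeds min_population in every year it reports.
    """
    by_county = defaultdict(list)
    for row in rows:
        by_county[row["FIPS"]].append(row)

    kept = []
    for records in by_county.values():
        if all(r["Population"] > min_population for r in records):
            for r in records:
                new = dict(r)  # copy so the input rows are not modified
                # Year - 2003: 0 in 2003, 7 in 2010, so the intercept refers to 2003
                new["YearC"] = r["Year"] - ref_year
                # Crude incidence per 100,000 at-risk population (descriptive only)
                new["Rate100k"] = 1e5 * r["Cases"] / r["Population"]
                kept.append(new)
    return kept
```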
Address the scientific questions using a random-intercept modeling approach. Describe only the single statistical model you decide to use.
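One possible random-intercept specification, sketched under assumptions: a Poisson generalized linear mixed model with a log link, the log of the at-risk population as an offset, and Year shifted by 2003 as the note suggests. Here i indexes counties and t indexes years; the Poisson distribution and the linear year trend are assumptions of this sketch, and overdispersion could call for, e.g., a negative binomial model or an observation-level random effect.

```latex
Y_{it} \mid b_i \sim \mathrm{Poisson}(\mu_{it}), \qquad b_i \sim N(0, \sigma_b^2), \\
\log \mu_{it} = \log(\mathrm{Population}_{it}) + \beta_0 + \beta_1\,(\mathrm{Year}_t - 2003)
  + \beta_2\,\mathrm{HHIncome2000}_i + \beta_3\,\mathrm{PBlack2000}_i + \beta_4\,\mathrm{PHisp2000}_i + b_i .
```

In this sketch, exp(β_k) is a county-specific incidence rate ratio per one-unit change in the covariate, conditional on b_i, and σ_b² summarizes between-county heterogeneity not explained by the Census 2000 covariates.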
a) Report relevant descriptive statistics (1 paragraph, tables/figure optional).
b) Write down your statistical model. Justify why and how the model is useful for answering the scientific questions.
c) Summarize your findings and relevant model parameters (1-2 paragraphs, tables/figure optional).