BIOS 525 - 2023 Final Exam


1.    Submit report by noon (12 pm) on Tuesday, December 12th  at 5:00pm

2.    Include all analytic code in an appendix.

3.    Do not include any output from statistical software directly in the report.

4.    All work must be COMPLETELY independent.

5.    Email all questions to the instructor.

6.    Reports will be evaluated on presentation and clarity.

Question 1: Diarrheal Prevalence

The dataset diarrhea.csv contains a cross-sectional survey of children at different schools. The objective is to estimate the difference in diarrhea prevalence between schools with and without programs for safe water, adequate sanitation, and improved hygiene. We are particularly interested in whether associations between hygiene programs and diarrhea prevalence are modified by age and sex.

Variable Codebook:

1. School School ID

2. Sex Student’s sex: 0 = boy, 1 = girl

3. Age Student’s age at the time of the survey

4. Indicator variable for whether the child had diarrhea over the last three days

5. treat Indicator variable for the presence of a school-level program to improve sanitation and hygiene.

Address the scientific questions using a generalized estimating equation (GEE) approach. Only describe a single statistical model you decide to use.

a) Report relevant descriptive statistics (1 paragraph, tables/figure optional).

b) Write down your statistical model. Justify why and how the model is useful for answering the scientific questions.

c) Summarize your findings and relevant model parameters (1-2 paragraphs, tables/figure optional).

Question 2: Chlamydia Incidence

The dataset (CDC.RData) contains the annual number of chlamydia cases reported at the county level during the years 2003 to 2010. Here we will only analyze data from counties with a population greater than 500,000. Several variables on county-level population characteristics are also obtained from Census 2000. We are interested in identifying population characteristics that are associated with chlamydia incidence rates.

Variable Codebook:

1. FIPS Unique county identifier (Federal Information Processing Standards)

2. Area County name

3. State Two-letter state abbreviation

4. Year Reporting year (see note*)

5. Population At-risk population size

6. Cases Reported cases of chlamydia

7. HHIncome2000 Median household income (in $1,000) from Census 2000

8. PBlack2000 Percent black population from Census 2000

9. PHisp2000 Percent Hispanic population from Census 2000

*Using the year variable directly may result in convergence issues due to scaling (i.e. coefficients being too small). Consider reparametrizing the time variable (e.g. centering or subtracting a reference year) when fitting the regression model.
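
The two reparametrizations mentioned in the note can be sketched as follows. This is only an illustration in plain Python: the list of years, the choice of 2003 as the reference year, and all variable names are assumptions, not part of the dataset or of any required solution. When year enters the model as a single linear term, either shift leaves the per-year slope unchanged and only changes what the intercept refers to.

```python
# Hypothetical reporting years covering the 2003-2010 range described above.
years = list(range(2003, 2011))

# Assumption for illustration: use the earliest year as the reference.
REFERENCE_YEAR = 2003

# Option 1: subtract a reference year, so 0 corresponds to 2003.
years_since_ref = [y - REFERENCE_YEAR for y in years]

# Option 2: center on the mean year, so 0 corresponds to the
# midpoint of the study period.
mean_year = sum(years) / len(years)
years_centered = [y - mean_year for y in years]

print(years_since_ref)
print(years_centered)
```

With the first option the intercept describes the reference year; with the second it describes the middle of the study period.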

Address the scientific questions using a random-intercept modeling approach. Only describe a single statistical model you decide to use.

a) Report relevant descriptive statistics (1 paragraph, tables/figure optional).

b) Write down your statistical model. Justify why and how the model is useful for answering the scientific questions.

c) Summarize your findings and relevant model parameters (1-2 paragraphs, tables/figure optional).

