IIDS69061 Programming for Health Data Science


Task Description

Assignment Title: Exploring Global Health Data

Objective:

This assignment involves exploring a dataset obtained from the World Health Organisation (WHO) on noncommunicable diseases and risk factors. The dataset contains information on health-related factors, including lifestyle behaviours (e.g., smoking and physical activity), treatment access (e.g., hypertension treatment), and health outcomes (e.g., probability of dying from chronic diseases), across various demographic and income groups. Using Python, your task is to perform a thorough analysis of the dataset and derive meaningful insights.

Instructions:

1. Dataset Overview

Begin by loading the dataset in Python and exploring its structure. Familiarise yourself with the columns, data types, and key statistics. Provide overall insights from the data, highlighting the type of information it contains.
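
As a minimal sketch of this first step, the snippet below loads a small table and summarises its structure. The column names (`country`, `year`, `sex`, `income_group`, `indicator`, `estimate`) and the four toy rows are assumptions made for illustration only; the real WHO file may use different names and should be loaded with `pd.read_csv` on its actual path.

```python
import io

import pandas as pd

# Toy stand-in for the WHO file. Column names and values are hypothetical,
# not the real schema; replace the StringIO object with the real file path.
toy_csv = io.StringIO(
    "country,year,sex,income_group,indicator,estimate\n"
    "A,2010,Male,High income,Smoking prevalence,30.0\n"
    "A,2010,Female,High income,Smoking prevalence,20.0\n"
    "B,2010,Male,Low income,Smoking prevalence,\n"
    "B,2010,Female,Low income,Smoking prevalence,5.0\n"
)
df = pd.read_csv(toy_csv)


def overview(frame):
    """Return one row per column: data type, missing count, distinct count."""
    return pd.DataFrame(
        {
            "dtype": frame.dtypes.astype(str),
            "n_missing": frame.isna().sum(),
            "n_unique": frame.nunique(),
        }
    )


summary = overview(df)
print(df.shape)                    # (rows, columns)
print(summary)                     # structure of each column
print(df.describe(include="all"))  # key statistics for all columns
```

Counting missing and distinct values per column early on helps decide which columns need cleaning before any analysis.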

2. Data Exploration and Analysis

Use Python libraries such as Pandas, NumPy, Matplotlib, and Seaborn to process and analyse the data. Focus on data cleaning, summarisation, and visualisation to uncover trends and relationships. Select one topic for in-depth exploration and analysis. Here are some exemplar topics, but you are not restricted to them. Feel free to choose your own topic of interest:
  • Trends Over Time: Investigate how a specific factor (e.g., smoking prevalence) has changed over the years globally or within a specific income group.
  • Income Group Analysis: Compare the prevalence of health factors across different income groups.
  • Gender Disparities: Explore differences in health estimates between males and females.
  • Country-Specific Insights: Identify countries with the highest or lowest estimates for a particular factor and analyse the reasons behind these extremes.
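
One possible way to approach the "Trends Over Time" topic is sketched below: drop rows with a missing estimate, filter to one indicator, and average by year and income group. The long-format layout, the column names, and the toy values are all assumptions for illustration and do not come from the WHO dataset.

```python
import pandas as pd

# Hypothetical long-format data: one row per year / income group / indicator.
df = pd.DataFrame(
    {
        "year": [2010, 2010, 2015, 2015, 2010, 2015],
        "income_group": [
            "High income", "Low income", "High income",
            "Low income", "High income", "High income",
        ],
        "indicator": ["Smoking prevalence"] * 6,
        "estimate": [30.0, 10.0, 26.0, 12.0, 34.0, None],
    }
)

# Data cleaning: remove rows where the estimate is missing.
clean = df.dropna(subset=["estimate"])

# Summarisation: mean estimate per year and income group, years as rows.
smoking = clean[clean["indicator"] == "Smoking prevalence"]
trend = (
    smoking.groupby(["year", "income_group"])["estimate"]
    .mean()
    .unstack("income_group")
)

# Change between the first and last available year, per income group.
change = trend.iloc[-1] - trend.iloc[0]
print(trend)
print(change)
```

A table shaped like `trend` (years as the index, one column per group) can be passed directly to Matplotlib or Seaborn line plots for the visualisation step.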

3. Statistical Analysis I

Explore the relationship between lifestyle factors (e.g., alcohol consumption, smoking prevalence, and physical activity) and disease prevalence (e.g., obesity, hypertension, and chronic disease mortality). You may answer questions such as: How do lifestyle factors such as alcohol consumption, smoking prevalence, and physical activity levels correlate with the prevalence of noncommunicable diseases (e.g., obesity, hypertension, or chronic disease mortality) across different income groups?
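
The question above could be explored with a per-group Pearson correlation, as in this minimal sketch. It assumes the data has already been reshaped so that each row is one country with one column per indicator; the column names `alcohol` and `hypertension`, the helper `corr_by_group`, and the toy values are hypothetical.

```python
import pandas as pd

# Hypothetical wide-format data: one row per country, one column per indicator.
wide = pd.DataFrame(
    {
        "income_group": ["High income"] * 4 + ["Low income"] * 4,
        "alcohol": [8.0, 9.0, 10.0, 11.0, 2.0, 3.0, 4.0, 5.0],
        "hypertension": [20.0, 22.0, 24.0, 26.0, 30.0, 29.0, 28.0, 27.0],
    }
)


def corr_by_group(frame, x, y, group="income_group"):
    """Pearson correlation between columns x and y within each group."""
    return pd.Series(
        {name: part[x].corr(part[y]) for name, part in frame.groupby(group)}
    )


result = corr_by_group(wide, "alcohol", "hypertension")
print(result)
```

Computing the coefficient separately for each income group matters because a relationship can point in opposite directions in different groups and cancel out in a pooled estimate. A correlation describes association only and does not show that the lifestyle factor causes the outcome.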

4. Statistical Analysis II

Analyse the relationship between health behaviours and treatments and the probability of dying between the ages of 30 and 70 from major diseases, such as cardiovascular disease, cancer, diabetes, and chronic respiratory diseases. Explore the impact of changes in factors like smoking, physical activity, obesity, hypertension, and access to treatments on mortality risk.

Answer questions such as:
  • How would the probability of dying from these diseases change if smoking rates decreased by 5%, 10%, or more?
  • What impact would a 10% reduction in insufficient physical activity have on mortality risk?
  • How might increased access to hypertension treatment affect mortality?
  • What effect would reducing obesity prevalence have on mortality?
  • How would decreasing alcohol consumption influence mortality?

While these questions are provided to guide your investigation, you are free to explore other variables and create your own questions based on the data.
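
One way to approach such "what if" questions is to fit a simple linear model and compare predictions under a changed input. The sketch below uses ordinary least squares via NumPy on synthetic data. Everything in it is an assumption for illustration: the column names, the helper functions `fit_ols` and `scenario_change`, the linear model itself, and the reading of "decreased by 10%" as a relative reduction rather than a drop of 10 percentage points.

```python
import numpy as np
import pandas as pd

# Synthetic country-level data; the outcome is built from a known linear
# rule so the fitted coefficients can be checked. Not real WHO values.
data = pd.DataFrame(
    {
        "smoking": [10.0, 20.0, 30.0, 40.0, 25.0, 15.0],
        "inactivity": [20.0, 30.0, 25.0, 35.0, 40.0, 22.0],
    }
)
data["mortality"] = 5.0 + 0.3 * data["smoking"] + 0.1 * data["inactivity"]


def fit_ols(frame, predictors, outcome):
    """Fit outcome ~ intercept + predictors by least squares."""
    X = np.column_stack(
        [np.ones(len(frame)), frame[predictors].to_numpy(dtype=float)]
    )
    y = frame[outcome].to_numpy(dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return pd.Series(coef, index=["intercept"] + predictors)


def scenario_change(frame, coef, predictor, relative_cut):
    """Mean predicted change in the outcome if `predictor` fell by
    `relative_cut` (0.10 means a 10% relative reduction) in every row."""
    delta = -relative_cut * frame[predictor]
    return float((coef[predictor] * delta).mean())


coef = fit_ols(data, ["smoking", "inactivity"], "mortality")
for cut in (0.05, 0.10):
    print(f"{cut:.0%} cut in smoking:", scenario_change(data, coef, "smoking", cut))
```

The result is a model-based association, not a causal effect: it only says how predicted mortality shifts under the fitted linear relationship, holding the other predictors fixed.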

Submission:
Your submitted report should include:
  • A brief introduction to the dataset and the chosen topic.
  • Key steps of your analysis (e.g., data cleaning, visualisation, and computation).
  • Insights gained from your analysis, supported by visualisations or summaries.
  • Recommendations or conclusions based on the insights.
Please submit a Jupyter Notebook (exported as a PDF) containing:
  • Well-documented Python code that performs the analysis.
  • Markdown cells explaining your approach and outlining your insights and key findings.
