Linear Regression in Python

Hello, if you have any need, please feel free to consult us, this is my wechat: wx91due

Assignment 2

Before working on this assignment, please read the following tutorial carefully –

1. Overfitting vs. Underfitting

2. Linear Regression in Python

In this assignment, you will use the Life Expectancy Data (from last assignment) to investigate regression problems. In case you don’t have this data, you can access it via: https://www.kaggle.com/datasets/kumarajarshi/life-expectancy-who

Task 1:

Int this task, we will use different kinds of models to explore the relationships between economic status and life expectancy. For Afghanistan for instance, as the following table shows, we can use older data (from 2000 to 2013) to train models and use the trained models to predict life expectancy of 2014 and 2015. The model input can be GDP number and the model output will be life expectancy for that year.

 

Country

Year

GDP

Life expectancy

Afghanistan

2015

584.2592

???

Afghanistan

2014

612.6965

???

Afghanistan

2013

631.745

59.9

Afghanistan

2012

669.959

59.5

Afghanistan

2011

63.53723

59.2

Afghanistan

2010

553.3289

58.8

Afghanistan

2009

445.8933

58.6

Afghanistan

2008

373.3611

58.1

Afghanistan

2007

369.8358

57.5

Afghanistan

2006

272.5638

57.3

Afghanistan

2005

25.29413

57.3

Afghanistan

2004

219.1414

57

Afghanistan

2003

198.7285

56.7

Afghanistan

2002

187.846

56.2

Afghanistan

2001

117.497

55.3

Afghanistan

2000

114.56

54.8

 

Please train 4 functions, Linear Function, Quadratic Function, Cubic Function, and Quartic Function, to fit this data (only using Afghanistan data), and then calculate RMSE and R2 scores. Please fill the following table:

RMSE Scores (for 2014 and 2015 data):

 

Linear Function

Quadratic Function

Cubic Function

Quartic Function

Training Data (2000 - 2013)

 

 

 

 

Testing Data (2014 and 2015)

 

 

 

 

R2 Scores (for 2014 and 2015 data):

 

Linear Function

Quadratic Function

Cubic Function

Quartic Function

Training Data (2000 - 2013)

 

 

 

 

Testing Data (2014 and 2015)

 

 

 

 

 

Please submit your code (named calculate_Afghanistan.py). Please explain which model can be the best to predict this small dataset? why?: __________________________________________________________________________________________________________________________________________________________________________________________________________________________________________

Task 2:

Please repeat this process for all the countries in this dataset. Then, you can average the RMSE and R2 scores for all the developing and developed countries. Please fill the following table:

RMSE Scores:

Developing Countries

Linear Function

Quadratic Function

Cubic Function

Quartic Function

Training Data (2000 - 2013)

 

 

 

 

Testing Data (2014 and 2015)

 

 

 

 

Developed Countries

Linear Function

Quadratic Function

Cubic Function

Quartic Function

Training Data (2000 - 2013)

 

 

 

 

Testing Data (2014 and 2015)

 

 

 

 

 

R2 Scores:

Developing Countries

Linear Function

Quadratic Function

Cubic Function

Quartic Function

Training Data (2000 - 2013)

 

 

 

 

Testing Data (2014 and 2015)

 

 

 

 

Developed Countries

Linear Function

Quadratic Function

Cubic Function

Quartic Function

Training Data (2000 - 2013)

 

 

 

 

Testing Data (2014 and 2015)

 

 

 

 

 

Please submit your code (named calculate_all_country.py). Please explain which model(s) can be the best to predict developing and developed countries; why?: __________________________________________________________________________________________________________________________________________________________________________________________________________________________________________

Task 3:

For this task, we will use 5 variables - Adult Mortality, Alcohol, BMI, GDP, Schooling – to build regression models (Multiple Linear Regression) to predict the life expectancy of the target country for a specific year, e.g., use a model to predict “Libya’s life expectancy in year 2010”. We can train two different models (developing country model and developed country model) to predict the data. Similarly, we can use older data (from 2000 to 2013) to train models and use the trained models to predict life expectancy of 2014 and 2015.

Please fill this table (for testing with 2014 and 2015 data):

 

RMSE

R2

Developing Country

 

 

Developed Country

 

 

 

Please fill the following table with the “regression coefficients” (for each variable):

 

Adult Mortality

Alcohol

 BMI

GDP

Schooling

Developing Country

 

 

 

 

 

Developed Country

 

 

 

 

 

 

Please submit your code (named calculate_regression.py). Comparing developing and developed countries (two models that you build), can you find some interesting results?:  __________________________________________________________________________________________________________________________________________________________________________________________________________________________________________

 

Task 4:

For task 3, we used the Linear Regression model to address the prediction problem. Please tell us the limitation(s) of the model, and can you improve it?

__________________________________________________________________________________________________________________________________________________________________________________________________________________________________________

Submission: Your code files and a PDF file (containing your solution for tasks and the results to report).

发表评论

电子邮件地址不会被公开。 必填项已用*标注