Data Analysis for Decision Making

Assignment: Build a scorecard from scratch

Deadline to submit: November 19, 2023, at 11:59PM (Paris time)


The main goal of this assignment is to write a report that explains the approach you have taken to build a scorecard on your own. We will use the dataset “dataassignment.xlsx”. In this dataset, a bank has collected information about payment incidents for a lending portfolio of 2850 French borrowers. Our goal is to predict the probability that a borrower experiences a payment default on his/her credit, conditional on his/her characteristics. The variable we want to predict is “incident”, which is equal to “Yes” if the borrower has experienced a payment default on his/her credit, or “No” if the borrower has not.

Using Chapter 2 on probability, we want to forecast P(incident = “Yes”) using the concept of probability with multiple conditioning events, as done for instance in Example 10 of Chapter 2.
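As a reminder of what such a forecast looks like, the relation below sketches how a probability with two conditioning events can be estimated from counts. The symbols X_1, X_2, a, b and n(.) are notation assumed here for illustration and may differ from the notation of Chapter 2: X_1 and X_2 stand for two (binned) borrower characteristics, and n(.) counts the borrowers in the dataset that satisfy the stated conditions.

```latex
\hat{P}(\text{incident} = \text{Yes} \mid X_1 = a,\; X_2 = b)
  = \frac{n(\text{incident} = \text{Yes},\; X_1 = a,\; X_2 = b)}
         {n(X_1 = a,\; X_2 = b)}
```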

To predict the variable “incident”, you have at your disposal 8 variables.

“income”: monthly income of the borrower (in euros).

“duration”: credit duration of the borrower (in months).

“amount”: credit amount of the borrower (in euros).

“family”: “Single” if the borrower is single, “Married” if the borrower is married.

“seniority”: number of months since the borrower has been a bank customer.

“credcard”: “Yes” if the borrower owns a credit card, “No” otherwise.

“age”: age of the borrower (in years).

“depbirth”: “Department 1”, “Department 2”, or “Department 3” according to the department of birth of the borrower.

To help with the construction of the scorecard, have a look at Examples 10 and 11 of Chapter 2, which will be very useful.

In this project, I expect you to:

1. Perform a univariate analysis of the variables to get a better idea of the data. (see Chapter 1 for performing this step.)

a.   Give the list of numerical variables and categorical variables in the dataset.

b.   Give a table displaying some numerical descriptive measures for the numerical variables (such as the mean, standard deviation, skewness, kurtosis, …).


c.   Give frequency tables for the categorical variables.
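To make the quantities requested in step 1 concrete, here is a minimal sketch in Python (standard library only). It is an illustration, not a substitute for the SPSS outputs expected in the report. The toy values and the helper names `describe` and `frequency_table` are invented for this example and are not taken from the dataset. The skewness and kurtosis use the simple moment-based definitions, which may differ slightly from the bias-corrected values reported by statistical packages.

```python
import math
from collections import Counter

# Invented toy values for illustration only; NOT taken from dataassignment.xlsx.
income = [1500, 2200, 1800, 3100, 2700, 1900, 4000, 2300]
family = ["Single", "Married", "Married", "Single",
          "Married", "Single", "Married", "Married"]


def describe(values):
    """Mean, sample standard deviation, moment-based skewness and excess kurtosis."""
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4 (each divided by n).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "n": n,
        "mean": mean,
        "std": math.sqrt(m2 * n / (n - 1)),  # sample (n - 1) standard deviation
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3,        # excess kurtosis (0 for a normal law)
    }


def frequency_table(values):
    """Count and percentage of each category."""
    counts = Counter(values)
    total = len(values)
    return {label: (count, 100 * count / total) for label, count in counts.items()}


print(describe(income))
print(frequency_table(family))
```

A positive skewness indicates a longer right tail, which is common for income-like variables and can guide the choice of binning method later on.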

2. Bin the numerical variables into categorical variables using the binning method of your choice. (See Chapter 2 for this step.) Note that it is also possible to perform the binning step after the variable selection of step 3. Note also that it is possible to consider several binning methods and to compare them in order to select the method that maximizes the dependence between the conditioning variables and the variable “incident”.
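As one possible illustration of step 2, the sketch below implements equal-frequency (quantile) binning in Python with the standard library. This is only one binning method among others and is not necessarily the one presented in Chapter 2. The toy ages and the helper names `quantile_cutpoints` and `assign_bin` are invented for this example.

```python
import bisect

# Invented toy ages for illustration only; NOT taken from dataassignment.xlsx.
ages = [22, 25, 31, 35, 38, 41, 47, 52, 58]


def quantile_cutpoints(values, n_bins):
    """Cut points splitting the sorted values into n_bins groups of roughly equal size.

    Assumes n_bins >= 2. Heavily tied data can produce duplicate cut points.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(k * n) // n_bins] for k in range(1, n_bins)]


def assign_bin(value, cutpoints):
    """Label of the interval containing value (intervals are closed on the left)."""
    index = bisect.bisect_right(cutpoints, value)
    if index == 0:
        return f"< {cutpoints[0]}"
    if index == len(cutpoints):
        return f">= {cutpoints[-1]}"
    return f"[{cutpoints[index - 1]}, {cutpoints[index]})"


cuts = quantile_cutpoints(ages, 3)
for age in ages:
    print(age, assign_bin(age, cuts))
```

Equal-width binning would only change how the cut points are computed. The binned variable can then be compared across methods using an association measure with “incident”.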

3. Select a subset of 3 or 4 variables (out of the 8 variables) that will be used to predict the variable “incident”. (see Chapter 5 for performing this step.)

a.   You should retain the variables that display a significant dependence with the variable “incident”. To that end, use the statistical tests of Chapter 5: Advanced statistical tests:

i. Pearson test of correlation if you are analyzing the dependence between two numerical variables.

ii. Chi-square test of independence if you are analyzing the dependence between two categorical variables.

iii. One-way ANOVA test if you are analyzing the dependence between one numerical and one categorical variable.

b.   Then, out of those selected variables, select the ones that are most highly correlated/associated with the variable “incident” using the measures of correlation/association of Chapter 5: Advanced statistical tests (Cramér’s V or the Pearson correlation coefficient).
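For the categorical case of step 3, the sketch below shows how the chi-square statistic of independence and Cramér’s V can be computed from a contingency table, in Python with the standard library. It is meant to clarify what these measures are, while the report itself should rely on the SPSS outputs. The toy data and the helper names `contingency_table`, `chi_square` and `cramers_v` are invented for this example. No p-value is computed here, so the statistic would still have to be compared with a chi-square critical value for the returned degrees of freedom.

```python
import math
from collections import Counter

# Invented toy data for illustration only; NOT taken from dataassignment.xlsx.
family = ["Single"] * 30 + ["Married"] * 30
incident = ["Yes"] * 20 + ["No"] * 10 + ["Yes"] * 10 + ["No"] * 20


def contingency_table(xs, ys):
    """Row labels, column labels and the table of observed counts."""
    counts = Counter(zip(xs, ys))
    rows, cols = sorted(set(xs)), sorted(set(ys))
    return rows, cols, [[counts[(r, c)] for c in cols] for r in rows]


def chi_square(table):
    """Chi-square statistic of independence and its degrees of freedom."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def cramers_v(table):
    """Cramér's V: sqrt(chi2 / (n * (min(rows, cols) - 1))), between 0 and 1."""
    stat, _ = chi_square(table)
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(stat / (n * k))


rows, cols, table = contingency_table(family, incident)
print(rows, cols, table)
print(chi_square(table), cramers_v(table))
```

Because Cramér’s V is bounded between 0 and 1 whatever the table size, it is convenient for ranking candidate variables by the strength of their association with “incident”.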

4. Build n-way contingency tables and calculate the probability that the borrower experiences a payment default conditional on the variables selected in step 3. (See Example 10 of Chapter 2 for an example.) At the end of this step, you must display the probability of payment default in a prediction table.
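The prediction table of step 4 can be sketched as follows in Python (standard library only): for each combination of the selected variables, the estimated probability of default is the share of borrowers with incident = “Yes” in that cell. The records and the helper name `prediction_table` are invented for this example, and the two conditioning variables used here are an arbitrary choice for illustration, not a suggestion of which variables to retain.

```python
from collections import defaultdict

# Invented toy records (family, credcard, incident) for illustration only;
# NOT taken from dataassignment.xlsx.
records = [
    ("Single", "No", "Yes"), ("Single", "No", "Yes"), ("Single", "No", "No"),
    ("Single", "Yes", "No"), ("Single", "Yes", "Yes"),
    ("Married", "No", "No"), ("Married", "No", "Yes"),
    ("Married", "Yes", "No"), ("Married", "Yes", "No"), ("Married", "Yes", "No"),
]


def prediction_table(records):
    """Map each profile to the observed share of incident == "Yes" in that cell.

    Each record is a tuple whose last element is the incident flag.
    """
    totals = defaultdict(int)
    defaults = defaultdict(int)
    for *profile, incident in records:
        key = tuple(profile)
        totals[key] += 1
        if incident == "Yes":
            defaults[key] += 1
    return {key: defaults[key] / totals[key] for key in sorted(totals)}


table = prediction_table(records)
for profile, probability in table.items():
    print(profile, round(probability, 3))

# Looking up the estimated probability for one borrower profile.
print(table[("Single", "No")])
```

Cells with very few borrowers give unstable estimates, which is one reason to keep the number of conditioning variables and bins small.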

5. Build and display your final scorecard. (See Example 11 of Chapter 2 for this step.) It would be interesting to give some examples of how non-expert bank staff, such as bank advisors, can use it. It would also be an added value to identify from the scorecard the main drivers of the payment defaults experienced by borrowers.

A report of about 10 pages is expected, explaining your approach and displaying the important results. Clarity, completeness, innovation potential, use of data, technical details, and overall presentation will be evaluated.

The assignment submission must include a written report of about 10 pages, including screenshots of the SPSS outputs to support your explanations.
