SEMESTER 1 2024/25
COURSEWORK BRIEF:
Module Code: | MANG6556
Module Title: | Credit Risk & Data Analytics
Assessment: | Individual Coursework
Weighting: | 100%
This assessment relates to the following module learning outcomes:
3
Coursework Brief:
Question 1 (70 marks)
The dataset ‘Credit data.xlsx’ contains data on 10,000 borrowers and whether they subsequently experienced serious delinquency (see variable ‘SeriousDlqin2yrs’). Assume the lender now wishes to use this data to build a credit scoring model that predicts serious delinquency based on the other variables. The dataset contains the following variables:
Variable Name | Description
SeriousDlqin2yrs | Person experienced 90 days past due delinquency or worse
RevolvingUtilizationOfUnsecuredLines | Total balance on credit cards and personal lines of credit (excluding real estate and installment debt such as car loans) divided by the sum of credit limits
age | Age of borrower in years
NumberOfTime30-59DaysPastDueNotWorse | Number of times borrower has been 30-59 days past due but no worse in the last 2 years
DebtRatio | Monthly debt payments, alimony and living costs divided by monthly gross income
MonthlyIncome | Monthly income
NumberOfOpenCreditLinesAndLoans | Number of open loans (installment loans such as car loans or mortgages) and lines of credit (e.g. credit cards)
NumberOfTimes90DaysLate | Number of times borrower has been 90 days or more past due
NumberRealEstateLoansOrLines | Number of mortgage and real estate loans, including home equity lines of credit
NumberOfTime60-89DaysPastDueNotWorse | Number of times borrower has been 60-89 days past due but no worse in the last 2 years
NumberOfDependents | Number of dependents in family excluding themselves (spouse, children etc.)
1.1 Carefully pre-process the dataset by considering the following activities (35 marks):
• exploratory data analysis
• missing value handling (if any)
• outlier detection and treatment (if any)
• categorisation of the continuous variables (if deemed useful)
• Weights of Evidence coding (note that some additional coarse classification might be needed)
• splitting the dataset into a training and test set
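As an illustration of the Weights of Evidence step only, the following is a minimal sketch using the Python standard library. It assumes the convention WoE = ln(share of goods / share of bads) per bin (the opposite sign convention is also in use), treats SeriousDlqin2yrs = 1 as "bad", and applies additive smoothing so that empty cells do not break the logarithm. The function names, the toy data, and the cut point of 40 are hypothetical and are not part of the brief.

```python
import bisect
import math
from collections import Counter


def coarse_bin(value, edges):
    """Return the index of the bin `value` falls into, given sorted cut points."""
    return bisect.bisect_right(edges, value)


def woe_table(bins, targets, smoothing=0.5):
    """Compute {bin: (woe, iv_contribution)} for one binned variable.

    targets: 1 = serious delinquency ("bad"), 0 = no delinquency ("good").
    Convention assumed here: WoE = ln(share of goods / share of bads).
    """
    goods = Counter(b for b, y in zip(bins, targets) if y == 0)
    bads = Counter(b for b, y in zip(bins, targets) if y == 1)
    labels = sorted(set(goods) | set(bads))
    # Additive smoothing keeps the logarithm finite for bins with no goods or no bads.
    total_good = sum(goods[b] + smoothing for b in labels)
    total_bad = sum(bads[b] + smoothing for b in labels)
    table = {}
    for b in labels:
        dist_good = (goods[b] + smoothing) / total_good
        dist_bad = (bads[b] + smoothing) / total_bad
        woe = math.log(dist_good / dist_bad)
        table[b] = (woe, (dist_good - dist_bad) * woe)
    return table


# Toy data for illustration only, not taken from 'Credit data.xlsx'.
ages = [23, 25, 29, 31, 44, 47, 52, 58]
delinquent = [1, 1, 1, 0, 1, 0, 0, 0]
age_edges = [40]  # hypothetical cut point: bin 0 is age < 40, bin 1 is age >= 40

age_bins = [coarse_bin(a, age_edges) for a in ages]
table = woe_table(age_bins, delinquent)
for label, (woe, iv_part) in table.items():
    print(f"bin {label}: WoE = {woe:+.3f}, IV contribution = {iv_part:.3f}")
print(f"Information Value = {sum(iv for _, iv in table.values()):.3f}")
```

To limit leakage, the bins and WoE values would normally be estimated on the training set and then applied unchanged to the test set. Adjacent bins with similar WoE are candidates for the additional coarse classification mentioned above.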
1.2 Estimate a scorecard using a logistic regression classifier and report the following (35 marks):
• The most important variables
• The impact of the variables on the target
• The performance of the model. Use various performance metrics and discuss their relationship if any.
• The resulting scorecard.
• Compare this scorecard with the results of a Random Forest. Discuss your results.
• Why do banks typically use Logistic Regression as their base classifier? What do banks gain and lose by doing this?
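For the performance and scorecard items above, the sketch below shows one way to compute three commonly reported metrics from predicted probabilities of delinquency, and to convert a probability into scorecard points. It is a standard-library illustration, not a prescribed method. The relationship Gini = 2 × AUC − 1 is a standard identity; the scaling parameters (`base_score`, `base_odds`, `pdo`) and the toy scores are hypothetical assumptions chosen for the example.

```python
import bisect
import math


def auc(scores, targets):
    """Probability that a random bad (1) scores higher than a random good (0); ties count half."""
    pos = [s for s, y in zip(scores, targets) if y == 1]
    neg = [s for s, y in zip(scores, targets) if y == 0]
    # Pairwise comparison: clear but quadratic, acceptable for a few thousand rows.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gini(scores, targets):
    """Gini coefficient derived from AUC."""
    return 2 * auc(scores, targets) - 1


def ks_statistic(scores, targets):
    """Largest gap between the cumulative score distributions of bads and goods."""
    pos = sorted(s for s, y in zip(scores, targets) if y == 1)
    neg = sorted(s for s, y in zip(scores, targets) if y == 0)
    best = 0.0
    for t in sorted(set(scores)):
        cdf_pos = bisect.bisect_right(pos, t) / len(pos)
        cdf_neg = bisect.bisect_right(neg, t) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


def score_points(p_bad, base_score=600, base_odds=50, pdo=20):
    """Map a predicted probability of delinquency to scorecard points.

    Hypothetical scaling: `base_score` points at good:bad odds of `base_odds`,
    with the odds doubling every `pdo` points.
    """
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    odds_good = (1 - p_bad) / p_bad
    return offset + factor * math.log(odds_good)


# Toy predictions for illustration only.
predicted_pd = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
actual = [0, 0, 0, 1, 0, 1, 1, 1]

print(f"AUC  = {auc(predicted_pd, actual):.4f}")
print(f"Gini = {gini(predicted_pd, actual):.4f}")
print(f"KS   = {ks_statistic(predicted_pd, actual):.4f}")
print(f"Points at PD = 0.02: {score_points(0.02):.1f}")
```

Because AUC, Gini, and KS depend only on how the model ranks borrowers, they are unchanged by the monotone transformation from probability to points. Threshold-based metrics such as accuracy or recall depend on the chosen cut-off and can behave differently on an imbalanced target.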
Please carefully report the various steps of your methodology and discuss your results in a rigorous way!
NOTE: It is unlikely that different students will come up with the exact same parameter estimates. Special consideration will be given to submissions whose estimates are identical.
Question 2 (30 marks)
Find an academic paper published in 2021 or later (based on online or print publication date) discussing a real-life application of data analytics. It is important that the dataset analysed in the paper consists of real-life (not artificial) data. The publication outlets in which to look for a suitable paper are:
• Management Science
• Operations Research
• INFORMS Journal on Computing
• INFORMS Journal on Applied Analytics
• Journal of Machine Learning Research
• European Journal of Operational Research
• Production and Operations Management
• Manufacturing & Service Operations Management
• ICDM (The IEEE International Conference on Data Mining)
• NeurIPS (Conference on Neural Information Processing Systems)
• KDD (ACM SIGKDD Conference on Knowledge Discovery and Data Mining)
Journals and conferences not on this list are not acceptable.
2.1 Once you have found an appropriate paper, report the following in separate subsections (15 marks):
• Title, authors, and complete citation (e.g., journal name, volume/issue, year, …)
• The data mining problem considered
• The data mining techniques used
• The results reported
• A critical discussion of the model and results (assumptions made, shortcomings, limitations, …)
2.2 Apply the methodology you reviewed to the ‘Credit data.xlsx’ dataset and report the analytic steps, model performance, and business implications. (15 marks)
Make sure you demonstrate that you understand what the article is all about and are able to provide a critical discussion.
Do not copy and paste from the article. Turnitin will easily detect this!
NOTE: The reviewed methodology should be different from methods applied in Question 1.