| Row | AttributeID | Description | Special |
|-----|-------------|-------------|---------|
| 1 | ID | ID of the loan in our sample | |
| 2 | TARGET | Target variable (1 = client with payment difficulties: he/she had a late payment of more than X days on at least one of the first Y installments of the loan in our sample; 0 = all other cases) | |
| 5 | NAME_CONTRACT_TYPE | Whether the loan is cash or revolving | |
| 6 | CODE_GENDER | Gender of the client | |
| 7 | FLAG_OWN_CAR | Flag if the client owns a car | |
| 8 | FLAG_OWN_REALTY | Flag if the client owns a house or flat | |
| 9 | CNT_CHILDREN | Number of children the client has | |
| 10 | AMT_INCOME_TOTAL | Income of the client | |
| 11 | AMT_CREDIT | Credit amount of the loan | |
| 12 | AMT_ANNUITY | Loan annuity | |
| 13 | AMT_GOODS_PRICE | For consumer loans, the price of the goods for which the loan is given | |
| 14 | NAME_TYPE_SUITE | Who was accompanying the client when applying for the loan | |
| 15 | NAME_INCOME_TYPE | Client's income type (businessman, working, maternity leave, ...) | |
| 16 | NAME_EDUCATION_TYPE | Highest level of education the client achieved | |
| 17 | NAME_FAMILY_STATUS | Family status of the client | |
| 18 | NAME_HOUSING_TYPE | Housing situation of the client (renting, living with parents, ...) | |
| 19 | REGION_POPULATION_RELATIVE | Normalized population of the region where the client lives (a higher number means the client lives in a more populated region) | normalized |
| 20 | DAYS_BIRTH | Client's age in days at the time of application | time only relative to the application |
| 21 | DAYS_EMPLOYED | How many days before the application the person started their current employment | time only relative to the application |
| 22 | DAYS_REGISTRATION | How many days before the application the client changed their registration | time only relative to the application |
| 23 | DAYS_ID_PUBLISH | How many days before the application the client changed the identity document with which they applied for the loan | time only relative to the application |
| 24 | OWN_CAR_AGE | Age of the client's car | |
| 25 | FLAG_MOBIL | Did the client provide a mobile phone (1 = YES, 0 = NO) | |
| 26 | FLAG_EMP_PHONE | Did the client provide a work phone (1 = YES, 0 = NO) | |
| 27 | FLAG_WORK_PHONE | Did the client provide a home phone (1 = YES, 0 = NO) | |
| 28 | FLAG_CONT_MOBILE | Was the mobile phone reachable (1 = YES, 0 = NO) | |
| 29 | FLAG_PHONE | Did the client provide a home phone (1 = YES, 0 = NO) | |
| 30 | FLAG_EMAIL | Did the client provide an email (1 = YES, 0 = NO) | |
| 31 | OCCUPATION_TYPE | What kind of occupation the client has | |
| 32 | CNT_FAM_MEMBERS | How many family members the client has | |
| 33 | REGION_RATING_CLIENT | Our rating of the region where the client lives (1, 2, 3) | |
| 34 | REGION_RATING_CLIENT_W_CITY | Our rating of the region where the client lives, taking the city into account (1, 2, 3) | |
| 35 | WEEKDAY_APPR_PROCESS_START | Day of the week on which the client applied for the loan | |
| 36 | HOUR_APPR_PROCESS_START | Approximate hour at which the client applied for the loan | rounded |
| 37 | REG_REGION_NOT_LIVE_REGION | Flag if the client's permanent address does not match the contact address (1 = different, 0 = same, at region level) | |
| 38 | REG_REGION_NOT_WORK_REGION | Flag if the client's permanent address does not match the work address (1 = different, 0 = same, at region level) | |
| 39 | LIVE_REGION_NOT_WORK_REGION | Flag if the client's contact address does not match the work address (1 = different, 0 = same, at region level) | |
| 40 | REG_CITY_NOT_LIVE_CITY | Flag if the client's permanent address does not match the contact address (1 = different, 0 = same, at city level) | |
| 41 | REG_CITY_NOT_WORK_CITY | Flag if the client's permanent address does not match the work address (1 = different, 0 = same, at city level) | |
| 42 | LIVE_CITY_NOT_WORK_CITY | Flag if the client's contact address does not match the work address (1 = different, 0 = same, at city level) | |
| 43 | ORGANIZATION_TYPE | Type of organization where the client works | |
| 44 | EXT_SOURCE_1 | Normalized score from an external data source | normalized |
| 45 | EXT_SOURCE_2 | Normalized score from an external data source | normalized |
| 46 | EXT_SOURCE_3 | Normalized score from an external data source | normalized |
| 47 | APARTMENTS_AVG | Building information (see note below the table) | normalized |
| 48 | BASEMENTAREA_AVG | Building information (see note below the table) | normalized |
| 49 | YEARS_BEGINEXPLUATATION_AVG | Building information (see note below the table) | normalized |
| 50 | YEARS_BUILD_AVG | Building information (see note below the table) | normalized |
| 51 | COMMONAREA_AVG | Building information (see note below the table) | normalized |
| 52 | ELEVATORS_AVG | Building information (see note below the table) | normalized |
| 53 | ENTRANCES_AVG | Building information (see note below the table) | normalized |
| 54 | FLOORSMAX_AVG | Building information (see note below the table) | normalized |
| 55 | FLOORSMIN_AVG | Building information (see note below the table) | normalized |
| 56 | LANDAREA_AVG | Building information (see note below the table) | normalized |
| 57 | LIVINGAPARTMENTS_AVG | Building information (see note below the table) | normalized |
| 58 | LIVINGAREA_AVG | Building information (see note below the table) | normalized |
| 59 | NONLIVINGAPARTMENTS_AVG | Building information (see note below the table) | normalized |
| 60 | NONLIVINGAREA_AVG | Building information (see note below the table) | normalized |
| 61 | APARTMENTS_MODE | Building information (see note below the table) | normalized |
| 62 | BASEMENTAREA_MODE | Building information (see note below the table) | normalized |
| 63 | YEARS_BEGINEXPLUATATION_MODE | Building information (see note below the table) | normalized |
| 64 | YEARS_BUILD_MODE | Building information (see note below the table) | normalized |
| 65 | COMMONAREA_MODE | Building information (see note below the table) | normalized |
| 66 | ELEVATORS_MODE | Building information (see note below the table) | normalized |
| 67 | ENTRANCES_MODE | Building information (see note below the table) | normalized |
| 68 | FLOORSMAX_MODE | Building information (see note below the table) | normalized |
| 69 | FLOORSMIN_MODE | Building information (see note below the table) | normalized |
| 70 | LANDAREA_MODE | Building information (see note below the table) | normalized |
| 71 | LIVINGAPARTMENTS_MODE | Building information (see note below the table) | normalized |
| 72 | LIVINGAREA_MODE | Building information (see note below the table) | normalized |
| 73 | NONLIVINGAPARTMENTS_MODE | Building information (see note below the table) | normalized |
| 74 | NONLIVINGAREA_MODE | Building information (see note below the table) | normalized |
| 75 | APARTMENTS_MEDI | Building information (see note below the table) | normalized |
| 76 | BASEMENTAREA_MEDI | Building information (see note below the table) | normalized |
| 77 | YEARS_BEGINEXPLUATATION_MEDI | Building information (see note below the table) | normalized |
| 78 | YEARS_BUILD_MEDI | Building information (see note below the table) | normalized |
| 79 | COMMONAREA_MEDI | Building information (see note below the table) | normalized |
| 80 | ELEVATORS_MEDI | Building information (see note below the table) | normalized |
| 81 | ENTRANCES_MEDI | Building information (see note below the table) | normalized |
| 82 | FLOORSMAX_MEDI | Building information (see note below the table) | normalized |
| 83 | FLOORSMIN_MEDI | Building information (see note below the table) | normalized |
| 84 | LANDAREA_MEDI | Building information (see note below the table) | normalized |
| 85 | LIVINGAPARTMENTS_MEDI | Building information (see note below the table) | normalized |
| 86 | LIVINGAREA_MEDI | Building information (see note below the table) | normalized |
| 87 | NONLIVINGAPARTMENTS_MEDI | Building information (see note below the table) | normalized |
| 88 | NONLIVINGAREA_MEDI | Building information (see note below the table) | normalized |
| 89 | FONDKAPREMONT_MODE | Building information (see note below the table) | normalized |
| 90 | HOUSETYPE_MODE | Building information (see note below the table) | normalized |
| 91 | TOTALAREA_MODE | Building information (see note below the table) | normalized |
| 92 | WALLSMATERIAL_MODE | Building information (see note below the table) | normalized |
| 93 | EMERGENCYSTATE_MODE | Building information (see note below the table) | normalized |
| 94 | OBS_30_CNT_SOCIAL_CIRCLE | How many observations of the client's social surroundings with observable 30 DPD (days past due) default | |
| 95 | DEF_30_CNT_SOCIAL_CIRCLE | How many observations of the client's social surroundings defaulted on 30 DPD (days past due) | |
| 96 | OBS_60_CNT_SOCIAL_CIRCLE | How many observations of the client's social surroundings with observable 60 DPD (days past due) default | |
| 97 | DEF_60_CNT_SOCIAL_CIRCLE | How many observations of the client's social surroundings defaulted on 60 DPD (days past due) | |
| 98 | DAYS_LAST_PHONE_CHANGE | How many days before the application the client changed phone | |
| 99 | FLAG_DOCUMENT_2 | Did the client provide document 2 | |
| 100 | FLAG_DOCUMENT_3 | Did the client provide document 3 | |
| 101 | FLAG_DOCUMENT_4 | Did the client provide document 4 | |
| 102 | FLAG_DOCUMENT_5 | Did the client provide document 5 | |
| 103 | FLAG_DOCUMENT_6 | Did the client provide document 6 | |
| 104 | FLAG_DOCUMENT_7 | Did the client provide document 7 | |
| 105 | FLAG_DOCUMENT_8 | Did the client provide document 8 | |
| 106 | FLAG_DOCUMENT_9 | Did the client provide document 9 | |
| 107 | FLAG_DOCUMENT_10 | Did the client provide document 10 | |
| 108 | FLAG_DOCUMENT_11 | Did the client provide document 11 | |
| 109 | FLAG_DOCUMENT_12 | Did the client provide document 12 | |
| 110 | FLAG_DOCUMENT_13 | Did the client provide document 13 | |
| 111 | FLAG_DOCUMENT_14 | Did the client provide document 14 | |
| 112 | FLAG_DOCUMENT_15 | Did the client provide document 15 | |
| 113 | FLAG_DOCUMENT_16 | Did the client provide document 16 | |
| 114 | FLAG_DOCUMENT_17 | Did the client provide document 17 | |
| 115 | FLAG_DOCUMENT_18 | Did the client provide document 18 | |
| 116 | FLAG_DOCUMENT_19 | Did the client provide document 19 | |
| 117 | FLAG_DOCUMENT_20 | Did the client provide document 20 | |
| 118 | FLAG_DOCUMENT_21 | Did the client provide document 21 | |
| 119 | AMT_REQ_CREDIT_BUREAU_HOUR | Number of enquiries to the Credit Bureau about the client one hour before the application | |
| 120 | AMT_REQ_CREDIT_BUREAU_DAY | Number of enquiries to the Credit Bureau about the client one day before the application (excluding one hour before the application) | |
| 121 | AMT_REQ_CREDIT_BUREAU_WEEK | Number of enquiries to the Credit Bureau about the client one week before the application (excluding one day before the application) | |
| 122 | AMT_REQ_CREDIT_BUREAU_MON | Number of enquiries to the Credit Bureau about the client one month before the application (excluding one week before the application) | |
| 123 | AMT_REQ_CREDIT_BUREAU_QRT | Number of enquiries to the Credit Bureau about the client 3 months before the application (excluding one month before the application) | |
| 124 | AMT_REQ_CREDIT_BUREAU_YEAR | Number of enquiries to the Credit Bureau about the client one year before the application (excluding the last 3 months before the application) | |

Note: rows 47 to 93 share one description: normalized information about the building where the client lives, giving the average (_AVG suffix), mode (_MODE suffix), and median (_MEDI suffix) of apartment size, common area, living area, age of building, number of elevators, number of entrances, state of the building, and number of floors.
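The Special column marks the DAYS_* fields as time only relative to the application, and the AMT_REQ_CREDIT_BUREAU_* descriptions define windows that each exclude the shorter window before them, so the six counts together cover the whole year before the application. A minimal sketch of turning these definitions into features, assuming each applicant row is a Python dict keyed by the attribute names above. The helper names, the 365.25-day year, and the credit-to-income ratio are illustrative choices, not part of the assignment or the dataset:

```python
# Sketch only: rows are assumed to be dicts keyed by the data-dictionary
# attribute names; all helper names here are hypothetical.

DAYS_PER_YEAR = 365.25

# Non-overlapping enquiry windows, shortest first; per the dictionary, each
# excludes the shorter window before it, so together they span one year.
BUREAU_WINDOWS = [
    "AMT_REQ_CREDIT_BUREAU_HOUR",
    "AMT_REQ_CREDIT_BUREAU_DAY",
    "AMT_REQ_CREDIT_BUREAU_WEEK",
    "AMT_REQ_CREDIT_BUREAU_MON",
    "AMT_REQ_CREDIT_BUREAU_QRT",
    "AMT_REQ_CREDIT_BUREAU_YEAR",
]


def days_to_years(days):
    """Convert a DAYS_* value (time relative to the application) to years.

    abs() makes the result independent of whether "days before the
    application" is stored as a positive or a negative number.
    """
    if days is None:
        return None
    return abs(days) / DAYS_PER_YEAR


def bureau_enquiries_last_year(row):
    """Total Credit Bureau enquiries in the year before the application.

    Missing windows are counted as 0 (an assumption, not a dataset rule).
    """
    return sum(row.get(col) or 0 for col in BUREAU_WINDOWS)


def credit_to_income(row):
    """AMT_CREDIT / AMT_INCOME_TOTAL; None if income is missing or zero."""
    income = row.get("AMT_INCOME_TOTAL")
    credit = row.get("AMT_CREDIT")
    if not income or credit is None:
        return None
    return credit / income


# Made-up example row, for illustration only.
applicant = {
    "DAYS_BIRTH": -12000,
    "AMT_CREDIT": 450000.0,
    "AMT_INCOME_TOTAL": 150000.0,
    "AMT_REQ_CREDIT_BUREAU_HOUR": 0,
    "AMT_REQ_CREDIT_BUREAU_DAY": 0,
    "AMT_REQ_CREDIT_BUREAU_WEEK": 1,
    "AMT_REQ_CREDIT_BUREAU_MON": None,
    "AMT_REQ_CREDIT_BUREAU_QRT": 1,
    "AMT_REQ_CREDIT_BUREAU_YEAR": 2,
}
age_years = days_to_years(applicant["DAYS_BIRTH"])
total_enquiries = bureau_enquiries_last_year(applicant)
ratio = credit_to_income(applicant)
```

Treating a missing window as zero is a simplification; keeping a separate "enquiries missing" flag may preserve more information.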
CIS9557 GROUP PROJECT:
HOME CREDIT DEFAULT RISK
INTRODUCTION
Many people struggle to get loans due to insufficient or non-existent credit histories. And,
unfortunately, this population is often taken advantage of by untrustworthy lenders. Home
Credit strives to broaden financial inclusion for the unbanked population by providing a
positive and safe borrowing experience. In order to make sure this underserved population
has a positive loan experience, Home Credit makes use of a variety of alternative data, including telco and transactional information, to predict its clients' repayment abilities
and ensure that clients capable of repayment are not rejected and that loans are given with a
principal, maturity, and repayment calendar that will empower its clients to be successful.
DATASET
The data contains the following attributes. The target variable (‘TARGET’) indicates whether
a client defaulted on the loan (based on historical data).
ID
TARGET
NAME_CONTRACT_TYPE
CODE_GENDER
FLAG_OWN_CAR
FLAG_OWN_REALTY
CNT_CHILDREN
AMT_INCOME_TOTAL
AMT_CREDIT
AMT_ANNUITY
AMT_GOODS_PRICE
NAME_TYPE_SUITE
NAME_INCOME_TYPE
NAME_EDUCATION_TYPE
NAME_FAMILY_STATUS
NAME_HOUSING_TYPE
REGION_POPULATION_RELATIVE
DAYS_BIRTH
DAYS_EMPLOYED
DAYS_REGISTRATION
DAYS_ID_PUBLISH
OWN_CAR_AGE
FLAG_MOBIL
FLAG_EMP_PHONE
FLAG_WORK_PHONE
FLAG_CONT_MOBILE
FLAG_PHONE
FLAG_EMAIL
OCCUPATION_TYPE
CNT_FAM_MEMBERS
REGION_RATING_CLIENT
REGION_RATING_CLIENT_W_CITY
WEEKDAY_APPR_PROCESS_START
HOUR_APPR_PROCESS_START
REG_REGION_NOT_LIVE_REGION
REG_REGION_NOT_WORK_REGION
LIVE_REGION_NOT_WORK_REGION
REG_CITY_NOT_LIVE_CITY
REG_CITY_NOT_WORK_CITY
LIVE_CITY_NOT_WORK_CITY
ORGANIZATION_TYPE
EXT_SOURCE_1
EXT_SOURCE_2
EXT_SOURCE_3
APARTMENTS_AVG
BASEMENTAREA_AVG
YEARS_BEGINEXPLUATATION_AVG
YEARS_BUILD_AVG
COMMONAREA_AVG
ELEVATORS_AVG
ENTRANCES_AVG
FLOORSMAX_AVG
FLOORSMIN_AVG
LANDAREA_AVG
LIVINGAPARTMENTS_AVG
LIVINGAREA_AVG
NONLIVINGAPARTMENTS_AVG
NONLIVINGAREA_AVG
APARTMENTS_MODE
BASEMENTAREA_MODE
YEARS_BEGINEXPLUATATION_MODE
YEARS_BUILD_MODE
COMMONAREA_MODE
ELEVATORS_MODE
ENTRANCES_MODE
FLOORSMAX_MODE
FLOORSMIN_MODE
LANDAREA_MODE
LIVINGAPARTMENTS_MODE
LIVINGAREA_MODE
NONLIVINGAPARTMENTS_MODE
NONLIVINGAREA_MODE
APARTMENTS_MEDI
BASEMENTAREA_MEDI
YEARS_BEGINEXPLUATATION_MEDI
YEARS_BUILD_MEDI
COMMONAREA_MEDI
ELEVATORS_MEDI
ENTRANCES_MEDI
FLOORSMAX_MEDI
FLOORSMIN_MEDI
LANDAREA_MEDI
LIVINGAPARTMENTS_MEDI
LIVINGAREA_MEDI
NONLIVINGAPARTMENTS_MEDI
NONLIVINGAREA_MEDI
FONDKAPREMONT_MODE
HOUSETYPE_MODE
TOTALAREA_MODE
WALLSMATERIAL_MODE
EMERGENCYSTATE_MODE
OBS_30_CNT_SOCIAL_CIRCLE
DEF_30_CNT_SOCIAL_CIRCLE
OBS_60_CNT_SOCIAL_CIRCLE
DEF_60_CNT_SOCIAL_CIRCLE
DAYS_LAST_PHONE_CHANGE
FLAG_DOCUMENT_2
FLAG_DOCUMENT_3
FLAG_DOCUMENT_4
FLAG_DOCUMENT_5
FLAG_DOCUMENT_6
FLAG_DOCUMENT_7
FLAG_DOCUMENT_8
FLAG_DOCUMENT_9
FLAG_DOCUMENT_10
FLAG_DOCUMENT_11
FLAG_DOCUMENT_12
FLAG_DOCUMENT_13
FLAG_DOCUMENT_14
FLAG_DOCUMENT_15
FLAG_DOCUMENT_16
FLAG_DOCUMENT_17
FLAG_DOCUMENT_18
FLAG_DOCUMENT_19
FLAG_DOCUMENT_20
FLAG_DOCUMENT_21
AMT_REQ_CREDIT_BUREAU_HOUR
AMT_REQ_CREDIT_BUREAU_DAY
AMT_REQ_CREDIT_BUREAU_WEEK
AMT_REQ_CREDIT_BUREAU_MON
AMT_REQ_CREDIT_BUREAU_QRT
AMT_REQ_CREDIT_BUREAU_YEAR
TASK I: DETERMINING DEFAULT
The task is to develop a classifier that can determine whether a customer will
default on the loan payments. Students should build a predictive model on the
training set and then apply it to the scoring set.
The winning team is the team with the highest F-measure value.
$$F\text{-measure} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
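The F-measure is the harmonic mean of precision and recall. A minimal stdlib-only sketch of computing it, assuming it is measured for the defaulting class (TARGET = 1); the assignment does not state this explicitly. Because most classifiers output a probability, the sketch also picks the probability cut-off that maximizes the F-measure on held-out validation data; the threshold grid and the toy labels and scores are made up:

```python
def precision_recall_f(y_true, y_pred, positive=1):
    """Precision, recall and F-measure for the positive (defaulting) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure = 2 * Precision * Recall / (Precision + Recall)
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


def best_threshold(y_true, scores, thresholds):
    """Return (threshold, F-measure) for the cut-off with the highest F-measure.

    `scores` are predicted default probabilities on validation data,
    not on the scoring set.
    """
    best = None
    for thr in thresholds:
        y_pred = [1 if s >= thr else 0 for s in scores]
        f = precision_recall_f(y_true, y_pred)[2]
        if best is None or f > best[1]:
            best = (thr, f)
    return best


# Toy validation labels and scores, for illustration only.
y_val = [1, 0, 1, 0, 0, 1]
p_val = [0.9, 0.2, 0.45, 0.6, 0.1, 0.7]
print(precision_recall_f(y_val, [1, 0, 0, 1, 0, 1]))
print(best_threshold(y_val, p_val, [0.3, 0.5, 0.8]))
```

Choosing the threshold on a validation split, rather than on the data used to report final results, keeps the reported F-measure honest.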
For each model you train, create a table that includes the model name,
accuracy, precision, recall, and F-measure. Out of all the models, highlight the one you
consider your top model. You will use this model to score the scoring dataset that
was provided to you.
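A minimal stdlib-only sketch of the requested results table, assuming every model's 0/1 predictions were made on the same validation set and that TARGET = 1 is the positive class. The model names and predictions are hypothetical placeholders, not results:

```python
def evaluate(y_true, y_pred):
    """Accuracy, precision, recall and F-measure, with TARGET = 1 as positive."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f


def comparison_table(y_true, predictions):
    """Build the per-model table and mark the top model by F-measure.

    `predictions` maps a model name to its 0/1 validation predictions.
    Returns (top model name, table text).
    """
    rows = [(name, *evaluate(y_true, y_pred)) for name, y_pred in predictions.items()]
    rows.sort(key=lambda r: r[4], reverse=True)  # highest F-measure first
    lines = [f"{'Model':<20}{'Accuracy':>10}{'Precision':>11}{'Recall':>8}{'F-measure':>11}"]
    for i, (name, acc, prec, rec, f) in enumerate(rows):
        marker = "  <- top model" if i == 0 else ""
        lines.append(f"{name:<20}{acc:>10.3f}{prec:>11.3f}{rec:>8.3f}{f:>11.3f}{marker}")
    return rows[0][0], "\n".join(lines)


# Hypothetical models and toy predictions, for illustration only.
y_val = [1, 0, 1, 0, 0, 1, 0, 0]
top, table = comparison_table(y_val, {
    "always_no_default": [0, 0, 0, 0, 0, 0, 0, 0],
    "model_b": [1, 0, 1, 1, 0, 0, 0, 0],
})
print(table)
```

The all-zero baseline shows why accuracy alone can mislead when defaults are the minority class: it still reaches 0.625 accuracy on the toy data while its F-measure is 0.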
TASK II: DETERMINING CHARACTERISTICS OF CUSTOMERS
THAT DEFAULT
In Task II, the teams will conduct a customer segmentation that describes the different
personas that default. Choose an optimal number of segments and label them in a
meaningful way.
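k-means is one common way to build such segments; the assignment does not prescribe an algorithm. A minimal stdlib-only sketch of Lloyd's k-means, assuming the rows have already been filtered to defaulters (TARGET = 1) and converted to scaled numeric vectors; the toy points are made up. Comparing the within-cluster sum of squared errors (SSE) across k, the "elbow" heuristic, is one way to choose the number of segments; meaningful labels still have to come from inspecting each centroid:

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means. Returns (labels, centroids, within-cluster SSE)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Move the centroid to its members' mean; keep it if the cluster is empty.
            updated.append([sum(col) / len(members) for col in zip(*members)] if members else centroids[j])
        if updated == centroids:
            break  # assignments are stable
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, sse


# Toy, already-scaled two-feature vectors for defaulting clients.
defaulters = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
sse_by_k = {k: kmeans(defaulters, k)[2] for k in range(1, 5)}
labels, centroids, _ = kmeans(defaulters, 2)
print(sse_by_k)
print(labels, centroids)
```

On real data, features on very different scales (such as AMT_INCOME_TOTAL and CNT_CHILDREN) should be standardized first, or the largest-valued feature will dominate the distances.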
WRITTEN REPORT
Description:
By the final week, students will be required to submit a full written report of the
machine learning project. The project will serve as a practical learning experience in
understanding various aspects of machine learning applied in a business setting.
Groups will report the significant discoveries they made and should be able to
describe the ML process and the potential benefits of the expected findings.
The document should summarize the findings and document the processes and
methods used:
1. Identifying the business problem
2. Data Understanding and Data Cleaning
3. Feature Selection and Feature Engineering
4. Model Building and Evaluation (I expect to see the results of your 3 best
models)
5. Scoring the Dataset
6. Create a business strategy based on the insights found
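For step 5, a minimal sketch of writing the scoring-set predictions to CSV with the standard `csv` module. The output layout (ID plus a 0/1 TARGET column), the file name `predictions.csv`, and `toy_model` are assumptions for illustration; use the format the instructor specifies and your actual top model:

```python
import csv
import io


def write_predictions(scoring_rows, predict_proba, threshold, out):
    """Write one (ID, predicted TARGET) line per scoring row; return the row count.

    `predict_proba` is any callable returning a default probability for a row,
    standing in for the chosen top model.
    """
    writer = csv.writer(out)
    writer.writerow(["ID", "TARGET"])
    count = 0
    for row in scoring_rows:
        label = 1 if predict_proba(row) >= threshold else 0
        writer.writerow([row["ID"], label])
        count += 1
    return count


# Hypothetical stand-in for a trained model: flags very high credit-to-income ratios.
def toy_model(row):
    return 0.8 if row["AMT_CREDIT"] > 5 * row["AMT_INCOME_TOTAL"] else 0.1


scoring = [
    {"ID": 100001, "AMT_CREDIT": 900000.0, "AMT_INCOME_TOTAL": 120000.0},
    {"ID": 100002, "AMT_CREDIT": 200000.0, "AMT_INCOME_TOTAL": 180000.0},
]
buffer = io.StringIO()  # for a real file: open("predictions.csv", "w", newline="")
n_written = write_predictions(scoring, toy_model, threshold=0.5, out=buffer)
print(buffer.getvalue())
```

The threshold passed here should be the one chosen on validation data, so the scored labels match the model that was evaluated.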
Grading:
The grade will be based on the clarity, completeness, and demonstrated
understanding of the written report. Each team member will be expected to
participate equally in preparing the report. Students are expected to use graphs and
charts where useful.
• Data Understanding and Cleansing
  o Exploratory data analysis
    ▪ Frequency of the target variable
    ▪ Missing values, duplicates
    ▪ Relationship between variables
    ▪ Outliers
• Feature Selection/Engineering
  o Different methods for feature selection were used
  o New features were engineered
  o A list of the top features was included
• Model Evaluation
  o Multiple algorithms (classifiers) were used
  o Which parameters were important in improving the different models
    (optimization: parameter tuning)
• Scoring the Dataset
  o Students submitted the scoring set with their predictions
• Project Manual
  o What is the business problem?
  o Process followed
  o Description of the different components above
  o What interesting insights/recommendations would you report to Home Credit
    based on your findings?
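The first exploratory checks listed above (target frequency, missing values, duplicates) can be sketched with the standard library alone. This assumes the training file is a CSV with `ID` and `TARGET` columns and that missing values appear as empty fields; the sample rows are made up:

```python
import csv
import io
from collections import Counter


def eda_summary(rows, id_col="ID", target_col="TARGET"):
    """Target frequency, missing-value rate per column, and duplicated IDs.

    Assumes missing values appear as empty strings, which is how
    csv.DictReader returns empty fields.
    """
    target_counts = Counter(r[target_col] for r in rows)
    missing_rate = {c: sum(1 for r in rows if r[c] == "") / len(rows) for c in rows[0]}
    id_counts = Counter(r[id_col] for r in rows)
    duplicate_ids = sorted(i for i, n in id_counts.items() if n > 1)
    return target_counts, missing_rate, duplicate_ids


# Tiny made-up extract of a training file, for illustration only.
sample = io.StringIO(
    "ID,TARGET,AMT_INCOME_TOTAL,OWN_CAR_AGE\n"
    "1,0,135000,\n"
    "2,1,90000,5\n"
    "3,0,202500,\n"
    "3,0,202500,\n"
)
rows = list(csv.DictReader(sample))
target_counts, missing_rate, duplicate_ids = eda_summary(rows)
print(target_counts, missing_rate, duplicate_ids)
```

With pandas, the equivalent checks are `df["TARGET"].value_counts()`, `df.isna().mean()`, and `df.duplicated("ID")`.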
The Oral presentation grade will be based on instructor grading, which will consider
client feedback and peer evaluations.
GRADING RUBRIC FOR PROJECT
22/25 for project, 3 points for peer evaluation
Team peer evaluation
2 – Excellent
1 – Average
0 – Poor
| Area | Points Assigned | Points | Score |
|------|-----------------|--------|-------|
| Data Understanding and Cleansing | 4 | Ranging from 4 (excellent) to 0 (not completed) | |
| Feature Selection/Engineering | 4 | Ranging from 4 (excellent) to 0 (not completed) | |
| Model Evaluation | 3 | Ranging from 3 (excellent) to 0 (not completed) | |
| Scoring of Dataset | 3 | Ranging from 3 (all submitted) to 0 (not completed) | |
| Presentation | 3 | Ranging from 5 (excellent) to 0 (not completed) | |
| User’s Manual, including Business problem and strategy | 5 | Ranging from 5 (Professional) to 0 (Unprofessional) | |
| Total | 22 | | |

Three points are reserved for team evaluation.
Tools:
Students may use any of the tools described in class. Make sure you include
screenshots of the different stages of the process in the documentation.