RapidMiner model improvement

jraqllll
Asked: Dec 1st, 2018

Question Description

I already have one model and need to improve its F-measure.

I can't upload the file because it's too big.

Unformatted Attachment Preview

DATA DICTIONARY (columns: AttributeID, Row, Description, Special)

AttributeID | Row | Description
1 | ID | ID of loan in our sample
2 | TARGET | Target variable (1 - client with payment difficulties: he/she had late payment more than X days on at least one of the first Y installments of the loan in our sample, 0 - all other cases)
5 | NAME_CONTRACT_TYPE | Identification if loan is cash or revolving
6 | CODE_GENDER | Gender of the client
7 | FLAG_OWN_CAR | Flag if the client owns a car
8 | FLAG_OWN_REALTY | Flag if client owns a house or flat
9 | CNT_CHILDREN | Number of children the client has
10 | AMT_INCOME_TOTAL | Income of the client
11 | AMT_CREDIT | Credit amount of the loan
12 | AMT_ANNUITY | Loan annuity
13 | AMT_GOODS_PRICE | For consumer loans it is the price of the goods for which the loan is given
14 | NAME_TYPE_SUITE | Who was accompanying client when he was applying for the loan
15 | NAME_INCOME_TYPE | Clients income type (businessman, working, maternity leave,…)
16 | NAME_EDUCATION_TYPE | Level of highest education the client achieved
17 | NAME_FAMILY_STATUS | Family status of the client
18 | NAME_HOUSING_TYPE | What is the housing situation of the client (renting, living with parents, ...)
19 | REGION_POPULATION_RELATIVE | Normalized population of region where client lives (higher number means the client lives in more populated region)
20 | DAYS_BIRTH | Client's age in days at the time of application
21 | DAYS_EMPLOYED | How many days before the application the person started current employment
22 | DAYS_REGISTRATION | How many days before the application did client change his registration
23 | DAYS_ID_PUBLISH | How many days before the application did client change the identity document with which he applied for the loan
24 | OWN_CAR_AGE | Age of client's car
25 | FLAG_MOBIL | Did client provide mobile phone (1=YES, 0=NO)
26 | FLAG_EMP_PHONE | Did client provide work phone (1=YES, 0=NO)
27 | FLAG_WORK_PHONE | Did client provide home phone (1=YES, 0=NO)
28 | FLAG_CONT_MOBILE | Was mobile phone reachable (1=YES, 0=NO)
29 | FLAG_PHONE | Did client provide home phone (1=YES, 0=NO)
30 | FLAG_EMAIL | Did client provide email (1=YES, 0=NO)
31 | OCCUPATION_TYPE | What kind of occupation does the client have
32 | CNT_FAM_MEMBERS | How many family members does client have
33 | REGION_RATING_CLIENT | Our rating of the region where client lives (1,2,3)
34 | REGION_RATING_CLIENT_W_CITY | Our rating of the region where client lives with taking city into account (1,2,3)
35 | WEEKDAY_APPR_PROCESS_START | On which day of the week did the client apply for the loan
36 | HOUR_APPR_PROCESS_START | Approximately at what hour did the client apply for the loan
37 | REG_REGION_NOT_LIVE_REGION | Flag if client's permanent address does not match contact address (1=different, 0=same, at region level)
38 | REG_REGION_NOT_WORK_REGION | Flag if client's permanent address does not match work address (1=different, 0=same, at region level)
39 | LIVE_REGION_NOT_WORK_REGION | Flag if client's contact address does not match work address (1=different, 0=same, at region level)
40 | REG_CITY_NOT_LIVE_CITY | Flag if client's permanent address does not match contact address (1=different, 0=same, at city level)
41 | REG_CITY_NOT_WORK_CITY | Flag if client's permanent address does not match work address (1=different, 0=same, at city level)
42 | LIVE_CITY_NOT_WORK_CITY | Flag if client's contact address does not match work address (1=different, 0=same, at city level)
43 | ORGANIZATION_TYPE | Type of organization where client works
44 | EXT_SOURCE_1 | Normalized score from external data source
45 | EXT_SOURCE_2 | Normalized score from external data source
46 | EXT_SOURCE_3 | Normalized score from external data source
47-93 | (building attributes, listed below) | Normalized information about building where the client lives, What is average (_AVG suffix), modus (_MODE suffix), median (_MEDI suffix) apartment size, common area, living area, age of building, number of elevators, number of entrances, state of the building, number of floor
94 | OBS_30_CNT_SOCIAL_CIRCLE | How many observation of client's social surroundings with observable 30 DPD (days past due) default
95 | DEF_30_CNT_SOCIAL_CIRCLE | How many observation of client's social surroundings defaulted on 30 DPD (days past due)
96 | OBS_60_CNT_SOCIAL_CIRCLE | How many observation of client's social surroundings with observable 60 DPD (days past due) default
97 | DEF_60_CNT_SOCIAL_CIRCLE | How many observation of client's social surroundings defaulted on 60 (days past due) DPD
98 | DAYS_LAST_PHONE_CHANGE | How many days before application did client change phone
99-118 | FLAG_DOCUMENT_2 to FLAG_DOCUMENT_21 | Did client provide document 2 to document 21 (one flag per document)
119 | AMT_REQ_CREDIT_BUREAU_HOUR | Number of enquiries to Credit Bureau about the client one hour before application
120 | AMT_REQ_CREDIT_BUREAU_DAY | Number of enquiries to Credit Bureau about the client one day before application (excluding one hour before application)
121 | AMT_REQ_CREDIT_BUREAU_WEEK | Number of enquiries to Credit Bureau about the client one week before application (excluding one day before application)
122 | AMT_REQ_CREDIT_BUREAU_MON | Number of enquiries to Credit Bureau about the client one month before application (excluding one week before application)
123 | AMT_REQ_CREDIT_BUREAU_QRT | Number of enquiries to Credit Bureau about the client 3 month before application (excluding one month before application)
124 | AMT_REQ_CREDIT_BUREAU_YEAR | Number of enquiries to Credit Bureau about the client one day year (excluding last 3 months before application)

Building attributes (AttributeID 47-93, all sharing the building description above):
47-60: APARTMENTS_AVG, BASEMENTAREA_AVG, YEARS_BEGINEXPLUATATION_AVG, YEARS_BUILD_AVG, COMMONAREA_AVG, ELEVATORS_AVG, ENTRANCES_AVG, FLOORSMAX_AVG, FLOORSMIN_AVG, LANDAREA_AVG, LIVINGAPARTMENTS_AVG, LIVINGAREA_AVG, NONLIVINGAPARTMENTS_AVG, NONLIVINGAREA_AVG
61-74: APARTMENTS_MODE, BASEMENTAREA_MODE, YEARS_BEGINEXPLUATATION_MODE, YEARS_BUILD_MODE, COMMONAREA_MODE, ELEVATORS_MODE, ENTRANCES_MODE, FLOORSMAX_MODE, FLOORSMIN_MODE, LANDAREA_MODE, LIVINGAPARTMENTS_MODE, LIVINGAREA_MODE, NONLIVINGAPARTMENTS_MODE, NONLIVINGAREA_MODE
75-88: APARTMENTS_MEDI, BASEMENTAREA_MEDI, YEARS_BEGINEXPLUATATION_MEDI, YEARS_BUILD_MEDI, COMMONAREA_MEDI, ELEVATORS_MEDI, ENTRANCES_MEDI, FLOORSMAX_MEDI, FLOORSMIN_MEDI, LANDAREA_MEDI, LIVINGAPARTMENTS_MEDI, LIVINGAREA_MEDI, NONLIVINGAPARTMENTS_MEDI, NONLIVINGAREA_MEDI
89-93: FONDKAPREMONT_MODE, HOUSETYPE_MODE, TOTALAREA_MODE, WALLSMATERIAL_MODE, EMERGENCYSTATE_MODE

Special (values in the order they appear in the preview; their row alignment is not recoverable from it): normalized; time only relative to the application (4 times); rounded; normalized (repeated for the remaining flagged rows)

CIS9557 GROUP PROJECT: HOME CREDIT DEFAULT RISK

INTRODUCTION
Many people struggle to get loans due to insufficient or non-existent credit histories. And, unfortunately, this population is often taken advantage of by untrustworthy lenders. Home Credit strives to broaden financial inclusion for the unbanked population by providing a positive and safe borrowing experience.
In order to make sure this underserved population has a positive loan experience, Home Credit makes use of a variety of alternative data, including telco and transactional information, to predict their clients' repayment abilities and ensure that clients capable of repayment are not rejected and that loans are given with a principal, maturity, and repayment calendar that will empower their clients to be successful.

DATASET
The data contains the following attributes. The target variable ('TARGET') indicates whether someone defaulted on the loan or not (based on historical data).

ID, TARGET, NAME_CONTRACT_TYPE, CODE_GENDER, FLAG_OWN_CAR, FLAG_OWN_REALTY, CNT_CHILDREN, AMT_INCOME_TOTAL, AMT_CREDIT, AMT_ANNUITY, AMT_GOODS_PRICE, NAME_TYPE_SUITE, NAME_INCOME_TYPE, NAME_EDUCATION_TYPE, NAME_FAMILY_STATUS, NAME_HOUSING_TYPE, REGION_POPULATION_RELATIVE, DAYS_BIRTH, DAYS_EMPLOYED, DAYS_REGISTRATION, DAYS_ID_PUBLISH, OWN_CAR_AGE, FLAG_MOBIL, FLAG_EMP_PHONE, FLAG_WORK_PHONE, FLAG_CONT_MOBILE, FLAG_PHONE, FLAG_EMAIL, OCCUPATION_TYPE, CNT_FAM_MEMBERS, REGION_RATING_CLIENT, REGION_RATING_CLIENT_W_CITY, WEEKDAY_APPR_PROCESS_START, HOUR_APPR_PROCESS_START, REG_REGION_NOT_LIVE_REGION, REG_REGION_NOT_WORK_REGION, LIVE_REGION_NOT_WORK_REGION, REG_CITY_NOT_LIVE_CITY, REG_CITY_NOT_WORK_CITY, LIVE_CITY_NOT_WORK_CITY, ORGANIZATION_TYPE, EXT_SOURCE_1, EXT_SOURCE_2, EXT_SOURCE_3,
APARTMENTS_AVG, BASEMENTAREA_AVG, YEARS_BEGINEXPLUATATION_AVG, YEARS_BUILD_AVG, COMMONAREA_AVG, ELEVATORS_AVG, ENTRANCES_AVG, FLOORSMAX_AVG, FLOORSMIN_AVG, LANDAREA_AVG, LIVINGAPARTMENTS_AVG, LIVINGAREA_AVG, NONLIVINGAPARTMENTS_AVG, NONLIVINGAREA_AVG,
APARTMENTS_MODE, BASEMENTAREA_MODE, YEARS_BEGINEXPLUATATION_MODE, YEARS_BUILD_MODE, COMMONAREA_MODE, ELEVATORS_MODE, ENTRANCES_MODE, FLOORSMAX_MODE, FLOORSMIN_MODE, LANDAREA_MODE, LIVINGAPARTMENTS_MODE, LIVINGAREA_MODE, NONLIVINGAPARTMENTS_MODE, NONLIVINGAREA_MODE,
APARTMENTS_MEDI, BASEMENTAREA_MEDI, YEARS_BEGINEXPLUATATION_MEDI, YEARS_BUILD_MEDI, COMMONAREA_MEDI, ELEVATORS_MEDI, ENTRANCES_MEDI, FLOORSMAX_MEDI, FLOORSMIN_MEDI, LANDAREA_MEDI, LIVINGAPARTMENTS_MEDI, LIVINGAREA_MEDI, NONLIVINGAPARTMENTS_MEDI, NONLIVINGAREA_MEDI,
FONDKAPREMONT_MODE, HOUSETYPE_MODE, TOTALAREA_MODE, WALLSMATERIAL_MODE, EMERGENCYSTATE_MODE, OBS_30_CNT_SOCIAL_CIRCLE, DEF_30_CNT_SOCIAL_CIRCLE, OBS_60_CNT_SOCIAL_CIRCLE, DEF_60_CNT_SOCIAL_CIRCLE, DAYS_LAST_PHONE_CHANGE,
FLAG_DOCUMENT_2, FLAG_DOCUMENT_3, FLAG_DOCUMENT_4, FLAG_DOCUMENT_5, FLAG_DOCUMENT_6, FLAG_DOCUMENT_7, FLAG_DOCUMENT_8, FLAG_DOCUMENT_9, FLAG_DOCUMENT_10, FLAG_DOCUMENT_11, FLAG_DOCUMENT_12, FLAG_DOCUMENT_13, FLAG_DOCUMENT_14, FLAG_DOCUMENT_15, FLAG_DOCUMENT_16, FLAG_DOCUMENT_17, FLAG_DOCUMENT_18, FLAG_DOCUMENT_19, FLAG_DOCUMENT_20, FLAG_DOCUMENT_21,
AMT_REQ_CREDIT_BUREAU_HOUR, AMT_REQ_CREDIT_BUREAU_DAY, AMT_REQ_CREDIT_BUREAU_WEEK, AMT_REQ_CREDIT_BUREAU_MON, AMT_REQ_CREDIT_BUREAU_QRT, AMT_REQ_CREDIT_BUREAU_YEAR

TASK I: DETERMINING DEFAULT
The task is to develop a classifier that is able to determine whether a customer will default on the loan payments. The students should build a predictive model on the training set and then apply their predictive model to the scoring set. The winning team is the team with the highest F-measure value.

$$F\text{-measure} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

For each of the models you train, create a table that includes the model name, accuracy, precision, recall, and F-measure. Out of all the models, highlight the one you consider your top model. You will use this model to score the scoring dataset that was provided to you.

TASK II: DETERMINING CHARACTERISTICS OF CUSTOMERS THAT DEFAULT
In Task II, the teams will conduct a customer segmentation that describes different personas that default. Choose an optimal number of segments and label them in a meaningful way.

WRITTEN REPORT
Description: By the final week, students will be required to submit a full written report of the machine learning project. The project will serve as a practical learning experience in understanding various aspects of machine learning applied in a business setting.
Groups will report significant discoveries that were found and should be able to describe the ML process and the potential benefits of expected findings. The document should summarize the findings and archive the processes and methods used:
1. Identifying the business problem
2. Data Understanding and Data Cleaning
3. Feature Selection and Feature Engineering
4. Model Building and Evaluation (I will expect to see the results of your 3 best models)
5. Scoring the Dataset
6. Create a business strategy based on the insights found

Grading: The grade will be based on the clarity, completeness, and demonstrated understanding of the written report. Each team member will be expected to participate equally in preparing the report. Students are expected to use graphs and charts where useful.
• Data Understanding and Cleansing
  o Exploratory data analysis
    ▪ Frequency of the target variable
    ▪ Missing values, Duplicates
    ▪ Relationship between variables
    ▪ Outliers
• Feature selection/engineering:
  o Different methods for feature selection were used
  o Engineering of new features
  o List of top features was included
• Model evaluation
  o Multiple algorithms (classifiers) were used
  o Which parameters were important in improving the different models (optimization – parameter tuning)
• Scoring the dataset
  o Student submitted the scoring set with their predictions
• Project Manual
  o What is the business problem
  o Process followed
  o Describe the different components above
  o What interesting insights/recommendations would you report to HomeCredit based on your findings?

The oral presentation grade will be based on instructor grading, which will consider client feedback and peer evaluations.
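The first exploratory checks named under Data Understanding and Cleansing (frequency of the target variable, missing values, duplicates) can be sketched in plain Python as below. This is only an illustration under stated assumptions: the function name `eda_summary` is hypothetical, the rows are made-up toy values rather than the project data, and a missing value is assumed to be either `None` or an empty string. The column names `ID` and `TARGET` come from the attribute list in the project brief.

```python
from collections import Counter


def eda_summary(rows, target="TARGET", key="ID"):
    """Summarise target frequency, missing values per column, and duplicate IDs.

    `rows` is a list of dicts, one dict per loan application (hypothetical layout).
    """
    # Frequency of each target class (reveals class imbalance).
    target_counts = Counter(row.get(target) for row in rows)

    # Count missing cells per column; None and "" are assumed to mean "missing".
    missing = Counter()
    for row in rows:
        for name, value in row.items():
            if value is None or value == "":
                missing[name] += 1

    # IDs that occur more than once are treated as duplicates.
    id_counts = Counter(row.get(key) for row in rows)
    duplicate_ids = sorted(k for k, n in id_counts.items() if n > 1)

    return {
        "target_counts": dict(target_counts),
        "missing": dict(missing),
        "duplicate_ids": duplicate_ids,
    }


# Toy rows for illustration only (not taken from the real dataset).
sample = [
    {"ID": 1, "TARGET": 0, "AMT_INCOME_TOTAL": 1000.0, "OWN_CAR_AGE": None},
    {"ID": 2, "TARGET": 1, "AMT_INCOME_TOTAL": None, "OWN_CAR_AGE": 3},
    {"ID": 2, "TARGET": 1, "AMT_INCOME_TOTAL": None, "OWN_CAR_AGE": 3},
    {"ID": 3, "TARGET": 0, "AMT_INCOME_TOTAL": 500.0, "OWN_CAR_AGE": ""},
]
summary = eda_summary(sample)
print(summary)
```

A strongly skewed `target_counts` would be worth noting early, since a rare positive class is one common reason for a low F-measure despite high accuracy.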
GRADING RUBRIC FOR PROJECT
22/25 for project, 3 points for peer evaluation
Team peer evaluation: 2 – Excellent, 1 – Average, 0 – Poor

Area | Points Assigned | Points | Score
Data Understanding and Cleansing | 4 | Ranging from 4 excellent to 0, not completed |
Feature Selection/Engineering | 4 | Ranging from 4 excellent to 0, not completed |
Model Evaluation | 3 | Ranging from 3 excellent to 0, not completed |
Scoring of Dataset | 3 | Ranging from 3 all submitted to 0, not completed |
Presentation | 3 | Ranging from 5 excellent to 0, not completed |
User's Manual including Business problem and strategy | 5 | Ranging from 5 Professional to 0 Unprofessional |
Total | 22 | |

Three points are reserved for team evaluation.

Tools: The student is allowed to use any of the tools described in class. Make sure you include screenshots at the different stages of the process in the documentation.
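The model comparison table requested in Task I (model name, accuracy, precision, recall, F-measure) follows directly from each model's confusion-matrix counts. The sketch below is a minimal illustration, not the course's or RapidMiner's implementation: the function names, the model names, and the counts are all hypothetical, and class 1 (default) is assumed to be the positive class.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F-measure from confusion counts.

    tp/fp/fn/tn are counts with TARGET = 1 (default) as the positive class.
    Ratios with a zero denominator are reported as 0.0 by convention here.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F-measure = 2 * Precision * Recall / (Precision + Recall)
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


def best_by_f_measure(models):
    """Return (name, metrics) of the model with the highest F-measure."""
    table = [(name, classification_metrics(*counts)) for name, counts in models.items()]
    return max(table, key=lambda row: row[1]["f_measure"])


# Hypothetical confusion counts (tp, fp, fn, tn) for two made-up models.
models = {
    "model_a": (30, 20, 70, 880),
    "model_b": (60, 90, 40, 810),
}

for name, counts in models.items():
    m = classification_metrics(*counts)
    print(f"{name}: acc={m['accuracy']:.2f} prec={m['precision']:.2f} "
          f"rec={m['recall']:.2f} f={m['f_measure']:.2f}")

best_name, best_metrics = best_by_f_measure(models)
print("top model by F-measure:", best_name)
```

In this toy comparison the model with the lower accuracy has the higher F-measure, which is why the brief ranks teams by F-measure rather than accuracy when defaults are the class of interest.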

This question has not been answered.
