Q1. There are 3 columns: V1 (categorical), V2 (binary), and V3 (numerical). Find the Gini Index and the Information Gain for the data below.
V1 V2 V3
A 0 33
A 0 54
A 0 56
A 0 42
A 1 50
B 1 55
B 1 31
B 0 4
B 1 77
B 0 49
- Create the above data set in R and read it in.
- Write a procedure to calculate the Gini Index, and find its values for V3 >= 32 and V3 < 65.
- Write a procedure to calculate the Information Gain, and find its values for V3 >= 32 and V3 < 65.
Hint: Check every combination with V2. For the Gini Index, a lower value indicates a better split; for Information Gain, a higher value is better.
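One possible way to approach the first two tasks is sketched below. It assumes that V2 is the class label and that each condition on V3 defines a binary split of the rows. The helper names `gini` and `gini_split` are made up for this illustration. The two conditions are evaluated separately; if the combined range is what is intended, pass `df$V3 >= 32 & df$V3 < 65` as the condition instead.

```r
# Create the data set from inline text and read it with read.table().
df <- read.table(header = TRUE, text = "
V1 V2 V3
A 0 33
A 0 54
A 0 56
A 0 42
A 1 50
B 1 55
B 1 31
B 0 4
B 1 77
B 0 49
")

# Gini impurity of a vector of class labels: 1 - sum(p_k^2).
gini <- function(y) {
  p <- table(y) / length(y)
  1 - sum(p^2)
}

# Weighted Gini impurity of the two groups produced by a logical condition
# (hypothetical helper; assumes V2 is the class label).
gini_split <- function(y, cond) {
  w <- mean(cond)  # share of rows that satisfy the condition
  w * gini(y[cond]) + (1 - w) * gini(y[!cond])
}

gini(df$V2)                      # impurity before any split
gini_split(df$V2, df$V3 >= 32)   # split on V3 >= 32
gini_split(df$V2, df$V3 < 65)    # split on V3 < 65
```

Weighting each group by its share of the rows is the usual way to score a split, so that a tiny pure group does not dominate the result.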
Example: Find the value for V3 >= 50
If V2 = 0 & V3 >= 50 = ?
If V2 = 1 & V3 >= 50 = ?
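The example above can be explored with a cross-tabulation, and the same counts feed an Information Gain calculation. The sketch below is one way to do it, again assuming V2 is the class label; `entropy` and `info_gain` are illustrative helper names, not functions from any package.

```r
# Recreate the data set directly as a data frame.
df <- data.frame(
  V1 = rep(c("A", "B"), each = 5),
  V2 = c(0, 0, 0, 0, 1, 1, 1, 0, 1, 0),
  V3 = c(33, 54, 56, 42, 50, 55, 31, 4, 77, 49)
)

# Count the rows for every combination of V2 and the condition V3 >= 50.
counts <- table(V2 = df$V2, V3_ge_50 = df$V3 >= 50)

# Shannon entropy (base 2) of a vector of class labels.
entropy <- function(y) {
  p <- table(y) / length(y)
  p <- p[p > 0]            # drop empty classes to avoid log2(0)
  -sum(p * log2(p))
}

# Information gain = entropy before the split
#                    minus the weighted entropy of the two groups after it.
info_gain <- function(y, cond) {
  w <- mean(cond)
  entropy(y) - (w * entropy(y[cond]) + (1 - w) * entropy(y[!cond]))
}

counts
info_gain(df$V2, df$V3 >= 50)
info_gain(df$V2, df$V3 >= 32)
info_gain(df$V2, df$V3 < 65)
```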
Q2. Implement Decision Tree in R.
Create this dataset (copy the code below and paste it into your R session):
data <- data.frame(
  InsuranceID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  Vehicle_Damage = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE),
  Self_Injury = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE),
  # Levels are listed from least to most frequent claimer.
  Claimer_Frequency = factor(c("active", "very active", "very active", "inactive", "very inactive",
                               "inactive", "very inactive", "active", "active", "very active"),
                             levels = c("very inactive", "inactive", "active", "very active")),
  Fraud = c(FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE))
This data set is used by an insurance company to check whether a claim is fraudulent.
Insurance ID: The claim ID.
Vehicle Damage: Is the vehicle damaged or not?
Self-Injury: Is the person injured or not?
Claimer Frequency: How frequently the person claims insurance.
Fraud: Is the claim fraudulent or not?
Q1. Implement a decision tree (Hint: use rpart).
Q2. Predict whether the claims with Insurance ID 2 and 7 are fraudulent.
Q3. Visualize the decision tree.
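A minimal sketch of how these three steps might fit together with `rpart` follows. It makes two assumptions that are not stated in the exercise: the logical columns are stored as factors so they are treated as categories, and the `rpart.control` settings are relaxed, because with only 10 rows the default `minsplit` of 20 would leave the tree as a single root node. InsuranceID is left out of the formula since it is only an identifier.

```r
library(rpart)

# Recreate the data set from the exercise, storing the logical columns
# as factors (an assumption made for this sketch).
data <- data.frame(
  InsuranceID = 1:10,
  Vehicle_Damage = factor(c(TRUE, TRUE, TRUE, FALSE, FALSE,
                            FALSE, FALSE, TRUE, TRUE, FALSE)),
  Self_Injury = factor(c(TRUE, TRUE, TRUE, TRUE, TRUE,
                         FALSE, FALSE, FALSE, FALSE, TRUE)),
  Claimer_Frequency = factor(
    c("active", "very active", "very active", "inactive", "very inactive",
      "inactive", "very inactive", "active", "active", "very active"),
    levels = c("very inactive", "inactive", "active", "very active")),
  Fraud = factor(c(FALSE, TRUE, TRUE, FALSE, FALSE,
                   TRUE, TRUE, FALSE, FALSE, TRUE))
)

# Fit a classification tree. The control values are illustrative choices
# that allow splits on a very small data set.
fit <- rpart(
  Fraud ~ Vehicle_Damage + Self_Injury + Claimer_Frequency,
  data = data, method = "class",
  control = rpart.control(minsplit = 2, minbucket = 1, cp = 0)
)

# Predicted class for the claims with InsuranceID 2 and 7.
pred <- predict(fit, newdata = data[data$InsuranceID %in% c(2, 7), ],
                type = "class")
print(pred)

# Base-graphics visualization of the fitted tree.
plot(fit, margin = 0.1, uniform = TRUE)
text(fit, use.n = TRUE)
```

A tree grown this deep on 10 rows essentially memorizes the training data, so the predictions here only show the mechanics of `predict`, not how the model would generalize.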
Q3. Implement Naïve Bayes using the built-in ‘Hair Eye Colour’ dataset (HairEyeColor) in R.
Q1. Implement Naïve Bayes (Hint: use e1071)
Q2. Visualize the Naïve Bayes model.
Q3. Make predictions and create a confusion matrix.
Q4. Compute accuracy, precision, and recall from the confusion matrix.
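The four steps above could be sketched as follows. The exercise does not say which variable to predict, so this example assumes Sex is the class and Hair and Eye are the predictors. It also predicts on the same rows used for fitting (no train/test split), and it treats "Female" as the positive class for precision and recall, which is an arbitrary choice.

```r
library(e1071)

# HairEyeColor is a 3-way contingency table (Hair x Eye x Sex).
# Expand it to one row per observed person so it can be used for prediction.
counts <- as.data.frame(HairEyeColor)
people <- counts[rep(seq_len(nrow(counts)), counts$Freq),
                 c("Hair", "Eye", "Sex")]

# Assumption: Sex is the class to predict from Hair and Eye.
model <- naiveBayes(Sex ~ Hair + Eye, data = people)

# One simple visualization: the conditional probabilities P(Hair | Sex)
# estimated by the model, drawn as grouped bars.
barplot(model$tables$Hair, beside = TRUE, legend.text = TRUE,
        xlab = "Hair", ylab = "P(Hair | Sex)")

# Predict on the same rows and cross-tabulate against the true labels.
pred <- predict(model, people)
cm <- table(Predicted = pred, Actual = people$Sex)
print(cm)

# Metrics derived from the confusion matrix ("Female" = positive class).
tp <- cm["Female", "Female"]
accuracy  <- sum(diag(cm)) / sum(cm)
precision <- tp / sum(cm["Female", ])   # TP / (TP + FP)
recall    <- tp / sum(cm[, "Female"])   # TP / (TP + FN)

c(accuracy = accuracy, precision = precision, recall = recall)
```

Because rows of `cm` are predictions and columns are actual labels, precision divides by a row total and recall by a column total; swapping the `table()` arguments would swap the two.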