# Clustering in R

Programming

Asked: Apr 11th, 2017

**Question description**

**Classification**

Q1. There are three columns: V1 (categorical), V2 (binary), and V3 (numerical). Find the values of the Gini index and the information gain.

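| V1 | V2 | V3 |
|----|----|----|
| A  | 0  | 33 |
| A  | 0  | 54 |
| A  | 0  | 56 |
| A  | 0  | 42 |
| A  | 1  | 50 |
| B  | 1  | 55 |
| B  | 1  | 31 |
| B  | 0  | 4  |
| B  | 1  | 77 |
| B  | 0  | 49 |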

- Create and read the above data set in R.
- Write a procedure to calculate the Gini index, and find its value for V3 >= 32 and V3 < 65.
- Write a procedure to calculate the information gain, and find its value for V3 >= 32 and V3 < 65.
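The first two steps above could be sketched as follows. This assumes that V2 is the class label and that `V3 >= 32` and `V3 < 65` are evaluated as two separate binary splits; the question does not state this explicitly. The helper names `gini` and `gini_split` are illustrative and do not come from any package.

```r
# Data from the table above, entered by hand.
df <- data.frame(
  V1 = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B"),
  V2 = c(0, 0, 0, 0, 1, 1, 1, 0, 1, 0),
  V3 = c(33, 54, 56, 42, 50, 55, 31, 4, 77, 49)
)

# Gini impurity of a vector of class labels: 1 - sum(p_k^2).
gini <- function(labels) {
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

# Weighted Gini index of a binary split.
# `in_left` is a logical vector marking the rows that satisfy the condition.
gini_split <- function(labels, in_left) {
  w_left <- mean(in_left)  # share of rows that satisfy the condition
  w_left * gini(labels[in_left]) + (1 - w_left) * gini(labels[!in_left])
}

# Assumption: each condition is treated as its own split on V3.
gini_ge32 <- gini_split(df$V2, df$V3 >= 32)
gini_lt65 <- gini_split(df$V2, df$V3 < 65)

print(c(V3_ge_32 = gini_ge32, V3_lt_65 = gini_lt65))
```

If the intended reading is the combined range `32 <= V3 < 65`, the same helper can be called with `df$V3 >= 32 & df$V3 < 65` as the condition.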

Hint: Check all the possibilities with V2. The lower the value, the better the result.
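For the information-gain part of the task, a minimal sketch is given below. It again assumes V2 is the class label and that each condition on V3 is a separate binary split. Note that "lower is better" applies to the Gini index; for information gain, a larger value indicates a better split. The helper names `entropy` and `info_gain` are illustrative.

```r
# Only V2 (class label) and V3 (split variable) are needed here.
df <- data.frame(
  V2 = c(0, 0, 0, 0, 1, 1, 1, 0, 1, 0),
  V3 = c(33, 54, 56, 42, 50, 55, 31, 4, 77, 49)
)

# Shannon entropy (in bits) of a vector of class labels.
entropy <- function(labels) {
  p <- table(labels) / length(labels)
  p <- p[p > 0]  # drop empty classes so that 0 * log2(0) never occurs
  -sum(p * log2(p))
}

# Information gain of a binary split:
# parent entropy minus the weighted entropy of the two branches.
info_gain <- function(labels, in_left) {
  w_left <- mean(in_left)
  children <- w_left * entropy(labels[in_left]) +
    (1 - w_left) * entropy(labels[!in_left])
  entropy(labels) - children
}

# Assumption: each condition is treated as its own split on V3.
ig_ge32 <- info_gain(df$V2, df$V3 >= 32)
ig_lt65 <- info_gain(df$V2, df$V3 < 65)

print(c(V3_ge_32 = ig_ge32, V3_lt_65 = ig_lt65))
```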

Example: Find the value for V3 >= 50.

If V2 = 0 & V3 >= 50 = ?

If V2 = 1 & V3 >= 50 = ?
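The counts asked for in this example can be read off a cross-tabulation of V2 against the condition. The sketch below builds that table and then derives the weighted Gini index from the counts, under the assumption that V2 is the class label. The variable names are illustrative.

```r
df <- data.frame(
  V2 = c(0, 0, 0, 0, 1, 1, 1, 0, 1, 0),
  V3 = c(33, 54, 56, 42, 50, 55, 31, 4, 77, 49)
)

# Rows: class label V2. Columns: whether the condition V3 >= 50 holds.
counts <- table(V2 = df$V2, V3_ge_50 = df$V3 >= 50)
print(counts)

# Gini impurity of each branch (each column of the table).
branch_gini <- apply(counts, 2, function(n) 1 - sum((n / sum(n))^2))

# Weight each branch by its share of the rows.
split_gini <- sum(colSums(counts) / sum(counts) * branch_gini)
print(split_gini)
```

The cells `counts["0", "TRUE"]` and `counts["1", "TRUE"]` correspond to the two "= ?" lines in the example.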

Q2. Implement a decision tree in R.

Create this dataset (copy the code below and paste it into your R session):

    data <- data.frame(
      InsuranceID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
      Vehicle_Damage = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE),
      Self_Injury = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE),
      Claimer_Frequency = factor(
        c("active", "very active", "very active", "inactive", "very inactive",
          "inactive", "very inactive", "active", "active", "very active"),
        levels = c("very inactive", "inactive", "active", "very active"),
        ordered = TRUE
      ),
      Fraud = c(FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE)
    )

This data set is used by an insurance company to check whether a claim is fraudulent.

- Insurance ID: the claim ID.
- Vehicle Damage: whether the vehicle is damaged.
- Self-Injury: whether the person is injured.
- Claimer Frequency: how frequently the person claims insurance.
- Fraud: whether the claim is fraudulent.

Q1. Implement a decision tree (hint: use `rpart`).

Q2. Predict whether the claims with Insurance ID 2 and 7 are fraudulent.

Q3. Visualize the decision tree.
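A minimal sketch covering the three sub-questions above with `rpart` follows. It makes several assumptions. `InsuranceID` is left out of the predictors because it is only an identifier. `Fraud` is converted to a factor so that the tree is fitted as a classifier. The control settings are relaxed because `rpart`'s default `minsplit = 20` would prevent any split on a 10-row data set. The names `train`, `fit`, and `new_claims` are illustrative.

```r
library(rpart)

data <- data.frame(
  InsuranceID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  Vehicle_Damage = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE),
  Self_Injury = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE),
  Claimer_Frequency = factor(
    c("active", "very active", "very active", "inactive", "very inactive",
      "inactive", "very inactive", "active", "active", "very active"),
    levels = c("very inactive", "inactive", "active", "very active"),
    ordered = TRUE
  ),
  Fraud = c(FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE)
)

# Convert the logical response to a factor for classification.
train <- transform(data, Fraud = factor(Fraud, levels = c(FALSE, TRUE)))

# Assumption: relaxed control parameters so the tiny data set can be split.
fit <- rpart(
  Fraud ~ Vehicle_Damage + Self_Injury + Claimer_Frequency,
  data = train,
  method = "class",
  control = rpart.control(minsplit = 2, minbucket = 1, cp = 0)
)

# Predicted class for the claims with Insurance ID 2 and 7.
new_claims <- train[train$InsuranceID %in% c(2, 7), ]
pred <- predict(fit, newdata = new_claims, type = "class")
print(data.frame(InsuranceID = new_claims$InsuranceID, Predicted_Fraud = pred))

# Base-graphics visualization of the fitted tree.
plot(fit, margin = 0.1)
text(fit, use.n = TRUE)
```

Because claims 2 and 7 are also part of the training data, these predictions show what the fitted rules say about those rows rather than how well the tree generalizes. The separate `rpart.plot` package offers a more polished plot if it is installed.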

Q4. Implement Naïve Bayes using the `HairEyeColor` dataset in R.

data("HairEyeColor")

Q1. Implement Naïve Bayes (hint: use `e1071`).

Q2. Visualize the Naïve Bayes model.

Q3. Predict and create a confusion matrix.

Q4. Compute accuracy, precision, and recall from the confusion matrix.
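The four sub-questions above could be approached as sketched below with `e1071::naiveBayes`. The question does not say which variable to predict, so this sketch assumes `Sex` is the class and `Hair` and `Eye` are the predictors. Since `HairEyeColor` is a contingency table, it is first expanded to one row per person. Treating `"Female"` as the positive class for precision and recall is also an assumption.

```r
library(e1071)

data("HairEyeColor")

# Expand the 3-way contingency table to one row per observed person.
counts <- as.data.frame(HairEyeColor)  # columns: Hair, Eye, Sex, Freq
obs <- counts[rep(seq_len(nrow(counts)), counts$Freq), c("Hair", "Eye", "Sex")]

# Assumption: Sex is the class to predict.
model <- naiveBayes(Sex ~ Hair + Eye, data = obs)

# One possible visualization: the fitted conditional probabilities P(Hair | Sex).
barplot(model$tables$Hair, beside = TRUE, legend.text = TRUE,
        xlab = "Hair", ylab = "P(Hair | Sex)")

# Predict on the same rows and build the confusion matrix.
pred <- predict(model, newdata = obs)
cm <- table(Predicted = pred, Actual = obs$Sex)
print(cm)

# Metrics derived from the confusion matrix.
positive <- "Female"  # assumption: class treated as "positive"
accuracy <- sum(diag(cm)) / sum(cm)
precision <- cm[positive, positive] / sum(cm[positive, ])
recall <- cm[positive, positive] / sum(cm[, positive])

print(c(accuracy = accuracy, precision = precision, recall = recall))
```

The model is evaluated on the same rows it was fitted on, so these figures are likely optimistic. Holding out a test set would give a fairer estimate.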