# Clustering in R

Anonymous
Asked: Apr 11th, 2017

Question description

Classification

Q1. A data set has three columns: V1 (categorical), V2 (binary), and V3 (numerical). Find the values of the Gini index and the information gain.

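| V1 | V2 | V3 |
|----|----|----|
| A  | 0  | 33 |
| A  | 0  | 54 |
| A  | 0  | 56 |
| A  | 0  | 42 |
| A  | 1  | 50 |
| B  | 1  | 55 |
| B  | 1  | 31 |
| B  | 0  | 4  |
| B  | 1  | 77 |
| B  | 0  | 49 |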

• Create the above data set and read it into R.
• Write a procedure to calculate the Gini index, and find its value for V3 >= 32 and V3 < 65.
• Write a procedure to calculate the information gain, and find its value for V3 >= 32 and V3 < 65.
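
One way to approach the first two bullets is sketched below. It assumes V2 is the class label and that the "Gini index" of a condition means the size-weighted Gini impurity of the two groups the condition creates. The helper names `gini_impurity` and `gini_split` are hypothetical, and reading "V3 >= 32 and V3 < 65" as a single range is only one possible interpretation, so the two conditions are also evaluated separately.

```r
# Recreate the data set from the table above.
d <- data.frame(
  V1 = c(rep("A", 5), rep("B", 5)),
  V2 = c(0, 0, 0, 0, 1, 1, 1, 0, 1, 0),
  V3 = c(33, 54, 56, 42, 50, 55, 31, 4, 77, 49)
)

# Gini impurity of a label vector: 1 minus the sum of squared class proportions.
gini_impurity <- function(y) {
  p <- table(y) / length(y)
  1 - sum(p^2)
}

# Size-weighted Gini impurity of the groups defined by a logical condition.
gini_split <- function(y, cond) {
  groups <- split(y, cond)
  sum(sapply(groups, function(g) length(g) / length(y) * gini_impurity(g)))
}

gini_all   <- gini_impurity(d$V2)                        # before any split
gini_ge32  <- gini_split(d$V2, d$V3 >= 32)               # V3 >= 32 vs. the rest
gini_lt65  <- gini_split(d$V2, d$V3 < 65)                # V3 < 65 vs. the rest
gini_range <- gini_split(d$V2, d$V3 >= 32 & d$V3 < 65)   # 32 <= V3 < 65 vs. the rest

print(c(all = gini_all, ge32 = gini_ge32, lt65 = gini_lt65, range = gini_range))
```

A condition whose weighted impurity is lower than `gini_all` separates the V2 classes better than no split at all.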

Hint: Check all the possibilities with V2. For the Gini index, the lower the value, the better the split (for information gain, higher is better).

Example: Find the value for V3 >= 50

If V2 = 0 & V3 >= 50 = ?

If V2 = 1 & V3 >= 50 = ?
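
The example above can be worked through roughly as follows. This is a sketch that assumes V2 is the class label and that information gain is the entropy of V2 minus the size-weighted entropy of the groups created by the condition; `entropy` and `info_gain` are hypothetical helper names.

```r
# V2 (class label) and V3 (numeric attribute) from the table in Q1.
d <- data.frame(
  V2 = c(0, 0, 0, 0, 1, 1, 1, 0, 1, 0),
  V3 = c(33, 54, 56, 42, 50, 55, 31, 4, 77, 49)
)

# Shannon entropy (in bits) of a label vector.
entropy <- function(y) {
  p <- table(y) / length(y)
  p <- p[p > 0]            # guard against log2(0)
  -sum(p * log2(p))
}

# Information gain of splitting y by a logical condition.
info_gain <- function(y, cond) {
  groups <- split(y, cond)
  weighted <- sum(sapply(groups, function(g) length(g) / length(y) * entropy(g)))
  entropy(y) - weighted
}

# Cross-tabulate V2 against the condition: these are the counts asked for
# in "V2 = 0 & V3 >= 50" and "V2 = 1 & V3 >= 50".
counts <- table(V2 = d$V2, V3_ge_50 = d$V3 >= 50)
print(counts)

ig_ge50 <- info_gain(d$V2, d$V3 >= 50)
print(ig_ge50)
```

The same `info_gain` call can be repeated with `d$V3 >= 32` or `d$V3 < 65` to compare candidate conditions.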

Q2. Implement a decision tree in R.

Create this dataset (copy the code below and paste it into your R session):

    data <- data.frame(
      InsuranceID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
      Vehicle_Damage = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE),
      Self_Injury = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE),
      Claimer_Frequency = factor(
        c("active", "very active", "very active", "inactive", "very inactive",
          "inactive", "very inactive", "active", "active", "very active"),
        levels = c("very inactive", "inactive", "active", "very active"),
        ordered = TRUE),
      Fraud = c(FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE))

This data set is used by an insurance company to check whether a claim is fraudulent.

Insurance ID: The claim ID.

Vehicle Damage: Is the vehicle damaged or not?

Self-Injury: Is the person injured or not?

Claimer Frequency: How frequently the person claims insurance.

Fraud: Is the claim fraudulent or not?

Q1. Implement a decision tree (hint: use the rpart package).

Q2. Predict whether the claims with Insurance ID 2 and 7 are fraudulent.

Q3. Show the decision tree using a visualization.
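
A minimal sketch of these three steps with `rpart` might look like the following. Several choices here are assumptions rather than part of the question: `InsuranceID` is left out of the predictors, the logical columns are converted to factors, and `rpart.control(minsplit = 2, minbucket = 1, cp = 0)` is used because the default `minsplit = 20` would not split a 10-row data set at all. With so few rows, the resulting tree mostly memorizes the training data.

```r
library(rpart)

# Data set from the question.
data <- data.frame(
  InsuranceID = 1:10,
  Vehicle_Damage = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE),
  Self_Injury = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE),
  Claimer_Frequency = factor(
    c("active", "very active", "very active", "inactive", "very inactive",
      "inactive", "very inactive", "active", "active", "very active"),
    levels = c("very inactive", "inactive", "active", "very active"),
    ordered = TRUE),
  Fraud = c(FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE))

# Convert the logical columns to factors so they are treated as categorical
# (an assumption made here as a precaution; the ID column is dropped).
train <- data.frame(
  Vehicle_Damage    = factor(data$Vehicle_Damage),
  Self_Injury       = factor(data$Self_Injury),
  Claimer_Frequency = data$Claimer_Frequency,
  Fraud             = factor(data$Fraud)
)

# Q1: fit a classification tree; the control values are relaxed for tiny data.
fit <- rpart(Fraud ~ Vehicle_Damage + Self_Injury + Claimer_Frequency,
             data = train, method = "class",
             control = rpart.control(minsplit = 2, minbucket = 1, cp = 0))

# Q2: predicted class for the claims with Insurance ID 2 and 7.
pred <- predict(fit, newdata = train[data$InsuranceID %in% c(2, 7), ],
                type = "class")
print(data.frame(InsuranceID = c(2, 7), Predicted_Fraud = pred))

# Q3: draw the tree with base graphics (only when a display is available).
if (interactive()) {
  plot(fit, margin = 0.1)
  text(fit, use.n = TRUE)
}
```

Because IDs 2 and 7 are also training rows, these predictions only show how the fitted tree labels them, not how it would generalize to new claims.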

Q4. Implement Naïve Bayes using the ‘HairEyeColor’ dataset in R.

data("HairEyeColor")

Q1. Implement Naïve Bayes (hint: use the e1071 package).

Q2. Visualize the Naïve Bayes model.

Q3. Make predictions and create a confusion matrix.

Q4. Find the accuracy, precision, and recall from the confusion matrix.
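
The four steps above could be sketched as follows. The question does not say which variable to predict, so this example assumes `Sex` is the class and `Hair` and `Eye` are the predictors. It also evaluates on the training data and treats "Female" as the positive class, both of which are arbitrary choices made for illustration.

```r
library(e1071)

data("HairEyeColor")

# HairEyeColor is a 3-way contingency table (Hair x Eye x Sex);
# expand it to one row per person so it can be used as training data.
counts <- as.data.frame(HairEyeColor)
people <- counts[rep(seq_len(nrow(counts)), counts$Freq), c("Hair", "Eye", "Sex")]

# Q1: fit the model (assumption: Sex is the class to predict).
model <- naiveBayes(Sex ~ Hair + Eye, data = people)

# Q2: one possible visualization is a bar plot of the conditional
# probabilities P(Hair | Sex) that the model has estimated.
if (interactive()) {
  barplot(model$tables$Hair, beside = TRUE, legend.text = TRUE,
          main = "P(Hair | Sex)")
}

# Q3: predict on the same rows and build the confusion matrix.
pred <- predict(model, newdata = people)
cm <- table(Predicted = pred, Actual = people$Sex)
print(cm)

# Q4: metrics, with "Female" as the positive class.
tp        <- cm["Female", "Female"]
accuracy  <- sum(diag(cm)) / sum(cm)
precision <- tp / sum(cm["Female", ])   # of those predicted Female, how many are
recall    <- tp / sum(cm[, "Female"])   # of actual Females, how many were found
print(c(accuracy = accuracy, precision = precision, recall = recall))
```

Hair and eye colour carry little information about sex, so the accuracy should be expected to stay close to the majority-class rate. A held-out test split would give a fairer estimate than scoring on the training rows.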
