Clustering in R

Anonymous
Asked: Apr 11th, 2017

Question description

Classification

Q1. The data set below has three columns: V1 (categorical), V2 (binary) and V3 (numerical). Find the values of the Gini index and the information gain.

V1  V2  V3
A   0   33
A   0   54
A   0   56
A   0   42
A   1   50
B   1   55
B   1   31
B   0   4
B   1   77
B   0   49
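The first task below asks to create and read this data set in R. One way to do both is sketched here, assuming "read" means loading the data back from a file. The temporary CSV file and the names `df` and `df2` are illustrative choices, not part of the exercise.

```r
# Enter the table as a data frame (V1 categorical, V2 binary, V3 numeric)
df <- data.frame(
  V1 = factor(c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B")),
  V2 = c(0, 0, 0, 0, 1, 1, 1, 0, 1, 0),
  V3 = c(33, 54, 56, 42, 50, 55, 31, 4, 77, 49)
)

# Assumption: "read" means loading from a file, so write the data to a
# temporary CSV file and read it back in
path <- tempfile(fileext = ".csv")
write.csv(df, path, row.names = FALSE)
df2 <- read.csv(path, stringsAsFactors = TRUE)
str(df2)
```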

  • Create the above data set in R and read it.
  • Write a procedure to calculate the Gini index, and find its values for V3 >= 32 and V3 < 65.
  • Write a procedure to calculate the information gain, and find its values for V3 >= 32 and V3 < 65.
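A minimal base-R sketch of both procedures follows. It assumes that V2 is the class label (our reading of the hint about checking the possibilities with V2) and that "V3 >= 32" and "V3 < 65" are two separate binary splits of V3. The helper names `gini`, `entropy` and `split_scores` are hypothetical.

```r
# Recreate the data set from the table above
df <- data.frame(
  V1 = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B"),
  V2 = c(0, 0, 0, 0, 1, 1, 1, 0, 1, 0),
  V3 = c(33, 54, 56, 42, 50, 55, 31, 4, 77, 49)
)

# Gini impurity of a vector of class labels: 1 - sum(p_k^2)
gini <- function(y) {
  p <- table(y) / length(y)
  1 - sum(p^2)
}

# Entropy in bits: -sum(p_k * log2(p_k)); table() lists only observed
# classes, so no p_k is zero
entropy <- function(y) {
  p <- table(y) / length(y)
  -sum(p * log2(p))
}

# Score a binary split of the labels y by the logical condition cond:
# weighted Gini of the two sides, and information gain over the parent
split_scores <- function(y, cond) {
  w <- c(mean(cond), mean(!cond))   # share of rows on each side
  parts <- list(y[cond], y[!cond])
  weighted_gini <- sum(w * sapply(parts, gini))
  weighted_entropy <- sum(w * sapply(parts, entropy))
  c(gini = weighted_gini, info_gain = entropy(y) - weighted_entropy)
}

# Assumption: each condition is evaluated as its own split, with V2 as label
splits <- list(
  "V3 >= 32" = df$V3 >= 32,
  "V3 < 65"  = df$V3 < 65
)
results <- t(sapply(splits, function(cond) split_scores(df$V2, cond)))
print(round(results, 4))
```

On this data the sketch gives a weighted Gini of 0.475 for V3 >= 32 and 0.4 for V3 < 65, with information gains of about 0.0074 and 0.1445. A lower Gini and a higher information gain both point to V3 < 65 as the better of the two splits.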

Hint: Check all the possibilities with V2. The lower the Gini index, the better the split.

Example: Find the value for V3 >= 50.
If V2 = 0 and V3 >= 50: ?
If V2 = 1 and V3 >= 50: ?
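Reading V2 as the class label (an assumption), the five rows with V3 >= 50 contain two with V2 = 0 and three with V2 = 1. The five rows with V3 < 50 contain four with V2 = 0 and one with V2 = 1. With those counts, the standard Gini and entropy formulas work out as follows:

```latex
\begin{aligned}
\mathrm{Gini}(S) &= 1 - \sum_k p_k^2, \qquad H(S) = -\sum_k p_k \log_2 p_k \\
\mathrm{Gini}(V_3 \ge 50) &= 1 - \left(\tfrac{2}{5}\right)^2 - \left(\tfrac{3}{5}\right)^2 = 0.48 \\
\mathrm{Gini}(V_3 < 50) &= 1 - \left(\tfrac{4}{5}\right)^2 - \left(\tfrac{1}{5}\right)^2 = 0.32 \\
\mathrm{Gini}_{\text{split}} &= \tfrac{5}{10}(0.48) + \tfrac{5}{10}(0.32) = 0.40 \\
H(S) &= -0.6 \log_2 0.6 - 0.4 \log_2 0.4 \approx 0.971 \\
\mathrm{IG} &= H(S) - \tfrac{5}{10} H(V_3 \ge 50) - \tfrac{5}{10} H(V_3 < 50)
  \approx 0.971 - 0.5(0.971) - 0.5(0.722) \approx 0.125
\end{aligned}
```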




Q2. Implement a decision tree in R.

Create this data set (copy the code below and paste it into your R session):

data <- data.frame(
  InsuranceID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  Vehicle_Damage = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE),
  Self_Injury = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE),
  Claimer_Frequency = factor(
    c("active", "very active", "very active", "inactive", "very inactive",
      "inactive", "very inactive", "active", "active", "very active"),
    levels = c("very inactive", "inactive", "active", "very active"),
    ordered = TRUE),
  Fraud = c(FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE))

An insurance company uses this data set to check whether a claim is fraudulent.

Insurance ID: the claim ID.
Vehicle Damage: whether the vehicle is damaged.
Self-Injury: whether the person is injured.
Claimer Frequency: how often the person claims insurance.
Fraud: whether the claim is fraudulent.

Q1. Implement a decision tree (hint: use rpart).

Q2. Find whether the claims with Insurance ID 2 and 7 are fraudulent.

Q3. Visualize the decision tree.
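A minimal rpart sketch covering Q1 to Q3 is shown below. Two choices in it are assumptions rather than part of the exercise. First, InsuranceID is left out of the predictors because it only labels the claim. Second, `rpart.control(minsplit = 2)` replaces the default `minsplit = 20`, which would leave a 10-row data set as a single root node (and `plot()` refuses to draw a root-only tree).

```r
library(rpart)  # recommended package shipped with standard R installs

# Same data as above; Fraud becomes a factor so rpart builds a classification tree
data <- data.frame(
  InsuranceID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  Vehicle_Damage = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE),
  Self_Injury = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE),
  Claimer_Frequency = factor(
    c("active", "very active", "very active", "inactive", "very inactive",
      "inactive", "very inactive", "active", "active", "very active"),
    levels = c("very inactive", "inactive", "active", "very active"),
    ordered = TRUE),
  Fraud = c(FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE))
data$Fraud <- factor(data$Fraud)

# Q1: fit the tree. InsuranceID is excluded (assumption: it is only a label).
# minsplit = 2 (assumption) lets rpart split such a small data set at all.
fit <- rpart(Fraud ~ Vehicle_Damage + Self_Injury + Claimer_Frequency,
             data = data, method = "class",
             control = rpart.control(minsplit = 2))
print(fit)

# Q2: predicted class for claims 2 and 7 (both rows were also used in training)
pred <- predict(fit, newdata = data[data$InsuranceID %in% c(2, 7), ],
                type = "class")
print(pred)

# Q3: base-graphics drawing of the fitted tree, with class counts at each node
plot(fit, uniform = TRUE, margin = 0.1)
text(fit, use.n = TRUE)
```

Because claims 2 and 7 are part of the training data, these predictions show how the tree reproduces those rows rather than how well it would generalize to new claims.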

Q4. Implement Naïve Bayes on the HairEyeColor data set in R.

data("HairEyeColor")
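HairEyeColor is a three-way contingency table of counts (Hair x Eye x Sex), not a data frame of individuals. The sketch below turns it into one row per person, which keeps the later predict and confusion-matrix steps simple. The names `cells` and `cases` are hypothetical.

```r
data("HairEyeColor")  # 4 x 4 x 2 table of counts: Hair x Eye x Sex

# Flatten the table to one row per cell (columns Hair, Eye, Sex, Freq) ...
cells <- as.data.frame(HairEyeColor)

# ... then repeat each cell Freq times to get one row per person
cases <- cells[rep(seq_len(nrow(cells)), cells$Freq), c("Hair", "Eye", "Sex")]
rownames(cases) <- NULL
str(cases)
```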

Q1. Implement Naïve Bayes (hint: use the e1071 package).

Q2. Visualize the Naïve Bayes model.

Q3. Make predictions and create a confusion matrix.

Q4. Compute accuracy, precision and recall from the confusion matrix.
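The following is a hedged sketch of Q1 to Q4 using `e1071::naiveBayes`. The exercise does not name a class variable, so the sketch assumes Sex is predicted from Hair and Eye. It also treats "Female" as the positive class for precision and recall (another assumption). For the visualization it uses a mosaic plot of the data plus a bar chart of the learned conditional probabilities P(Hair | Sex).

```r
library(e1071)  # install.packages("e1071") if it is not installed

# One row per person (same expansion of the contingency table as above)
data("HairEyeColor")
cells <- as.data.frame(HairEyeColor)
cases <- cells[rep(seq_len(nrow(cells)), cells$Freq), c("Hair", "Eye", "Sex")]

# Q1: assumption: Sex is the class, Hair and Eye are the predictors
model <- naiveBayes(Sex ~ Hair + Eye, data = cases)
print(model)

# Q2: the raw data as a mosaic plot, then P(Hair | Sex) from the model
mosaicplot(HairEyeColor, main = "HairEyeColor")
barplot(model$tables$Hair, beside = TRUE, legend.text = TRUE,
        main = "P(Hair | Sex)")

# Q3: predict on the same rows and cross-tabulate predicted vs actual
pred <- predict(model, cases)
cm <- table(Predicted = pred, Actual = cases$Sex)
print(cm)

# Q4: metrics, treating "Female" as the positive class (assumption)
tp <- cm["Female", "Female"]
accuracy  <- sum(diag(cm)) / sum(cm)
precision <- tp / sum(cm["Female", ])
recall    <- tp / sum(cm[, "Female"])
print(c(accuracy = accuracy, precision = precision, recall = recall))
```

The model is scored on the same rows it was trained on, so these metrics are optimistic. A random train/test split (for example with `sample()`) would give a fairer estimate.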
