Description
Week 9: Lab - Using aRules on the Titanic dataset
[Name]
[Date]
Instructions
Use the Titanic dataset to explore descriptive statistics, functions, and association rules. Download the titanic dataset titanic.raw.rdata
from one of two locations:
- https://github.com/ethen8181/machine-learning/tree/master/association_rule/R
- https://sites.google.com/a/rdatamining.com/www/data/titanic.raw.rdata?attredirects=1
Note that it is not a cvs file, but rather an RData workspace. So, to load the data (assuming you saved it to the project’s data folder), one would do:
load("data/titanic.raw.rdata")
You need to look at titanic.raw (the name of the R dataset)
t <- titanic.raw
Now that you have the datafile, do some descriptive statistics, getting some extra practice using R. Take the Quiz for Step 1 and 2. However your submission MUST include the process of you calculating the following values in Step 1 and 2.
# Add your library below.
Step 0 - Load the data
Using the instructions above, load the dataset and save it as t
.
# Write your code below.
Step 1 - Descriptive stats (0.5 point for each answer)
- Compute the percentage of people that survived.
- Compute the percentage of people that were children.
- Compute the percentage of people that were female.
- Finally, compute the percentage of people that were in first class.
# Write your code below.
Step 2 - More descriptive stats (0.5 point for each answer)
- What percentage of children survived? Your answer should be written such as # 13.75% of children survived
- What percentage of female survived?
- What percentage of first-class people survived?
- What percentage of third-class people survived?
# Write your code below.
Step 3 - Writing a function (0.5 point for each answer)
Step 3.1 - Function 1
Write a function that returns a new dataframe of people that satisfies the specified criteria of sex, age, class and survived as parameters. I’m giving you the answer for this question:
myfunction1 <- function(a,b,c,d){
df1 <- t[t$Class == a,] # filter the data that satisfied the criteria that "Class" = a
df2 <- df1[df1$Sex == b,] # filter the data that satisfied the criteria that "Sex" = b
df3 <- df2[df2$Age == c,] # filter the data that satisfied the criteria that "Age" = c
df4 <- df3[df3$Survived == d,] # filter the data that satisfied the criteria that "Survived" = d
return(df4)}
# test the function with a sample data
myfunction1("1st","Female","Adult","No")
# Write your code below.
Step 3.2 - Function 2
Write a function, using the previous function, that calculates the percentage (who lives, who dies) for a specified (parameters) of class, gender and age considering the entire number of data. The function passes four arguments. Include the following code properly in your function by improvising names of objects.
p <- nrow(df)/nrow(t) # calculate the percentage
# Write your code below.
Step 3.3 - Use the function (male)
Use the function to compare age and third-class male survival rates.
# Write your code below.
People in which category are more likely to survive?
[Type your answer here]
Step 3.4 - Use the function (female)
Use the function to compare age and first-class female survival rates.
# Write your code below.
People in which category are more likely to survive?
[Type your answer here]
Step 4 - Use aRules (0.5 point for each answer)
- Use aRules to calculate some rules (clusters) for the titanic dataset.
- Visualize the results.
- Pick the three most interesting and useful rules. Explain these rules using natural language. Answer this in the space provided below.
- How does this compare to the descriptive analysis we did on the same data set? Think critically. What was possible using one method that was not possible using the other method? Answer this in the space provided below.
# Write your code below.
Answer part 3 and 4 below.
[Type your answers here]