Description
I have a data science project in which I try to predict the likelihood that a name is gender neutral. I already use logistic regression, but I need to demonstrate at least one more algorithm (two more if possible), and I cannot decide whether to use Random Forest, Naive Bayes, or something else. The code is in R.
install.packages('devtools')
library(devtools)
install_github("hadley/babynames")
library(babynames)
library(dplyr)
library(tidyr)
data(babynames)
neutral_names <- babynames %>%
  select(-prop) %>%
  # keep only the years 1930 through 2012
  filter(year >= 1930, year <= 2012) %>%
  # get the number of female and male babies for each name per year
  spread(key = sex, value = n, fill = 0) %>%
  # per year, measure gender neutrality as the squared distance from a 50/50 split
  mutate(prop_F = 100 * F / (F + M), se = (50 - prop_F)^2) %>%
  group_by(name) %>%
  # per name, count the years present, total the babies, and average the neutrality measure
  summarise(n = n(), female = sum(F), male = sum(M), total = sum(F + M),
            mse = mean(se)) %>%
  # keep only names that occur in all 83 years and more than 9000 times in total
  filter(n == 83, total > 9000) %>%
  # sort from most to least gender neutral (lowest mse first)
  arrange(mse) %>%
  # keep only the top 10
  head(10)
neutral_names
Explanation & Answer
View attached explanation and answer. Let m...
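One possible way to demonstrate the two algorithms named in the question is sketched below. It is not the attached answer. It assumes the e1071 and randomForest packages are installed alongside babynames, dplyr and tidyr. The question does not define a class label or any predictors, so everything in that part is hypothetical: the label `neutral` (the 10% of names with the lowest `mse`) and the features `name_length`, `last_letter` and `n_vowels` are illustrative choices, not the author's.

```r
# Sketch only. Assumes babynames, dplyr, tidyr, e1071 and randomForest are installed.
library(babynames)
library(dplyr)
library(tidyr)
library(e1071)         # naiveBayes()
library(randomForest)  # randomForest()

# One row per name, using the same neutrality measure as in the question
name_stats <- babynames %>%
  select(-prop) %>%
  filter(year >= 1930, year <= 2012) %>%
  spread(key = sex, value = n, fill = 0) %>%
  mutate(prop_F = 100 * F / (F + M), se = (50 - prop_F)^2) %>%
  group_by(name) %>%
  summarise(total = sum(F + M), mse = mean(se)) %>%
  filter(total > 9000)

# HYPOTHETICAL label and features, chosen only for illustration
model_data <- name_stats %>%
  mutate(
    # label: the 10% of names with the lowest mse count as "neutral"
    neutral     = factor(mse <= quantile(mse, 0.10),
                         levels = c(FALSE, TRUE), labels = c("no", "yes")),
    name_length = nchar(name),
    last_letter = factor(tolower(substr(name, nchar(name), nchar(name))),
                         levels = letters),
    n_vowels    = nchar(gsub("[^aeiouy]", "", tolower(name)))
  ) %>%
  as.data.frame()

# 80/20 train/test split
set.seed(42)
train_idx <- sample(nrow(model_data), size = floor(0.8 * nrow(model_data)))
train <- model_data[train_idx, ]
test  <- model_data[-train_idx, ]

# Naive Bayes; laplace = 1 smooths last letters that are rare in one class
nb_fit <- naiveBayes(neutral ~ name_length + last_letter + n_vowels,
                     data = train, laplace = 1)

# Random forest with 500 trees on the same predictors
rf_fit <- randomForest(neutral ~ name_length + last_letter + n_vowels,
                       data = train, ntree = 500)

# Predicted probability that each held-out name is gender neutral
nb_prob <- predict(nb_fit, newdata = test, type = "raw")[, "yes"]
rf_prob <- predict(rf_fit, newdata = test, type = "prob")[, "yes"]

results <- data.frame(name = test$name, actual = test$neutral,
                      naive_bayes = nb_prob, random_forest = rf_prob)
head(results[order(-results$random_forest), ], 10)
```

Fitting the existing logistic regression on the same `train` rows and scoring the same `test` rows would make the three models directly comparable. Because the label here is a 10% cutoff, the classes are imbalanced, so accuracy alone is likely to be a misleading comparison metric.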