Harvard University R Programming Machine Learning Data Science Project


Description

I have a data science project in which I try to predict the likelihood that a name is gender-neutral. The project already includes a logistic regression, but I need to demonstrate at least one more algorithm, two more if possible, and I cannot decide between Random Forest, Naive Bayes, or something else. The code is in R.

install.packages('devtools')
library(devtools)
# install_github() takes the repository as "user/repo"
install_github("hadley/babynames")

library(babynames)
library(dplyr)
library(tidyr)

data(babynames)

neutral_names <- babynames %>%
  select(-prop) %>%
  # keep only the years 1930 to 2012
  filter(year >= 1930, year <= 2012) %>%
  # one row per name and year, with female (F) and male (M) counts as columns
  spread(key = sex, value = n, fill = 0) %>%
  # gender-neutrality measure: squared distance of the female share from 50%
  mutate(prop_F = 100 * F / (F + M), se = (50 - prop_F)^2) %>%
  group_by(name) %>%
  # per name: number of years present, total babies, and mean squared distance
  summarise(n = n(), female = sum(F), male = sum(M), total = sum(F + M),
            mse = mean(se)) %>%
  # keep only names that occur in all 83 years and more than 9000 times in total
  filter(n == 83, total > 9000) %>%
  # sort from most to least gender-neutral (smallest mse first)
  arrange(mse) %>%
  # keep the top 10
  head(10)

neutral_names
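
For the "another algorithm" part, here is a minimal sketch of how a Random Forest and a Naive Bayes classifier could be fitted on the same data. Several things in it are assumptions, not part of the project above: the label (a name counts as gender-neutral when its overall female share is within 20 points of 50%), the 1000-baby cutoff, the three spelling-based features (n_chars, n_vowels, last_letter), the 80/20 split, and the use of the randomForest and e1071 packages. The features of the existing logistic regression are not shown, so these are placeholders to swap for the real ones.

```r
library(babynames)    # same data set as the pipeline above
library(dplyr)
library(tidyr)
library(randomForest) # assumed installed: install.packages("randomForest")
library(e1071)        # assumed installed: install.packages("e1071")

set.seed(42)

# One row per name, with total female (F) and male (M) counts for 1930-2012.
name_data <- babynames %>%
  filter(year >= 1930, year <= 2012) %>%
  group_by(name, sex) %>%
  summarise(n = sum(n), .groups = "drop") %>%
  spread(key = sex, value = n, fill = 0) %>%
  mutate(total = F + M, prop_F = 100 * F / total) %>%
  filter(total >= 1000) %>%  # hypothetical cutoff, chosen only for illustration
  mutate(
    # hypothetical label: female share within 20 points of 50%
    neutral = factor(ifelse(abs(prop_F - 50) <= 20, "yes", "no"),
                     levels = c("no", "yes")),
    # hypothetical features derived from the spelling of the name
    n_chars = nchar(name),
    n_vowels = lengths(regmatches(name, gregexpr("[aeiouy]", tolower(name)))),
    last_letter = factor(tolower(substr(name, nchar(name), nchar(name))))
  ) %>%
  as.data.frame()

# 80/20 train/test split
train_idx <- sample(nrow(name_data), size = floor(0.8 * nrow(name_data)))
train <- name_data[train_idx, ]
test  <- name_data[-train_idx, ]

# Random Forest: predicted probability that a name is gender-neutral
rf_fit  <- randomForest(neutral ~ n_chars + n_vowels + last_letter,
                        data = train, ntree = 500)
rf_prob <- predict(rf_fit, newdata = test, type = "prob")[, "yes"]

# Naive Bayes: numeric features are modelled as Gaussian, and laplace = 1
# smooths the counts for last letters that are rare within a class
nb_fit  <- naiveBayes(neutral ~ n_chars + n_vowels + last_letter,
                      data = train, laplace = 1)
nb_prob <- predict(nb_fit, newdata = test, type = "raw")[, "yes"]

# Confusion tables at a 0.5 threshold
table(predicted = rf_prob > 0.5, actual = test$neutral)
table(predicted = nb_prob > 0.5, actual = test$neutral)
```

Both models return a probability per name, so they can be compared with the logistic regression on the same held-out names. Gender-neutral names are likely to be a small minority under this label, so plain accuracy can be misleading; looking at the confusion tables, or at a metric such as AUC, is probably more informative.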


Explanation & Answer

View attached explanation and answer. Let m...

