Python recommender system

Asked: Nov 23rd, 2014

Question description

I need someone who actually has python programming experience...

Official instructions here... extra details below

Item-Based Joke Recommendation [Dataset:]

  1. For this problem you will use a modified version of the item-based recommender algorithm from Ch. 14 of Machine Learning in Action and use it on joke ratings data based on Jester Online Joke Recommender System. The modified version of the code is provided in the module  Most of the module will be used as is, but you will add some additional functionality.

The data set contains two files. The file "modified_jester_data.csv" contains the ratings on 100 jokes by 1000 users (each row is a user profile). The ratings have been normalized to be between 1 and 21 (a 20-point scale), with 1 being the lowest rating. A zero indicates a missing rating. The file "jokes.csv" contains the joke ids mapped to the actual text of the jokes.
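As a rough illustration of the data layout described above, here is a minimal sketch that loads both files and treats zeros as missing ratings. The in-memory strings are tiny synthetic stand-ins (not the real Jester data), and the two-column layout of "jokes.csv" (joke id, then text) is an assumption.

```python
import csv
import io

import numpy as np

# Tiny synthetic stand-ins for the two files (NOT the real Jester data).
ratings_csv = io.StringIO("5.5,0,17.2\n0,12.0,3.1\n20.9,8.4,0\n")
jokes_csv = io.StringIO(
    '0,"First joke text"\n1,"Second joke, with a comma"\n2,"Third joke text"\n'
)

# Each row is a user profile, each column a joke; 0 marks a missing rating.
dataMat = np.genfromtxt(ratings_csv, delimiter=",")

# Assumed layout of jokes.csv: joke id in the first field, text in the second.
# csv.reader handles quoted text that itself contains commas.
jokes = {int(row[0]): row[1] for row in csv.reader(jokes_csv)}

rated = dataMat > 0                    # boolean mask of observed ratings
ratings_per_user = rated.sum(axis=1)   # number of jokes each user rated
mean_rating = dataMat[rated].mean()    # mean over observed ratings only

print(dataMat.shape, ratings_per_user, round(mean_rating, 2))
```

With the real files, the `io.StringIO` objects would be replaced by the file name (for `genfromtxt`) and an opened file handle (for `csv.reader`).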

Your tasks in this problem are the following (please also see comments for the function stubs in the provided module):

  1. Load in the joke ratings data and the joke text data into appropriate data structures.
  2. Complete the definition for the function "test". This function iterates over all users and, for each user, performs cross-validation on items (by calling the provided "cross_validate_user" function), and returns the error information necessary to compute Mean Absolute Error (MAE). Use this function to perform 5-fold cross-validation (i.e., 20% test-ratio) comparing MAE results using standard item-based collaborative filtering (based on the rating prediction function "standEst") with results using the SVD-based version of item-based CF (using "svdEst" as the prediction engine). [Note: See comments provided in the module for hints on accomplishing these tasks.]
  3. Write a new function "print_most_similar_jokes" which takes the joke ratings data, a query joke id, a parameter k for the number of nearest neighbors, and a similarity metric function, and prints the text of the query joke as well as the texts of the top k most similar jokes based on user ratings. [Note: For hints on how to accomplish this task, please see comments at the end of the provided module as well as comments for the provided stub function.]

The file needs to be finished. I partially finished it, but I need someone to help me with the rest. It should be fairly simple to do. I need to complete the two functions in the file: "test" and "print_most_similar_jokes".

Here are the other files.

Basically, I need to fix that file so that it produces the output noted below.

Here is some code that may help.


I need this done in an IPython notebook. At the bottom of this matrix factorization code you will see some useful code for iterating through the users and adding up the error, etc.

Here are the files with my changes partially entered.

Here are the instructions:

def test(dataMat, test_ratio, estMethod):
    # Write this function to iterate over all users and for each perform cross-validation on items by calling
    # the above cross-validation function on each user.
    # MAE will be the ratio of total error across all test cases to the total number of test cases, for all users

The print similar jokes instructions are here and in the file

def print_most_similar_jokes(dataMat, jokes, queryJoke, k, metric=pearsSim):
    # Write this function to find the k most similar jokes (based on user ratings) to a queryJoke
    # The queryJoke is a joke id as given in the 'jokes.csv' file (and corresponding to a column in dataMat)
    # You must compare ratings for the queryJoke (the column in dataMat corresponding to the joke), to all
    # other joke rating vectors and return the top k. Note that this is the same as performing KNN on the
    # columns of dataMat. The function must retrieve the text of the joke from 'jokes.csv' file and print both
    # the queryJoke text as well as the text of the returned jokes.
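A minimal sketch of the nearest-neighbour idea in these comments follows. It makes several assumptions: dataMat is a plain NumPy array, joke ids are 0-based column indices, "jokes" is a dict (or list) mapping a joke id to its text, and "pearsSim" is a stand-in modelled on the book's Pearson similarity rather than the module's actual code. Restricting each comparison to users who rated both jokes is also a choice made here, not something the instructions specify.

```python
import numpy as np


def pearsSim(inA, inB):
    # Stand-in Pearson similarity, rescaled from [-1, 1] to [0, 1].
    if len(inA) < 3:
        return 1.0
    return 0.5 + 0.5 * np.corrcoef(inA, inB)[0, 1]


def print_most_similar_jokes(dataMat, jokes, queryJoke, k, metric=pearsSim):
    query_col = dataMat[:, queryJoke]
    sims = []
    for j in range(dataMat.shape[1]):
        if j == queryJoke:
            continue
        # Compare only users who rated both jokes (0 means missing).
        both = (query_col > 0) & (dataMat[:, j] > 0)
        sim = metric(query_col[both], dataMat[both, j]) if both.any() else 0.0
        if np.isnan(sim):              # e.g. a constant rating vector
            sim = 0.0
        sims.append((sim, j))
    # Highest similarity first; keep the k nearest columns.
    top = sorted(sims, reverse=True)[:k]
    print("Selected joke:")
    print(jokes[queryJoke])
    print("Top", k, "Recommended jokes are :")
    for _, j in top:
        print(jokes[j])
    return [j for _, j in top]


# Synthetic demo data (6 users x 4 jokes), not the Jester ratings.
demo = np.array([
    [1, 2, 6, 1],
    [2, 4, 5, 3],
    [3, 6, 4, 2],
    [4, 8, 3, 5],
    [5, 10, 2, 4],
    [6, 12, 1, 6],
], dtype=float)
demo_jokes = {0: "joke A", 1: "joke B", 2: "joke C", 3: "joke D"}
neighbours = print_most_similar_jokes(demo, demo_jokes, 0, 2)
```

If the ids in 'jokes.csv' turn out to be 1-based, the lookup into dataMat would need an offset of one.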

Basically, the end of the file states what he wants to see when this code is run:

# dataMat = genfromtxt('modified_jester_data.csv',delimiter=',')
# test(dataMat, 0.2, svdEst)
# test(dataMat, 0.2, standEst)
# jokes = load_jokes('jokes.csv')
# print_most_similar_jokes(dataMat, jokes, 3, 5, pearsSim)
''' See example output below:
Selected joke: 
Q. What's the difference between a man and a toilet? A. A toilet doesn't follow you around after you use it.
Top 5 Recommended jokes are :
Q: What's the difference between a Lawyer and a Plumber? A: A Plumber works to unclog the system. 
What do you call an American in the finals of the world cup? "Hey Beer Man!" 
Q. What's 200 feet long and has 4 teeth? <P>A. The front row at a Willie Nelson Concert. 
A country guy goes into a city bar that has a dress code and the maitred' demands he wear a tie. Discouraged the guy goes to his car to sulk when inspiration strikes: He's got jumper cables in the trunk! So he wrapsthem around his neck sort of like a string tie (a bulky string tie to be sure) and returns to the bar. The maitre d' is reluctant but says to the guy "Okay you're a pretty resourceful fellow you can come in... but just don't start anything"!  
What do you get when you run over a parakeet with a lawnmower? <P>Shredded tweet. 
