Official instructions here... extra details below
Item-Based Joke Recommendation [Dataset: jokes.zip]
- For this problem you will use a modified version of the item-based recommender algorithm from Ch. 14 of Machine Learning in Action and use it on joke ratings data based on Jester Online Joke Recommender System. The modified version of the code is provided in the module itemBasedRec.py. Most of the module will be used as is, but you will add some additional functionality.
The data set contains two files. The file "modified_jester_data.csv" contains the ratings on 100 jokes by 1000 users (each row is a user profile). The ratings have been normalized to be between 1 and 21 (a 20-point scale), with 1 being the lowest rating. A zero indicates a missing rating. The file "jokes.csv" contains the joke ids mapped to the actual text of the jokes.
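Because a zero means "not rated" rather than a low score, any statistic over the ratings has to skip the zeros. The sketch below illustrates this on a tiny made-up matrix standing in for "modified_jester_data.csv" (the values and variable names are invented for illustration, not taken from the data set).

```python
import numpy as np

# Toy stand-in for the ratings matrix: 3 users x 4 jokes, 0 = missing rating.
# These values are made up for illustration only.
ratings = np.array([
    [ 5.0, 0.0, 21.0, 1.0],
    [ 0.0, 0.0, 12.5, 7.0],
    [18.0, 3.0,  0.0, 0.0],
])

rated = ratings > 0                    # True where a rating exists
ratings_per_user = rated.sum(axis=1)   # how many jokes each user rated
ratings_per_joke = rated.sum(axis=0)   # how many users rated each joke

# Mean rating per joke over observed ratings only. Zeros add nothing to the
# column sum, so dividing by the observed count gives the right mean.
joke_means = ratings.sum(axis=0) / np.maximum(ratings_per_joke, 1)

print(ratings_per_user)
print(joke_means)
```

The same boolean mask can be reused when picking which items to withhold for cross-validation, since only rated items can serve as test cases.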
Your tasks in this problem are the following (please also see comments for the function stubs in the provided module):
- Load in the joke ratings data and the joke text data into appropriate data structures.
- Complete the definition for the function "test". This function iterates over all users and, for each user, performs cross-validation on items (by calling the provided "cross_validate_user" function), and returns the error information necessary to compute Mean Absolute Error (MAE). Use this function to perform 5-fold cross-validation (i.e., 20% test-ratio) comparing MAE results using standard item-based collaborative filtering (based on the rating prediction function "standEst") with results using the SVD-based version of item-based CF (using "svdEst" as the prediction engine). [Note: See comments provided in the module for hints on accomplishing these tasks.]
- Write a new function "print_most_similar_jokes" which takes the joke ratings data, a query joke id, a parameter k for the number of nearest neighbors, and a similarity metric function, and prints the text of the query joke as well as the texts of the top k most similar jokes based on user ratings. [Note: For hints on how to accomplish this task, please see comments at the end of the provided module as well as comments for the provided stub function.]
The file itemBasedRec.py needs to be finished. I partially finished it, but I need someone to help me with the rest. It should be fairly simple to do. I need to write 2 functions in the itemBasedRec.py file, called test and print_most_similar_jokes.
Here are the other files.
Basically, I need to fix the itemBasedRec.py file so that it produces the output noted below.
Here is some code that may help.
I need this done in an IPython notebook. At the bottom of this matrix factorization code you will see some useful code for iterating through the users, adding up the error, etc.
Here is the itemBasedRec.py file with my changes partially entered.
Here are the instructions:
def test(dataMat, test_ratio, estMethod):
# Write this function to iterate over all users and for each perform cross-validation on items by calling
# the above cross-validation function on each user.
# MAE will be the ratio of total error across all test cases to the total number of test cases, for all users
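A minimal sketch of the accumulation described in the comments above. It assumes that "cross_validate_user" returns a pair (sum of absolute errors, number of test cases) for one user; the provided module is not shown here, so that return shape, the argument order, and the two "toy_" stand-ins below are assumptions made only so the sketch runs on its own.

```python
import numpy as np

def mean_absolute_error_over_users(data_mat, test_ratio, est_method, cross_validate):
    """Accumulate per-user error and test-case counts, then return the overall MAE.

    ASSUMPTION: cross_validate(data_mat, user, test_ratio, est_method) returns
    (sum_of_absolute_errors, number_of_test_cases) for that user.
    """
    total_error = 0.0
    total_count = 0
    for user in range(data_mat.shape[0]):
        error_u, count_u = cross_validate(data_mat, user, test_ratio, est_method)
        total_error += error_u
        total_count += count_u
    # MAE = total error over all test cases / total number of test cases.
    return total_error / total_count if total_count else float("nan")

# --- Hypothetical stand-ins, NOT the functions from itemBasedRec.py ---
def toy_cross_validate(data_mat, user, test_ratio, est_method):
    rated = np.flatnonzero(data_mat[user] > 0)   # only rated items can be tested
    n_test = int(test_ratio * len(rated))
    withheld = rated[:n_test]                    # deterministic, for illustration
    error = sum(abs(est_method(data_mat, user, item) - data_mat[user, item])
                for item in withheld)
    return error, len(withheld)

def toy_estimator(data_mat, user, item):
    return 11.0                                  # always predicts the scale midpoint

toy = np.array([[1.0, 21.0, 11.0, 0.0, 5.0],
                [0.0,  0.0, 16.0, 6.0, 0.0]])
mae = mean_absolute_error_over_users(toy, 0.5, toy_estimator, toy_cross_validate)
print("MAE: %.4f" % mae)
```

In the real module, the loop body would call the provided cross_validate_user with estMethod set to standEst or svdEst. Summing errors and counts across users before dividing weights every test case equally, which matches the "total error over total number of test cases" definition in the stub comments.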
The instructions for print_most_similar_jokes are here and in the file:
def print_most_similar_jokes(dataMat, jokes, queryJoke, k, metric=pearsSim):
# Write this function to find the k most similar jokes (based on user ratings) to a queryJoke
# The queryJoke is a joke id as given in the 'jokes.csv' file (and corresponds to a column in dataMat)
# You must compare ratings for the queryJoke (the column in dataMat corresponding to the joke), to all
# other joke rating vectors and return the top k. Note that this is the same as performing KNN on the
# columns of dataMat. The function must retrieve the text of the joke from 'jokes.csv' file and print both
# the queryJoke text as well as the text of the returned jokes.
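A rough sketch of the "KNN on the columns of dataMat" idea from the comments above. Several things here are assumptions: "pearson_sim" is a stand-in written for this example (not the module's pearsSim), joke ids are treated as 0-based column indices, the joke texts are assumed to sit in a list indexed by joke id, and similarity is computed only over users who rated both jokes.

```python
import numpy as np

def pearson_sim(a, b):
    """Pearson correlation rescaled to [0, 1]; a stand-in, not the module's pearsSim."""
    if len(a) < 3 or np.std(a) == 0 or np.std(b) == 0:
        return 0.0                       # too little evidence: treat as dissimilar
    return 0.5 + 0.5 * np.corrcoef(a, b)[0, 1]

def top_k_similar_columns(data_mat, query_col, k, metric=pearson_sim):
    """Return the k column indices most similar to query_col, best first."""
    scores = []
    for col in range(data_mat.shape[1]):
        if col == query_col:
            continue
        # Compare only over users who rated both jokes (0 = missing).
        both = (data_mat[:, query_col] > 0) & (data_mat[:, col] > 0)
        sim = metric(data_mat[both, query_col], data_mat[both, col])
        scores.append((sim, col))
    scores.sort(key=lambda pair: pair[0], reverse=True)
    return [col for _, col in scores[:k]]

def show_similar(data_mat, joke_texts, query_col, k, metric=pearson_sim):
    """Print the query joke, then the texts of its k nearest neighbours."""
    print(joke_texts[query_col])
    print("Top %d Recommended jokes are :" % k)
    for col in top_k_similar_columns(data_mat, query_col, k, metric):
        print(joke_texts[col])

# Toy data: 5 users x 4 jokes (made-up values and placeholder texts).
toy = np.array([[ 1.0,  2.0, 20.0, 10.0],
                [ 5.0,  6.0, 16.0, 12.0],
                [10.0, 11.0, 11.0,  9.0],
                [15.0, 16.0,  6.0, 11.0],
                [20.0, 21.0,  1.0, 10.0]])
texts = ["joke zero", "joke one", "joke two", "joke three"]
show_similar(toy, texts, 0, 2)
```

If the ids in 'jokes.csv' turn out to start at 1, the column index would need an offset, and if the course expects whole rating columns to be compared (zeros included), the co-rating mask can simply be dropped.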
Basically, the end of the file shows what he wants to see when this code is run:
# dataMat = genfromtxt('modified_jester_data.csv',delimiter=',')
# test(dataMat, 0.2, svdEst)
# test(dataMat, 0.2, standEst)
# jokes = load_jokes('jokes.csv')
# print_most_similar_jokes(dataMat, jokes, 3, 5, pearsSim)
''' See example output below:
Q. What's the difference between a man and a toilet? A. A toilet doesn't follow you around after you use it.
Top 5 Recommended jokes are :
Q: What's the difference between a Lawyer and a Plumber? A: A Plumber works to unclog the system.
What do you call an American in the finals of the world cup? "Hey Beer Man!"
Q. What's 200 feet long and has 4 teeth? <P>A. The front row at a Willie Nelson Concert.
A country guy goes into a city bar that has a dress code and the maitred' demands he wear a tie. Discouraged the guy goes to his car to sulk when inspiration strikes: He's got jumper cables in the trunk! So he wrapsthem around his neck sort of like a string tie (a bulky string tie to be sure) and returns to the bar. The maitre d' is reluctant but says to the guy "Okay you're a pretty resourceful fellow you can come in... but just don't start anything"!
What do you get when you run over a parakeet with a lawnmower? <P>Shredded tweet.