I am working on a project that involves building a recommender system using movie ratings data. I am looking for someone to help me with this who HAS experience with machine learning and data mining techniques in Python, using K nearest neighbor functions, clustering, collaborative filtering, etc. I believe NumPy and scikit-learn (sklearn) will be used here, along with a couple of others. I have a lot of sample code that can be used for this, and I believe I have all the resources necessary to complete it, but I am low on time and understanding. If you can include similar work with your bid to prove that you have done something like this before, it will be helpful when I am picking out bids. It's hard to just take someone's word for it, since this is a specific assignment and not just regular Python coding.
1. I want this to be coded in an IPython Notebook.
2. This system will import a set of ratings from the data set provided below. Split the data set in order to have training data and test data. I think the MovieLens data is already split up into groups, so it might not need to be split up. I don't need to use any demographic data; I just need to focus on ratings for movies. Here is the data I need to use: MovieLens Ratings
3. From here I want to train the system with the training data, then import the test data and predict ratings for it. Then compare the actual test ratings with the predictions and do some analysis on that. The data may need to be cleaned up; I'm not sure.
4. Predictions can be done with K Nearest Neighbors, or another method if you have a better suggestion. KNN code is provided here in IPython Notebook format: KNN.ipynb. It uses a module included here: kNN.py, and this data file: video_store_2.csv. The KNN example shows output along the lines of "the user picked this, the prediction says this."
5. Recommender system article: recommender-systems-eml2010.pdf
6. I also have item-based recommender system code provided here: itemBasedRec.py. Some of the code is missing, but it may be useful. This notebook may also be useful and can help with filling in the blanks in itemBasedRec.py, though it may or may not help with this project: Matrix Factorization.ipynb
7. Do some simple regression analysis to see how the predicted ratings compare to the actual test ratings given. Some sample code on regression analysis is here: IPython Notebook.html
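To give bidders a rough idea of what I mean by the train, predict, and compare steps above, here is a minimal sketch of user-based KNN collaborative filtering. It is not the provided kNN.py code. The toy ratings, the `cosine_sim` and `predict` helpers, and the choice of k=2 are all assumptions made up for illustration; the real project would read the MovieLens files instead.

```python
import numpy as np

# Hypothetical toy data: rows of (user_id, movie_id, rating).
# Real rows would come from the MovieLens training and test files.
train = np.array([
    [0, 0, 5], [0, 1, 4], [0, 2, 1],
    [1, 0, 4], [1, 1, 5], [1, 2, 2], [1, 3, 1],
    [2, 0, 1], [2, 1, 2], [2, 2, 5], [2, 3, 4],
    [3, 0, 5], [3, 1, 5], [3, 3, 1],
])
test = np.array([[0, 3, 1], [3, 2, 1]])
n_users, n_movies = 4, 4

# "Training" here just means building the user x movie matrix (0 = not rated).
matrix = np.zeros((n_users, n_movies))
for u, m, r in train:
    matrix[u, m] = r


def cosine_sim(a, b):
    """Cosine similarity between two users' rating vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def predict(matrix, user, movie, k=2):
    """Similarity-weighted average of the k most similar users who rated the movie."""
    fallback = float(matrix[matrix > 0].mean())  # global mean rating
    raters = [u for u in range(matrix.shape[0])
              if u != user and matrix[u, movie] > 0]
    if not raters:
        return fallback
    sims = np.array([cosine_sim(matrix[user], matrix[u]) for u in raters])
    top = np.argsort(sims)[::-1][:k]  # positions of the k largest similarities
    weights = sims[top]
    if weights.sum() <= 0:
        return fallback
    neighbor_ratings = matrix[raters, movie][top]
    return float(weights @ neighbor_ratings / weights.sum())


# Predict each held-out test rating and compare with the actual value.
preds = np.array([predict(matrix, u, m) for u, m, _ in test])
actual = test[:, 2]
rmse = float(np.sqrt(np.mean((actual - preds) ** 2)))

for (u, m, r), p in zip(test, preds):
    print(f"user {u}, movie {m}: actual {r}, predicted {p:.2f}")
print(f"RMSE: {rmse:.3f}")
```

Treating unrated movies as 0 in the similarity is a simplification; mean-centered ratings or an item-based approach may well work better on the real data.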
Summary: Import the MovieLens data, train the system, import more data (new "users" with their ratings), and predict how they will rate each of the movies. Do some analysis on the data, with some regression analysis at the end.
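For the regression analysis at the end, something along these lines is what I have in mind. This is only a sketch: the `actual` and `predicted` arrays are made-up numbers, and fitting a straight line with `np.polyfit` is my assumption about what "simple regression" should look like here.

```python
import numpy as np

# Hypothetical actual and predicted ratings for the same (user, movie) pairs.
actual = np.array([5.0, 3.0, 4.0, 1.0, 2.0, 4.0])
predicted = np.array([4.6, 3.2, 3.9, 1.8, 2.5, 3.6])

# Least-squares line: actual ~ slope * predicted + intercept.
slope, intercept = np.polyfit(predicted, actual, deg=1)

# R^2: share of the variance in the actual ratings explained by the line.
fitted = slope * predicted + intercept
ss_res = np.sum((actual - fitted) ** 2)
ss_tot = np.sum((actual - actual.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(f"slope={slope:.3f}, intercept={intercept:.3f}, R^2={r_squared:.3f}")
```

A slope near 1, an intercept near 0, and a high R^2 would suggest the predictions track the real ratings closely.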
Any questions just ask. See below for full descriptions.
Deliverables for Application Development projects include the following.
- A detailed project report, including a description of the system (including specific techniques and algorithms you used), and the interaction between the components (make references to code segments, modules, methods, functions, as necessary). Your write-up should also include a description of the evaluation of your system demonstrating its correctness, functionality, accuracy, etc., and a description of the data set(s) used in evaluating your system.
- Appendices to the main report should include:
- Complete (actual) sample runs of your program with descriptions, illustrating how your system works, along with any intermediate input or output used for the sample runs.
- Fully documented code for the system, including any programs and scripts used for data preparation and analysis.
- Binary files (e.g., executables, DLLs, Class files) or other components necessary to run your program.
- Samples or detailed descriptions of the data sets used.
- Readme file containing instructions on how to compile, install, and/or run your program, if appropriate, including relevant references to outside resources used.
Here are the official Instructions:
Application Development: The development and evaluation of an original application using machine learning and data mining techniques. The goal of this type of project is not to perform a full analysis of a given data set, but rather to perform useful tasks in a given application domain. The application must be tested and evaluated using a specific data set. The application must also involve the use of one or more of the modeling techniques relevant to the course topics. Your application may also include a significant extension of an existing application discussed in class materials or other sources (in this case, the application must be extended to include additional or more sophisticated types of modeling and analysis). The deliverable for the project must include the fully documented code, distribution files, including any third party sources, installation/deployment documents (including examples, screen shots of test runs, etc.), data used for the application, and a project report providing a description of the components of the application and the results of any evaluation. Many different types of applications are possible, but some examples of such applications include (but are not limited to):
Recommender Systems: applications that learn from user profiles to provide personalized recommendations for items in a given domain such as movies, books, products, documents, stocks, twitter feeds, etc.
Here are the deliverables for the project:
- Correctness and Completeness:
- System provides the necessary functionality to meet all of the project objectives (including those specified in the project proposal).
- The application and all of its components work correctly and as expected.
- Algorithms and techniques used are efficient and scalable.
- The system includes all of the relevant components necessary for the underlying application domain or task.
- The use of sound design principles in the system and its components so that the overall architecture of the application is clear and logical.
- Appropriate use of established software design methodologies to enhance modularity and avoid redundancy.
- Effective use of interaction design, including appropriate I/O design and effective user interfaces.
- Different components and pieces are integrated appropriately (so that from the user perspective, there is seamless interaction with the system).
- The system has been evaluated thoroughly using appropriate data sets.
- The evaluation methodologies used are appropriate for the specific application domain and the system objectives.
- The results of the evaluation are clearly presented and discussed, and if necessary compared to baseline systems or algorithms.
- Adequate evidence (e.g., sample runs on different test inputs) is provided to demonstrate the system functionality as a whole and the functionality of different components.
- The source code is fully documented.
- The report provides detailed documentation of the system architecture as well as a description of different components and their interactions.
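Since the criteria above mention comparing results to a baseline, here is a small sketch of how that comparison might look. The ratings and the `model_pred` values are made up, and using the training-set mean rating as the baseline is just one simple option I am assuming, not something the instructions require.

```python
import numpy as np


def rmse(actual, pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((np.asarray(actual) - np.asarray(pred)) ** 2)))


def mae(actual, pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(np.asarray(actual) - np.asarray(pred))))


# Hypothetical numbers for illustration only.
train_ratings = np.array([5, 4, 1, 4, 5, 2, 1, 1, 2, 5, 4, 5, 5, 1], dtype=float)
test_actual = np.array([1.0, 1.0, 4.0, 5.0])
model_pred = np.array([1.0, 1.5, 3.6, 4.4])  # e.g. output of the KNN recommender

# Baseline: always predict the mean training rating.
baseline_pred = np.full_like(test_actual, train_ratings.mean())

for name, pred in [("model", model_pred), ("baseline", baseline_pred)]:
    print(f"{name}: RMSE={rmse(test_actual, pred):.3f}, MAE={mae(test_actual, pred):.3f}")
```

If the recommender does not beat this kind of baseline on the test data, that should be called out in the report.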