ITMG 516 Intro to Data Analytics/Business Data Mining
Question Description
You will mine text consisting of reviews for a particular product or service. Refer to Chapter 12 of Data Mining for the Masses for help on how to perform text mining using RapidMiner. Perform the following tasks:
1. Install the Text Processing 7.5.0 extension in RapidMiner. To do this, open RapidMiner, click on Extensions in the menu bar, then Marketplace (Updates and Extensions). Search for the Text Processing extension and install it.
2. Using your favorite search engine, locate a website or forum on the Internet where people have posted reviews for a particular product or service.
3. Copy and paste at least ten of these posts or comments into a text editor, saving each one as its own text document with a unique name.
4. Open a new, blank process in RapidMiner and, using the Read Document operator, open each of your ten (or more) text documents containing the customer reviews you found.
5. Process these documents in RapidMiner. Be sure to tokenize, and add any other text-processing operators to your subprocess that you deem appropriate or necessary. Experiment with n-grams and stemming.
6. Use k-means clustering to group your documents into two, three, or more clusters. Output your word list as well. Take three screenshots: (1) the final process stream, (2) the clustering results, and (3) the resulting word list of tokens and frequencies.
7. In your interpretation of results, answer the following:
a) Based on your word list, what seem to be the most common terms in your documents? Why do you think that is?
b) Based on your word list, are there some terms or phrases that show up in all or most of your documents? Why do you think these are so common?
c) Based on your clusters, what groups did you get? What are the common themes in each of your clusters?
d) How might the company that sold this product or service use your model to its advantage?
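RapidMiner performs steps 5 and 6 through GUI operators rather than code, so the following is only a rough Python analogue for readers who want to see what tokenizing, stop-word filtering, stemming, n-gram generation, building a word list, and k-means clustering do under the hood. It is a minimal, standard-library-only sketch under stated assumptions: the six review texts are hypothetical toy sentences (not real reviews), and the stop-word list and the `crude_stem` suffix stripper are hypothetical simplifications (a real stemming algorithm such as Porter's is far more elaborate). It is not the procedure from Chapter 12 of Data Mining for the Masses and not RapidMiner's internal implementation.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; real stop-word filters use much larger lists.
STOPWORDS = {"the", "a", "an", "and", "is", "are", "it", "was", "to", "of",
             "i", "this", "but", "very", "my", "for", "in", "after", "look"}

def crude_stem(token):
    """Hypothetical suffix stripper standing in for a real stemmer (e.g. Porter)."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text, use_bigrams=True):
    """Lowercase, tokenize on non-letters, drop stop words, stem, optionally add bigrams."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    terms = [crude_stem(t) for t in tokens]
    if use_bigrams:
        # Bigrams are joined with "_" here purely as an illustrative convention.
        terms += [a + "_" + b for a, b in zip(terms, terms[1:])]
    return terms

def word_list(docs):
    """Return {term: (total occurrences, number of documents containing it)}."""
    total, doc_freq = Counter(), Counter()
    for terms in docs:
        total.update(terms)
        doc_freq.update(set(terms))
    return {t: (total[t], doc_freq[t]) for t in total}

def tfidf_vectors(docs):
    """Build unit-length TF-IDF vectors over the shared, sorted vocabulary."""
    vocab = sorted({t for terms in docs for t in terms})
    n = len(docs)
    df = Counter(t for terms in docs for t in set(terms))
    vectors = []
    for terms in docs:
        counts, length = Counter(terms), len(terms) or 1
        vec = [counts[t] / length * math.log(n / df[t]) for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors

def kmeans(vectors, k, iterations=20):
    """Plain k-means (Euclidean distance), seeded with the first k vectors for reproducibility."""
    k = min(k, len(vectors))
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(v, centroids[c])))
                  for v in vectors]
        # Update step: each centroid moves to the mean of its members (unchanged if empty).
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical toy "reviews"; replace with the text of your own saved review files.
REVIEWS = {
    "r01.txt": "Battery life is great, the battery lasts two days.",
    "r02.txt": "The battery drains fast and charging is slow.",
    "r03.txt": "Charging the battery takes hours; battery life is poor.",
    "r04.txt": "The screen is bright and the display colors look sharp.",
    "r05.txt": "Screen cracked after a week; the display is fragile.",
    "r06.txt": "Sharp display, bright screen, great colors.",
}

docs = [preprocess(text) for text in REVIEWS.values()]
top_terms = sorted(word_list(docs).items(), key=lambda kv: -kv[1][0])[:10]
for term, (total, in_docs) in top_terms:
    print(f"{term:20} total={total} docs={in_docs}")
labels = kmeans(tfidf_vectors(docs), k=2)
for name, label in zip(REVIEWS, labels):
    print(name, "-> cluster", label)
```

One design note: because the IDF factor log(n/df) is zero for a term that appears in every document, such a term contributes nothing to the TF-IDF vectors and therefore nothing to the clustering, even if it tops the raw word list. This is one reason the most common terms (question a) and the cluster themes (question c) can tell different stories. Seeding k-means with the first k documents keeps this sketch deterministic; practical tools usually use random or k-means++ initialization over several runs, so results can vary with the seed.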