ITMG 516 Intro to Data Analytics/Business Data Mining

zqzqzqz
Asked: Dec 5th, 2017

Question Description

You will mine text consisting of reviews for a particular product or service. Refer to Chapter 12 of Data Mining for the Masses for help on how to perform text mining using RapidMiner. Perform the following tasks:

1. Install the Text Processing 7.5.0 extension in RapidMiner. To do this, open RapidMiner, click on Extensions in the menu bar, then Marketplace (Updates and Extensions). Search for the Text Processing extension and install it.

2. Using your favorite search engine, locate a website or forum on the Internet where people have posted reviews for a particular product or service.

3. Copy and paste at least ten of these posts or comments into a text editor, saving each one as its own text document with a unique name.

4. Open a new, blank process in RapidMiner, and using the Read Documents operator, open each of your ten (or more) text documents containing the customer reviews you found.

5. Process these documents in RapidMiner. Be sure you tokenize and use other handlers in your sub-process as you deem appropriate/necessary. Experiment with n-grams and stems.

6. Use k-means clustering to group your documents into two, three, or more clusters. Output your word list as well. Take three screenshots: (1) the final process stream, (2) the clustering results, and (3) the resulting word list of tokens and frequencies.

7. In your interpretation of results, answer the following:

a) Based on your word list, what seem to be the most common terms in your documents? Why do you think that is?

b) Based on your word list, are there some terms or phrases that show up in all or most of your documents? Why do you think these are so common?

c) Based on your clusters, what groups did you get? What are the common themes in each of your clusters?

d) How might the company that sold this product or service use your model to their advantage?
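
For anyone who wants to sanity-check what steps 5 and 6 are doing conceptually, here is a rough sketch in plain Python. It is not RapidMiner and does not reproduce its operators. The four sample reviews are made up, the stopword list is tiny, the stemmer is a crude suffix stripper rather than a real Porter stemmer, and the k-means seeding (first k documents) is a simplification chosen only to keep the run deterministic.

```python
import re
from collections import Counter

# Hypothetical stand-ins for the review files saved in step 3.
reviews = [
    "Battery life is great and the battery charges fast",
    "The screen cracked and support was slow",
    "Great battery, charging is fast",
    "Slow support, cracked screen, very disappointed",
]

# Tiny illustrative stopword list (an assumption, not RapidMiner's list).
STOPWORDS = {"the", "is", "and", "was", "a", "very"}


def stem(token):
    """Crude suffix stripping; a stand-in for a real stemmer such as Porter."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def tokens_for(text, max_n=2):
    """Tokenize, lowercase, drop stopwords, stem, then add n-grams up to max_n."""
    words = [stem(w) for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    grams = list(words)
    for n in range(2, max_n + 1):
        grams += ["_".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return grams


def word_list(docs):
    """Return (total occurrences, document occurrences) for every term."""
    total, in_docs = Counter(), Counter()
    for doc in docs:
        toks = tokens_for(doc)
        total.update(toks)
        in_docs.update(set(toks))
    return total, in_docs


def kmeans(vectors, k, iterations=20):
    """Plain k-means on dense lists, seeded with the first k vectors."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each document to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(v, centroids[c])))
            for v in vectors
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


total, in_docs = word_list(reviews)
vocab = sorted(total)
# One term-count vector per document, in vocabulary order.
vectors = [[Counter(tokens_for(doc))[term] for term in vocab] for doc in reviews]
labels = kmeans(vectors, k=2)

print(total.most_common(5))  # word list: most frequent tokens
print(labels)                # cluster assignment per review
```

With these sample reviews the two battery reviews end up in one cluster and the two screen/support reviews in the other. The `total` counter relates to question (a), `in_docs` to question (b), and `labels` to question (c).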

