In this homework, you will perform cluster analysis on a data set of your choice. Complete the following steps:

  • Find a data set on the web. You may not use any of the data sets built into RapidMiner. The data set must either be entirely numeric, or you must select only its relevant numeric attributes.
  • Import your data set into RapidMiner and take a screenshot of the loaded data set.
  • Write a research question that you want to answer by performing cluster analysis on this data set.
  • Perform cluster analysis, trying several different values of k (the number of clusters). Take a screenshot of your final process stream.
  • Take screenshots of the resulting cluster model and of the centroid table.
  • Interpret your results. What do the clusters mean? How do these clusters help you answer your research question?

Submission Instructions:

Please type up your homework using the homework template posted on Blackboard under Assignments. Include at least four screenshots: (1) the data set loaded in RapidMiner, (2) the final process stream, (3) the cluster model, and (4) the centroid table. Remember to interpret your results.

