The purpose of this assignment is to
demonstrate the steps performed in a k-means cluster analysis.
Review the "k-Means Clustering Algorithm" section in Chapter 4 of the
Sharda et al. textbook.
Use Excel to perform the following steps:
1. Plot the data on a scatter plot.
2. Determine the ideal number of clusters.
3. Choose random center points (centroids) for each cluster. (Note: Each
   student will select a different random set of center points.)
4. Using a standard distance formula, such as Euclidean distance, measure
   the distance from each data point to each center point.
5. Assign each data point to an initial cluster based on which center
   point it is closest to.
6. For each cluster, calculate a new center point (the mean of the data
   points assigned to that cluster).
7. Repeat steps 4 through 6.
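Assuming the data has two columns, x and y (as the scatter plot in step 1 suggests), and that the "standard distance formula" is Euclidean distance, steps 4 through 6 reduce to the calculations below. Here $c_j = (c_{j,x}, c_{j,y})$ is the current center point of cluster $j$, and $S_j$ is the set of data points currently assigned to that cluster. This is a sketch of the usual formulation, not the textbook's exact notation.

```latex
% Step 4: distance from data point i to center point j
d_{ij} = \sqrt{(x_i - c_{j,x})^2 + (y_i - c_{j,y})^2}

% Step 5: assign point i to the cluster with the nearest center point
\text{cluster}(i) = \arg\min_{j} \; d_{ij}

% Step 6: new center point = mean of the points assigned to cluster j
c_{j,x} = \frac{1}{|S_j|} \sum_{i \in S_j} x_i, \qquad
c_{j,y} = \frac{1}{|S_j|} \sum_{i \in S_j} y_i
```

In a worksheet, the distance is the SQRT of the summed squared differences, and each new center coordinate can be computed with AVERAGEIF over the column that holds each point's cluster assignment.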
You will use Excel to help with the calculations, but use only standard
functions (i.e., do not use a plug-in to perform the analysis for you).
You need to show your work by doing this analysis the long way. If you
were to repeat steps 4 through 6 again, what would likely happen to the
cluster centroids? The rubric for this assignment can be viewed by
clicking the assignment link.
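To cross-check hand calculations done "the long way," the loop over steps 4 through 6 can be sketched in plain Python using only basic arithmetic, much as a spreadsheet built from standard functions would. This is a minimal sketch under stated assumptions: the data set `POINTS` is hypothetical, distance is Euclidean, the random starting centroids are drawn from the data points themselves (one common choice; the assignment may allow any points), and the function names are illustrative only.

```python
import math
import random

# Hypothetical 2-D data set (x, y) for illustration only; substitute the assignment's data.
POINTS = [
    (1.0, 2.0), (1.5, 1.8), (1.2, 0.9),
    (8.0, 8.0), (8.5, 9.0), (9.0, 8.2),
    (4.8, 5.2), (5.2, 4.9), (5.0, 5.5),
]


def distance(p, c):
    """Step 4: Euclidean distance between a data point and a center point."""
    return math.sqrt((p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2)


def assign(points, centroids):
    """Step 5: index of the nearest center point for each data point."""
    labels = []
    for p in points:
        dists = [distance(p, c) for c in centroids]
        labels.append(dists.index(min(dists)))  # ties go to the lower index
    return labels


def update(points, labels, centroids):
    """Step 6: new center point = mean of each cluster's points.
    An empty cluster keeps its old center point (an assumption of this sketch)."""
    new = []
    for j, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            new.append(old)
            continue
        mean_x = sum(p[0] for p in members) / len(members)
        mean_y = sum(p[1] for p in members) / len(members)
        new.append((mean_x, mean_y))
    return new


def kmeans(points, k, seed=0, max_iter=100):
    """Steps 3-7: random starting center points, then repeat steps 4-6
    until the center points stop moving (or max_iter passes)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # step 3 (hypothetical: drawn from the data)
    history = [centroids]
    labels = assign(points, centroids)
    for _ in range(max_iter):
        new = update(points, labels, centroids)
        history.append(new)
        if new == centroids:  # no center point moved, so further passes change nothing
            break
        centroids = new
        labels = assign(points, centroids)
    return centroids, labels, history


if __name__ == "__main__":
    final, labels, history = kmeans(POINTS, k=3, seed=42)
    for i, cs in enumerate(history):
        print(f"pass {i}:", [(round(x, 2), round(y, 2)) for x, y in cs])
    print("cluster labels:", labels)
```

Printing `history` shows the center points after every pass, which is one way to observe what happens as steps 4 through 6 are repeated. Changing `seed` plays the role of each student picking a different random starting set.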
Here is a link to an example spreadsheet that uses a smaller data set. It
contains two tabs: the first tab is the raw data, and the second tab
contains the analysis that was performed. Make sure that you use different
starting center points from