# K-Means Tutorial

May 5th, 2015

## The k-means algorithm

(Notes from: Tan, Steinbach, Kumar + Ghosh)

## K-Means Algorithm

- K = # of clusters (given); one "mean" per cluster
- Interval data
- Initialize means (e.g. by picking k samples at random)
- Iterate:
  1. assign each point to nearest mean
  2. move mean to center of its cluster

(C) Vipin Kumar, Parallel Issues in Data Mining, VECPAR 2002

[Figure: Assignment Step; Means Update]

[Figure: Convergence after another iteration]

Complexity: O(k * n * # of iterations)

## K-means

- J. MacQueen, "Some methods for classification and analysis of multivariate observations," Proc. of the Fifth Berkeley Symp. on Math. Stat. and Prob., vol. 1, pp. 281-296, 1967.
- E. Forgy, "Cluster analysis of multivariate data: efficiency vs. interpretability of classification," Biometrics, vol. 21, pp. 768, 1965.
- D. J. Hall and G. B. Ball, "ISODATA: A novel method of data analysis and pattern classification," Technical Report, Stanford Research Institute, Menlo Park, CA, 1965.
- The history of k-means type of algorithms (LBG Algorithm, 1980):
  - R.M. Gray and D.L. Neuhoff, "Quantization," IEEE Transactions on Information Theory, vol. 44, pp. 2325-2384, October 1998. (Commemorative Issue, 1948-1998)

ICDM: Top Ten Data Mining Algorithms, K-means, December 2006

## K-means Clustering Details

Complexity is O(n * K * I * d)
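The initialize / assign / update loop from the slides can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the code behind the original notes: points are taken to be tuples of numbers, "nearest" is assumed to mean squared Euclidean distance, an empty cluster simply keeps its previous mean, and the names `kmeans`, `squared_distance`, `max_iters`, and `seed` are made up for this example. Each pass compares every one of the n points with all K means across d coordinates, so I passes cost on the order of n * K * I * d operations.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iters=100, seed=0):
    """Illustrative k-means; returns (means, labels)."""
    rng = random.Random(seed)
    # Initialize means by picking k samples at random.
    means = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iters):
        # (1) Assignment step: each point goes to its nearest mean.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, means[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the means are stable
        labels = new_labels
        # (2) Update step: move each mean to the center of its cluster.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its old mean
                means[j] = [sum(col) / len(members) for col in zip(*members)]
    return means, labels


# Two well-separated groups of 2-D points (made-up data for illustration).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
means, labels = kmeans(points, k=2)
```

The result depends on the random initialization in general, which is why the sketch exposes a `seed`; on data as cleanly separated as this toy set, the first three points and the last three points end up in different clusters.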

