CMP 627 Data Mining
Midterm Exam

1. [28] Data pre-processing.

(a) [5] It is not straightforward to visualize k-dimensional data for k > 3. Name 5 visualization techniques that can visualize 6-dimensional data effectively.

Answer: The 5 visualization techniques that can visualize 6-dimensional data effectively are:
Stick figures
Chernoff faces
Dimensional stacking
Parallel coordinates
Multi-dimensional scatter plot

(b) [6] For each of the following similarity measures, give one good application example.

i. Cosine measure
Answer: If d1 and d2 are two vectors (e.g., term-frequency vectors), then
$$\cos(d_1, d_2) = \frac{d_1 \cdot d_2}{\lVert d_1 \rVert \, \lVert d_2 \rVert}$$
where $d_1 \cdot d_2$ is the vector dot product and $\lVert d \rVert$ is the length (Euclidean norm) of vector $d$.
A good application example for the cosine measure is measuring text similarity, e.g., comparing documents represented as term-frequency vectors.

ii. Jaccard coefficient
Answer: It is a similarity measure for asymmetric binary variables. Computing the similarity between patients based on a set of medical test results (positive/negative) is an example application of the Jaccard coefficient.

iii. Minkowski distance for k = 1
Answer: For k = 1, the Minkowski distance becomes the popular Manhattan (city block) distance, the sum of the absolute differences between corresponding attributes. An example application is computing the distance between two locations in a city whose streets form a rectangular grid.

(c) [9] Distinguish between the following concepts or measures.

i. Pearson correlation coefficient vs. covariance
Answer: Pearson correlation coefficients are standardized and measure only linear relationships, hence a linear relation results in the

