Data Mining Anomaly Detection Part 2

User Generated

mzna2712

Computer Science

Intro to Data Mining

University of New Haven

Description

Please take a look at the attached questions. I will also share the video soon.

Thanks

Unformatted Attachment Preview

Chapter 9B Problems

YOUR ANSWERS MUST APPEAR WITHIN THIS PROBLEM DOCUMENT.
YOU MUST WRITE USING YOUR OWN WORDS. ANSWERS TAKEN FROM THE INTERNET OR
ANSWERS THAT MATCH ANOTHER STUDENTS WILL RECEIVE ZERO (0) POINTS.
10% WILL BE DEDUCTED IF YOU CREATE A NEW OR SEPARATE DOCUMENT.
10% WILL BE DEDUCTED IF YOU CREATE A “TITLE PAGE” TYPE OF DOCUMENT.

1. Many statistical tests for outliers were developed in an environment in which a few hundred
observations was a large data set. We explore the limitations of such approaches.
(a) For a set of 1,000,000 values, how likely are we to have outliers according to the test that
says a value is an outlier if it is more than three standard deviations from the average?
(Assume a normal distribution.)
ANSWER:
(b) Does the approach that states an outlier is an object of unusually low probability need to be
adjusted when dealing with large data sets? If so, how?
ANSWER:

2. Consider the (relative distance) K-means scheme for outlier detection described in Section 10.5
and the accompanying figure.
(a) The points at the bottom of the compact cluster shown in the figure have a somewhat higher
outlier score than those points at the top of the compact cluster. Why?
ANSWER:
(b) Suppose that we choose the number of clusters to be much larger, e.g., 10. Would the proposed
technique still be effective in finding the most extreme outlier at the top of the figure? Why or
why not?
ANSWER:
(c) The use of relative distance adjusts for differences in density. Give an example of where such
an approach might lead to the wrong conclusion.
ANSWER:
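As a rough illustration of the arithmetic behind question 1(a), the sketch below computes how often the three-standard-deviation test would flag a value. It assumes the values are independent draws from a normal distribution, as the question states. The function names are hypothetical and chosen only for this example; it is not taken from the assignment or the textbook.

```python
import math


def two_sided_tail_probability(k: float) -> float:
    """P(|Z| > k) for a standard normal Z, via the complementary error function."""
    return math.erfc(k / math.sqrt(2))


def expected_flagged(n: int, k: float = 3.0) -> float:
    """Expected number of values more than k standard deviations from the mean."""
    return n * two_sided_tail_probability(k)


def prob_at_least_one(n: int, k: float = 3.0) -> float:
    """P(at least one of n independent values is flagged): 1 - (1 - p)^n."""
    p = two_sided_tail_probability(k)
    # expm1/log1p keep the computation stable when p is small and n is large.
    return -math.expm1(n * math.log1p(-p))


if __name__ == "__main__":
    n = 1_000_000  # data set size from question 1(a)
    print(f"P(|Z| > 3)              = {two_sided_tail_probability(3.0):.5f}")
    print(f"expected flagged values = {expected_flagged(n):.0f}")
    print(f"P(at least one flagged) = {prob_at_least_one(n):.6f}")
```

Under these assumptions the per-value probability is about 0.0027, so a data set of this size would be expected to contain a few thousand flagged values even if every value came from the same normal distribution.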

Explanation & Answer

Check this out. Please let me know if you have any questions. Thanks :)


