University of the Cumberlands Data Mining and Information Systems Questions

University of the Cumberlands

Description

Question 1

Suppose that you are employed as a data mining consultant for an Internet search engine company. Describe how data mining can help the company by giving specific examples of how techniques, such as clustering, classification, association rule mining, and anomaly detection can be applied.

Question 2

Identify at least two advantages and two disadvantages of using color to visually represent information.

Question 3

Consider the XOR problem where there are four training points: (1, 1, −),(1, 0, +),(0, 1, +),(0, 0, −). Transform the data into the following feature space:

$\Phi = (1, \sqrt{2}x_1, \sqrt{2}x_2, \sqrt{2}x_1x_2, x_1^2, x_2^2)$.

Find the maximum margin linear decision boundary in the transformed space.
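As a starting point for this question, the sketch below maps the four XOR training points into the six-dimensional feature space and checks a standard identity: the inner product of two transformed points equals the degree-2 polynomial kernel (1 + x·z)². The helper names `phi`, `dot`, and `kernel` are illustrative assumptions, and the code does not compute the maximum margin boundary itself.

```python
import math
from itertools import product

def phi(x1, x2):
    """Feature map from Question 3: (1, sqrt(2)x1, sqrt(2)x2, sqrt(2)x1x2, x1^2, x2^2)."""
    r2 = math.sqrt(2.0)
    return (1.0, r2 * x1, r2 * x2, r2 * x1 * x2, x1 ** 2, x2 ** 2)

def dot(u, v):
    """Plain inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def kernel(x, z):
    """Degree-2 polynomial kernel (1 + x.z)^2 in the original 2-D space."""
    return (1.0 + dot(x, z)) ** 2

# The four XOR training points, with + encoded as +1 and - as -1.
points = [((1, 1), -1), ((1, 0), +1), ((0, 1), +1), ((0, 0), -1)]
transformed = [(phi(*x), y) for x, y in points]

# Check that the explicit feature map agrees with the kernel on every pair.
for (x, _), (z, _) in product(points, repeat=2):
    assert math.isclose(dot(phi(*x), phi(*z)), kernel(x, z))

for vec, label in transformed:
    print(label, vec)
```

Printing the transformed points makes it easier to look for a coordinate that separates the two classes, which is the first step toward finding the boundary by hand.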

Question 4

Consider the following set of candidate 3-itemsets: {1, 2, 3}, {1, 2, 6}, {1, 3, 4}, {2, 3, 4}, {2, 4, 5}, {3, 4, 6}, {4, 5, 6}

Construct a hash tree for the above candidate 3-itemsets. Assume the tree uses a hash function where all odd-numbered items are hashed to the left child of a node, while the even-numbered items are hashed to the right child. A candidate k-itemset is inserted into the tree by hashing on each successive item in the candidate and then following the appropriate branch of the tree according to the hash value. Once a leaf node is reached, the candidate is inserted based on one of the following conditions:

Condition 1: If the depth of the leaf node is equal to k (the root is assumed to be at depth 0), then the candidate is inserted regardless of the number of itemsets already stored at the node.

Condition 2: If the depth of the leaf node is less than k, then the candidate can be inserted as long as the number of itemsets stored at the node is less than maxsize. Assume maxsize = 2 for this question.

Condition 3: If the depth of the leaf node is less than k and the number of itemsets stored at the node is equal to maxsize, then the leaf node is converted into an internal node. New leaf nodes are created as children of the old leaf node. Candidate itemsets previously stored in the old leaf node are distributed to the children based on their hash values. The new candidate is also hashed to its appropriate leaf node.
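The three insertion conditions can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the textbook's reference implementation: the root starts as an empty leaf at depth 0, a node at depth d hashes on the item at position d of the (sorted) candidate, and child leaves are created only when an itemset is first hashed to them. The names `Node`, `branch`, `insert`, and `count` are hypothetical.

```python
class Node:
    def __init__(self, depth):
        self.depth = depth
        self.itemsets = []    # candidates stored while the node is a leaf
        self.children = None  # dict {"L": Node, "R": Node} once internal

    def is_leaf(self):
        return self.children is None

def branch(item):
    """Odd items hash to the left child, even items to the right child."""
    return "L" if item % 2 == 1 else "R"

def insert(node, itemset, k, maxsize):
    if not node.is_leaf():
        # Internal node: hash on the item at this depth and descend.
        key = branch(itemset[node.depth])
        if key not in node.children:
            node.children[key] = Node(node.depth + 1)  # assumption: lazy creation
        insert(node.children[key], itemset, k, maxsize)
        return
    # Condition 1 (depth == k) or Condition 2 (leaf still has room).
    if node.depth == k or len(node.itemsets) < maxsize:
        node.itemsets.append(itemset)
        return
    # Condition 3: split the full leaf and redistribute everything.
    pending = node.itemsets + [itemset]
    node.itemsets = []
    node.children = {}
    for stored in pending:
        insert(node, stored, k, maxsize)

def count(node):
    """Return (number of leaf nodes, number of internal nodes)."""
    if node.is_leaf():
        return 1, 0
    leaves, internals = 0, 1
    for child in node.children.values():
        child_leaves, child_internals = count(child)
        leaves += child_leaves
        internals += child_internals
    return leaves, internals

candidates = [(1, 2, 3), (1, 2, 6), (1, 3, 4), (2, 3, 4),
              (2, 4, 5), (3, 4, 6), (4, 5, 6)]
root = Node(0)
for candidate in candidates:
    insert(root, candidate, k=3, maxsize=2)

print(count(root))
```

Re-inserting the stored itemsets through `insert` means that a child which overflows during redistribution is split again, which is one reasonable reading of Condition 3.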

(a) How many leaf nodes are there in the candidate hash tree? How many internal nodes are there?

(b) Consider a transaction that contains the following items: {1, 2, 3, 5, 6}. Using the hash tree constructed in part (a), which leaf nodes will be checked against the transaction? What are the candidate 3-itemsets contained in the transaction?
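The second half of this question can be cross-checked by brute force, without the hash tree: enumerate every 3-item subset of the transaction and keep those that appear among the candidates. The sketch below does only that; it does not identify which leaf nodes are visited, and the variable names are illustrative.

```python
from itertools import combinations

candidates = [(1, 2, 3), (1, 2, 6), (1, 3, 4), (2, 3, 4),
              (2, 4, 5), (3, 4, 6), (4, 5, 6)]
transaction = {1, 2, 3, 5, 6}

# All 3-item subsets of the transaction, generated in sorted order.
subsets = set(combinations(sorted(transaction), 3))

# A candidate is contained in the transaction if it is one of those subsets.
contained = [c for c in candidates if c in subsets]

print(len(subsets), contained)
```

The hash tree exists to avoid comparing every subset against every candidate, so this check is useful for validating a hand-traced traversal rather than replacing it.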

Question 5

Consider a group of documents that has been selected from a much larger set of diverse documents so that the selected documents are as dissimilar from one another as possible. If we consider documents that are not highly related (connected, similar) to one another as being anomalous, then all of the documents that we have selected might be classified as anomalies. Is it possible for a data set to consist only of anomalous objects or is this an abuse of the terminology?

Requirements

Answer each question in 150-200 words, in APA format. Use references from scholarly articles, preferably from Google Scholar. No plagiarism. Link to textbook: https://ebooksshelf.com/?download_file=8978&order=wc_order_tANBeQGUDhTAx&email=saigowthamreddy303%40gmail.com&key=cf7033af-5ea2-4f12-8992-4dc4f09653c2

Explanation & Answer

Hi, here is your assignment with plagiarism result ;). If you have concerns, please let me know okay ;)

Question 1
Suppose that you are employed as a data mining consultant for an Internet search engine
company. Describe how data mining can help the company by giving specific examples of how
techniques, such as clustering, classification, association rule mining, and anomaly detection
can be applied.
Answer:
Data mining is defined as a process used to extract usable knowledge from a
larger set of raw data. It involves analyzing patterns in large batches of data
using one or more software tools (Bharati & Ramageri, 2010). Data mining has
applications in multiple fields, such as science and research. By applying
data mining, businesses can learn more about their customers and develop
more effective strategies for various business functions, and in turn use
resources in a more optimal and insightful manner. Classification is supervised
learning, in which an entity is assigned to a class based on known labels (Bharati & Ramageri,
2010). Association rule mining produces if-then rules that help in
finding relationships between the data items in a relational database. It
uses machine learning algorithms to find patterns in non-numerical
data. Association rule mining is a process that finds frequently occurring patterns, their
correlations, and associations among the datasets in the database. Examples include clustering customers based on
their income to suggest a product and classifying customers by whether they may purchase a
parti...

