Please answer the questions related to Data Science and Big Data analysis. I have attached the questions in the PPT and the related material in the zip file.

User Generated

zrrgu

Computer Science

ITS 836

University of the Cumberlands

ITS

Unformatted Attachment Preview

School of Computer & Information Sciences
ITS 836 Data Science and Big Data Analytics

Lecture 05 – HW05 Association Rules
• HW05 Exercise 1 – review the Apriori algorithm and create slides:
  – https://www.hackerearth.com/blog/machine-learning/beginners-tutorial-apriori-algorithm-data-mining-rimplementation/
• HW05 Exercise 2: Grocery Dataset with R
• HW05 Exercise 3: Apply Apriori Rules to “marketbasket.csv”
• HW05 Exercise 4: “R for Data Science” Module 4

Exercise 1
• Summarize the Apriori algorithm
  – https://www.hackerearth.com/blog/machine-learning/beginners-tutorial-apriori-algorithm-data-mining-rimplementation/
  – Support
  – Confidence
  – Lift
  – How does the Apriori algorithm work?

Exercise 2: Grocery Store Transactions from textbook
Install the packages through the menu: Packages -> Install -> arules, arulesViz. Do not enter the next line yourself; it appears on the console:
    install.packages(c("arules", "arulesViz"))
Then load the packages and the data:
    library('arules')
    library('arulesViz')
    data(Groceries)
    summary(Groceries)   # indicates 9835 rows
The class of the Groceries dataset is transactions, containing 3 slots:
1. transactionInfo   # data frame with vectors having the length of the transactions
2. itemInfo          # data frame storing item labels
3. data              # binary incidence matrix of labels in transactions
    Groceries@itemInfo[1:10,]
    apply(Groceries@data[,10:20], 2, function(r) paste(Groceries@itemInfo[r,"labels"], collapse=", "))

Exercise 2: Grocery Store Transactions, Section 5.5.2 Frequent Itemset Generation
To illustrate the Apriori algorithm, the code below does each iteration separately. Assume a minimum support threshold of 0.02 (0.02 * 9835 ≈ 197 transactions), which gives 122 itemsets in total.
First, get itemsets of length 1:
    # assumed call, reconstructed from the settings stated above (length 1, support 0.02)
    > itemsets <- apriori(Groceries, parameter=list(minlen=1, maxlen=1, support=0.02, target="frequent itemsets"))
    > summary(itemsets)   # found 59 itemsets
    > inspect(head(sort(itemsets, by="support"), 10))   # lists top 10 supported items

Exercise 2: Grocery Store Transactions, Section 5.5.3 Rule Generation and Visualization
The Apriori algorithm will now generate rules. Set the minimum support threshold to 0.001 (allows more rules, presumably for the scatterplot) and the minimum confidence threshold to 0.6 to generate 2,918 rules.
    # assumed call, reconstructed from the thresholds stated above
    > rules <- apriori(Groceries, parameter=list(support=0.001, confidence=0.6, target="rules"))
    > summary(rules)   # finds 2918 rules
    > plot(rules)      # displays scatterplot
The scatterplot shows that the highest lift occurs at a low support and a low confidence.

Get a scatterplot matrix to compare the support, confidence, and lift of the 2918 rules:
    > plot(rules@quality)   # displays scatterplot matrix
Lift is proportional to confidence, with several linear groupings. Note that Lift = Confidence / Support(Y), so when the support of Y remains the same, lift is proportional to confidence and the slope of the linear trend is the reciprocal of Support(Y).

Compute 1/Support(Y), which is the slope:
    # assumed assignment, reconstructed from Lift = Confidence / Support(Y)
    > slope <- sort(round(rules@quality$lift / rules@quality$confidence, 2))
    > unlist(lapply(split(slope, f=slope), length))
Display the top 10 rules sorted by lift:
    > inspect(head(sort(rules, by="lift"), 10))
Rule {Instant food products, soda} -> {hamburger meat} has the highest lift of 19 (page 154).

Visualize the top 5 rules with the highest lift:
    # assumed assignment, reconstructed from the description above
    > highLiftRules <- head(sort(rules, by="lift"), 5)
    > plot(highLiftRules, method="graph", control=list(type="items"))
In the graph, the arrow always points from an item on the LHS to an item on the RHS. For example, the arrows that connect ham, processed cheese, and white bread suggest the rule {ham, processed cheese} -> {white bread}. The size of a circle indicates support and its shade represents lift.

Exercise 3: Apply Apriori Rules to “marketbasket.csv”
• Use the data – apply the Exercise 2 method to the data
• You can also use the following references
  – https://www.kaggle.com/xvivancos/market-basket-analysis/report
  – https://www.kaggle.com/swapnil2129/exploratory-data-analysismarket-basket-analysis

Exercise 4: Model & Graphics
https://r4ds.had.co.nz/model-intro.html
• 28 Graphics: 28.2.1, 28.3.1, 28.4.4
• 23 Model Basics: 23.2.1, 23.3.1, 23.4.5
• 24 Model Building: 24.2.3, 24.3.5
• 25 Many Models: 25.2.5, 25.4.5, 25.5.3

Questions?

“R for Data Science” 5 Modules
I Explore
II Wrangle
III Program
IV Model
V Communicate
R for Data Science, Garrett Grolemund & Hadley Wickham
https://r4ds.had.co.nz/index.html
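
The relation Lift = Confidence / Support(Y) noted in the Exercise 2 slides can be checked by hand on a small example. The sketch below uses base R only; the five toy transactions and the helper name `support` are made up for illustration and are not part of the Groceries data or the arules package.

```r
# Toy transactions (hypothetical data, not the Groceries dataset)
transactions <- list(
  c("bread", "milk"),
  c("bread", "butter"),
  c("bread", "milk", "butter"),
  c("milk"),
  c("bread", "milk")
)

# Hypothetical helper: fraction of transactions containing every item in `items`
support <- function(items, txns) {
  mean(vapply(txns, function(txn) all(items %in% txn), logical(1)))
}

lhs <- "bread"  # X
rhs <- "milk"   # Y

supp_rule  <- support(c(lhs, rhs), transactions)      # Support(X and Y)
confidence <- supp_rule / support(lhs, transactions)  # Support(X and Y) / Support(X)
lift       <- supp_rule /
  (support(lhs, transactions) * support(rhs, transactions))

# Lift and Confidence / Support(Y) agree up to floating-point tolerance
stopifnot(isTRUE(all.equal(lift, confidence / support(rhs, transactions))))
print(c(support = supp_rule, confidence = confidence, lift = lift))
```

Because Support(Y) is fixed for a given right-hand side, every rule with the same RHS falls on one line through the origin in the lift-versus-confidence plot, which is one way to read the linear groupings described in the slides.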

Explanation & Answer

Attached.

SUMMARY OF APRIORI ALGORITHM
STUDENT’S NAME
INSTITUTIONAL AFFILIATION

APRIORI ALGORITHM
• This algorithm is designed to work on databases that contain many transactions.
• It finds the itemsets that appear in at least a minimum number of transactions.

HOW IT WORKS
• It works on two assumptions: all subsets of a
frequent itemset must be frequent, and for an
infrequent itemset all its supersets mus...
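
The level-wise search that rests on the first assumption (all subsets of a frequent itemset must be frequent) can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the arules implementation: the toy transactions, the threshold `min_support = 0.3`, and the helper name `support` are all hypothetical.

```r
# Toy transactions (hypothetical data)
transactions <- list(
  c("bread", "milk"),
  c("bread", "butter"),
  c("bread", "milk", "butter"),
  c("milk"),
  c("bread", "milk")
)
min_support <- 0.3  # assumed threshold for this illustration

# Hypothetical helper: fraction of transactions containing every item in `items`
support <- function(items, txns) {
  mean(vapply(txns, function(txn) all(items %in% txn), logical(1)))
}

# Level 1: frequent single items
items <- sort(unique(unlist(transactions)))
frequent <- Filter(function(s) support(s, transactions) >= min_support,
                   as.list(items))
all_frequent <- frequent

k <- 2
while (length(frequent) > 0) {
  # Candidates: k-item combinations of items still present at level k - 1
  pool <- sort(unique(unlist(frequent)))
  if (length(pool) < k) break
  candidates <- combn(pool, k, simplify = FALSE)

  # Prune: keep a candidate only if every (k - 1)-subset was frequent
  prev_keys <- vapply(frequent, paste, character(1), collapse = ",")
  candidates <- Filter(function(cand) {
    subsets <- combn(cand, k - 1, simplify = FALSE)
    all(vapply(subsets, paste, character(1), collapse = ",") %in% prev_keys)
  }, candidates)

  # Count support only for the surviving candidates
  frequent <- Filter(function(s) support(s, transactions) >= min_support,
                     candidates)
  all_frequent <- c(all_frequent, frequent)
  k <- k + 1
}

frequent_keys <- vapply(all_frequent, paste, character(1), collapse = ",")
print(frequent_keys)
```

In this toy run the candidate {bread, butter, milk} is discarded without counting its support, because its subset {butter, milk} was not frequent at the previous level.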

