Lab 10

School

Syracuse University

Course

687

Subject

Computer Science

Date

Feb 20, 2024

11/2/23, 7:43 PM Lab10a.knit file:///C:/Users/mahaj/Downloads/Lab10a.html

Intro to Data Science - Lab 10
Copyright 2023, Jeffrey Stanton and Jeffrey Saltz. Please do not post online.

Week 10 - Association Rules Mining

# Enter your name here: Swapnil Deore

Please include nice comments.

Instructions: Run the necessary code on your own instance of R-Studio.

Attribution statement: (choose only one and delete the rest)

# 1. I did this lab assignment by myself, with help from the book and the professor.

Association rules mining, also known as market basket analysis, is an unsupervised data mining technique that discovers patterns in the form of if-then rules. The technique is unsupervised in the sense that there is no prediction or classification happening. We are simply trying to find interesting patterns. In addition to working with baskets of objects, association rules mining is good at working with any kind of data that can be expressed as lists of attributes. For example, a trip to Washington DC might consist of the following attributes: train, July, morning departure, afternoon arrival, Union Station, first class, express.

In these exercises we will work with a built-in data set called Groceries. Make sure to library the arules and arulesViz packages before running the following:

data(Groceries) # Load data into memory
myGroc <- Groceries # Make a copy for safety

#install.packages("arules")
#install.packages("arulesViz")
library(arules)

## Loading required package: Matrix
##
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
##
##     abbreviate, write

library(arulesViz)
data(Groceries)
myGroc <- Groceries # Load the data set and keep a working copy

1. Examine the data structure that summary() reveals. This is called a sparse matrix and it efficiently stores a set of market baskets along with meta-data. Report using R comments about some of the item labels.

summary(myGroc)

## transactions as itemMatrix in sparse format with
##  9835 rows (elements/itemsets/transactions) and
##  169 columns (items) and a density of 0.02609146
##
## most frequent items:
##       whole milk other vegetables       rolls/buns             soda
##             2513             1903             1809             1715
##           yogurt          (Other)
##             1372            34055
##
## element (itemset/transaction) length distribution:
## sizes
##    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16
## 2159 1643 1299 1005  855  645  545  438  350  246  182  117   78   77   55   46
##   17   18   19   20   21   22   23   24   26   27   28   29   32
##   29   14   14    9   11    4    6    1    1    1    1    3    1
##
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   1.000   2.000   3.000   4.409   6.000  32.000
##
## includes extended item information - examples:
##        labels  level2           level1
## 1 frankfurter sausage meat and sausage
## 2     sausage sausage meat and sausage
## 3  liver loaf sausage meat and sausage

# The first three item labels are frankfurter, sausage and liver loaf. All three fall under the level2 category "sausage" and the level1 category "meat and sausage".

2. Use the itemFrequency(myGroc) command to generate a list of item frequencies. Save that list in a new data object. Run str( ) on the data object and write a comment describing what it is. Run sort( ) on the data object and save the results. Run head( ) and tail( ) on the sorted object to show the most and least frequently occurring items. What's the most frequently purchased item?
itemF <- itemFrequency(myGroc)
str(itemF)

##  Named num [1:169] 0.05897 0.09395 0.00508 0.02603 0.02583 ...
##  - attr(*, "names")= chr [1:169] "frankfurter" "sausage" "liver loaf" "ham" ...

sorted <- sort(itemF)
head(sorted)

##             baby food  sound storage medium preservation products
##          0.0001016777          0.0001016777          0.0002033554
##       kitchen utensil                  bags        frozen chicken
##          0.0004067107          0.0004067107          0.0006100661

tail(sorted)

##    bottled water           yogurt             soda       rolls/buns
##        0.1105236        0.1395018        0.1743772        0.1839349
## other vegetables       whole milk
##        0.1934926        0.2555160

# str() shows a named numeric vector of length 169: one relative frequency (share of transactions) per item, named by item label.
# Whole milk is the most frequently purchased item (about 25.6% of baskets). Baby food and sound storage medium are tied for the least frequent.

3. Create a frequency plot with itemFrequencyPlot(myGroc, topN=20) and confirm that the plot shows the most frequently purchased item with the left-most bar. Write a comment describing the meaning of the Y-axis.

itemFrequencyPlot(myGroc, topN=20)
# Whole milk is the left-most bar, which agrees with the sorted frequencies from step 2. The Y-axis shows each item's relative frequency, that is, the proportion of transactions that contain the item.

4. Create a cross table with ct <- crossTable(myGroc, sort=TRUE). Examine the first few rows and columns of ct by using the square brackets subsetting technique. For example, the first two rows and first three columns would be ct[1:2, 1:3]. Write a comment describing one of the values. Write a comment describing what is on the diagonal of the matrix.

ct <- crossTable(myGroc, sort=TRUE)
ct[1:5, 1:5]

##                  whole milk other vegetables rolls/buns soda yogurt
## whole milk             2513              736        557  394    551
## other vegetables        736             1903        419  322    427
## rolls/buns              557              419       1809  377    338
## soda                    394              322        377 1715    269
## yogurt                  551              427        338  269   1372

# An off-diagonal cell counts the transactions that contain both items: for example, 736 baskets contain both whole milk and other vegetables.
# The diagonal gives the number of transactions that contain each single item, e.g. 2513 for whole milk.

5. Run the following analysis:
rules1 <- apriori(myGroc,
                  parameter=list(supp=0.0008, conf=0.55),
                  control=list(verbose=F),
                  appearance=list(default="lhs", rhs=("bottled beer")))

6. Examine the resulting rule set with inspect( ) and make sense of the results. There should be four rules in total.

inspect(rules1)

##     lhs                               rhs            support      confidence coverage     lift     count
## [1] {liquor, red/blush wine}       => {bottled beer} 0.0019318760 0.9047619  0.0021352313 11.23527    19
## [2] {soda, liquor}                 => {bottled beer} 0.0012201322 0.5714286  0.0021352313  7.09596    12
## [3] {red/blush wine, napkins}      => {bottled beer} 0.0008134215 0.5714286  0.0014234875  7.09596     8
## [4] {soda, liquor, red/blush wine} => {bottled beer} 0.0008134215 1.0000000  0.0008134215 12.41793     8

7. Adjust the support parameter to a new value so that you get more rules. Anywhere between 10 and 30 rules would be fine. Examine the new rule set with inspect( ). Does your interpretation of the situation still make sense?

rules2 <- apriori(myGroc,
                  parameter=list(supp=0.0005, conf=0.55),
                  control=list(verbose=F),
                  appearance=list(default="lhs", rhs=("bottled beer")))
inspect(rules2)
##      lhs                                                          rhs            support      confidence coverage     lift      count
## [1]  {liquor (appetizer), dishes}                              => {bottled beer} 0.0006100661 0.8571429  0.0007117438 10.643939     6
## [2]  {liquor, red/blush wine}                                  => {bottled beer} 0.0019318760 0.9047619  0.0021352313 11.235269    19
## [3]  {soda, liquor}                                            => {bottled beer} 0.0012201322 0.5714286  0.0021352313  7.095960    12
## [4]  {red/blush wine, napkins}                                 => {bottled beer} 0.0008134215 0.5714286  0.0014234875  7.095960     8
## [5]  {soda, liquor, red/blush wine}                            => {bottled beer} 0.0008134215 1.0000000  0.0008134215 12.417929     8
## [6]  {whole milk, soups, bottled water}                        => {bottled beer} 0.0005083884 0.8333333  0.0006100661 10.348274     5
## [7]  {yogurt, pastry, flower (seeds)}                          => {bottled beer} 0.0005083884 0.8333333  0.0006100661 10.348274     5
## [8]  {whole milk, yogurt, flower (seeds)}                      => {bottled beer} 0.0005083884 0.7142857  0.0007117438  8.869949     5
## [9]  {other vegetables, salt, margarine}                       => {bottled beer} 0.0005083884 0.7142857  0.0007117438  8.869949     5
## [10] {soda, red/blush wine, napkins}                           => {bottled beer} 0.0005083884 0.8333333  0.0006100661 10.348274     5
## [11] {citrus fruit, oil, bottled water}                        => {bottled beer} 0.0005083884 0.5555556  0.0009150991  6.898850     5
## [12] {root vegetables, herbs, other vegetables, bottled water} => {bottled beer} 0.0006100661 0.6000000  0.0010167768  7.450758     6
## [13] {whole milk, butter, rolls/buns, napkins}                 => {bottled beer} 0.0005083884 0.5555556  0.0009150991  6.898850     5
## [14] {pork, whole milk, domestic eggs, rolls/buns}             => {bottled beer} 0.0005083884 0.5555556  0.0009150991  6.898850

# Lowering the minimum support from 0.0008 to 0.0005 raises the rule count from 4 to 14. The ten new rules each have support between 0.0005 and 0.0006, so they rest on only a handful of transactions. The confidence threshold is unchanged at 0.55, and the original four rules keep the same support, confidence and lift.

8. Power User (not required): use mtcars to create a new data frame with factors (e.g., cyl attribute). Then create an mpg column with good or bad (good MPG is above 25). Convert the data frame to a transactions dataset and then predict rules for having bad MPG.

mycars <- mtcars
mycars <- data.frame(cyl=as.factor(mtcars$cyl), goodMpg=as.factor(mtcars$mpg>25))
mycars$goodMpg

##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE FALSE FALSE TRUE  TRUE  TRUE  FALSE FALSE FALSE FALSE
## [25] FALSE TRUE  TRUE  TRUE  FALSE FALSE FALSE FALSE
## Levels: FALSE TRUE

rules3 <- apriori(mycars,
                  parameter=list(supp=0.0005, conf=0.55),
                  control=list(verbose=F),
                  appearance=list(default="lhs", rhs=("goodMpg=FALSE")))
inspect(rules3)

##     lhs        rhs             support confidence coverage lift     count
## [1] {}      => {goodMpg=FALSE} 0.81250 0.8125     1.00000  1.000000    26
## [2] {cyl=6} => {goodMpg=FALSE} 0.21875 1.0000     0.21875  1.230769     7
## [3] {cyl=8} => {goodMpg=FALSE} 0.43750 1.0000     0.43750  1.230769    14
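
The support, confidence, coverage and lift columns printed by inspect( ) can be cross-checked by hand. The sketch below is not part of the lab; it uses base R only and assumes the usual definitions: support is the share of transactions matching both sides of a rule, coverage is the share matching the left-hand side, confidence is support divided by coverage, and lift is confidence divided by the support of the right-hand side. The first part plugs in counts read off the step 6 output (the left-hand-side count of 21 is inferred from coverage times 9835). The second part recomputes the step 8 rules directly from mtcars. The names car_tbl and rule_metrics are made up for this illustration.

```r
# Base R cross-check of rule metrics; no arules call is involved.

# Part 1: rule [1] from step 6, rebuilt from the printed counts.
n_tx       <- 9835   # transactions in Groceries, from summary(myGroc)
count_both <- 19     # baskets with liquor, red/blush wine and bottled beer
count_lhs  <- 21     # baskets with liquor and red/blush wine (coverage * n_tx)

support    <- count_both / n_tx
coverage   <- count_lhs / n_tx
confidence <- count_both / count_lhs   # same value as support / coverage
# Lift is confidence over the support of the right-hand side, so the
# printed lift implies the share of baskets that contain bottled beer.
beer_support <- confidence / 11.235269

print(c(support = support, coverage = coverage,
        confidence = confidence, beer_support = beer_support))

# Part 2: the step 8 rules, recomputed from mtcars (ships with R).
car_tbl <- data.frame(
  cyl     = factor(mtcars$cyl),
  goodMpg = factor(mtcars$mpg > 25)
)
n <- nrow(car_tbl)

# Support of the right-hand side {goodMpg=FALSE} on its own.
rhs_support <- mean(car_tbl$goodMpg == "FALSE")

# Hypothetical helper: metrics for the rule {cyl=k} => {goodMpg=FALSE}.
rule_metrics <- function(k) {
  lhs  <- car_tbl$cyl == k
  both <- lhs & car_tbl$goodMpg == "FALSE"
  cov  <- sum(lhs) / n     # share of cars matching the left-hand side
  supp <- sum(both) / n    # share of cars matching both sides
  conf <- supp / cov
  c(support = supp, confidence = conf, coverage = cov,
    lift = conf / rhs_support, count = sum(both))
}

# One row per rule, in the same column order as inspect( ).
metrics <- t(sapply(c("6", "8"), rule_metrics))
print(round(metrics, 6))
```

A lift of about 1.23 for the cyl=6 and cyl=8 rules is modest because bad MPG is already the majority outcome: the empty-left-hand-side rule shows that 81.25% of the cars have it regardless of cylinder count.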