Assignment4Fall2023 (1)

pdf

School

Northeastern University *

Course

6220

Subject

Computer Science

Date

Dec 6, 2023

CS 6220 Data Mining, Assignment 4: Association Rules

To complete this assignment you will need to download the following resources:

1. A notebook with potential helper functions for implementing the Apriori algorithm: https://goo.gl/1HtnFQ . Add your portion of the assignment solution to the end of this file and submit it.
2. The dataset you'll be using, grocery.csv, which can be found at https://www.dropbox.com/s/mz6j5glhnd3skfs/grocery.csv?d . Download the file and place it in the same directory as your notebook.

Objectives:

1. Define association rules and state their usefulness.
2. Programmatically apply association rules to the given dataset and analyze the results.

Submission: Through the assignment submission portal on Canvas, submit your ipynb together with a pdf of your assignment solution; there is no need to zip the files. This is the norm for almost all assignments.

Grading Criteria: Follow the instructions in the pdf and complete each task. You will be graded on the application of the modules' topics, the completeness of your answers to the questions in the assignment notebook, and the clarity of your writing and code.

What You Need to Do

Part 1 - Apriori [40 Points]:

The dataset above contains nearly 10 thousand transactions recorded from a grocery store. Each row in the dataset refers to a given transaction, where the items purchased are separated by commas. For example, in the second row we have a transaction with three items: tropical fruit, yogurt, and coffee. The attached notebook file (first download link above) contains a helper function that allows you to quickly load that file into a format that can be easily processed in Python.

Task 1: Your task here is to make use of the provided functions to generate candidate itemsets, select those that are frequent using Apriori, and subsequently list the association rules derived from them. [Note that because we have thousands of transactions, it may be hard to find itemsets with high support (e.g., 20%), so in order to see interesting results, make sure you experiment with lower minimum support values. Make sure to document your code and leave some commentary on the results you obtained, which you will further discuss in the Collaborative Activity for this lesson.]

Task 2: We can find a relationship between the confidence level and the number of rules found for a given support value. To see it, plot the number of rules found on the y-axis against the confidence level on the x-axis for different support values. Use confidence levels of 10%, 20%, 30%, 40%, and 50% for each of the support levels 2%, 3%, 4%, and 5%, all in the same figure. Plot a separate line for each support level.

Part 2 - FPgrowth [30 Points]:

Repeat the above process, but this time use FP-growth. You may use the code provided at https://goo.gl/Rv8KAa , or some other Python implementation that you might find online (just be sure to cite your sources).

Part 3 - Interest Factor [30 Points]:

Use either the Apriori or the FPgrowth algorithm with 2% support and 30% confidence to generate the rules. Now, calculate the interest factor for all the rules. Prepare three lists of rules sorted in descending order by support, confidence, and interest factor, respectively. Select and print the top-5 rules in each list. Compare the lists and mention whether any rules are common to them.
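As a rough illustration of the three measures used in Part 3, the sketch below computes support, confidence, and interest factor for rules between single items on a tiny made-up transaction list. It assumes the interest factor of a rule A -> B is s(A, B) / (s(A) * s(B)), a common definition that is often called lift; check it against the definition used in the course. The function names (support, pair_rules, top_k) and the toy data are hypothetical, and the brute-force enumeration is neither the Apriori helper code from the notebook nor FP-growth.

```python
from itertools import combinations

# Toy transactions, made up for illustration (not taken from grocery.csv).
transactions = [
    {"milk", "bread"},
    {"milk", "yogurt"},
    {"milk", "bread", "yogurt"},
    {"bread", "coffee"},
    {"milk", "bread", "coffee"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def pair_rules(transactions, min_support, min_confidence):
    """Brute-force rules lhs -> rhs between single items.

    This enumerates every item pair directly; it only illustrates the
    measures and is not an implementation of Apriori or FP-growth.
    """
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        s_ab = support({a, b}, transactions)
        if s_ab < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            s_lhs = support({lhs}, transactions)
            s_rhs = support({rhs}, transactions)
            confidence = s_ab / s_lhs
            if confidence >= min_confidence:
                rules.append({
                    "lhs": lhs,
                    "rhs": rhs,
                    "support": s_ab,
                    "confidence": confidence,
                    # Assumed definition: s(A, B) / (s(A) * s(B)).
                    "interest": s_ab / (s_lhs * s_rhs),
                })
    return rules


def top_k(rules, key, k=5):
    """Return the k rules with the largest value of the given measure."""
    return sorted(rules, key=lambda r: r[key], reverse=True)[:k]


rules = pair_rules(transactions, min_support=0.4, min_confidence=0.6)
for measure in ("support", "confidence", "interest"):
    print(measure)
    for r in top_k(rules, measure):
        print(f"  {r['lhs']} -> {r['rhs']}: {r[measure]:.3f}")
```

Calling pair_rules repeatedly over a grid of support and confidence thresholds and recording len(rules) for each pair gives the counts that Task 2 asks to plot. An interest factor above 1 suggests the two sides occur together more often than independence would predict, which is why a ranking by interest factor can differ from rankings by support or confidence.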
