Objective
The purpose of the project is to write a program that generates all association rules whose support is greater
than a user-supplied minimum support and whose confidence is greater than a user-supplied minimum
confidence. You need to implement all the steps of the Apriori algorithm.
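A minimal sketch of the frequent-itemset phase of Apriori in Python, the language implied by my_rules.py. The function name `apriori` and its signature are illustrative assumptions, and `min_count` is assumed to be an absolute support count (a fractional minimum support would first be converted using the number of transactions). Level-k candidates are built from unions of two frequent (k-1)-itemsets that have exactly k items, a union-based variant of the classic prefix join, and are pruned if any (k-1)-subset is infrequent. The sketch keeps itemsets whose count is at least `min_count`, the usual Apriori convention; use `>` instead if the "greater than" wording is meant strictly.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {frozenset: support_count} for every itemset contained in at
    least min_count transactions. transactions is a list of sets of items."""
    # Level 1: count single items directly.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)

    k = 2
    while current:
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items.
        prev = list(current)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                # Prune: keep the candidate only if all its (k-1)-subsets are frequent.
                if len(union) == k and all(
                    frozenset(sub) in current for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # Count: one pass over the transactions for this level.
        counts = dict.fromkeys(candidates, 0)
        for t in transactions:
            for cand in candidates:
                if cand <= t:
                    counts[cand] += 1
        current = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(current)
        k += 1
    return frequent
```

For example, `apriori([frozenset({1, 2, 3}), frozenset({1, 2})], 2)` keeps {1}, {2} and {1, 2}; {3} is dropped at level 1, so no candidate containing it is ever generated. Rescanning every transaction at each level is simple but slow on large inputs; hash trees are a common speed-up.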
Your code should be in a file named my_rules.py. Your program should take the following parameters, in this order, as
command-line arguments (or set them at the beginning of your code): (i) minimum support, (ii) minimum confidence,
(iii) input file name, and (iv) output name. The output name will be used to identify the output files from each
run. When minconf=-1, do not generate rules.
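A sketch of reading the parameters with Python's standard `argparse` module, assuming they are passed positionally in the order listed above; the function name `parse_args` and the argument names are hypothetical. Both thresholds are parsed as floats so that a minconf of `-1` can be tested directly.

```python
import argparse


def parse_args(argv=None):
    """Parse the positional parameters in the order the spec lists them."""
    parser = argparse.ArgumentParser(description="Apriori association rules")
    parser.add_argument("minsup", type=float, help="minimum support")
    parser.add_argument("minconf", type=float,
                        help="minimum confidence; -1 disables rule generation")
    parser.add_argument("input_file", help="transaction file, e.g. small.txt")
    parser.add_argument("output_name", help="prefix for every output file")
    # argv=None makes argparse read sys.argv[1:].
    return parser.parse_args(argv)
```

Run as `python my_rules.py 3 0.5 small.txt run1`, `parse_args()` reads the command line; the program would then skip rule generation when `args.minconf == -1`. Because the parser defines no option that looks like a negative number, argparse treats `-1` as a positional value rather than an unknown flag.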
Input file format
The input file, small.txt, consists of a set of lines, each line containing two numbers. The first number is the
transaction ID, and the second number is the item ID. The lines in the file are ordered in increasing transaction
ID order. Note that a transaction will be derived by combining the item IDs of all the lines that correspond to
the same transaction ID. The input file is provided.
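One way to build the transactions, sketched under the assumption that item IDs are integers; the helper name `read_transactions` is hypothetical. Because the lines are sorted by transaction ID, a transaction can be closed as soon as the ID changes, so no table of all IDs has to be kept.

```python
def read_transactions(lines):
    """Group "transaction_id item_id" lines into a list of frozensets of items.

    Relies on the spec's guarantee that lines are sorted by transaction ID:
    a transaction is closed as soon as the ID changes."""
    transactions = []
    current_tid, current_items = None, set()
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        tid, item = int(parts[0]), int(parts[1])
        if current_tid is not None and tid != current_tid:
            transactions.append(frozenset(current_items))
            current_items = set()
        current_tid = tid
        current_items.add(item)
    if current_tid is not None:
        transactions.append(frozenset(current_items))  # flush the last transaction
    return transactions
```

With `with open("small.txt") as f: transactions = read_transactions(f)`, the number of transactions is `len(transactions)` and the number of distinct items is `len(set().union(*transactions))`, two of the values the info file asks for.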
Output file format
You need to generate the following output files.
1) <output_name>_items.txt: This output file will contain as many lines as the number of frequent
itemsets, with their support count. The format of each line will be *exactly* as follows:
ITEMSETS|SUPPORT_COUNT
ITEMSETS will correspond to the items (space delimited). E.g., “item1 item2 item3|10”.
Notice that there are no spaces other than the ones that separate the items.
2) <output_name>_rules.txt: This output file will contain as many lines as the number of high-confidence
frequent rules that you found. The format of each line will be *exactly* as follows:
LHS|RHS|SUPPORT|CONFIDENCE
Both LHS and RHS will contain the items that make up the left- and right-hand sides of the rule in a
space-delimited fashion. E.g., “item1 item2|item3|0.2|0.3”. Notice that there are no spaces other
than the ones that separate the items. When minconf=-1, the file will not be generated.
3) <output_name>_info.txt: This file will have a line for each piece of information. More specifically, it
will need to include the following information:
- minsup:
- minconf:
- input file:
- output name:
- Number of items:
- Number of transactions:
- Number of frequent 1-itemsets:
- Number of frequent 2-itemsets:
- ...
- Number of frequent k-itemsets:
- Total number of frequent items:
- The length k of the largest k-itemset:
- The most frequent itemset:
- Number of high confidence rules:
- The rule with the highest confidence:
- Time in seconds to find the frequent itemsets:
- Time in seconds to find the confident rules:
4) <output_name>_plot_items.txt: A bar plot with the number of frequent k-itemsets for different
values of k.
5) <output_name>_plot_rules.txt: A bar plot with the number of high-confidence rules for different
values of k. When minconf=-1, the file will not be generated.
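The rule and output steps can be sketched as follows, assuming `frequent` is a dict mapping each frequent itemset (a frozenset) to its support count and `n_transactions` is the number of transactions. For a rule X → Y, support is count(X ∪ Y) / n_transactions and confidence is count(X ∪ Y) / count(X). All function names are hypothetical, items are printed in ascending order, SUPPORT and CONFIDENCE are rounded to four decimal places, and the plot files are assumed to hold plain-text bar charts because the spec gives them a .txt extension; none of these choices is fixed by the spec.

```python
from collections import Counter
from itertools import combinations


def join_items(items):
    """Space-delimited items in ascending order, e.g. "1 2 3"."""
    return " ".join(str(i) for i in sorted(items))


def items_line(itemset, count):
    """One line of <output_name>_items.txt: ITEMSETS|SUPPORT_COUNT."""
    return f"{join_items(itemset)}|{count}"


def generate_rules(frequent, n_transactions, min_conf):
    """Yield (lhs, rhs, support, confidence) for every rule LHS -> RHS whose
    items form a frequent itemset and whose confidence is at least min_conf.

    Every subset of a frequent itemset is itself frequent, so frequent[lhs]
    is always available."""
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):  # no rules from 1-itemsets
            for lhs_items in combinations(sorted(itemset), r):
                lhs = frozenset(lhs_items)
                confidence = count / frequent[lhs]
                if confidence >= min_conf:
                    yield lhs, itemset - lhs, count / n_transactions, confidence


def rules_line(lhs, rhs, support, confidence):
    """One line of <output_name>_rules.txt: LHS|RHS|SUPPORT|CONFIDENCE."""
    return (f"{join_items(lhs)}|{join_items(rhs)}|"
            f"{round(support, 4)}|{round(confidence, 4)}")


def counts_by_k(itemsets):
    """Map each size k to how many of the given itemsets have k items."""
    return Counter(len(s) for s in itemsets)


def text_bar_plot(counts, title, width=40):
    """Render {k: count} as a plain-text horizontal bar chart, one bar per k,
    scaled so the largest count spans `width` characters."""
    lines = [title]
    peak = max(counts.values(), default=0)
    for k in sorted(counts):
        bar = "#" * max(1, round(width * counts[k] / peak))
        lines.append(f"k={k:<3}|{bar} {counts[k]}")
    return lines
```

For example, `text_bar_plot(counts_by_k(frequent), "Frequent k-itemsets")` would give the body of <output_name>_plot_items.txt, and `counts_by_k(lhs | rhs for lhs, rhs, _, _ in rules)` counts rules by k, taking k to be the total number of items in a rule (also an assumption). When minconf=-1, the caller would skip `generate_rules` and both rule files. Wrapping each phase in `time.perf_counter()` calls gives the two timing lines of the info file.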