Assignment2_Fall23

pdf

School

Purdue University *

*We aren’t endorsed by this school

Course

595

Subject

Computer Science

Date

Dec 6, 2023

Type

pdf

Pages

2

Uploaded by CoachOryxPerson15

Report
Assignment 2: Association Analysis Due: October 20 2023 11:59PM Two datasets (Market, Gene) are provided. For each dataset, we provide the transaction- item representation discussed in class Each row denotes a transaction, and each transaction consists of a set of items. In this assignment, you are asked to implement Apriori algorithm that discovers a collection of frequent itemsets from a transaction database. Template is for Python 3. You are asked to fill in two functions: apriori_gen and get_freq . apriori_gen generates new candidate itemsets based on the frequent itemsets found in the previous iteration, and prunes the itemsets containing any infrequent itemset of size k -1. get_freq returns the candidate itemsets that meet a minimum support threshold. Do not change the input and output of the two functions in the template. Please take the following steps: 1. Implement Apriori algorithm. The pseudo code can be found below. Apriori ( T , minSupport ){ // T is the database and minSupport is the minimum support 𝐶 1 = {the set of unique single items in T } 𝐿 1 = {the set of single item whose support is not less than minSupport } for (k = 2; 𝐿 𝑘−1 ≠ ∅ ; k++ ){ //generate and prune candidate set 𝐶 𝑘 𝐶 𝑘 is a list of itemsets in which each itemset is formed by merging two itemsets in 𝐿 𝑘−1 if their first k-2 items are identical Remove an itemset from 𝐶 𝑘 if any ( k-1 )-subset of this candidate itemset is not in the frequent itemset list 𝐿 𝑘−1 //Count the support of each candidate itemset for each transaction t in database do { for each candidate 𝑐 in 𝐶 𝑘 // increment the support count of all candidate itemsets that are contained in transaction t if c is a subset of t then count[c] ← count[c] + 1 } for each candidate 𝑐 in 𝐶 𝑘 // Judge if a candidate itemset is frequent or not if the support of c is not less than minSupport then include c in 𝐿 𝑘 } return {𝐿 1 , 𝐿 2 , … , 𝐿 𝑘 } }
If you are not sure about whether it is OK to use a certain function, please post your question on Piazza. 2. Apply your implemented Apriori on the Market dataset with minimum support threshold=50%. You should get the following candidate itemsets ( C k ) and frequent itemsets ( L k ): Candidate itemsets: C 2 : {Eggs,Key-chain}, {Eggs,Mango}, {Eggs,Onion}, {Eggs,Yo-yo}, {Key-chain, Mango}, {Key-chain, Onion}, {Key-chain,Yo-yo}, {Mango,Onion}, {Mango,Yo-yo}, {Onion,Yo-yo} C 3 : {Eggs,Key-chain,Onion}, {Key-chain,Mango,Onion} Frequent itemsets: L 1 : {Eggs}, {Key-chain}, {Mango}, {Onion}, {Yo-yo} L 2 : {Eggs,Key-chain}, {Eggs,Onion}, {Key-chain,Mango}, {Key-chain,Onion}, {Key- chain,Yo-yo}, {Mango, Onion} L 3 : {Eggs, Key-chain,Onion} 3. If you get the same collection of itemsets at Step 2, you can proceed to apply your implemented Apriori algorithm on the Gene dataset with minimum support threshold=50%. You should be able to get 51 length-1 frequent itemsets ( L 1 ), 1275 length-2 candidate itemsets ( C 2 ), 29 length-2 frequent itemsets ( L 2 ), 20 length-3 candidate itemsets ( C 3 ) and 2 length-3 frequent itemsets ( L 3 ). 4. Prepare your submission. Your final submission should be a zip file named as Assignment2.zip. In the zip file, you should include: The Python code. Report: A word or pdf file named as Assignment2 (.doc, .docx or .pdf). The report should consist of the following parts: 1) The frequent itemsets you obtain on Gene dataset (L1, L2, L3). 2) The length-3 candidate itemsets generated during Apriori (C3) on Gene dataset. 3) The codes of the two functions: apriori_gen and get_freq. 5. Submit the zip file under Assignment 2 on Brightspace. Please refer to Course Syllabus for late submission policy and academic integrity policy. This assignment must be done independently. Running your submitted code should be able to reproduce the results in the report.
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help