School: New Jersey Institute of Technology
Course: 634
Subject: Statistics
Date: Feb 20, 2024
Uploaded by DeaconLightning13472

Homework 4
NAME: TARUN TARIKERE VENKATESHA
ID: tt383

HW 4-1 Association Rule

Based on the transactions below, considering minimum support = 50% and minimum confidence = 30%:

(a) Find frequent itemsets using Apriori.

Calculate the support for each item:

Item      Count
Apples    3
Oranges   5
Bananas   4
Lemons    4
Avocados  3
Melons    3
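Support is the item count divided by the number of transactions. A minimal sketch of that step, using the counts tabulated above; the total of 6 transactions is an assumption, inferred from a count of 3 corresponding to 50% support:

```python
from fractions import Fraction

# Assumption: 6 transactions in total (a count of 3 is reported as 50% support).
N_TRANSACTIONS = 6
MIN_SUPPORT = Fraction(1, 2)  # minimum support = 50%

# Item counts as tabulated in part (a).
item_counts = {
    "Apples": 3, "Oranges": 5, "Bananas": 4,
    "Lemons": 4, "Avocados": 3, "Melons": 3,
}

# support(item) = count(item) / number of transactions
supports = {item: Fraction(c, N_TRANSACTIONS) for item, c in item_counts.items()}

# An item is frequent when its support is at least the minimum support.
frequent_items = [item for item, s in supports.items() if s >= MIN_SUPPORT]

for item, s in supports.items():
    print(f"{item:<9}{float(s):6.1%}")
```

Exact fractions avoid rounding surprises when a support sits right on the threshold, as it does for the items with a count of 3.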

Item      Support
Apples    50%
Oranges   83.3%
Bananas   66.7%
Lemons    66.7%
Avocados  50%
Melons    50%

Minimum support threshold: 50%

Candidate 2-itemsets: {Oranges, Bananas}, {Oranges, Lemons}, {Bananas, Lemons}

Itemset             Count  Support
{Oranges, Bananas}  3      50%
{Oranges, Lemons}   3      50%
{Bananas, Lemons}   3      50%

Frequent 2-itemsets: {Oranges, Bananas}, {Oranges, Lemons}, {Bananas, Lemons}

(b) List all the rules (in the form X, Y -> Z, meaning if someone buys X and Y, then they will also buy Z). List the support and confidence matching the rules.

Frequent itemsets:

   support  itemsets
0  0.833    (Oranges)
1  0.667    (Bananas)
2  0.667    (Lemons)
3  0.5      (Oranges, Bananas)
4  0.5      (Oranges, Lemons)
5  0.5      (Bananas, Lemons)
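Each frequent 2-itemset {X, Y} yields two candidate rules, X -> Y and Y -> X, and the confidence of X -> Y is count(X, Y) / count(X). A minimal sketch of that calculation, assuming the item and pair counts tabulated in part (a) and a total of 6 transactions (inferred from a count of 3 being 50% support):

```python
from itertools import permutations

MIN_CONFIDENCE = 0.30
N_TRANSACTIONS = 6  # assumption: a count of 3 is reported as 50% support

# Counts as tabulated in part (a).
item_counts = {"Oranges": 5, "Bananas": 4, "Lemons": 4}
pair_counts = {
    ("Oranges", "Bananas"): 3,
    ("Oranges", "Lemons"): 3,
    ("Bananas", "Lemons"): 3,
}

# rules maps (antecedent, consequent) -> (support, confidence)
rules = {}
for pair, joint_count in pair_counts.items():
    # Both orderings of the pair give a candidate rule.
    for antecedent, consequent in permutations(pair, 2):
        confidence = joint_count / item_counts[antecedent]
        if confidence >= MIN_CONFIDENCE:
            rules[(antecedent, consequent)] = (joint_count / N_TRANSACTIONS, confidence)

for (x, y), (sup, conf) in rules.items():
    print(f"{x} -> {y}: support={sup:.2f}, confidence={conf:.2f}")
```

Dividing counts rather than rounded supports gives the same ratio, because the number of transactions cancels.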

For frequent itemset (Oranges, Bananas):
Support(Oranges, Bananas) = 0.5
Support(Oranges) = 5/6 = 0.833
Confidence(Oranges -> Bananas) = Support(Oranges, Bananas) / Support(Oranges) = 0.5 / 0.833 = 0.6
Support(Bananas) = 4/6 = 0.667
Confidence(Bananas -> Oranges) = Support(Oranges, Bananas) / Support(Bananas) = 0.5 / 0.667 = 0.75

For frequent itemset (Oranges, Lemons):
Support(Oranges, Lemons) = 0.5
Support(Oranges) = 5/6 = 0.833
Confidence(Oranges -> Lemons) = Support(Oranges, Lemons) / Support(Oranges) = 0.5 / 0.833 = 0.6
Support(Lemons) = 4/6 = 0.667
Confidence(Lemons -> Oranges) = Support(Oranges, Lemons) / Support(Lemons) = 0.5 / 0.667 = 0.75

For frequent itemset (Bananas, Lemons):
Support(Bananas, Lemons) = 0.5
Support(Bananas) = 4/6 = 0.667
Confidence(Bananas -> Lemons) = Support(Bananas, Lemons) / Support(Bananas) = 0.5 / 0.667 = 0.75
Support(Lemons) = 4/6 = 0.667
Confidence(Lemons -> Bananas) = Support(Bananas, Lemons) / Support(Lemons) = 0.5 / 0.667 = 0.75

Rules (all meet the 30% minimum confidence):
Oranges -> Bananas (Support: 0.5, Confidence: 0.6)
Bananas -> Oranges (Support: 0.5, Confidence: 0.75)
Oranges -> Lemons (Support: 0.5, Confidence: 0.6)
Lemons -> Oranges (Support: 0.5, Confidence: 0.75)
Bananas -> Lemons (Support: 0.5, Confidence: 0.75)
Lemons -> Bananas (Support: 0.5, Confidence: 0.75)

HW 4-2 Clustering

Download the X_clusters.csv file using the link https://drive.google.com/file/d/1w6N0m10zeBDaEDT_v_2yWYg3_okoyFUA/view?usp=sharing

(a) Find how many clusters the data contains. Justify your answer. What is the average silhouette value for your answer?
(b) Find the cluster centroids for each cluster using k-means clustering.

(a) Seven clusters give the highest average silhouette score, 0.97, which indicates the best separation between clusters. Increasing the number of clusters beyond seven lowers the silhouette score.

(b) Cluster centroids from k-means clustering:

Centroid for Cluster 1: [ 1.3586106   8.48729567 -8.5840372  -8.25855204]
Centroid for Cluster 2: [ 0.97182825  4.31776493  2.03768407  0.92724738]
Centroid for Cluster 3: [-7.64190625  2.79416346 -7.12545392  8.86662881]
Centroid for Cluster 4: [-9.61389255  6.61523648  5.554753    7.40448412]
Centroid for Cluster 5: [ 9.26783427 -2.3362629   5.79012882  0.58839881]
Centroid for Cluster 6: [-1.53471591  2.88692801 -1.24914731  7.84152567]
Centroid for Cluster 7: [ 9.51893731  5.97650833 -0.75679915  5.60573505]
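The selection procedure described in (a) and (b) can be sketched as follows: run k-means for several values of k, keep the k with the highest mean silhouette, and read off its centroids. This is a plain-Python illustration, not the code used for the results above. The toy 2-D points, the farthest-point initialisation, and all function names are assumptions made for the example; X_clusters.csv is not reproduced here. Libraries such as scikit-learn offer `KMeans` and `silhouette_score` for the same purpose.

```python
import math
from statistics import fmean

def centroid(points):
    # Coordinate-wise mean of a list of points.
    return tuple(fmean(axis) for axis in zip(*points))

def kmeans(points, k, max_iter=100):
    # Deterministic farthest-point initialisation (an illustrative choice).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = centroid(members)
    return labels, centers

def mean_silhouette(points, labels):
    scores = []
    for i, (p, lab_i) in enumerate(zip(points, labels)):
        dists = {}  # cluster label -> distances from p to that cluster's other points
        for j, (q, lab_j) in enumerate(zip(points, labels)):
            if j != i:
                dists.setdefault(lab_j, []).append(math.dist(p, q))
        if lab_i not in dists:  # singleton cluster: silhouette is taken as 0
            scores.append(0.0)
            continue
        a = fmean(dists[lab_i])                                         # mean intra-cluster distance
        b = min(fmean(d) for lab, d in dists.items() if lab != lab_i)   # nearest other cluster
        scores.append((b - a) / max(a, b))
    return fmean(scores)

# Hypothetical toy data: three well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (20, 0), (20, 1), (21, 0), (21, 1)]

scores = {k: mean_silhouette(points, kmeans(points, k)[0]) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
labels, centers = kmeans(points, best_k)

print("best k:", best_k)
for n, c in enumerate(centers, start=1):
    print(f"Centroid for Cluster {n}: {c}")
```

A mean silhouette close to 1 means points sit much closer to their own cluster than to the nearest other one, which is the justification given in (a) for choosing seven clusters.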

HW 4-3

(a) Mean values:

Unnamed: 0    49.500000
0             45.649995
dtype: float64

Standard deviations:

Unnamed: 0    29.011492
0             11.806190
dtype: float64

(b) Probability that a randomly selected value falls between 10 and 20:

[0.06794193 0.0136401 ]

(c) Z-scores of the data:

    Unnamed: 0         0
0    -1.706220  0.661288
1    -1.671751 -0.037898
2    -1.637282  0.827528
3    -1.602813  1.791382
4    -1.568344 -0.143483
..         ...       ...
95    1.568344 -1.497154
96    1.602813  0.440410
97    1.637282  0.401800
98    1.671751  0.119978
99    1.706220 -0.143961

[100 rows x 2 columns]

(d) Number of outliers: 100

Indices of outliers: 0 through 99 (all 100 rows)
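The steps in (b) through (d) can be sketched with the standard library alone. Several things here are assumptions rather than the original method: that (b) uses a normal model built from the reported mean and standard deviation, that the `Unnamed: 0` column is a saved row index 0 to 99 (its reported mean and standard deviation match those integers), and that an outlier is a value with |z| > 3. The source does not state its outlier rule, and under a |z| > 3 cutoff none of the z-scores shown in (c) would be flagged. The function names are illustrative.

```python
from statistics import NormalDist, fmean, stdev

# Reported summary statistics for column "0", taken from part (a).
MEAN, STD = 45.649995, 11.806190

# (b) P(10 < X < 20) under a normal model (assumption: the data is treated as normal).
model = NormalDist(mu=MEAN, sigma=STD)
p_between = model.cdf(20) - model.cdf(10)

def z_scores(values):
    # z = (x - mean) / sample standard deviation (ddof=1, the pandas default for std()).
    mu, sigma = fmean(values), stdev(values)
    return [(v - mu) / sigma for v in values]

def outlier_indices(values, cutoff=3.0):
    # Assumed rule: flag a value when its absolute z-score exceeds the cutoff.
    return [i for i, z in enumerate(z_scores(values)) if abs(z) > cutoff]

# Assumption: the "Unnamed: 0" column is the row index 0..99.
index_column = list(range(100))
z = z_scores(index_column)

print("P(10 < X < 20):", round(p_between, 4))
print("first z-score of index column:", round(z[0], 6))
print("outliers in index column:", outlier_indices(index_column))
```

To apply the same functions to the data column, pass its values to `z_scores` and `outlier_indices` after dropping the index column, for example by reading the CSV with the first column as the index.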