School: New Jersey Institute Of Technology
Course: 634
Subject: Statistics
Date: Feb 20, 2024
Pages: 5
Uploaded by DeaconLightning13472
Homework 4
NAME: TARUN TARIKERE VENKATESHA
ID: tt383
HW 4-1 Association Rule
Based on the transactions below, considering minimum support = 50% and minimum confidence = 30%,
(a) Find frequent itemsets using Apriori.
Calculate the support for each item (support = count / 6 transactions):

Item       Count   Support
Apples     3       50%
Oranges    5       83.3%
Bananas    4       66.7%
Lemons     4       66.7%
Avocados   3       50%
Melons     3       50%

Minimum support threshold: 50%
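The support calculation above can be sketched in a few lines of Python. The counts are copied from the table; the total of 6 transactions is inferred from the table (a count of 3 corresponds to 50%), and the `inclusive` flag is a hypothetical parameter added here because the write-up does not say whether an item at exactly 50% meets the threshold.

```python
# Item counts copied from the table above.
item_counts = {
    "Apples": 3, "Oranges": 5, "Bananas": 4,
    "Lemons": 4, "Avocados": 3, "Melons": 3,
}
n_transactions = 6   # inferred: a count of 3 is reported as 50%
min_support = 0.5

# Support of an item = number of transactions containing it / total transactions.
supports = {item: count / n_transactions for item, count in item_counts.items()}

def frequent_items(supports, min_support, inclusive=True):
    """Return the items meeting the threshold (>= if inclusive, > otherwise)."""
    if inclusive:
        return sorted(i for i, s in supports.items() if s >= min_support)
    return sorted(i for i, s in supports.items() if s > min_support)

for item, s in supports.items():
    print(f"{item}: {s:.1%}")
print("support >= 50%:", frequent_items(supports, min_support))
print("support >  50%:", frequent_items(supports, min_support, inclusive=False))
```

With the inclusive comparison all six items pass; with the strict comparison only Oranges, Bananas, and Lemons remain, which are the items carried forward in the next step.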
Candidate 2-itemsets: {Oranges, Bananas}, {Oranges, Lemons}, {Bananas, Lemons}

Itemset              Count   Support
{Oranges, Bananas}   3       50%
{Oranges, Lemons}    3       50%
{Bananas, Lemons}    3       50%

Frequent 2-itemsets: {Oranges, Bananas}, {Oranges, Lemons}, {Bananas, Lemons}
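The Apriori join and prune steps for this level can be sketched as follows. This is a minimal illustration, not the code used for the homework: the pair counts are copied from the table, the 6-transaction total is inferred from the percentages, and the variable names are made up for the example. It also shows that the only candidate 3-itemset is {Oranges, Bananas, Lemons}; its count does not appear in the write-up, so its support is left undetermined here.

```python
from itertools import combinations

n_transactions = 6          # inferred from the support percentages
min_support = 0.5
frequent_1 = ["Oranges", "Bananas", "Lemons"]   # items carried forward above

# Join step: every pair of frequent single items is a candidate 2-itemset.
candidates_2 = [frozenset(pair) for pair in combinations(frequent_1, 2)]

# Pair counts copied from the table above.
pair_counts = {
    frozenset({"Oranges", "Bananas"}): 3,
    frozenset({"Oranges", "Lemons"}): 3,
    frozenset({"Bananas", "Lemons"}): 3,
}

# Keep the candidates whose support meets the threshold.
frequent_2 = [c for c in candidates_2
              if pair_counts[c] / n_transactions >= min_support]

# Join frequent pairs into 3-itemsets, then prune any candidate that has
# an infrequent 2-item subset (the Apriori property).
joined = {a | b for a, b in combinations(frequent_2, 2) if len(a | b) == 3}
candidates_3 = [c for c in joined
                if all(frozenset(s) in frequent_2 for s in combinations(c, 2))]

print("frequent 2-itemsets:", [sorted(c) for c in frequent_2])
print("candidate 3-itemsets:", [sorted(c) for c in candidates_3])
```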
(b) List all the rules (in the form X, Y -> Z, meaning if someone buys X and Y, then they will
also buy Z). List the support and confidence matching the rules.
Frequent Itemsets:

support   itemsets
0.833     (Oranges)
0.667     (Bananas)
0.667     (Lemons)
0.5       (Oranges, Bananas)
0.5       (Oranges, Lemons)
0.5       (Bananas, Lemons)

For frequent itemset (Oranges, Bananas):
Support(Oranges, Bananas) = 3/6 = 0.5
Support(Oranges) = 5/6 = 0.833
Confidence(Oranges -> Bananas) = Support(Oranges, Bananas) / Support(Oranges) = 0.5 / 0.833 = 0.6
Support(Bananas) = 4/6 = 0.667
Confidence(Bananas -> Oranges) = Support(Oranges, Bananas) / Support(Bananas) = 0.5 / 0.667 = 0.75

For frequent itemset (Oranges, Lemons):
Support(Oranges, Lemons) = 3/6 = 0.5
Support(Oranges) = 5/6 = 0.833
Confidence(Oranges -> Lemons) = Support(Oranges, Lemons) / Support(Oranges) = 0.5 / 0.833 = 0.6
Support(Lemons) = 4/6 = 0.667
Confidence(Lemons -> Oranges) = Support(Oranges, Lemons) / Support(Lemons) = 0.5 / 0.667 = 0.75

For frequent itemset (Bananas, Lemons):
Support(Bananas, Lemons) = 3/6 = 0.5
Support(Bananas) = 4/6 = 0.667
Confidence(Bananas -> Lemons) = Support(Bananas, Lemons) / Support(Bananas) = 0.5 / 0.667 = 0.75
Support(Lemons) = 4/6 = 0.667
Confidence(Lemons -> Bananas) = Support(Bananas, Lemons) / Support(Lemons) = 0.5 / 0.667 = 0.75

Rules (all meet the 30% minimum confidence):
Oranges -> Bananas (Support: 0.5, Confidence: 0.6)
Bananas -> Oranges (Support: 0.5, Confidence: 0.75)
Oranges -> Lemons (Support: 0.5, Confidence: 0.6)
Lemons -> Oranges (Support: 0.5, Confidence: 0.75)
Bananas -> Lemons (Support: 0.5, Confidence: 0.75)
Lemons -> Bananas (Support: 0.5, Confidence: 0.75)
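As a cross-check, the rule confidences can be recomputed directly from the item and pair counts listed in part (a), using Confidence(X -> Y) = Count(X, Y) / Count(X). This is a minimal sketch: the 6-transaction total is inferred from the support percentages, and the variable names are illustrative only.

```python
# Counts copied from part (a).
item_counts = {"Oranges": 5, "Bananas": 4, "Lemons": 4}
pair_counts = {
    ("Oranges", "Bananas"): 3,
    ("Oranges", "Lemons"): 3,
    ("Bananas", "Lemons"): 3,
}
n_transactions = 6      # inferred from the support percentages
min_confidence = 0.3

rules = []
for (x, y), count_xy in pair_counts.items():
    # Each frequent pair yields two candidate rules: x -> y and y -> x.
    for antecedent, consequent in ((x, y), (y, x)):
        support = count_xy / n_transactions
        confidence = count_xy / item_counts[antecedent]
        if confidence >= min_confidence:
            rules.append((antecedent, consequent, support, confidence))

for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent} "
          f"(Support: {support:.2f}, Confidence: {confidence:.2f})")
```

Dividing counts gives the same result as dividing supports, because the number of transactions cancels out.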
HW 4-2 Clustering
Download the X_clusters.csv file using the link
https://drive.google.com/file/d/1w6N0m10zeBDaEDT_v_2yWYg3_okoyFUA/view?usp=sharing
(a) Find how many clusters the data contains. Justify your answer. What is the average silhouette
value for your answer?
(b) Find the cluster centroids for each cluster using k-means clustering.
a)
The data contains 7 clusters: k = 7 gives the highest average silhouette score, 0.97, which indicates the best-separated clustering among the values tried. Increasing k further lowers the silhouette score.
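For reference, the average silhouette value compares each point's mean distance to its own cluster (a) with its mean distance to the nearest other cluster (b), scoring (b - a) / max(a, b). The sketch below is a plain-Python illustration on made-up toy points, not the code used on X_clusters.csv; in practice a library routine such as scikit-learn's `silhouette_score` would likely be used instead.

```python
import math

def mean_silhouette(points, labels):
    """Average silhouette value (Euclidean distance); needs at least 2 clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)   # convention: a singleton cluster scores 0
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's distance to itself is 0, so it adds nothing to the sum)
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy data (hypothetical): two tight, well-separated groups.
toy_points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = mean_silhouette(toy_points, [0, 0, 1, 1])   # groups match the labels
bad = mean_silhouette(toy_points, [0, 1, 0, 1])    # labels cut across groups
print(f"good labelling: {good:.3f}, bad labelling: {bad:.3f}")
```

Choosing k then amounts to clustering for several values of k and keeping the one with the highest average score.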
b) Find the cluster centroids for each cluster using k-means clustering.
Centroid for Cluster 1: [ 1.3586106   8.48729567 -8.5840372  -8.25855204]
Centroid for Cluster 2: [ 0.97182825  4.31776493  2.03768407  0.92724738]
Centroid for Cluster 3: [-7.64190625  2.79416346 -7.12545392  8.86662881]
Centroid for Cluster 4: [-9.61389255  6.61523648  5.554753    7.40448412]
Centroid for Cluster 5: [ 9.26783427 -2.3362629   5.79012882  0.58839881]
Centroid for Cluster 6: [-1.53471591  2.88692801 -1.24914731  7.84152567]
Centroid for Cluster 7: [ 9.51893731  5.97650833 -0.75679915  5.60573505]
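Each centroid is the coordinate-wise mean of the points assigned to that cluster. A minimal sketch of the k-means loop (Lloyd's algorithm) is shown below on hypothetical 2-D toy points with hand-picked starting centroids. It is an illustration of the technique only, and the function name and data are assumptions rather than the code that produced the values above.

```python
import math

def kmeans(points, initial_centroids, n_iter=100):
    """Lloyd's algorithm: assign points to the nearest centroid, then re-average."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(n_iter):
        # Assignment step: group each point with its nearest centroid.
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            groups[nearest].append(p)
        # Update step: each centroid becomes the mean of its group
        # (an empty group keeps its previous centroid).
        updated = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
        if updated == centroids:   # converged: no centroid moved
            break
        centroids = updated
    return centroids

toy_points = [(0, 0), (0, 2), (10, 0), (10, 2)]
toy_centroids = kmeans(toy_points, initial_centroids=[(0, 0), (10, 0)])
print(toy_centroids)
```

The same function works for 4-dimensional points like those in this problem, since `math.dist` and the coordinate-wise mean do not depend on the dimension.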
HW 4-3
a)
Mean values:
Unnamed: 0    49.500000
0             45.649995
dtype: float64
Standard Deviations:
Unnamed: 0    29.011492
0             11.806190
dtype: float64
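The figures for the "Unnamed: 0" column equal the mean and sample standard deviation of the integers 0 through 99, which suggests (as an assumption, since the data file is not reproduced here) that this column is a saved row index rather than a measured variable. The check below uses only the standard library; note that pandas' `std()` defaults to the sample (n - 1) denominator, which corresponds to `statistics.stdev`.

```python
import statistics

# Hypothetical stand-in for the "Unnamed: 0" column: the row numbers 0..99.
index_like = list(range(100))

mean_value = statistics.mean(index_like)
sample_sd = statistics.stdev(index_like)       # divides by n - 1
population_sd = statistics.pstdev(index_like)  # divides by n

print(f"mean          = {mean_value:.6f}")
print(f"sample sd     = {sample_sd:.6f}")
print(f"population sd = {population_sd:.6f}")
```

The sample value agrees with the 29.011492 reported above, while the population value is slightly smaller.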
(b) Probability that a randomly selected value falls between 10 and 20: [0.06794193 0.0136401 ]
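The two probabilities are consistent with a normal model, P(10 < X < 20) = F(20) - F(10), where F is the normal CDF built from each column's mean and standard deviation in part (a). The sketch below assumes that model (the write-up does not state it) and uses the standard library's `NormalDist`; the helper name is illustrative.

```python
from statistics import NormalDist

def prob_between(mu, sigma, low, high):
    """P(low < X < high) for X ~ Normal(mu, sigma)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(high) - dist.cdf(low)

# Means and standard deviations copied from part (a).
p_first_column = prob_between(49.5, 29.011492, 10, 20)
p_second_column = prob_between(45.649995, 11.806190, 10, 20)

print(f"first column : {p_first_column:.6f}")
print(f"second column: {p_second_column:.6f}")
```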
(c) Z-scores of the data:
Unnamed: 0 0
0 -1.706220 0.661288
1 -1.671751 -0.037898
2 -1.637282 0.827528
3 -1.602813 1.791382
4 -1.568344 -0.143483
.. ... ...
95 1.568344 -1.497154
96 1.602813 0.440410
97 1.637282 0.401800
98 1.671751 0.119978
99 1.706220 -0.143961
[100 rows x 2 columns]
(d) Number of outliers: 100
Indices of outliers: 0 through 99 (all 100 rows)
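A z-score outlier rule normally flags only the rows whose |z| exceeds a cutoff, so a result that lists every row may point to an inverted comparison or a cutoff that was not applied. The sketch below assumes the common |z| > 3 cutoff (the write-up does not state which one it used) and uses hypothetical data; under that rule none of the z-scores displayed in part (c) would be flagged.

```python
import statistics

def z_scores(values):
    """Standardise values with the sample mean and sample standard deviation."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]

def outlier_indices(values, threshold=3.0):
    """Indices whose absolute z-score exceeds the threshold."""
    return [i for i, z in enumerate(z_scores(values)) if abs(z) > threshold]

# Index-like column 0..99: its largest |z| is about 1.706, so nothing is flagged.
index_like = list(range(100))
print(outlier_indices(index_like))

# Hypothetical data with one extreme value at position 20.
with_spike = [10] * 20 + [100]
print(outlier_indices(with_spike))
```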