School: New Jersey Institute of Technology
Course: 634
Subject: Statistics
Date: Feb 20, 2024
Pages: 5
Uploaded by: DeaconLightning13472
Homework 4
NAME: TARUN TARIKERE VENKATESHA
ID: tt383
HW 4-1 Association Rule
Based on the transactions below, considering minimum support = 50% and minimum confidence = 30%:
(a) Find frequent itemsets using Apriori.
Calculate the support for each item:
Item        Count    Support
Apples      3        50%
Oranges     5        83.3%
Bananas     4        66.7%
Lemons      4        66.7%
Avocados    3        50%
Melons      3        50%
Minimum support threshold: 50%
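The support column can be reproduced from the counts. The sketch below assumes 6 transactions in total (inferred from a count of 3 corresponding to 50%; the transaction list itself is not reproduced here) and shows how the choice between an inclusive and a strict threshold changes which items pass.

```python
# Item counts copied from the table above.
counts = {
    "Apples": 3, "Oranges": 5, "Bananas": 4,
    "Lemons": 4, "Avocados": 3, "Melons": 3,
}

# Assumption: 6 transactions, inferred from a count of 3 mapping to 50%.
N_TRANSACTIONS = 6
MIN_SUPPORT = 0.5

# support(item) = number of transactions containing the item / total transactions
support = {item: count / N_TRANSACTIONS for item, count in counts.items()}

# Two readings of the threshold: inclusive (>=) and strict (>).
at_or_above = [item for item, s in support.items() if s >= MIN_SUPPORT]
strictly_above = [item for item, s in support.items() if s > MIN_SUPPORT]

for item, s in support.items():
    print(f"{item:<9} {s:.1%}")
print("support >= 50%:", at_or_above)
print("support >  50%:", strictly_above)
```

Under the inclusive reading all six items pass, while only Oranges, Bananas, and Lemons are strictly above 50%; the pair step that follows is built from those three.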
Candidate 2-itemsets: {Oranges, Bananas}, {Oranges, Lemons}, {Bananas, Lemons}
Itemset               Count    Support
{Oranges, Bananas}    3        50%
{Oranges, Lemons}     3        50%
{Bananas, Lemons}     3        50%
Frequent 2-itemsets: {Oranges, Bananas}, {Oranges, Lemons}, {Bananas, Lemons}
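The pair step can be sketched as follows: candidates are all 2-item combinations of the items carried forward, and each candidate is kept if its support meets the threshold. The pair counts are copied from the table above; the total of 6 transactions is an assumption inferred from the percentages.

```python
from itertools import combinations

N_TRANSACTIONS = 6   # assumption: inferred from a count of 3 mapping to 50%
MIN_SUPPORT = 0.5

# Items carried into the pair step in the text above.
items = ["Oranges", "Bananas", "Lemons"]

# Pair counts copied from the table above.
pair_counts = {
    frozenset({"Oranges", "Bananas"}): 3,
    frozenset({"Oranges", "Lemons"}): 3,
    frozenset({"Bananas", "Lemons"}): 3,
}

# Join step: every 2-item combination of the carried-forward items.
candidates = [frozenset(pair) for pair in combinations(items, 2)]

# Prune step: keep candidates whose support meets the minimum.
frequent_pairs = [
    c for c in candidates
    if pair_counts[c] / N_TRANSACTIONS >= MIN_SUPPORT
]

for pair in frequent_pairs:
    print(sorted(pair), pair_counts[pair] / N_TRANSACTIONS)
```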
(b) List all the rules (in the form X, Y -> Z, meaning if someone buys X and Y, then they will
also buy Z). List the support and confidence matching the rules.
Frequent itemsets (supports taken from part (a)):
Itemset               Support
{Oranges}             0.833
{Bananas}             0.667
{Lemons}              0.667
{Oranges, Bananas}    0.5
{Oranges, Lemons}     0.5
{Bananas, Lemons}     0.5
For frequent itemset {Oranges, Bananas}:
Support(Oranges, Bananas) = 0.5
Support(Oranges) = 0.833
Confidence(Oranges -> Bananas) = Support(Oranges, Bananas) / Support(Oranges) = 0.5 / 0.833 = 0.6
Support(Bananas) = 0.667
Confidence(Bananas -> Oranges) = Support(Oranges, Bananas) / Support(Bananas) = 0.5 / 0.667 = 0.75
For frequent itemset {Oranges, Lemons}:
Support(Oranges, Lemons) = 0.5
Support(Oranges) = 0.833
Confidence(Oranges -> Lemons) = Support(Oranges, Lemons) / Support(Oranges) = 0.5 / 0.833 = 0.6
Support(Lemons) = 0.667
Confidence(Lemons -> Oranges) = Support(Oranges, Lemons) / Support(Lemons) = 0.5 / 0.667 = 0.75
For frequent itemset {Bananas, Lemons}:
Support(Bananas, Lemons) = 0.5
Support(Bananas) = 0.667
Confidence(Bananas -> Lemons) = Support(Bananas, Lemons) / Support(Bananas) = 0.5 / 0.667 = 0.75
Support(Lemons) = 0.667
Confidence(Lemons -> Bananas) = Support(Bananas, Lemons) / Support(Lemons) = 0.5 / 0.667 = 0.75
All six rules meet the minimum confidence of 30%:
Oranges -> Bananas (Support: 0.5, Confidence: 0.6)
Bananas -> Oranges (Support: 0.5, Confidence: 0.75)
Oranges -> Lemons (Support: 0.5, Confidence: 0.6)
Lemons -> Oranges (Support: 0.5, Confidence: 0.75)
Bananas -> Lemons (Support: 0.5, Confidence: 0.75)
Lemons -> Bananas (Support: 0.5, Confidence: 0.75)
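Confidence can also be computed directly from counts, since Support(X, Y) / Support(X) reduces to count(X, Y) / count(X). The following is a minimal sketch that assumes the item and pair counts tabulated in part (a); the helper name `confidence` is illustrative only.

```python
# Counts copied from the tables in part (a).
item_counts = {"Oranges": 5, "Bananas": 4, "Lemons": 4}
pair_counts = {
    ("Oranges", "Bananas"): 3,
    ("Oranges", "Lemons"): 3,
    ("Bananas", "Lemons"): 3,
}

MIN_CONFIDENCE = 0.3


def confidence(pair_count, antecedent_count):
    """Confidence(X -> Y) = count(X and Y together) / count(X)."""
    return pair_count / antecedent_count


# Each frequent pair yields two candidate rules, one per direction.
rules = {}
for (x, y), n_xy in pair_counts.items():
    rules[(x, y)] = confidence(n_xy, item_counts[x])
    rules[(y, x)] = confidence(n_xy, item_counts[y])

# Keep only the rules that meet the minimum confidence.
kept = {rule: c for rule, c in rules.items() if c >= MIN_CONFIDENCE}

for (x, y), c in kept.items():
    print(f"{x} -> {y}: confidence {c:.2f}")
```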
HW 4-2 Clustering
Download the X_clusters.csv file using the link
https://drive.google.com/file/d/1w6N0m10zeBDaEDT_v_2yWYg3_okoyFUA/view?usp=sharing
(a) Find how many clusters the data contains. Justify your answer. What is the average silhouette
value for your answer?
(b) Find the cluster centroids for each cluster using k-means clustering.
(a)
The data contains 7 clusters. k = 7 gives the highest average silhouette score, 0.97, which indicates the best separation between clusters. Increasing k beyond 7 lowers the silhouette score.
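For reference, the average silhouette value is the mean over all points of s = (b - a) / max(a, b), where a is the mean distance to the other points in the same cluster and b is the smallest mean distance to any other cluster. The sketch below is a standard-library illustration on a made-up 2-D toy set, not the code or data behind the 0.97 figure; libraries such as scikit-learn offer `silhouette_score` for the same purpose.

```python
from math import dist
from statistics import mean


def mean_silhouette(points, labels):
    """Average of s = (b - a) / max(a, b) over all points."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            mean(dist(p, q) for q in members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b))
    return mean(scores)


# Hypothetical toy data: two well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
two_clusters = [0, 0, 0, 1, 1, 1]
three_clusters = [0, 0, 1, 2, 2, 2]  # splits one natural group

print(mean_silhouette(points, two_clusters))
print(mean_silhouette(points, three_clusters))
```

On this toy set the natural two-cluster labelling scores higher than the over-split one, which is the same comparison used to pick k.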
(b) Cluster centroids for each cluster using k-means clustering:
Centroid for Cluster 1: [ 1.3586106   8.48729567 -8.5840372  -8.25855204]
Centroid for Cluster 2: [ 0.97182825  4.31776493  2.03768407  0.92724738]
Centroid for Cluster 3: [-7.64190625  2.79416346 -7.12545392  8.86662881]
Centroid for Cluster 4: [-9.61389255  6.61523648  5.554753    7.40448412]
Centroid for Cluster 5: [ 9.26783427 -2.3362629   5.79012882  0.58839881]
Centroid for Cluster 6: [-1.53471591  2.88692801 -1.24914731  7.84152567]
Centroid for Cluster 7: [ 9.51893731  5.97650833 -0.75679915  5.60573505]
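Each centroid is the coordinate-wise mean of the points assigned to that cluster. The sketch below shows the assign-then-average loop (Lloyd's algorithm) in plain Python on hypothetical 2-D points; the initial centroids and the data are assumptions for illustration, and the centroids listed above would have come from running k-means on X_clusters.csv (for example, scikit-learn's `KMeans` exposes them as `cluster_centers_`).

```python
from math import dist


def kmeans(points, init_centroids, n_iter=20):
    """Plain Lloyd's algorithm: assign points, then average each group."""
    centroids = [tuple(c) for c in init_centroids]
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: dist(p, centroids[i]),
            )
            groups[nearest].append(p)
        # Update step: each centroid becomes the mean of its members.
        # An empty group keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids


# Hypothetical toy data and starting centroids.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids = kmeans(points, init_centroids=[(0, 0), (10, 10)])

for i, c in enumerate(centroids, start=1):
    print(f"Centroid for Cluster {i}: {c}")
```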
HW 4-3
(a)
Mean values:
Unnamed: 0    49.500000
0             45.649995
dtype: float64
Standard deviations:
Unnamed: 0    29.011492
0             11.806190
dtype: float64
(b) Probability that a randomly selected value falls between 10 and 20: [0.06794193 0.0136401 ]
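The two probabilities are consistent with treating each column as normally distributed with the mean and standard deviation from part (a), so that P(10 < X < 20) = Φ((20 - μ)/σ) - Φ((10 - μ)/σ). The normality assumption is inferred from the numbers, not stated in the source. A minimal sketch using the standard library:

```python
from statistics import NormalDist

# (mean, standard deviation) per column, copied from part (a).
columns = {
    "Unnamed: 0": (49.5, 29.011492),
    "0": (45.649995, 11.806190),
}


def prob_between(mu, sigma, low, high):
    """P(low < X < high) for X ~ Normal(mu, sigma)."""
    d = NormalDist(mu, sigma)
    return d.cdf(high) - d.cdf(low)


probs = {
    name: prob_between(mu, sigma, 10, 20)
    for name, (mu, sigma) in columns.items()
}

for name, p in probs.items():
    print(f"{name}: {p:.6f}")
```

The results come out at roughly 0.068 and 0.014, in line with the values reported in (b).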
(c) Z-scores of the data:
Unnamed: 0 0
0 -1.706220 0.661288
1 -1.671751 -0.037898
2 -1.637282 0.827528
3 -1.602813 1.791382
4 -1.568344 -0.143483
.. ... ...
95 1.568344 -1.497154
96 1.602813 0.440410
97 1.637282 0.401800
98 1.671751 0.119978
99 1.706220 -0.143961
[100 rows x 2 columns]
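Each z-score is z = (x - mean) / standard deviation, with the sample standard deviation (n - 1 in the denominator). The sketch below assumes the "Unnamed: 0" column holds the integers 0 to 99, which is an inference from its mean of 49.5, its standard deviation of 29.011492, and the symmetric z-scores; the underlying file is not reproduced here.

```python
from statistics import mean, stdev

# Assumption: the "Unnamed: 0" column is the row index 0..99.
values = list(range(100))

mu = mean(values)
sigma = stdev(values)  # sample standard deviation (divides by n - 1)

# z-score of every value in the column
z_scores = [(x - mu) / sigma for x in values]

print(f"mean = {mu}, std = {sigma:.6f}")
print(f"first z = {z_scores[0]:.6f}, last z = {z_scores[-1]:.6f}")
```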
(d) Number of outliers: 100
Indices of outliers: 0 through 99 (every row)
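One common convention flags a row as an outlier when any of its z-scores exceeds 3 in absolute value. The cutoff is an assumption here, since the source does not state its criterion, and the sketch is applied only to the ten rows of z-scores displayed in part (c).

```python
Z_THRESHOLD = 3.0  # assumption: the source does not state its cutoff

# The ten displayed rows from part (c):
# (row index, z-score of "Unnamed: 0", z-score of "0")
shown_rows = [
    (0, -1.706220, 0.661288),
    (1, -1.671751, -0.037898),
    (2, -1.637282, 0.827528),
    (3, -1.602813, 1.791382),
    (4, -1.568344, -0.143483),
    (95, 1.568344, -1.497154),
    (96, 1.602813, 0.440410),
    (97, 1.637282, 0.401800),
    (98, 1.671751, 0.119978),
    (99, 1.706220, -0.143961),
]

# A row is flagged if any of its z-scores is beyond the threshold.
outliers = [
    idx for idx, *zs in shown_rows
    if any(abs(z) > Z_THRESHOLD for z in zs)
]

print("Number of outliers among displayed rows:", len(outliers))
print("Indices:", outliers)
```

Under that cutoff none of the displayed rows would be flagged, so a count of 100 likely reflects a different criterion or an inverted comparison and is worth rechecking against the full data.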