Weekly Quiz - K-means Clustering

School: University of Texas*
*We aren't endorsed by this school
Course: DSBA
Subject: Computer Science
Date: Feb 20, 2024
Type: pdf
Pages: 5
Uploaded by BrigadierRainCat57

Q No: 1 (Correct Answer) Marks: 2/2
What does the predict() function of the sklearn KMeans class return?

The index of the closest cluster to which a data point belongs (You Selected)

The predict() function is used to predict the closest cluster to which a data point belongs.

Q No: 2 (Correct Answer) Marks: 1/1
For K-means clustering, what will be the cluster centroids for the following 2 clusters?
C1: {(3,5), (5,4), (4,6)}
C2: {(6,0), (8,1), (7,2)}

C1 = ((3+5+4)/3, (5+4+6)/3) = (4, 5); C2 = ((6+8+7)/3, (0+1+2)/3) = (7, 1) (You Selected)
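The behavior asked about in Q1 and the centroid arithmetic in Q2 can be sketched together. A minimal example, assuming scikit-learn is available; the six points are taken from Q2 (using y=5 for the first point so the centroid matches the stated answer (4, 5)):

```python
import numpy as np
from sklearn.cluster import KMeans

# The two clusters from Q2, stacked into one training set
X = np.array([[3, 5], [5, 4], [4, 6],   # cluster C1
              [6, 0], [8, 1], [7, 2]])  # cluster C2

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# predict() returns, for each query point, the index of the closest cluster
labels = km.predict([[4, 5], [7, 1]])

# Each centroid is the per-coordinate mean of that cluster's points:
# (4, 5) and (7, 1), matching the hand calculation in Q2
print(km.cluster_centers_)
print(labels)
```

Note that the cluster indices themselves (0 vs. 1) are not guaranteed to follow the order the points were listed in; only the set of centroids is determined.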
Q No: 3 (Correct Answer) Marks: 2/2
When doing K-means clustering, what will be the Euclidean distance of a point A(4,0) from the centroid of the cluster which has the two data points (3,3) and (5,5)?

4 (You Selected)

Cluster centroid C1 = ((3+5)/2, (3+5)/2) = (4, 4)
Distance between the point A(4,0) and centroid C1(4,4):
Euclidean distance = sqrt((4-4)^2 + (4-0)^2) = sqrt(16) = 4

Q No: 4 (Correct Answer) Marks: 2/2
Which of the following are considered to be weaknesses of K-means clustering?
1. Finding out the ideal value of K is complex and time-consuming
2. Susceptible to the curse of dimensionality
3. Not sensitive to the starting positions of the initial centroids
4. Not sensitive to outliers

1 and 2 (You Selected)

- Finding an ideal value of K requires multiple iterations with different values of K to see which value has the lowest within-cluster sum of squared errors.
- K-means clustering is affected by the curse of dimensionality: as the number of dimensions increases, the computational complexity of K-means clustering increases.
- K-means clustering IS sensitive to the starting positions, as these determine the final positions of the cluster centroids, so statement 3 is false.
- K-means clustering IS sensitive to outliers: outliers significantly affect the position of the centroid, so statement 4 is false.
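The arithmetic in Q3 can be checked with a few lines of plain Python; nothing here is sklearn-specific:

```python
import math

# Q3: centroid of the cluster {(3,3), (5,5)}, then its Euclidean
# distance to the point A(4,0)
pts = [(3, 3), (5, 5)]
centroid = (sum(p[0] for p in pts) / len(pts),
            sum(p[1] for p in pts) / len(pts))   # (4.0, 4.0)

A = (4, 0)
dist = math.sqrt((A[0] - centroid[0]) ** 2 + (A[1] - centroid[1]) ** 2)
print(dist)  # 4.0
```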
Q No: 5 (Correct Answer) Marks: 2/2
While using K-means clustering, we scale the variables before we do clustering. This is done primarily to:

Convert the data to the same scale, so that variables in different units are given equal importance (You Selected)

Scaling the data brings all the attributes to a similar scale, which gives equal importance to all the attributes while performing clustering.

Q No: 6 (Correct Answer) Marks: 1/1
Consider the following elbow plot:

[Elbow plot: within-cluster sum of squared errors vs. number of clusters, with a sharp bend at K = 3]

While performing K-means clustering, what is the ideal value of K to choose based on the above plot?

3 (You Selected)

As the slope decreases drastically from 2 to 3 and the graph takes a sharp turn at cluster point 3, 3 is considered to be the ideal number of clusters.
Q No: 7 (Correct Answer) Marks: 1/1
Which of the following is NOT true in the case of K-means clustering?

The data points that are the farthest from a centroid will create a cluster (You Selected)

K-means clusters data by separating data points into groups based on distance. The data points that are the closest to a centroid will create a cluster. If we're using the Euclidean distance between data points and every centroid, and a straight line is drawn between two centroids, then the perpendicular bisector (boundary line) of that line divides the data between the two clusters.

Q No: 8 (Correct Answer) Marks: 1/1
What is the default value of n_clusters in sklearn.cluster.KMeans, the K-means clustering class in Scikit-learn?

8 (You Selected)

In sklearn.cluster.KMeans, n_clusters is an optional parameter that takes an integer value specifying the number of clusters to form as well as the number of centroids to generate. The default value is 8.
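The default in Q8 is easy to confirm directly, assuming scikit-learn is installed:

```python
from sklearn.cluster import KMeans

# Constructing KMeans with no arguments uses the default n_clusters
km = KMeans()
print(km.n_clusters)  # 8
```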
Q No: 9 (Correct Answer) Marks: 2/2
Which of the following are Unsupervised Learning techniques?
1. Hierarchical Clustering
2. Random Forests
3. K-means Clustering
4. Logistic Regression

1 and 2
1 and 3 (You Selected)

Both forms of clustering (K-means and Hierarchical) are considered Unsupervised Learning, as we don't separate the data into dependent and independent variables before clustering. For Random Forests and Logistic Regression, we do separate the data into dependent and independent variables before applying the algorithms, so they are Supervised Learning techniques.

Q No: 10 (Correct Answer) Marks: 1/1
In K-means clustering, suppose the number of clusters is equal to the number of data points (observations). Then what will be the sum of squared errors within each group (or cluster)?

Approaches infinity (very large number)
0 (You Selected)

With an increase in the number of clusters, the within-group sum of squared errors ideally decreases. When the number of clusters equals the number of observations, the within-group sum of squared errors becomes zero: with only one data point in a cluster, the data point itself becomes the centroid, and the distance from the point to its centroid is always zero.
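The limiting case in Q10 can be demonstrated in a few lines, assuming scikit-learn; the four data points are made up for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

# Four distinct observations, and one cluster per observation
X = np.array([[1.0, 2.0], [4.0, 0.0], [7.0, 5.0], [9.0, 9.0]])

# With n_clusters == n_samples, each point is its own centroid, so the
# within-cluster sum of squared errors (inertia_) collapses to zero.
km = KMeans(n_clusters=len(X), n_init=10, random_state=0).fit(X)
print(km.inertia_)  # 0.0
```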