Weekly Quiz - K-means Clustering

School: University of Texas*
*We aren't endorsed by this school
Course: DSBA
Subject: Computer Science
Date: Feb 20, 2024
Type: pdf
Pages: 5
Uploaded by BrigadierRainCat57

Q No: 1 (Correct Answer) Marks: 2/2
What does the predict() function of the sklearn KMeans class return?

The index of the closest cluster to which a data point belongs (You Selected)

The predict() function is used to predict the closest cluster to which a data point belongs.

Q No: 2 (Correct Answer) Marks: 1/1
For K-means clustering, what will be the cluster centroids for the following 2 clusters?
C1: {(3,5), (5,4), (4,6)}
C2: {(6,0), (8,1), (7,2)}

C1 = ((3+5+4)/3, (5+4+6)/3) = (4, 5); C2 = ((6+8+7)/3, (0+1+2)/3) = (7, 1) (You Selected)
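The behavior asked about in Q1 and the centroid arithmetic in Q2 can be sketched together. A minimal example, assuming scikit-learn is available; the six points are taken from Q2 (using y=5 for the first point so the centroid matches the stated answer (4, 5)):

```python
import numpy as np
from sklearn.cluster import KMeans

# The two clusters from Q2, stacked into one training set
X = np.array([[3, 5], [5, 4], [4, 6],   # cluster C1
              [6, 0], [8, 1], [7, 2]])  # cluster C2

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# predict() returns, for each query point, the index of the closest cluster
labels = km.predict([[4, 5], [7, 1]])

# Each centroid is the per-coordinate mean of that cluster's points:
# (4, 5) and (7, 1), matching the hand calculation in Q2
print(km.cluster_centers_)
print(labels)
```

Note that the cluster indices themselves (0 vs. 1) are not guaranteed to follow the order the points were listed in; only the set of centroids is determined.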
Q No: 3 (Correct Answer) Marks: 2/2
When doing K-means clustering, what will be the Euclidean distance of a point A(4,0) from the centroid of the cluster which has the two data points (3,3) and (5,5)?

4 (You Selected)

Cluster centroid C1 = ((3+5)/2, (3+5)/2) = (4, 4)
Distance between the point A(4,0) and centroid C1(4,4):
Euclidean distance = sqrt((4-4)^2 + (4-0)^2) = sqrt(16) = 4

Q No: 4 (Correct Answer) Marks: 2/2
Which of the following are considered to be weaknesses of K-means clustering?
1. Finding out the ideal value of K is complex and time-consuming
2. Susceptible to the curse of dimensionality
3. Not sensitive to the starting positions of the initial centroids
4. Not sensitive to outliers

1 and 2 (You Selected)

- Finding an ideal value of K requires multiple iterations with different values of K to see which value has the lowest within-cluster sum of squared errors.
- K-means clustering is affected by the curse of dimensionality: as the number of dimensions increases, the computational complexity of K-means clustering increases.
- K-means clustering IS sensitive to the starting positions, as these determine the final positions of the cluster centroids, so statement 3 is false.
- K-means clustering IS sensitive to outliers: outliers significantly affect the position of the centroid, so statement 4 is false.
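The arithmetic in Q3 can be checked with a few lines of plain Python; nothing here is sklearn-specific:

```python
import math

# Q3: centroid of the cluster {(3,3), (5,5)}, then its Euclidean
# distance to the point A(4,0)
pts = [(3, 3), (5, 5)]
centroid = (sum(p[0] for p in pts) / len(pts),
            sum(p[1] for p in pts) / len(pts))   # (4.0, 4.0)

A = (4, 0)
dist = math.sqrt((A[0] - centroid[0]) ** 2 + (A[1] - centroid[1]) ** 2)
print(dist)  # 4.0
```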
Q No: 5 (Correct Answer) Marks: 2/2
While using K-means clustering, we scale the variables before we do clustering. This is done primarily to:

Convert the data to the same scale, so that variables in different units are given equal importance (You Selected)

Scaling the data brings all the attributes to a similar scale, which gives equal importance to all the attributes while performing clustering.

Q No: 6 (Correct Answer) Marks: 1/1
Consider the following elbow plot:

[Elbow plot: within-cluster sum of squared errors vs. number of clusters, with a sharp bend at K = 3]

While performing K-means clustering, what is the ideal value of K to choose based on the above plot?

3 (You Selected)

As the slope decreases drastically from 2 to 3 and the graph takes a sharp turn at cluster point 3, 3 is considered to be the ideal number of clusters.
Q No: 7 (Correct Answer) Marks: 1/1
Which of the following is NOT true in the case of K-means clustering?

The data points that are the farthest from a centroid will create a cluster (You Selected)

K-means clusters data by separating data points into groups based on distance. The data points that are the closest to a centroid will create a cluster. If we're using the Euclidean distance between data points and every centroid, and a straight line is drawn between two centroids, then the perpendicular bisector (boundary line) of that line divides the data between the two clusters.

Q No: 8 (Correct Answer) Marks: 1/1
What is the default value of n_clusters in sklearn.cluster.KMeans, the K-means clustering class in Scikit-learn?

8 (You Selected)

In sklearn.cluster.KMeans, n_clusters is an optional parameter that takes an integer value specifying the number of clusters to form as well as the number of centroids to generate. The default value is 8.
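The default in Q8 is easy to confirm directly, assuming scikit-learn is installed:

```python
from sklearn.cluster import KMeans

# Constructing KMeans with no arguments uses the default n_clusters
km = KMeans()
print(km.n_clusters)  # 8
```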
Q No: 9 (Correct Answer) Marks: 2/2
Which of the following are Unsupervised Learning techniques?
1. Hierarchical Clustering
2. Random Forests
3. K-means Clustering
4. Logistic Regression

1 and 2
1 and 3 (You Selected)

Both forms of clustering (K-means and Hierarchical) are considered Unsupervised Learning, as we don't separate the data into dependent and independent variables before clustering. For Random Forests and Logistic Regression, we do separate the data into dependent and independent variables before applying the algorithms, so they are Supervised Learning techniques.

Q No: 10 (Correct Answer) Marks: 1/1
In K-means clustering, suppose the number of clusters is equal to the number of data points (observations). Then what will be the sum of squared errors within each group (or cluster)?

Approaches infinity (very large number)
0 (You Selected)

With an increase in the number of clusters, the within-group sum of squared errors ideally decreases. When the number of clusters equals the number of observations, the within-group sum of squared errors becomes zero: with only one data point in a cluster, the data point itself becomes the centroid, and the distance from the point to its centroid is always zero.
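The limiting case in Q10 can be demonstrated in a few lines, assuming scikit-learn; the four data points are made up for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

# Four distinct observations, and one cluster per observation
X = np.array([[1.0, 2.0], [4.0, 0.0], [7.0, 5.0], [9.0, 9.0]])

# With n_clusters == n_samples, each point is its own centroid, so the
# within-cluster sum of squared errors (inertia_) collapses to zero.
km = KMeans(n_clusters=len(X), n_init=10, random_state=0).fit(X)
print(km.inertia_)  # 0.0
```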