After learning about the k-means clustering algorithm in the big data course, some of your classmates tell you that they are not very enthusiastic about using it. Their main reason is that, when applied to the same dataset, the algorithm seems to give different clusters every time it is run. What should you say to them?
You should explain to them that they are interpreting the computer output incorrectly. Even though k-means seems to give different clusters every time it is run on the same dataset, if they look more closely at those clusters, they will notice that they are really the same clusters, but with different labels.

You should explain to them that they are using the computer functions incorrectly. The k-means algorithm always results in the same clusters.

You should explain to them that they should run the k-means algorithm several times and then pick the clusters with the smallest value of the objective function (all while warning them that, for the same value of the objective function, some apparently different clusters only seem to be different because of the labels of the clusters).

You should advise them to run the algorithm only once. The point of the algorithm is to divide the data into clusters, and that is precisely what the algorithm does after one run.
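To check the behavior these options describe, one could run k-means several times from different random starts, record the objective (the within-cluster sum of squared distances) for each run, and compare the resulting partitions with the labels ignored. The following is only a minimal sketch under stated assumptions: the toy data, the helper names `kmeans` and `partition`, and the choice of Lloyd's algorithm with randomly sampled initial centers are all illustrative and are not taken from the course material.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def kmeans(points, k, seed, iters=100):
    """Lloyd's algorithm (hypothetical helper); returns (labels, objective)."""
    rng = random.Random(seed)
    # Random initialization is what makes separate runs differ.
    centers = rng.sample(points, k)
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    objective = sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))
    return labels, objective

def partition(labels):
    """Label-free view of a clustering: a set of sets of point indices."""
    groups = {}
    for i, l in enumerate(labels):
        groups.setdefault(l, set()).add(i)
    return frozenset(frozenset(g) for g in groups.values())

# Assumed toy data: three Gaussian blobs, generated deterministically.
data_rng = random.Random(0)
points = [(data_rng.gauss(cx, 0.5), data_rng.gauss(cy, 0.5))
          for cx, cy in [(0, 0), (5, 0), (0, 5)] for _ in range(20)]

# Run k-means from ten different random starts and keep the best run.
runs = [kmeans(points, k=3, seed=s) for s in range(10)]
best_labels, best_obj = min(runs, key=lambda r: r[1])

for seed, (labels, obj) in enumerate(runs):
    same = partition(labels) == partition(best_labels)
    print(f"seed={seed} objective={obj:.3f} same partition as best: {same}")
```

Comparing partitions rather than raw label lists separates the two effects mentioned in the options: runs that differ only in how the clusters are numbered compare as equal, while runs that stopped at a different local optimum show a different partition and, typically, a larger objective.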