K-Means Clustering Analysis

School: Concordia University Portland
Course: 543
Subject: Computer Science
Date: Jun 24, 2024
Uploaded by chanduRapolu123

Select a dataset from UCI Machine Learning Repository https://archive.ics.uci.edu/ml/datasets.php

I selected the Iris dataset: https://archive.ics.uci.edu/dataset/53/iris

Describe the dataset you select.

I chose this dataset because it is a classic example used in statistical and machine learning studies. The Iris dataset covers three species of iris flowers, with 50 samples per species. Each flower has four measurements: sepal length, sepal width, petal length, and petal width. I can analyze the relationships between these measurements to understand how they differ between species, and I can also predict the species of an iris flower from its measurements. This makes the dataset well suited to practicing classification techniques and to exploring how each feature contributes to identifying the species.

Perform a K-Means cluster analysis on the variables of the dataset you select.

I was unable to perform the K-Means cluster analysis directly, so I ran it in the R console. The analysis used all four measurements (sepal length, sepal width, petal length, and petal width). K-Means grouped the flowers into three clusters of sizes 62, 38, and 50. The clusters explain 88.43% of the variance, and the average silhouette score is 0.5528, indicating reasonable clustering quality. The cluster centers are as follows:

Cluster 1: sepal length 5.90, sepal width 2.75, petal length 4.39, petal width 1.43
Cluster 2: sepal length 6.85, sepal width 3.07, petal length 5.74, petal width 2.07
Cluster 3: sepal length 5.01, sepal width 3.43, petal length 1.46, petal width 0.25

The performance metrics show a maximum diameter of 2.68, a minimum separation of 0.26, and a Dunn Index of 0.099.

I think this analysis shows that K-Means clustering can effectively group iris flowers based on their physical measurements, with reasonably distinct clusters.
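To make two of the reported metrics concrete, here is a minimal sketch of how the explained proportion of variance and the Dunn Index can be computed from a set of labeled points. It is written in Python rather than the R used for the analysis, and the points and labels are hypothetical toy data, not the Iris measurements. It assumes the common definitions: explained variance as 1 minus within-cluster sum of squares over total sum of squares, and the Dunn Index as minimum separation divided by maximum diameter. With the rounded values reported above, 0.26 / 2.68 is about 0.097, slightly below the reported 0.099, presumably because the index was computed from unrounded distances.

```python
from itertools import combinations
from math import dist

# Hypothetical toy data (NOT the Iris measurements): 2-D points with cluster labels.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
labels = [0, 0, 0, 1, 1, 1]


def mean(rows):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(col) / len(rows) for col in zip(*rows))


def sum_sq(rows, center):
    """Sum of squared Euclidean distances from each point to a center."""
    return sum(dist(r, center) ** 2 for r in rows)


def explained_variance(points, labels):
    """1 - (within-cluster SS / total SS)."""
    total_ss = sum_sq(points, mean(points))
    within_ss = 0.0
    for k in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == k]
        within_ss += sum_sq(members, mean(members))
    return 1.0 - within_ss / total_ss


def dunn_index(points, labels):
    """Minimum between-cluster distance divided by maximum within-cluster distance."""
    pairs = list(combinations(range(len(points)), 2))
    max_diameter = max(dist(points[i], points[j]) for i, j in pairs if labels[i] == labels[j])
    min_separation = min(dist(points[i], points[j]) for i, j in pairs if labels[i] != labels[j])
    return min_separation / max_diameter


ev = explained_variance(points, labels)
dunn = dunn_index(points, labels)
print(f"explained variance: {ev:.4f}")
print(f"Dunn index: {dunn:.4f}")
```

A higher Dunn Index means clusters are compact relative to the gaps between them, so a value near 0.1 suggests that at least two clusters sit close together compared with their spread.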
Set up and run the model in JASP by selecting “Number of Clusters” values.

As above, I executed R code for this step and visualized the data.
Identify and explain the optimal “Number of Clusters” value.

Optimal Number of Clusters

I used the Elbow Method to identify the optimal number of clusters for K-Means on the Iris dataset. The elbow plot shows the within-cluster sum of squares (WSS) against the number of clusters. WSS generally decreases as clusters are added, so the point of interest is the "elbow", where the decrease starts to slow. In this plot the elbow is at three clusters, which indicates that three clusters is the optimal choice for this dataset.
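The Elbow Method described above can be sketched as follows. This is a minimal Python illustration, not the R code used for the analysis: the points are hypothetical toy data rather than the Iris measurements, and the deterministic farthest-first seeding is an assumption made so the example gives the same result on every run (R's `kmeans` uses random starting centers).

```python
from math import dist

# Hypothetical toy data (NOT the Iris measurements): three loose groups in 2-D.
points = [
    (1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (1.0, 1.5),
    (8.0, 8.0), (8.5, 9.0), (9.0, 8.0), (8.0, 8.5),
    (1.0, 8.0), (1.5, 9.0), (2.0, 8.0), (1.0, 8.5),
]


def mean(rows):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(col) / len(rows) for col in zip(*rows))


def farthest_first(points, k):
    """Deterministic seeding (an assumption for this sketch, not what R does)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and center updates until stable."""
    centers = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = mean(members)
    return labels, centers


def wss(points, labels, centers):
    """Within-cluster sum of squares for a given clustering."""
    return sum(dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))


# WSS for each candidate number of clusters; the elbow is where the drop flattens.
wss_by_k = {}
for k in range(1, 6):
    labels, centers = kmeans(points, k)
    wss_by_k[k] = wss(points, labels, centers)
    print(k, round(wss_by_k[k], 3))
```

On this toy data the WSS falls sharply up to three clusters and only slightly afterwards, which is the pattern the elbow plot is read for.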
After performing K-Means clustering, I visualized the results using a Cluster Plot Matrix. The Cluster Plot Matrix shows scatter plots for different combinations of features, with points colored according to their cluster assignments.

Cluster Plot Matrix:

Sepal Length vs. Sepal Width: The clusters are reasonably well separated. The green cluster is distinct from the other two, while the black and red clusters show some overlap.
Sepal Length vs. Petal Length: The clusters are more clearly separated in this plot. The green cluster is isolated from the black and red clusters, which also show distinct boundaries from each other.
Sepal Length vs. Petal Width: This plot also shows good separation. The green cluster is clearly apart from the other two, and the black and red clusters are distinct from each other.
Sepal Width vs. Petal Length: This plot shows excellent separation. The green, black, and red clusters are all well separated, with minimal overlap.
Sepal Width vs. Petal Width: As in the previous plot, the clusters are distinctly separated. The green cluster stands out, and the black and red clusters are also well separated.
Petal Length vs. Petal Width: This plot provides the clearest separation. The green, black, and red clusters are distinctly apart from one another, showing that these two features are highly effective in distinguishing the clusters.

Discuss a basic approach to the problem.

Basic Approach

I downloaded the Iris dataset from the UCI Machine Learning Repository, which includes measurements of sepal length, sepal width, petal length, and petal width for three species of iris flowers. I ran R commands to remove null values so that the dataset was clean and ready for analysis. I then ran an R program to perform cluster analysis using the K-Means clustering algorithm.
I selected the relevant features (sepal length, sepal width, petal length, and petal width) for clustering and used the Elbow Method to determine the optimal number of clusters, which was three. I then evaluated the clustering results using metrics such as the within-cluster sum of squares and silhouette scores. To visualize the results, I created plots with the ggplot2 library in R, which clearly show the distinct clusters formed by the K-Means algorithm.

Review the model results information in JASP and summarize the key points, including screenshots.

Model Results in JASP

The results from the K-Means clustering analysis are summarized as follows:

Cluster Sizes: The three clusters have sizes 62, 38, and 50.
Explained Proportion of Variance: The clustering explains 88.43% of the variance in the data.
Silhouette Score: The average silhouette score is 0.5528, indicating reasonable clustering quality.

Cluster Centers:
Cluster 1: sepal length 5.90, sepal width 2.75, petal length 4.39, petal width 1.43
Cluster 2: sepal length 6.85, sepal width 3.07, petal length 5.74, petal width 2.07
Cluster 3: sepal length 5.01, sepal width 3.43, petal length 1.46, petal width 0.25

Performance Metrics:
Maximum Diameter: 2.68
Minimum Separation: 0.26
Dunn Index: 0.099
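For readers unfamiliar with the silhouette score summarized above, the following is a minimal Python sketch of the standard definition: for each point, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to any other cluster, and the silhouette width is (b - a) / max(a, b). The points and labels are hypothetical toy data, not the Iris measurements, and this is not the R code used for the analysis.

```python
from math import dist

# Hypothetical toy data (NOT the Iris measurements): 2-D points with cluster labels.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
labels = [0, 0, 0, 1, 1, 1]


def silhouette_widths(points, labels):
    """Silhouette width of every point under the given cluster labels."""
    widths = []
    for i, p in enumerate(points):
        # Mean distance from p to the members of each cluster, excluding p itself.
        mean_dist = {}
        for k in set(labels):
            others = [dist(p, q) for j, q in enumerate(points) if labels[j] == k and j != i]
            if others:
                mean_dist[k] = sum(others) / len(others)
        own = labels[i]
        if own not in mean_dist:
            # Singleton cluster: the width is conventionally set to 0.
            widths.append(0.0)
            continue
        a = mean_dist[own]
        b = min(d for k, d in mean_dist.items() if k != own)
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return widths


widths = silhouette_widths(points, labels)
avg_silhouette = sum(widths) / len(widths)
print(f"average silhouette: {avg_silhouette:.4f}")
```

Widths close to 1 mean a point sits well inside its own cluster, values near 0 mean it lies between clusters, and negative values suggest it may be in the wrong cluster. The toy data here is far better separated than the Iris clusters, whose reported average of 0.5528 is more moderate.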
Discuss general findings, and overall conclusions based on the results of the analysis.

Based on the analysis, K-Means clustering grouped the iris flowers into three clusters based on their physical measurements. The explained proportion of variance (88.43%) and the average silhouette score (0.5528) suggest that the clusters are reasonably distinct, and the cluster plot matrix visually supports this grouping. Overall, the results indicate that K-Means clustering can group iris flowers approximately by species using the provided measurements. The match cannot be exact, however: the cluster sizes (62, 38, and 50) differ from the 50 samples per species, so the 62-flower cluster must contain at least 12 flowers from outside its dominant species.

References:

UCI Machine Learning Repository. (n.d.). https://archive.ics.uci.edu/dataset/53/iris
Alexander Swan, Ph.D. (2023, February 9). JASP 0.17 Tutorial: Syntax Mode (R) (Episode 45) [Video]. YouTube. https://www.youtube.com/watch?v=Mwv88u8tULo