K-Means Clustering Analysis

School: Concordia University Portland
Course: 543
Subject: Computer Science
Date: Jun 24, 2024
Uploaded by chanduRapolu123

Select a dataset from UCI Machine Learning Repository https://archive.ics.uci.edu/ml/datasets.php

I selected the Iris dataset: https://archive.ics.uci.edu/dataset/53/iris

Describe the dataset you select.

I chose this dataset because it is a classic example used in statistics and machine learning. It contains 150 samples from three species of iris flowers, 50 per species. Each sample records four measurements in centimeters: sepal length, sepal width, petal length, and petal width. These measurements can be analyzed to see how they differ between species and to predict the species of a flower from its measurements. This makes the dataset well suited to practicing classification techniques and to exploring how each feature contributes to identifying the species.

Perform a K-Means cluster analysis on the variables of the dataset you select.

I was unable to perform the K-Means cluster analysis directly, so I ran it in the R console and obtained the following values. The analysis used the four measurements (sepal length, sepal width, petal length, and petal width). K-Means grouped the flowers into three clusters of sizes 62, 38, and 50. The clusters explain 88.43% of the variance, and the average silhouette score is 0.5528, which indicates reasonable clustering quality. The cluster centers are as follows:

Cluster 1: sepal length 5.90, sepal width 2.75, petal length 4.39, petal width 1.43
Cluster 2: sepal length 6.85, sepal width 3.07, petal length 5.74, petal width 2.07
Cluster 3: sepal length 5.01, sepal width 3.43, petal length 1.46, petal width 0.25

The performance metrics show a maximum diameter of 2.68, a minimum separation of 0.26, and a Dunn Index of 0.099.

I think this analysis shows that K-Means clustering can group iris flowers by their physical measurements into reasonably distinct clusters.
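To make the procedure behind these numbers concrete, here is a minimal sketch of K-Means (Lloyd's algorithm) together with the explained proportion of variance, computed as 1 minus the within-cluster sum of squares divided by the total sum of squares. It is written in Python purely for illustration; the analysis in this report was run in R. The rows in `toy_points` are hypothetical values, not rows from the Iris file, and the function names and the choice of starting centers are assumptions of this sketch.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(column) / len(points) for column in zip(*points))


def k_means(points, initial_centers, max_iter=100):
    """Lloyd's algorithm: alternate assignment and update until labels settle."""
    centers = list(initial_centers)
    k = len(centers)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # no point changed cluster, so the centers are stable
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = mean_point(members)
    return centers, labels


def explained_variance(points, centers, labels):
    """Share of the total sum of squares explained: 1 - within SS / total SS."""
    grand_mean = mean_point(points)
    total_ss = sum(squared_distance(p, grand_mean) for p in points)
    within_ss = sum(
        squared_distance(p, centers[lab]) for p, lab in zip(points, labels)
    )
    return 1 - within_ss / total_ss


# Hypothetical rows: (sepal length, sepal width, petal length, petal width).
toy_points = [
    (5.0, 3.4, 1.5, 0.2), (4.9, 3.1, 1.4, 0.2), (5.1, 3.5, 1.4, 0.3),
    (5.9, 2.8, 4.3, 1.4), (6.0, 2.7, 4.5, 1.5), (5.8, 2.7, 4.4, 1.3),
    (6.9, 3.1, 5.7, 2.1), (6.8, 3.0, 5.8, 2.0), (6.7, 3.1, 5.6, 2.2),
]

# Assumption: one row from each visible group is used as a starting center.
start = [toy_points[0], toy_points[3], toy_points[6]]
centers, labels = k_means(toy_points, initial_centers=start)
sizes = [labels.count(j) for j in range(len(centers))]

print("cluster sizes:", sizes)
print("explained variance:", round(explained_variance(toy_points, centers, labels), 4))
```

Starting centers are passed in explicitly so that the sketch is deterministic. With random starts, K-Means can settle on different groupings from run to run, which is why it is commonly repeated from several starting points.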

Set up and run the model in JASP by selecting “Number of Clusters” values.

As above, I ran the analysis with R code and visualized the data.

Identify and explain the optimal “Number of Clusters” value.

Optimal Number of Clusters

I used the Elbow Method to identify the optimal number of clusters for the Iris dataset. The elbow plot shows the within-cluster sum of squares (WSS) against the number of clusters. WSS always falls as clusters are added, so the useful signal is the "elbow": the point after which adding another cluster reduces WSS only slightly. In this plot the elbow is at three clusters, which indicates that three clusters are the optimal choice for this dataset.
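The values behind an elbow plot can be sketched as follows: run K-Means for each candidate number of clusters and record the WSS. This is an illustrative Python sketch, not the R code used for the report. The data in `toy_points` are hypothetical, and the deterministic farthest-first initialization is an assumption made here for reproducibility (R's `kmeans` uses random starting centers by default).

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(column) / len(points) for column in zip(*points))


def farthest_first_centers(points, k):
    """Start from the first point, then repeatedly add the point that is
    farthest from every center chosen so far (assumed initialization)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    return centers


def k_means_wss(points, k, max_iter=100):
    """Run Lloyd's algorithm and return the within-cluster sum of squares."""
    centers = farthest_first_centers(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = mean_point(members)
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))


# Hypothetical rows: (sepal length, sepal width, petal length, petal width).
toy_points = [
    (5.0, 3.4, 1.5, 0.2), (4.9, 3.1, 1.4, 0.2), (5.1, 3.5, 1.4, 0.3),
    (5.9, 2.8, 4.3, 1.4), (6.0, 2.7, 4.5, 1.5), (5.8, 2.7, 4.4, 1.3),
    (6.9, 3.1, 5.7, 2.1), (6.8, 3.0, 5.8, 2.0), (6.7, 3.1, 5.6, 2.2),
]

# WSS for each candidate number of clusters; these are the y-values of an elbow plot.
wss_by_k = {k: k_means_wss(toy_points, k) for k in range(1, 5)}
for k, wss in wss_by_k.items():
    print(k, round(wss, 3))
```

Reading the elbow from such a table is a judgment call: one looks for the value of k after which the drop in WSS becomes small relative to the earlier drops.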

After performing K-Means clustering, I visualized the results using a Cluster Plot Matrix. The Cluster Plot Matrix shows scatter plots for each pair of features, with points colored according to their cluster assignments.

Cluster Plot Matrix:

Sepal Length vs. Sepal Width: The clusters are reasonably well separated. The green cluster is distinct from the other two, while the black and red clusters show some overlap.
Sepal Length vs. Petal Length: The clusters are more clearly separated in this plot. The green cluster is isolated from the black and red clusters, which also show distinct boundaries from each other.
Sepal Length vs. Petal Width: This plot also shows good separation. The green cluster is clearly apart from the other two, and the black and red clusters are distinct from each other.
Sepal Width vs. Petal Length: This plot shows excellent separation. The green, black, and red clusters are all well separated, with minimal overlap.
Sepal Width vs. Petal Width: As in the previous plot, the clusters are distinctly separated. The green cluster stands out, and the black and red clusters are also well separated.
Petal Length vs. Petal Width: This plot provides the clearest separation. The green, black, and red clusters are distinctly apart from one another, which shows that these two features are highly effective in distinguishing the clusters.

Discuss a basic approach to the problem.

Basic Approach

I downloaded the Iris dataset from the UCI Machine Learning Repository. It includes measurements of sepal length, sepal width, petal length, and petal width for three species of iris flowers. I removed null values with R commands so that the dataset was clean and ready for analysis. I then ran an R program to perform the cluster analysis with the K-Means algorithm.

I selected the relevant features (sepal length, sepal width, petal length, and petal width) for clustering and used the Elbow Method to determine the optimal number of clusters, which was three. I then evaluated the clustering results using metrics such as the within-cluster sum of squares and silhouette scores. To visualize the results, I created plots with the ggplot2 library in R, which clearly show the distinct clusters formed by the K-Means algorithm.

Review the model results information in JASP and summarize the key points, including screenshots.

Model Results in JASP

The results from the K-Means clustering analysis in JASP are summarized as follows:

Cluster Sizes: The three clusters have sizes 62, 38, and 50.
Explained Proportion of Variance: The clustering explains 88.43% of the variance in the data.
Silhouette Score: The average silhouette score is 0.5528, indicating reasonable clustering quality.

Cluster Centers:
Cluster 1: Sepal length 5.90, sepal width 2.75, petal length 4.39, petal width 1.43
Cluster 2: Sepal length 6.85, sepal width 3.07, petal length 5.74, petal width 2.07
Cluster 3: Sepal length 5.01, sepal width 3.43, petal length 1.46, petal width 0.25

Performance Metrics:
Maximum Diameter: 2.68
Minimum Separation: 0.26
Dunn Index: 0.099
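The performance metrics above can be illustrated with a short sketch. It assumes the common definitions: the Dunn Index is the minimum separation (smallest distance between two points in different clusters) divided by the maximum diameter (largest distance between two points in the same cluster), and the silhouette of a point compares its mean distance to its own cluster with its mean distance to the nearest other cluster. Under that definition the rounded values reported here give 0.26 / 2.68, or about 0.097, which agrees with the reported 0.099 once rounding of the inputs is taken into account. The code is Python for illustration only; `toy_points` and `toy_labels` are hypothetical, not the Iris results.

```python
from itertools import combinations
from math import dist  # Euclidean distance, available in Python 3.8+


def dunn_index(points, labels):
    """Minimum between-cluster separation divided by maximum cluster diameter."""
    pairs = list(combinations(range(len(points)), 2))
    max_diameter = max(
        dist(points[i], points[j]) for i, j in pairs if labels[i] == labels[j]
    )
    min_separation = min(
        dist(points[i], points[j]) for i, j in pairs if labels[i] != labels[j]
    )
    return min_separation / max_diameter


def mean_silhouette(points, labels):
    """Average silhouette width over all points (needs at least two clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for single-point clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's zero distance to itself adds nothing to the sum).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical rows: (sepal length, sepal width, petal length, petal width).
toy_points = [
    (5.0, 3.4, 1.5, 0.2), (4.9, 3.1, 1.4, 0.2), (5.1, 3.5, 1.4, 0.3),
    (5.9, 2.8, 4.3, 1.4), (6.0, 2.7, 4.5, 1.5), (5.8, 2.7, 4.4, 1.3),
    (6.9, 3.1, 5.7, 2.1), (6.8, 3.0, 5.8, 2.0), (6.7, 3.1, 5.6, 2.2),
]
toy_labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]  # assumed cluster assignments

print("Dunn index:", round(dunn_index(toy_points, toy_labels), 3))
print("mean silhouette:", round(mean_silhouette(toy_points, toy_labels), 3))
```

A Dunn Index well below 1, as reported for the Iris clustering, means that the closest pair of points from different clusters is much nearer than the widest cluster is across, so at least two clusters sit close together.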

Discuss general findings, and overall conclusions based on the results of the analysis.

Based on the analysis, K-Means clustering grouped the iris flowers into three clusters using only their physical measurements. The explained proportion of variance (88.43%) and the average silhouette score (0.5528) suggest that the clusters are reasonably distinct, and the cluster plot matrix visually confirms the grouping. The correspondence with the species is close but not perfect: each species has 50 samples, so the cluster sizes of 62 and 38 mean that at least 12 flowers are grouped with flowers of a different species. Overall, the results indicate that K-Means can recover groups that largely match the iris species from the provided measurements.

References:

UCI Machine Learning Repository. (n.d.). Iris. https://archive.ics.uci.edu/dataset/53/iris
Swan, A. (2023, February 9). JASP 0.17 Tutorial: Syntax Mode (R) (Episode 45) [Video]. YouTube. https://www.youtube.com/watch?v=Mwv88u8tULo
