Homework 3C

School: University of Arkansas
Course: 4143
Subject: Computer Science
Date: Dec 6, 2023

Homework 3

Problem 3

Consider the following dataset shown in Table 1.

Samples   Feature 1   Feature 2
1         0           0
2         0           1
3         1           0
4         1           1

Table 1: Simple Dataset for Problem 3

Answer the following questions:

1. Suppose that:
   (a) we assign Samples 1 and 2 to the first cluster (Cluster 1) and Samples 3 and 4 to the second cluster (Cluster 2), i.e., C1 = {1, 2}, C2 = {3, 4}, and
   (b) we assign Samples 1, 2, and 3 to Cluster 1 and Sample 4 to Cluster 2, and
   - we use squared Euclidean distance to calculate how “dissimilar” two points are. Using the definition of S(C) introduced in 3-clustering.pdf, compute S(C1) + S(C2) and show your work.
   - we use Euclidean distance to calculate how “dissimilar” two points are. Using the definition of S(C) introduced in 3-clustering.pdf, compute S(C1) + S(C2) and show your work.
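The computation asked for in Problem 3 can be sketched in Python. The definition of S(C) from 3-clustering.pdf is not reproduced in this document, so the sketch assumes S(C) is the sum of pairwise dissimilarities over unordered pairs in the cluster, divided by the cluster size. That assumption matches the worked values in Problem 4 (for example, S({1, 2}) = 0.5 under squared Euclidean distance), but it should be checked against the course notes. The names `SAMPLES`, `cluster_score`, and `total_score` are illustrative only.

```python
import math
from itertools import combinations

# Table 1: sample id -> (Feature 1, Feature 2)
SAMPLES = {1: (0, 0), 2: (0, 1), 3: (1, 0), 4: (1, 1)}


def squared_euclidean(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(squared_euclidean(a, b))


def cluster_score(cluster, distance):
    """ASSUMED S(C): pairwise dissimilarity summed over unordered pairs, divided by |C|."""
    pairs = combinations(sorted(cluster), 2)
    total = sum(distance(SAMPLES[i], SAMPLES[j]) for i, j in pairs)
    return total / len(cluster)


def total_score(clustering, distance):
    """S(C1) + S(C2) for a clustering given as a list of clusters."""
    return sum(cluster_score(c, distance) for c in clustering)


# The two assignments described in Problem 3.
assignments = {
    "C1={1,2}, C2={3,4}": [{1, 2}, {3, 4}],
    "C1={1,2,3}, C2={4}": [{1, 2, 3}, {4}],
}

for name, clustering in assignments.items():
    for label, distance in [("squared Euclidean", squared_euclidean),
                            ("Euclidean", euclidean)]:
        print(f"{name} with {label}: {total_score(clustering, distance):.4f}")
```

A single-sample cluster has no pairs, so its score is 0 under this assumed definition.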
Problem 4

1. If the goal is to assign all four data samples into 2 clusters, list all candidate clusterings (i.e., all possible ways to assign the four samples into 2 clusters). For example, C1 = {1, 2}, C2 = {3, 4} could be one candidate clustering; C1 = {1}, C2 = {2, 3, 4} could be another, and what else? (Please show your work.)

Options:
Option 1: C1 = {1}, C2 = {2, 3, 4}
Option 2: C1 = {1, 2}, C2 = {3, 4}
Option 3: C1 = {1, 2, 3}, C2 = {4}
Option 4: C1 = {1, 3}, C2 = {2, 4}
Option 5: C1 = {1, 2, 4}, C2 = {3}
Option 6: C1 = {1, 4}, C2 = {2, 3}
Option 7: C1 = {2}, C2 = {1, 3, 4}

2. Compute the total “dissimilarity” scores, i.e., S(C1) + S(C2), of all the candidate clusterings above (using squared Euclidean distance). What is the best way to divide the four data points into 2 clusters?

Options:
Option 1: C1 = {1}, C2 = {2, 3, 4}: S(C1) + S(C2) = 0 + 4/3 = 4/3
Option 2: C1 = {1, 2}, C2 = {3, 4}: S(C1) + S(C2) = 0.5 + 0.5 = 1
Option 3: C1 = {1, 2, 3}, C2 = {4}: S(C1) + S(C2) = 4/3 + 0 = 4/3
Option 4: C1 = {1, 3}, C2 = {2, 4}: S(C1) + S(C2) = 0.5 + 0.5 = 1
Option 5: C1 = {1, 2, 4}, C2 = {3}: S(C1) + S(C2) = 4/3 + 0 = 4/3
Option 6: C1 = {1, 4}, C2 = {2, 3}: S(C1) + S(C2) = 1 + 1 = 2
Option 7: C1 = {2}, C2 = {1, 3, 4}: S(C1) + S(C2) = 0 + 4/3 = 4/3

The best ways to divide the four data points into 2 clusters are Options 2 and 4, because they have the lowest total dissimilarity score (1) of all seven options.
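The enumeration and scoring in Problem 4 can be cross-checked with a short Python sketch. It assumes, as the worked values above suggest, that S(C) is the sum of squared Euclidean distances over unordered pairs in the cluster divided by the cluster size; the exact definition lives in 3-clustering.pdf and is not reproduced here. The helper names (`two_cluster_splits`, `cluster_score`) are hypothetical, and exact fractions are used so that 4/3 is not rounded.

```python
from fractions import Fraction
from itertools import combinations

# Table 1: sample id -> (Feature 1, Feature 2)
SAMPLES = {1: (0, 0), 2: (0, 1), 3: (1, 0), 4: (1, 1)}


def squared_euclidean(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_score(cluster):
    """ASSUMED S(C): squared distances over unordered pairs, divided by |C|."""
    total = sum(squared_euclidean(SAMPLES[i], SAMPLES[j])
                for i, j in combinations(sorted(cluster), 2))
    return Fraction(total, len(cluster))


def two_cluster_splits(ids):
    """Yield each unordered split of ids into two non-empty clusters exactly once."""
    ids = sorted(ids)
    first, rest = ids[0], ids[1:]
    # Pin the first sample to C1 so a split and its mirror image are not both listed.
    # Taking fewer than len(rest) extra members keeps C2 non-empty.
    for size in range(len(rest)):
        for extra in combinations(rest, size):
            c1 = (first, *extra)
            c2 = tuple(i for i in rest if i not in extra)
            yield c1, c2


scores = {split: cluster_score(split[0]) + cluster_score(split[1])
          for split in two_cluster_splits(SAMPLES)}
best_score = min(scores.values())
best = [split for split, score in scores.items() if score == best_score]

for (c1, c2), score in scores.items():
    print(f"C1={set(c1)}, C2={set(c2)}: S(C1) + S(C2) = {score}")
print("Lowest total:", best_score, "for", best)
```

Because sample 1 is pinned to C1, Option 7 appears here as C1 = {1, 3, 4}, C2 = {2}; the split is the same, only the cluster labels are swapped.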
Problem 5

Download USArrests.csv from the Data folder on Blackboard Learn, and complete the following tasks:

1. How many rows and columns does this data set have?
2. Using only two features, “Murder” and “Assault”, perform a K-means clustering analysis using Python. Here, we let K = 2 and set n_init = 200. Plot the two clusters generated by Python (differentiate the two clusters using different colors) and insert the figure below.
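One possible way to approach both tasks is sketched below with pandas, scikit-learn, and matplotlib. Several details are assumptions rather than part of the assignment: the file is assumed to sit in the working directory as `USArrests.csv`, the column names are assumed to be exactly `Murder` and `Assault`, `random_state` is added only to make the run repeatable, and the function names and output file name are hypothetical.

```python
import os

import matplotlib.pyplot as plt
import pandas as pd
from sklearn.cluster import KMeans


def cluster_murder_assault(df, seed=0):
    """Fit K-means with K = 2 and n_init = 200 on the Murder and Assault columns."""
    features = df[["Murder", "Assault"]]
    # random_state is an assumption added here; the assignment does not specify it.
    model = KMeans(n_clusters=2, n_init=200, random_state=seed)
    labels = model.fit_predict(features)
    return labels, model.cluster_centers_


def plot_clusters(df, labels, out_path="usarrests_kmeans.png"):
    """Scatter plot of Murder vs. Assault, colored by cluster label."""
    fig, ax = plt.subplots()
    ax.scatter(df["Murder"], df["Assault"], c=labels, cmap="coolwarm")
    ax.set_xlabel("Murder")
    ax.set_ylabel("Assault")
    ax.set_title("K-means clustering (K = 2)")
    fig.savefig(out_path)  # hypothetical output file name
    plt.close(fig)


if __name__ == "__main__":
    csv_path = "USArrests.csv"  # assumed location of the downloaded file
    if os.path.exists(csv_path):
        data = pd.read_csv(csv_path)
        # Task 1: df.shape is (number of rows, number of columns).
        print("rows, columns:", data.shape)
        # Task 2: cluster on the two features and save the figure.
        cluster_labels, centers = cluster_murder_assault(data)
        plot_clusters(data, cluster_labels)
```

The two features are on different scales, so the larger-valued one will dominate the distance computation; the assignment does not ask for standardization, so none is applied here.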