Here is the dataset:
[[46, 33], [26, 21], [23, 96], [82, 20], [25, 42], [29, 99], [30, 64], [57, 51], [12, 68], [25, 9]]
Here are the centroids:
[[12, 68], [46, 33], [25, 42]]
Here are the sample results:
{'1': {'cluster1': [[23, 96], [29, 99], [30, 64], [12, 68]],
       'cluster2': [[46, 33], [82, 20], [57, 51], [25, 9]],
       'cluster3': [[26, 21], [25, 42]],
       'centroids': [[23.5, 81.75], [52.5, 28.25], [25.5, 31.5]]},
 '2': {'cluster1': [[23, 96], [29, 99], [30, 64], [12, 68]],
       'cluster2': [[46, 33], [82, 20], [57, 51]],
       'cluster3': [[26, 21], [25, 42], [25, 9]],
       'centroids': [[23.5, 81.75], [61.666666666666664, 34.666666666666664], [25.333333333333332, 24.0]]}}
import argparse
import pandas as pd
import numpy as np
import random
import pickle
from pathlib import Path
from collections import defaultdict
Problem 1 - K Means Clustering
A sample dataset has been provided to you in the './data/sample_dataset_kmeans.pickle' path. The centroids are in './data/sample_centroids_kmeans.pickle' and the sample result is in './data/sample_result_kmeans.pickle' path. You can use these to test your code.
Here are the attributes for the dataset. Use this dataset to test your functions.
- Dataset should load the points in the form of a list of lists where each list item represents a point in the space.
- An example dataset will have the following structure. If there are 3 points in the dataset, this would appear as follows in the list of lists.
dataset = [ [5,6], [3,5], [2,8] ]
Note:
- A sample dataset to test your code has been provided in the location "data/sample_dataset_kmeans.pickle". Please maintain this as it would be necessary while grading.
- Do not change the variable names of the returned values.
- After calculating each of those values, assign them to the corresponding value that is being returned.
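The three pickle files mentioned above could be loaded along these lines. This is only a sketch: the helper name load_pickle is hypothetical (it is not part of the assignment), and the files are read only if they exist at the stated paths.

```python
import pickle
from pathlib import Path


def load_pickle(path):
    """Hypothetical helper: read one pickled object from the given path."""
    with Path(path).open("rb") as handle:
        return pickle.load(handle)


# Paths as stated in the problem text.
data_dir = Path("./data")
dataset_path = data_dir / "sample_dataset_kmeans.pickle"
centroids_path = data_dir / "sample_centroids_kmeans.pickle"
result_path = data_dir / "sample_result_kmeans.pickle"

# Load only when all files are present, so the snippet also runs elsewhere.
if all(p.exists() for p in (dataset_path, centroids_path, result_path)):
    dataset = load_pickle(dataset_path)            # expected: list of [x, y] points
    centroids = load_pickle(centroids_path)        # expected: list of initial centroids
    expected_result = load_pickle(result_path)     # expected: dict keyed by '1' and '2'
    print(len(dataset), "points,", len(centroids), "centroids")
```

Only unpickle files you trust, since pickle can execute arbitrary code while loading.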
This is the function I need to complete:
def k_means_clustering(centroids, dataset):
    # Description: Perform k means clustering for 2 iterations given as input the dataset and centroids.
    # Input:
    #   1. centroids - A list of lists containing the initial centroids for each cluster.
    #   2. dataset - A list of lists denoting points in the space.
    # Output:
    #   1. results - A dictionary where the key is iteration number and store the cluster assignments in the
    #      appropriate clusters. Also, update the centroids list after each iteration.
    result = {
        '1': { 'cluster1': [], 'cluster2': [], 'cluster3': [], 'centroids': []},
        '2': { 'cluster1': [], 'cluster2': [], 'cluster3': [], 'centroids': []}
    }
    centroid1, centroid2, centroid3 = centroids[0], centroids[1], centroids[2]
    for iteration in range(2):
        # your code here
    return result
Unfortunately I am only allowed to use the following packages in my answer:
import argparse
import pandas as pd
import pandas as pd
import numpy as np
import pickle
from pathlib import Path
from collections import defaultdict
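One way the template could be completed within that restriction is sketched below. It uses only Python built-ins, so it needs none of the listed packages. The choices here are assumptions, not something the assignment states: squared Euclidean distance, ties going to the lowest-numbered cluster, and an empty cluster keeping its previous centroid.

```python
def k_means_clustering(centroids, dataset):
    # Sketch: two iterations of k-means, results keyed by '1' and '2'.
    result = {}
    current = [list(c) for c in centroids]  # copy so the input list is not modified

    for iteration in range(2):
        # Assignment step: each point goes to its nearest current centroid.
        clusters = [[] for _ in current]
        for point in dataset:
            # Assumption: squared Euclidean distance (same ordering as Euclidean).
            distances = [
                sum((p - c) ** 2 for p, c in zip(point, centroid))
                for centroid in current
            ]
            # Assumption: ties are resolved toward the lowest cluster index.
            nearest = distances.index(min(distances))
            clusters[nearest].append(point)

        # Update step: each centroid becomes the mean of its cluster members.
        new_centroids = []
        for members, old in zip(clusters, current):
            if members:
                new_centroids.append(
                    [sum(coords) / len(members) for coords in zip(*members)]
                )
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(old)

        entry = {f"cluster{i + 1}": members for i, members in enumerate(clusters)}
        entry["centroids"] = new_centroids
        result[str(iteration + 1)] = entry
        current = new_centroids

    return result


# Sample dataset and initial centroids as given in the question.
dataset = [[46, 33], [26, 21], [23, 96], [82, 20], [25, 42],
           [29, 99], [30, 64], [57, 51], [12, 68], [25, 9]]
centroids = [[12, 68], [46, 33], [25, 42]]
result = k_means_clustering(centroids, dataset)
print(result["1"]["centroids"])
print(result["2"]["centroids"])
```

Working the sample data through by hand, the point [25, 9] lands in cluster2 on the first iteration and moves to cluster3 on the second, which is consistent with the sample results in the question.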