
Question

This is a Python problem. Please follow the instructions; hints are provided if you need them. I am also attaching a picture of the txt file.

 

 

The preliminary dataset contains the geographical (2D) locations of habitual pizza eaters. Based on this information, identify the clusters of potential customers and their location centers by implementing the k-means clustering algorithm. To do so, follow the steps below.

Some modules needed for this code:

import numpy as np

import matplotlib.pyplot as plt

%matplotlib inline

1. Load and visualize the dataset

Use the np.loadtxt() function to read the customers.txt file into a NumPy array (use customers as the variable name for the array). This array contains the 2D coordinates of 250 (n) potential customers. Make sure you understand the shape and meaning of this array. Use the plt.scatter() function to create a 2D plot of the customer locations. Note: the plot functions expect the x and y coordinates as separate arrays, e.g. you can select the x coordinates of all customers with customers[:, 0].
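
A minimal sketch of this step, using the imports listed above and assuming customers.txt sits in the working directory with one customer per line as two whitespace-separated coordinates (as in the excerpt at the end of this post):

customers = np.loadtxt("customers.txt")   # expected shape: (250, 2), one row per customer
print(customers.shape)

plt.scatter(customers[:, 0], customers[:, 1])   # x coordinates vs. y coordinates
plt.title("Customer locations");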

2. Determine the optimal number of clusters

Use your human intuition based on the scatter plot above to decide how many clusters of customers you want to identify. This is the only step that relies on human brainpower (in practice, this can be automated as well, but that is beyond the scope of the assignment). Set the variable k to the number of desired clusters.
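
For example, if the scatter plot suggests three groups (a hypothetical choice; use whatever number of groups you actually see):

k = 3   # hypothetical value; set to the number of clusters visible in the scatter plot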

3. Initialize the cluster centers

The algorithm will keep track of (and update) the center of each cluster and will assign each customer to exactly one cluster based on the distances between the customer and the centers. For the initial locations of the cluster centers, pick k random customer locations and store this array in a new variable called centroids. Verify that the shape of this array is (k, 2).
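
One possible sketch, assuming k was set in step 2 and using np.random.choice() to pick k distinct customer indices (other initialization schemes work as well):

n = customers.shape[0]
initial_indices = np.random.choice(n, size=k, replace=False)   # k distinct random customers
centroids = customers[initial_indices]   # fancy indexing returns a copy, so later updates to centroids leave customers untouched
print(centroids.shape)   # should print (k, 2)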

4. Initialize the cluster assignment

Compute a new 1-dimensional integer array of n elements, called assignment, which describes the customer-to-cluster assignment. The assignment is decided based on geographical distances: for each customer, store the integer index of the cluster whose current center (see centroids) is closest to that customer. Do not use explicit for loops.

Hint: first, compute all the pairwise distances between the customers and all cluster centers using a 3D array and automatic broadcasting. Try to understand the shape and meaning of the following expression: customers - centroids[:, np.newaxis, :]. Use this expression with squaring and the NumPy sum function (on the proper axis) to compute the pairwise distances. Finally, use argmin (on the proper axis) to find the indexes of the closest cluster centers.
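
A sketch of the vectorized assignment following the hint, assuming customers has shape (n, 2) and centroids has shape (k, 2):

diffs = customers - centroids[:, np.newaxis, :]   # shape (k, n, 2): every centroid minus every customer
sq_dists = np.sum(diffs ** 2, axis=2)             # shape (k, n): squared distance from each centroid to each customer
assignment = np.argmin(sq_dists, axis=0)          # shape (n,): index of the closest centroid for each customer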

5. Verify the cluster assignment

Execute the code below to verify that the customers, centroids, and assignment arrays are properly initialized. You should see the k cluster centers in red and the assignments in different colors. Note: this is (obviously) not the final/optimal clustering, yet.

Code to verify this step:

plt.scatter(customers[:, 0], customers[:, 1], c=assignment)

plt.scatter(centroids[:, 0], centroids[:, 1], c="red")

plt.title("Initial cluster assignment");

6. Update the cluster centers

Compute the updated location of the cluster centers. Based on the cluster assignment, compute the mean location of the customers in each cluster. This is going to be the new/updated location of the cluster center (centroids).

Hint: This time you may use a (short) loop on the number of clusters. Try to use boolean indexing (masking) with the assignment to select all the customers belonging to the current cluster. You can also use the NumPy mean() function (with the proper axis) to compute the mean location of the cluster.
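
A possible sketch of the update, assuming assignment and centroids exist from the previous steps and that no cluster is empty (picking the initial centers from actual customer locations makes empty clusters very unlikely):

for cluster in range(k):
    members = customers[assignment == cluster]   # boolean mask: customers currently in this cluster
    centroids[cluster] = members.mean(axis=0)    # new center = mean location of those customers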

7. Verify the updated cluster centers

Execute the code below to verify that centroids is properly updated. While this is still not the final clustering, each center should be at the center of its assigned customers.

Code to verify this step:

plt.scatter(customers[:, 0], customers[:, 1], c=assignment)

plt.scatter(centroids[:, 0], centroids[:, 1], c="red")

plt.title("Initial cluster centers");

8. Iterative optimization

Based on steps 5 & 7 (you can use copy & paste), implement a loop which iteratively updates the cluster assignment and the cluster centers as long as there is some change in the cluster assignment.

9. Verify the final clusters

Execute the code below to see the final clusters. Note: while the results should look reasonable, it is not guaranteed that the algorithm finds the optimal clusters and centers (because of the initial random picks of the centers). Try to run the notebook multiple times (you may want to use np.random.seed() with different parameters at the start; see the note after the code below) to see if you can get the desired/optimal result.

Code to verify this step:

plt.scatter(customers[:, 0], customers[:, 1], c=assignment)

plt.scatter(centroids[:, 0], centroids[:, 1], c="red")

plt.title("Final clusters");

Contents of the attached customers.txt (excerpt):
-3.633271721169406110e+00 -2.749418531032250979e+00
-2.976795656232069209e+00 -3.130130619095286004e+00
-4.046241035806734665e+00 -2.413595797540917687e+00
-3.183621547459694057e+00 -2.977639683920154212e+00
-3.904108587053154888e+00 -2.951091432822544380e+00
-4.071339081664882897e+00 -2.115075173563672806e+00
-3.447559215925898091e+00 -2.980188866448493457e+00
-4.287105999033476778e+00 -2.585069959723080846e+00
-4.814356372871987588e+00 -3.429133604769490695e+00
-4.608749571748512963e+00 -2.559789061555480583e+00
-3.284028185085224205e+00 -2.575397699107908611e+00
-4.107026100720188033e+00 -2.312489516765867670e+00
-3.745452631681100986e+00 -2.272984902724009437e+00
-4.498589231871656047e+00 -1.940104407638526318e+00
-2.904279496557304885e+00 -2.600874593005827240e+00
-4.084467024266913882e+00 -2.862792504061023369e+00
-3.271673844947790677e+00 -3.413581559927078679e+00
-3.931534472019772686e+00 -3.212371884819077206e+00
-3.466785710627703132e+00 -3.138573013614168961e+00
-3.612295593413866079e+00 -3.023156182750712961e+00
-3.727258144722126687e+00 -2.848107858858544450e+00
-3.953124565531799917e+00 -3.224786800329186853e+00
-4.705704394830918957e+00 -3.262516703252164696e+00
-3.460736643741590512e+00 -1.986497950173193416e+00
-4.009482121549115874e+00 -3.203895027118378813e+00
-3.920261751360571534e+00 -2.531711316055035965e+00
-3.629803635077848867e+00 -2.968804447175029892e+00
-3.891567316336832505e+00 -2.812282510113611522e+00
-3.748114007276682536e+00 -2.379283960750413485e+00
-3.948081672063044056e+00 -3.535852976432075412e+00
-3.363040524165844758e+00 -3.056029241842494582e+00
-3.138654804442309043e+00 -2.827927829381158720e+00
-3.513565383118311125e+00 -2.26225685766863