Python_K - Jupyter Notebook

School: Northeastern University
Course: 2510
Subject: Industrial Engineering
Date: Dec 6, 2023
Type: pdf
Pages: 7
11/10/23, 12:48 PM — Python_K - Jupyter Notebook (localhost:8889/notebooks/Python_K.ipynb)

The CLV data provides the average net income per week (in US dollars) that a sample of high school students earned from part-time work during the summer. It also shows how much they spent online on goods and services each week (again in US dollars). Amazon.com, the marketplace for much of this online spending, wants to analyze this data for potential market-segmentation strategies.

In [1]:
    # Load the required libraries
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns

In [2]:
    # Read the data from the file and display the first five rows
    dataset = pd.read_csv('CLV.csv')
    dataset.head()

Out[2]:
       INCOME  SPEND
    0     233    150
    1     250    187
    2     204    172
    3     236    178
    4     354    163

In [3]:
    # Descriptive statistics of the dataset
    dataset.describe()

Out[3]:
               INCOME       SPEND
    count  303.000000  303.000000
    mean   245.273927  149.646865
    std     48.499412   22.905161
    min    126.000000   71.000000
    25%    211.000000  133.500000
    50%    240.000000  153.000000
    75%    274.000000  166.000000
    max    417.000000  202.000000
In [4]:
    # Visualize the raw data
    sns.pairplot(dataset)

Out[4]: <seaborn.axisgrid.PairGrid at 0x161b44d90>
(seaborn emits a UserWarning here: "The figure layout has changed to tight")

In [5]:
    # Using the elbow method to find the ideal number of clusters
    from sklearn.cluster import KMeans

In [6]:
    # Create an empty list called wcv for use below
    wcv = []
In [7]:
    # Generate the list of within-cluster variations (wcv) for 1 to 10 clusters
    for i in range(1, 11):
        km = KMeans(n_clusters=i)   # initialize the model
        km.fit(dataset)             # fit the model
        wcv.append(km.inertia_)     # append the within-cluster variation to the list

(Each fit emits the same scikit-learn FutureWarning: "The default value of `n_init` will change from 10 to 'auto' in 1.4. Set the value of `n_init` explicitly to suppress the warning." Passing n_init=10 to KMeans silences it without changing the results.)
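Rather than only eyeballing the elbow plot, the elbow can also be estimated numerically as the point where the drop in inertia slows most sharply. A minimal sketch of that idea, using synthetic blob data as a hypothetical stand-in since CLV.csv is not bundled here:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Hypothetical stand-in for the CLV data: four synthetic blobs
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=1.0, random_state=0)

wcv = []
for k in range(1, 11):
    # n_init=10 set explicitly to suppress the FutureWarning seen above
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcv.append(km.inertia_)

# The elbow is roughly where the second difference of the WCV curve is
# largest, i.e. where the decrease in inertia flattens out most abruptly.
second_diff = np.diff(wcv, 2)              # entries correspond to k = 2..9
elbow_k = int(np.argmax(second_diff)) + 2  # map index back to a cluster count
```

This is only a heuristic; on noisy real data it should be sanity-checked against the plot rather than trusted blindly.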
In [8]:
    # Plot wcv versus number of clusters
    # Look for the 'elbow' in the plot for a better idea of how many clusters to use
    plt.plot(range(1, 11), wcv)
    plt.title('Elbow Method')
    plt.xlabel('Number of Clusters')
    plt.ylabel('wcv')
    plt.show()

In [9]:
    # Fit k-means to the dataset (4 clusters, random_state=0)
    km4 = KMeans(n_clusters=4, random_state=0)  # initialize
    y_means = km4.fit_predict(dataset)          # train

(The same n_init FutureWarning appears here.)

In [10]:
    # Add the cluster labels to the original dataset in a column titled LABEL
    dataset['LABEL'] = y_means
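Once the labels are attached to the DataFrame, a groupby gives a quick numeric profile of each segment to complement the scatterplot. A sketch, again on hypothetical synthetic data in place of the real CLV file:

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Hypothetical stand-in for the CLV data (the real CSV isn't bundled here)
X, _ = make_blobs(n_samples=300, centers=4, random_state=0)
df = pd.DataFrame(X, columns=['INCOME', 'SPEND'])

km4 = KMeans(n_clusters=4, n_init=10, random_state=0)
df['LABEL'] = km4.fit_predict(df[['INCOME', 'SPEND']])

# Per-cluster averages and sizes: a compact segment profile table
profile = df.groupby('LABEL').agg(
    mean_income=('INCOME', 'mean'),
    mean_spend=('SPEND', 'mean'),
    size=('SPEND', 'size'),
)
```

On the real data, the resulting table makes the "low income / high spending" style descriptions in the discussion below checkable against actual numbers.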
In [11]:
    # Generate a scatterplot to show the data and how it was clustered
    # Points colored by LABEL value; colors set by the palette
    # legend='full' ensures the entire legend is displayed
    sns.scatterplot(x="INCOME", y="SPEND", hue="LABEL", palette="Set1", legend='full', data=dataset)

Out[11]: <Axes: xlabel='INCOME', ylabel='SPEND'>

Analysis and Discussion (this is subjective)

The plot shows the distribution of the 4 clusters. We could interpret them as the following customer segments:

Cluster 2: Customers with low income and all levels of spending
Cluster 3: Customers with medium income and high spending
Cluster 0: Customers with medium income and low spending
Cluster 1: Customers with high income and primarily medium-to-high spending

Clusters 2 and 1 might be segmented further to arrive at more specific target customer groups.

In [12]:
    # Let's look at how the clusters are created when k=6
    # Drop column LABEL with the 4-cluster classification
    dataset = dataset.drop(columns=['LABEL'])
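One caveat worth noting: the notebook clusters the raw columns, but k-means uses Euclidean distance, and the describe() output shows INCOME has roughly twice the standard deviation of SPEND (about 48 vs. 23), so INCOME dominates the distance metric. A hedged sketch of standardizing first, using synthetic data drawn to mimic those summary statistics:

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in mimicking the CLV summary stats (means/stds from describe())
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'INCOME': rng.normal(245, 48, 303),
    'SPEND': rng.normal(150, 23, 303),
})

# Standardize so INCOME's larger variance doesn't dominate the distance metric
X_scaled = StandardScaler().fit_transform(df)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X_scaled)
```

Whether to scale is a modeling choice; it can change cluster boundaries noticeably, so it is worth comparing both versions on the real data.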
In [13]:
    # Fit k-means to the dataset (6 clusters, random_state=0)
    km6 = KMeans(n_clusters=6, random_state=0)
    y_means = km6.fit_predict(dataset)

(The same n_init FutureWarning appears here.)

In [14]:
    # Add the new cluster labels to the original dataset
    dataset['LABEL'] = y_means

In [15]:
    # Generate a scatterplot to show the data and how it was clustered
    sns.scatterplot(x="INCOME", y="SPEND", hue="LABEL", palette="Set1", legend='full', data=dataset)

Out[15]: <Axes: xlabel='INCOME', ylabel='SPEND'>

Analysis and Discussion (again, this is subjective)

The plot shows the distribution of the 6 clusters. We could interpret them as the following customer segments:

Cluster 3: Low income, high spending
Cluster 1: Low income, low spending
Cluster 4: Medium income, high spending
Cluster 0: Medium income, low spending
Cluster 2: High income, primarily medium-to-high spending
Cluster 5: Very high income, high spending

Based on the 6 clusters, we could formulate marketing strategies relevant to each cluster. A typical strategy might focus certain promotional efforts on the high-value customers of Clusters 2, 4, and 5.
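Beyond visually comparing the k=4 and k=6 scatterplots, a silhouette score gives a quantitative way to judge which clustering separates the data better (higher is better, range -1 to 1). A minimal sketch on hypothetical synthetic data; with the real CLV data, X would simply be the INCOME/SPEND columns:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Hypothetical stand-in data for illustration
X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

# Compare the two cluster counts used in the notebook
scores = {}
for k in (4, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)
```

The silhouette score rewards compact, well-separated clusters, so it complements the elbow method rather than replacing the subjective interpretation of the segments.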
Cluster 3 is a unique customer segment: in spite of their relatively lower income, these customers tend to spend more online, possibly indicating brand loyalty. Discount-based promotional campaigns could help retain this group and its spending levels. For Cluster 1, where both income and spending are low, price-sensitive strategies might be introduced to increase spending by this segment. Customers in Cluster 0 are not spending much in spite of a good income; further analysis of this segment might yield insights into these customers' satisfaction or dissatisfaction, and strategies could then be created accordingly.