Python_K - Jupyter Notebook
School: Northeastern University
Course: 2510
Subject: Industrial Engineering
Date: Dec 6, 2023
Pages: 7
Uploaded by qwertykeyboard27
Printed 11/10/23, 12:48 PM from localhost:8889/notebooks/Python_K.ipynb
The CLV data provides the average net income per week (dollars) that a sample of high school students made from part-time work during the summer. It also shows how much they spent online for goods and services each week (again in $US). Amazon.com (which was the marketplace for much of this online spending) wants to analyze this data for potential market segmentation strategies.

In [1]:
#Load the required libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
#read the data from the file and display the first five rows
dataset = pd.read_csv('CLV.csv')
dataset.head()

Out[2]:
   INCOME  SPEND
0     233    150
1     250    187
2     204    172
3     236    178
4     354    163

In [3]:
#descriptive statistics of the dataset
dataset.describe()

Out[3]:
           INCOME       SPEND
count  303.000000  303.000000
mean   245.273927  149.646865
std     48.499412   22.905161
min    126.000000   71.000000
25%    211.000000  133.500000
50%    240.000000  153.000000
75%    274.000000  166.000000
max    417.000000  202.000000
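One thing the descriptive statistics reveal: INCOME and SPEND sit on different scales (std of about 48.5 vs about 22.9), so Euclidean distances in K-means are dominated by INCOME. An optional preprocessing step, not used in this notebook, is to standardize both columns first. A minimal sketch, with a small hypothetical frame standing in for CLV.csv:

```python
#Hypothetical sketch: standardizing features before K-means.
#CLV.csv is not reproduced here, so a tiny synthetic frame stands in for it.
import pandas as pd
from sklearn.preprocessing import StandardScaler

dataset = pd.DataFrame({"INCOME": [233, 250, 204, 236, 354],
                        "SPEND":  [150, 187, 172, 178, 163]})

scaler = StandardScaler()
scaled = scaler.fit_transform(dataset)  #each column now has mean 0 and std 1
```

After scaling, both variables contribute comparably to the distance computations that drive the clustering.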
In [4]:
#visualize the raw data
sns.pairplot(dataset)

Out[4]:
<seaborn.axisgrid.PairGrid at 0x161b44d90>

In [5]:
#Using the elbow method to find the ideal number of clusters
from sklearn.cluster import KMeans

In [6]:
#creates an empty list called wcv for use below
wcv = []
In [7]:
#Generate a list of within cluster variations (wcv) for clusters of size 1 to 10
for i in range(1, 11):
    km = KMeans(n_clusters=i)    #initialize the model
    km.fit(dataset)              #fit the model
    wcv.append(km.inertia_)      #append the within cluster variation to the list

/Users/keneshakhurana/anaconda3/lib/python3.11/site-packages/sklearn/cluster/_kmeans.py:1412: FutureWarning: The default value of `n_init` will change from 10 to 'auto' in 1.4. Set the value of `n_init` explicitly to suppress the warning
  super()._check_params_vs_input(X, default_n_init=10)
(warning repeated once per fit; the other nine copies are omitted)
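Recent scikit-learn releases emit that FutureWarning whenever `n_init` is left at its default. The same elbow loop with the warning silenced, run on synthetic stand-in data (CLV.csv is not included here):

```python
#Sketch: the FutureWarning disappears if n_init is set explicitly.
#Synthetic 2-D data with three well-separated blobs stands in for the CLV dataset.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in (0, 5, 10)])

wcv = []
for i in range(1, 11):
    km = KMeans(n_clusters=i, n_init=10, random_state=0)  #explicit n_init
    km.fit(X)
    wcv.append(km.inertia_)

#Inertia shrinks as k grows; the 'elbow' is where the decrease levels off.
```

Setting `random_state` as well makes the loop reproducible from run to run.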
In [8]:
#plot wcv versus number of clusters
#Look for the 'elbow' in the plot for a better idea of how many clusters to use
plt.plot(range(1, 11), wcv)
plt.title('Elbow Method')
plt.xlabel('Number of Clusters')
plt.ylabel('wcv')
plt.show()

In [9]:
#Fitting kmeans to the dataset (for 4 clusters and random state = 0)
km4 = KMeans(n_clusters=4, random_state=0)    #initialize
y_means = km4.fit_predict(dataset)            #train

In [10]:
#Add the cluster labels to the original dataset in a column titled LABEL
dataset['LABEL'] = y_means
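Once a LABEL column exists, the segments can be characterized numerically with per-cluster means via `groupby`. A sketch on hypothetical data (CLV.csv is not reproduced here, and the column values below are invented for illustration):

```python
#Sketch: profiling segments by per-cluster means, assuming a LABEL column
#like the one created above; two obvious blobs stand in for the real data.
import pandas as pd
from sklearn.cluster import KMeans

dataset = pd.DataFrame({"INCOME": [130, 140, 135, 300, 310, 305],
                        "SPEND":  [ 80,  90,  85, 180, 190, 185]})
km = KMeans(n_clusters=2, n_init=10, random_state=0)
dataset["LABEL"] = km.fit_predict(dataset)

#One row per segment: average income and average spend
profile = dataset.groupby("LABEL")[["INCOME", "SPEND"]].mean()
print(profile)
```

A table like this makes the "low income / high spend" style descriptions in the discussion below easy to read directly off the numbers.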
In [11]:
#Generate scatterplot to show data and how it was clustered
#Data colored based on LABEL values, color set by palette
#legend = 'full' ensures the entire legend is displayed
sns.scatterplot(x="INCOME", y="SPEND", hue="LABEL", palette="Set1", legend='full')

Out[11]:
<Axes: xlabel='INCOME', ylabel='SPEND'>

Analysis and Discussion (this is subjective): The plot shows the distribution of the 4 clusters. We could interpret them as the following customer segments:

Cluster 2: Customers with low income and all levels of spending
Cluster 3: Customers with a medium income and high spending
Cluster 0: Customers with a medium income and low spending
Cluster 1: Customers with a high income and primarily medium to high spending

Clusters 2 and 1 might be segmented further to arrive at more specific target customer groups.

In [12]:
#Let's look at how the clusters are created when k=6
#Drop column LABEL with the 4-cluster classification
dataset = dataset.drop(columns=['LABEL'])
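Before re-clustering, the choice between k=4 and k=6 can also be informed by the silhouette score, where higher values indicate tighter, better-separated clusters. A sketch on synthetic data with four true blobs, since CLV.csv is not available here:

```python
#Sketch: comparing candidate k values with the silhouette score.
#Four well-separated synthetic blobs stand in for the CLV dataset.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.4, size=(60, 2)) for c in (0, 4, 8, 12)])

scores = {}
for k in (4, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)  #mean silhouette over all points
print(scores)
```

On data like this, the k matching the true number of blobs scores higher; on real data the comparison is a guide, not a verdict.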
In [13]:
#Fitting kmeans to the dataset (for 6 clusters and random state = 0)
km6 = KMeans(n_clusters=6, random_state=0)
y_means = km6.fit_predict(dataset)

In [14]:
#Adds the new cluster labels to the original dataset
dataset['LABEL'] = y_means

In [15]:
#Generate scatterplot to show data and how it was clustered
sns.scatterplot(x="INCOME", y="SPEND", hue="LABEL", palette="Set1", legend='full')

Out[15]:
<Axes: xlabel='INCOME', ylabel='SPEND'>

Analysis and Discussion (again, this is subjective): The plot shows the distribution of the 6 clusters. We could interpret them as the following customer segments:

Cluster 3: Low income, high spending
Cluster 1: Low income, low spending
Cluster 4: Medium income, high spending
Cluster 0: Medium income, low spending
Cluster 2: High income, primarily medium to high spending
Cluster 5: Very high income, high spending

Based on the 6 clusters, we could formulate marketing strategies relevant to each cluster:

A typical strategy might focus certain promotional efforts on the high-value customers of Clusters 2, 4, and 5.
Cluster 3 is a unique customer segment: in spite of their relatively lower income, these customers tend to spend more online, possibly indicating brand loyalty. Discount-based promotional campaigns could be run for this group to retain them and their spending levels.

For Cluster 1, where both income and spending are low, price-sensitive strategies might be introduced to increase spending by this segment.

Customers in Cluster 0 are not spending much in spite of a good income. Further analysis of this segment might yield insights into these customers' satisfaction or dissatisfaction, and strategies could then be created accordingly.