FDA_Quiz18
School: Northeastern University
Course: 6400
Subject: Information Systems
Date: Dec 6, 2023
Pages: 9
11/16/23, 4:20 PM
FDA_Quiz18
localhost:8888/nbconvert/html/Downloads/FDA/Quiz/FDA_Quiz18.ipynb?download=false
1/9
In [1]:
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
import seaborn as sns

# Set random seed for reproducibility
np.random.seed(0)

# Generate dates within the last year
def generate_dates(n):
    start_date = datetime.now() - timedelta(days=365)
    return [start_date + timedelta(days=np.random.randint(0, 365)) for _ in range(n)]

# Generating dataset
n_customers = 100
n_transactions = 1000
customer_ids = np.random.choice(range(1, n_customers + 1), n_transactions)
dates_of_purchase = generate_dates(n_transactions)
total_amounts = np.random.uniform(10, 1000, n_transactions)
df = pd.DataFrame({
    'Customer_ID': customer_ids,
    'Date_of_Purchase': dates_of_purchase,
    'Total_Amount': total_amounts
})
df
Out[1]:
     Customer_ID            Date_of_Purchase  Total_Amount
0             45  2023-11-07 16:18:36.651732    669.212684
1             48  2023-10-20 16:18:36.651732    406.870768
2             65  2023-02-11 16:18:36.651732    770.512718
3             68  2022-11-18 16:18:36.651732    532.437578
4             68  2023-08-07 16:18:36.651732    245.147907
...          ...                         ...           ...
995           27  2023-02-18 16:18:36.651732    282.256337
996           49  2023-09-10 16:18:36.651732    505.532084
997           72  2023-07-28 16:18:36.651732    242.204403
998           55  2022-11-16 16:18:36.651732    343.777623
999           97  2023-06-25 16:18:36.651732    198.357904
1000 rows × 3 columns

Calculate RFM Metrics
In [2]:
today = datetime.now()
rfm_df = df.groupby('Customer_ID').agg({
    'Date_of_Purchase': lambda x: (today - x.max()).days,  # Recency
    'Customer_ID': 'count',  # Frequency
    'Total_Amount': 'sum'  # Monetary
})
rfm_df.columns = ['Recency', 'Frequency', 'Monetary']
rfm_df.reset_index(inplace=True)
print(rfm_df)
    Customer_ID  Recency  Frequency      Monetary
0             1       19         18  10494.480803
1             2       39          7   2350.175988
2             3       14          7   3228.347700
3             4        6         15   9447.744024
4             5        6         10   5771.380852
..          ...      ...        ...           ...
95           96       51          5   4054.394477
96           97       81          8   3918.264412
97           98        2          8   3376.680488
98           99       18          9   4854.750414
99          100       13         11   6317.456599
[100 rows x 4 columns]
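The RFM step above takes "today" from `datetime.now()`, so Recency shifts every time the notebook is rerun. A minimal sketch of the same calculation with a fixed snapshot date and pandas named aggregation is shown here. The function name `compute_rfm`, the `snapshot` parameter, and the three-row toy frame are assumptions made for illustration and are not part of the original notebook.

```python
import pandas as pd

def compute_rfm(transactions: pd.DataFrame, snapshot: pd.Timestamp) -> pd.DataFrame:
    """Return one row per customer with Recency (days), Frequency, and Monetary."""
    return (
        transactions.groupby("Customer_ID")
        .agg(
            # Days between the snapshot date and the customer's latest purchase
            Recency=("Date_of_Purchase", lambda d: (snapshot - d.max()).days),
            # Number of transactions per customer
            Frequency=("Date_of_Purchase", "count"),
            # Total amount spent per customer
            Monetary=("Total_Amount", "sum"),
        )
        .reset_index()
    )

# Hypothetical toy data, only to show the shape of the result
toy = pd.DataFrame({
    "Customer_ID": [1, 1, 2],
    "Date_of_Purchase": pd.to_datetime(["2023-11-01", "2023-11-10", "2023-10-01"]),
    "Total_Amount": [100.0, 50.0, 20.0],
})
rfm_toy = compute_rfm(toy, snapshot=pd.Timestamp("2023-11-16"))
print(rfm_toy)
```

Named aggregation sets the output column names directly, so the separate `rfm_df.columns = [...]` assignment is not needed in this variant.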
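Standardize the RFM values
In [3]:
scaler = StandardScaler()
# Note: rfm_df still contains Customer_ID at this point, so the identifier
# is scaled (and later clustered) together with the three RFM columns
rfm_scaled = scaler.fit_transform(rfm_df)
Determine the optimal number of clusters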
In [4]:
def find_optimal_clusters(data, max_k=10):
    distortions = []
    for i in range(1, max_k + 1):
        kmeans = KMeans(n_clusters=i, random_state=0)
        kmeans.fit(data)
        distortions.append(kmeans.inertia_)

    plt.figure(figsize=(8, 6))
    plt.plot(range(1, max_k + 1), distortions, marker='o')
    plt.title('Elbow Method for Optimal k')
    plt.xlabel('Number of Clusters')
    plt.ylabel('Distortion')
    plt.grid()
    plt.show()

find_optimal_clusters(rfm_scaled)
C:\Users\galra\anaconda3\Lib\site-packages\sklearn\cluster\_kmeans.py:1412: FutureWarning: The default value of `n_init` will change from 10 to 'auto' in 1.4. Set the value of `n_init` explicitly to suppress the warning
  super()._check_params_vs_input(X, default_n_init=10)
C:\Users\galra\anaconda3\Lib\site-packages\sklearn\cluster\_kmeans.py:1436: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=1.
  warnings.warn(
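The FutureWarning above asks for `n_init` to be set explicitly. The sketch below does that and also reports the silhouette score next to the inertia, as a second opinion on the elbow reading. The helper name `score_k`, the range of k values, and the synthetic three-blob data are assumptions for illustration. The silhouette score is a swapped-in complement here, not something the original notebook computes.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

def score_k(features: np.ndarray, k_values=range(2, 7)) -> dict:
    """Map each k to (inertia, silhouette score) on standardized features."""
    scaled = StandardScaler().fit_transform(features)
    results = {}
    for k in k_values:
        # n_init is set explicitly, which avoids the FutureWarning
        model = KMeans(n_clusters=k, n_init=10, random_state=0)
        labels = model.fit_predict(scaled)
        results[k] = (model.inertia_, silhouette_score(scaled, labels))
    return results

# Hypothetical data: three well-separated blobs in three dimensions
rng = np.random.default_rng(0)
demo = np.vstack([rng.normal(loc=c, scale=0.3, size=(30, 3)) for c in (0.0, 5.0, 10.0)])
scores = score_k(demo)
for k, (inertia, sil) in scores.items():
    print(k, round(inertia, 2), round(sil, 3))
```

On clearly separated data the silhouette score peaks at the true number of groups. On uniformly random data such as the transactions generated in this notebook, both curves are likely to be much flatter, which may explain why the elbow is ambiguous between 3 and 4.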
Apply a clustering technique (like K-means) to segment the customers based on RFM values.
In [9]:
# The optimal k is 3 or 4, judging by the elbow plot above
In [5]:
optimal_clusters = 3  # as per the elbow point
kmeans = KMeans(n_clusters=optimal_clusters, random_state=0)
rfm_df['Cluster'] = kmeans.fit_predict(rfm_scaled)
print(rfm_df.head())
C:\Users\galra\anaconda3\Lib\site-packages\sklearn\cluster\_kmeans.py:1412: FutureWarning: The default value of `n_init` will change from 10 to 'auto' in 1.4. Set the value of `n_init` explicitly to suppress the warning
  super()._check_params_vs_input(X, default_n_init=10)
C:\Users\galra\anaconda3\Lib\site-packages\sklearn\cluster\_kmeans.py:1436: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=1.
  warnings.warn(
   Customer_ID  Recency  Frequency      Monetary  Cluster
0            1       19         18  10494.480803        1
1            2       39          7   2350.175988        2
2            3       14          7   3228.347700        2
3            4        6         15   9447.744024        1
4            5        6         10   5771.380852        2
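One thing worth checking in the clustering step: the scaler in In [3] was fit on the whole `rfm_df`, which at that point includes `Customer_ID`, so an arbitrary identifier takes part in the distance calculation. The following is a minimal sketch of a variant that clusters on the three RFM columns only. The names `RFM_COLUMNS` and `cluster_rfm` are assumptions for illustration. The sample rows are the ten customers printed in the RFM output above, so the labels will not necessarily match the notebook's full 100-customer run.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

RFM_COLUMNS = ["Recency", "Frequency", "Monetary"]

def cluster_rfm(rfm: pd.DataFrame, n_clusters: int = 3) -> pd.DataFrame:
    """Return a copy of rfm with a Cluster column fitted on the RFM columns only."""
    # Customer_ID is deliberately left out of the feature matrix
    scaled = StandardScaler().fit_transform(rfm[RFM_COLUMNS])
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    out = rfm.copy()
    out["Cluster"] = model.fit_predict(scaled)
    return out

# The ten rows shown in the printed RFM table (customers 1-5 and 96-100)
sample = pd.DataFrame({
    "Customer_ID": [1, 2, 3, 4, 5, 96, 97, 98, 99, 100],
    "Recency": [19, 39, 14, 6, 6, 51, 81, 2, 18, 13],
    "Frequency": [18, 7, 7, 15, 10, 5, 8, 8, 9, 11],
    "Monetary": [10494.480803, 2350.175988, 3228.347700, 9447.744024, 5771.380852,
                 4054.394477, 3918.264412, 3376.680488, 4854.750414, 6317.456599],
})
clustered = cluster_rfm(sample)
print(clustered)
```

Because the identifier is excluded, renumbering the customers leaves the cluster assignments unchanged, which is the property a segmentation on behaviour should have.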
Result Interpretation and Visualization
In [6]:
segment_analysis = rfm_df.groupby('Cluster').agg({
    'Recency': 'mean',
    'Frequency': 'mean',
    'Monetary': 'mean',
    'Customer_ID': 'count'
}).reset_index()
segment_analysis = segment_analysis.rename(columns={'Customer_ID': 'Customer_Count'})
print(segment_analysis)
   Cluster     Recency  Frequency     Monetary  Customer_Count
0        0  106.470588   8.411765  4216.244917              17
1        1   18.805556  12.916667  6887.448334              36
2        2   28.468085   8.340426  4171.591229              47
In [7]:
cluster_colors = {0: 'red', 1: 'blue', 2: 'green'}
plt.figure(figsize=(10, 6))
# rfm_df holds the unscaled values, so the axes show raw Recency and Frequency
sns.scatterplot(x='Recency', y='Frequency', hue='Cluster', data=rfm_df, palette=cluster_colors)
plt.legend()
plt.title('Cluster Segments (RFM Clusters)')
plt.xlabel('Recency (days)')
plt.ylabel('Frequency (purchases)')
plt.show()
Interpretation
Cluster 0: relatively high Recency, with lower Frequency and Monetary values than Cluster 1.
These customers have not purchased recently and have lower overall spending.
Cluster 1: very low Recency, high Frequency, and high Monetary value compared with the other clusters.
These are recent, frequent, high-value customers and are likely to be the most valuable segment.
Cluster 2: moderate Recency, with lower Frequency and Monetary values than Cluster 1.
These customers are somewhat engaged but are not as active or high-spending as those in Cluster 1.
Customers in Cluster 1 can be considered the good customers: they have purchased recently, buy frequently, and
contribute the most to overall monetary value. Tailored marketing strategies can be directed at
retaining and further engaging these customers.
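The comparison behind this interpretation can be made explicit by ranking the clusters on each metric. The sketch below uses the cluster means printed by In [6]. The rank-sum score and the column name `Rank_Sum` are assumptions for illustration: one simple way to order segments, not a method the original notebook uses.

```python
import pandas as pd

# Cluster means and counts as printed in the segment analysis output
segments = pd.DataFrame({
    "Cluster": [0, 1, 2],
    "Recency": [106.470588, 18.805556, 28.468085],
    "Frequency": [8.411765, 12.916667, 8.340426],
    "Monetary": [4216.244917, 6887.448334, 4171.591229],
    "Customer_Count": [17, 36, 47],
})

# Lower Recency is better (rank ascending); higher Frequency and Monetary
# are better (rank descending). Rank 1 is the best on each metric.
ranks = pd.DataFrame({
    "R": segments["Recency"].rank(ascending=True),
    "F": segments["Frequency"].rank(ascending=False),
    "M": segments["Monetary"].rank(ascending=False),
})
segments["Rank_Sum"] = ranks.sum(axis=1)

best = int(segments.loc[segments["Rank_Sum"].idxmin(), "Cluster"])
print(segments.sort_values("Rank_Sum"))
print("Best segment:", best)  # Cluster 1 ranks first on all three metrics
```

Cluster 1 ranks first on every metric, which matches the reading above. Clusters 0 and 2 differ mainly in Recency, since their Frequency and Monetary means are nearly identical.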