FDA_Quiz18

pdf

School

Northeastern University *

*We aren’t endorsed by this school

Course

6400

Subject

Information Systems

Date

Dec 6, 2023

Type

pdf

Pages

9

Uploaded by DeaconTurkey3670

11/16/23, 4:20 PM FDA_Quiz18 localhost:8888/nbconvert/html/Downloads/FDA/Quiz/FDA_Quiz18.ipynb?download=false

In [1]:
    import pandas as pd
    import numpy as np
    from datetime import datetime, timedelta
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import KMeans
    import matplotlib.pyplot as plt
    import seaborn as sns

    # Set random seed for reproducibility
    np.random.seed(0)

    # Generate dates within the last year
    def generate_dates(n):
        start_date = datetime.now() - timedelta(days=365)
        return [start_date + timedelta(days=np.random.randint(0, 365)) for _ in range(n)]

    # Generating dataset
    n_customers = 100
    n_transactions = 1000
    customer_ids = np.random.choice(range(1, n_customers + 1), n_transactions)
    dates_of_purchase = generate_dates(n_transactions)
    total_amounts = np.random.uniform(10, 1000, n_transactions)

    df = pd.DataFrame({
        'Customer_ID': customer_ids,
        'Date_of_Purchase': dates_of_purchase,
        'Total_Amount': total_amounts
    })
    df
Out[1]:
         Customer_ID            Date_of_Purchase  Total_Amount
    0             45  2023-11-07 16:18:36.651732    669.212684
    1             48  2023-10-20 16:18:36.651732    406.870768
    2             65  2023-02-11 16:18:36.651732    770.512718
    3             68  2022-11-18 16:18:36.651732    532.437578
    4             68  2023-08-07 16:18:36.651732    245.147907
    ...          ...                         ...           ...
    995           27  2023-02-18 16:18:36.651732    282.256337
    996           49  2023-09-10 16:18:36.651732    505.532084
    997           72  2023-07-28 16:18:36.651732    242.204403
    998           55  2022-11-16 16:18:36.651732    343.777623
    999           97  2023-06-25 16:18:36.651732    198.357904

    1000 rows × 3 columns

Calculate RFM Metrics

In [2]:
    today = datetime.now()

    rfm_df = df.groupby('Customer_ID').agg({
        'Date_of_Purchase': lambda x: (today - x.max()).days,  # Recency
        'Customer_ID': 'count',                                # Frequency
        'Total_Amount': 'sum'                                  # Monetary
    })
    rfm_df.columns = ['Recency', 'Frequency', 'Monetary']
    rfm_df.reset_index(inplace=True)
    print(rfm_df)
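To make the RFM definitions concrete, here is a minimal sketch on a tiny hand-made transaction log. The six rows, the `snapshot` reference date, and the `last_purchase` helper column are assumptions for illustration only; they are not the notebook's data. The sketch uses pandas named aggregation and a fixed reference date so the Recency values do not change between runs.

```python
import pandas as pd

# Tiny hand-made transaction log (hypothetical values, not the notebook's data).
transactions = pd.DataFrame({
    "Customer_ID": [1, 1, 2, 2, 2, 3],
    "Date_of_Purchase": pd.to_datetime([
        "2023-01-10", "2023-03-01",
        "2023-02-15", "2023-02-20", "2023-03-05",
        "2022-12-25",
    ]),
    "Total_Amount": [100.0, 50.0, 20.0, 30.0, 10.0, 500.0],
})

# Fixed reference date (an assumption) so the result does not depend on run time.
snapshot = pd.Timestamp("2023-03-10")

rfm = (
    transactions.groupby("Customer_ID")
    .agg(
        last_purchase=("Date_of_Purchase", "max"),   # most recent purchase
        Frequency=("Date_of_Purchase", "count"),     # number of purchases
        Monetary=("Total_Amount", "sum"),            # total spend
    )
    .reset_index()
)

# Recency = whole days between the reference date and the last purchase.
rfm["Recency"] = (snapshot - rfm["last_purchase"]).dt.days
rfm = rfm[["Customer_ID", "Recency", "Frequency", "Monetary"]]
print(rfm)
```

Customer 3 bought only once, long ago, but spent the most, which is the kind of contrast the three metrics are meant to separate.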
        Customer_ID  Recency  Frequency      Monetary
    0             1       19         18  10494.480803
    1             2       39          7   2350.175988
    2             3       14          7   3228.347700
    3             4        6         15   9447.744024
    4             5        6         10   5771.380852
    ..          ...      ...        ...           ...
    95           96       51          5   4054.394477
    96           97       81          8   3918.264412
    97           98        2          8   3376.680488
    98           99       18          9   4854.750414
    99          100       13         11   6317.456599

    [100 rows x 4 columns]

Standardize the RFM values

In [3]:
    scaler = StandardScaler()
    # Note: rfm_df still contains Customer_ID after reset_index, so the ID
    # column is scaled here and fed to K-means together with R, F and M.
    rfm_scaled = scaler.fit_transform(rfm_df)

Determine the optimal number of clusters

In [4]:
    def find_optimal_clusters(data, max_k=10):
        # Fit K-means for k = 1..max_k and record the inertia of each fit
        distortions = []
        for i in range(1, max_k + 1):
            kmeans = KMeans(n_clusters=i, random_state=0)
            kmeans.fit(data)
            distortions.append(kmeans.inertia_)

        plt.figure(figsize=(8, 6))
        plt.plot(range(1, max_k + 1), distortions, marker='o')
        plt.title('Elbow Method for Optimal k')
        plt.xlabel('Number of Clusters')
        plt.ylabel('Distortion')
        plt.grid()
        plt.show()

    find_optimal_clusters(rfm_scaled)
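In the standardization cell above, `rfm_df` is passed to `StandardScaler` whole, so the `Customer_ID` column is scaled and clustered along with the three RFM columns. A possible variant, sketched here on the first four rows of the printed RFM table, scales only the RFM features. The names `rfm_demo` and `feature_cols` are assumptions introduced for this example, and the notebook's reported clusters were not produced this way.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# First four rows of the RFM table printed above (Monetary rounded to 2 decimals).
rfm_demo = pd.DataFrame({
    "Customer_ID": [1, 2, 3, 4],
    "Recency": [19, 39, 14, 6],
    "Frequency": [18, 7, 7, 15],
    "Monetary": [10494.48, 2350.18, 3228.35, 9447.74],
})

# Scale only the behavioural features; the ID is an arbitrary label.
feature_cols = ["Recency", "Frequency", "Monetary"]
scaled = StandardScaler().fit_transform(rfm_demo[feature_cols])

print(scaled.shape)
print(np.round(scaled.mean(axis=0), 6))  # each column is centred on 0
print(np.round(scaled.std(axis=0), 6))   # each column has unit variance
```

Because K-means relies on Euclidean distance, every scaled column contributes equally, so an identifier left in the matrix affects the segments as much as Recency does.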
C:\Users\galra\anaconda3\Lib\site-packages\sklearn\cluster\_kmeans.py:1412: FutureWarning: The default value of `n_init` will change from 10 to 'auto' in 1.4. Set the value of `n_init` explicitly to suppress the warning
  super()._check_params_vs_input(X, default_n_init=10)
C:\Users\galra\anaconda3\Lib\site-packages\sklearn\cluster\_kmeans.py:1436: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=1.
  warnings.warn(
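Both warnings above name their own remedy: pass `n_init` explicitly, and set `OMP_NUM_THREADS=1`. The sketch below applies both to an elbow-style loop. The random matrix `X` is a stand-in for the scaled RFM data and is an assumption of this example, as is the choice to set the environment variable from Python; it generally needs to be set before the numerical libraries are loaded to have an effect.

```python
import os

# Set before importing numpy/scikit-learn; setdefault keeps any existing value.
os.environ.setdefault("OMP_NUM_THREADS", "1")

import numpy as np
from sklearn.cluster import KMeans

# Stand-in data (assumption): 60 points with 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))

inertias = []
for k in range(1, 6):
    # n_init=10 matches the old default named in the FutureWarning.
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)

print(inertias)
```

With `n_init` given explicitly, the FutureWarning no longer appears, and the results do not depend on which scikit-learn default is in force.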
Apply a clustering technique (like K-means) to segment the customers based on RFM values.

In [9]:
    # The optimal k is 3 or 4, judging by the elbow plot above

In [5]:
    optimal_clusters = 3  # as per the elbow point
    kmeans = KMeans(n_clusters=optimal_clusters, random_state=0)
    rfm_df['Cluster'] = kmeans.fit_predict(rfm_scaled)
    print(rfm_df.head())
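The elbow plot leaves the choice between 3 and 4 clusters open. One common way to break such a tie is the silhouette score, which the notebook does not use; the sketch below shows the idea on synthetic blobs. The data, the blob centers, and the names `scores` and `best_k` are assumptions for illustration, not results for the RFM data.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic, well-separated data with three true groups (assumption).
X, _ = make_blobs(
    n_samples=150,
    centers=[[0, 0], [5, 5], [-5, 5]],
    cluster_std=0.5,
    random_state=0,
)

scores = {}
for k in (2, 3, 4, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    # Mean silhouette lies in [-1, 1]; higher means tighter, better-separated clusters.
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(scores)
print("k with the highest silhouette:", best_k)
```

Applied to the scaled RFM matrix, the same loop would give a numeric comparison of k = 3 and k = 4 to set beside the visual elbow.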
       Customer_ID  Recency  Frequency      Monetary  Cluster
    0            1       19         18  10494.480803        1
    1            2       39          7   2350.175988        2
    2            3       14          7   3228.347700        2
    3            4        6         15   9447.744024        1
    4            5        6         10   5771.380852        2

Result Interpretation and Visualization

In [6]:
    # Mean R, F, M and number of customers per cluster
    segment_analysis = rfm_df.groupby('Cluster').agg({
        'Recency': 'mean',
        'Frequency': 'mean',
        'Monetary': 'mean',
        'Customer_ID': 'count'
    }).reset_index()
    segment_analysis = segment_analysis.rename(columns={'Customer_ID': 'Customer_Count'})
    print(segment_analysis)

       Cluster     Recency  Frequency     Monetary  Customer_Count
    0        0  106.470588   8.411765  4216.244917              17
    1        1   18.805556  12.916667  6887.448334              36
    2        2   28.468085   8.340426  4171.591229              47

In [7]:
    cluster_colors = {0: 'red', 1: 'blue', 2: 'green'}

    plt.figure(figsize=(10, 6))
    sns.scatterplot(x='Recency', y='Frequency', hue='Cluster', data=rfm_df, palette=cluster_colors)
    plt.legend()
    plt.title('Cluster Segments (RFM Clusters)')
    # rfm_df holds the unscaled values, so the axes are in original units
    plt.xlabel('Recency (days)')
    plt.ylabel('Frequency (purchases)')
    plt.show()
Interpretation

Cluster 0 (17 customers):
  • Relatively high Recency (mean of about 106 days since the last purchase)
  • Relatively low Frequency (about 8.4 purchases)
  • Relatively low Monetary value (about 4,216, compared with Cluster 1)
  These customers have not purchased recently and have lower overall spending, so they appear to be the least engaged.

Cluster 1 (36 customers):
  • Very low Recency (about 19 days)
  • Relatively high Frequency (about 12.9 purchases)
  • Relatively high Monetary value (about 6,887, the highest of the three clusters)
  These are recent, frequent, high-value customers and are likely to be the most valuable segment.

Cluster 2 (47 customers):
  • Moderate Recency (about 28 days)
  • Relatively low Frequency (about 8.3 purchases)
  • Relatively low Monetary value (about 4,172, compared with Cluster 1)
  These customers are somewhat engaged but are not as active or high-spending as those in Cluster 1.

Customers in Cluster 1 can be considered the good customers: they have purchased recently, buy frequently, and contribute the most to overall monetary value. Special attention and tailored marketing strategies can be directed towards retaining and further engaging them.
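The comparison above can also be made mechanically. The sketch below ranks the three clusters on each metric, using the segment means printed by the notebook, and sums the ranks. The rank-sum rule and the names `segment_means`, `ranks`, and `best_cluster` are assumptions of this example rather than part of the original analysis.

```python
import pandas as pd

# Segment means as printed in the notebook's segment_analysis output.
segment_means = pd.DataFrame({
    "Cluster": [0, 1, 2],
    "Recency": [106.470588, 18.805556, 28.468085],
    "Frequency": [8.411765, 12.916667, 8.340426],
    "Monetary": [4216.244917, 6887.448334, 4171.591229],
    "Customer_Count": [17, 36, 47],
}).set_index("Cluster")

# Rank 1 = best on each metric: low Recency, high Frequency, high Monetary.
ranks = pd.DataFrame({
    "R_rank": segment_means["Recency"].rank(ascending=True),
    "F_rank": segment_means["Frequency"].rank(ascending=False),
    "M_rank": segment_means["Monetary"].rank(ascending=False),
})
ranks["rank_sum"] = ranks.sum(axis=1)

# The cluster with the smallest rank sum is best across all three metrics.
best_cluster = ranks["rank_sum"].idxmin()
print(ranks)
print("Best cluster by rank sum:", best_cluster)
```

Cluster 1 ranks first on all three metrics, which agrees with the written interpretation. Clusters 0 and 2 differ mainly in Recency, since their Frequency and Monetary means are nearly equal.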