Python_H - Jupyter Notebook

School

Northeastern University

Course

2510

Subject

Industrial Engineering

Date

Dec 6, 2023

Uploaded by qwertykeyboard27

11/10/23, 12:48 PM Python_H - Jupyter Notebook localhost:8889/notebooks/Python_H.ipynb

The CLV data provides the average net income per week (in US dollars) that a sample of high school students made from part-time work during the summer. It also shows how much they spent online on goods and services each week (again in US dollars). Amazon.com, which was the marketplace for much of this online spending, wants to analyze this data for potential market segmentation strategies.

In [1]:
#Load the required libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
#read the data from the file and display the first five rows
dataset = pd.read_csv('CLV.csv')
dataset.head()

Out[2]:
   INCOME  SPEND
0     233    150
1     250    187
2     204    172
3     236    178
4     354    163

In [3]:
#descriptive statistics of the dataset
dataset.describe()

Out[3]:
           INCOME       SPEND
count  303.000000  303.000000
mean   245.273927  149.646865
std     48.499412   22.905161
min    126.000000   71.000000
25%    211.000000  133.500000
50%    240.000000  153.000000
75%    274.000000  166.000000
max    417.000000  202.000000

In [4]:
#visualize the raw data
sns.pairplot(dataset)

/Users/keneshakhurana/anaconda3/lib/python3.11/site-packages/seaborn/axisgrid.py:118: UserWarning: The figure layout has changed to tight
  self._figure.tight_layout(*args, **kwargs)

Out[4]: <seaborn.axisgrid.PairGrid at 0x15a61e7d0>

In [5]:
#Create a dendrogram of the data using the Ward method
#A horizontal line across the dendrogram can help choose the number of clusters
import scipy.cluster.hierarchy as sch
dend = sch.dendrogram(sch.linkage(dataset, method = "ward"))
plt.title("Dendrogram")
plt.xlabel('Customer')
plt.ylabel('Euclidean Distance')
plt.show()

In [6]:
#Try first with 4 clusters
#Fitting Hierarchical Clustering to the dataset
#Number of clusters = 4 and uses Euclidean distances
from sklearn.cluster import AgglomerativeClustering
hc = AgglomerativeClustering(n_clusters = 4, affinity = 'euclidean') #initiali
y_hc = hc.fit_predict(dataset) #fits the data

/Users/keneshakhurana/anaconda3/lib/python3.11/site-packages/sklearn/cluster/_agglomerative.py:1005: FutureWarning: Attribute `affinity` was deprecated in version 1.2 and will be removed in 1.4. Use `metric` instead
  warnings.warn(

In [7]:
#Adds the clusters to the original dataset in a column titled Label
dataset['Label'] = y_hc
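
The dendrogram comment mentions using a horizontal line to help choose the number of clusters. A minimal sketch of that idea follows. It assumes synthetic stand-in points rather than the CLV data, and the cut height of 200 is a hypothetical value picked for this toy data, not one taken from the notebook. `fcluster` with `criterion="distance"` counts the clusters that remain below the cut.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np
import scipy.cluster.hierarchy as sch

# Synthetic stand-in for the INCOME/SPEND columns (hypothetical values, not the CLV data)
rng = np.random.default_rng(0)
centers = np.array([[200, 130], [250, 170], [330, 150]])
points = np.vstack([c + rng.normal(scale=5, size=(30, 2)) for c in centers])

# Same Ward linkage the notebook passes to sch.dendrogram
Z = sch.linkage(points, method="ward")

# Hypothetical cut height, chosen by eye where the vertical branches are longest
cut_height = 200

sch.dendrogram(Z, no_labels=True)
plt.axhline(y=cut_height, color="red", linestyle="--")  # the horizontal line
plt.title("Dendrogram")
plt.xlabel("Customer")
plt.ylabel("Euclidean Distance")
plt.close()  # in a notebook, call plt.show() here instead

# Each vertical branch crossed by the line corresponds to one cluster
labels = sch.fcluster(Z, t=cut_height, criterion="distance")
n_clusters = len(np.unique(labels))
print(n_clusters)
```

The count printed here could then be passed as `n_clusters` to `AgglomerativeClustering`, which is one way to justify a choice such as 4 or 7 instead of guessing.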

In [8]:
#Generate scatterplot to show data and how it was clustered
#Data colored based on Label values, color set by palette
#legend = full ensures entire legend is displayed
sns.scatterplot(x = 'INCOME', y = 'SPEND', hue = 'Label', palette = 'Set1', legend = 'f

Out[8]: <Axes: xlabel='INCOME', ylabel='SPEND'>

In [9]:
#Try next with 7 clusters (replace n_clusters=4 with n_clusters=7)
dataset1 = pd.read_csv('CLV.csv')
from sklearn.cluster import AgglomerativeClustering
hc1 = AgglomerativeClustering(n_clusters = 7, affinity = 'euclidean') #initial
y_hc1 = hc1.fit_predict(dataset1) #fits the data

/Users/keneshakhurana/anaconda3/lib/python3.11/site-packages/sklearn/cluster/_agglomerative.py:1005: FutureWarning: Attribute `affinity` was deprecated in version 1.2 and will be removed in 1.4. Use `metric` instead
  warnings.warn(

In [10]:
#Adds the clusters to the new dataset in a column titled Label2
dataset1['Label2'] = y_hc1

In [11]:
#Generate scatterplot to show data and how it was clustered
#Data colored based on Label2 values, color set by palette
#legend = full ensures entire legend is displayed
sns.scatterplot(x = 'INCOME', y = 'SPEND', hue = 'Label2', palette = 'Set1', legend =

Out[11]: <Axes: xlabel='INCOME', ylabel='SPEND'>

In [ ]:
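
The scatterplot calls above are cut off at the page edge, so their remaining arguments are not visible. The sketch below shows what a complete call might look like. It assumes `legend = 'full'` (as the cell comment suggests) and a `data=` argument pointing at the labeled DataFrame. The five INCOME/SPEND rows come from the `head()` output, while the `Label2` values are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# First five rows from the notebook's head() output; Label2 values are hypothetical
demo = pd.DataFrame({
    "INCOME": [233, 250, 204, 236, 354],
    "SPEND": [150, 187, 172, 178, 163],
    "Label2": [0, 1, 0, 1, 2],
})

# Assumed completion of the truncated call: legend="full" and data=<labeled frame>
ax = sns.scatterplot(
    x="INCOME",
    y="SPEND",
    hue="Label2",
    palette="Set1",
    legend="full",
    data=demo,
)
plt.close()  # in a notebook, the plot renders inline instead
```

With a numeric `hue` column, `legend="full"` asks seaborn to list every cluster label rather than a sampled subset, which matters once there are 7 clusters.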