Question
Complete Activity 3.01 on page 145 in Data Science for Marketing Analytics.
Load the necessary packages:
- NumPy
- pandas
- matplotlib.pyplot
- seaborn
- from sklearn.preprocessing import StandardScaler
- from sklearn.cluster import KMeans
You will use the dataset bankloan.csv. Import it as Bank (keeping the dataframe name short makes it easier to use).
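A minimal sketch of the setup step, assuming bankloan.csv has been saved to the working directory. The helper name load_bank and its default path are hypothetical and not part of the activity; the imports are the ones the question lists.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans


def load_bank(path="bankloan.csv"):
    """Read the CSV into a DataFrame. The default path is an assumption."""
    return pd.read_csv(path)


# Usage, assuming the file is in the working directory:
# Bank = load_bank()
# print(Bank.head())   # first five rows
# Bank.info()          # column dtypes and non-null counts
```

Calling Bank.head() returns five rows by default, and Bank.info() prints its summary directly, so it does not need to be wrapped in print().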
- Provide the code and output for the header displaying the first five rows.
- Document the data in the file in a paragraph of 100 words or fewer.
- Provide the code and output for the information about the data.
- Perform standard scaling on the Income and CCAvg columns to create new columns, Income_scaled and CCAvg_scaled. Provide code.
- Provide code and output for descriptive statistics.
- Perform k-means clustering, specifying 3 clusters using Income and CCAvg as the features. Specify random_state as 42. Create a new column, Cluster, containing the predicted cluster from the model. Provide the code. NOTE: You may receive a FutureWarning. As long as your code works, this is fine.
- Visualize the clusters by using different markers and colors for the clusters on a scatter plot between Income and CCAvg. Use ‘Blue’ for your color. Provide code and output (NOTE: You should have three charts in your output).
- Provide the code and output for the average values of Income and CCAvg for the three clusters.
- Perform a visual comparison of the clusters using the standardized values for Income and CCAvg. Use ‘gray’ and ‘black’ for the colors. Provide code and output.
- To understand the clusters better using other relevant features, print the average values against the clusters for the Age, Mortgage, Family, CreditCard, Online, and Personal Loan features and check which cluster has the highest propensity for taking a personal loan. Provide code and output.
- Discuss your findings about the clusters in a summary paragraph of 250 words or fewer, including what they mean to the bank.
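The scaling, clustering, and per-cluster averaging steps listed above could be sketched roughly as follows. This is an illustration under stated assumptions, not the book's solution: it runs on synthetic stand-in data rather than bankloan.csv, the helper scale_and_cluster is hypothetical, and it assumes the model is fitted on the scaled columns. Setting n_init explicitly is one way to avoid the FutureWarning the question mentions.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans


def scale_and_cluster(df, features=("Income", "CCAvg"), k=3, seed=42):
    """Return a copy with <feature>_scaled columns and a Cluster column."""
    out = df.copy()
    cols = list(features)
    scaled = StandardScaler().fit_transform(out[cols])
    scaled_cols = []
    for i, col in enumerate(cols):
        out[f"{col}_scaled"] = scaled[:, i]
        scaled_cols.append(f"{col}_scaled")
    # Assumption: cluster on the standardized features, not the raw ones.
    model = KMeans(n_clusters=k, random_state=seed, n_init=10)
    out["Cluster"] = model.fit_predict(out[scaled_cols])
    return out


# Synthetic stand-in data (made-up values, NOT the real bankloan.csv).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Income": np.concatenate([rng.normal(m, 5, 50) for m in (30, 90, 170)]),
    "CCAvg": np.concatenate([rng.normal(m, 0.2, 50) for m in (0.5, 2.5, 6.0)]),
})

clustered = scale_and_cluster(demo)

# Descriptive statistics, then average feature values per cluster.
print(clustered.describe())
print(clustered.groupby("Cluster")[["Income", "CCAvg"]].mean())
```

With the real data, the same groupby pattern applied to the Age, Mortgage, Family, CreditCard, Online, and Personal Loan columns would give the per-cluster averages the last analysis step asks for.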