Ebooks - Cengage eReader 18
Book Title: eTextbook: Business Analytics
Chapter 5. Descriptive Data Mining
Case Problem 2. Know Thy Customer
Know Thy Customer (KTC) is a financial consulting company that
provides personalized financial advice to its clients. As a basis for
developing this tailored advising, KTC would like to segment its
customers into several representative groups based on key
characteristics. Peyton Blake, the director of KTC’s fledgling analytics
division, plans to establish the set of representative customer profiles
based on 600 customer records in the file KnowThyCustomer. Each
customer record contains data on age, gender, annual income,
marital status, number of children, whether the customer has a car
loan, and whether the customer has a home mortgage. KTC’s market
research staff has determined that these seven characteristics should
form the basis of the customer clustering.
Peyton has invited a summer intern, Danny Riles, into her office so
they can discuss how to proceed. As they review the data on the
computer screen, Peyton’s brow furrows as she realizes that this task
may not be trivial. The data contains both categorical variables
(Female, Married, Car, and Mortgage) and numerical variables (Age,
Income, and Children).
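To make the mixed-variable issue concrete, here is a minimal sketch of the two dissimilarity measures the questions below rely on. The three records are made-up values for illustration only (they are not taken from the KnowThyCustomer file), and the helper names zscore, manhattan, and matching_distance are hypothetical.

```python
import numpy as np

# Hypothetical records (made-up values, NOT from the KnowThyCustomer file).
# Columns: Age, Income, Children, Female, Married, Car, Mortgage
records = np.array([
    [25, 30000, 0, 1, 0, 0, 0],
    [47, 82000, 2, 0, 1, 1, 1],
    [52, 61000, 3, 1, 1, 0, 1],
], dtype=float)


def zscore(x):
    """Standardize each column to mean 0 and standard deviation 1."""
    return (x - x.mean(axis=0)) / x.std(axis=0)


def manhattan(u, v):
    """Manhattan distance: sum of absolute coordinate differences."""
    return float(np.abs(u - v).sum())


def matching_distance(u, v):
    """Matching distance: share of binary variables on which u and v disagree."""
    return float(np.mean(u != v))


normalized = zscore(records)

# Manhattan distance over all seven normalized variables treats the
# 0/1 indicators as if they were measured quantities.
d_all = manhattan(normalized[0], normalized[1])

# Matching distance uses only the four binary indicators.
d_binary = matching_distance(records[0, 3:], records[1, 3:])

print(d_all, d_binary)
```

After normalization a 0/1 indicator contributes to the Manhattan distance on the same footing as Age or Income, which is one reason to consider handling the categorical and numerical variables separately.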
1. Using Manhattan distance to compute dissimilarity between
observations, apply hierarchical clustering on all seven
variables, experimenting with using complete linkage and
group average linkage. Normalize the values of the input
variables. Recommend a set of customer profiles (clusters).
Describe these clusters according to their “average”
characteristics. Why might hierarchical clustering not be a good
method to use for these seven variables?
2. Apply a two-step approach:
a. Using matching distance to compute dissimilarity
between observations, employ hierarchical clustering
with group average linkage to produce four clusters using
the variables Female, Married, Loan, and Mortgage.
b. Based on the clusters from part (a), split the original 600
observations into four separate data sets as suggested by
the four clusters from part (a). For each of these four data
sets, apply k-means clustering with k = 2 using Age,
Income, and Children as variables. Normalize the values
of the input variables. This will generate a total of eight
clusters. Describe these eight clusters according to their
“average” characteristics. What benefit does this two-step
clustering approach have over just using hierarchical
clustering on all seven variables as in part (1) or just
using k-means clustering on all seven variables? What
weakness does it have?
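The two questions above can be sketched with NumPy and SciPy as follows. This is one possible setup under stated assumptions, not the textbook's solution: the data are random stand-ins for the 600 records rather than the real file, the choice of four clusters in part (1) is an arbitrary placeholder (the problem leaves that number open), and the numerical variables are normalized separately within each of the four data sets in part (2b). SciPy's "hamming" metric is used for matching distance because it returns the share of components on which two observations differ.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.cluster.vq import kmeans2
from scipy.spatial.distance import pdist
from scipy.stats import zscore

# Synthetic stand-in for the 600 records (random values, NOT the real file).
rng = np.random.default_rng(0)
n = 600
numeric = np.column_stack([
    rng.integers(18, 70, n),       # Age
    rng.normal(50000, 15000, n),   # Income
    rng.integers(0, 4, n),         # Children
]).astype(float)
# Female, Married, Loan, Mortgage as 0/1 indicators
binary = rng.integers(0, 2, size=(n, 4)).astype(float)

# Part (1): hierarchical clustering on all seven normalized variables
# with Manhattan ("cityblock") distance, trying both linkage methods.
# Cutting at 4 clusters is an arbitrary placeholder.
all_vars = zscore(np.hstack([numeric, binary]), axis=0)
d_manhattan = pdist(all_vars, metric="cityblock")
part1_labels = {
    method: fcluster(linkage(d_manhattan, method=method), t=4, criterion="maxclust")
    for method in ("complete", "average")
}

# Part (2a): matching distance on the binary variables with group
# average linkage, cut so that at most four clusters are formed.
d_matching = pdist(binary, metric="hamming")
step1 = fcluster(linkage(d_matching, method="average"), t=4, criterion="maxclust")

# Part (2b): k-means with k = 2 on Age, Income, and Children,
# run separately inside each cluster from part (2a).
profiles = []
for c in np.unique(step1):
    rows = numeric[step1 == c]
    if len(rows) < 2:  # too small to split into two groups
        continue
    _, sub = kmeans2(zscore(rows, axis=0), 2, minit="++", seed=0)
    for s in np.unique(sub):
        # "Average" characteristics, reported in the original units
        profiles.append((int(c), int(s), rows[sub == s].mean(axis=0)))

for c, s, means in profiles:
    print(f"step-1 cluster {c}, k-means group {s}: {np.round(means, 1)}")
```

With the real data, the array construction at the top would be replaced by loading the file, and each profile could be extended with the shares of Female, Married, Loan, and Mortgage in the corresponding group.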