The Data Mining Best Practices Paper
Valerie Barrett
The University of Arizona Global Campus
BUS625: Data & Decision Analytics
Donald Platine
January 23, 2023
Data mining is a complex process that no individual can carry out alone; it is a team effort. The Cross-Industry Standard Process for Data Mining, otherwise known as CRISP-DM, contains six sequential phases (Hotz, 2023):
1. Business understanding
2. Data understanding
3. Data preparation
4. Modeling
5. Evaluation
6. Deployment
These six phases have become the standard and most common methodology for data mining.
Business understanding is similar to a house's foundation: it is the most essential part. This is the start of a deep understanding of the customers' needs, focusing on objectives and requirements. In business understanding, business objectives are determined, the situation is assessed, and data mining goals are set to produce a project plan. Sharpe (2019) addresses it best, emphasizing the importance of reaching consensus on an exact, correctly formulated problem before continuing with any data mining efforts.
Next is data understanding, which builds on the foundation of business understanding. It directs attention to identifying, gathering, and evaluating the data sets that contribute to achieving project goals. Put simply, this phase is where the initial data is collected, then described, explored, and verified. Most of the collected data will be transactional, recording the transactions between the company and the consumer (Sharpe, 2019, p. 752, para. 3).
Data preparation is the stage that readies the final data for modeling. It involves selecting which data sets will be used and cleaning the data, the most time-consuming task, in which errors and implausible values are removed. Finally, the data is constructed, integrated, and formatted: new attributes are derived, new data sets are created by combining multiple sources, and the results are reformatted.
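As a rough illustration of these tasks, the Python sketch below cleans implausible values, constructs a new attribute, and integrates a second source; all file and column names are hypothetical.

    import pandas as pd

    transactions = pd.read_csv("transactions.csv")
    customers = pd.read_csv("customers.csv")

    # Clean: drop records with errors or implausible values
    transactions = transactions.dropna(subset=["customer_id", "amount"])
    transactions = transactions[transactions["amount"] > 0]

    # Construct: derive a new attribute from an existing one
    transactions["order_month"] = pd.to_datetime(transactions["date"]).dt.month

    # Integrate and format: combine sources and reformat for modeling
    prepared = transactions.merge(customers, on="customer_id", how="left")
    prepared = prepared.rename(columns=str.lower)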
Once all the data has been prepared, the modeling phase begins. This tends to be the shortest phase of the process; its focus is on selecting modeling techniques, generating test designs, and building and assessing models. As the saying goes, practice makes perfect: the team should keep iterating on these models until one is good enough to move forward with.
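One such modeling iteration might look like the following Python sketch, which generates a test design and assesses two candidate models; the synthetic data and the choice of scikit-learn models are assumptions for illustration only.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Stand-in data; a real project would use the prepared data sets
    X, y = make_classification(n_samples=500, random_state=0)

    # Generate a test design: hold out data the models never see
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    # Build and assess candidate models, iterating until one is good enough
    for model in (LogisticRegression(max_iter=1000), RandomForestClassifier()):
        model.fit(X_train, y_train)
        print(type(model).__name__, model.score(X_test, y_test))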
Evaluation comes once the collection of variables has been determined. This phase focuses on which models best meet the business's needs and what to do next: results are evaluated, the process is reviewed, and the next steps are determined.
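In code, the evaluation step might compare a model against a business-driven bar, as in this Python sketch; the accuracy metric and the 0.85 threshold are hypothetical stand-ins for the business's actual needs.

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=500, random_state=0)
    model = LogisticRegression(max_iter=1000)

    # Evaluate results: does the model meet the business's needs?
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print("mean accuracy:", scores.mean())

    # Review the process and determine the next steps
    if scores.mean() >= 0.85:  # hypothetical business threshold
        print("proceed toward deployment")
    else:
        print("return to data preparation or modeling")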
Lastly, the deployment phase of CRISP-DM is not always used. It is the most complex phase and involves four tasks: planning the deployment, monitoring and maintenance, producing the final report, and reviewing the project. Deployment does not always mean the end of the model, so once the model moves into production or to the next level, the CRISP-DM process should be maintained throughout. Continuous monitoring and intermittent tuning are often mandatory to keep everything on track.
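Monitoring can be as simple as periodically comparing live performance with the accuracy measured at deployment, as in this minimal Python sketch; the tolerance and the response are assumptions, not part of CRISP-DM itself.

    def check_model_health(live_accuracy, baseline_accuracy, tolerance=0.05):
        """Flag the model for retuning if live accuracy drifts below baseline."""
        if live_accuracy < baseline_accuracy - tolerance:
            # A real deployment might alert the team or trigger retraining here
            return "retune"
        return "healthy"

    # Example: accuracy has slipped well below the deployment baseline
    print(check_model_health(live_accuracy=0.78, baseline_accuracy=0.86))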
To recap, data mining is the discovery of patterns in large data sets for businesses, and predictive analysis, which extracts knowledge from those correlations and patterns, is used to increase
sales, decrease costs, and improve customer loyalty. Three significant pitfalls associated with
data mining are the complexity of data, privacy and security, and performance (DataUntold,
2022).
Complex data can consume ample time and be extremely expensive to work with. The process is complicated enough to require training, and small businesses tend to steer away from this type of technology because it is difficult to find sufficient data that is not already private or proprietary. Secondly, there are rising privacy concerns regarding data mining. Decision-making by individuals, organizations, and governments requires security around shared data collection. In many of these large data sets, depending on the industry in which the data is collected, customer profiles contain private and sensitive information gathered to understand behavior patterns. The risks here are illegal access and the confidential nature of the information, so extra caution must be taken to ensure all rules and regulations are followed. The last pitfall of concern is performance. Because data mining performance depends on the techniques and algorithms chosen, issues with either directly affect the results, and massive data sets can cause difficulties in distributing the information, which leads back to the complexity of data.
Two poor practices that go hand in hand with these pitfalls are skipping data quality checks and putting a model into production without adequate testing. Failing to detect and correct data quality problems yields worthless results because accurate and timely information is missing, which reflects a lack of management and accountability. When dealing with complex data sets and large amounts of information, it is essential to stay accountable when reviewing the data and to ensure the work is done thoroughly; the same applies to putting a model into production without adequate testing. If these practices are not executed properly, they can distort the test results and jeopardize the entire model. When collecting data at this scale, it is vital to apply accountability and thoroughness throughout. Doing all the hard work to produce these models, only to cut corners at the end, renders everything executed before the results worthless.
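One lightweight guard against the first of these practices is to make the data quality checks explicit and automatic, so the pipeline fails loudly instead of quietly producing worthless results; in this Python sketch the file name, column names, and rules are hypothetical.

    import pandas as pd

    def run_quality_checks(df: pd.DataFrame) -> None:
        """Raise immediately rather than model on bad data."""
        assert not df.empty, "data set is empty"
        assert df["customer_id"].notna().all(), "missing customer IDs"
        assert (df["amount"] > 0).all(), "implausible transaction amounts"
        assert df.duplicated().sum() == 0, "duplicate records present"

    transactions = pd.read_csv("transactions.csv")
    run_quality_checks(transactions)  # only model once the data passes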
McDonald's is an excellent example of a company that has successfully practiced data mining. With over 34,000 restaurants and 8 million people served, it uses JMP data mining software to track its large customer inflow and analyze customer preferences (Mukherjee, 2020). Just as data mining is designed to do, the software analyzes an extensive data set and assesses consumer knowledge to boost the global business while also enabling exemplary customer service. In this example, data mining helps McDonald's analyze factors such as wait times, menu information, and consumer ordering patterns. The company collects its data from drive-thru experiences, mobile app ordering, and in-store digital menus; all the data collected for each consumer is analyzed and can be used to optimize the entire customer experience.
Facebook, by contrast, is a company that could have done better at data mining; among its many blunders is Facebook Beacon. As previously brought up in discussion posts, Beacon was launched as part of Facebook's advertising system in 2007 (Parker, 2019). Through Beacon, unsolicited off-site activities found their way into Facebook feeds and ads. Many Facebook users were unaware that their activities on third-party sites were being shared with Facebook, and there was no way around it, not even a chance to opt out. In 2009, two years after the launch, Beacon was shut down. As many of us Facebook users have seen recently, the lesson was not learned: some third-party sites still share our information with Facebook, resulting in even more unwanted ads. Honestly, Facebook was a stolen idea, and all Mark Zuckerberg saw were dollar signs. To this day he does not care about the users, but honestly, why would he? As much as we all complain about unsolicited and unwanted ads, users who know this still use Facebook without a second thought, since it has become a way of life. There should have been an opt-out security setting that let users decide whether their activities outside of Facebook would be tracked and that data collected, and the opt-out option should appear as soon as the login screen does. Facebook may be gaining valuable information; however, the point of Facebook is not to buy that zero-point turn lawnmower but to keep up with family and friends, share moments of life with others, and so forth.
References
DataUntold. (2022, February 7). Top 13 data mining challenges and pitfalls. https://datauntold.com/data-mining-challenges/
Hotz, N. (2023, January 19). What is CRISP DM? Data Science Process Alliance. https://www.datascience-pm.com/crisp-dm-2/
Parker, T. (2019, January 31). Facebook's biggest data mining fails. Reclaim the Net. https://reclaimthenet.org/facebook-data-mining-fails/
Sharpe, N. D., De Veaux, R. D., & Velleman, P. F. (2019). Business statistics (4th ed.). Pearson.