The Data Mining Best Practices Paper
Valerie Barrett
The University of Arizona Global Campus
BUS625: Data & Decision Analytics
Donald Platine
January 23, 2023
Data mining is a complex process that no individual can carry out alone; it is a team effort. The Cross-Industry Standard Process for Data Mining, otherwise known as CRISP-DM, contains six sequential phases (Hotz, 2023):
1. Business understanding
2. Data understanding
3. Data preparation
4. Modeling
5. Evaluation
6. Deployment
These six phases have become the standard and most common methodology for data mining.
Business understanding is similar to a house's foundation: it is the most essential part. This is the start of a deep understanding of the customers' needs, focusing on objectives and requirements. In business understanding, business objectives are determined, the situation is assessed, and data mining goals are set to produce a project plan. Sharpe (2019) addresses it best, emphasizing the importance of reaching consensus on an exact, correctly formulated problem before continuing with any data mining efforts.
Next is data understanding, which builds on the foundation of business understanding. It directs attention to identifying, gathering, and evaluating the data sets that contribute to achieving project goals. Put simply, this phase is where the initial data is collected, then described, explored, and verified. Most of the collected data will be transactional, recording the transactions between the company and the consumer (Sharpe, 2019, p. 752, para. 3).
Data preparation is the stage that readies the final data for modeling. It involves selecting which data sets will be used and cleaning the data, the most time-consuming task, in which errors and implausible values are removed. Finally, the data is constructed, integrated, and formatted: new attributes are derived, new data sets are created by combining multiple sources, and the results are reformatted.
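As a rough illustration of these tasks, the Python sketch below cleans implausible values, constructs a new attribute, and integrates a second source; all file and column names are hypothetical.

    import pandas as pd

    transactions = pd.read_csv("transactions.csv")
    customers = pd.read_csv("customers.csv")

    # Clean: drop records with errors or implausible values
    transactions = transactions.dropna(subset=["customer_id", "amount"])
    transactions = transactions[transactions["amount"] > 0]

    # Construct: derive a new attribute from an existing one
    transactions["order_month"] = pd.to_datetime(transactions["date"]).dt.month

    # Integrate and format: combine sources and reformat for modeling
    prepared = transactions.merge(customers, on="customer_id", how="left")
    prepared = prepared.rename(columns=str.lower)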
Once all the data has been prepared, the modeling phase begins. This tends to be the shortest phase of the process; its focus is on selecting modeling techniques, generating test designs, and building and assessing models. As the saying goes, practice makes perfect: the team should keep iterating on these models until one is good enough to move forward with.
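One such modeling iteration might look like the following Python sketch, which generates a test design and assesses two candidate models; the synthetic data and the choice of scikit-learn models are assumptions for illustration only.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Stand-in data; a real project would use the prepared data sets
    X, y = make_classification(n_samples=500, random_state=0)

    # Generate a test design: hold out data the models never see
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    # Build and assess candidate models, iterating until one is good enough
    for model in (LogisticRegression(max_iter=1000), RandomForestClassifier()):
        model.fit(X_train, y_train)
        print(type(model).__name__, model.score(X_test, y_test))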
Evaluation comes once the collection of variables has been determined. This phase focuses on which models best meet the business's needs and what to do next: results are evaluated, the process is reviewed, and the next steps are determined.
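In code, the evaluation step might compare a model against a business-driven bar, as in this Python sketch; the accuracy metric and the 0.85 threshold are hypothetical stand-ins for the business's actual needs.

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=500, random_state=0)
    model = LogisticRegression(max_iter=1000)

    # Evaluate results: does the model meet the business's needs?
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print("mean accuracy:", scores.mean())

    # Review the process and determine the next steps
    if scores.mean() >= 0.85:  # hypothetical business threshold
        print("proceed toward deployment")
    else:
        print("return to data preparation or modeling")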
Lastly, the deployment phase of CRISP-DM is not always used. It is the most complex phase and involves four tasks: planning the deployment, monitoring and maintenance, producing the final report, and reviewing the project. Deployment does not always mean the end of the model, so once the model moves into production or to the next level, the CRISP-DM process should be maintained throughout. Continuous monitoring and intermittent tuning are often mandatory to keep everything on track.
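Monitoring can be as simple as periodically comparing live performance with the accuracy measured at deployment, as in this minimal Python sketch; the tolerance and the response are assumptions, not part of CRISP-DM itself.

    def check_model_health(live_accuracy, baseline_accuracy, tolerance=0.05):
        """Flag the model for retuning if live accuracy drifts below baseline."""
        if live_accuracy < baseline_accuracy - tolerance:
            # A real deployment might alert the team or trigger retraining here
            return "retune"
        return "healthy"

    # Example: accuracy has slipped well below the deployment baseline
    print(check_model_health(live_accuracy=0.78, baseline_accuracy=0.86))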
To recap, data mining is the discovery of patterns in large data sets for businesses, and predictive analysis, which extracts knowledge from those correlations and patterns, is used to increase
sales, decrease costs, and improve customer loyalty. Three significant pitfalls associated with
data mining are the complexity of data, privacy and security, and performance (DataUntold,
2022).
Complex data can consume ample time and be extremely expensive to work with. The process is complicated enough to require training, and small businesses tend to steer away from this type of technology because it is difficult to find sufficient data that is not already private or proprietary. Secondly, there are rising privacy concerns regarding data mining. Decision-making by individuals, organizations, and governments requires security around shared data collection. In many of these large data sets, depending on the industry in which the data is collected, customer profiles contain private and sensitive information gathered to understand behavior patterns. The risks here are illegal access and the confidential nature of the information, so extra caution must be taken to ensure all rules and regulations are followed. The last pitfall of concern is performance. Because data mining performance depends on the techniques and algorithms chosen, issues with either directly affect the results, and massive data sets can cause difficulties in distributing the information, which leads back to the complexity of data.
Two poor practices that go hand in hand with these pitfalls are skipping data quality checks and putting a model into production without adequate testing. Failing to detect and correct data quality problems yields worthless results because accurate and timely information is missing, which reflects a lack of management and accountability. When dealing with complex data sets and large amounts of information, it is essential to stay accountable when reviewing the data and to ensure the work is done thoroughly; the same applies to putting a model into production without adequate testing. If these practices are not executed properly, they can distort the test results and jeopardize the entire model. When collecting data at this scale, it is vital to apply accountability and thoroughness throughout. Doing all the hard work to produce these models, only to cut corners at the end, renders everything executed before the results worthless.
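One lightweight guard against the first of these practices is to make the data quality checks explicit and automatic, so the pipeline fails loudly instead of quietly producing worthless results; in this Python sketch the file name, column names, and rules are hypothetical.

    import pandas as pd

    def run_quality_checks(df: pd.DataFrame) -> None:
        """Raise immediately rather than model on bad data."""
        assert not df.empty, "data set is empty"
        assert df["customer_id"].notna().all(), "missing customer IDs"
        assert (df["amount"] > 0).all(), "implausible transaction amounts"
        assert df.duplicated().sum() == 0, "duplicate records present"

    transactions = pd.read_csv("transactions.csv")
    run_quality_checks(transactions)  # only model once the data passes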
McDonald's is an excellent example of a company that has successfully practiced data mining. With over 34,000 restaurants and 8 million people served, it uses JMP data mining software to track its large customer inflow and analyze customer preferences (Mukherjee, 2020). Just as data mining is designed to do, the software analyzes an extensive data set and assesses consumer knowledge to boost the global business while also enabling exemplary customer service. In this example, data mining helps McDonald's analyze factors such as wait times, menu information, and consumer ordering patterns. The company collects its data from drive-thru experiences, mobile app ordering, and in-store digital menus; all the data collected for each consumer is analyzed and can be used to optimize the entire customer experience.
Facebook, by contrast, is a company that could have done better at data mining; among its many blunders is Facebook Beacon. As previously brought up in discussion posts, Beacon was launched as part of Facebook's advertising system in 2007 (Parker, 2019). Through Beacon, unsolicited off-site activities found their way into Facebook feeds and ads. Many Facebook users were unaware that their activities on third-party sites were being shared with Facebook, and there was no way around it, not even a chance to opt out. In 2009, two years after the launch, Beacon was shut down. As many of us Facebook users have seen recently, the lesson was not learned: some third-party sites still share our information with Facebook, resulting in even more unwanted ads. Honestly, Facebook was a stolen idea, and all Mark Zuckerberg saw were dollar signs. To this day he does not care about the users, but honestly, why would he? As much as we all complain about unsolicited and unwanted ads, users who know this still use Facebook without a second thought, since it has become a way of life. There should have been an opt-out security setting that let users decide whether their activities outside of Facebook would be tracked and that data collected, and the opt-out option should appear as soon as the login screen does. Facebook may be gaining valuable information; however, the point of Facebook is not to buy that zero-point turn lawnmower but to keep up with family and friends, share moments of life with others, and so forth.
References
DataUntold. (2022, February 7). Top 13 data mining challenges and pitfalls. https://datauntold.com/data-mining-challenges/
Hotz, N. (2023, January 19). What is CRISP DM? Data Science Process Alliance. https://www.datascience-pm.com/crisp-dm-2/
Parker, T. (2019, January 31). Facebook's biggest data mining fails. Reclaim the Net. https://reclaimthenet.org/facebook-data-mining-fails/
Sharpe, N. D., De Veaux, R. D., & Velleman, P. F. (2019). Business statistics (4th ed.). Pearson.