MIS 655 Topic 8 DQ 2

How does having more records to base a rule on affect the conclusion (i.e., prediction)? What is the effect of more data on sampling chance in the Naïve Bayes classifier?

More data generally means more predictive power. For sophisticated models such as gradient-boosted trees and random forests, quality data and careful feature engineering reduce errors drastically. But simply having more data is not enough on its own; the saying that businesses just need a lot of data is a myth.

Large amounts of data do give simple models much more power: with a trillion data points, outliers are easier to identify and the underlying distribution of the data is clearer. With ten data points, that is probably not the case, and you will have to perform more sophisticated normalization and transformation routines before the data is useful. Researchers have demonstrated that massive datasets can lower estimation variance and hence improve predictive performance. This is precisely the effect of more data on sampling chance in the Naïve Bayes classifier: each conditional probability it relies on is estimated from record counts, so the more records stand behind each count, the less those estimates are at the mercy of sampling chance, and the less likely a rule (and the prediction it produces) is to reflect a fluke of the sample. More data also increases the probability that the dataset contains useful information, which is advantageous.

However, not all data is helpful. A good example is the clickstream data used by e-commerce companies, where a user's actions are monitored and analyzed. Such data includes which parts of a page are clicked, keywords, cookie data, cursor positions, and which web page components are visible. This is a lot of data arriving rapidly, but only a portion of it is valuable for predicting a user's characteristics and preferences; the rest is noise.

The Naïve Bayes classifier is a simple and versatile classifier. Because its computations are cheap, it works very efficiently on large datasets. However, increasing the number of features in a Naïve Bayes classifier does not always guarantee an improvement in performance. While more features can potentially capture more information, they can also lead to overfitting, increased computational complexity, and the inclusion of irrelevant or redundant information. It is important to consider carefully the quality and relevance of the features being added, to ensure they contribute positively to the classifier's performance. Regularization techniques and feature selection methods can also be used to mitigate the downsides of adding features.
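To make the sample-size point concrete, here is a minimal sketch (not part of the original post) that trains scikit-learn's GaussianNB on progressively larger slices of a synthetic dataset; the dataset, slice sizes, and random seed are all illustrative assumptions. Test accuracy typically climbs and then flattens as the probability estimates stabilize.

```python
# Illustrative sketch (assumptions: synthetic data, arbitrary sample sizes).
# Shows how Naive Bayes accuracy changes as the number of training records grows.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic binary classification problem with 20 features.
X, y = make_classification(n_samples=50_000, n_features=20,
                           n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Train on progressively larger slices; accuracy is measured on the
# same held-out test set each time.
for n in (10, 100, 1_000, 10_000):
    clf = GaussianNB().fit(X_train[:n], y_train[:n])
    print(f"{n:>6} training records -> test accuracy "
          f"{clf.score(X_test, y_test):.3f}")
```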
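A second hedged sketch illustrates the feature-count point: it buries a handful of informative features among pure noise and compares Naïve Bayes with and without scikit-learn's SelectKBest feature selection (again, the dataset and the choice of k are assumptions for illustration, not part of the original post).

```python
# Illustrative sketch (assumptions: synthetic data, k chosen to match the
# number of informative features). Compares Naive Bayes trained on all
# features vs. on a selected subset.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

# 10 informative features hidden among 190 irrelevant ones.
X, y = make_classification(n_samples=2_000, n_features=200,
                           n_informative=10, n_redundant=0, random_state=0)

# Cross-validated accuracy: every feature vs. the 10 best by ANOVA F-score.
acc_all = cross_val_score(GaussianNB(), X, y, cv=5).mean()
acc_sel = cross_val_score(
    make_pipeline(SelectKBest(f_classif, k=10), GaussianNB()),
    X, y, cv=5).mean()
print(f"all 200 features: {acc_all:.3f}")
print(f"best 10 features: {acc_sel:.3f}")
```

The irrelevant columns mostly add variance to the per-feature probability estimates, which is why pruning them tends to help in this setup.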