Validation set

School: Slippery Rock University of Pennsylvania
Course: MISC
Subject: Industrial Engineering
Date: Dec 6, 2023
Uploaded by DrClover9807

Validation set: data used to tune parameters in models.
Testing set: data used to assess the likely future performance of a model.
Supervised: the training data includes both the input (x) and the result (y).
Unsupervised: the model is NOT provided with the results (y) during training.

Supervised or unsupervised?
Classification: supervised
Regression: supervised
Ranking: supervised
Clustering: unsupervised
Co-occurrence grouping / frequent itemset: unsupervised
Data reduction: both

CRISP-DM: Cross Industry Standard Process for Data Mining. A process that places a structure on the problem life cycle, with 6 phases used to maintain reasonable consistency, repeatability, and objectiveness.
6 phases of CRISP-DM: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment.
Data leakage: a variable collected in historical data gives information on the target variable (information that appears in the historical data but is not actually available when the decision has to be made).
Imputation: replacing missing data with substituted values estimated from the data set.
Categorical/nominal data: data that has two or more categories, with no intrinsic ordering to the categories.
Ordinal data: data that has two or more categories, with a clear ordering of the categories.
3 types of data visualization for numerical attributes: box plot, histogram, scatter plot.
3 types of data visualization for categorical attributes: bar plot, dot plot, mosaic plot.

Company X wants to know how much return on investment it is going to get based on the funds it has invested in marketing a new product. What type of problem is this? Regression.
True/False: The difference between supervised and unsupervised learning is that supervised learning has a categorical target variable and unsupervised learning has a numeric target variable. False.
True/False: The best way to deal with missing values in a feature is to always remove observations with missing values. False.
True/False: When implementing CRISP-DM, a data scientist often needs to go through the process for several iterations. True.

Predictive model: a formula learned from old data for estimating the unknown value of interest for some new data.
Pure: certain in the outcome; homogeneous with respect to the target variable.
Entropy: measures uncertainty/impurity.

Fill in the blanks (answer: higher, more):
The ____ the entropy value, the ____ uncertain/impure the data is.
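
The relationship between entropy and purity can be checked numerically. The sketch below is not from the notes: it assumes the usual Shannon definition, H = -sum(p_i * log2(p_i)) over the class proportions p_i, measured in bits. The function name entropy and the three example label lists are made up for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels (assumed definition)."""
    total = len(labels)
    if total == 0:
        return 0.0  # convention chosen here: an empty set has no uncertainty
    counts = Counter(labels)
    # Sum -p * log2(p) over the proportion p of each class that actually occurs.
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical target-variable values for three groups of 8 observations.
pure = ["yes"] * 8                  # homogeneous: every outcome is the same
skewed = ["yes"] * 7 + ["no"] * 1   # mostly one class
mixed = ["yes"] * 4 + ["no"] * 4    # evenly split: most uncertain for two classes

for name, group in [("pure", pure), ("skewed", skewed), ("mixed", mixed)]:
    print(name, round(entropy(group), 3))
```

Under these assumptions the pure group scores 0, the evenly split group scores 1 bit (the maximum for two classes), and the skewed group falls in between, which matches the cards on "pure" and "entropy" above.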