School: Slippery Rock University of Pennsylvania
Course: 230
Subject: Industrial Engineering
Date: Dec 6, 2023
Type: docx
Pages: 4
Uploaded by DrClover9807

Terms in this set (74)

True/False: We can build unsupervised data mining models when we lack labels for the target variable in the training data.
Answer: True

True/False: For supervised data mining, the value of the target variable is known when the model is used to predict future unseen data.
Answer: False

True/False: Finding the characteristics that differentiate my most profitable customers from my less profitable customers is an example of an unsupervised learning task.
Answer: False

True/False: Cross-validation is used to estimate generalization performance.
Answer: True

True/False: The points on a model's precision-recall curve represent the cost of different classifications.
Answer: False

Which is NOT true about overfitting?
A. If a model is overfitting, it will have poor generalization performance.
B. Overfitting happens when the model is overly simplified.
C. A hold-out set can be used to examine overfitting.
D. Overfitting can be avoided by tuning parameters on a validation set or via cross-validation.
Answer: B

Which of the following does NOT describe SVM (support vector machine)?
A. SVM can be applied when the data are not linearly separable.
B. The decision boundaries are determined by the support vectors. Other training data can be ignored.
C. SVM makes a prediction by evaluating the similarity between the new instance and the support vectors, usually represented by a kernel function.
D. SVM uses hinge loss as a loss function, which is measured as the distance from the error point to the decision boundary.
Answer: D

Which of the following models has a decision boundary different from the others?
A. Linear regression
B. Logistic regression
C. CART
D. SVM with a linear kernel
Answer: C

Which of the following is true about logistic regression?
A. Logistic regression is a regression model and needs a numerical target variable.
B. Logistic regression can generate non-linear decision boundaries without feature engineering.
C. Logistic regression can directly work with any form of data, so no data transformation is required for categorical attributes.
D. Logistic regression predicts the probability of membership in the positive class.
Answer: D

Cross Validation: Generalization performance

Domain-knowledge validation: Comprehensibility

ROC Curve: Ranking

Overfitting avoidance: Complexity control

Entropy: How mixed up classes are

Logistic: Log odds

Information Gain: Difference between parents and children

Regression: Numeric target

SVMs: Maximum margin

Briefly explain what overfitting is.
Answer: A model fits the training set too closely and does not generalize to unseen data.

Classification: find a decision boundary that separates one class from the other; supervised segmentation

Regression: value estimation; given an input x, predict a numerical value for the target variable y

Clustering: grouping individuals together by their similarity so that individuals in the same group are more similar to each other than to those in other groups

Co-occurrence grouping: frequent itemset mining, association rule discovery, market-basket analysis; nothing to predict; associates items often bought together

Data reduction: reduce the dimensionality of the data to focus on the most important information; replace a large dataset with a smaller one

Training set: data used to build models
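
Worked example for the Entropy and Information Gain cards. This is a minimal sketch, not taken from the original study set: the function names `entropy` and `information_gain` and the toy labels are hypothetical, and entropy is assumed to be Shannon entropy measured in bits. It shows entropy as "how mixed up classes are" and information gain as the parent's entropy minus the size-weighted entropy of its children.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child subsets."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical toy data: 5 "yes" and 5 "no" labels, split into two children.
parent = ["yes"] * 5 + ["no"] * 5
children = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]

print(entropy(parent))                     # 1.0 (a 50/50 mix is maximally mixed up)
print(information_gain(parent, children))  # about 0.278
```

A split whose children are purer than the parent yields a positive gain; a split that leaves each child as mixed as the parent yields a gain of zero.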