School: Slippery Rock University of Pennsylvania
Course: 230
Subject: Industrial Engineering
Date: Dec 6, 2023
Type: docx
Pages: 4
Uploaded by DrClover9807
Terms in this set (74)
True
True/False
We can build unsupervised data mining models when we lack labels for the target variable in the training
data
False
True/False
For supervised data mining the value of the target variable is known when the model is used to predict
future unseen data
False
True/False
Finding the characteristics that differentiate my most profitable customers from my less profitable
customers is an example of an unsupervised learning task
True
True/False
Cross-validation is used to estimate generalization performance
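As a rough illustration of the card above: k-fold cross-validation holds out each fold once for testing and trains on the rest, so every example is scored by a model that never saw it. The sketch below only builds the index splits; the name `k_fold_indices` and the round-robin fold assignment are assumptions made for this example, not part of the course material.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    for fold in range(k):
        # Every k-th example, starting at `fold`, is held out for testing.
        test_idx = indices[fold::k]
        held_out = set(test_idx)
        # All remaining examples are used to build the model for this fold.
        train_idx = [i for i in indices if i not in held_out]
        yield train_idx, test_idx


folds = list(k_fold_indices(10, 5))
```

Averaging a model's test score over the folds gives the estimate of generalization performance.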
False
True/False
The points on a model's precision-recall curve represent the cost of different classifications
B
Which is NOT true about overfitting?
A. If a model is overfitting, it will have a poor generalization performance.
B. Overfitting happens when the model is overly simplified.
C. A hold-out set can be used to examine overfitting.
D. Overfitting can be avoided by tuning parameters on a validation set or via cross-validation.
D
Which of the following does NOT describe SVM (support vector machine)?
A. SVM can be applied when the data are not linearly separable
B. The decision boundaries are determined by the support vectors. Other training data can be
ignored.
C. SVM makes a prediction by evaluating the similarity between the new instance and the
support vectors, usually represented by a kernel function.
D. SVM uses hinge loss as a loss function, which is measured as the distance from the
error point to the decision boundary
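For reference, hinge loss is commonly written as max(0, 1 - y * f(x)) with labels y in {-1, +1}. A point on the correct side and beyond the margin costs nothing, and the penalty grows linearly from the margin. The sketch below assumes that standard form; the function and argument names are illustrative.

```python
def hinge_loss(y, score):
    """Hinge loss for one example.

    y     -- true label, assumed to be -1 or +1
    score -- signed output f(x) of the linear model
    """
    # Zero loss once y * score reaches the margin (1); linear penalty below it.
    return max(0.0, 1.0 - y * score)


beyond_margin = hinge_loss(+1, 2.0)   # correct side, outside the margin
inside_margin = hinge_loss(+1, 0.5)   # correct side, but inside the margin
wrong_side = hinge_loss(-1, 0.5)      # misclassified
```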
C
Which of the following models has a decision boundary different than others?
A. Linear regression
B. Logistic regression
C. CART
D. SVM with a linear kernel
D
Which of the following is true about logistic regression?
A. Logistic regression is a regression model and needs a numerical target variable.
B. Logistic regression can generate non-linear decision boundaries without feature
engineering.
C. Logistic regression can directly work with any form of data so no data transformation is
required for categorical attributes.
D. Logistic regression predicts probability of membership in the positive class.
Cross Validation
Generalization performance
Domain-knowledge validation
Comprehensibility
ROC Curve
Ranking
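One way to see the link between ROC curves and ranking: the area under the ROC curve equals the fraction of (positive, negative) pairs that the model's scores rank in the right order. The sketch below computes that fraction directly; `auc_from_ranking` is a hypothetical helper name, and labels are assumed to be 1 for positive and 0 for negative.

```python
def auc_from_ranking(scores, labels):
    """Fraction of positive/negative pairs where the positive scores higher."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # A correctly ordered pair counts 1, a tie counts 0.5, a reversed pair 0.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


perfect = auc_from_ranking([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
one_swap = auc_from_ranking([0.9, 0.3, 0.8, 0.1], [1, 1, 0, 0])
```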
Overfitting avoidance
Complexity Control
Entropy
How mixed up classes are
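As a small worked example of "how mixed up classes are", entropy can be computed from the class proportions as the negative sum of p * log2(p). The sketch below assumes base-2 logarithms (entropy in bits); the function name is illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy, in bits, of a collection of class labels."""
    counts = Counter(labels)
    total = len(labels)
    # Sum p * log2(p) over the classes that actually occur, then negate.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


pure = entropy(["yes", "yes", "yes", "yes"])   # a single class: not mixed
mixed = entropy(["yes", "no", "yes", "no"])    # 50/50 split: maximally mixed
```

A pure set has entropy 0, and a two-class set split evenly has entropy 1 bit.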
Logistic
Log odds
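To make the "log odds" link concrete: a logistic regression model produces a linear score that is read as the log odds of the positive class, and the logistic (sigmoid) function turns that score back into a probability. A minimal sketch, with illustrative function names:

```python
import math


def log_odds(p):
    """Log odds (logit) of a probability p, with 0 < p < 1."""
    return math.log(p / (1 - p))


def probability(z):
    """Logistic (sigmoid) function: maps log odds z back to a probability."""
    return 1 / (1 + math.exp(-z))


even_odds = log_odds(0.5)                  # equal chances give log odds of 0
round_trip = probability(log_odds(0.8))    # recovers the starting probability
```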
Information Gain
Difference in entropy between the parent and its children
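A short sketch of that difference: information gain is the parent's entropy minus the average entropy of the children, weighted by their sizes. The helper names and toy labels below are assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy, in bits, of a collection of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child subsets."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = ["+"] * 4 + ["-"] * 4
clean_split = information_gain(parent, [["+"] * 4, ["-"] * 4])
useless_split = information_gain(parent, [["+", "+", "-", "-"], ["+", "+", "-", "-"]])
```

A split that separates the classes perfectly gains the full 1 bit here, while a split that leaves each child as mixed as the parent gains nothing.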
Regression
Numeric target
SVMs
Maximum margin
The model is fit too closely to the training set and does not generalize to unseen data
Briefly explain what overfitting is
Classification
find a decision boundary that separates one class from the other; supervised segmentation
Regression
value estimation; given an input x, predict a numerical value for the target variable y
Clustering
grouping individuals together by their similarity so that individuals in the same group are more similar to
each other than those in other groups
Co-occurrence grouping
Frequent items mining, association rule discovery, market-basket analysis
nothing to predict
associating items often bought together
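A minimal sketch of the idea, assuming each transaction is simply a set of item names: count how often each pair of items appears in the same basket and look at the most frequent pairs. The baskets below are made-up toy data, and `pair_counts` is an illustrative name.

```python
from collections import Counter
from itertools import combinations

# Toy transactions (hypothetical data for illustration only).
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "jam"},
]

pair_counts = Counter()
for basket in baskets:
    # Sort the items so each pair is always counted under the same key.
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

most_common_pair, times_seen = pair_counts.most_common(1)[0]
```

There is no target variable here: the output is just which items tend to occur together.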
Data reduction
reduce the dimensionality of the data to focus on the most important information; replace a large dataset with a smaller one that keeps much of that information
Training set
data used to build models