Midterm Expected Questions
School: Georgia Institute of Technology
Course: 6501 (Industrial Engineering)
Date: Dec 6, 2023
Uploaded by DeanProtonRabbit35
Week 1
1) Should high-risk situations such as eating a mushroom or testing for HIV require high or low classifiers? (2.2)
   a) High
   b) Low

2) What is NOT a type of structured data? (2.3)
   a) Binary data
   b) Time-series data
   c) Unrelated data
   d) Text data

3) Equation for an SVM model: 5X1 + 100X2 + 0.01X3 + 1.2X4 + 15 = 0. Which of the coefficients is NOT relevant? (2.6)
   a) 5
   b) 0.01
   c) 1.2
   d) 15

4) Can non-linear lines be drawn in SVM? (2.6)
   a) Yes
   b) No

5) For calculating a batting average, which scaling method should be used? (2.7)
   a) Common scaling
   b) Standardizing

6) For a clustering model, which scaling method should be used? (2.7)
   a) Common scaling
   b) Standardizing
Answers
1: a
2: d
3: b (near-zero coefficients are irrelevant)
4: a
5: a
6: b
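The scaling distinction in questions 5 and 6 can be sketched in code. A minimal sketch, assuming "common scaling" means linear min-max scaling to [0, 1]; the function names and sample data are illustrative, not from the course:

```python
# "Common" (min-max) scaling vs. standardizing, per lesson 2.7.

def minmax_scale(xs):
    """Linearly scale values to [0, 1]; keeps bounded quantities like
    batting averages interpretable on their natural scale."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def standardize(xs):
    """Center to mean 0 and scale to (population) standard deviation 1,
    so no single factor dominates a distance-based method like clustering."""
    n = len(xs)
    mean = sum(xs) / n
    sd = (sum((x - mean) ** 2 for x in xs) / n) ** 0.5
    return [(x - mean) / sd for x in xs]

batting_averages = [0.250, 0.300, 0.350]
print(minmax_scale(batting_averages))   # [0.0, 0.5, 1.0]
print(standardize(batting_averages))    # mean 0, spread 1
```

The min-max version preserves the meaning of the endpoints, which is why it suits bounded quantities; standardizing is the usual default before distance-based clustering.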
Week 2
1) When is clustering NOT appropriate? (4.1)
   a) Grouping data points
   b) Finding probability
   c) Discovering groups in data points

2) The infinity norm is... (4.2)
   a) Largest (absolute) of a set of numbers
   b) Smallest (absolute) of a set of numbers
   c) Squared value of the coefficients
   d) Largest (absolute) coefficient

3) The k-means algorithm is (4.3)
   a) Heuristic
   b) Machine learning
   c) Expectation-maximization
   d) All of the above

4) Heuristic means (4.3)
   a) Fast and guaranteed to find the best solution
   b) Slow but guaranteed to find the best solution
   c) Fast but not guaranteed to find the best solution

5) Clustering is (4.6)
   a) Supervised learning
   b) Unsupervised learning
Answers
1: b
2: a
3: d
4: c
5: b
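Questions 3 and 4 can be made concrete with a tiny one-dimensional k-means sketch: Lloyd's heuristic alternates an assignment step and a re-centering step (an expectation-maximization pattern), runs fast, but is not guaranteed to find the best clustering. The data and names here are illustrative:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """One-dimensional k-means via Lloyd's heuristic (lesson 4.3):
    repeatedly assign each point to its nearest center, then move each
    center to the mean of its cluster. Fast, but only a heuristic."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Re-center; keep the old center if a cluster goes empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

data = [1.0, 1.2, 0.8, 9.9, 10.1, 10.0]
print(kmeans(data, 2))  # centers near 1.0 and 10.0
```

Because the result depends on the random starting centers, practical use re-runs the heuristic from several seeds and keeps the best clustering.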
Week 3
1) A value isn't far from the rest of the data, but is far from the points nearby in time (5.2)
   a) Point outlier
   b) Collective outlier
   c) Contextual outlier

2) Something is missing in a range of points, but we cannot tell exactly where (5.2)
   a) Point outlier
   b) Collective outlier
   c) Contextual outlier

3) Removing real data outliers can result in (5.3)
   a) The model being more precise
   b) The model being more explainable
   c) The model being too optimistic
   d) The model being too predictable

4) Change detection is useful to (6.1)
   a) Determine whether action might be needed
   b) Determine the impact of past action
   c) Detect changes to help plan
   d) A and B
   e) All of the above

5) In CUSUM, the bigger the C value, the (6.2)
   a) More sensitive the method
   b) Less sensitive the method
Answers
1: c
2: b
3: c
4: e
5: b
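Question 5's C parameter is easiest to see in code. A minimal one-sided CUSUM sketch, assuming the standard form S_t = max(0, S_{t-1} + (x_t - mu - C)) with a change declared once S_t exceeds a threshold T; the data and names are illustrative:

```python
# One-sided CUSUM for detecting an increase (lesson 6.2). mu is the
# expected value under "no change"; C is the slack that absorbs normal
# fluctuation (bigger C = less sensitive); T is the detection threshold.

def cusum_increase(xs, mu, C, T):
    """Return the first index where S_t exceeds T, or None if no
    change is detected."""
    s = 0.0
    for t, x in enumerate(xs):
        s = max(0.0, s + (x - mu - C))
        if s > T:
            return t
    return None

readings = [10, 10, 11, 10, 14, 15, 16, 15]
print(cusum_increase(readings, mu=10, C=1, T=5))  # → 5
```

Raising C to 4 here suppresses the detection entirely, which is the "less sensitive" behavior the answer key points to.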
Week 4
1) For the exponential smoothing equation S_t = a*x_t + (1 - a)*S_(t-1), a value of a closer to zero means (7.1)
   a) A lot of randomness in the system
   b) Not much randomness in the system
   c) A big spike in the graph
   d) Not many spikes in the graph

2) Holt-Winters is a ______ exponential smoothing method (7.2)
   a) Single
   b) Double
   c) Triple
   d) Quadruple

3) (Select all correct answers) Exponential smoothing is used (7.3 / 7.4)
   a) To smooth out randomness
   b) To smooth out high peaks and valleys of real data
   c) For simple short-term forecasting
   d) For long-term complex forecasting

4) The best estimate of the next baseline is the (7.4)
   a) Initial baseline
   b) Most current baseline
   c) Initial trend
   d) Most current trend

5) ARIMA can be used on (7.5)
   a) Time-series data
   b) Any data

6) GARCH is used to (7.6)
   a) Observe linear errors
   b) Estimate or forecast the variance
   c) Predict trend
   d) Smooth out randomness

7) Which is NOT a method to analyze time-series data? (7.6)
   a) Exponential smoothing
   b) ARIMA
   c) GARCH
   d) KNN
Answers:
1: a
2: c
3: a,c
4: b
5: a
6: b
7: d
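The equation in question 1 translates directly to code. A minimal sketch; initializing S_1 to the first observation is an assumption (the course may initialize differently):

```python
# Single exponential smoothing (lesson 7.1): S_t = a*x_t + (1 - a)*S_(t-1).
# Small a trusts the previous estimate (appropriate when there is a lot of
# randomness); a near 1 chases each new observation.

def exp_smooth(xs, a):
    """Return the smoothed series, with S_1 set to the first observation."""
    s = [xs[0]]
    for x in xs[1:]:
        s.append(a * x + (1 - a) * s[-1])
    return s

data = [10, 12, 8, 11, 30]  # the 30 is a spike
print(exp_smooth(data, a=0.2))  # small a damps the spike
print(exp_smooth(data, a=0.9))  # large a follows it closely
```

With a = 0.2 the final smoothed value stays near 14 despite the spike to 30, illustrating why small a suits noisy systems.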
Week 5
1) How do you measure the quality of a simple linear regression model? (8.1)
   a) Sum of errors
   b) Sum of coefficients
   c) Sum of squared errors
   d) Sum of squared coefficients

2) What's the difference between the AIC (Akaike information criterion) and the BIC (Bayesian information criterion)? (8.2)
   a) AIC encourages models with fewer parameters than BIC does
   b) BIC encourages models with fewer parameters than AIC does
   c) AIC does not have a penalty term for having more parameters
   d) BIC does not have a penalty term for having more parameters

3) Which components of analytics is regression equipped to answer? Select all that apply (8.3)
   a) Descriptive analytics
   b) Predictive analytics
   c) Prescriptive analytics

4) Can a regression model have a non-linear line? (8.5)
   a) Yes
   b) No

5) For p-values, a higher threshold means (8.6)
   a) More factors and the possibility of including irrelevant factors
   b) Fewer factors and the possibility of leaving out relevant factors
   c) More factors and the possibility of leaving out relevant factors
   d) Fewer factors and the possibility of including irrelevant factors

6) Which outputs can be used to determine the importance of coefficients? Select all correct answers (8.6)
   a) P-values
   b) Confidence interval
   c) Coefficient
   d) T-statistic (coefficient divided by its standard error)
Answers
1: c
2: b
3: a,b
4: a
5: a
6: a, b, c, d
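Question 2's answer can be checked numerically. One common Gaussian-error form (dropping constant terms) is AIC = n*ln(SSE/n) + 2k and BIC = n*ln(SSE/n) + k*ln(n); under this form BIC's per-parameter penalty ln(n) exceeds AIC's penalty of 2 once n > e^2 (about 7.4), so BIC pushes toward fewer parameters. A sketch under that assumption, with made-up numbers:

```python
import math

# AIC and BIC for a least-squares model with n data points, k parameters,
# and sum of squared errors SSE (Gaussian-error form, constants dropped).

def aic(n, k, sse):
    """Akaike information criterion: penalty of 2 per parameter."""
    return n * math.log(sse / n) + 2 * k

def bic(n, k, sse):
    """Bayesian information criterion: penalty of ln(n) per parameter."""
    return n * math.log(sse / n) + k * math.log(n)

n, k, sse = 100, 3, 50.0
print(aic(n, k, sse), bic(n, k, sse))  # BIC is larger for the same fit
```

Because the fit term is identical, the only difference is the penalty, which is why BIC favors more parsimonious models for any realistic sample size.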
Week 6
1) Which method can be used to deal with heteroscedasticity? (9.1)
   a) Exponential smoothing
   b) Box-and-whisker plot
   c) Box-Cox transformation
   d) Linear regression

2) Detrending can be used on (9.2)
   a) Responses
   b) Predictors
   c) Factor-based models
   d) All of the above

3) Which of these statements about Principal Component Analysis are true? Select all that apply (9.3)
   a) It is for high-dimensional and correlated data
   b) PCA attempts to remove the correlations in the data
   c) It ranks coordinates by importance
   d) Concentrate on the first n principal components to reduce randomness
Answers
1: c
2: d
3: a, b, c, d
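Question 1's Box-Cox transformation has a standard form, y_transformed = (y^lam - 1)/lam for lam != 0 and ln(y) at lam = 0, which compresses large response values and so evens out variance. A minimal sketch with illustrative data:

```python
import math

# Box-Cox power transformation (lesson 9.1), one way to reduce
# heteroscedasticity by shrinking a spread-out right tail.

def box_cox(y, lam):
    """Box-Cox transform of a positive value y for parameter lam."""
    if y <= 0:
        raise ValueError("Box-Cox requires positive values")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam

responses = [1.0, 4.0, 100.0]
print([box_cox(y, 0.5) for y in responses])  # → [0.0, 2.0, 18.0]
```

Note how the 100-to-4 ratio of 25x shrinks to a 9x ratio after the transform; in practice lam is chosen by maximum likelihood rather than by hand.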
Week 7
1) What can trees be used for? Choose all correct answers (10.1)
   a) Classification problems
   b) Decision making
   c) Clustering
   d) Prediction

2) Which models can we NOT make with trees? (10.1)
   a) Logistic regression models
   b) Classification models
   c) Decision models
   d) SVM models

3) When do we stop branching a tree? (10.2)
   a) Never; the more branches the better
   b) When half of all the data points are used
   c) When a split no longer decreases variance more than the threshold
   d) When the p-value = 0.05

4) What is the disadvantage of having too many branches? (10.2)
   a) Lower p-value
   b) Overfitting
   c) Underfitting
   d) Too much data can end up in one leaf

5) Order the models from most to least explainable (10.3)
   a) Regression tree
   b) Linear regression
   c) Random forest

6) The equation for calculating sensitivity in logistic regression is (10.4)
   a) TP / (TP + FN)
   b) TP / (TP + FP)
   c) TP / (TP + TN)
   d) TP / (FN + FP)
Answers:
1: a,b,d
2: d
3: c
4: b
5: b,a,c
6: a
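Question 6's formula follows from the standard confusion-matrix definitions: sensitivity = TP / (TP + FN) (the share of actual positives the model catches), with specificity = TN / (TN + FP) as the counterpart for negatives. A small sketch with made-up counts:

```python
# Confusion-matrix rates from lesson 10.4, using the standard definitions.

def sensitivity(tp, fn):
    """True positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True negative rate: TN / (TN + FP)."""
    return tn / (tn + fp)

# Illustrative confusion matrix: 40 TP, 10 FN, 45 TN, 5 FP.
print(sensitivity(40, 10))  # → 0.8
print(specificity(45, 5))   # → 0.9
```

Which rate matters more depends on the cost of each error, which ties back to the high-risk classifier question in Week 1.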