Midterm Expected Questions

School: Georgia Institute Of Technology
Course: 6501
Subject: Industrial Engineering
Date: Dec 6, 2023
Type: pdf
Pages: 7
Uploaded by: DeanProtonRabbit35
Week 1

1) Should high-risk situations such as eating a mushroom or testing for HIV require high or low classifiers? (2.2)
   a) High
   b) Low
2) What is NOT a type of structured data? (2.3)
   a) Binary data
   b) Time-series data
   c) Unrelated data
   d) Text data
3) Equation for an SVM model: 5X1 + 100X2 + 0.01X3 + 1.2X4 + 15 = 0. Which of the coefficients is NOT relevant? (2.6)
   a) 5
   b) 0.01
   c) 1.2
   d) 15
4) Can non-linear lines be drawn in SVM? (2.6)
   a) Yes
   b) No
5) For calculating batting average, which scaling method should be used? (2.7)
   a) Common scaling
   b) Standardizing
6) For a clustering model, which scaling method should be used? (2.7)
   a) Common scaling
   b) Standardizing

Answers: 1: a, 2: d, 3: b (near-zero coefficients are irrelevant), 4: a, 5: a, 6: b
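The two scaling methods behind questions 5 and 6 can be sketched in a few lines of Python. Reading "common scaling" as min-max scaling to [0, 1] is my assumption, and the function names are mine:

```python
# Sketch of the two scaling methods from questions 5 and 6.
# "Common scaling" is read here as min-max scaling to [0, 1] (an assumption).

def minmax_scale(xs):
    """Linearly rescale so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def standardize(xs):
    """Center to mean 0 and divide by the (population) standard deviation."""
    n = len(xs)
    mean = sum(xs) / n
    sd = (sum((x - mean) ** 2 for x in xs) / n) ** 0.5
    return [(x - mean) / sd for x in xs]
```

A batting average already has a meaningful 0-to-1 scale, so min-max-style scaling preserves interpretability; standardizing is the usual choice before distance-based clustering so that no single factor dominates.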
Week 2

1) When is clustering NOT appropriate? (4.1)
   a) Grouping data points
   b) Finding probability
   c) Discovering groups in data points
2) The infinity norm is... (4.2)
   a) Largest (absolute) of a set of numbers
   b) Smallest (absolute) of a set of numbers
   c) Squared value of the coefficients
   d) Largest (absolute) coefficient
3) The k-means algorithm is (4.3)
   a) A heuristic
   b) Machine learning
   c) Expectation-maximization
   d) All of the above
4) Heuristic means (4.3)
   a) Fast and guaranteed to find the best solution
   b) Slow but guaranteed to find the best solution
   c) Fast but not guaranteed to find the best solution
5) Clustering is (4.6)
   a) Supervised learning
   b) Unsupervised learning

Answers: 1: b, 2: a, 3: d, 4: c, 5: b
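The infinity norm in question 2 is the limit of the general L_p norm as p grows; a minimal sketch (function names are mine):

```python
# The infinity norm (question 2) is the largest absolute value in a set,
# i.e. the limit of the L_p norm as p -> infinity.

def p_norm(xs, p):
    """L_p norm: (sum of |x|^p) ** (1/p)."""
    return sum(abs(x) ** p for x in xs) ** (1.0 / p)

def infinity_norm(xs):
    """Infinity norm: the largest absolute value."""
    return max(abs(x) for x in xs)
```

For large p the L_p norm is dominated by the largest absolute value, which is why the infinity norm picks it out exactly.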
Week 3

1) A value isn't far from the rest, but is far from the points nearby in time (5.2)
   a) Point outlier
   b) Collective outlier
   c) Contextual outlier
2) Something is missing in a range of points, but you cannot tell exactly where (5.2)
   a) Point outlier
   b) Collective outlier
   c) Contextual outlier
3) Removing real data outliers can result in (5.3)
   a) The model being more precise
   b) The model being more explainable
   c) The model being too optimistic
   d) The model being too predictable
4) Change detection is useful to (6.1)
   a) Determine whether action might be needed
   b) Determine the impact of a past action
   c) Determine changes to help plan
   d) A and B
   e) All of the above
5) In CUSUM, the bigger the C, (6.2)
   a) The more sensitive the method
   b) The less sensitive the method

Answers: 1: c, 2: b, 3: c, 4: e, 5: b
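The effect of C in question 5 is easiest to see by running CUSUM. A minimal one-sided sketch for detecting an increase (variable names are mine; `mu` is the in-control mean, `T` the detection threshold):

```python
# One-sided CUSUM sketch: S_t = max(0, S_{t-1} + (x_t - mu - C)),
# flagging a change once S_t >= T. A larger C absorbs more of each
# deviation, so the method becomes LESS sensitive (question 5).

def cusum(xs, mu, C, T):
    """Return the index of the first detected increase, or None."""
    s = 0.0
    for t, x in enumerate(xs):
        s = max(0.0, s + (x - mu - C))
        if s >= T:
            return t
    return None
```

With the same data and threshold, raising C can turn a detection into no detection at all, which is the trade-off the question is probing.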
Week 4

1) For the exponential smoothing equation S_t = α·x_t + (1 − α)·S_{t−1}, when α is closer to zero it means (7.1)
   a) A lot of randomness in the system
   b) Not much randomness in the system
   c) A big spike in the graph
   d) Not many spikes in the graph
2) Holt-Winters is a ______ exponential smoothing (7.2)
   a) Single
   b) Double
   c) Triple
   d) Quadruple
3) (Select all correct answers) Exponential smoothing is used (7.3 / 7.4)
   a) To smooth out randomness
   b) To smooth out high peaks and valleys of real data
   c) For simple short-term forecasting
   d) For long-term complex forecasting
4) The best estimate of the next baseline is the (7.4)
   a) Initial baseline
   b) Most current baseline
   c) Initial trend
   d) Most current trend
5) ARIMA can be used on (7.5)
   a) Time series data
   b) Any data
6) GARCH is used to (7.6)
   a) Observe linear errors
   b) Estimate or forecast the variance
   c) Predict trend
   d) Smooth out randomness
7) Which is NOT a method to analyze time series data? (7.6)
   a) Exponential smoothing
   b) ARIMA
   c) GARCH
   d) KNN

Answers: 1: a, 2: c, 3: a, c, 4: b, 5: a, 6: b, 7: d
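The smoothing equation in question 1 can be run directly. A minimal sketch (initializing with S_1 = x_1 is a common convention, not something the question specifies):

```python
# Simple exponential smoothing: S_t = alpha * x_t + (1 - alpha) * S_{t-1}.
# A small alpha trusts each new observation less, which is the right
# choice when the system has a lot of randomness (question 1).

def exp_smooth(xs, alpha):
    """Return the smoothed series, initialized with S_1 = x_1."""
    s = xs[0]
    out = [s]
    for x in xs[1:]:
        s = alpha * x + (1 - alpha) * s
        out.append(s)
    return out
```

At alpha = 1 the output just echoes the data; at alpha = 0 it never moves off the initial value; values in between blend the two.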
Week 5

1) How do you measure the quality of a simple linear regression model? (8.1)
   a) Sum of errors
   b) Sum of coefficients
   c) Sum of squared errors
   d) Sum of squared coefficients
2) What's the difference between AIC (Akaike information criterion) and BIC (Bayesian information criterion)? (8.2)
   a) AIC encourages models with fewer parameters than BIC does
   b) BIC encourages models with fewer parameters than AIC does
   c) AIC does not have a penalty term for having more parameters
   d) BIC does not have a penalty term for having more parameters
3) Which components of analytics is regression equipped to answer? (Select all that apply) (8.3)
   a) Descriptive analytics
   b) Predictive analytics
   c) Prescriptive analytics
4) Can a regression model have a non-linear line? (8.5)
   a) Yes
   b) No
5) For p-values, a higher threshold means (8.6)
   a) More factors and the possibility of including irrelevant factors
   b) Fewer factors and the possibility of leaving out relevant factors
   c) More factors and the possibility of leaving out relevant factors
   d) Fewer factors and the possibility of including irrelevant factors
6) Which output can be used to determine the importance of coefficients? (Select all correct answers) (8.6)
   a) P-values
   b) Confidence interval
   c) Coefficient
   d) T-statistic (coefficient divided by its standard error)

Answers: 1: c, 2: b, 3: a, b, 4: a, 5: a, 6: a, b, c, d
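Question 2's answer follows from the penalty terms. A sketch using the least-squares forms of the two criteria (these assume Gaussian errors and omit additive constants, so only differences between models are meaningful):

```python
# AIC and BIC for a least-squares model with n data points, k parameters,
# and sum of squared errors SSE. Each extra parameter costs 2 under AIC
# but ln(n) under BIC, so once n > e^2 (about 7.4) BIC penalizes
# parameters harder, i.e. it encourages models with fewer parameters.
import math

def aic(n, k, sse):
    """AIC (least-squares form, constants omitted): 2k + n*ln(SSE/n)."""
    return 2 * k + n * math.log(sse / n)

def bic(n, k, sse):
    """BIC (least-squares form, constants omitted): ln(n)*k + n*ln(SSE/n)."""
    return math.log(n) * k + n * math.log(sse / n)
```

For a fixed fit, adding one parameter raises AIC by exactly 2 and BIC by ln(n), which is the whole difference the question is testing.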
Week 6

1) Which method can be used to deal with heteroscedasticity? (9.1)
   a) Exponential smoothing
   b) Box-and-whisker plot
   c) Box-Cox transformation
   d) Linear regression
2) Detrending can be used on (9.2)
   a) Responses
   b) Predictors
   c) Factor-based models
   d) All of the above
3) Which of these statements about Principal Component Analysis are true? (Select all that apply) (9.3)
   a) It is for high-dimensional and correlated data
   b) PCA attempts to remove the correlations in the data
   c) It ranks coordinates by importance
   d) Concentrating on the first n principal components reduces randomness

Answers: 1: c, 2: d, 3: a, b, c, d
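The decorrelation claim in question 3(b) can be demonstrated in a few lines of numpy. A minimal sketch via SVD on centered data (this omits the scaling step a real analysis would usually do first):

```python
# Minimal PCA sketch: center the data, take the SVD, and project onto
# the top principal components (ranked by variance explained). The
# resulting coordinates are uncorrelated with each other.
import numpy as np

def pca(X, n_components):
    """Return the data projected onto the first n_components principal components."""
    Xc = X - X.mean(axis=0)                              # center each column
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)    # rows of Vt = components
    return Xc @ Vt[:n_components].T
```

Feeding in two highly correlated columns and checking the correlation of the output coordinates shows it drop to (numerically) zero.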
Week 7

1) Trees can be used for (choose all correct answers) (10.1)
   a) Classification problems
   b) Decision making
   c) Clustering
   d) Prediction
2) Which models can we NOT make with trees? (10.1)
   a) Logistic regression models
   b) Classification models
   c) Decision models
   d) SVM models
3) When do we stop branching a tree? (10.2)
   a) Never; the more branches the better
   b) When half of all the data points are used
   c) When a split no longer decreases variance more than the threshold
   d) When the p-value = 0.05
4) What is the disadvantage of having too many branches? (10.2)
   a) Lower p-value
   b) Overfitting
   c) Underfitting
   d) Too much data can end up in one leaf
5) Order the models from most to least explainable (10.3)
   a) Regression tree
   b) Linear regression
   c) Random forest
6) The equation for calculating sensitivity for logistic regression is (10.4)
   a) TP / (TP + FN)
   b) TN / (TN + FP)
   c) TN / (TN + TP)
   d) TN / (FN + FP)

Answers: 1: a, b, d, 2: d, 3: c, 4: b, 5: b, a, c, 6: a
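The confusion-matrix rates behind question 6 can be sketched directly (sensitivity is the true positive rate; specificity, its companion, is the true negative rate):

```python
# Confusion-matrix rates for a binary classifier such as logistic
# regression with a chosen threshold (question 6).

def sensitivity(tp, fn):
    """True positive rate: TP / (TP + FN) - the share of actual positives caught."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True negative rate: TN / (TN + FP) - the share of actual negatives kept."""
    return tn / (tn + fp)
```

For example, a model that catches 90 of 100 actual positives has sensitivity 0.9 regardless of how it treats the negatives.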