Question
Question 49. We try a last model class to find the perfect model for the Titanic data-set: An SVM. The SVM is a
model class that is very sensitive to hyper-parameter tuning. Especially, the cost parameter C and the bandwidth
of the RBF kernel A must be optimally adjusted in order to obtain a sensible model.
We use a nested resampling strategy to perform this hyper-parameter tuning: At first, 33% of the data are
laid aside as an external test set, to validate the result of the hyper-parameter tuning itself (the outer resampling
strategy). We use a random search as the tuning algorithm with a budget of 100 iterations. As parameter spaces,
we use all positive real numbers for both C and A. The performance of a single hyper-parameter setting is evaluated
using a 10-fold cross validation (the inner resampling strategy). Moreover, in order to speed up the entire tuning
process, we utilise parallel computing.
Which of the following statements are correct?
a) Using a nested resampling is necessary in order to detect underfitting.
b) As both C and A are numeric parameters, any other optimization algorithm could be used instead of random
search.
c) The choice of cross-validation as the inner resampling strategy is arbitrary, and a bootstrapping would lead
to similar results.
d) The parallelization should take place at the innermost loop, hence, the execution of the inner cross-validation
loop should be parallelized.
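
The resampling setup described in the question can be sketched in plain Python as follows. This is only a minimal illustration under stated assumptions, not the setup the question's author actually used: the function names, the `evaluate` callback (which stands in for fitting an SVM on the training indices and returning its validation error) and the search bounds of 1e-3 to 1e3 are all hypothetical. The bounded, log-scaled range in particular is an assumption made here because a random search cannot draw uniformly from all positive real numbers.

```python
import math
import random


def holdout_split(n, test_fraction, rng):
    """Outer resampling: shuffle the indices and set aside a test fraction."""
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]  # (tuning indices, test indices)


def kfold_splits(indices, k):
    """Inner resampling: yield (train, validation) index lists for k folds."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def sample_log_uniform(low, high, rng):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def random_search(indices, evaluate, budget, k, rng, bounds=(1e-3, 1e3)):
    """Return the (C, bandwidth) pair with the lowest mean inner-CV error.

    `evaluate(params, train, valid)` is a hypothetical callback that fits a
    model on `train` and returns its error on `valid`.
    """
    best_params, best_score = None, math.inf
    for _ in range(budget):
        # Assumed search space: both parameters on a bounded log scale.
        params = (sample_log_uniform(*bounds, rng),
                  sample_log_uniform(*bounds, rng))
        scores = [evaluate(params, tr, va) for tr, va in kfold_splits(indices, k)]
        mean_score = sum(scores) / len(scores)
        if mean_score < best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score
```

With `tune, test = holdout_split(n, 0.33, rng)`, a call such as `random_search(tune, evaluate, budget=100, k=10, rng=rng)` touches only the tuning indices; the held-out `test` indices are used once afterwards to assess the selected setting. Each of the 100 sampled settings is independent of the others, and so is each of the 10 folds within a setting, so either loop is a candidate for parallel execution.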