We try one last model class to find the perfect model for the Titanic dataset: an SVM. SVMs are very sensitive to hyper-parameter tuning. In particular, the cost parameter C and the bandwidth λ of the RBF kernel must be tuned carefully in order to obtain a sensible model.
We use a nested resampling strategy to perform this hyper-parameter tuning: first, 33% of the data are set aside as an external test set to validate the result of the hyper-parameter tuning itself (the outer resampling strategy). We use random search as the tuning algorithm with a budget of 100 iterations. As the parameter space, we use all positive real numbers for both C and λ. The performance of a single hyper-parameter setting is evaluated using 10-fold cross-validation (the inner resampling strategy). Moreover, to speed up the entire tuning process, we use parallel computing.
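The setup described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the exercise's actual code: the function names (holdout_split, kfold_indices, sample_config, random_search) are hypothetical, and model fitting is left to a caller-supplied evaluate function. Since one cannot draw uniformly from all positive real numbers, the sketch assumes a log-uniform draw over the bounded range 2^-10 to 2^10 as a stand-in for the search space.

```python
import random


def holdout_split(n, test_fraction=0.33, rng=None):
    """Outer resampling: set aside a fraction of the row indices as a test set."""
    rng = rng or random.Random()
    indices = list(range(n))
    rng.shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]  # (train, test)


def kfold_indices(indices, k=10):
    """Inner resampling: yield (train, validation) index lists for k-fold CV."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, validation


def sample_config(rng, low_exp=-10, high_exp=10):
    """Draw C and lambda log-uniformly from [2**low_exp, 2**high_exp].

    The bounds are an assumption of this sketch, not part of the exercise.
    """
    return {
        "C": 2 ** rng.uniform(low_exp, high_exp),
        "lam": 2 ** rng.uniform(low_exp, high_exp),
    }


def random_search(train_indices, evaluate, budget=100, k=10, rng=None):
    """Return the sampled configuration with the lowest mean inner-CV error.

    evaluate(config, train, validation) -> error is supplied by the caller,
    e.g. by fitting an RBF-kernel SVM on train and scoring it on validation.
    """
    rng = rng or random.Random()
    best_config, best_error = None, float("inf")
    for _ in range(budget):
        config = sample_config(rng)
        errors = [
            evaluate(config, train, validation)
            for train, validation in kfold_indices(train_indices, k)
        ]
        mean_error = sum(errors) / len(errors)
        if mean_error < best_error:
            best_config, best_error = config, mean_error
    return best_config, best_error
```

In this sketch, the configuration returned by random_search would be refit on all training indices and scored once on the held-out test indices from holdout_split, which is the outer evaluation the setup refers to.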
Which of the following statements are correct?
a) Using nested resampling is necessary in order to detect underfitting.
b) As both C and λ are numeric parameters, any other optimization algorithm could be used instead of random search.
c) The choice of cross-validation as the inner resampling strategy is arbitrary, and bootstrapping would lead to similar results.
d) The parallelization should take place at the innermost loop; hence, the execution of the inner cross-validation loop should be parallelized.