Inverse Normal Distribution
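Finding the value of a normally distributed variable that corresponds to a known cumulative probability is called the inverse normal calculation. For the standard normal distribution, that value is the z-critical value. The calculation evaluates the inverse of the normal cumulative distribution function, also called the quantile function. The normal distribution itself is a continuous, two-parameter family of distributions, specified by its mean μ and standard deviation σ.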
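As a small illustration, Python's standard-library `statistics.NormalDist` (Python 3.8+) exposes the inverse CDF as `inv_cdf`. The probabilities and the mean/standard deviation of 100/15 below are illustrative values, not taken from the text.

```python
from statistics import NormalDist

# Two-sided 95% critical value: the z with P(Z <= z) = 0.975 under N(0, 1)
z_crit = NormalDist().inv_cdf(0.975)
print(round(z_crit, 2))  # 1.96

# Same idea for a non-standard normal (illustrative mu = 100, sigma = 15):
# the value below which 90% of the distribution lies
x90 = NormalDist(mu=100, sigma=15).inv_cdf(0.90)
print(round(x90, 1))  # 119.2
```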
Mean, Median, Mode
Mean, median and mode are measures of central tendency. Each one is a single central or typical value that summarises a data set or a probability distribution:

- The mean is the arithmetic average.
- The median is the middle value of the sorted data.
- The mode is the most frequent value.

A measure of central tendency says nothing about individual observations in the data set; it only summarises the data set as a whole.
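A minimal example with Python's standard-library `statistics` module. The data values are made up for illustration.

```python
from statistics import mean, median, mode

data = [2, 3, 3, 5, 7, 10]  # illustrative values

print(mean(data))    # 5   (sum 30 divided by 6 values)
print(median(data))  # 4.0 (average of the two middle values 3 and 5)
print(mode(data))    # 3   (the most frequent value)
```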
Z-Scores
A z-score describes the position of a raw score x by its distance from the mean μ, measured in units of the standard deviation σ: z = (x - μ) / σ. Z-scores are useful because they make scores from different normal distributions directly comparable.
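The formula and the comparison across distributions can be sketched as follows. The two score distributions (mean 70, sd 10 and mean 24, sd 3) are made-up values for illustration.

```python
def z_score(x, mu, sigma):
    """Distance of raw score x from the mean mu, in units of the standard deviation sigma."""
    return (x - mu) / sigma


# Two raw scores from two different (illustrative) normal distributions
z_a = z_score(85, mu=70, sigma=10)
z_b = z_score(30, mu=24, sigma=3)
print(z_a, z_b)  # 1.5 2.0
```

Although 85 is the larger raw score, 30 lies further above its own mean in standard-deviation units, so it is the relatively better score.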
Question 49. As a last attempt to find the best model for the Titanic data set, we try one more model class: an SVM. SVMs are very sensitive to hyper-parameter tuning. In particular, the cost parameter C and the bandwidth λ of the RBF kernel must be tuned carefully to obtain a sensible model.
We use a nested resampling strategy to perform this hyper-parameter tuning. First, 33% of the data are set aside as an external test set to validate the result of the hyper-parameter tuning itself (the outer resampling strategy). As the tuning algorithm, we use random search with a budget of 100 iterations. As parameter spaces, we use all positive real numbers for both C and λ. The performance of a single hyper-parameter setting is evaluated using 10-fold cross-validation (the inner resampling strategy). Moreover, to speed up the entire tuning process, we use parallel computing.
Which of the following statements are correct?
a) Using nested resampling is necessary in order to detect underfitting.
b) As both C and λ are numeric parameters, any other optimization algorithm could be used instead of random search.
c) The choice of cross-validation as the inner resampling strategy is arbitrary, and bootstrapping would lead to similar results.
d) The parallelization should take place at the innermost loop; hence, the execution of the inner cross-validation loop should be parallelized.
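The tuning setup described in Question 49 can be sketched in plain Python as below. This is a minimal, stdlib-only sketch under stated assumptions, not a definitive implementation.

- **Outer resampling:** the single 33% holdout split from the question.
- **Tuner:** random search with 100 iterations.
- **Inner resampling:** each candidate is scored by 10-fold cross-validation on the remaining data.

Several pieces are assumptions of this sketch:

- **Stand-in model:** an SVM solver does not fit in a short example, so the model is a hypothetical stand-in, `rbf_vote_predict`. It is a kernel-weighted vote that only has the RBF bandwidth λ and no cost parameter C.
- **Sampling range:** λ is sampled log-uniformly from the assumed range [2^-10, 2^10]. Random search needs a proper sampling distribution and cannot draw uniformly from all positive reals.
- **Names:** all function names are illustrative.

```python
import math
import random


def rbf_vote_predict(train, x, lam):
    """Hypothetical stand-in for an RBF-SVM: each training label (+1/-1)
    votes with weight exp(-lam * ||x - x_i||^2); lam acts as the bandwidth."""
    s = 0.0
    for xi, yi in train:
        d2 = sum((a - b) ** 2 for a, b in zip(x, xi))
        s += yi * math.exp(-lam * d2)
    return 1 if s >= 0 else -1


def accuracy(train, test, lam):
    """Predict every point in `test` from `train` and return the hit rate."""
    hits = sum(rbf_vote_predict(train, x, lam) == y for x, y in test)
    return hits / len(test)


def k_fold_cv(data, lam, k=10):
    """Inner resampling: mean accuracy of one hyper-parameter setting over k folds."""
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        scores.append(accuracy(train, folds[i], lam))
    return sum(scores) / k


def nested_holdout_random_search(data, n_iter=100, k=10, test_frac=0.33, seed=0):
    """Outer resampling: one holdout split. Tuning: random search scored by k-fold CV."""
    rng = random.Random(seed)
    data = list(data)
    rng.shuffle(data)
    n_test = round(test_frac * len(data))
    outer_test, outer_train = data[:n_test], data[n_test:]

    best_lam, best_cv = None, -1.0
    for _ in range(n_iter):
        # Assumption: log-uniform sampling of lam from [2^-10, 2^10]
        lam = 2.0 ** rng.uniform(-10, 10)
        cv = k_fold_cv(outer_train, lam, k)
        if cv > best_cv:
            best_lam, best_cv = lam, cv

    # Final model uses the whole outer training part with the chosen lam;
    # the outer test set is touched only once, here
    test_acc = accuracy(outer_train, outer_test, best_lam)
    return best_lam, best_cv, test_acc


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic, well-separated two-class toy data (not the Titanic data set)
    data = [((rng.gauss(-3, 0.5), rng.gauss(-3, 0.5)), -1) for _ in range(60)]
    data += [((rng.gauss(3, 0.5), rng.gauss(3, 0.5)), 1) for _ in range(60)]
    lam, cv_acc, test_acc = nested_holdout_random_search(data)
    print(f"lambda={lam:.3g}  inner CV acc={cv_acc:.3f}  outer test acc={test_acc:.3f}")
```

The script prints three values: the selected λ, its inner cross-validation accuracy, and its accuracy on the held-out outer test set. The inner score is what the random search optimised, so only the outer test accuracy is an untouched estimate of the tuned model's performance.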