
Consider the graph below, which plots accuracy against tree size for a decision tree that has been pruned back to the red line.
[Figure 2: Pruned decision tree. The plot shows accuracy (y-axis, roughly 0.5 to 0.9) versus size of tree in number of nodes (x-axis, 10 to 100), with three curves: accuracy on training data, on validation data, and on validation data (during pruning).]
Refer to Figure 2.
Suppose we have a third dataset Dnew (drawn from the same data distribution) that is not used for training or pruning.
If we evaluate the tree on this new dataset, approximately what accuracy do we get when the tree has 25 nodes, and why? Select one.
Select one:
Around 0.76 (slightly higher than the accuracy for validation data at 25 nodes)
Around 0.73 (the same as the accuracy for validation data at 25 nodes)
Around 0.70 (slightly lower than the accuracy for validation data at 25 nodes)
None of the above
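To ground this question, here is a minimal sketch, not part of the original problem, that trains a size-limited decision tree and compares its accuracy on the training split, a validation split, and a third held-out sample standing in for Dnew. The synthetic dataset, the splits, and max_leaf_nodes=25 are all illustrative assumptions, not the data behind Figure 2.

```python
# Hypothetical sketch: accuracy on an independent dataset D_new (same distribution,
# unseen during training and pruning) is expected to track the validation accuracy,
# not the training accuracy. Dataset and tree settings are assumptions for illustration.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# One synthetic "distribution", carved into three disjoint samples.
X, y = make_classification(n_samples=3000, n_features=20, random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_val, X_new, y_val, y_new = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# max_leaf_nodes=25 is a stand-in for "size of tree" of about 25 nodes on the x-axis.
tree = DecisionTreeClassifier(max_leaf_nodes=25, random_state=0).fit(X_train, y_train)

for name, Xs, ys in [("train", X_train, y_train),
                     ("validation", X_val, y_val),
                     ("D_new", X_new, y_new)]:
    print(name, round(accuracy_score(ys, tree.predict(Xs)), 3))
# Expectation: "validation" and "D_new" come out close to each other (both are unseen
# data from the same distribution), while "train" comes out higher.
```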
Which of the following gives us the best approximation of the true error?
Line corresponding to training data
Line corresponding to validation data
Line corresponding to new dataset Dnew
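For the true-error question, the usual reasoning can be written out as follows (a sketch of the standard argument, under the assumption that Dnew is an i.i.d. sample from the data distribution P and is independent of both the training data and the data used for pruning): the empirical error of the final tree h on Dnew is an unbiased estimate of the true error, whereas the training error, and the validation error when that validation set guided the pruning, are optimistically biased.

```latex
% Sketch (assumption: D_new ~ P^n i.i.d., independent of training and pruning data).
\[
  \mathbb{E}_{D_{\mathrm{new}} \sim P^{n}}
    \bigl[\operatorname{err}_{D_{\mathrm{new}}}(h)\bigr]
  = \operatorname{err}_{P}(h)
  = \Pr_{(x,y)\sim P}\bigl[h(x) \neq y\bigr].
\]
```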
Which of the following are valid ways to avoid overfitting? Select all that apply.
Select all that apply:
Decrease the training set size.
Set a threshold for a minimum number of samples required to split at an internal node.
Prune the tree so that cross-validation error is minimal.
Maximize the tree depth.
None of the above.
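As a concrete illustration of two of the techniques listed above, here is a minimal scikit-learn sketch, with a synthetic dataset and parameter values chosen only for illustration (it is not the procedure behind Figure 2): it sets a minimum-samples-to-split threshold (pre-pruning) and selects a cost-complexity pruning penalty by cross-validation (post-pruning).

```python
# Hypothetical sketch of two overfitting controls for decision trees:
# (1) a minimum number of samples required to split an internal node, and
# (2) pruning chosen so that cross-validation error is (approximately) minimal,
#     via scikit-learn's cost-complexity pruning parameter ccp_alpha.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Pre-pruning: require at least 20 samples before an internal node may split.
pre_pruned = DecisionTreeClassifier(min_samples_split=20, random_state=0).fit(X, y)

# Post-pruning: enumerate the candidate cost-complexity penalties, then keep the one
# whose 5-fold cross-validation accuracy is highest (i.e. CV error is lowest).
alphas = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y).ccp_alphas
best_alpha = max(
    alphas,
    key=lambda a: cross_val_score(
        DecisionTreeClassifier(ccp_alpha=a, random_state=0), X, y, cv=5).mean())
post_pruned = DecisionTreeClassifier(ccp_alpha=best_alpha, random_state=0).fit(X, y)

print("pre-pruned leaves:", pre_pruned.get_n_leaves())
print("chosen ccp_alpha:", best_alpha, "post-pruned leaves:", post_pruned.get_n_leaves())
```

Both controls shrink the effective size of the tree, which is exactly the quantity varied along the x-axis of Figure 2.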