88

jpg

School

Govt. College for the Elementary Teachers, Kasur *

*We aren’t endorsed by this school

Course

CREATIVE W

Subject

Computer Science

Date

Nov 24, 2024

Type

jpg

Pages

Uploaded by hdhdbdn

Decision trees in scikit-learn are implemented in the DecisionTreeRegressor and DecisionTreeClassifier classes. scikit-learn only implements pre-pruning, not post-pruning. Let’s look at the effect of pre-pruning in more detail on the Breast Cancer dataset. As always, we import the dataset and split it into a training and a test part. Then we build a model using the default setting of fully developing the tree (growing the tree until all leaves are pure). We fix the random_state in the tree, which is used for tie- breaking internally: In[58]: from sklearn.tree import DecisionTreeClassifier cancer = load_breast_cancer() X_train, X_test, y_train, y_test = train_test_split( cancer.data, cancer.target, stratify=cancer.target, random_state=42) tree = DecisionTreeClassifier(random_state=0) tree.fit(X_train, y_train) print("Accuracy on training set: {:.3f}".format(tree.score(X_train, y_train))) print("Accuracy on test set: {:.3f}".format(tree.score(X_test, y_test))) Out[58]: Accuracy on training set: 1.000 Accuracy on test set: 0.937 As expected, the accuracy on the training set is 100%—because the leaves are pure, the tree was grown deep enough that it could perfectly memorize all the labels on the training data. The test set accuracy is slightly worse than for the linear models we looked at previously, which had around 95% accuracy. If we don’t restrict the depth of a decision tree, the tree can become arbitrarily deep and complex. Unpruned trees are therefore prone to overfitting and not generalizing well to new data. Now let’s apply pre-pruning to the tree, which will stop developing the tree before we perfectly fit to the training data. One option is to stop building the tree after a certain depth has been reached. Here we set max_depth=4, meaning only four consecutive questions can be asked (cf. Figures 2-24 and 2-26). Limiting the depth of the tree decreases overfitting. This leads to a lower accuracy on the training set, but an improvement on the test set: In[59]: tree = DecisionTreeClassifier(max_depth=4, random_state=0) tree.fit(X_train, y_train) print("Accuracy on training set: {:.3f}".format(tree.score(X_train, y_train))) print("Accuracy on test set: {:.3f}".format(tree.score(X_test, y_test))) Supervised Machine Learning Algorithms | 75

Discover more documents: Sign up today!

Unlock a world of knowledge! Explore tailored content for a richer learning experience. Here's what you'll get:

Access to all documents
Unlimited textbook solutions
24/7 expert homework help

Related Documents

disc12-regular.pdf

final-games-decnets-sols.pdf

Screenshot (1180).png

Professional Development Plan.docx

Book Analysis 2.edited - Copy.docx

Shiflett_CJUS_361 LUO_PWL.pptx

Tutorial7_week7.pdf

Jesse Laprade's edited White Paper - Nilanjana Ray.docx.pdf

Screenshot (454).png

CIS_Week9A.docx

final-hmm.pdf

attachment_1 (40).docx

Recommended textbooks for you

C++ Programming: From Problem Analysis to Program...

Computer Science

ISBN:9781337102087

Author:D. S. Malik

Publisher:Cengage Learning

C++ for Engineers and Scientists

Computer Science

ISBN:9781133187844

Author:Bronson, Gary J.

Publisher:Course Technology Ptr

Operations Research : Applications and Algorithms

Computer Science

ISBN:9780534380588

Author:Wayne L. Winston

Publisher:Brooks Cole

Microsoft Visual C#

Computer Science

ISBN:9781337102100

Author:Joyce, Farrell.

Publisher:Cengage Learning,

Programming Logic & Design Comprehensive

Computer Science

ISBN:9781337669405

Author:FARRELL

Publisher:Cengage

EBK JAVA PROGRAMMING

Computer Science

ISBN:9781337671385

Author:FARRELL

Publisher:CENGAGE LEARNING - CONSIGNMENT

SEE MORE TEXTBOOKS

Recommended textbooks for you

C++ Programming: From Problem Analysis to Program...
Computer Science
ISBN:9781337102087
Author:D. S. Malik
Publisher:Cengage Learning
C++ for Engineers and Scientists
Computer Science
ISBN:9781133187844
Author:Bronson, Gary J.
Publisher:Course Technology Ptr
Operations Research : Applications and Algorithms
Computer Science
ISBN:9780534380588
Author:Wayne L. Winston
Publisher:Brooks Cole
Microsoft Visual C#
Computer Science
ISBN:9781337102100
Author:Joyce, Farrell.
Publisher:Cengage Learning,
Programming Logic & Design Comprehensive
Computer Science
ISBN:9781337669405
Author:FARRELL
Publisher:Cengage
EBK JAVA PROGRAMMING
Computer Science
ISBN:9781337671385
Author:FARRELL
Publisher:CENGAGE LEARNING - CONSIGNMENT