ML-COMP3020-Lecture 2-Decision Trees

Decision Trees
COMP3020 - Machine Learning
Khoa D. Doan (khoa.dd@vinuni.edu)
Slides adapted from Soheil Kolouri
Previously
- What does "learning by examples" mean?
  - Classification tasks
  - Learning requires examples + inductive bias
  - Memorization is not enough; we need generalization
  - Formalizing the learning problem
    - Learning as function approximation
    - Learning as minimizing expected loss
Today: Decision Trees
- What is a decision tree?
- How to learn a decision tree from data?
- What is inductive bias?
- What exactly is generalization?
A training set: Playing Badminton

| Day | Weather | Temperature | Humidity | Wind   | Play? |
|-----|---------|-------------|----------|--------|-------|
| 1   | Sunny   | Hot         | High     | Weak   | No    |
| 2   | Cloudy  | Hot         | High     | Weak   | Yes   |
| 3   | Sunny   | Mild        | Normal   | Strong | Yes   |
| 4   | Cloudy  | Mild        | High     | Strong | Yes   |
| 5   | Rainy   | Mild        | High     | Strong | No    |
| 6   | Rainy   | Cool        | Normal   | Strong | No    |
| 7   | Rainy   | Mild        | High     | Weak   | Yes   |
| 8   | Sunny   | Hot         | High     | Strong | No    |
| 9   | Cloudy  | Hot         | Normal   | Weak   | Yes   |
| 10  | Rainy   | Mild        | High     | Strong | No    |
A decision tree for playing badminton

- Each node tests a feature (attribute)
- Each branch corresponds to a feature value (outcome)
- Each leaf node assigns a final decision

Weather?
  - Sunny  -> Humidity?
      - High   -> No
      - Normal -> Yes
  - Cloudy -> Yes
  - Rainy  -> Wind?
      - Strong -> No
      - Weak   -> Yes
Decision Trees

- Representation
  - Each node tests a feature (attribute)
  - Each branch corresponds to a feature value (outcome)
  - Each leaf node assigns a final decision (classification)
    - Or a distribution over decisions
- A decision tree represents a function that maps examples in X to a class in Y:
  f: <Weather, Temperature, Humidity, Wind> -> PlayBadminton?
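To make the "tree as a function" view concrete, here is a minimal sketch (my own, not from the slides) that hard-codes the badminton tree above as nested feature tests; the function name play_badminton is just an illustrative choice.

```python
# A hand-coded version of the badminton decision tree: a function mapping
# <Weather, Temperature, Humidity, Wind> to a Yes/No decision.
def play_badminton(weather: str, temperature: str, humidity: str, wind: str) -> str:
    if weather == "Sunny":
        # Sunny days are decided by humidity.
        return "No" if humidity == "High" else "Yes"
    elif weather == "Cloudy":
        # Cloudy days are always Yes in this tree.
        return "Yes"
    else:  # Rainy
        # Rainy days are decided by wind.
        # Note: Temperature is never tested anywhere in this particular tree.
        return "No" if wind == "Strong" else "Yes"

# Example: Day 1 from the training set (Sunny, Hot, High, Weak) -> "No".
print(play_badminton("Sunny", "Hot", "High", "Weak"))
```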
Exercise: represent Boolean functions with decision trees: AND, OR, XOR
Exercise: represent Boolean functions with decision trees (AND)

| X1 | X2 | X1 AND X2 |
|----|----|-----------|
| T  | T  | T         |
| T  | F  | F         |
| F  | T  | F         |
| F  | F  | F         |
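One possible solution sketch (my own, not the slide's answer): AND can stop after testing a single feature on one of its branches, whereas XOR has to test both features on every path.

```python
# Decision trees for Boolean functions, written as nested tests on x1, x2.
def and_tree(x1: bool, x2: bool) -> bool:
    if not x1:          # left branch: x1 = False -> leaf False immediately
        return False
    return x2           # right branch: x1 = True -> the answer depends on x2

def xor_tree(x1: bool, x2: bool) -> bool:
    # XOR has no one-feature shortcut: both branches of x1 must test x2.
    if x1:
        return not x2
    return x2

# Truth-table check against Python's built-in operators.
for a in (True, False):
    for b in (True, False):
        assert and_tree(a, b) == (a and b)
        assert xor_tree(a, b) == (a != b)
```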
Today: Decision Trees
- What is a decision tree?
- How to learn a decision tree from data?
- What is inductive bias?
- What exactly is generalization?
Function approximation with decision trees

Problem Setting
- Set of possible instances X
  - feature vector x = [x_1, ..., x_d] ∈ X
- Unknown target function f: X -> Y
  - Y is discrete-valued
- Set of function hypotheses H = {h | h: X -> Y}
  - h is a decision tree

Input
- Training examples {(x^(1), y^(1)), (x^(2), y^(2)), ..., (x^(N), y^(N))} of the unknown target function f

Output
- Hypothesis h ∈ H that best approximates the target function f
Decision Tree Learning

- Find the hypothesis h ∈ H
  - such that it minimizes the training error
  - or, equivalently, maximizes the training accuracy
- Challenges?
  - H is too large for exhaustive search
  - We opt for heuristic (greedy) search algorithms that
    - pick questions to ask, in order,
    - such that classification accuracy is maximized
Top-down Induction of Decision Trees

DTtrain(examples for CurrentNode, features at CurrentNode):
    Find F, the best decision feature for CurrentNode
    For each value of F, create a new child node
    Assign the training examples to the child nodes
    If the training examples are perfectly classified:
        Stop
    Else:
        Recursively apply DTtrain to each new child node
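A minimal Python sketch of this recursive procedure (my paraphrase of the pseudocode, not the lecture's code); it uses a simple accuracy-based criterion for "best decision feature", whereas the ID3 variant discussed later uses information gain.

```python
from collections import Counter

def majority(labels):
    """Most common label in a list."""
    return Counter(labels).most_common(1)[0][0]

def split_accuracy(examples, f):
    """Training accuracy if we split on feature f and each child predicts its majority label."""
    correct = 0
    for value in {x[f] for x, _ in examples}:
        child_labels = [y for x, y in examples if x[f] == value]
        correct += child_labels.count(majority(child_labels))
    return correct / len(examples)

def dt_train(examples, features):
    """Greedy top-down induction; `examples` is a list of (feature_dict, label) pairs."""
    labels = [y for _, y in examples]
    if len(set(labels)) == 1 or not features:   # pure node, or nothing left to ask
        return majority(labels)                  # leaf: final decision
    f = max(features, key=lambda g: split_accuracy(examples, g))  # best feature
    children = {}
    for value in {x[f] for x, _ in examples}:
        subset = [(x, y) for x, y in examples if x[f] == value]
        children[value] = dt_train(subset, [g for g in features if g != f])
    return {"feature": f, "children": children}
```

Calling dt_train on the badminton table (with each row given as a ({"Weather": ..., "Humidity": ..., ...}, "Yes"/"No") pair) produces a tree like the one shown earlier, up to tie-breaking between equally good features.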
How to select the best feature?

- A good feature is one that lets us make "correct" classification decisions
- One way: select the feature that results in the best classification accuracy
- Let's try it out: build a decision tree with Humidity and Wind only
Build a decision tree: partition with Wind?

Root node: [Wind], with branches Weak and Strong; the training examples above are assigned to the two branches.
Build a decision tree: partition with Wind? (continued)

[Wind]
- Weak branch:   Acc = 0.75
- Strong branch: Acc = 0.66
- Avg = 0.71
Build a decision tree: partition with Humidity?

- Splitting on Wind (Weak | Strong): Avg = 0.71
- Splitting on Humidity (High | Normal): Avg = 0.67
Build a decision tree: partition with Humidity and Wind?

[Wind]
- Weak branch
- Strong branch -> [Humidity] with branches High and Normal

Acc = 0.75, Acc = 1.00, Avg = 0.875
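The single-feature branch accuracies above can be checked with a short script (my own, not from the slides), assuming each branch predicts the majority label of its training examples.

```python
from collections import Counter

# The 10-example training set: (Weather, Temperature, Humidity, Wind, Play?)
data = [
    ("Sunny",  "Hot",  "High",   "Weak",   "No"),
    ("Cloudy", "Hot",  "High",   "Weak",   "Yes"),
    ("Sunny",  "Mild", "Normal", "Strong", "Yes"),
    ("Cloudy", "Mild", "High",   "Strong", "Yes"),
    ("Rainy",  "Mild", "High",   "Strong", "No"),
    ("Rainy",  "Cool", "Normal", "Strong", "No"),
    ("Rainy",  "Mild", "High",   "Weak",   "Yes"),
    ("Sunny",  "Hot",  "High",   "Strong", "No"),
    ("Cloudy", "Hot",  "Normal", "Weak",   "Yes"),
    ("Rainy",  "Mild", "High",   "Strong", "No"),
]
FEATURES = {"Weather": 0, "Temperature": 1, "Humidity": 2, "Wind": 3}

def branch_accuracies(feature):
    """Majority-vote accuracy of each branch when splitting on `feature`."""
    col = FEATURES[feature]
    accs = {}
    for value in {row[col] for row in data}:
        labels = [row[4] for row in data if row[col] == value]
        accs[value] = Counter(labels).most_common(1)[0][1] / len(labels)
    return accs

print(branch_accuracies("Wind"))      # Weak branch 0.75, Strong branch ~0.67
print(branch_accuracies("Humidity"))
```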
Another feature selection criterion: Entropy

- Used in the ID3 algorithm [Quinlan, 1986]
  - Pick the feature with the smallest (conditional) entropy to split the examples at each iteration
- Entropy measures the impurity of a sample of examples
Sample Entropy

H(S) = −p₊ log₂ p₊ − p₋ log₂ p₋

- S is a sample of training examples
- p₊ is the proportion of positive examples in S
- p₋ is the proportion of negative examples in S
- Entropy measures the impurity of S
Entropy of a random variable

- Entropy H(X) of a random variable X:
  H(X) = −Σᵢ P(X = i) log₂ P(X = i)
- H(X) is the expected number of bits needed to encode a randomly drawn value of X (under the most efficient code)
Entropy of a random variable (continued)

Why entropy? Information theory:
- To encode a message X = i, the most efficient code assigns −log₂ P(X = i) bits
- So the expected number of bits needed to encode one randomly drawn value of X is
  −Σᵢ P(X = i) log₂ P(X = i)
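A small sketch (mine, not from the slides) of the two-class sample entropy as a function of the positive proportion p₊:

```python
import math

def entropy(p_pos: float) -> float:
    """H(S) = -p+ log2(p+) - p- log2(p-), treating 0 * log2(0) as 0."""
    total = 0.0
    for p in (p_pos, 1.0 - p_pos):
        if p > 0:
            total -= p * math.log2(p)
    return total

print(entropy(0.5))   # 1.0 bit: a fair coin is maximally impure/uncertain
print(entropy(1.0))   # 0.0 bits: a pure sample needs no bits to encode
print(entropy(5/6))   # ~0.65 bits: the label column of the toy table used next
```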
Conditional Entropy

Conditional entropy H(Y | X) of a variable Y conditioned on a random variable X:

H(Y | X) = Σⱼ P(X = j) H(Y | X = j)

H(Y | X = j) = −Σᵢ P(Y = i | X = j) log₂ P(Y = i | X = j)
Conditional Entropy (example)

H(Y | X) = −Σⱼ P(X = j) Σᵢ P(Y = i | X = j) log₂ P(Y = i | X = j)

Example data:

| X1 | X2 | Y |
|----|----|---|
| T  | T  | T |
| T  | F  | T |
| T  | T  | T |
| T  | F  | T |
| F  | T  | T |
| F  | F  | F |
Conditional Entropy (example, continued)

Splitting the six examples on X1:
- P(X1 = t) = 4/6, with Y = t: 4 and Y = f: 0
- P(X1 = f) = 2/6, with Y = t: 1 and Y = f: 1
Conditional Entropy (example, continued)

H(Y | X1) = −4/6 (1 log₂ 1 + 0 log₂ 0) − 2/6 (1/2 log₂ 1/2 + 1/2 log₂ 1/2) = 2/6 ≈ 0.33
Information Gain

- Decrease in entropy (uncertainty) after splitting

IG(X) = H(Y) − H(Y | X)
Information Gain (example)

In our example (the six-row table above):

IG(X1) = H(Y) − H(Y | X1)
Information Gain (example, continued)

IG(X1) = H(Y) − H(Y | X1) = 0.65 − 0.33 ≈ 0.32

IG(X1) > 0, so we prefer the split on X1.
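A short sketch (mine, not from the slides) that reproduces these numbers on the six-row toy table:

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# Toy table: (X1, X2, Y)
rows = [("T", "T", "T"), ("T", "F", "T"), ("T", "T", "T"),
        ("T", "F", "T"), ("F", "T", "T"), ("F", "F", "F")]
Y = [y for *_, y in rows]

def conditional_entropy(col):
    """H(Y | X_col) = sum_j P(X = j) * H(Y | X = j)."""
    h = 0.0
    for value in {r[col] for r in rows}:
        subset = [r[2] for r in rows if r[col] == value]
        h += (len(subset) / len(rows)) * entropy(subset)
    return h

h_y = entropy(Y)                   # ~0.65
h_y_x1 = conditional_entropy(0)    # 4/6 * 0 + 2/6 * 1 = ~0.33
print(h_y, h_y_x1, h_y - h_y_x1)   # IG(X1) ~ 0.32 > 0, so prefer the split
```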
Today: Decision Trees
- What is a decision tree?
- How to learn a decision tree from data?
- What is inductive bias?
- What exactly is generalization?
Inductive bias in decision tree learning

- The learning algorithm performs a heuristic search through the space of all decision trees
- It stops at the smallest acceptable tree
- Occam's Razor: prefer the simplest hypothesis that fits the data
Why a preference for short hypotheses?

- There are fewer short hypotheses than long ones
  - Therefore, a short hypothesis that fits the data is less likely to be a statistical coincidence
- But what is so special about short hypotheses?
Today: Decision Trees
- What is a decision tree?
- How to learn a decision tree from data?
- What is inductive bias?
- What exactly is generalization?
Evaluating the learned hypothesis

- Assume that
  - we have learned a tree h using the top-down induction algorithm
  - it fits the training data perfectly
- Are we done?
  - Can we guarantee we have found a good hypothesis?
Recall: training metrics do not tell us anything about generalization!

[Figure: examples of Class A and examples of Class B; the caption notes that the background is in focus]
Recall: Formalizing Induction

- Given
  - a loss function ℓ
  - samples from some unknown data distribution D
- Our task is to compute a function f that has low expected error over D with respect to ℓ:

  E_{(x,y)~D}[ ℓ(y, f(x)) ] = Σ_{(x,y)} D(x, y) ℓ(y, f(x))
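As a small illustration (mine, not the lecture's), the expected loss can be estimated by averaging the loss over a finite sample; the sample and classifier below are hypothetical placeholders.

```python
# Empirical estimate of the expected loss E_{(x,y)~D}[ l(y, f(x)) ].
def zero_one_loss(y_true, y_pred):
    return 0.0 if y_true == y_pred else 1.0

def average_loss(f, samples):
    """Average loss of f over a finite sample {(x_n, y_n)} drawn from D."""
    return sum(zero_one_loss(y, f(x)) for x, y in samples) / len(samples)

# Hypothetical sample and classifier, just to make the function runnable.
samples = [((1, 2), 0), ((3, 4), 1), ((5, 6), 1)]
f = lambda x: int(x[0] > 2)
print(average_loss(f, samples))  # 0.0 on this tiny sample
```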
Training error is not sufficient

- We care about generalization to new (unseen) and related examples
- A tree can classify the training data perfectly, yet classify new examples incorrectly. How?
Training error is not sufficient (continued)

- A tree can classify the training data perfectly, yet classify new examples incorrectly. How?
  - The training data is only a sample from the data-generating distribution
    - A feature might correlate with the class by coincidence
  - Training examples can be noisy
    - e.g., labeling mistakes
Adding a noisy training example

A new example, Day 11 (Sunny, Mild, High, Strong), is added to the training set of Days 1-10 with a noisy label. The tree learned earlier (Weather at the root, then Humidity or Wind) is shown alongside; it no longer classifies the training data perfectly.
Overfitting

- Given a hypothesis h and its:
  - training error over the training data, error_train(h)
  - true error over all data, error_true(h)
Overfitting (continued)

- We say h overfits the training data if:
  error_train(h) < error_true(h)
- Amount of overfitting:
  error_true(h) − error_train(h)
The need for test data

- We don't know error_true(h)
The need for test data (continued)

- We don't know error_true(h)
- Solution:
  - We set aside a test set: some examples that will be used only for evaluation
  - We never look at them during training
  - After learning, we calculate error_test(h)
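As an illustration of this protocol (mine, not the lecture's; scikit-learn and the iris dataset are stand-ins), a held-out split gives an estimate of the true error:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # stand-in dataset
# Hold out a test set that is never touched during training.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

tree = DecisionTreeClassifier().fit(X_train, y_train)
print("error_train:", 1 - tree.score(X_train, y_train))  # often ~0 for a fully grown tree
print("error_test: ", 1 - tree.score(X_test, y_test))    # estimate of the true error
```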
Measuring the effect of overfitting in decision trees

[Figure: accuracy (y-axis, roughly 0.5 to 0.9) versus size of tree in number of nodes (x-axis, 0 to 100), with one curve for accuracy on training data and one for accuracy on test data]
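A rough sketch (mine) of how a curve like this could be produced by growing trees of increasing size on a stand-in dataset; the exact shape will differ from the slide's figure.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # stand-in dataset
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Grow trees of increasing size and watch train vs. test accuracy diverge.
for n_leaves in (2, 4, 8, 16, 32, 64):
    h = DecisionTreeClassifier(max_leaf_nodes=n_leaves, random_state=0).fit(X_tr, y_tr)
    print(n_leaves, round(h.score(X_tr, y_tr), 3), round(h.score(X_te, y_te), 3))
```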
Underfitting vs. Overfitting

- When do we underfit?
  - The learning algorithm had the opportunity to learn more from the training data, but didn't
- When do we overfit?
  - The learning algorithm paid too much attention to the noisy part of the training data, resulting in a tree that does not generalize well to test data
- What we want:
  - a decision tree that neither underfits nor overfits
  - because it is expected to do best in the future
Recap: Decision Trees

- What is a decision tree?
- How do we learn a decision tree from data?
  - Top-down induction to minimize (training) classification error
- What is inductive bias?
  - Occam's Razor: shorter trees are preferred
- What exactly is generalization?
  - Overfitting can be an issue
