CS 486/686 Assignment 2:
Decision Trees and Variable Elimination (100 marks)
Jesse Hoey & Josh Jung
Due Date: March 5, 2024; Time: 11:59pm Waterloo Time (EST)
Instructions
• Submit written solutions in a file named writeup.pdf to the A2 Dropbox on Learn (https://learn.uwaterloo.ca). If your written submission on Learn does not contain one file named writeup.pdf, we will deduct 5 marks from your final assignment mark.
• No late assignment will be accepted.
• This assignment is to be done individually. All code you write must be your own.
• The Due Date is March 5, 2024, 11:59pm Waterloo Time (EST)
• Lead TAs:
– Max Ku (m3ku)
– Mehdi Bolourian (mbolouri)
The TAs’ office hours will be scheduled on Piazza.
Question 1 [60pts]: Text Categorization with Decision Trees
Text categorization is an important task in natural language processing and information retrieval. For instance, news
articles, emails or blogs are often classified by topics. In this assignment, you will implement (in the language of your
choice) a decision tree algorithm to learn a classifier that can assign a discussion board topic to an article.
Train and test your algorithm with the provided datasets in trainData.txt and testData.txt. These datasets contain a
subset of Reddit posts sourced from
https://files.pushshift.io/reddit/
and processed using Google
BigQuery. The dataset includes the first 1500 comments from August 2019 of each of the r/books and r/atheism subreddits, cleaned by removing punctuation and some offensive language, and limited to words used more than 3 times across all posts. These 3000 comments are split evenly into training and testing sets (1500 documents in each). To simplify your implementation, the posts have been pre-processed and converted to the bag-of-words model. More precisely, each post is converted to a vector of binary values such that each entry indicates whether the document contains a specific word or not.
Each line of the files trainData.txt and testData.txt is formatted "docId wordId", which indicates that word wordId is present in document docId. The files trainLabel.txt and testLabel.txt indicate the label/category (1 = atheism or 2 = books) for each document (docId = line#). The file words.txt indicates which word corresponds to each wordId (denoted by the line#). Your code will need to read this information from the provided files.
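For example, a minimal loader sketch in Python (function and variable names are my own; the dense boolean document-by-word matrix is just one possible representation):

    import numpy as np

    def load_split(data_file, label_file, num_words):
        """Read 'docId wordId' pairs and per-line labels into a binary
        document-by-word matrix and a label vector (ids are 1-indexed)."""
        labels = np.loadtxt(label_file, dtype=int)
        X = np.zeros((len(labels), num_words), dtype=bool)
        with open(data_file) as f:
            for line in f:
                doc_id, word_id = map(int, line.split())
                X[doc_id - 1, word_id - 1] = True
        return X, labels

    words = [w.strip() for w in open("words.txt")]
    X_train, y_train = load_split("trainData.txt", "trainLabel.txt", len(words))
    X_test, y_test = load_split("testData.txt", "testLabel.txt", len(words))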
Implement a decision tree learning algorithm. Here, each decision node corresponds to a word feature, and the leaf
nodes correspond to a prediction of what subreddit the article belongs in.
You will experiment with two feature
selection mechanisms as follows:
1. average information gain (evenly weighted across the leaves)

   IG = I(E) - [ (1/2) * I(E_1) + (1/2) * I(E_2) ]
2. information gain weighted by the fraction of documents on each side of the split, as we discussed in class

   IG = I(E) - [ (N_1/N) * I(E_1) + (N_2/N) * I(E_2) ]
where I(E) is the information content of the N examples before the split, and I(E_i) is the information content of the N_i examples in the i-th leaf after the split. Method (1) does not deal well with splits that put a lot of examples on one side, and very few on the other. For example, suppose you have data evenly distributed across the target class (so I(E) = 1), N examples, and 2 features. Suppose the first feature splits one example off from the other N - 1. Then Method (2) gives:
I(E_1) = 0,  I(E_2) ≈ 1,  IG ≈ 1 - [ (1/N) * 0 + ((N-1)/N) * 1 ] = 1/N

Using Method (2), IG is a small value. On the other hand, Method (1) will make IG look better than it should be, since it will compute:

IG ≈ 1 - [ (1/2) * 0 + (1/2) * 1 ] = 1/2

Use I({}) = 1 (the entropy of the empty set is maximal).
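As a concrete reference, a small Python sketch of the two gain measures (helper names are my own; class labels are 1/2 as in the data files, and the empty set gets entropy 1 as specified above):

    import math

    def info_content(labels):
        """Entropy (base 2) of a collection of class labels; empty set -> 1."""
        if len(labels) == 0:
            return 1.0
        p = sum(1 for y in labels if y == 1) / len(labels)
        if p in (0.0, 1.0):
            return 0.0
        return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

    def info_gain(labels, split_left, split_right, weighted):
        """Method (1) uses equal 1/2 weights; Method (2) weights by leaf sizes."""
        n = len(labels)
        if weighted:
            w1, w2 = len(split_left) / n, len(split_right) / n
        else:
            w1, w2 = 0.5, 0.5
        return info_content(labels) - (w1 * info_content(split_left)
                                       + w2 * info_content(split_right))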
As discussed in class, build each decision tree by maintaining a priority queue of leaves, sorted by the maximum value
of the feature (the information gain of the best performing next feature to split on). At each iteration, split the leaf
where the split will give the largest information gain (according to each of the measures above), and do the split on the
word feature that gives that highest gain. Grow the tree one node at a time, up to 100 internal nodes, by training with
the training set only. The point estimate should be the majority class at the leaf (not a probability).
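One possible shape for this iterative, priority-queue-driven loop is sketched below (a minimal sketch assuming the load_split and info_gain helpers above; heapq is a min-heap, so gains are negated, and the exhaustive per-leaf search over all words is kept simple rather than fast):

    import heapq
    import numpy as np

    def grow_tree(X, y, max_internal_nodes=100, weighted=True):
        """Grow a tree by always expanding the leaf whose best split has the
        highest information gain (iterative, no recursion)."""
        def best_split(idx):
            best = (-1.0, None)
            for w in range(X.shape[1]):
                left, right = idx[~X[idx, w]], idx[X[idx, w]]
                g = info_gain(list(y[idx]), list(y[left]), list(y[right]), weighted)
                if g > best[0]:
                    best = (g, w)
            return best

        root = {"idx": np.arange(len(y)), "pred": np.bincount(y).argmax()}
        gain, word = best_split(root["idx"])
        heap = [(-gain, 0, word, root)]     # (neg gain, tie-break counter, word, leaf)
        counter = 1
        for _ in range(max_internal_nodes):
            if not heap:
                break
            neg_gain, _, word, leaf = heapq.heappop(heap)
            leaf["word"], leaf["gain"] = word, -neg_gain
            for present in (False, True):
                idx = leaf["idx"][X[leaf["idx"], word] == present]
                pred = np.bincount(y[idx]).argmax() if len(idx) else leaf["pred"]
                child = {"idx": idx, "pred": pred}
                leaf["present" if present else "absent"] = child
                if len(idx):
                    g, w = best_split(idx)
                    heapq.heappush(heap, (-g, counter, w, child))
                    counter += 1
        return root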
Plot the training and testing accuracy (i.e., percentage of correctly classified articles) after adding each new node.
Generate two graphs (one for each method of feature selection) and plot two curves on each (one for training accuracy
and one for test accuracy). These graphs should have accuracy as a percentage on the y-axis and number of nodes
(1-100) on the x-axis. You may generate them however you wish (e.g. Matplotlib for Python).
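For the curves themselves, a matplotlib sketch along these lines works (the accuracy lists are assumed to be collected after each node is added while growing the tree):

    import matplotlib.pyplot as plt

    def plot_curves(train_acc, test_acc, title):
        """train_acc / test_acc: accuracy (in %) after each of the node additions."""
        nodes = range(1, len(train_acc) + 1)
        plt.plot(nodes, train_acc, label="training accuracy")
        plt.plot(nodes, test_acc, label="test accuracy")
        plt.xlabel("number of internal nodes")
        plt.ylabel("accuracy (%)")
        plt.title(title)
        plt.legend()
        plt.show()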
What to hand in:
a)
[20 pts]
A printout of your code. Your code must be your own, and you must use the iterative priority queue model.
If your code is recursive, or doesn’t expand nodes by order of information gain throughout the tree, you will not
get these 20pts.
SOLUTION: Here is what I told the students:
The code should implement the PQ version of the decision tree. The markers will try to see if this is the case and award the 20pts if they see this is done. If you make the code correct, efficient, properly commented and "styled", it will be easier for the markers to see what you are doing. If they cannot figure out what your code is doing, or if they see you are doing the recursive version, you will not get the 20pts.
b)
[20 pts]
A printout (or hand drawing) showing two decision trees (one tree for each feature selection method) after
adding the first 10 word features. Show the (method-specific) information gain and word feature selected at each
internal node. Show the prediction at each leaf.
SOLUTION:
There may be different decision trees submitted, as there are ties in the data; however, the weighted one should look a lot more balanced than the unweighted one. To mark this question, just ensure the overall structure of the trees looks OK and the words selected and IG values are roughly right. You can deduct marks if the tree is hard to visualize (i.e. if they submit some text version).
In the trees below, dashed arrows mean the word is not in the document; solid arrows mean it is.
Average decision tree (done in log base 2):

[Tree figure: 10 internal nodes splitting on words including christians, beliefs, atheists, brain, aa, murder, proof, logic, and christian; the average-IG values at the internal nodes are all close to 0.5 (roughly 0.4966-0.5), and nearly every leaf predicts atheism.]
Weighted decision tree:

[Tree figure: 10 internal nodes splitting on words including religion, god, faux, sure, spirit, soon, sent, controlling, call, bible, and book; the weighted-IG values at the internal nodes mostly fall in the 0.03-0.12 range, and the leaves predict a mix of atheism and books.]
Each printed or drawn tree should have 10 internal nodes and 11 leaves.
c)
[20 pts]
Two graphs (one for each feature selection method) showing the training and testing accuracy as the
number of nodes increases from 1 to 100.
SOLUTION: Here is the graph I got:

[Figure: training and testing accuracy (%) vs. number of nodes (1-100), one graph per feature selection method.]

I also tried with a different tie-breaking method (take the last one rather than the first one) and got about the same diagram. Again, here, just look for the overall structure - they should be getting close to 70%/60% for the weighted and average methods, respectively.
Question 2 [40pts]: Bayesian Networks and Variable Elimination
Suppose you are working for a financial institution and you are asked to implement a fraud detection system. You plan
to use the following information:
• When the card holder is travelling abroad, fraudulent transactions are more likely since tourists are prime targets for thieves. More precisely, 1% of transactions are fraudulent when the card holder is travelling, whereas only 0.4% of the transactions are fraudulent when they are not travelling. On average, 5% of all transactions happen while the card holder is travelling. If a transaction is fraudulent, then the likelihood of a foreign purchase increases, unless the card holder happens to be travelling. More precisely, when the card holder is not travelling, 10% of the fraudulent transactions are foreign purchases, whereas only 1% of the legitimate transactions are foreign purchases. On the other hand, when the card holder is travelling, 90% of the transactions are foreign purchases regardless of the legitimacy of the transactions.
• Purchases made over the internet are more likely to be fraudulent. This is especially true for card holders who don't own any computer. Currently, 60% of the population owns a computer, and for those card holders, 1% of their legitimate transactions are done over the internet; however, this percentage increases to 2% for fraudulent transactions. For those who don't own any computer, a mere 0.1% of their legitimate transactions are done over the internet, but that number increases to 1.1% for fraudulent transactions. Unfortunately, the credit card company doesn't know whether a card holder owns a computer, but it can usually guess by verifying whether any of the recent transactions involve the purchase of computer-related accessories. In any given week, 10% of those who own a computer purchase at least one computer-related item with their credit card, as opposed to just 0.1% of those who don't own any computer.
a)
[10 pts]
Construct a Bayes Network that your fraud detection system can use to identify fraudulent transactions.
What to hand in:
Show the graph defining the network and the Conditional Probability Tables associated with
each node in the graph. This network should encode the information stated above. Your network should contain
exactly six nodes, corresponding to the following binary random variables:
• OC – card holder owns a computer.
• Fraud – current transaction is fraudulent.
• Trav – card holder is currently travelling.
• FP – current transaction is a foreign purchase.
• IP – current purchase is an internet purchase.
• CRP – a computer related purchase was made in the past week.
The arcs defining your Bayes Network should accurately capture the probabilistic dependencies between these
variables.
SOLUTION: The tables need to contain only a value for variable=true (or false), as this is how I did it in class.
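For reference, the CPT values stated in the question can be written down directly; a minimal Python sketch (the dictionary layout and names are my own, values are taken from the problem statement and match the factors used in part b below):

    # Structure: Trav -> Fraud, {Trav, Fraud} -> FP, {OC, Fraud} -> IP, OC -> CRP
    P_trav = 0.05                                      # Pr(Trav = t)
    P_oc = 0.60                                        # Pr(OC = t)
    P_fraud_given_trav = {True: 0.01, False: 0.004}    # Pr(Fraud = t | Trav)
    P_fp_given_trav_fraud = {                          # Pr(FP = t | Trav, Fraud)
        (True, True): 0.9, (True, False): 0.9,
        (False, True): 0.1, (False, False): 0.01,
    }
    P_ip_given_oc_fraud = {                            # Pr(IP = t | OC, Fraud)
        (True, True): 0.02, (True, False): 0.01,
        (False, True): 0.011, (False, False): 0.001,
    }
    P_crp_given_oc = {True: 0.1, False: 0.001}         # Pr(CRP = t | OC)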
b)
[15 pts]
What is the prior probability (i.e., before we search for previous computer related purchases and before
we verify whether it is a foreign and/or an internet purchase) that the current transaction is a fraud?
What is the probability that the current transaction is a fraud once we have verified the following 3 facts: (1)
it is a foreign transaction; (2) it is not an internet purchase; and (3) the card holder purchased computer related
accessories in the past week? (Note: we are describing one query, not three separate queries.)
What to hand in: Indicate the two queries (i.e., Pr(variables | evidence)) you used to compute those probabilities. Show the factors computed at each step of variable elimination. For each step, show a formula giving the new factor to be created, e.g.

f7(B) = Σ_A [ f2(B, A) * f3(A) * f6(A, B) ]

  B   f7(B)
  t   0.007
  f   0.640
And show the resulting factor (you don't need to show all steps of the calculation). Use the following variable ordering when summing out variables in variable elimination: Trav, FP, Fraud, IP, OC, CRP. No marks are awarded for answering the questions using external software.
SOLUTION:

Query: Pr(fraud)

Here students can sum over just the relevant variable, Trav.

Pr(fraud) = Pr(fraud | trav) Pr(trav) + Pr(fraud | ¬trav) Pr(¬trav)
          = 0.01 * 0.05 + 0.004 * 0.95
          = 0.0043
With variable elimination, we have:
  Trav   f1(Trav) = Pr(fraud | Trav)
  t      0.01
  f      0.004

  Trav   f2(Trav) = Pr(Trav)
  t      0.05
  f      0.95

Pr(fraud) = sumout_Trav [ f1(Trav) * f2(Trav) ]
          = 0.01 * 0.05 + 0.004 * 0.95
          = 0.0043
Query: Pr(fraud | fp, ¬ip, crp)

  OC   f1(OC) = Pr(OC)
  t    0.6
  f    0.4

  Fraud   Trav   f2(Fraud, Trav) = Pr(Fraud | Trav)
  t       t      0.01
  t       f      0.004
  f       t      0.99
  f       f      0.996

  Trav   f3(Trav) = Pr(Trav)
  t      0.05
  f      0.95

  OC   f4(OC) = Pr(crp | OC)
  t    0.1
  f    0.001

  OC   Fraud   f5(OC, Fraud) = Pr(¬ip | OC, Fraud)
  t    t       0.98
  t    f       0.99
  f    t       0.989
  f    f       0.999

  Trav   Fraud   f6(Trav, Fraud) = Pr(fp | Trav, Fraud)
  t      t       0.9
  t      f       0.9
  f      t       0.1
  f      f       0.01
We have:

Pr(fraud | fp, ¬ip, crp)
  ∝ sumout_{OC,Trav} [ f1(OC) * f2(Fraud, Trav) * f3(Trav) * f4(OC) * f5(OC, Fraud) * f6(Trav, Fraud) ]
  = sumout_OC [ f1(OC) * f4(OC) * f5(OC, Fraud) * sumout_Trav [ f2(Fraud, Trav) * f3(Trav) * f6(Trav, Fraud) ] ]
  = sumout_OC [ f1(OC) * f4(OC) * f5(OC, Fraud) * f7(Fraud) ]
  = f7(Fraud) * sumout_OC [ f1(OC) * f4(OC) * f5(OC, Fraud) ]
  = f7(Fraud) * f8(Fraud)
Eliminate variable Trav:

f7(Fraud) = sumout_Trav [ f2(Fraud, Trav) * f3(Trav) * f6(Trav, Fraud) ]

  Fraud   f7(Fraud)
  t       0.0008
  f       0.0540
Eliminate variable OC:

f8(Fraud) = sumout_OC [ f1(OC) * f4(OC) * f5(OC, Fraud) ]

  Fraud   f8(Fraud)
  t       0.0592
  f       0.0598
Pr(fraud | fp, ¬ip, crp) = k * f7(fraud) * f8(fraud) = 0.0150

where k is a normalizing constant:

k = 1 / [ f7(fraud) * f8(fraud) + f7(¬fraud) * f8(¬fraud) ]
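A quick numerical check of this elimination (a minimal sketch reusing the CPT dictionaries from the part a sketch; the factor names mirror the solution above):

    # f7(Fraud) = sum over Trav of Pr(Fraud|Trav) * Pr(Trav) * Pr(fp|Trav,Fraud)
    def f7(fraud):
        total = 0.0
        for trav in (True, False):
            p_fraud = P_fraud_given_trav[trav] if fraud else 1 - P_fraud_given_trav[trav]
            p_trav = P_trav if trav else 1 - P_trav
            total += p_fraud * p_trav * P_fp_given_trav_fraud[(trav, fraud)]
        return total

    # f8(Fraud) = sum over OC of Pr(OC) * Pr(crp|OC) * Pr(not ip | OC, Fraud)
    def f8(fraud):
        total = 0.0
        for oc in (True, False):
            p_oc = P_oc if oc else 1 - P_oc
            total += p_oc * P_crp_given_oc[oc] * (1 - P_ip_given_oc_fraud[(oc, fraud)])
        return total

    num = f7(True) * f8(True)
    den = num + f7(False) * f8(False)
    print(round(num / den, 4))   # -> 0.015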
c)
[10 pts]
After computing those probabilities, the fraud detection system raises a flag and recommends that the card
holder be called to confirm the transaction. An agent calls at the domicile of the card holder but she is not home.
Her spouse confirms that she is currently out of town on a business trip. How does the probability of fraud change
based on this new piece of information? Use all evidence you have from Question 2b and this new evidence.
What to hand in:
Same as for Question 2b.
SOLUTION: The restriction operation can be done first, so they can define factors after restricting on Trav = t.
  OC   f1(OC) = Pr(OC)
  t    0.6
  f    0.4

  Fraud   f2(Fraud) = Pr(Fraud | trav)
  t       0.01
  f       0.99

f3() = Pr(trav) = 0.05

  OC   f4(OC) = Pr(crp | OC)
  t    0.1
  f    0.001

  OC   Fraud   f5(OC, Fraud) = Pr(¬ip | OC, Fraud)
  t    t       0.98
  t    f       0.99
  f    t       0.989
  f    f       0.999

  Fraud   f6(Fraud) = Pr(fp | trav, Fraud)
  t       0.9
  f       0.9
Eliminate variable OC:

f7(Fraud) = sumout_OC [ f1(OC) * f4(OC) * f5(OC, Fraud) ]

  Fraud   f7(Fraud)
  t       0.0592
  f       0.0598
Pr(fraud | fp, ¬ip, crp, trav) = k * f2(fraud) * f3() * f6(fraud) * f7(fraud) = 0.0099

where k is a normalizing constant:

k = 1 / [ Σ_Fraud f2(Fraud) * f3() * f6(Fraud) * f7(Fraud) ]
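Reusing the part b sketch (f8 there plays the role of f7 here, since it is the same sum over OC), the updated value can be checked as follows; note that f3() = Pr(trav) cancels in the normalization:

    # Restrict Trav = t: f2(Fraud) = Pr(Fraud | trav), f6(Fraud) = Pr(fp | trav, Fraud).
    # Pr(trav) appears in both numerator and denominator, so it is omitted here.
    def score_given_trav(fraud):
        p_fraud = P_fraud_given_trav[True] if fraud else 1 - P_fraud_given_trav[True]
        return p_fraud * P_fp_given_trav_fraud[(True, fraud)] * f8(fraud)

    num = score_given_trav(True)
    den = num + score_given_trav(False)
    print(round(num / den, 4))   # -> 0.0099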
d)
[5 pts]
Suppose you are not a very honest employee and you just stole a credit card. You know that the fraud
detection system uses the Bayes net designed earlier but you still want to make an important purchase over the
internet. What can you do prior to your internet purchase to reduce the risk that the transaction will be rejected as
a possible fraud? This question is independent of question 2b and 2c.
What to hand in:
State the action taken and indicate by how much the probability of a fraud gets reduced. Follow
the same instructions as for Question 2b.
SOLUTION:
When an internet purchase is made, the fraud detection system is likely to believe that the transaction is fraudulent unless it has reason to believe that the card holder owns a computer. Therefore, an ingenious thief could simply make a computer-related purchase first to fool the fraud detection system into believing the card holder owns a computer. After that, the thief can make the intended internet purchase with a reduced risk of being rejected.
One can verify that the probability of a fraudulent transaction decreases when a computer-related purchase is observed, since Pr(fraud | ip) = 0.0109 whereas Pr(fraud | ip, crp) = 0.0086.
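These two numbers can be checked with the same CPTs; a small sketch that sums out Trav and OC (names follow the earlier sketches):

    def pr_fraud_given_ip(crp_observed):
        """Posterior Pr(fraud | ip) or Pr(fraud | ip, crp), summing out Trav and OC."""
        scores = {}
        for fraud in (True, False):
            # Sum out Trav to get the prior on Fraud.
            p_fraud = sum((P_trav if t else 1 - P_trav)
                          * (P_fraud_given_trav[t] if fraud else 1 - P_fraud_given_trav[t])
                          for t in (True, False))
            # Sum out OC to get the likelihood of ip (and crp, if observed).
            like = 0.0
            for oc in (True, False):
                p = (P_oc if oc else 1 - P_oc) * P_ip_given_oc_fraud[(oc, fraud)]
                if crp_observed:
                    p *= P_crp_given_oc[oc]
                like += p
            scores[fraud] = p_fraud * like
        return scores[True] / (scores[True] + scores[False])

    print(round(pr_fraud_given_ip(False), 4))  # Pr(fraud | ip)      -> 0.0109
    print(round(pr_fraud_given_ip(True), 4))   # Pr(fraud | ip, crp) -> 0.0086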
Page 11 of
11