Text categorization: consider the following document-term matrix (each value in the
matrix is the frequency of a specific term in that document):
      T1  T2  T3  T4  T5  T6  T7  T8
doc1   2   0   4   3   0   1   0   2
doc2   0   2   4   0   2   3   0   0
doc3   4   0   1   3   0   1   0   1
doc4   0   1   0   2   0   0   1   0
doc5   0   0   2   0   0   4   0   0
doc6   1   1   0   2   0   1   1   3
doc7   2   1   3   4   0   2   0   2
Assume that documents have been manually assigned to two pre-specified categories
as follows: Class_1 = {Doc1, Doc2, Doc5}, Class_2 = {Doc3, Doc4, Doc6, Doc7}
(a) Use the Naïve Bayes Multinomial Model and the Naïve Bayes Bernoulli Model,
respectively, to calculate how Doc 8 and Doc 9 given below will be classified. Use
add-one smoothing for the conditional probabilities in the calculation.
      T1  T2  T3  T4  T5  T6  T7  T8
doc8   3   1   0   4   1   0   2   1
doc9   0   0   3   0   1   5   0   1
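
The two Naïve Bayes models in part (a) can be checked with a short script. The following is a minimal sketch, not a reference solution: it assumes add-one smoothing over the 8-term vocabulary for the multinomial model, (df + 1) / (N_c + 2) for the Bernoulli model, and class priors taken from the 3/4 split of the seven training documents. All names (`TRAIN`, `CLASSES`, `multinomial_log_score`, `bernoulli_log_score`, `classify`) are hypothetical.

```python
import math

# Term frequencies copied from the question (columns T1..T8).
TRAIN = {
    "doc1": [2, 0, 4, 3, 0, 1, 0, 2],
    "doc2": [0, 2, 4, 0, 2, 3, 0, 0],
    "doc3": [4, 0, 1, 3, 0, 1, 0, 1],
    "doc4": [0, 1, 0, 2, 0, 0, 1, 0],
    "doc5": [0, 0, 2, 0, 0, 4, 0, 0],
    "doc6": [1, 1, 0, 2, 0, 1, 1, 3],
    "doc7": [2, 1, 3, 4, 0, 2, 0, 2],
}
CLASSES = {
    "Class_1": ["doc1", "doc2", "doc5"],
    "Class_2": ["doc3", "doc4", "doc6", "doc7"],
}
TEST = {
    "doc8": [3, 1, 0, 4, 1, 0, 2, 1],
    "doc9": [0, 0, 3, 0, 1, 5, 0, 1],
}


def multinomial_log_score(doc, cls):
    """log P(c) + sum_t tf_t * log P(t|c), with add-one smoothing."""
    members = CLASSES[cls]
    totals = [sum(TRAIN[d][t] for d in members) for t in range(len(doc))]
    denom = sum(totals) + len(totals)  # class token count + vocabulary size
    score = math.log(len(members) / len(TRAIN))
    for tf, count in zip(doc, totals):
        score += tf * math.log((count + 1) / denom)
    return score


def bernoulli_log_score(doc, cls):
    """log P(c) + sum over ALL terms of log P(t|c) or log(1 - P(t|c))."""
    members = CLASSES[cls]
    n = len(members)
    score = math.log(n / len(TRAIN))
    for t, tf in enumerate(doc):
        df = sum(1 for d in members if TRAIN[d][t] > 0)
        p = (df + 1) / (n + 2)  # smoothed probability that term t is present
        score += math.log(p if tf > 0 else 1 - p)
    return score


def classify(doc, scorer):
    """Return the class with the highest log score."""
    return max(CLASSES, key=lambda cls: scorer(doc, cls))


for name, vec in TEST.items():
    print(name, classify(vec, multinomial_log_score), classify(vec, bernoulli_log_score))
# doc8 Class_2 Class_2
# doc9 Class_1 Class_1
```

Working in log space avoids underflow from multiplying many small probabilities. Note that the Bernoulli score also includes the absent terms, which is the main difference from the multinomial model.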
(b) Redo the classification using the K-Nearest-Neighbor approach for document
categorization with K = 3 to classify the two new documents (Doc 8 and Doc 9). Show
calculation details. Note: there is no need to normalize the vectors; use raw tf*idf for the
weight of each term and use cosine similarity for computing similarities.
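
A minimal sketch of the K = 3 nearest-neighbor calculation in part (b) follows. It assumes idf = log2(N / df) computed from the seven training documents only and applied unchanged to Doc 8 and Doc 9. The question does not fix the idf formula, so treat this as one common convention. The log base does not affect cosine similarity, because it scales every weight by the same constant. The names `idf_weights`, `weigh`, `cosine` and `knn` are hypothetical.

```python
import math
from collections import Counter

# Term frequencies copied from the question (columns T1..T8).
TRAIN = {
    "doc1": [2, 0, 4, 3, 0, 1, 0, 2],
    "doc2": [0, 2, 4, 0, 2, 3, 0, 0],
    "doc3": [4, 0, 1, 3, 0, 1, 0, 1],
    "doc4": [0, 1, 0, 2, 0, 0, 1, 0],
    "doc5": [0, 0, 2, 0, 0, 4, 0, 0],
    "doc6": [1, 1, 0, 2, 0, 1, 1, 3],
    "doc7": [2, 1, 3, 4, 0, 2, 0, 2],
}
LABELS = {
    "doc1": "Class_1", "doc2": "Class_1", "doc5": "Class_1",
    "doc3": "Class_2", "doc4": "Class_2", "doc6": "Class_2", "doc7": "Class_2",
}
TEST = {
    "doc8": [3, 1, 0, 4, 1, 0, 2, 1],
    "doc9": [0, 0, 3, 0, 1, 5, 0, 1],
}


def idf_weights(rows):
    """Assumed convention: idf_t = log2(N / df_t) over the training rows.

    Every term here occurs in at least one training document, so df_t > 0.
    """
    n = len(rows)
    return [math.log2(n / sum(1 for r in rows if r[t] > 0))
            for t in range(len(rows[0]))]


IDF = idf_weights(list(TRAIN.values()))


def weigh(tf):
    """Raw tf*idf weights, without length normalization."""
    return [f * w for f, w in zip(tf, IDF)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def knn(tf, k=3):
    """Return (majority label, k nearest (similarity, name) pairs)."""
    query = weigh(tf)
    ranked = sorted(((cosine(query, weigh(row)), name)
                     for name, row in TRAIN.items()), reverse=True)
    top = ranked[:k]
    votes = Counter(LABELS[name] for _, name in top)
    return votes.most_common(1)[0][0], top


for name, vec in TEST.items():
    label, top = knn(vec)
    print(name, label, [n for _, n in top])
# doc8 Class_2 ['doc4', 'doc6', 'doc3']
# doc9 Class_1 ['doc2', 'doc5', 'doc1']
```

Under these assumptions the three nearest neighbors of each new document all belong to a single class, so the majority vote is unanimous.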
(c) Redo the classification using the Rocchio-based vector space model to
determine how Doc 8 and Doc 9 will be classified. As in (b), use non-normalized vectors,
raw tf*idf for the weight of each term, and cosine similarity.
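
The Rocchio-based classification in part (c) can be sketched in the same way. This is a hedged illustration, not a reference solution: it assumes each class prototype is the plain mean of its members' raw tf*idf vectors (no negative-example term), with idf = log2(N / df) computed from the seven training documents. The names `centroid`, `rocchio` and `PROTOTYPES` are hypothetical.

```python
import math

# Term frequencies copied from the question (columns T1..T8).
TRAIN = {
    "doc1": [2, 0, 4, 3, 0, 1, 0, 2],
    "doc2": [0, 2, 4, 0, 2, 3, 0, 0],
    "doc3": [4, 0, 1, 3, 0, 1, 0, 1],
    "doc4": [0, 1, 0, 2, 0, 0, 1, 0],
    "doc5": [0, 0, 2, 0, 0, 4, 0, 0],
    "doc6": [1, 1, 0, 2, 0, 1, 1, 3],
    "doc7": [2, 1, 3, 4, 0, 2, 0, 2],
}
CLASSES = {
    "Class_1": ["doc1", "doc2", "doc5"],
    "Class_2": ["doc3", "doc4", "doc6", "doc7"],
}
TEST = {
    "doc8": [3, 1, 0, 4, 1, 0, 2, 1],
    "doc9": [0, 0, 3, 0, 1, 5, 0, 1],
}

N_TERMS = 8
# Assumed convention: idf_t = log2(N / df_t), df counted over training docs.
IDF = [math.log2(len(TRAIN) / sum(1 for r in TRAIN.values() if r[t] > 0))
       for t in range(N_TERMS)]


def weigh(tf):
    """Raw tf*idf weights, without length normalization."""
    return [f * w for f, w in zip(tf, IDF)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))


def centroid(names):
    """Mean of the members' tf*idf vectors (the class prototype)."""
    vecs = [weigh(TRAIN[n]) for n in names]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


PROTOTYPES = {cls: centroid(names) for cls, names in CLASSES.items()}


def rocchio(tf):
    """Return (best class, {class: cosine similarity to its prototype})."""
    query = weigh(tf)
    sims = {cls: cosine(query, proto) for cls, proto in PROTOTYPES.items()}
    return max(sims, key=sims.get), sims


for name, vec in TEST.items():
    print(name, rocchio(vec)[0])
# doc8 Class_2
# doc9 Class_1
```

Because cosine similarity ignores vector length, summing the class members instead of averaging them gives the same ranking.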