Question

Consider a text corpus consisting of N tokens drawn from d distinct words, where the number of times each distinct word w appears is given by x_w. We want to apply a version of Laplace smoothing that estimates a word's probability as (x_w + a) / (N + a d) for some constant a. (Laplace recommended a = 1, but other values are possible.) In the following problems, assume N is 100,000, d is 10,000, and a is 2.
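To make the estimator concrete, here is a minimal Python sketch of it on a toy corpus. The function name `smoothed_probs`, the example sentence, and the extra unseen word "dog" are illustrative assumptions, not part of the question. The sketch treats d as the size of the vocabulary being scored, so the smoothed estimates sum to 1 over that vocabulary.

```python
from collections import Counter

def smoothed_probs(tokens, vocab, a=1.0):
    """Return (x_w + a) / (N + a*d) for every word w in vocab.

    N is the number of tokens and d is the vocabulary size.
    (Hypothetical helper for illustration only.)
    """
    counts = Counter(tokens)   # x_w for each observed word; 0 for unseen words
    n = len(tokens)            # N
    d = len(vocab)             # d
    return {w: (counts[w] + a) / (n + a * d) for w in vocab}

# Toy corpus: 6 tokens, 5 distinct words, plus one unseen word in the vocabulary.
tokens = "the cat sat on the mat".split()
vocab = sorted(set(tokens) | {"dog"})

probs = smoothed_probs(tokens, vocab, a=2.0)
for word in vocab:
    print(f"{word}: {probs[word]:.4f}")
print("total:", sum(probs.values()))
```

The unseen word receives a small nonzero probability, and every observed word gives up a little mass to pay for it.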

A. Give both the unsmoothed maximum likelihood probability estimate and the Laplace smoothed estimate of a word that appears 1,000 times in the corpus.

B. Do the same for a word that does not appear at all.

C. You are running a Naive Bayes text classifier with Laplace smoothing, and you suspect that you are overfitting the data. Would you increase or decrease the parameter a?

D. Could increasing a increase or decrease training set error? Increase or decrease validation set error?
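For parts A and B, the arithmetic can be checked with a short Python sketch. It plugs the stated values (N = 100,000, d = 10,000, a = 2) into the formula from the question, so a count of 1,000 gives 1000/100000 unsmoothed and 1002/120000 smoothed, and a count of 0 gives 0 unsmoothed and 2/120000 smoothed. The function names `mle` and `laplace` are illustrative assumptions, and the sketch assumes the unseen word is counted among the d vocabulary words. The final loop is only an exploration of how the estimates move as a grows (both drift toward the uniform value 1/d); it is not a full answer to parts C and D.

```python
def mle(x_w, n):
    """Unsmoothed maximum likelihood estimate: x_w / N."""
    return x_w / n

def laplace(x_w, n, d, a):
    """Smoothed estimate from the question: (x_w + a) / (N + a*d)."""
    return (x_w + a) / (n + a * d)

n_tokens = 100_000   # N
d_words = 10_000     # d
a = 2                # smoothing constant given in the question

# Parts A and B: a word seen 1,000 times and a word never seen.
for count in (1_000, 0):
    print(f"count={count}: MLE={mle(count, n_tokens):.6g}, "
          f"smoothed={laplace(count, n_tokens, d_words, a):.6g}")

# Exploration: larger a pulls both estimates toward 1/d.
for a_try in (0, 1, 2, 10, 100):
    frequent = laplace(1_000, n_tokens, d_words, a_try)
    unseen = laplace(0, n_tokens, d_words, a_try)
    print(f"a={a_try}: frequent={frequent:.6g}, unseen={unseen:.6g}")
```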

Follow-up Question

I forgot to type "x_w" after "given by" in my original question. It should read: consider a text corpus consisting of N tokens drawn from d distinct words, where the number of times each distinct word w appears is given by x_w.
