The five most common words appearing in spam emails are shipping!, today!, here!, available, and fingertips! (Andy Greenberg, “The Most Common Words In Spam Email,” Forbes website, March 17, 2010). Many spam filters separate spam from ham (email not considered to be spam) through application of Bayes’ theorem. Suppose that for one email account, 1 in every 10 messages is spam and the proportions of spam messages that have the five most common words in spam email are given below.
| Word | Proportion of spam messages |
|---|---|
| shipping! | .051 |
| today! | .045 |
| here! | .034 |
| available | .014 |
| fingertips! | .014 |
Also suppose that the proportions of ham messages that have these words are
| Word | Proportion of ham messages |
|---|---|
| shipping! | .0015 |
| today! | .0022 |
| here! | .0022 |
| available | .0041 |
| fingertips! | .0011 |
- a. If a message includes the word shipping!, what is the probability the message is spam? If a message includes the word shipping!, what is the probability the message is ham? Should messages that include the word shipping! be flagged as spam?
- b. If a message includes the word today!, what is the probability the message is spam? If a message includes the word here!, what is the probability the message is spam? Which of these two words is a stronger indicator that a message is spam? Why?
- c. If a message includes the word available, what is the probability the message is spam? If a message includes the word fingertips!, what is the probability the message is spam? Which of these two words is a stronger indicator that a message is spam? Why?
- d. What insights do the results of parts (b) and (c) yield about what enables a spam filter that uses Bayes’ theorem to work effectively?
a.
Compute the probability that a message is spam, given that it includes the word “shipping”. Also compute the probability that a message is ham, given that it includes the word “shipping”. Determine whether messages that contain the word “shipping” should be flagged as spam.
Answer to Problem 60SE
The probability that the message is spam, given that the message included the word “shipping”, is 0.791.
The probability that the message is ham, given that the message included the word “shipping”, is 0.209.
Messages that contain the word “shipping” should be flagged as spam.
Explanation of Solution
Calculation:
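For each word, the data give the proportion of spam messages and the proportion of ham messages that contain it. Because 1 in every 10 messages is spam, the remaining 9 in every 10 are ham.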
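Bayes’ Theorem (Two-Event Case):
$$P(S \mid W) = \frac{P(S)\,P(W \mid S)}{P(S)\,P(W \mid S) + P(H)\,P(W \mid H)}$$
Here, $S$ is the event that a message is spam, $H$ is the event that a message is ham, and $W$ is the event that a message includes the word “shipping”.
From the data, $P(S) = 0.1$, $P(H) = 0.9$, $P(W \mid S) = 0.051$, and $P(W \mid H) = 0.0015$.
Substitute these values in Bayes’ theorem:
$$P(S \mid W) = \frac{0.1(0.051)}{0.1(0.051) + 0.9(0.0015)} = \frac{0.0051}{0.00645} \approx 0.791$$
Therefore,
$$P(H \mid W) = 1 - P(S \mid W) \approx 1 - 0.791 = 0.209$$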
Thus, the probability that the message is spam, given that the message included the word “shipping”, is 0.791.
Thus, the probability that the message is ham, given that the message included the word “shipping”, is 0.209.
A message that contains the word “shipping” is far more likely to be spam (0.791) than ham (0.209). Therefore, messages that contain the word “shipping” should be flagged as spam.
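The two-event Bayes’ theorem calculation for part (a) can be sketched in Python as a check on the arithmetic. This is a minimal illustration assuming the prior $P(S) = 0.1$ given in the problem; the function name `posterior_spam` and the 0.5 flagging threshold are hypothetical choices, not part of the original solution.

```python
def posterior_spam(p_word_given_spam: float, p_word_given_ham: float,
                   p_spam: float = 0.1) -> float:
    """Return P(spam | word) using the two-event form of Bayes' theorem."""
    p_ham = 1.0 - p_spam
    numerator = p_spam * p_word_given_spam            # P(S) * P(W | S)
    evidence = numerator + p_ham * p_word_given_ham   # P(W), by total probability
    return numerator / evidence


# Proportions for "shipping!" taken from the problem's two tables
p_spam_shipping = posterior_spam(0.051, 0.0015)
p_ham_shipping = 1.0 - p_spam_shipping  # complement: P(H | W) = 1 - P(S | W)

# Hypothetical decision rule: flag when spam is the more probable class
flag_shipping = p_spam_shipping > 0.5

print(f"P(spam | shipping!) = {p_spam_shipping:.3f}")
print(f"P(ham | shipping!)  = {p_ham_shipping:.3f}")
print("Flag as spam:", flag_shipping)
```

Running it prints 0.791 and 0.209, matching the hand calculation, and the hypothetical threshold flags the message.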
b.
Compute the probability that a message is spam, given that it includes the word “today”. Also compute the probability that a message is spam, given that it includes the word “here”. Identify which of the two words is the stronger indicator of spam, and explain why.
Answer to Problem 60SE
The probability that the message is spam, given that the message included the word “today”, is 0.694.
The probability that the message is spam, given that the message included the word “here”, is 0.632.
The word “today” is the stronger indicator of spam.
Explanation of Solution
Calculation:
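Bayes’ Theorem (Two-Event Case):
$$P(S \mid W) = \frac{P(S)\,P(W \mid S)}{P(S)\,P(W \mid S) + P(H)\,P(W \mid H)}$$
Here, $S$ is the event that a message is spam, $H$ is the event that a message is ham, and $W$ is the event that a message includes the word under consideration. Let $W_1$ denote “includes today” and $W_2$ denote “includes here”.
From the data, $P(S) = 0.1$ and $P(H) = 0.9$. For “today”, $P(W_1 \mid S) = 0.045$ and $P(W_1 \mid H) = 0.0022$; for “here”, $P(W_2 \mid S) = 0.034$ and $P(W_2 \mid H) = 0.0022$.
Substitute these values in Bayes’ theorem. Therefore,
$$P(S \mid W_1) = \frac{0.1(0.045)}{0.1(0.045) + 0.9(0.0022)} = \frac{0.0045}{0.00648} \approx 0.694$$
$$P(S \mid W_2) = \frac{0.1(0.034)}{0.1(0.034) + 0.9(0.0022)} = \frac{0.0034}{0.00538} \approx 0.632$$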
Thus, the probability that the message is spam, given that the message included the word “today”, is 0.694.
Thus, the probability that the message is spam, given that the message included the word “here”, is 0.632.
Both words appear in the same proportion of ham messages (.0022), but “today” appears in a larger proportion of spam messages (.045 versus .034). As a result, a message containing “today” has a higher probability of being spam (0.694 versus 0.632), so “today” is the stronger indicator of spam.
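A minimal Python sketch of the part (b) comparison, again assuming $P(S) = 0.1$; the helper name and dictionary layout are illustrative only. Because both words share the same ham proportion, the ranking is driven entirely by their spam proportions.

```python
def posterior_spam(p_word_given_spam, p_word_given_ham, p_spam=0.1):
    """Two-event Bayes' theorem: P(spam | word)."""
    numerator = p_spam * p_word_given_spam
    return numerator / (numerator + (1.0 - p_spam) * p_word_given_ham)


# (P(word | spam), P(word | ham)) from the problem's tables
words_b = {"today!": (0.045, 0.0022), "here!": (0.034, 0.0022)}

posteriors_b = {w: posterior_spam(s, h) for w, (s, h) in words_b.items()}
for word, p in posteriors_b.items():
    print(f"P(spam | {word}) = {p:.3f}")

# The word with the larger posterior is the stronger spam indicator
stronger_b = max(posteriors_b, key=posteriors_b.get)
print("Stronger indicator:", stronger_b)
```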
c.
Compute the probability that a message is spam, given that it includes the word “available”. Also compute the probability that a message is spam, given that it includes the word “fingertips”. Identify which of the two words is the stronger indicator of spam, and explain why.
Answer to Problem 60SE
The probability that the message is spam, given that the message included the word “available”, is 0.275.
The probability that the message is spam, given that the message included the word “fingertips”, is 0.586.
The word “fingertips” is the stronger indicator of spam.
Explanation of Solution
Calculation:
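Bayes’ Theorem (Two-Event Case):
$$P(S \mid W) = \frac{P(S)\,P(W \mid S)}{P(S)\,P(W \mid S) + P(H)\,P(W \mid H)}$$
Here, $S$ is the event that a message is spam, $H$ is the event that a message is ham, and $W$ is the event that a message includes the word under consideration. Let $W_1$ denote “includes available” and $W_2$ denote “includes fingertips”.
From the data, $P(S) = 0.1$ and $P(H) = 0.9$. For “available”, $P(W_1 \mid S) = 0.014$ and $P(W_1 \mid H) = 0.0041$; for “fingertips”, $P(W_2 \mid S) = 0.014$ and $P(W_2 \mid H) = 0.0011$.
Substitute these values in Bayes’ theorem. Therefore,
$$P(S \mid W_1) = \frac{0.1(0.014)}{0.1(0.014) + 0.9(0.0041)} = \frac{0.0014}{0.00509} \approx 0.275$$
$$P(S \mid W_2) = \frac{0.1(0.014)}{0.1(0.014) + 0.9(0.0011)} = \frac{0.0014}{0.00239} \approx 0.586$$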
Thus, the probability that the message is spam, given that the message included the word “available”, is 0.275.
Thus, the probability that the message is spam, given that the message included the word “fingertips”, is 0.586.
Both words appear in the same proportion of spam messages (.014), but “fingertips” appears in a smaller proportion of ham messages (.0011 versus .0041). As a result, a message containing “fingertips” has a higher probability of being spam (0.586 versus 0.275), so “fingertips” is the stronger indicator of spam.
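The part (c) comparison can also be sketched with exact rational arithmetic using Python’s standard `fractions.Fraction`, which makes it visible that both words share the same numerator $P(S)P(W \mid S) = 0.0014$ and differ only in the ham term. This is an illustrative sketch assuming $P(S) = 1/10$; the helper name is hypothetical.

```python
from fractions import Fraction


def posterior_spam(p_word_given_spam, p_word_given_ham, p_spam=Fraction(1, 10)):
    """Two-event Bayes' theorem: P(spam | word), computed exactly."""
    numerator = p_spam * p_word_given_spam
    return numerator / (numerator + (1 - p_spam) * p_word_given_ham)


# Proportions entered as strings so Fraction stores the decimals exactly
words_c = {
    "available": (Fraction("0.014"), Fraction("0.0041")),
    "fingertips!": (Fraction("0.014"), Fraction("0.0011")),
}

posteriors_c = {w: posterior_spam(s, h) for w, (s, h) in words_c.items()}
for word, p in posteriors_c.items():
    # Same numerator for both words; only the ham term in the denominator differs
    print(f"P(spam | {word}) = {p} ~ {float(p):.3f}")

stronger_c = max(posteriors_c, key=posteriors_c.get)
print("Stronger indicator:", stronger_c)
```

The exact posteriors are 140/509 and 140/239: identical numerators, with the smaller denominator belonging to the word that is rarer in ham.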
d.
Explain what insight the results of parts (b) and (c) yield about what enables a spam filter that uses Bayes’ theorem to work effectively.
Explanation of Solution
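From part (b), when two words appear equally often in ham messages, the word that appears more often in spam (“today”) separates spam from ham more sharply.
From part (c), when two words appear equally often in spam messages, the word that appears less often in ham (“fingertips”) separates spam from ham more sharply; a word such as “available”, which is relatively common in legitimate messages, is a weak indicator.
Therefore, a spam filter based on Bayes’ theorem works most effectively when it relies on words that occur much more often in unwanted messages than in legitimate ones, that is, words for which the ratio $P(W \mid S) / P(W \mid H)$ is large.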
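One way to see the part (d) insight is the odds form of Bayes’ theorem, which is algebraically equivalent to the two-event form used above: the posterior odds $P(S \mid W)/P(H \mid W)$ equal the prior odds $P(S)/P(H)$ times the likelihood ratio $P(W \mid S)/P(W \mid H)$. The Python sketch below is an illustration rather than part of the original solution; it ranks all five words by that ratio, and its variable names are hypothetical.

```python
# Proportions of spam and ham messages containing each word (from the problem)
spam_props = {"shipping!": 0.051, "today!": 0.045, "here!": 0.034,
              "available": 0.014, "fingertips!": 0.014}
ham_props = {"shipping!": 0.0015, "today!": 0.0022, "here!": 0.0022,
             "available": 0.0041, "fingertips!": 0.0011}

prior_odds = 0.1 / 0.9  # P(S) / P(H): 1 spam message in every 10

results = {}
for word in spam_props:
    likelihood_ratio = spam_props[word] / ham_props[word]  # P(W|S) / P(W|H)
    posterior_odds = prior_odds * likelihood_ratio          # P(S|W) / P(H|W)
    p_spam = posterior_odds / (1.0 + posterior_odds)        # convert odds to probability
    results[word] = (likelihood_ratio, p_spam)

# With a fixed prior, a larger likelihood ratio always gives a larger P(spam | word)
ranking = sorted(results, key=lambda w: results[w][0], reverse=True)
for word in ranking:
    lr, p = results[word]
    print(f"{word:<12} ratio = {lr:5.1f}   P(spam | word) = {p:.3f}")
```

The two words from part (c) share the same spam proportion yet land at opposite ends of the lower half of the ranking, because the ham proportion matters as much as the spam proportion.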
Chapter 4 Solutions
Modern Business Statistics with Microsoft Office Excel (with XLSTAT Education Edition Printed Access Card)