The five most common words appearing in spam emails are shipping!, today!, here!, available, and fingertips! (Andy Greenberg, “The Most Common Words In Spam Email,” Forbes website, March 17, 2010). Many spam filters separate spam from ham (email not considered to be spam) through application of Bayes’ theorem. Suppose that for one email account, 1 in every 10 messages is spam and the proportions of spam messages that have the five most common words in spam email are given below.
| Word | Proportion of spam messages |
|---|---|
| shipping! | .051 |
| today! | .045 |
| here! | .034 |
| available | .014 |
| fingertips! | .014 |
Also suppose that the proportions of ham messages that contain these words are:
| Word | Proportion of ham messages |
|---|---|
| shipping! | .0015 |
| today! | .0022 |
| here! | .0022 |
| available | .0041 |
| fingertips! | .0011 |
- a. If a message includes the word shipping!, what is the probability the message is spam? If a message includes the word shipping!, what is the probability the message is ham? Should messages that include the word shipping! be flagged as spam?
- b. If a message includes the word today!, what is the probability the message is spam? If a message includes the word here!, what is the probability the message is spam? Which of these two words is a stronger indicator that a message is spam? Why?
- c. If a message includes the word available, what is the probability the message is spam? If a message includes the word fingertips!, what is the probability the message is spam? Which of these two words is a stronger indicator that a message is spam? Why?
- d. What insights do the results of parts (b) and (c) yield about what enables a spam filter that uses Bayes’ theorem to work effectively?
a.
Compute the probability that a message is spam, given that it includes the word “shipping!”. Also compute the probability that the message is ham, given that it includes the word “shipping!”. Determine whether messages that contain “shipping!” should be flagged as spam.
Answer to Problem 60SE
The probability that a message is spam, given that it includes the word “shipping!”, is 0.791.
The probability that a message is ham, given that it includes the word “shipping!”, is 0.209.
Messages that contain the word “shipping!” should be flagged as spam.
Explanation of Solution
Calculation:
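The data give the prior probability that a message is spam (1 in every 10 messages) and, for each word, the proportions of spam and ham messages that contain it.
Bayes’ Theorem (Two-Event Case):
$$P(A_1 \mid B) = \frac{P(A_1)\,P(B \mid A_1)}{P(A_1)\,P(B \mid A_1) + P(A_2)\,P(B \mid A_2)}$$
Here, $A_1$ is the event that a message is spam, $A_2$ is the event that a message is ham, and $B$ is the event that a message contains the word “shipping!”.
From the data, $P(A_1) = 0.1$, $P(A_2) = 0.9$, $P(B \mid A_1) = 0.051$, and $P(B \mid A_2) = 0.0015$.
Substitute these values in Bayes’ theorem:
$$P(A_1 \mid B) = \frac{(0.1)(0.051)}{(0.1)(0.051) + (0.9)(0.0015)} = \frac{0.0051}{0.00645} \approx 0.791$$
Therefore,
$$P(A_2 \mid B) = 1 - P(A_1 \mid B) = \frac{0.00135}{0.00645} \approx 0.209$$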
Thus, the probability that a message is spam, given that it includes the word “shipping!”, is 0.791.
Thus, the probability that a message is ham, given that it includes the word “shipping!”, is 0.209.
A message containing “shipping!” is far more likely to be spam (0.791) than ham (0.209). Therefore, messages that contain the word “shipping!” should be flagged as spam.
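The part (a) calculation can be checked with a minimal Python sketch. It assumes only the numbers given in the problem (a 1-in-10 spam rate and the shipping! proportions); the function name `posterior_spam` is hypothetical, introduced here for illustration, and the flagging rule simply mirrors the comparison of the two posterior probabilities made above.

```python
def posterior_spam(p_spam: float, p_word_given_spam: float, p_word_given_ham: float) -> float:
    """Two-event Bayes' theorem: return P(spam | word)."""
    p_ham = 1.0 - p_spam
    joint_spam = p_spam * p_word_given_spam  # P(spam and word)
    joint_ham = p_ham * p_word_given_ham     # P(ham and word)
    return joint_spam / (joint_spam + joint_ham)


P_SPAM = 0.1  # 1 in every 10 messages is spam

p_spam_shipping = posterior_spam(P_SPAM, 0.051, 0.0015)
p_ham_shipping = 1.0 - p_spam_shipping  # the two posteriors sum to 1

# Flag when spam is the more probable explanation, as in the comparison above.
should_flag = p_spam_shipping > p_ham_shipping

print(f"P(spam | shipping!) = {p_spam_shipping:.3f}")  # 0.791
print(f"P(ham | shipping!)  = {p_ham_shipping:.3f}")   # 0.209
print("Flag as spam:", should_flag)                     # True
```

Comparing the spam posterior with the ham posterior is the same as asking whether P(spam | shipping!) exceeds 0.5; a real filter might use a different cutoff, which is an assumption outside this problem.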
b.
Compute the probability that a message is spam, given that it includes the word “today!”. Also compute the probability that a message is spam, given that it includes the word “here!”. Identify which of the two words is the stronger indicator of spam and explain why.
Answer to Problem 60SE
The probability that a message is spam, given that it includes the word “today!”, is 0.694.
The probability that a message is spam, given that it includes the word “here!”, is 0.632.
The word “today!” is the stronger indicator of spam.
Explanation of Solution
Calculation:
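Bayes’ Theorem (Two-Event Case):
$$P(A_1 \mid B) = \frac{P(A_1)\,P(B \mid A_1)}{P(A_1)\,P(B \mid A_1) + P(A_2)\,P(B \mid A_2)}$$
Here, $A_1$ is the event that a message is spam, $A_2$ is the event that a message is ham, $B$ is the event that a message contains the word “today!”, and $C$ is the event that a message contains the word “here!” (the same formula applies with $C$ in place of $B$).
From the data, $P(A_1) = 0.1$, $P(A_2) = 0.9$, $P(B \mid A_1) = 0.045$, $P(B \mid A_2) = 0.0022$, $P(C \mid A_1) = 0.034$, and $P(C \mid A_2) = 0.0022$.
Substituting these values in Bayes’ theorem gives
$$P(A_1 \mid B) = \frac{(0.1)(0.045)}{(0.1)(0.045) + (0.9)(0.0022)} = \frac{0.0045}{0.00648} \approx 0.694$$
$$P(A_1 \mid C) = \frac{(0.1)(0.034)}{(0.1)(0.034) + (0.9)(0.0022)} = \frac{0.0034}{0.00538} \approx 0.632$$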
Thus, the probability that a message is spam, given that it includes the word “today!”, is 0.694.
Thus, the probability that a message is spam, given that it includes the word “here!”, is 0.632.
Both words appear in the same proportion of ham messages (0.0022), but “today!” appears in a larger proportion of spam messages (0.045 versus 0.034). As a result, the probability of spam given “today!” (0.694) exceeds the probability of spam given “here!” (0.632), so “today!” is the stronger indicator of spam.
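The part (b) comparison can be reproduced with a short Python sketch. It assumes only the proportions from the problem's tables; the helper `posterior_spam` and the `WORDS` mapping are hypothetical names used for illustration.

```python
def posterior_spam(p_spam: float, p_word_given_spam: float, p_word_given_ham: float) -> float:
    """Two-event Bayes' theorem: return P(spam | word)."""
    joint_spam = p_spam * p_word_given_spam          # P(spam and word)
    joint_ham = (1.0 - p_spam) * p_word_given_ham    # P(ham and word)
    return joint_spam / (joint_spam + joint_ham)


P_SPAM = 0.1  # 1 in every 10 messages is spam

# word -> (proportion of spam messages with the word, proportion of ham messages with the word)
WORDS = {
    "today!": (0.045, 0.0022),
    "here!": (0.034, 0.0022),
}

posteriors = {w: posterior_spam(P_SPAM, s, h) for w, (s, h) in WORDS.items()}
for word, p in posteriors.items():
    print(f"P(spam | {word}) = {p:.3f}")  # 0.694 for today!, 0.632 for here!

# The stronger indicator is the word with the larger posterior probability of spam.
stronger = max(posteriors, key=posteriors.get)
print("Stronger indicator:", stronger)  # today!
```

Because both words share the same ham proportion, the only thing separating their posteriors is the spam proportion in the numerator.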
c.
Compute the probability that a message is spam, given that it includes the word “available”. Also compute the probability that a message is spam, given that it includes the word “fingertips!”. Identify which of the two words is the stronger indicator of spam and explain why.
Answer to Problem 60SE
The probability that a message is spam, given that it includes the word “available”, is 0.275.
The probability that a message is spam, given that it includes the word “fingertips!”, is 0.586.
The word “fingertips!” is the stronger indicator of spam.
Explanation of Solution
Calculation:
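Bayes’ Theorem (Two-Event Case):
$$P(A_1 \mid B) = \frac{P(A_1)\,P(B \mid A_1)}{P(A_1)\,P(B \mid A_1) + P(A_2)\,P(B \mid A_2)}$$
Here, $A_1$ is the event that a message is spam, $A_2$ is the event that a message is ham, $B$ is the event that a message contains the word “available”, and $C$ is the event that a message contains the word “fingertips!” (the same formula applies with $C$ in place of $B$).
From the data, $P(A_1) = 0.1$, $P(A_2) = 0.9$, $P(B \mid A_1) = 0.014$, $P(B \mid A_2) = 0.0041$, $P(C \mid A_1) = 0.014$, and $P(C \mid A_2) = 0.0011$.
Substituting these values in Bayes’ theorem gives
$$P(A_1 \mid B) = \frac{(0.1)(0.014)}{(0.1)(0.014) + (0.9)(0.0041)} = \frac{0.0014}{0.00509} \approx 0.275$$
$$P(A_1 \mid C) = \frac{(0.1)(0.014)}{(0.1)(0.014) + (0.9)(0.0011)} = \frac{0.0014}{0.00239} \approx 0.586$$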
Thus, the probability that a message is spam, given that it includes the word “available”, is 0.275.
Thus, the probability that a message is spam, given that it includes the word “fingertips!”, is 0.586.
Both words appear in the same proportion of spam messages (0.014), but “available” appears in a larger proportion of ham messages (0.0041 versus 0.0011). As a result, the probability of spam given “fingertips!” (0.586) exceeds the probability of spam given “available” (0.275), so “fingertips!” is the stronger indicator of spam.
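The same kind of sketch reproduces part (c). Again, it assumes only the proportions from the problem's tables, and `posterior_spam` and `WORDS` are hypothetical names chosen for illustration.

```python
def posterior_spam(p_spam: float, p_word_given_spam: float, p_word_given_ham: float) -> float:
    """Two-event Bayes' theorem: return P(spam | word)."""
    joint_spam = p_spam * p_word_given_spam          # P(spam and word)
    joint_ham = (1.0 - p_spam) * p_word_given_ham    # P(ham and word)
    return joint_spam / (joint_spam + joint_ham)


P_SPAM = 0.1  # 1 in every 10 messages is spam

# word -> (proportion of spam messages with the word, proportion of ham messages with the word)
WORDS = {
    "available": (0.014, 0.0041),
    "fingertips!": (0.014, 0.0011),
}

posteriors = {w: posterior_spam(P_SPAM, s, h) for w, (s, h) in WORDS.items()}
for word, p in posteriors.items():
    print(f"P(spam | {word}) = {p:.3f}")  # 0.275 for available, 0.586 for fingertips!

# The stronger indicator is the word with the larger posterior probability of spam.
stronger = max(posteriors, key=posteriors.get)
print("Stronger indicator:", stronger)  # fingertips!
```

Here the numerators are identical, so the word whose ham proportion (and hence denominator) is smaller ends up with the higher posterior.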
d.
Explain what insight the results of parts (b) and (c) yield about what enables a spam filter that uses Bayes’ theorem to work effectively.
Explanation of Solution
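From part (b), when two words appear equally often in ham messages, the word that appears more often in spam messages (“today!”) is the stronger indicator of spam.
From part (c), when two words appear equally often in spam messages, the word that appears less often in ham messages (“fingertips!”) is the stronger indicator; “available” is a weak indicator because it also appears fairly often in legitimate messages.
Therefore, a spam filter based on Bayes’ theorem works effectively when it relies on words that occur much more often in spam than in legitimate messages. What matters is how often a word appears in spam relative to how often it appears in ham, not how often it appears in spam alone.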
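This insight can be made concrete with the odds form of Bayes’ theorem, a standard equivalent restatement that the solution above does not use: the posterior odds of spam equal the prior odds (here 0.1/0.9 = 1/9) multiplied by the likelihood ratio P(word | spam) / P(word | ham). Because the prior is the same for every word, ranking the words by likelihood ratio gives the same order as ranking them by P(spam | word). The sketch below assumes only the five pairs of proportions from the problem; `posterior_from_odds`, `WORDS`, and `PRIOR_ODDS` are hypothetical names.

```python
P_SPAM = 0.1
PRIOR_ODDS = P_SPAM / (1.0 - P_SPAM)  # odds of spam before seeing any word: 1 to 9

# word -> (proportion of spam messages with the word, proportion of ham messages with the word)
WORDS = {
    "shipping!": (0.051, 0.0015),
    "today!": (0.045, 0.0022),
    "here!": (0.034, 0.0022),
    "available": (0.014, 0.0041),
    "fingertips!": (0.014, 0.0011),
}


def likelihood_ratio(word: str) -> float:
    """How many times more often the word appears in spam than in ham."""
    p_spam_word, p_ham_word = WORDS[word]
    return p_spam_word / p_ham_word


def posterior_from_odds(lr: float) -> float:
    """Posterior odds = prior odds * likelihood ratio; convert odds back to a probability."""
    odds = PRIOR_ODDS * lr
    return odds / (1.0 + odds)


# Sort from strongest to weakest spam indicator.
ranking = sorted(WORDS, key=likelihood_ratio, reverse=True)
for word in ranking:
    lr = likelihood_ratio(word)
    print(f"{word:12s} LR = {lr:6.2f}  P(spam | word) = {posterior_from_odds(lr):.3f}")
```

The ranking it produces (shipping!, today!, here!, fingertips!, available) agrees with parts (a) through (c): the posteriors match the Bayes’ theorem results, and “available”, with a likelihood ratio of only about 3.4, is the weakest indicator.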
Chapter 4 Solutions
Modern Business Statistics with Microsoft Office Excel (with XLSTAT Education Edition Printed Access Card) (MindTap Course List)