The five most common words appearing in spam emails are shipping!, today!, here!, available, and fingertips! (Andy Greenberg, “The Most Common Words In Spam Email,” Forbes website, March 17, 2010). Many spam filters separate spam from ham (email not considered to be spam) through application of Bayes’ theorem. Suppose that for one email account, 1 in every 10 messages is spam and the proportions of spam messages that have the five most common words in spam email are given below.
| Word | Proportion of spam messages containing the word |
| --- | --- |
| shipping! | .051 |
| today! | .045 |
| here! | .034 |
| available | .014 |
| fingertips! | .014 |
Also suppose that the proportions of ham messages that have these words are
| Word | Proportion of ham messages containing the word |
| --- | --- |
| shipping! | .0015 |
| today! | .0022 |
| here! | .0022 |
| available | .0041 |
| fingertips! | .0011 |
- a. If a message includes the word shipping!, what is the probability the message is spam? If a message includes the word shipping!, what is the probability the message is ham? Should messages that include the word shipping! be flagged as spam?
- b. If a message includes the word today!, what is the probability the message is spam? If a message includes the word here!, what is the probability the message is spam? Which of these two words is a stronger indicator that a message is spam? Why?
- c. If a message includes the word available, what is the probability the message is spam? If a message includes the word fingertips!, what is the probability the message is spam? Which of these two words is a stronger indicator that a message is spam? Why?
- d. What insights do the results of parts (b) and (c) yield about what enables a spam filter that uses Bayes’ theorem to work effectively?
a.
Compute the probability that a message is spam, given that it includes the word “shipping!”. Also compute the probability that a message is ham, given that it includes the word “shipping!”. Decide whether messages that include the word “shipping!” should be flagged as spam.
Answer to Problem 60SE
The probability that a message is spam, given that it includes the word “shipping!”, is 0.791.
The probability that a message is ham, given that it includes the word “shipping!”, is 0.209.
Messages that include the word “shipping!” should be flagged as spam.
Explanation of Solution
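Calculation:
The given data contain the proportions of spam messages and ham messages that include each word.
Bayes’ Theorem (Two-Event Case):
$$P(A_1 \mid B) = \frac{P(A_1)\,P(B \mid A_1)}{P(A_1)\,P(B \mid A_1) + P(A_2)\,P(B \mid A_2)}$$
Here, $A_1$ denotes the event that a message is spam, $A_2$ the event that it is ham, and $B$ the event that it includes the word “shipping!”.
From data, $P(A_1) = 0.10$, $P(A_2) = 0.90$, $P(B \mid A_1) = 0.051$, and $P(B \mid A_2) = 0.0015$.
Substitute these values in Bayes’ theorem.
$$P(A_1 \mid B) = \frac{(0.10)(0.051)}{(0.10)(0.051) + (0.90)(0.0015)} = \frac{0.0051}{0.00645} \approx 0.791$$
Therefore,
$$P(A_2 \mid B) = 1 - P(A_1 \mid B) \approx 1 - 0.791 = 0.209$$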
Thus, the probability that a message is spam, given that it includes the word “shipping!”, is 0.791.
Thus, the probability that a message is ham, given that it includes the word “shipping!”, is 0.209.
A message that includes the word “shipping!” is far more likely to be spam (0.791) than ham (0.209). Therefore, messages that include the word “shipping!” should be flagged as spam.
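The arithmetic in part (a) can be checked with a short Python sketch. The function name `posterior_spam` and its parameter names are illustrative choices, not part of the textbook solution; the numbers are the ones given in the problem.

```python
def posterior_spam(prior_spam, p_word_given_spam, p_word_given_ham):
    """Two-event Bayes' theorem: P(spam | message contains the word)."""
    prior_ham = 1.0 - prior_spam
    joint_spam = prior_spam * p_word_given_spam  # P(spam and word)
    joint_ham = prior_ham * p_word_given_ham     # P(ham and word)
    return joint_spam / (joint_spam + joint_ham)


# Values for "shipping!" from the problem statement.
p_spam = posterior_spam(0.10, 0.051, 0.0015)
p_ham = 1.0 - p_spam  # spam and ham are the only two possibilities

print(f"P(spam | shipping!) = {p_spam:.3f}")
print(f"P(ham  | shipping!) = {p_ham:.3f}")
```

Rounded to three decimals, the two printed values agree with the 0.791 and 0.209 obtained above.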
b.
Compute the probability that a message is spam, given that it includes the word “today!”. Also compute the probability that a message is spam, given that it includes the word “here!”. Identify which of the two words is the stronger indicator of spam, and explain the answer.
Answer to Problem 60SE
The probability that a message is spam, given that it includes the word “today!”, is 0.694.
The probability that a message is spam, given that it includes the word “here!”, is 0.632.
The word “today!” is the stronger indicator of spam.
Explanation of Solution
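Calculation:
Bayes’ Theorem (Two-Event Case):
$$P(A_1 \mid B) = \frac{P(A_1)\,P(B \mid A_1)}{P(A_1)\,P(B \mid A_1) + P(A_2)\,P(B \mid A_2)}$$
Here, $A_1$ denotes the event that a message is spam, $A_2$ the event that it is ham, and $B$ the event that it includes the word in question.
From data, $P(A_1) = 0.10$ and $P(A_2) = 0.90$. For “today!”, $P(B \mid A_1) = 0.045$ and $P(B \mid A_2) = 0.0022$. For “here!”, $P(B \mid A_1) = 0.034$ and $P(B \mid A_2) = 0.0022$.
Substitute these values in Bayes’ theorem.
For “today!”:
$$P(A_1 \mid B) = \frac{(0.10)(0.045)}{(0.10)(0.045) + (0.90)(0.0022)} = \frac{0.0045}{0.00648} \approx 0.694$$
For “here!”:
$$P(A_1 \mid B) = \frac{(0.10)(0.034)}{(0.10)(0.034) + (0.90)(0.0022)} = \frac{0.0034}{0.00538} \approx 0.632$$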
Thus, the probability that a message is spam, given that it includes the word “today!”, is 0.694.
Thus, the probability that a message is spam, given that it includes the word “here!”, is 0.632.
The probability that a message is spam is higher when it includes “today!” than when it includes “here!”. Both words appear in the same proportion of ham messages (.0022), but “today!” appears in a larger proportion of spam messages (.045 versus .034). Therefore, “today!” is the stronger indicator of spam.
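To confirm that the rounded values in part (b) are not hiding an arithmetic slip, the same calculation can be done in exact rational arithmetic. This is only a sketch: the helper name `posterior_spam_exact` is hypothetical, and Python's standard `fractions.Fraction` is used so that no floating-point rounding occurs.

```python
from fractions import Fraction


def posterior_spam_exact(prior_spam, p_word_given_spam, p_word_given_ham):
    """Two-event Bayes' theorem evaluated with exact fractions."""
    joint_spam = prior_spam * p_word_given_spam        # P(spam and word)
    joint_ham = (1 - prior_spam) * p_word_given_ham    # P(ham and word)
    return joint_spam / (joint_spam + joint_ham)


prior = Fraction(1, 10)  # 1 in every 10 messages is spam

# Proportions from the problem statement, parsed exactly from decimal strings.
today = posterior_spam_exact(prior, Fraction("0.045"), Fraction("0.0022"))
here = posterior_spam_exact(prior, Fraction("0.034"), Fraction("0.0022"))

print("today!:", today, "=", round(float(today), 3))
print("here!: ", here, "=", round(float(here), 3))
print("today! is the stronger indicator:", today > here)
```

The exact posteriors are 25/36 for “today!” and 170/269 for “here!”, which round to 0.694 and 0.632.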
c.
Compute the probability that a message is spam, given that it includes the word “available”. Also compute the probability that a message is spam, given that it includes the word “fingertips!”. Identify which of the two words is the stronger indicator of spam, and explain the answer.
Answer to Problem 60SE
The probability that a message is spam, given that it includes the word “available”, is 0.275.
The probability that a message is spam, given that it includes the word “fingertips!”, is 0.586.
The word “fingertips!” is the stronger indicator of spam.
Explanation of Solution
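Calculation:
Bayes’ Theorem (Two-Event Case):
$$P(A_1 \mid B) = \frac{P(A_1)\,P(B \mid A_1)}{P(A_1)\,P(B \mid A_1) + P(A_2)\,P(B \mid A_2)}$$
Here, $A_1$ denotes the event that a message is spam, $A_2$ the event that it is ham, and $B$ the event that it includes the word in question.
From data, $P(A_1) = 0.10$ and $P(A_2) = 0.90$. For “available”, $P(B \mid A_1) = 0.014$ and $P(B \mid A_2) = 0.0041$. For “fingertips!”, $P(B \mid A_1) = 0.014$ and $P(B \mid A_2) = 0.0011$.
Substitute these values in Bayes’ theorem.
For “available”:
$$P(A_1 \mid B) = \frac{(0.10)(0.014)}{(0.10)(0.014) + (0.90)(0.0041)} = \frac{0.0014}{0.00509} \approx 0.275$$
For “fingertips!”:
$$P(A_1 \mid B) = \frac{(0.10)(0.014)}{(0.10)(0.014) + (0.90)(0.0011)} = \frac{0.0014}{0.00239} \approx 0.586$$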
Thus, the probability that a message is spam, given that it includes the word “available”, is 0.275.
Thus, the probability that a message is spam, given that it includes the word “fingertips!”, is 0.586.
The probability that a message is spam is higher when it includes “fingertips!” than when it includes “available”. Both words appear in the same proportion of spam messages (.014), but “available” appears in a larger proportion of ham messages (.0041 versus .0011). Therefore, “fingertips!” is the stronger indicator of spam.
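The comparison in part (c) can also be read as a flagging decision. The sketch below applies the rule used in part (a), flagging a word when a message containing it is more likely spam than ham, to both words. The 0.5 cutoff in `should_flag` is an assumption that mirrors that rule; it is not a threshold stated in the problem, and the function names are illustrative.

```python
PRIOR_SPAM = 0.10  # 1 in every 10 messages is spam


def posterior_spam(p_word_given_spam, p_word_given_ham, prior_spam=PRIOR_SPAM):
    """Two-event Bayes' theorem: P(spam | message contains the word)."""
    joint_spam = prior_spam * p_word_given_spam
    joint_ham = (1.0 - prior_spam) * p_word_given_ham
    return joint_spam / (joint_spam + joint_ham)


def should_flag(p_spam_given_word):
    """Assumed rule: flag when spam is more likely than ham."""
    return p_spam_given_word > 0.5


# (P(word | spam), P(word | ham)) from the problem statement.
part_c_words = {
    "available": (0.014, 0.0041),
    "fingertips!": (0.014, 0.0011),
}

results = {}
for word, (p_s, p_h) in part_c_words.items():
    p = posterior_spam(p_s, p_h)
    results[word] = (p, should_flag(p))
    print(f"{word:12s} P(spam | word) = {p:.3f}  flag: {should_flag(p)}")
```

Under this assumed rule, “fingertips!” would be flagged and “available” would not, even though both occur in the same proportion of spam messages.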
d.
Explain what insight the results of parts (b) and (c) yield about what enables a spam filter that uses Bayes’ theorem to work effectively.
Explanation of Solution
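From part (b), “today!” and “here!” occur in the same proportion of ham messages (.0022), but “today!” occurs more often in spam messages (.045 versus .034), so it is easier to distinguish spam from ham in a message that includes “today!”.
From part (c), “available” and “fingertips!” occur in the same proportion of spam messages (.014), but “available” occurs more often in ham messages (.0041 versus .0011), so it is more difficult to distinguish spam from ham in a message that includes “available”.
Therefore, it is easier to distinguish spam from ham when a word occurs more often in unwanted messages, less often in legitimate messages, or both. A filter based on Bayes’ theorem works effectively when it relies on words with this property.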
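This insight can be made concrete with the odds form of Bayes’ theorem, in which the posterior odds of spam equal the prior odds multiplied by the likelihood ratio $P(\text{word} \mid \text{spam}) / P(\text{word} \mid \text{ham})$. The following sketch ranks the five words by that ratio; the names `likelihood_ratio` and `posterior_from_ratio` are illustrative, and the data are the proportions given in the problem.

```python
PRIOR_SPAM = 0.10  # 1 in every 10 messages is spam

# (P(word | spam), P(word | ham)) from the problem statement.
WORDS = {
    "shipping!": (0.051, 0.0015),
    "today!": (0.045, 0.0022),
    "here!": (0.034, 0.0022),
    "available": (0.014, 0.0041),
    "fingertips!": (0.014, 0.0011),
}


def likelihood_ratio(p_word_given_spam, p_word_given_ham):
    """How many times more common the word is in spam than in ham."""
    return p_word_given_spam / p_word_given_ham


def posterior_from_ratio(ratio, prior_spam=PRIOR_SPAM):
    """Odds form of Bayes' theorem: posterior odds = prior odds * ratio."""
    prior_odds = prior_spam / (1.0 - prior_spam)
    posterior_odds = prior_odds * ratio
    return posterior_odds / (1.0 + posterior_odds)


# Strongest indicator first.
ranked = sorted(WORDS, key=lambda w: likelihood_ratio(*WORDS[w]), reverse=True)

for word in ranked:
    ratio = likelihood_ratio(*WORDS[word])
    print(f"{word:12s} ratio = {ratio:6.2f}  "
          f"P(spam | word) = {posterior_from_ratio(ratio):.3f}")
```

Because the prior is the same for every word, ranking by likelihood ratio gives the same order as ranking by posterior probability: raising a word's spam rate or lowering its ham rate both increase the ratio.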
Chapter 4 Solutions
Modern Business Statistics with Microsoft Office Excel (with XLSTAT Education Edition Printed Access Card)