Modern Business Statistics with Microsoft Office Excel (with XLSTAT Education Edition Printed Access Card)
6th Edition
ISBN: 9780357191484
Author: David R. Anderson; Dennis J. Sweeney; Thomas A. Williams
Publisher: Cengage Learning US

Textbook Question
Chapter 4, Problem 60SE

The five most common words appearing in spam emails are shipping!, today!, here!, available, and fingertips! (Andy Greenberg, “The Most Common Words In Spam Email,” Forbes website, March 17, 2010). Many spam filters separate spam from ham (email not considered to be spam) through application of Bayes’ theorem. Suppose that for one email account, 1 in every 10 messages is spam and the proportions of spam messages that have the five most common words in spam email are given below.

shipping! .051
today! .045
here! .034
available .014
fingertips! .014

Also suppose that the proportions of ham messages that have these words are

shipping! .0015
today! .0022
here! .0022
available .0041
fingertips! .0011

a. If a message includes the word shipping!, what is the probability the message is spam? If a message includes the word shipping!, what is the probability the message is ham? Should messages that include the word shipping! be flagged as spam?
b. If a message includes the word today!, what is the probability the message is spam? If a message includes the word here!, what is the probability the message is spam? Which of these two words is a stronger indicator that a message is spam? Why?
c. If a message includes the word available, what is the probability the message is spam? If a message includes the word fingertips!, what is the probability the message is spam? Which of these two words is a stronger indicator that a message is spam? Why?
d. What insights do the results of parts (b) and (c) yield about what enables a spam filter that uses Bayes’ theorem to work effectively?

a.

To determine

Compute the probability that a message is spam, given that it includes the word “shipping!”. Also, compute the probability that a message is ham, given that it includes the word “shipping!”. Decide whether messages that contain the word “shipping!” should be flagged as spam.

Answer to Problem 60SE

The probability that a message is spam, given that it includes the word “shipping!”, is 0.791.

The probability that a message is ham, given that it includes the word “shipping!”, is 0.209.

Messages that contain the word “shipping!” should be flagged as spam.

Explanation of Solution

Calculation:

The given data are the proportion of all messages that are spam and, for each word, the proportions of spam messages and of ham messages that contain it.

Bayes’ Theorem (Two-Event Case):

$$P(F \mid D) = \frac{P(F) \times P(D \mid F)}{P(F) \times P(D \mid F) + P(M) \times P(D \mid M)}$$

Here,

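$$P(\text{spam} \mid \text{shipping!}) = \frac{P(\text{spam}) \times P(\text{shipping!} \mid \text{spam})}{P(\text{spam}) \times P(\text{shipping!} \mid \text{spam}) + P(\text{ham}) \times P(\text{shipping!} \mid \text{ham})}$$

$$P(\text{ham} \mid \text{shipping!}) = \frac{P(\text{ham}) \times P(\text{shipping!} \mid \text{ham})}{P(\text{spam}) \times P(\text{shipping!} \mid \text{spam}) + P(\text{ham}) \times P(\text{shipping!} \mid \text{ham})}$$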

From data,

$$P(\text{spam}) = 0.10,\quad P(\text{shipping!} \mid \text{spam}) = 0.051,\quad P(\text{ham}) = 0.90,\quad P(\text{shipping!} \mid \text{ham}) = 0.0015$$

Substitute these values in Bayes’ theorem.

Therefore,

$$P(\text{spam} \mid \text{shipping!}) = \frac{(0.10) \times (0.051)}{(0.10) \times (0.051) + (0.90) \times (0.0015)} = \frac{0.0051}{0.0051 + 0.00135} = \frac{0.0051}{0.00645} = 0.791$$

Thus, the probability that a message is spam, given that it includes the word “shipping!”, is 0.791.

$$P(\text{ham} \mid \text{shipping!}) = \frac{(0.90) \times (0.0015)}{(0.10) \times (0.051) + (0.90) \times (0.0015)} = \frac{0.00135}{0.0051 + 0.00135} = \frac{0.00135}{0.00645} = 0.209$$

Thus, the probability that a message is ham, given that it includes the word “shipping!”, is 0.209.

A message that includes the word “shipping!” is far more likely to be spam (0.791) than ham (0.209). Therefore, messages that contain the word “shipping!” should be flagged as spam.
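
The arithmetic in part (a) can be checked with a short Python sketch. The function name `posterior` and its parameter names are illustrative choices, not part of the textbook solution; the numbers are the ones given in the problem.

```python
def posterior(prior, p_word_given_class, p_word_given_other):
    """Two-event Bayes' theorem: P(class | word).

    prior               -- P(class)
    p_word_given_class  -- P(word | class)
    p_word_given_other  -- P(word | the other class)
    """
    numerator = prior * p_word_given_class
    denominator = numerator + (1 - prior) * p_word_given_other
    return numerator / denominator


P_SPAM = 0.10  # 1 in every 10 messages is spam

# shipping! appears in 0.051 of spam messages and 0.0015 of ham messages
p_spam_given_shipping = posterior(P_SPAM, 0.051, 0.0015)
p_ham_given_shipping = posterior(1 - P_SPAM, 0.0015, 0.051)

print(round(p_spam_given_shipping, 3))  # 0.791
print(round(p_ham_given_shipping, 3))   # 0.209
```

The two posteriors sum to 1, which is a quick sanity check on the calculation.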

b.

To determine

Compute the probability that a message is spam, given that it includes the word “today!”. Also, compute the probability that a message is spam, given that it includes the word “here!”. Identify which of the two words is the stronger indicator of spam, and explain why.

Answer to Problem 60SE

The probability that a message is spam, given that it includes the word “today!”, is 0.694.

The probability that a message is spam, given that it includes the word “here!”, is 0.632.

Explanation of Solution

Calculation:

Bayes’ Theorem (Two-Event Case):

$$P(F \mid D) = \frac{P(F) \times P(D \mid F)}{P(F) \times P(D \mid F) + P(M) \times P(D \mid M)}$$

Here,

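$$P(\text{spam} \mid \text{today!}) = \frac{P(\text{spam}) \times P(\text{today!} \mid \text{spam})}{P(\text{spam}) \times P(\text{today!} \mid \text{spam}) + P(\text{ham}) \times P(\text{today!} \mid \text{ham})}$$

$$P(\text{spam} \mid \text{here!}) = \frac{P(\text{spam}) \times P(\text{here!} \mid \text{spam})}{P(\text{spam}) \times P(\text{here!} \mid \text{spam}) + P(\text{ham}) \times P(\text{here!} \mid \text{ham})}$$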

From data,

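$$P(\text{spam}) = 0.10,\quad P(\text{ham}) = 0.90,\quad P(\text{today!} \mid \text{spam}) = 0.045,\quad P(\text{today!} \mid \text{ham}) = 0.0022,\quad P(\text{here!} \mid \text{spam}) = 0.034,\quad P(\text{here!} \mid \text{ham}) = 0.0022$$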

Substitute these values in Bayes’ theorem.

Therefore,

$$P(\text{spam} \mid \text{today!}) = \frac{(0.10) \times (0.045)}{(0.10) \times (0.045) + (0.90) \times (0.0022)} = \frac{0.0045}{0.0045 + 0.00198} = \frac{0.0045}{0.00648} = 0.694$$

Thus, the probability that a message is spam, given that it includes the word “today!”, is 0.694.

$$P(\text{spam} \mid \text{here!}) = \frac{(0.10) \times (0.034)}{(0.10) \times (0.034) + (0.90) \times (0.0022)} = \frac{0.0034}{0.0034 + 0.00198} = \frac{0.0034}{0.00538} = 0.632$$

Thus, the probability that a message is spam, given that it includes the word “here!”, is 0.632.

The probability that a message is spam is higher when it includes “today!” (0.694) than when it includes “here!” (0.632). Therefore, “today!” is the stronger indicator of spam. Both words appear in the same proportion of ham messages (0.0022), but “today!” appears in a larger proportion of spam messages (0.045 versus 0.034).
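
The comparison in part (b) can be reproduced with a minimal Python sketch. The dictionary `rates` and the helper `spam_posterior` are hypothetical names introduced here for illustration; only the proportions come from the problem.

```python
P_SPAM, P_HAM = 0.10, 0.90

# word -> (P(word | spam), P(word | ham)), taken from the problem statement
rates = {
    "today!": (0.045, 0.0022),
    "here!": (0.034, 0.0022),
}


def spam_posterior(p_word_spam, p_word_ham):
    """P(spam | word) by the two-event form of Bayes' theorem."""
    joint_spam = P_SPAM * p_word_spam  # P(spam) x P(word | spam)
    joint_ham = P_HAM * p_word_ham     # P(ham) x P(word | ham)
    return joint_spam / (joint_spam + joint_ham)


posteriors = {word: spam_posterior(*r) for word, r in rates.items()}
stronger = max(posteriors, key=posteriors.get)

for word, p in posteriors.items():
    print(f"{word}: {p:.3f}")
print("stronger indicator:", stronger)
```

Because the ham term `joint_ham` is identical for the two words, the difference between the posteriors comes entirely from the spam term.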

c.

To determine

Compute the probability that a message is spam, given that it includes the word “available”. Also, compute the probability that a message is spam, given that it includes the word “fingertips!”. Identify which of the two words is the stronger indicator of spam, and explain why.

Answer to Problem 60SE

The probability that a message is spam, given that it includes the word “available”, is 0.275.

The probability that a message is spam, given that it includes the word “fingertips!”, is 0.586.

Explanation of Solution

Calculation:

Bayes’ Theorem (Two-Event Case):

$$P(F \mid D) = \frac{P(F) \times P(D \mid F)}{P(F) \times P(D \mid F) + P(M) \times P(D \mid M)}$$

Here,

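$$P(\text{spam} \mid \text{available}) = \frac{P(\text{spam}) \times P(\text{available} \mid \text{spam})}{P(\text{spam}) \times P(\text{available} \mid \text{spam}) + P(\text{ham}) \times P(\text{available} \mid \text{ham})}$$

$$P(\text{spam} \mid \text{fingertips!}) = \frac{P(\text{spam}) \times P(\text{fingertips!} \mid \text{spam})}{P(\text{spam}) \times P(\text{fingertips!} \mid \text{spam}) + P(\text{ham}) \times P(\text{fingertips!} \mid \text{ham})}$$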

From data,

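$$P(\text{spam}) = 0.10,\quad P(\text{ham}) = 0.90,\quad P(\text{available} \mid \text{spam}) = 0.014,\quad P(\text{available} \mid \text{ham}) = 0.0041,\quad P(\text{fingertips!} \mid \text{spam}) = 0.014,\quad P(\text{fingertips!} \mid \text{ham}) = 0.0011$$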

Substitute these values in Bayes’ theorem.

Therefore,

$$P(\text{spam} \mid \text{available}) = \frac{(0.10) \times (0.014)}{(0.10) \times (0.014) + (0.90) \times (0.0041)} = \frac{0.0014}{0.0014 + 0.00369} = \frac{0.0014}{0.00509} = 0.275$$

Thus, the probability that a message is spam, given that it includes the word “available”, is 0.275.

$$P(\text{spam} \mid \text{fingertips!}) = \frac{(0.10) \times (0.014)}{(0.10) \times (0.014) + (0.90) \times (0.0011)} = \frac{0.0014}{0.0014 + 0.00099} = \frac{0.0014}{0.00239} = 0.586$$

Thus, the probability that a message is spam, given that it includes the word “fingertips!”, is 0.586.

The probability that a message is spam is higher when it includes “fingertips!” (0.586) than when it includes “available” (0.275). Therefore, “fingertips!” is the stronger indicator of spam. Both words appear in the same proportion of spam messages (0.014), but “fingertips!” appears in a smaller proportion of ham messages (0.0011 versus 0.0041).
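
As a cross-check on part (c), the same posteriors can be obtained from the odds form of Bayes’ theorem (posterior odds = prior odds × likelihood ratio). This is an equivalent restatement, not the method used in the solution above, and the function name `posterior_from_odds` is an illustrative assumption.

```python
def posterior_from_odds(prior, p_word_spam, p_word_ham):
    """P(spam | word) computed as prior odds times the likelihood ratio."""
    prior_odds = prior / (1 - prior)              # P(spam) / P(ham)
    likelihood_ratio = p_word_spam / p_word_ham   # P(word | spam) / P(word | ham)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)  # convert odds back to a probability


P_SPAM = 0.10

# Both words appear in 0.014 of spam messages; only the ham proportions differ.
p_available = posterior_from_odds(P_SPAM, 0.014, 0.0041)
p_fingertips = posterior_from_odds(P_SPAM, 0.014, 0.0011)

print(f"available:   {p_available:.3f}")   # 0.275
print(f"fingertips!: {p_fingertips:.3f}")  # 0.586
```

Since the prior odds are the same for every word on this account, the likelihood ratio alone determines which word gives the higher posterior.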

d.

To determine

Explain what insights the results of parts (b) and (c) yield about what enables a spam filter that uses Bayes’ theorem to work effectively.

Explanation of Solution

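In part (b), “today!” and “here!” appear in the same proportion of ham messages (0.0022), but “today!” appears more often in spam messages (0.045 versus 0.034), so it is easier to distinguish spam from ham in a message that includes “today!”.

In part (c), “available” and “fingertips!” appear in the same proportion of spam messages (0.014), but “available” appears more often in ham messages (0.0041 versus 0.0011), so it is more difficult to distinguish spam from ham in a message that includes “available”.

Therefore, a spam filter that uses Bayes’ theorem works most effectively with words that occur often in spam messages and rarely in ham messages.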
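
One way to make this insight concrete is to rank all five words by the ratio P(word | spam) / P(word | ham). The sketch below is an illustration under that assumption, not part of the textbook solution; `WORD_RATES`, `likelihood_ratio`, and `spam_posterior` are hypothetical names, and the proportions are those given in the problem.

```python
# word -> (P(word | spam), P(word | ham)), from the problem statement
WORD_RATES = {
    "shipping!": (0.051, 0.0015),
    "today!": (0.045, 0.0022),
    "here!": (0.034, 0.0022),
    "available": (0.014, 0.0041),
    "fingertips!": (0.014, 0.0011),
}
P_SPAM = 0.10


def likelihood_ratio(p_word_spam, p_word_ham):
    """How many times more often the word appears in spam than in ham."""
    return p_word_spam / p_word_ham


def spam_posterior(p_word_spam, p_word_ham, prior=P_SPAM):
    """P(spam | word) by the two-event form of Bayes' theorem."""
    joint_spam = prior * p_word_spam
    joint_ham = (1 - prior) * p_word_ham
    return joint_spam / (joint_spam + joint_ham)


# Strongest indicator first
ranking = sorted(
    WORD_RATES,
    key=lambda word: likelihood_ratio(*WORD_RATES[word]),
    reverse=True,
)

for word in ranking:
    rates = WORD_RATES[word]
    print(
        f"{word:12s} ratio={likelihood_ratio(*rates):5.1f}  "
        f"P(spam|word)={spam_posterior(*rates):.3f}"
    )
```

With a fixed prior, the ranking by ratio matches the ranking by posterior probability: a word is a strong indicator when it is common in spam relative to ham, regardless of how common it is overall.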
