Null and Alternative Hypotheses in Spam Call Investigation

Data 8 - hw07 - email@berkeley.edu

**Question 1.** Define the null hypothesis and alternative hypothesis for this investigation.

*Hint: Don't forget that your null hypothesis should fully describe a probability model that we can use for simulation later.*

The null hypothesis is that the spammers choose each area code uniformly at random from all available area codes (200-999). The alternative hypothesis is that the spammers pick Yanay's area code (781) on purpose, to trick him into thinking that someone from his area is calling.
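The null model above can be simulated directly. The following is a minimal sketch, not the notebook's own solution: it assumes 50 calls per simulated sample (the sample size is not stated in this question) and uses plain NumPy rather than the course's `datascience` helpers. The names `simulate_count_781`, `NUM_CALLS`, and `AREA_CODES` are hypothetical.

```python
import numpy as np

# Assumptions for illustration only: 50 calls per sample,
# area codes are the integers 200 through 999 inclusive.
NUM_CALLS = 50
AREA_CODES = np.arange(200, 1000)

def simulate_count_781(rng, num_calls=NUM_CALLS):
    """Draw area codes uniformly at random and count how many equal 781."""
    calls = rng.choice(AREA_CODES, size=num_calls)  # sampling with replacement
    return int(np.count_nonzero(calls == 781))

rng = np.random.default_rng(0)
counts = np.array([simulate_count_781(rng) for _ in range(5000)])
print("average number of 781s per sample:", counts.mean())
```

Under this model each call has a 1 in 800 chance of showing 781, so most simulated samples contain no 781 at all.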

**Question 5.** Using the results from Question 4, generate a histogram of the empirical distribution of the number of times you saw the area code 781 in your simulation.

**NOTE: Use the provided bins when making the histogram**

    bins = np.arange(0,5,1) # Use these provided bins
    # Build the table first, then draw the histogram of its only column
    simulation_result = Table().with_column("number of 781", test_statistics_under_null)
    simulation_result.hist(0, bins = bins)

**Question 7.** Suppose you use a P-value cutoff of 1%. What do you conclude from the hypothesis test? Why?

Since the p-value is 0.00185 (about 0.185%), which is smaller than the 1% cutoff, we reject the null hypothesis. The data favor the alternative hypothesis that the spammers are intentionally using Yanay's area code.
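The comparison between a p-value and a cutoff can be sketched as follows. This is an illustrative example only: the `simulated` array and `observed_stat` are made-up values, not the notebook's results, and `empirical_p_value` is a hypothetical helper.

```python
import numpy as np

def empirical_p_value(simulated_stats, observed_stat):
    """Proportion of simulated statistics at least as large as the observed one."""
    simulated_stats = np.asarray(simulated_stats)
    return np.count_nonzero(simulated_stats >= observed_stat) / len(simulated_stats)

# Made-up simulated counts and observed count, for illustration only
simulated = np.array([0, 0, 1, 0, 2, 0, 0, 1, 0, 0])
observed_stat = 2
cutoff = 0.01  # a 1% cutoff written as a proportion

p = empirical_p_value(simulated, observed_stat)
reject_null = p < cutoff
print("p-value:", p, "reject null:", reject_null)
```

Large counts of 781 point toward the alternative, which is why the helper counts simulated values that are greater than or equal to the observed one.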

**Question 8.** Define the null hypothesis and alternative hypothesis for this investigation.

*Reminder: Don't forget that your null hypothesis should fully describe a probability model that we can use for simulation later.*

The null hypothesis is that the spammers choose each area code uniformly at random from all possible area codes between 200 and 999. The alternative hypothesis is that the spammers intentionally choose the area codes of the 8 places that Yanay has recently been to (781, 617, 509, 510, 212, 858, 339, 626).
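Under this null hypothesis each call lands on one of the 8 visited codes with probability 8/800, so the number of such calls in a sample follows a binomial distribution. The sketch below uses that shortcut instead of drawing individual area codes. It assumes 50 calls per sample, and the names `VISITED`, `p_visited`, and `visited_counts` are hypothetical.

```python
import numpy as np

VISITED = np.array([781, 617, 509, 510, 212, 858, 339, 626])
AREA_CODES = np.arange(200, 1000)  # 200 through 999 inclusive
NUM_CALLS = 50                     # assumed sample size, for illustration

# Chance that one uniformly random area code is a visited one (8 out of 800)
p_visited = np.isin(AREA_CODES, VISITED).mean()

# Each entry is the number of visited-code calls in one simulated sample
rng = np.random.default_rng(0)
visited_counts = rng.binomial(NUM_CALLS, p_visited, size=5000)
print("chance per call:", p_visited)
print("average visited-code calls per sample:", visited_counts.mean())
```

With these assumptions the expected count is 50 × 0.01 = 0.5 visited-code calls per sample.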

**Question 11.** Using the results from Question 10, generate a histogram of the empirical distribution of the number of times you saw any of the area codes of the places Yanay has been to in your simulation.

**NOTE: Use the provided bins when making the histogram**

    bins_visited = np.arange(0,6,1) # Use these provided bins
    # Build the table first, then draw the histogram of its only column
    area_visited_table = Table().with_column("numbers of area codes visited", visited_test_statistics_under_null)
    area_visited_table.hist(0, bins = bins_visited)

**Question 13.** Suppose you use a P-value cutoff of 0.05% (**Note: that's 0.05%, not our usual cutoff of 5%**). What do you conclude from the hypothesis test? Why?

The p-value of 0.2% is higher than the cutoff of 0.05%, so we fail to reject the null hypothesis. The data are consistent with the spammers choosing area codes at random, although failing to reject does not prove that the null hypothesis is true.

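**Question 14.** Is `p_value`:

* (a) the probability that the spam calls favored the visited area codes,
* (b) the probability that they didn't favor, or
* (c) neither

If you chose (c), explain what it is instead.

(c) Neither. `p_value` is the probability, computed under the assumption that the null hypothesis is true, of seeing a test statistic at least as extreme as the observed one in the direction of the alternative. It is not the probability that either hypothesis is true.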

**Question 15.** Is 0.05% (the P-value cutoff):

* (a) the probability that the spam calls favored the visited area codes,
* (b) the probability that they didn't favor, or
* (c) neither

If you chose (c), explain what it is instead.

(c) Neither. The 0.05% cutoff is a threshold that we choose; it is not the probability that either hypothesis is true. If the p-value is smaller than the cutoff, we reject the null hypothesis. What the cutoff does control is the chance of a false rejection: if the null hypothesis is true, there is about a 0.05% chance that the test rejects it anyway.

**Question 16.** Suppose you run this test for 4000 different people after observing each person's last 50 spam calls. When you reject the null hypothesis for a person, you accuse the spam callers of favoring the area codes that person has visited. If the spam callers were not actually favoring area codes that people have visited, can we compute how many times we will incorrectly accuse the spam callers of favoring area codes that people have visited? If so, what is the number? Explain your answer. Assume a 0.05% P-value cutoff.

Yes, approximately. When the null hypothesis is true, each test has about a 0.05% chance of rejecting it, so across 4000 tests we expect about 0.05% × 4000 = 2 incorrect accusations. This is an expected count, not a guarantee: the actual number of false accusations could be somewhat higher or lower.
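The arithmetic behind this answer can be checked in a few lines. This is a sketch under two simplifying assumptions that the question does not state: every test rejects a true null with probability exactly equal to the cutoff, and the 4000 tests are independent.

```python
cutoff = 0.0005   # the 0.05% cutoff written as a proportion
num_tests = 4000  # one test per person

# Expected number of false accusations when the null is true for everyone
expected_false_accusations = cutoff * num_tests

# Chance that none of the 4000 tests falsely rejects (independence assumed)
prob_no_false_accusations = (1 - cutoff) ** num_tests

print("expected false accusations:", round(expected_false_accusations, 6))
print("chance of zero false accusations:", round(prob_no_false_accusations, 3))
```

The second quantity shows why the answer is an expectation rather than a fixed count: even with an expected value of 2, there is a noticeable chance of seeing no false accusations at all.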

**Question 19.** Generate 1,000 simulated test statistic values. Assign `test_stats` to an array that stores the result of each of these trials.

*Hint*: Use the function you defined in Question 18.

We also provided code that'll generate a histogram for you after generating 1,000 simulated test statistic values.

    trials = 1000
    test_stats = make_array()
    # Run the simulation once per trial and collect each test statistic
    for i in np.arange(trials):
        test_stats = np.append(test_stats, simulate_one_stat())

    # here's code to generate a histogram of values and the red dot is the observed value
    Table().with_column("Simulated Proportion Difference", test_stats).hist("Simulated Proportion Difference");
    plt.plot(observed_diff_proportion, 0, 'ro', markersize=15);
