hw06

March 25, 2024

[1]: import otter
     grader = otter.Notebook()

1 Homework 6: Probability, Simulation, Estimation, and Assessing Models

Reading:
* Randomness
* Sampling and Empirical Distributions
* Testing Hypotheses

Please complete this notebook by filling in the cells provided. Before you begin, execute the
following cell to load the provided tests. Each time you start your server, you will need to execute
this cell again to load the tests.

Directly sharing answers is not okay, but discussing problems with the course staff or with other
students is encouraged. For all problems that require written explanations and sentences, you must
provide your answer in the designated space.

Moreover, throughout this homework and all future ones, please be sure not to re-assign variables
throughout the notebook! For example, if you use max_temperature in your answer to one question,
do not reassign it later on.

[2]: # Don't change this cell; just run it.
     import numpy as np
     from datascience import *

     # These lines do some fancy plotting magic.
     import matplotlib
     %matplotlib inline
     import matplotlib.pyplot as plt
     plt.style.use('fivethirtyeight')
     import warnings
     warnings.simplefilter('ignore', FutureWarning)

     import otter
     grader = otter.Notebook()
1.1 1. Probability

We will be testing some probability concepts that were introduced in lecture. For all of the
following problems, we will introduce a problem statement and give you a proposed answer. You
must assign the provided variable to one of the following three integers, depending on whether the
proposed answer is too low, too high, or correct.

1. Assign the variable to 1 if you believe our proposed answer is too high.
2. Assign the variable to 2 if you believe our proposed answer is too low.
3. Assign the variable to 3 if you believe our proposed answer is correct.

You are more than welcome to create more cells across this notebook to use for arithmetic
operations.

Question 1. You roll a 6-sided die 10 times. What is the chance of getting 10 sixes?

Our proposed answer: (1/6)^10

Assign ten_sixes to either 1, 2, or 3 depending on whether you think our answer is too high, too
low, or correct.

[3]: ten_sixes = 2
     ten_sixes

[3]: 2

[4]: grader.check("q1_1")

[4]: q1_1 results: All test cases passed!

Question 2. Take the same problem set-up as before, rolling a fair die 10 times. What is the
chance that every roll is less than or equal to 5?

Our proposed answer: 1 − (1/6)^10

Assign five_or_less to either 1, 2, or 3.

[5]: less_or_five = 1 - (1/6)**10
     print(less_or_five)
     five_or_less = 3
     five_or_less

0.9999999834618283

[5]: 3

[6]: grader.check("q1_2")
[6]: q1_2 results: All test cases passed!

Question 3. Assume we are picking a lottery ticket. We must choose three distinct numbers from
1 to 1000 and write them on a ticket. Next, someone picks three numbers one by one from a bowl
containing the numbers 1 to 1000, without putting the previously drawn numbers back in. We win
if our numbers are all called in order. If we decide to play the game and pick our numbers as 12,
140, and 890, what is the chance that we win?

Our proposed answer: (3/1000)^3

Assign lottery to either 1, 2, or 3.

[7]: lottery = 3

[8]: grader.check("q1_3")

[8]: q1_3 results: All test cases passed!

Question 4. Assume we have two lists, list A and list B. List A contains the numbers [20, 10, 30],
while list B contains the numbers [10, 30, 20, 40, 30]. We choose one number from list A randomly
and one number from list B randomly. What is the chance that the number we drew from list A is
larger than or equal to the number we drew from list B?

Our proposed solution: 1/5

Assign list_chances to either 1, 2, or 3.

Hint: Consider the different possible ways that the items in list A can be greater than or equal to
items in list B. Try working out your thoughts with pencil and paper; what do you think the correct
solution will be close to?

[9]: list_chances = 2

[10]: grader.check("q1_4")

[10]: q1_4 results: All test cases passed!

1.2 2. Monkeys Typing Shakespeare (…or at least the string "datascience")

A monkey is banging repeatedly on the keys of a typewriter. Each time, the monkey is equally
likely to hit any of the 26 lowercase letters of the English alphabet, any of the 26 uppercase letters,
or any digit between 0 and 9 (inclusive), regardless of what it has hit before. There are no other
keys on the keyboard.

This question is inspired by a mathematical theorem called the Infinite monkey theorem
(https://en.wikipedia.org/wiki/Infinite_monkey_theorem), which postulates that if you put a monkey
in the situation described above for an infinite time, they will eventually type out all of
Shakespeare's works.

Question 1. Suppose the monkey hits the keyboard 5 times. Compute the chance that the monkey
types the sequence CS118. (Call this data_chance.) Use algebra and type in an arithmetic equation
that Python can evaluate.

[11]: data_chance = (1/62)**5 * 100
      data_chance = (1/62)**5
      data_chance

[11]: 1.0915447684774164e-09

[12]: grader.check("q2_1")

[12]: q2_1 results: All test cases passed!

Question 2. Write a function called simulate_key_strike. It should take no arguments, and it
should return a random one-character string that is equally likely to be any of the 26 lower-case
English letters, 26 upper-case English letters, or any digit between 0 and 9 (inclusive).

[13]: # We have provided the code below to compute a list called keys, containing all the
      # lower-case English letters, upper-case English letters, and the digits 0-9 (inclusive).
      # Print it if you want to verify what it contains.
      import string
      keys = list(string.ascii_lowercase + string.ascii_uppercase + string.digits)
      print(keys)

      def simulate_key_strike():
          """Simulates one random key strike."""
          return np.random.choice(keys)

      # An example call to your function:
      simulate_key_strike()

['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's',
't', 'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L',
'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '0', '1', '2', '3', '4',
'5', '6', '7', '8', '9']

[13]: 'v'

[14]: grader.check("q2_2")

[14]: q2_2 results: All test cases passed!

Question 3. Write a function called simulate_several_key_strikes. It should take one argument:
an integer specifying the number of key strikes to simulate. It should return a string containing
that many characters, each one obtained from simulating a key strike by the monkey.
Hint: If you make a list or array of the simulated key strikes called key_strikes_array, you can
convert that to a string by calling "".join(key_strikes_array).

[15]: def simulate_several_key_strikes(num_strikes):
          key_strikes_array = []
          for i in range(num_strikes):
              key_strikes_array.append(simulate_key_strike())
          return "".join(key_strikes_array)

      # An example call to your function:
      simulate_several_key_strikes(11)

[15]: 'sPH9Eho20Aa'

[16]: grader.check("q2_3")

[16]: q2_3 results: All test cases passed!

Question 4. Call simulate_several_key_strikes 5000 times, each time simulating the monkey
striking 5 keys. Compute the proportion of times the monkey types "CS118", calling that proportion
data_proportion.

[17]: word_count = 0
      final_count = []
      for x in range(5000):
          cs118 = simulate_several_key_strikes(5)
          if cs118 == 'CS118':
              word_count += 1          # count each simulated "CS118"
              final_count.append(cs118)
      data_proportion = word_count / 5000
      data_proportion

[17]: 0.0

[18]: grader.check("q2_4")

[18]: q2_4 results: All test cases passed!

Question 5. Check the value your simulation computed for data_proportion. Is your simulation a
good way to estimate the chance that the monkey types "CS118" in 5 strikes (the answer to question
1)? Why or why not?

No, because the simulation suggests that after 5000 sets of 5 key strikes, "CS118" was not typed
even once. Question 1 shows that the true probability is greater than 0, so the simulated
data_proportion of 0 is not an accurate estimate. 5000 seems like a big number, but here it is too
small a sample size
compared to the enormous number of possible 5-character sequences. This reflects the law of
averages, which says that the more often an experiment is repeated, the closer the observed
proportion gets to the theoretical probability from Question 1. In this case, 5000 repetitions are
not sufficient.

Question 6. Compute the chance that the monkey types the letter "t" at least once in the 5
strikes. Call it t_chance. Use algebra and type in an arithmetic equation that Python can evaluate.

[19]: t_chance = (1 - (61/62)**5) * 100
      t_chance = 1 - (61/62)**5
      t_chance

[19]: 0.07808532616807251

[20]: grader.check("q2_6")

[20]: q2_6 results: All test cases passed!

Question 7. Do you think that a computer simulation is more or less effective to estimate t_chance
compared to when we tried to estimate data_chance this way? Why or why not? (You don't need
to write a simulation, but it is an interesting exercise.)

A simulation is more effective for estimating t_chance than for data_chance, because t_chance is
about 0.078 (roughly 7.8%), so 5000 repetitions will observe the event many times, whereas
data_chance is only about 1.1e-09 (roughly 0.00000011%), far too small to show up at all in 5000
repetitions.

1.3 3. Sampling Basketball Players

This exercise uses salary data and game statistics for basketball players from the 2019-2020 NBA
season. The data was collected from Basketball-Reference. Run the next cell to load the two
datasets.

[21]: player_data = Table.read_table('player_data.csv')
      salary_data = Table.read_table('salary_data.csv')
      player_data.show(3)
      salary_data.show(3)

<IPython.core.display.HTML object>

<IPython.core.display.HTML object>

Question 1. We would like to relate players' game statistics to their salaries. Compute a table
called full_data that includes one row for each player who is listed in both player_data and
salary_data. It should include all the columns from player_data and salary_data, except the
"Name" column.

[22]: full_data = player_data.join('Player', salary_data, 'Name')
      full_data

[22]: Player | 3P | 2P | PTS | Salary
Aaron Gordon | 1.2 | 4.1 | 14.2 | 19863636
Aaron Holiday | 1.5 | 2.2 | 9.9 | 2239200
Abdel Nader | 0.7 | 1.3 | 5.7 | 1618520
Admiral Schofield | 0.5 | 0.6 | 3.2 | 898310
Al Horford | 1.4 | 3.4 | 12 | 28000000
Al-Farouq Aminu | 0.5 | 0.9 | 4.3 | 9258000
Alec Burks | 1.7 | 3.3 | 15.8 | 2320044
Alec Burks | 1.8 | 3.3 | 16.1 | 2320044
Alec Burks | 0 | 1 | 2 | 2320044
Alen Smailagić | 0.3 | 1.3 | 4.7 | 898310
… (552 rows omitted)

[23]: grader.check("q3_1")

[23]: q3_1 results: All test cases passed!

Basketball team managers would like to hire players who perform well but don't command high
salaries. From this perspective, a very crude measure of a player's value to their team is the number
of points the player scored from 3-pointers and free throws in a season for every $100,000 of salary
(Note: the Salary column is in dollars, not hundreds of thousands of dollars). For example, Al
Horford scored an average of 5.2 points from 3-pointers and free throws combined, and has a salary
of $28 million, which is 280 hundred-thousands of dollars, so his value is 5.2/280. The formula is:

    ("PTS" − 2 * "2P") / ("Salary" / 100000)

Question 2. Create a table called full_data_with_value that's a copy of full_data, with an extra
column called "Value" containing each player's value (according to our crude measure). Then make
a histogram of players' values. Specify bins that make the histogram informative, and don't forget
your units!

Remember that hist() takes in an optional third argument that allows you to specify the units!
Refer to the Python reference to look at tbl.hist(...) if necessary.

Just so you know: informative histograms contain a majority of the data and exclude outliers.

[24]: bins = np.arange(0, 0.7, .1)  # Use the provided bins when you make your histogram
      full_data_with_value = full_data.with_column(
          "Value", (full_data["PTS"] - 2 * full_data["2P"]) / (full_data["Salary"] / 100000))
      full_data_with_value.select("Value").hist(bins=bins, unit="Per $100k")
      plt.title("NBA Player's Value")
      plt.show()
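As a quick arithmetic check of the value formula (a sketch using the Al Horford numbers quoted
above; the variable names here are made up for illustration and are not part of the assignment):

    # Al Horford's row in full_data: PTS = 12, 2P = 3.4, Salary = $28,000,000
    horford_pts = 12
    horford_2p = 3.4
    horford_salary = 28000000
    # ("PTS" - 2 * "2P") / ("Salary" / 100000) = 5.2 / 280
    horford_value = (horford_pts - 2 * horford_2p) / (horford_salary / 100000)
    horford_value  # roughly 0.0186 points from 3-pointers and free throws per $100,000 of salary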
Now suppose we weren't able to find out every player's salary (perhaps it was too costly to interview
each player). Instead, we have gathered a simple random sample of 50 players' salaries. The cell
below loads those data.

[25]: sample_salary_data = Table.read_table("sample_salary_data.csv")
      sample_salary_data.show(3)
      sample_salary_data.sample(5)
      sample_salary_data

<IPython.core.display.HTML object>

[25]: Name | Salary
D.J. Wilson | 2961120
Yante Maten | 100000
Abdel Nader | 1618520
Jaren Jackson | 6927480
Cameron Johnson | 4033440
Malik Newman | 555409
Luol Deng | 4990000
Terrance Ferguson | 2475840
Maurice Harkless | 11511234
Nicolò Melli | 3902439
… (40 rows omitted)
Question 3. Make a histogram of the values of the players in sample_salary_data, using the same
method for measuring value we used in question 2. Make sure to specify the units again in the
histogram as stated in the previous problem. Use the same bins, too.

Hint: This will take several steps.

[26]: sample_tbl = player_data.join('Player', sample_salary_data, 'Name')
      sample_tbl = sample_tbl.with_column(
          'Value', (sample_tbl['PTS'] - 2 * sample_tbl['2P']) / (sample_tbl['Salary'] / 100000))
      sample_tbl.select('Value').hist(bins=bins, unit="Per $100k")
      plt.title('Sample of Size 50')
      plt.show()

Now let us summarize what we have seen. To guide you, we have written most of the summary
already.

Question 4. Complete the statements below by setting each relevant variable name to the value
that correctly fills the blank.

  • The plot in question 2 displayed a(n) [distribution_1] distribution of the population of
    [player_count_1] players. The areas of the bars in the plot sum to [area_total_1].
  • The plot in question 3 displayed a(n) [distribution_2] distribution of the sample of
    [player_count_2] players. The areas of the bars in the plot sum to [area_total_2].
distribution_1 and distribution_2 should be set to one of the following strings: "empirical" or
"probability". player_count_1, area_total_1, player_count_2, and area_total_2 should be set to
integers. Remember that areas are represented in terms of percentages.

Hint 1: For a refresher on distribution types, check out Section 10.1.

Hint 2: The hist() table method ignores data points outside the range of its bins, but you may
ignore this fact and calculate the areas of the bars using what you know about histograms from
lecture.

[27]: distribution_1 = 'empirical'
      player_count_1 = 585
      area_total_1 = 100

      distribution_2 = 'empirical'
      player_count_2 = 50
      area_total_2 = 100

[28]: grader.check("q3_4")

[28]: q3_4 results: All test cases passed!

Question 5. For which range of values does the plot in question 3 better depict the distribution
of the population's player values: 0 to 0.3, or above 0.3? Explain your answer.

The plot better depicts the distribution of player values in the range 0 to 0.3, because that is where
the majority of players fall. A random sample therefore includes many players from that part of the
population, which yields enough values to show the shape of the distribution, whereas the range
above 0.3 contains far fewer players, so the sample captures very few values there and depicts that
part of the distribution poorly.

1.4 4. Earthquakes

The next cell loads a table containing information about every earthquake with a magnitude above
5 in 2019 (smaller earthquakes are generally not felt, only recorded by very sensitive equipment),
compiled by the US Geological Survey. (Source: https://earthquake.usgs.gov/earthquakes/search/)

[29]: earthquakes = Table().read_table('earthquakes_2019.csv').select(['time', 'mag', 'place'])
      earthquakes

[29]: time | mag | place
2019-12-31T11:22:49.734Z | 5 | 245km S of L'Esperance Rock, New Zealand
2019-12-30T17:49:59.468Z | 5 | 37km NNW of Idgah, Pakistan
2019-12-30T17:18:57.350Z | 5.5 | 34km NW of Idgah, Pakistan
2019-12-30T13:49:45.227Z | 5.4 | 33km NE of Bandar 'Abbas, Iran
2019-12-30T04:11:09.987Z | 5.2 | 103km NE of Chichi-shima, Japan
2019-12-29T18:24:41.656Z | 5.2 | Southwest of Africa
2019-12-29T13:59:02.410Z | 5.1 | 138km SSW of Kokopo, Papua New Guinea
2019-12-29T09:12:15.010Z | 5.2 | 79km S of Sarangani, Philippines
2019-12-29T01:06:00.130Z | 5 | 9km S of Indios, Puerto Rico
2019-12-28T22:49:15.959Z | 5.2 | 128km SSE of Raoul Island, New Zealand
… (1626 rows omitted)

If we were studying all human-detectable 2019 earthquakes and had access to the above data, we'd
be in good shape; however, if the USGS didn't publish the full data, we could still learn something
about earthquakes from just a smaller subsample. If we gathered our sample correctly, we could use
that subsample to get an idea about the distribution of magnitudes (above 5, of course) throughout
the year!

In the following lines of code, we take two different samples from the earthquake table and calculate
the mean of the magnitudes of these earthquakes.

[30]: sample1 = earthquakes.sort('mag', descending=True).take(np.arange(100))
      sample1_magnitude_mean = np.mean(sample1.column('mag'))
      sample2 = earthquakes.take(np.arange(100))
      sample2_magnitude_mean = np.mean(sample2.column('mag'))
      [sample1_magnitude_mean, sample2_magnitude_mean]

[30]: [6.4589999999999987, 5.2790000000000008]

Question 1. Are these samples representative of the population of earthquakes in the original
table (that is, should we expect the sample means to be close to the population mean)?

Hint: Consider the ordering of the earthquakes table.

Sample 1 is not representative of the entire population because it consists of the 100 earthquakes
with the highest magnitudes: the code sorts the 'mag' column in descending order and takes the top
100 rows. This is a deterministic sample, so its mean will be well above the population mean.
Sample 2 also takes 100 rows, but without sorting by magnitude. While it is better than Sample 1
as far as representation goes, it is still just the first 100 rows of a table that contains 1,636
earthquakes, so we can't be sure it represents the full range of magnitudes. To get a more accurate
representation of the population, we would need a sufficiently large, truly random sample from the
entire population.

Question 2. Write code to produce a sample of size 200 that is representative of the population.
Then, take the mean of the magnitudes of the earthquakes in this sample. Assign these to
representative_sample and representative_mean respectively.

Hint: In class, we learned what kind of samples should be used to properly represent the population.

[31]: representative_sample = earthquakes.sample(200)
      representative_mean = np.mean(representative_sample.column('mag'))
      representative_mean

[31]: 5.3219000000000003
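An aside (a sketch, not part of the assignment): Table.sample draws rows with replacement by
default. A simple random sample drawn without replacement is also representative, and the
datascience library lets you request it through the with_replacement argument:

    # Sketch: a simple random sample of 200 earthquakes drawn without replacement.
    no_replacement_sample = earthquakes.sample(200, with_replacement=False)
    np.mean(no_replacement_sample.column('mag'))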
[32]: grader.check("q4_2")

[32]: q4_2 results: All test cases passed!

Question 3. Suppose we want to figure out what the biggest magnitude earthquake was in 2019,
but we only have our representative sample of 200. Let's see if trying to find the biggest magnitude
in the population from a random sample of 200 is a reasonable idea!

Write code that takes many random samples from the earthquakes table and finds the maximum of
each sample. You should take a random sample of size 200 and do this 5000 times. Assign the
array of maximum magnitudes you find to maximums.

[33]: maximums = make_array()
      for i in np.arange(5000):
          sample = earthquakes.sample(200)
          sample_max = max(sample.column('mag'))
          maximums = np.append(maximums, sample_max)
      maximums

[33]: array([ 6.8, 7. , 7.1, …, 7.2, 7.1, 6.4])

[34]: grader.check("q4_3")

[34]: q4_3 results: All test cases passed!

[35]: # Histogram of your maximums
      Table().with_column('Largest magnitude in sample', maximums).hist('Largest magnitude in sample')
Question 4. Now find the magnitude of the actual strongest earthquake in 2019 (not the maximum
of a sample). This will help us determine whether a random sample of size 200 is likely to help you
determine the largest magnitude earthquake in the population.

[36]: strongest_earthquake_magnitude = max(earthquakes.column('mag'))
      strongest_earthquake_magnitude

[36]: 8.0

[37]: grader.check("q4_4")

[37]: q4_4 results: All test cases passed!

Question 5. Explain whether you believe you can accurately use a sample size of 200 to determine
the maximum. What is one problem with using the maximum as your estimator? Use the histogram
above to help answer.

A sample size of 200 is too small to accurately determine the maximum magnitude of all the
earthquakes in 2019. The actual strongest earthquake had a magnitude of 8, but the histogram
shows that the maxima of our samples of 200 were mostly around 7. This means that even across
5000 samples of size 200, the samples usually did not capture the actual strongest earthquake. One
problem with using the maximum as the estimator is that a sample maximum can never exceed the
population maximum, so it usually underestimates it; with only 200 of the 1,636 earthquakes in
each sample, the single strongest earthquake is often left out.
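A short follow-up sketch (not part of the assignment) that puts a number on this: count how many
of the 5000 simulated sample maxima actually equal the population maximum found above.

    # Sketch: proportion of the 5000 simulated sample maxima that equal the true
    # maximum of 8.0; a value well below 1 means a sample of 200 usually misses
    # the single strongest earthquake.
    hits = np.count_nonzero(maximums == strongest_earthquake_magnitude)
    hits / len(maximums)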
1.5 5. Assessing Jade's Models

Games with Jade

Our friend Jade comes over and asks us to play a game with her. The game works like this: We will
draw randomly with replacement from a simplified 13-card deck with 4 face cards (A, J, Q, K) and
9 numbered cards (2, 3, 4, 5, 6, 7, 8, 9, 10). If we draw cards with replacement 13 times and the
number of face cards is greater than or equal to 4, we lose; otherwise, we win.

We play the game once and we lose, observing 8 total face cards. We are angry and accuse Jade of
cheating! Jade is adamant, however, that the deck is fair.

Jade's model claims that there is an equal chance of getting any of the cards (A, 2, 3, 4, 5, 6, 7, 8,
9, 10, J, Q, K), but we do not believe her. We believe that the deck is clearly rigged, with face cards
(A, J, Q, K) being more likely than the numbered cards (2, 3, 4, 5, 6, 7, 8, 9, 10).

Question 1 Assign deck_model_probabilities to a two-item array containing the chance of drawing
a face card as the first element, and the chance of drawing a numbered card as the second element
under Jade's model. Since we're working with probabilities, make sure your values are between 0
and 1.

[38]: deck_model_probabilities = make_array(1/2, 1/2)
      deck_model_probabilities

[38]: array([ 0.5, 0.5])

[39]: grader.check("q5_1")

[39]: q5_1 results: All test cases passed!

Question 2 We believe Jade's model is incorrect. In particular, we believe there to be a larger
chance of getting a face card. Which of the following statistics can we use during our simulation to
test between the model and our alternative? Assign statistic_choice to the correct answer.

1. The actual number of face cards we get in 13 draws
2. The distance (absolute value) between the actual number of face cards in 13 draws and the
   expected number of face cards in 13 draws (4)
3. The expected number of face cards in 13 draws (4)

[40]: statistic_choice = 2
      statistic_choice

[40]: 2

[41]: grader.check("q5_2")
[41]: q5_2 results: All test cases passed!

Question 3 Define the function deck_simulation_and_statistic, which, given a sample size and an
array of model proportions (like the one you created in Question 1), returns the number of face
cards in one simulation of drawing sample_size cards under the model specified in
model_proportions.

Hint: Think about how you can use the function sample_proportions.

[68]: def deck_simulation_and_statistic(sample_size, model_proportions):
          # Note: this line replaces the model_proportions argument with the
          # 4/13 face-card, 9/13 numbered-card split of the 13-card deck.
          model_proportions = make_array(4/13, 9/13)
          simulated_draws = sample_proportions(sample_size, model_proportions)
          num_face_cards = simulated_draws.item(0) * sample_size
          return num_face_cards

      deck_simulation_and_statistic(13, deck_model_probabilities)

[68]: 2.0

[43]: grader.check("q5_3")

[43]: q5_3 results: All test cases passed!

Question 4 Use your function from above to simulate the drawing of 13 cards 5000 times under the
proportions that you specified in Question 1. Keep track of all of your statistics in deck_statistics.

[45]: repetitions = 5000
      deck_statistics = make_array()
      for i in np.arange(repetitions):
          deck_statistics = np.append(
              deck_statistics, deck_simulation_and_statistic(13, deck_model_probabilities))
      deck_statistics

[45]: array([ 11., 5., 6., …, 9., 9., 6.])

[46]: grader.check("q5_4")

[46]: q5_4 results:
q5_4 - 1 result:
Trying:
    len(deck_statistics) == repetitions
Expecting:
    True
**********************************************************************
Line 4, in q5_4 0
Failed example:
    len(deck_statistics) == repetitions
Exception raised:
    Traceback (most recent call last):
      File "/opt/conda/lib/python3.7/doctest.py", line 1337, in __run
        compileflags, 1), test.globs)
      File "<doctest q5_4 0[0]>", line 1, in <module>
        len(deck_statistics) == repetitions
    NameError: name 'deck_statistics' is not defined

q5_4 - 2 result:
Trying:
    all([0 <= k <= 13 for k in deck_statistics])
Expecting:
    True
**********************************************************************
Line 4, in q5_4 1
Failed example:
    all([0 <= k <= 13 for k in deck_statistics])
Exception raised:
    Traceback (most recent call last):
      File "/opt/conda/lib/python3.7/doctest.py", line 1337, in __run
        compileflags, 1), test.globs)
      File "<doctest q5_4 1[0]>", line 1, in <module>
        all([0 <= k <= 13 for k in deck_statistics])
    NameError: name 'deck_statistics' is not defined

Let's take a look at the distribution of simulated statistics.

[ ]: # Draw a distribution of statistics
     Table().with_column('Deck Statistics', deck_statistics).hist()

Question 5 Given your observed value, do you believe that Jade's model is reasonable, or is our
alternative more likely? Explain your answer using the distribution drawn in the previous problem.

Write your answer here, replacing this text.

1.6 Congratulations!

You're done with Homework 6! Be sure to run the tests and verify that they all pass, then choose
Download as PDF from the File menu and submit the .pdf file on canvas.