hw06
March 25, 2024
[1]: import otter
     grader = otter.Notebook()
Homework 6: Probability, Simulation, Estimation, and Assessing Models
Reading:
* Randomness
* Sampling and Empirical Distributions
* Testing Hypotheses
Please complete this notebook by filling in the cells provided. Before you begin, execute the following cell to load the provided tests. Each time you start your server, you will need to execute this cell again to load the tests.
Directly sharing answers is not okay, but discussing problems with the course staff or with other students is encouraged.
For all problems for which you must write explanations and sentences, you must provide your answer in the designated space. Moreover, throughout this homework and all future ones, please be sure not to re-assign variables throughout the notebook! For example, if you use max_temperature in your answer to one question, do not reassign it later on.
[2]: # Don't change this cell; just run it.
     import numpy as np
     from datascience import *

     # These lines do some fancy plotting magic.
     import matplotlib
     %matplotlib inline
     import matplotlib.pyplot as plt
     plt.style.use('fivethirtyeight')
     import warnings
     warnings.simplefilter('ignore', FutureWarning)

     import otter
     grader = otter.Notebook()
1.1 1. Probability
We will be testing some probability concepts that were introduced in lecture. For all of the following
problems, we will introduce a problem statement and give you a proposed answer. You must assign
the provided variable to one of the following three integers, depending on whether the proposed
answer is too low, too high, or correct.
1. Assign the variable to 1 if you believe our proposed answer is too high.
2. Assign the variable to 2 if you believe our proposed answer is too low.
3. Assign the variable to 3 if you believe our proposed answer is correct.
You are more than welcome to create more cells across this notebook to use for arithmetic operations.
Question 1. You roll a 6-sided die 10 times. What is the chance of getting 10 sixes?
Our proposed answer: (1/6)^10
Assign ten_sixes to either 1, 2, or 3 depending on if you think our answer is too high, too low, or correct.
[3]: ten_sixes = 2
     ten_sixes

[3]: 2
[4]: grader.check("q1_1")

[4]: q1_1 results: All test cases passed!
Question 2. Take the same problem set-up as before, rolling a fair die 10 times. What is the chance that every roll is less than or equal to 5?
Our proposed answer: 1 − (1/6)^10
Assign five_or_less to either 1, 2, or 3.
[5]: less_or_five = 1 - (1/6)**10
     print(less_or_five)
     five_or_less = 3
     five_or_less

     0.9999999834618283

[5]: 3
[6]: grader.check("q1_2")

[6]: q1_2 results: All test cases passed!
Question 3. Assume we are picking a lottery ticket. We must choose three distinct numbers from 1 to 1000 and write them on a ticket. Next, someone picks three numbers one by one from a bowl with numbers from 1 to 1000, each time without putting the previous number back in. We win if our numbers are all called in order.
If we decide to play the game and pick our numbers as 12, 140, and 890, what is the chance that we win?
Our proposed answer: (3/1000)^3
Assign lottery to either 1, 2, or 3.
[7]: lottery = 3
[8]: grader.check("q1_3")

[8]: q1_3 results: All test cases passed!
Question 4. Assume we have two lists, list A and list B. List A contains the numbers [20,10,30], while list B contains the numbers [10,30,20,40,30]. We choose one number from list A randomly and one number from list B randomly. What is the chance that the number we drew from list A is larger than or equal to the number we drew from list B?
Our proposed solution: 1/5
Assign list_chances to either 1, 2, or 3.
Hint: Consider the different possible ways that the items in list A can be greater than or equal to items in list B. Try working out your thoughts with pencil and paper; what do you think the correct solution will be close to?
[9]: list_chances = 2
[10]: grader.check("q1_4")

[10]: q1_4 results: All test cases passed!
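As a rough, ungraded check of the hint above, a short enumeration over every (a, b) pair gives the exact chance. The cell below is an illustrative sketch only; the names list_a and list_b are ours, not part of the assignment.

[ ]: # Illustrative check: count the pairs where the draw from list A is >= the draw from list B
     list_a = [20, 10, 30]
     list_b = [10, 30, 20, 40, 30]
     favorable = sum(1 for a in list_a for b in list_b if a >= b)
     favorable / (len(list_a) * len(list_b))   # 7/15 ≈ 0.467, noticeably larger than the proposed 1/5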
1.2 2. Monkeys Typing Shakespeare (…or at least the string "datascience")
A monkey is banging repeatedly on the keys of a typewriter. Each time, the monkey is equally likely to hit any of the 26 lowercase letters of the English alphabet, 26 uppercase letters of the English alphabet, and any number between 0-9 (inclusive), regardless of what it has hit before. There are no other keys on the keyboard.
This question is inspired by a mathematical theorem called the Infinite monkey theorem (https://en.wikipedia.org/wiki/Infinite_monkey_theorem), which postulates that if you put a monkey
in the situation described above for an infinite time, they will eventually type out all of Shakespeare’s
works.
Question 1. Suppose the monkey hits the keyboard 5 times. Compute the chance that the monkey types the sequence CS118. (Call this data_chance.) Use algebra and type in an arithmetic equation that Python can evaluate.
[11]: data_chance = (1/62)**5 * 100
      data_chance = (1/62)**5
      data_chance

[11]: 1.0915447684774164e-09
[12]: grader.check("q2_1")

[12]: q2_1 results: All test cases passed!
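As a sanity check on the arithmetic above (not part of the assignment text): the keyboard has 26 + 26 + 10 = 62 equally likely keys and the 5 strikes are independent, so the chance of any particular 5-character sequence such as CS118 is (1/62)^5 ≈ 1.09 × 10^-9, matching the value computed in the cell above.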
Question 2. Write a function called simulate_key_strike. It should take no arguments, and it should return a random one-character string that is equally likely to be any of the 26 lower-case English letters, 26 upper-case English letters, or any number between 0-9 (inclusive).
[13]: # We have provided the code below to compute a list called keys,
      # containing all the lower-case English letters, upper-case English letters,
      # and the digits 0-9 (inclusive). Print it if you want to verify what it contains.
      import string
      keys = list(string.ascii_lowercase + string.ascii_uppercase + string.digits)
      print(keys)

      def simulate_key_strike():
          """Simulates one random key strike."""
          return np.random.choice(keys)

      # An example call to your function:
      simulate_key_strike()

      ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p',
       'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F',
       'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V',
       'W', 'X', 'Y', 'Z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

[13]: 'v'
[14]: grader.check("q2_2")

[14]: q2_2 results: All test cases passed!
Question 3. Write a function called simulate_several_key_strikes. It should take one argument: an integer specifying the number of key strikes to simulate. It should return a string containing that many characters, each one obtained from simulating a key strike by the monkey.
Hint: If you make a list or array of the simulated key strikes called key_strikes_array, you can convert that to a string by calling "".join(key_strikes_array)
[15]: def simulate_several_key_strikes(num_strikes):
          key_strikes_array = []
          for i in range(num_strikes):
              key_strikes_array.append(simulate_key_strike())
          return "".join(key_strikes_array)

      # An example call to your function:
      simulate_several_key_strikes(11)

[15]: 'sPH9Eho20Aa'
[16]: grader.check("q2_3")

[16]: q2_3 results: All test cases passed!
Question 4. Call simulate_several_key_strikes 5000 times, each time simulating the monkey striking 5 keys. Compute the proportion of times the monkey types "CS118", calling that proportion data_proportion.
[17]: word_count = 0
      final_count = []
      for x in range(0, 5000):
          cs118 = simulate_several_key_strikes(5)
          if cs118 == 'CS118':
              word_count += 1   # increment the same counter used in the proportion below
          final_count.append(cs118)
      data_proportion = word_count / 5000
      data_proportion

[17]: 0.0
[18]: grader.check("q2_4")

[18]: q2_4 results: All test cases passed!
Question 5. Check the value your simulation computed for data_proportion. Is your simulation a good way to estimate the chance that the monkey types "CS118" in 5 strikes (the answer to question 1)? Why or why not?
No, because the simulation suggests that after striking the typewriter 5000 times, "CS118" was not typed even once. In question 1 the probability is greater than 0, which makes the data_proportion estimate inaccurate. 5000 seems like a big number, but in this case it is too small a sample size compared to the enormous number of possibilities. This relates to the law of averages, which says that as an experiment is repeated more and more times, the observed proportion gets closer to the theoretical probability, which is the answer to question 1. In this case, 5000 repetitions is not sufficient.
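To make the argument above concrete (an illustrative aside, not part of the original answer): the expected number of times "CS118" appears in 5000 simulated 5-key bursts is tiny, so observing zero occurrences is the most likely outcome.

[ ]: # Illustrative: expected number of "CS118" occurrences in 5000 tries
     5000 * (1/62)**5   # ≈ 5.5e-06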
Question 6. Compute the chance that the monkey types the letter "t" at least once in the 5 strikes. Call it t_chance. Use algebra and type in an arithmetic equation that Python can evaluate.
[19]: t_chance = (1 - (61/62)**5) * 100
      t_chance = 1 - (61/62)**5
      t_chance

[19]: 0.07808532616807251
[20]: grader.check("q2_6")

[20]: q2_6 results: All test cases passed!
Question 7. Do you think that a computer simulation is more or less effective to estimate t_chance compared to when we tried to estimate data_chance this way? Why or why not? (You don't need to write a simulation, but it is an interesting exercise.)
I would say the simulation for t_chance is more effective than the one for data_chance, because the probability for t_chance is about 7.81%, which 5000 trials can estimate reasonably well, whereas the probability for data_chance is only about 0.00000011%, far too rare to be observed in 5000 trials.
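For the curious, the optional simulation mentioned in the question could be sketched as follows (assuming the simulate_several_key_strikes function defined earlier; the variable names here are illustrative only):

[ ]: # Illustrative sketch: estimate t_chance by simulation, analogous to Question 4
     t_count = 0
     for trial in range(5000):
         if 't' in simulate_several_key_strikes(5):
             t_count += 1
     t_count / 5000   # should land reasonably close to 1 - (61/62)**5 ≈ 0.078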
1.3 3. Sampling Basketball Players
This exercise uses salary data and game statistics for basketball players from the 2019-2020 NBA season. The data was collected from Basketball-Reference.
Run the next cell to load the two datasets.
[21]: player_data = Table.read_table('player_data.csv')
      salary_data = Table.read_table('salary_data.csv')
      player_data.show(3)
      salary_data.show(3)

      <IPython.core.display.HTML object>
      <IPython.core.display.HTML object>
Question 1. We would like to relate players' game statistics to their salaries. Compute a table called full_data that includes one row for each player who is listed in both player_data and salary_data. It should include all the columns from player_data and salary_data, except the "Name" column.
[22]: full_data = player_data.join('Player', salary_data, 'Name')
      full_data
[22]: Player            | 3P  | 2P  | PTS  | Salary
      Aaron Gordon      | 1.2 | 4.1 | 14.2 | 19863636
      Aaron Holiday     | 1.5 | 2.2 | 9.9  | 2239200
      Abdel Nader       | 0.7 | 1.3 | 5.7  | 1618520
      Admiral Schofield | 0.5 | 0.6 | 3.2  | 898310
      Al Horford        | 1.4 | 3.4 | 12   | 28000000
      Al-Farouq Aminu   | 0.5 | 0.9 | 4.3  | 9258000
      Alec Burks        | 1.7 | 3.3 | 15.8 | 2320044
      Alec Burks        | 1.8 | 3.3 | 16.1 | 2320044
      Alec Burks        | 0   | 1   | 2    | 2320044
      Alen Smailagić    | 0.3 | 1.3 | 4.7  | 898310
      … (552 rows omitted)
[23]: grader.check("q3_1")

[23]: q3_1 results: All test cases passed!
Basketball team managers would like to hire players who perform well but don't command high salaries. From this perspective, a very crude measure of a player's value to their team is the number of 3 pointers and free throws the player scored in a season for every $100,000 of salary (Note: the Salary column is in dollars, not hundreds of thousands of dollars). For example, Al Horford scored an average of 5.2 points from 3 pointers and free throws combined, and has a salary of $28 million. This is equivalent to 280 hundred-thousands of dollars, so his value is 5.2/280. The formula is:
("PTS" − 2 × "2P") / ("Salary" / 100000)
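As an illustrative check of the formula using the Al Horford row shown in full_data above (not part of the assignment):

[ ]: # Illustrative: Al Horford's value, computed by hand from his row
     (12 - 2 * 3.4) / (28000000 / 100000)   # = 5.2 / 280 ≈ 0.019 points per $100,000 of salary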
Question 2. Create a table called full_data_with_value that's a copy of full_data, with an extra column called "Value" containing each player's value (according to our crude measure). Then make a histogram of players' values. Specify bins that make the histogram informative and don't forget your units! Remember that hist() takes in an optional third argument that allows you to specify the units! Refer to the python reference to look at tbl.hist(...) if necessary.
Just so you know: Informative histograms contain a majority of the data and exclude outliers.
[24]: bins = np.arange(0, 0.7, .1)  # Use this provided bins when you make your histogram
      full_data_with_value = full_data.with_column("Value",
          (full_data["PTS"] - 2 * full_data["2P"]) / (full_data["Salary"] / 100000))
      full_data_with_value.select("Value").hist(bins=bins, unit="Per $100k")
      plt.title("NBA Player's Value")
      plt.show()
Now suppose we weren’t able to find out every player’s salary (perhaps it was too costly to interview
each player). Instead, we have gathered a
simple random sample
of 50 players’ salaries. The cell
below loads those data.
[25]: sample_salary_data = Table.read_table("sample_salary_data.csv")
      sample_salary_data.show(3)
      sample_salary_data.sample(5)
      sample_salary_data

      <IPython.core.display.HTML object>

[25]: Name              | Salary
      D.J. Wilson       | 2961120
      Yante Maten       | 100000
      Abdel Nader       | 1618520
      Jaren Jackson     | 6927480
      Cameron Johnson   | 4033440
      Malik Newman      | 555409
      Luol Deng         | 4990000
      Terrance Ferguson | 2475840
      Maurice Harkless  | 11511234
      Nicolò Melli      | 3902439
      … (40 rows omitted)
Question 3. Make a histogram of the values of the players in sample_salary_data, using the same method for measuring value we used in question 2. Make sure to specify the units again in the histogram as stated in the previous problem. Use the same bins, too.
Hint: This will take several steps.
[26]: sample_tbl = player_data.join('Player', sample_salary_data, 'Name')
      sample_tbl = sample_tbl.with_column('Value',
          (sample_tbl['PTS'] - 2 * sample_tbl['2P']) / (sample_tbl['Salary'] / 100000))
      sample_tbl.select('Value').hist(bins=bins, unit="Per $100k")
      plt.title('Sample of Size 50')
      plt.show()
Now let us summarize what we have seen. To guide you, we have written most of the summary
already.
Question 4. Complete the statements below by setting each relevant variable name to the value that correctly fills the blank.
• The plot in question 2 displayed a(n) [distribution_1] distribution of the population of [player_count_1] players. The areas of the bars in the plot sum to [area_total_1].
• The plot in question 3 displayed a(n) [distribution_2] distribution of the sample of [player_count_2] players. The areas of the bars in the plot sum to [area_total_2].
distribution_1 and distribution_2 should be set to one of the following strings: "empirical" or "probability".
player_count_1, area_total_1, player_count_2, and area_total_2 should be set to integers. Remember that areas are represented in terms of percentages.
Hint 1: For a refresher on distribution types, check out Section 10.1
Hint 2: The hist() table method ignores data points outside the range of its bins, but you may ignore this fact and calculate the areas of the bars using what you know about histograms from lecture.
[27]: distribution_1 = 'empirical'
      player_count_1 = 585
      area_total_1 = 100

      distribution_2 = 'empirical'
      player_count_2 = 50
      area_total_2 = 100
[28]: grader.check("q3_4")

[28]: q3_4 results: All test cases passed!
Question 5. For which range of values does the plot in question 3 better depict the distribution of the population's player values: 0 to 0.3, or above 0.3? Explain your answer.
The plot better depicts the distribution of player values for the range 0 to 0.3, because that is where the majority of the players are found. Since most of the population falls in that range, a random sample will contain many values there and depict that part of the distribution well, whereas values above 0.3 are drawn far less often because there are far fewer players in that range.
1.4 4. Earthquakes
The next cell loads a table containing information about every earthquake with a magnitude above 5 in 2019 (smaller earthquakes are generally not felt, only recorded by very sensitive equipment), compiled by the US Geological Survey. (source: https://earthquake.usgs.gov/earthquakes/search/)
[29]: earthquakes = Table().read_table('earthquakes_2019.csv').select(['time', 'mag', 'place'])
      earthquakes

[29]: time                     | mag | place
      2019-12-31T11:22:49.734Z | 5   | 245km S of L'Esperance Rock, New Zealand
      2019-12-30T17:49:59.468Z | 5   | 37km NNW of Idgah, Pakistan
      2019-12-30T17:18:57.350Z | 5.5 | 34km NW of Idgah, Pakistan
      2019-12-30T13:49:45.227Z | 5.4 | 33km NE of Bandar 'Abbas, Iran
      2019-12-30T04:11:09.987Z | 5.2 | 103km NE of Chichi-shima, Japan
      2019-12-29T18:24:41.656Z | 5.2 | Southwest of Africa
      2019-12-29T13:59:02.410Z | 5.1 | 138km SSW of Kokopo, Papua New Guinea
      2019-12-29T09:12:15.010Z | 5.2 | 79km S of Sarangani, Philippines
      2019-12-29T01:06:00.130Z | 5   | 9km S of Indios, Puerto Rico
      2019-12-28T22:49:15.959Z | 5.2 | 128km SSE of Raoul Island, New Zealand
      … (1626 rows omitted)
If we were studying all human-detectable 2019 earthquakes and had access to the above data, we’d
be in good shape - however, if the USGS didn’t publish the full data, we could still learn something
about earthquakes from just a smaller subsample. If we gathered our sample correctly, we could use
that subsample to get an idea about the distribution of magnitudes (above 5, of course) throughout
the year!
In the following lines of code, we take two different samples from the earthquake table, and calculate
the mean of the magnitudes of these earthquakes.
[30]: sample1 = earthquakes.sort('mag', descending=True).take(np.arange(100))
      sample1_magnitude_mean = np.mean(sample1.column('mag'))
      sample2 = earthquakes.take(np.arange(100))
      sample2_magnitude_mean = np.mean(sample2.column('mag'))
      [sample1_magnitude_mean, sample2_magnitude_mean]

[30]: [6.4589999999999987, 5.2790000000000008]
Question 1. Are these samples representative of the population of earthquakes in the original table (that is, should we expect the mean to be close to the population mean)?
Hint: Consider the ordering of the earthquakes table.
Sample 1 is not representative of the entire population because it consists of the 100 earthquakes with the highest magnitudes: it sorts the 'mag' column in descending order and takes the top 100. This does not represent the whole population; it is a deterministic sample.
Sample 2 also takes 100 earthquakes, but without sorting them in any specific order. While it is better than sample 1 as far as representation goes, it is still just the first 100 rows out of the 1,636 earthquakes in the table, so we cannot be sure it reflects the full range of magnitudes. To get a more accurate representation of the population, we would need a sufficiently large, truly random sample from the entire population.
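As an illustrative follow-up (not part of the graded answer), since the full table is available here, one can compare the two sample means above with the population mean directly:

[ ]: # Illustrative: population mean magnitude, for comparison with sample1 and sample2
     np.mean(earthquakes.column('mag'))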
Question 2. Write code to produce a sample of size 200 that is representative of the population. Then, take the mean of the magnitudes of the earthquakes in this sample. Assign these to representative_sample and representative_mean respectively.
Hint: In class, we learned what kind of samples should be used to properly represent the population.
[31]: representative_sample = earthquakes.sample(200)
      representative_mean = np.mean(representative_sample.column('mag'))
      representative_mean

[31]: 5.3219000000000003
[32]: grader.check("q4_2")

[32]: q4_2 results: All test cases passed!
Question 3. Suppose we want to figure out what the biggest magnitude earthquake was in 2019, but we only have our representative sample of 200. Let's see if trying to find the biggest magnitude in the population from a random sample of 200 is a reasonable idea!
Write code that takes many random samples from the earthquakes table and finds the maximum of each sample. You should take a random sample of size 200 and do this 5000 times. Assign the array of maximum magnitudes you find to maximums.
[33]: maximums = make_array()
      for i in np.arange(5000):
          sample = earthquakes.sample(200)
          sample_max = max(sample.column('mag'))
          maximums = np.append(maximums, sample_max)
      maximums

[33]: array([ 6.8,  7. ,  7.1, …,  7.2,  7.1,  6.4])
[34]: grader.check("q4_3")

[34]: q4_3 results: All test cases passed!
[35]: # Histogram of your maximums
      Table().with_column('Largest magnitude in sample', maximums).hist('Largest magnitude in sample')
Question 4.
Now find the magnitude of the actual strongest earthquake in 2019 (not the maximum
of a sample). This will help us determine whether a random sample of size 200 is likely to help you
determine the largest magnitude earthquake in the population.
[36]: strongest_earthquake_magnitude = max(earthquakes.column('mag'))
      strongest_earthquake_magnitude

[36]: 8.0
[37]: grader.check("q4_4")

[37]: q4_4 results: All test cases passed!
Question 5.
Explain whether you believe you can accurately use a sample size of 200 to determine
the maximum. What is one problem with using the maximum as your estimator? Use the histogram
above to help answer.
A sample size of 200 is too small to accurately determine the maximum magnitude of all the earthquakes in 2019. The actual strongest earthquake had a magnitude of 8, but according to the histogram of our 5000 sample maximums, the sample maximum is typically around 7. This means that even after 5000 samples of size 200, the samples usually did not come close to the actual strongest earthquake. One problem with using the maximum as an estimator is that it can only equal the true value when the single largest earthquake happens to be drawn, which is unlikely because a sample of 200 is small compared to all the earthquakes in 2019.
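For the curious (an illustrative aside, not part of the original answer), the simulated maximums can be used to quantify this directly:

[ ]: # Illustrative: fraction of the 5000 simulated samples whose maximum equals the true 2019 maximum
     np.count_nonzero(maximums == strongest_earthquake_magnitude) / len(maximums)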
1.5 5. Assessing Jade's Models
Games with Jade
Our friend Jade comes over and asks us to play a game with her. The game works like this:
We will draw randomly with replacement from a simplified 13 card deck with 4 face cards (A, J, Q, K), and 9 numbered cards (2, 3, 4, 5, 6, 7, 8, 9, 10). If we draw cards with replacement 13 times, and if the number of face cards is greater than or equal to 4, we lose. Otherwise, we win.
We play the game once and we lose, observing 8 total face cards. We are angry and accuse Jade of cheating! Jade is adamant, however, that the deck is fair.
Jade's model claims that there is an equal chance of getting any of the cards (A, 2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K), but we do not believe her. We believe that the deck is clearly rigged, with face cards (A, J, Q, K) being more likely than the numbered cards (2, 3, 4, 5, 6, 7, 8, 9, 10).
Question 1
Assign deck_model_probabilities to a two-item array containing the chance of drawing a face card as the first element, and the chance of drawing a numbered card as the second element under Jade's model. Since we're working with probabilities, make sure your values are between 0 and 1.
[38]: deck_model_probabilities = make_array(1/2, 1/2)
      deck_model_probabilities

[38]: array([ 0.5,  0.5])
[39]: grader.check("q5_1")

[39]: q5_1 results: All test cases passed!
Question 2
We believe Jade's model is incorrect. In particular, we believe there to be a larger chance of getting a face card. Which of the following statistics can we use during our simulation to test between the model and our alternative? Assign statistic_choice to the correct answer.
1. The actual number of face cards we get in 13 draws
2. The distance (absolute value) between the actual number of face cards in 13 draws and the
expected number of face cards in 13 draws (4)
3. The expected number of face cards in 13 draws (4)
[40]: statistic_choice = 2
      statistic_choice

[40]: 2
[41]: grader.check("q5_2")

[41]: q5_2 results: All test cases passed!
Question 3
Define the function deck_simulation_and_statistic, which, given a sample size and an array of model proportions (like the one you created in Question 1), returns the number of face cards in one simulation of drawing cards under the model specified in model_proportions.
Hint: Think about how you can use the function sample_proportions.
[68]: def deck_simulation_and_statistic(sample_size, model_proportions):
          # Simulate sample_size draws under the given model proportions and count the face cards
          simulated_draws = sample_proportions(sample_size, model_proportions)
          num_face_cards = simulated_draws.item(0) * sample_size
          return num_face_cards

      deck_simulation_and_statistic(13, deck_model_probabilities)

[68]: 2.0
[43]: grader.check("q5_3")

[43]: q5_3 results: All test cases passed!
Question 4
Use your function from above to simulate the drawing of 13 cards 5000 times under the proportions that you specified in Question 1. Keep track of all of your statistics in deck_statistics.
[45]: repetitions = 5000
      deck_statistics = make_array()
      for i in np.arange(repetitions):
          deck_statistics = np.append(deck_statistics,
                                      deck_simulation_and_statistic(13, deck_model_probabilities))
      deck_statistics

[45]: array([ 11.,  5.,  6., …,  9.,  9.,  6.])
[46]: grader.check("q5_4")

[46]: q5_4 results:
q5_4 - 1 result:
Trying:
len(deck_statistics) == repetitions
Expecting:
True
**********************************************************************
Line 4, in q5_4 0
Failed example:
len(deck_statistics) == repetitions
Exception raised:
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/doctest.py", line 1337, in __run
compileflags, 1), test.globs)
File "<doctest q5_4 0[0]>", line 1, in <module>
len(deck_statistics) == repetitions
NameError: name 'deck_statistics' is not defined
q5_4 - 2 result:
Trying:
all([0 <= k <= 13 for k in deck_statistics])
Expecting:
True
**********************************************************************
Line 4, in q5_4 1
Failed example:
all([0 <= k <= 13 for k in deck_statistics])
Exception raised:
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/doctest.py", line 1337, in __run
compileflags, 1), test.globs)
File "<doctest q5_4 1[0]>", line 1, in <module>
all([0 <= k <= 13 for k in deck_statistics])
NameError: name 'deck_statistics' is not defined
Let’s take a look at the distribution of simulated statistics.
[ ]: # Draw a distribution of statistics
     Table().with_column('Deck Statistics', deck_statistics).hist()
Question 5
Given your observed value, do you believe that Jade’s model is reasonable, or is our
alternative more likely? Explain your answer using the distribution drawn in the previous problem.
Write your answer here, replacing this text.
1.6 Congratulations! You're done with Homework 6!
Be sure to run the tests and verify that they all pass, then choose Download as PDF from the File menu and submit the .pdf file on Canvas.