Understanding Data Proportions and Computer Simulations in Data 8

Data 8 - hw06 - email@berkeley.edu

**Question 5.** Check the value your simulation computed for `data_proportion`. Is your simulation a good way to estimate the chance that the monkey types `"Data8"` in 5 strikes (the answer to question 1)? Why or why not?

No, this is not a good way to estimate the chance that the monkey types "Data8", because my simulation returns an estimate of 0. The true chance that a monkey types "Data8" is extremely small: very close to 0, but not 0. The simulation does not run nearly enough trials for such a rare event to occur even once. So it neither gives an accurate estimate nor lets us check whether the chance we computed in question 1 is correct.
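
The gap between the exact chance and a simulated proportion shows up in a short sketch. It assumes, hypothetically, a 62-key keyboard (lowercase letters, uppercase letters and digits) with every strike uniform and independent. The homework's actual key set may differ. With 10,000 trials and an exact chance of roughly one in a billion, the simulated proportion will almost always come out as 0.

```python
import numpy as np

# Hypothetical keyboard: 26 lowercase + 26 uppercase letters + 10 digits.
# This key set is an assumption for illustration, not the homework's setup.
KEYS = np.array(list("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"))
TARGET = "Data8"

def exact_chance(target, n_keys):
    """Chance that len(target) independent, uniform strikes spell target."""
    return (1 / n_keys) ** len(target)

def simulate_proportion(target, keys, trials, rng):
    """Fraction of simulated trials in which the monkey types target exactly."""
    # Each row is one trial of len(target) random key strikes.
    strikes = rng.choice(keys, size=(trials, len(target)))
    # A trial is a hit only if every strike matches the target character.
    hits = (strikes == np.array(list(target))).all(axis=1)
    return hits.mean()

rng = np.random.default_rng(0)
data_chance = exact_chance(TARGET, len(KEYS))
data_proportion = simulate_proportion(TARGET, KEYS, 10_000, rng)
print(data_chance)      # about 1.1e-09
print(data_proportion)
```

A single hit in 10,000 trials would already give 0.0001, about 100,000 times the true chance. So no affordable number of trials yields a useful estimate here.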

**Question 7.** Do you think that a computer simulation is more or less effective to estimate `t_chance` compared to when we tried to estimate `data_chance` this way? Why or why not? (You don't need to write a simulation, but it is an interesting exercise.)

A computer simulation is more effective for estimating `t_chance` than for estimating `data_chance`. The chance that the monkey types a "t" somewhere in 5 strikes is much larger than the chance that it types "Data8". The event therefore shows up regularly in a simulation, and we can see whether the monkey typed a "t" much more easily. A modest number of trials is enough for a useful estimate, which means much less data for us and the computer to work through, and the results are easier to visualize.
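
The contrast with `data_chance` can be sketched with the same hypothetical 62-key keyboard. This sketch assumes that `t_chance` means the chance of at least one "t" in 5 strikes, and the key set remains an assumption. Because this event is common, 10,000 simulated trials land close to the exact value.

```python
import numpy as np

# Hypothetical 62-key keyboard (letters in both cases plus digits); an assumption.
KEYS = np.array(list("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"))
STRIKES = 5

# Exact chance of at least one "t" in 5 independent, uniform strikes.
t_chance = 1 - (1 - 1 / len(KEYS)) ** STRIKES

rng = np.random.default_rng(0)
trials = 10_000
strikes = rng.choice(KEYS, size=(trials, STRIKES))
# A trial succeeds if any of its 5 strikes is "t".
t_proportion = np.count_nonzero((strikes == "t").any(axis=1)) / trials

print(round(t_chance, 3))  # 0.078
print(t_proportion)
```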

**Question 2.** Create a table called `full_data_with_value` that's a copy of `full_data`, with an extra column called `"Value"` containing each player's value (according to our crude measure). Then make a histogram of players' values. **Specify bins that make the histogram informative and don't forget your units!** Remember that `hist()` takes in an optional third argument that allows you to specify the units! Refer to the Python reference to look at `tbl.hist(...)` if necessary.

*Just so you know:* Informative histograms contain a majority of the data and **exclude outliers**.

```python
import numpy as np

bins = np.arange(0, 0.7, .1)  # Use these provided bins when you make your histogram

# Crude value measure from this answer: (column 3 - 2 * column 2) divided by
# column 4, scaled to per $100,000. `full_data` is loaded earlier in the homework.
value = (full_data.column(3) - 2 * full_data.column(2)) / full_data.column(4) * 100000
full_data_with_value = full_data.with_column("Value", value)
full_data_with_value.hist("Value", bins=bins, unit="none two points per $100000")
```
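
A toy run of the same formula shows how the provided bins handle outliers. The five players and their numbers below are made up. Plain NumPy (`np.histogram`) stands in for the `datascience` `Table.hist` call. Any value outside 0 to 0.6 is simply not counted, which is how these bins exclude outliers.

```python
import numpy as np

# Hypothetical toy columns standing in for full_data.column(2), .column(3), .column(4).
col2 = np.array([100, 50, 200, 30, 10])
col3 = np.array([225, 130, 410, 105, 60])
col4 = np.array([10_000_000, 20_000_000, 20_000_000, 10_000_000, 1_000_000])

# Same crude measure as in the answer, scaled per $100,000.
value = (col3 - 2 * col2) / col4 * 100000  # about [0.25, 0.15, 0.05, 0.45, 4.0]

bins = np.arange(0, 0.7, .1)               # edges 0.0, 0.1, ..., 0.6
counts, edges = np.histogram(value, bins=bins)

# Values beyond the last edge (like 4.0 here) fall outside every bin.
n_outside = np.count_nonzero((value < bins[0]) | (value > bins[-1]))
print(counts)     # [1 1 1 0 1 0]
print(n_outside)  # 1
```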

**Question 3.** Make a histogram of the values of the players in `sample_salary_data`, using the same method for measuring value we used in question 2. Make sure to specify the units again in the histogram as stated in the previous problem. **Use the same bins, too.**

*Hint:* This will take several steps.

```python
# Attach each sampled salary to that player's stats by matching names.
# `player_data`, `sample_salary_data`, and `bins` come from earlier in the homework.
sample_data = player_data.join('Player', sample_salary_data, 'Name')
value = (sample_data.column(3) - 2 * sample_data.column(2)) / sample_data.column(4) * 100000
sample_data_with_value = sample_data.with_column("Value", value)
sample_data_with_value.hist("Value", bins=bins, unit="none two points per $100000")
```
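
The first step of that answer is the `join`. As we understand `Table.join` in the `datascience` library, it keeps only rows whose key appears in both tables, so players without a sampled salary drop out. Below is a minimal pure-Python sketch of that matching. The names, numbers and helper functions are all hypothetical.

```python
# Hypothetical stats table keyed by player name: (column 3, column 2) values.
player_stats = {"A. Guard": (225, 100), "B. Forward": (130, 50), "C. Center": (410, 200)}
# Hypothetical salary sample keyed by name; "D. Rookie" has no stats row.
salary_sample = {"B. Forward": 20_000_000, "C. Center": 20_000_000, "D. Rookie": 1_000_000}

def join_on_name(stats, salaries):
    """Inner join: keep only names that appear in both tables."""
    return {name: stats[name] + (salaries[name],) for name in stats if name in salaries}

def crude_value(c3, c2, salary):
    """Same crude measure as the answer, scaled per $100,000."""
    return (c3 - 2 * c2) / salary * 100000

joined = join_on_name(player_stats, salary_sample)
values = {name: crude_value(*row) for name, row in joined.items()}
print(sorted(values))  # ['B. Forward', 'C. Center']
```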

**Question 5.** For which range of values does the plot in question 3 better depict the distribution of the **population's player values**: 0 to 0.3, or above 0.3? Explain your answer.

The plot in question 3 better depicts the distribution of the population's player values in the range 0 to 0.3. Most players have a value between 0 and 0.3, so many of them are likely to be selected in a random sample of 50 players. So few players have a value above 0.3 that a sample this small includes only a handful of them. That part of the distribution therefore cannot be represented accurately.
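
One way to quantify this is to compare the standard error of a sample proportion, `sqrt(p * (1 - p) / n)`, with the proportion `p` itself. The 90%/10% split below is a hypothetical illustration, not the homework's actual population. Only the sample size of 50 comes from the question.

```python
import math

def relative_se(p, n):
    """Standard error of a sample proportion, divided by the true proportion p."""
    return math.sqrt(p * (1 - p) / n) / p

n = 50  # sample size from the question
# Hypothetical split: 90% of population values in [0, 0.3), 10% above 0.3.
for label, p in [("0 to 0.3", 0.9), ("above 0.3", 0.1)]:
    print(label, "expected count:", n * p, "relative SE:", round(relative_se(p, n), 3))
```

Under this assumption, the above-0.3 share is estimated with nine times the relative error of the 0 to 0.3 share (about 0.42 versus 0.047), and it comes from only about 5 sampled players.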

**Question 1.** Are these samples representative of the population of earthquakes in the original table (that is, should we expect the mean to be close to the population mean)?

*Hint:* Consider the ordering of the `earthquakes` table.

These samples are not representative of the population of earthquakes in the original table. The first sample takes the 100 earthquakes with the largest magnitudes, so it does not represent the whole population, and its mean is higher than the population mean. The second sample is not a random sample either, so it is not very representative of the population. It is, however, more representative than the first sample, because it was taken without sorting the table by magnitude.
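
A quick simulation on made-up magnitudes illustrates the first point. The 1,636 values below are hypothetical uniform draws. The population size matches the count used later in this homework, but the values do not come from the real `earthquakes` table. Taking the top 100 after sorting pushes the mean up, while a simple random sample of 100 lands near the population mean.

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical population: 1,636 made-up magnitudes, not the real earthquake data.
mags = rng.uniform(2.5, 8.0, size=1636)
population_mean = mags.mean()

# Like sorting by magnitude (descending) and taking the first 100 rows.
top_100_mean = np.sort(mags)[::-1][:100].mean()

# A simple random sample of 100 rows, drawn without replacement.
random_100_mean = rng.choice(mags, size=100, replace=False).mean()

print(round(population_mean, 2), round(top_100_mean, 2), round(random_100_mean, 2))
```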

**Question 5.** Explain whether you believe you can accurately use a sample size of 200 to determine the maximum. What is one problem with using the maximum as your estimator? Use the histogram above to help answer.

A sample size of 200 cannot accurately determine the maximum. The actual maximum is 8, whereas the estimated maximum most likely falls between 7 and 7.75. If the 200 earthquakes are drawn without replacement from the 1636 in the population, the earthquake of magnitude 8 has only a 200/1636 (about 12%) chance of being selected. The maximum is also an extreme value that does not represent the rest of the population. The sample maximum can never exceed the true maximum and usually falls below it, so a sample of 200 is not an accurate way to estimate it.
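
The 200/1636 figure can be checked by simulation. The population below is hypothetical: 1,635 made-up magnitudes below 7.75 plus a single quake of magnitude 8. Each sample draws 200 without replacement, and the sketch records how often the sample maximum equals the true maximum.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical population: 1,635 made-up magnitudes plus one quake of magnitude 8.
population = np.append(rng.uniform(4.5, 7.75, size=1635), 8.0)

trials = 5_000
sample_maxes = np.array([
    rng.choice(population, size=200, replace=False).max() for _ in range(trials)
])

exact = 200 / 1636                        # chance the magnitude-8 quake is sampled
simulated = np.mean(sample_maxes == 8.0)  # fraction of samples whose max is 8

print(round(exact, 3), simulated)
# The sample maximum never exceeds the true maximum of 8.
print(sample_maxes.max() <= 8.0)
```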

#### Question 5

Given your observed value, do you believe that Jade's model is reasonable, or is our alternative more likely? Explain your answer using the distribution drawn in the previous problem.

I believe that our alternative is more likely. Under Jade's model, the chance of drawing 8 or more face cards is very slim, about 2%. This chance is our p-value, and it is smaller than the conventional 5% cutoff. It is therefore unlikely that we drew 8 face cards by random chance alone, and the data support the conclusion that the deck of cards is not fair.
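
The distinction between the p-value and the 5% cutoff can be made concrete with a minimal sketch of an empirical p-value. Jade's exact model is not restated in this section, so the setup here is hypothetical. It draws 13 cards with replacement from a standard deck in which 12 of 52 cards are face cards, and sets the observed statistic to 8 face cards. The simulated p-value is the fraction of simulated counts at least as large as the observed one, and it is compared against the cutoff. The exact binomial tail is computed alongside as a check. Because the setup is made up, its value differs from the 2% above.

```python
import math
import numpy as np

# Hypothetical setup (not necessarily Jade's model): 13 draws with replacement,
# each a face card with probability 12/52.
N_DRAWS = 13
P_FACE = 12 / 52
OBSERVED = 8    # observed number of face cards
CUTOFF = 0.05   # conventional significance cutoff, not the p-value itself

rng = np.random.default_rng(0)
simulated_counts = rng.binomial(N_DRAWS, P_FACE, size=10_000)

# Empirical p-value: share of simulated counts at least as extreme as observed.
p_value = np.mean(simulated_counts >= OBSERVED)

# Exact binomial tail P(X >= OBSERVED), for comparison.
exact_tail = sum(
    math.comb(N_DRAWS, k) * P_FACE**k * (1 - P_FACE) ** (N_DRAWS - k)
    for k in range(OBSERVED, N_DRAWS + 1)
)

print(p_value, exact_tail, "evidence against the model" if p_value < CUTOFF else "model is plausible")
```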
