Homework #2
Ana Guedes
2023-01-19

1. Consider the random process of flipping a fair coin three times.

(1a) Write an R object, Omega, that is a vector whose elements describe the sample space in terms of heads and tails. E.g., three heads in a row could be described as ‘HHH’.

    omega <- c("HHH", "HHT", "HTH", "THH", "TTH", "THT", "HTT", "TTT")

(1b) The random variable X that we’re interested in is the number of heads that we get from our random process. Write a data.frame object with two columns. One column, X, describes all of the possible numbers of heads we could get. The second column, probs, describes the probability that each of these events occurs. Print your data.frame so that it shows in your report.

Hint: the coin is fair, so each of the outcomes in the sample space above occurs with equal probability. Note how many heads we get in each outcome. Then look at the proportion of outcomes with no heads, one head, etc. These proportions are equal to the probabilities.

    # Possible head counts and the share of the 8 outcomes that produce each
    x <- c(0, 1, 2, 3)
    probs <- c(1 / 8, 3 / 8, 3 / 8, 1 / 8)
    coins <- data.frame(x, probs)
    print(coins)
    ##   x probs
    ## 1 0 0.125
    ## 2 1 0.375
    ## 3 2 0.375
    ## 4 3 0.125

(1c) Calculate the mean of X.

    # Expected value: sum of each value times its probability
    sum(coins$x * coins$probs)
    ## [1] 1.5

(1d) Write out code to simulate this random process, where the output is a single realization of the random variable (i.e., a number that represents the number of heads in your coin flips).

Note: I set a random seed here, so that every time you recompile your assignment, you’ll get the same number. For analyses that involve sampling or random processes, it is really important to set a random seed so that you can get reproducible results. Feel free to change the seed number to anything you want. In general you should only set your random seed ONCE per script.
    set.seed(60637)
    # Draw one value of X according to its probabilities
    sample(x = coins$x, size = 1, prob = coins$probs)
    ## [1] 3

(1e) Now run your random process so you sample from it 10,000 times [PLEASE DON’T OUTPUT ALL 10,000 OBSERVATIONS IN YOUR HOMEWORK, just save it to an R object]. What is the average number of heads across these 10k observations? This is the sample mean for a given sample.

    # replace = TRUE is required because we draw more values than coins$x contains
    trials <- sample(x = coins$x, size = 10000, prob = coins$probs, replace = TRUE)
    mean(trials)
    ## [1] 1.5062

(1f) Write your own function called mymean() to calculate the sample mean from a vector. Apply your function to the size 10k sample that you saved in the last problem. (Don’t use mean() inside your function, and don’t call the specific object you created in the last question inside your function. Your mymean() function should work when applied to any vector.)

    # Sample mean: total divided by the number of elements
    mymean <- function(x) {
      sum(x) / length(x)
    }
    mymean(trials)
    ## [1] 1.5062

(1g) Re-run the code from 1e to get another length 10k sample from the same random process. [DON’T PRINT THIS WHOLE OBJECT.] Apply your mymean() function to it.

    trials_2 <- sample(x = coins$x, size = 10000, prob = coins$probs, replace = TRUE)
    mymean(trials_2)
    ## [1] 1.4985

2. Using the same random process of flipping three fair coins, code the random variable Y as 1 if we get three heads, and 0 otherwise.

(2a) Write a data.frame object with two columns. One column, Y, describes all of the possible values of Y we could get. The second column, probs, describes the probability that each of these events occurs. Print your data.frame so that it shows in your report.

    # Y = 1 only for HHH (1 of 8 outcomes); Y = 0 for the other 7
    heads <- c(0, 1)
    prob <- c(7 / 8, 1 / 8)
    heads_df <- data.frame(heads, prob)
    print(heads_df)
    ##   heads  prob
    ## 1     0 0.875
    ## 2     1 0.125
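As a cross-check on the hand-entered tables in (1b) and (2a), the same probabilities can be derived by counting heads in the sample space from (1a). This is only a sketch: the names `n_heads`, `coins_check`, and `heads_check` are illustrative and not part of the assignment.

```r
# Sample space for three fair coin flips, as written in (1a)
omega <- c("HHH", "HHT", "HTH", "THH", "TTH", "THT", "HTT", "TTT")

# Count the heads in each outcome by splitting each string into characters
n_heads <- sapply(strsplit(omega, ""), function(flips) sum(flips == "H"))

# Outcomes are equally likely, so P(X = k) is the share of outcomes with k heads
x_tab <- table(n_heads) / length(omega)
coins_check <- data.frame(x = as.numeric(names(x_tab)),
                          probs = as.numeric(x_tab))

# Y is 1 only when all three flips are heads, and 0 otherwise
y_vals <- as.numeric(n_heads == 3)
y_tab <- table(y_vals) / length(omega)
heads_check <- data.frame(heads = as.numeric(names(y_tab)),
                          prob = as.numeric(y_tab))

print(coins_check)
print(heads_check)
```

Counting from the sample space avoids typing the fractions by hand, and the same approach extends to more flips by generating the outcomes programmatically.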
(2b) Write a new data.frame object that has three columns. Two columns, X and Y, jointly describe the values that X and Y can take on together. The third column, probs, describes the probability that each of these pairs of events occurs jointly. Print your data.frame so that it shows in your report.

    # Y is 1 only when X = 3, so each value of X pairs with exactly one value of Y
    x2 <- c(0, 1, 2, 3)
    y2 <- c(0, 0, 0, 1)
    joint_prob <- c(1 / 8, 3 / 8, 3 / 8, 1 / 8)
    new_df <- data.frame(x2, y2, joint_prob)
    print(new_df)
    ##   x2 y2 joint_prob
    ## 1  0  0      0.125
    ## 2  1  0      0.375
    ## 3  2  0      0.375
    ## 4  3  1      0.125

(2c) Report the conditional mean of X given that Y equals 0. Recall that conditional probability can be written as:

$$P[A \mid B] = \frac{P[AB]}{P[B]}$$

    # Mean of X given Y = 0
    # When Y = 0, X can be 0, 1, or 2
    x_interest <- sum(new_df$x2[which(new_df$y2 == 0)] * new_df$joint_prob[which(new_df$y2 == 0)])
    # P[Y = 0], used to renormalize the joint probabilities
    total_prob <- sum(new_df$joint_prob[which(new_df$y2 == 0)])
    print(x_interest / total_prob)
    ## [1] 1.285714

3. Assume there is a drug that is used by 0.5% of the population. We also know that a blood test for the drug has a 3% positive rate. Let’s further assume that if a person uses the drug, the test will return a positive result 95% of the time. If a person tests positive for the drug, what is the likelihood that she is actually a user of that drug? (Referring to the Bayes Rule example that we discussed in class will help you complete this problem.)

    # usage = 0.5%
    # positive rate = 3%
    # true positive rate = 95%
    # Using Bayes' theorem:
    # Pr{user | +} = Pr{+ | user} * Pr{user} / Pr{+}
    prob <- 0.95 * 0.005 / 0.03
    print(prob)
    ## [1] 0.1583333
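The Bayes' rule arithmetic in problem 3 can also be wrapped in a small function, which makes the role of each input explicit. The sketch below is illustrative: `posterior()` and the variable names are hypothetical, and the false positive rate is not given in the problem but is backed out from the stated numbers using the law of total probability.

```r
# Bayes' rule: P(user | +) = P(+ | user) * P(user) / P(+)
posterior <- function(p_pos_given_user, p_user, p_pos) {
  p_pos_given_user * p_user / p_pos
}

p_user <- 0.005           # share of the population using the drug
p_pos <- 0.03             # overall positive rate of the test
p_pos_given_user <- 0.95  # positive rate among users

p_user_given_pos <- posterior(p_pos_given_user, p_user, p_pos)
print(p_user_given_pos)
## [1] 0.1583333

# Law of total probability:
#   P(+) = P(+ | user) * P(user) + P(+ | non-user) * P(non-user)
# Solving for the positive rate among non-users implied by the inputs:
p_pos_given_nonuser <- (p_pos - p_pos_given_user * p_user) / (1 - p_user)

# Rebuilding P(+) from both groups should recover the stated 3%
p_pos_rebuilt <- p_pos_given_user * p_user + p_pos_given_nonuser * (1 - p_user)
print(p_pos_rebuilt)
```

Even with a 95% detection rate, fewer than one in six positive tests comes from a user, because users are so rare that most positives come from the much larger non-user group.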