STAT251 Fundamentals of Biostatistics
LABORATORY NOTES Week 5
Probability Distributions

Aim: The focus of this lab is computing theoretical probabilities, means and standard deviations for discrete random variables, and simulating random observations from discrete probability distributions.

1. Discrete probability distributions

Reference Table: Refer to this table to help you complete the Logbook questions.

Table 1: Some properties of discrete distributions.

| | General discrete distribution | Binomial distribution (Week 4 Lectures) | Poisson distribution (Week 4 Lectures) |
|---|---|---|---|
| Assumptions | 1. $0 \le P(X = x) \le 1$; 2. $\sum P(X = x) = 1$ | 1. Two possible outcomes, success and failure; 2. Fixed number of trials, $n$; 3. Trials independent; 4. Fixed probability of success | In a sufficiently short time interval: 1. Only 0 or 1 events can occur; 2. The probability of 1 event is proportional to the length of the interval; 3. Numbers of events in non-overlapping intervals are independent |
| Probability $P(X = x)$ | $P(X = x)$ | $p(x) = \binom{n}{x} p^x (1-p)^{n-x}$, where ${}^{n}C_{x} = \binom{n}{x} = \frac{n!}{x!\,(n-x)!}$ | $p(x) = \frac{\lambda^x e^{-\lambda}}{x!}$ |
| Mean (expected value) | $\mu = E(X) = \sum x\,p(x)$ | $E(X) = np$ | $E(X) = \lambda$ |
| Standard deviation $\sigma$ | $\sigma = \sqrt{\sum (x-\mu)^2\,p(x)}$ | $\sqrt{np(1-p)}$ | $\sqrt{\lambda}$ |
| Variance $\sigma^2$ | $\sigma^2 = \sum (x-\mu)^2\,p(x)$ | $np(1-p)$ | $\lambda$ |

2. The Binomial Distribution

2.1 Binomial Probabilities

Log book questions:

1. Check the capabilities of your calculator, and use it to evaluate
   a. $10!$
   b. $0!$
   c. ${}^{10}C_{2}$
   d. ${}^{10}C_{0}$

2. Bill is very clumsy when it comes to using laboratory equipment. His supervisor has estimated that the probability that Bill will break at least one test tube in any laboratory class in which he takes part is 0.3. The supervisor believes that this occurs independently of what happens in any other of Bill's laboratory sessions. This semester, Bill will take part in 10 laboratory classes; the supervisor is interested in the number of labs in which at least one test tube is broken.
   a. Can the binomial probability formula be applied in this scenario? Justify your answer.
   b. Define an appropriate binomial random variable X, and state the number of trials n.
   c. If Bill takes part in 10 laboratory classes, use a hand calculator to find the probability that Bill breaks test tubes in
      i. exactly two classes.
      ii. at least two classes.

2.2 Finding Binomial Probabilities using Jamovi

Now we will use the set-up of logbook question 2 to demonstrate how binomial probabilities may be computed with Jamovi. To do so, we need to install the distrACTION module.

Installation instructions: In the top right corner of Jamovi, click on the Modules button, then select jamovi library. A list of modules will appear. Click on the Available tab, then scroll down (about halfway) to find the distrACTION module and click Install. Once the distrACTION module has been installed, it will appear alongside the other modules on the top bar.

To compute binomial probabilities, select distrACTION > Binomial Distribution. The binomial distribution has two parameters: size (the number of trials) and probability (the probability of success). For this example, set size = 10 and probability = 0.3.

To calculate the probability of observing x successes, tick the Compute probability option under Function. In the "x1 =" box, enter the value of x (the number of successes) whose probability you want to compute. For this example, we will calculate the probability of having 0, 1, ..., up to 10 successes. Note that you can only enter one value of x at a time. The resulting probabilities will appear in the output on the right-hand side.

To reset the number of decimal places: click on the 3 dots at the top right of the window (just above the Modules button). Under Results, change Number format to 5.
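The same probabilities can be cross-checked in R, for example in the Rj Editor described in the Appendix. The following is only a minimal sketch, assuming base R's dbinom and pbinom functions; the variable names and the cut-off k = 4 are illustrative and are not part of the lab.

```r
# Binomial set-up from logbook question 2: n = 10 trials, p = 0.3.
n_trials <- 10    # "size" in distrACTION
p_success <- 0.3  # "probability" in distrACTION

# P(X = x) for every possible x = 0, 1, ..., 10.
x_values <- 0:n_trials
probs <- dbinom(x_values, size = n_trials, prob = p_success)

# Tabulate to 5 decimal places, matching the number format set in Jamovi.
prob_table <- data.frame(x = x_values, probability = round(probs, 5))
print(prob_table)

# Cumulative probabilities for an illustrative cut-off k:
# P(X <= k) and its complement P(X > k).
k <- 4
p_at_most_k <- pbinom(k, size = n_trials, prob = p_success)
p_more_than_k <- 1 - p_at_most_k
print(c(at_most_k = p_at_most_k, more_than_k = p_more_than_k))
```

Unlike the distrACTION dialog, dbinom accepts a whole vector of x values, so all eleven probabilities are produced in one call.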
Log book questions:

3. For question 2 above,
   a. Complete the table of outcomes (possible x values) by copying the associated probabilities from Jamovi. From the output, check your answers to question 2 above.

| x | P(X = x) |
|---|---|
| 0 | 0.02825 |
| 1 | |
| 2 | |
| 3 | |
| 4 | |
| 5 | |
| 6 | |
| 7 | |
| 8 | |
| 9 | |
| 10 | |
| Total | 1 |

   b. Find the probability that X = 3, P(X = 3).
   c. Find the probability that X is less than 3, P(X < 3).
   d. Find the probability that X is at least 3.
2.3 Expected Value and Standard Deviation of a Binomial Random Variable

The expected value or mean E(X) of a random variable is the value expected on average over a large number of repetitions of the random experiment. Due to sampling variation, an observed value of a random variable is often higher or lower than expected. This variation between observations is described by the standard deviation.

Log book questions:

4. Use your calculator and the appropriate entries in the Reference Table in Section 1 to find the
   a. expected value (mean);
   b. standard deviation
   of the number of laboratory classes during which Bill breaks at least one test tube.

2.4 What happens when we have data from a sample?

When we have sample data from a population, we will not observe exactly the expectations or SDs we have calculated from the theory. Why? Remember that those theoretical values are for the whole population, or "in the long run", so sampling error (which arises simply because we do not have the full population) will creep in.

We will now check the theoretical calculations in Sections 2.2 and 2.3 by using simulated data. Simulated data are data generated by computer as if they had been sampled from a population with the specified distribution.

Download and open the file "Xbinom##.csv" from Moodle, where "##" is the month you were born. If you are really adventurous (and I suggest doing this after the lab), go to the Appendix, where I show how to simulate the data yourself.

2.4.1 Using simulated Binomial Data

Each dataset contains 3 samples, each of which contains 100 values from the Binomial distribution with number of trials n = 10 and probability of success p = 0.3. Now that we have a randomly generated sample from the binomial distribution with n = 10 and p = 0.3, we can compute summary statistics and a bar plot for this sample.
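Sampling variation of this kind can also be seen in a few lines of R. The sketch below is illustrative only: it deliberately uses hypothetical parameters (n = 20, p = 0.25) rather than the lab's values, and the seed and variable names are assumptions made so that the example is reproducible.

```r
set.seed(5)        # arbitrary seed, only so the sketch is reproducible
n_trials <- 20     # hypothetical number of trials (not the lab's n = 10)
p_success <- 0.25  # hypothetical probability of success (not the lab's p = 0.3)

# Theoretical values from the Reference Table: E(X) = np, SD = sqrt(np(1 - p)).
theory_mean <- n_trials * p_success
theory_sd <- sqrt(n_trials * p_success * (1 - p_success))

# One simulated sample of 100 observations from this binomial distribution.
sim_sample <- rbinom(100, size = n_trials, prob = p_success)

cat("theory: mean =", theory_mean, " sd =", round(theory_sd, 4), "\n")
cat("sample: mean =", mean(sim_sample), " sd =", round(sd(sim_sample), 4), "\n")
```

Changing the seed gives a different sample, so the sample mean and standard deviation move around the theoretical values rather than matching them exactly.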
Note: these 3 variables are the observed outcomes of a discrete random variable. However, to get the required graphical output in Jamovi, we need to keep the variable type set to Nominal (this is not ideal, but it is necessary within the limitations of the package).

To compute the mean and standard deviation, use Analyses > Exploration > Descriptives: put sample1, sample2, and sample3 into the Variables box. Under Plots, select Bar plot.

Note: These results are particular to the randomly generated data for your month. Yours are likely to differ from a neighbour's. Compare with your neighbour, and paste in your own output.

Log book questions:

5. For your randomly generated sample1, write down the
   a. Sample mean.
   b. Sample standard deviation.
   c. Comment on these values in comparison to the theoretical mean and standard deviation calculated in Section 2.3.
   d. An estimate for p can be determined from the sample by using the sample mean $\bar{x}$ to estimate the mean µ; that is, $\bar{x} \approx np$ can be rearranged to find an estimate of p. Use the sample mean of your sample1 to estimate p as $\hat{p} = \bar{x}/n$. (Note that here, n is the number of binomial trials for each random draw, not the number of random numbers you have generated, which Jamovi calls N.)

Optional Log book question:

6. Compare the mean and standard deviation of your sample2 and sample3 against sample1. Estimate p based on sample2 and do the same for sample3, then compare these estimated values of p to the theoretical value in Section 2.2.

Example output:

Descriptives

| | sample1 | sample2 | sample3 |
|---|---|---|---|
| N | 100 | 100 | 100 |
| Missing | 0 | 0 | 0 |
| Mean | 3.3000 | 2.9400 | 3.0400 |
| Standard deviation | 1.3962 | 1.3395 | 1.5037 |

3. Poisson Distribution

3.1 Poisson Probabilities

Log book questions:

The following example has been taken from Daniel, W. (1999) Biostatistics: A Foundation for Analysis in the Health Sciences, John Wiley & Sons: New York.

In a study of suicides, Gibbons et al. found that the monthly distribution of adolescent suicides between 1977 and 1987 closely followed a Poisson distribution with parameter λ = 2.75. Let X represent the number of suicides in a month.

7. Find the probabilities that:
   a. There are 2 adolescent suicides in a month.
   b. There are fewer than 2 adolescent suicides in a month.
   c. There are 2 or more adolescent suicides in a month.
   d. What are the mean and the standard deviation of the random variable X?

8. How would you answer questions 7 (a) to (d) for a two-month period? (For example, what is the probability that there is a total of two adolescent suicides in two months?)

3.2 OPTIONAL: Simulating Poisson Data

Again, if you feel adventurous, try simulating the data yourself (see Appendix A.2), but I suggest doing it after the lab.

Optional question: Find the mean and standard deviation and include the bar plot. Comment on the mean and standard deviation with reference to your answer in question 7(d).

4. General Discrete Random Variable

Log book questions:

9. Consider all the possible outcomes of tossing two 4-sided dice, each labelled from 1 to 4. Now define the random variable X as the maximum of the two numbers obtained on each possible outcome.
   a. Draw a tree diagram to list the 16 possible outcomes of tossing two 4-sided dice.
   b. What are the possible values that X can take?
   c. Determine how frequently each unique value of X appears given the list of all possible outcomes of tossing two 4-sided dice.
   d. Using your answers to (a), (b), and (c), determine the probability distribution of X, and write it in the following table.

| x | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| P(X = x) | | | | |

   e. Calculate the mean, variance and standard deviation of X using the formulae in the Reference Table in Section 1.
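The general discrete formulae in the Reference Table can be sketched in R as follows. This is a hedged illustration only: the distribution below is hypothetical (it is not the dice distribution from question 9), and the variable names are assumptions.

```r
# Hypothetical probability distribution (NOT the dice example in question 9).
x <- c(0, 1, 2, 3)
p <- c(0.1, 0.4, 0.3, 0.2)

# Check the two assumptions of a discrete distribution:
# every probability lies in [0, 1] and the probabilities sum to 1.
stopifnot(all(p >= 0 & p <= 1), isTRUE(all.equal(sum(p), 1)))

mu <- sum(x * p)               # mean: sum of x * p(x)
sigma2 <- sum((x - mu)^2 * p)  # variance: sum of (x - mu)^2 * p(x)
sigma <- sqrt(sigma2)          # standard deviation: square root of the variance

cat("mean =", mu, " variance =", sigma2, " sd =", round(sigma, 4), "\n")
```

Replacing x and p with the values from a completed probability table gives the corresponding mean, variance and standard deviation in the same way.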
Continuous Random Variables: The Normal Distribution

Aim: The aim of this part of the lab is to determine probabilities and quantiles for the normal distribution, and to determine and interpret confidence intervals for population means. The normal and t-distributions are required for some of these intervals.

Note: Starred (*) exercises do not require Jamovi.

1. The Normal Distribution

[Diagram: Normal curve with mean μ and standard deviation σ.]

1.1 Finding probabilities using the properties of the normal curve

Log book questions*:

1. For the following questions: match the interval with the appropriate diagram of the shaded area under the Normal curve; and use the diagram above to calculate the area shaded.
   i. mean = 0 and sd = 1; between -1 and 3.
   ii. mean = 0 and sd = 1; between -1 and 1.
   iii. mean = 55 and sd = 4; between 47 and 59.
   iv. μ = 30 and σ = 5; between 30 and 35.
   v. μ = 100 and σ = 15; less than or equal to 100.
   vi. μ = 546.6 and σ = 73.1; between 619.7 and 692.8.

[Diagrams a) to f): shaded areas under the Normal curve.]
1.2 The Standard Normal Distribution

The Standard Normal distribution has a mean μ = 0 and a standard deviation σ = 1.

To calculate probabilities from the normal distribution in Jamovi, we need the distrACTION module.

Instructions:
1. Click on distrACTION > Normal distribution.
2. Specify the mean and standard deviation under Parameters.
3. To compute the probability P(X ≤ x1), tick the Compute probability option.
4. Specify the value x1.
5. The resulting probability will appear in the right-hand side output under Results.

Alternatively, you can use the online calculator in Appendix A to calculate probabilities from the normal distribution.

Log book questions:

2. For the following questions:
   • Illustrate on the curve the area represented by the probability. Note: If using Word, use the Insert -> Shapes function, draw the line to outline the region of interest, and use a shape (e.g., a star) to mark the area you need. Alternatively, print these pages out and draw and shade by hand.
   • Find the probability using Jamovi, an online calculator, or the Standard Normal Distribution Tables (on the Moodle site under Probability Calculators and Statistical Tables).

Reference: derived from those in the Stat131 Laboratory Manual compiled by Assoc. Prof. Anne Porter.
   a. P(Z < -1.2) =
   b. P(Z > -1.2) =
   c. P(Z > 1.8) =
   d. P(-1.2 < Z < 1.8) =
   e. Show P(Z < -1.96) = P(Z > 1.96)
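Standard normal probabilities of these three kinds (lower tail, upper tail, and interval) can also be checked in R. The sketch below assumes base R's pnorm function; the z values are illustrative and are deliberately not the ones asked for in the logbook.

```r
# Illustrative z values; these are not the values asked for in the logbook.
z_low <- -0.5
z_high <- 1.0

p_below <- pnorm(z_low)                    # P(Z < -0.5), lower tail
p_above <- 1 - pnorm(z_low)                # P(Z > -0.5), the complement
p_between <- pnorm(z_high) - pnorm(z_low)  # P(-0.5 < Z < 1.0)

# Symmetry of the standard normal curve: P(Z < -z) equals P(Z > z).
symmetric <- isTRUE(all.equal(pnorm(-z_high), pnorm(z_high, lower.tail = FALSE)))

print(round(c(below = p_below, above = p_above, between = p_between), 4))
print(symmetric)
```

pnorm always returns the area to the left of its argument by default, which is why upper-tail and interval probabilities are built from differences and complements.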
1.3 Standardising values and finding probabilities for a given distribution

Two steps are involved:
1. Standardise the x values to find the corresponding z-score: $z = \frac{x - \mu}{\sigma}$
2. Use the z-score to find the probabilities.

Log book questions:

3. The right hand span of males is normally distributed with a mean of 27 cm and a standard deviation of 4.5 cm. Let X denote the right hand span of males. For parts (a) to (e), complete the following steps:
   • Determine the z-score.
   • Illustrate on the curve the area represented by the expression.
   • Determine the probability requested.

   a. P(X < 22.5) =        Z
   b. P(X > 22.5) =        Z
   c. P(X < 19.5) =        Z
   d. P(19.5 < X < 22.5) =        Z
   e. P(X < 19 or X > 23) =        Z
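The two-step method of Section 1.3 can be sketched in R as below. The mean and standard deviation are those of the hand-span example; the value x = 31.5 is illustrative and is not one of the logbook values, and the variable names are assumptions.

```r
mu <- 27      # mean right hand span (cm), from question 3
sigma <- 4.5  # standard deviation (cm), from question 3
x <- 31.5     # illustrative value, not one of the logbook values

z <- (x - mu) / sigma                        # step 1: standardise to a z-score
p_from_z <- pnorm(z)                         # step 2: P(Z < z) from the standard normal
p_direct <- pnorm(x, mean = mu, sd = sigma)  # same probability without standardising

print(c(z = z, p_from_z = p_from_z, p_direct = p_direct))
```

The two probabilities agree, which shows that standardising changes the scale of the variable but not the area under the curve.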
APPENDIX: Simulating the data

A.1 Simulating Binomial Data

We will now check the theoretical calculations in Sections 2.2 and 2.3 by simulation. Simulating Binomial data requires the Rj module, which you can install by clicking the Modules button on the top right of the Analyses screen and selecting jamovi library. Find the Rj module and click INSTALL.

Once the Rj module has been installed, a new icon labelled 'R' should appear along the taskbar next to the other modules. Click on this button and select Rj Editor.

Now, to simulate samples from the binomial distribution, copy and paste the following block of code into Notepad or TextEdit, then copy and paste it again (or type it all) into the Editor window.

NOTE: It is important to paste into a simple text editor first, because Word adds hidden formatting that we cannot see. If you copy and paste directly from Word, you may run into trouble.

    # Create one file per month (1 to 12), each holding three samples.
    for (i in 1:12) {
      # Each sample: 100 draws from the Binomial distribution with n = 10, p = 0.3.
      sample1 <- rbinom(100, 10, 0.3)
      sample2 <- rbinom(100, 10, 0.3)
      sample3 <- rbinom(100, 10, 0.3)
      # Store the three samples as columns of a data frame.
      XBinom <- data.frame(sample1, sample2, sample3)
      # Replace "…" with the folder where the files should be saved.
      write.csv(XBinom, file = paste("…/Xbinom", i, ".csv", sep = ""), row.names = FALSE)
    }

Replace the "…" in the last line with the location on your device where you would like to save the samples. For example, you could write file = paste("C:/Users/Documents/STAT251/Xbinom", i, ".csv", sep = ""). Make sure the slashes in your file path are forward slashes (/) and NOT backward slashes (\).

Inside the loop, the first 3 lines generate 3 samples, each of which contains 100 values from the Binomial distribution with number of trials n = 10 and probability of success p = 0.3. These variables are then stored in a "data frame" (data table) and saved as Xbinom##.csv files (## goes from 1 to 12) in the specified location on your device.

Click the run button to run the code. Once you have run the code, the Xbinom##.csv files containing the binomial samples should appear in the location you just specified.
In your Jamovi screen, click the triple-barred icon on the top left, select Open, use Browse to navigate to the location you specified, and open the new file.

Now that we have generated a random sample from the binomial distribution with n = 10 and p = 0.3, we can compute summary statistics and a bar plot for this sample.
Note: these 3 variables are the observed outcomes of a discrete random variable. However, to get the required output in Jamovi, we need to keep the variable type set to Nominal (this is not ideal, but it is necessary within the limitations of the package).

A.2 Poisson simulation

If time permits, repeat the steps in A.1 to simulate a sample of 100 from the Poisson distribution with λ = 2.75. The Jamovi procedure is exactly the same, except that instead of rbinom we use rpois for the Poisson distribution, and specify the value for lambda.

In Jamovi, click on the R module and select Rj Editor. Then change the code from A.1 as follows:
   • Instead of sample1 <- rbinom(100, 10, 0.3), use sample1 <- rpois(100, lambda = 2.75). Do the same for sample2 and sample3.
   • Change XBinom to XPois wherever it appeared in the previous code (and Xbinom to XPois in the file name).
   • Make sure to replace the "…" in the file path with the path where you want the files to be saved.

Here is the full code:

    # Create one file per month (1 to 12), each holding three samples.
    for (i in 1:12) {
      # Each sample: 100 draws from the Poisson distribution with lambda = 2.75.
      sample1 <- rpois(100, lambda = 2.75)
      sample2 <- rpois(100, lambda = 2.75)
      sample3 <- rpois(100, lambda = 2.75)
      # Store the three samples as columns of a data frame.
      XPois <- data.frame(sample1, sample2, sample3)
      # Replace "…" with the folder where the files should be saved.
      write.csv(XPois, file = paste("…/XPois", i, ".csv", sep = ""), row.names = FALSE)
    }
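A simulated Poisson sample can be compared with the theory directly in R, without saving any files. This is a rough sketch under stated assumptions: the seed, the variable names, and the choice to tabulate x = 0 to 8 are illustrative, while λ = 2.75 and the sample size of 100 come from the lab.

```r
set.seed(251)   # arbitrary seed, only so the sketch is reproducible
lambda <- 2.75  # Poisson parameter used in Section 3.1

poisson_sample <- rpois(100, lambda = lambda)

# Theory from the Reference Table: mean = lambda, SD = sqrt(lambda).
theory <- c(mean = lambda, sd = sqrt(lambda))
observed <- c(mean = mean(poisson_sample), sd = sd(poisson_sample))
print(rbind(theory, observed))

# Theoretical probabilities next to observed proportions for x = 0, ..., 8.
# Sample values above 8, if any, are left out of this table.
x_values <- 0:8
comparison <- data.frame(
  x = x_values,
  theoretical = round(dpois(x_values, lambda), 4),
  observed = as.numeric(table(factor(poisson_sample, levels = x_values))) / 100
)
print(comparison)
```

With only 100 observations the observed proportions will differ from the theoretical probabilities; the size of those differences is the sampling error discussed in Section 2.4.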