Notes4-RandomVariables_S

School

University of Wisconsin, Madison

Course

324

Subject

Statistics

Date

Feb 20, 2024

Statistics University of Wisconsin Madison – Chelsey Green chelseygreen@wisc.edu

NOTES 4: DEFINING RANDOM VARIABLES AND TWO COMMON ONES

QUANTIFYING THE POSSIBLE OUTCOMES OF A STUDY

Before a study is performed, we do not know with certainty what outcome will be observed. Sometimes outcomes have qualitative descriptions (e.g., moderate, severe, or no nausea in response to a medicine) and sometimes numerical values (e.g., length of bolt produced). By focusing on numeric summaries of qualitative data, we can associate a numerical value with each outcome of an experiment.

In these notes we will learn some theoretical tools to describe characteristics of the population we are drawing from and the numeric outcomes of some studies. We will also apply the probability rules learned in Notes 3 and continue to develop our simulation and coding skills.

RANDOM VARIABLE TERMINOLOGY

A random variable (RV) associates a numerical value with each outcome of a random process. It is customary to denote a random variable with an uppercase letter when considering all the values it could take. It is called "random" because we don't know the value observed until the experiment is completed.
*You can think of the RV as the population of values that could be observed.

E.g.: Many measurements are taken on babies moments after birth. Examples of a few random variables of interest for newborn babies include: W = birth weight, X = Apgar score, Y = length of baby at birth, N = number of medical staff in room. An example from a different setting is G = number of hemlock seeds that germinate out of 4.

A realization of the RV is the value that is observed when the experiment is performed or the recording is made. Realizations are usually denoted by lowercase letters.
*You can think of a sample as a collection of realizations of a RV.
E.g.: w = 3,321 grams is the weight of a recently born baby; x = 9 is the Apgar score for a recently born baby; y1 = 20.0, y2 = 19.3, y3 = 19.4 are the lengths of the last 3 babies born; g = 2 if only 2 of the 4 hemlock seeds germinate.

Weld Failures Example: According to a past study of weld failures in a certain assembly, 85% of them occur in the weld metal itself, 10% occur in the base metal, and the cause is unknown in 5% of failures. Consider a scenario where 3 weld failures in this type of assembly are observed.

Failure type          Probability
Weld Metal Failure    0.85
Base Metal Failure    0.10
Unknown Failure       0.05
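The difference between a random variable and a realization can be seen in a small simulation of the weld failures scenario. The sketch below is illustrative only: it assumes the 3 failures are independent of one another, takes the quantity of interest to be the number of the 3 failures caused by the weld metal, and uses hypothetical helper names (`observe_weld_failures`, `count_weld_metal`) and an arbitrary seed.

```python
import random

# Failure causes and their probabilities from the Weld Failures Example.
CAUSES = ["weld metal", "base metal", "unknown"]
PROBS = [0.85, 0.10, 0.05]

def observe_weld_failures(n, rng):
    """Draw the causes of n weld failures (assumes failures are independent)."""
    return rng.choices(CAUSES, weights=PROBS, k=n)

def count_weld_metal(causes):
    """Realization: how many of the observed failures were in the weld metal."""
    return sum(1 for cause in causes if cause == "weld metal")

rng = random.Random(324)  # arbitrary seed so the run is repeatable
sample = observe_weld_failures(3, rng)
x = count_weld_metal(sample)
print(sample, "-> x =", x)
```

Before the code runs, all we can say is that the count will be 0, 1, 2, or 3; after it runs, `x` holds one observed value. Rerunning with a different seed may give a different realization of the same random variable.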

Weld Failures Example a: Define a random variable X: the number of weld failures caused by the weld metal in the 3 failures observed. Discuss what we know about this random variable before and after the experiment is conducted.

X: the number of weld metal caused failures in the 3 observed failures.
Before:
After:

Types of Random Variables

A RV is called discrete if it has a countable number of possible values. If the values are arranged in order, there is a gap between each value and the next. The set of possible values may be infinite.
E.g.: X: The Apgar score is on a scale from 0 to 10 based on skin condition, heart rate, muscle tone, breathing, and response when stimulated. Also, N: the number of nurses in the room for a procedure is a discrete random variable.

A RV is called continuous if it can take any of an uncountable number of values in an interval. It represents some measurement on a continuous scale.
E.g.: T: the daily maximum temperature in Madison, WI can in principle be measured to any precision. The same holds for birth weight, W.

DESCRIBING THE VALUES OF A DISCRETE RANDOM VARIABLE

A probability distribution of a random variable consists of the RV's possible values along with the probabilities of those values occurring. The description of the possible values and probabilities can take the form of a probability histogram, table (discrete RV only), or formula.
*Probability distributions are often approximated from empirical studies.

A probability mass function (pmf) is the probability distribution for a discrete random variable. It is a list of the values that can be obtained, together with the probability of each value.
*Values are mutually exclusive
*Each value has a probability between 0 and 1
*Sum of the probabilities is 1

E.g.1: We can write out the probability mass function for the random variable F: the number of dots on the face that lands up when rolling a fair 6-sided die.
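One way to write such a pmf in code and check the pmf conditions is sketched below. It assumes the six faces of a fair die are equally likely, stores the pmf as a dictionary (so the listed values are distinct by construction), and uses a hypothetical helper name, `is_valid_pmf`. Exact fractions are used to avoid rounding error in the sum.

```python
from fractions import Fraction

# pmf of F for a fair 6-sided die: each face is assumed equally likely.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def is_valid_pmf(pmf):
    """Check each probability is between 0 and 1 and that they sum to exactly 1."""
    in_range = all(0 <= p <= 1 for p in pmf.values())
    return in_range and sum(pmf.values()) == 1

print(is_valid_pmf(die_pmf))

# Because the values are mutually exclusive, probabilities of events add:
p_even = sum(p for face, p in die_pmf.items() if face % 2 == 0)
print("P(F is even) =", p_even)
```

The same check can be applied to any table of values and probabilities, although probabilities that were rounded before being tabulated may need a small tolerance rather than an exact comparison.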
There are 6 possible outcomes, {1, 2, 3, 4, 5, 6}, which we assume are equally likely, so each outcome has a probability of 1/6.

f           1     2     3     4     5     6
P(F = f)    1/6   1/6   1/6   1/6   1/6   1/6

E.g.2: Researchers recorded the Apgar scores of over 2 million newborns in a single year. The approximate probability mass function for Apgar score (X) based on these 2 million newborns is given below. Each probability is the relative frequency observed in the 2 million newborns (e.g., 2% of the newborns had an Apgar score of 5).

x           0      1      2      3      4      5      6      7      8      9      10
P(X = x)    0.001  0.006  0.007  0.008  0.012  0.020  0.038  0.099  0.319  0.437  0.053

Weld Failures b: According to a past study of weld failures in a certain assembly, 85% of them occur in the weld metal itself, 10% occur in the base metal, and the cause is unknown in 5% of failures. Consider a scenario where 3 weld failures in this type of assembly are observed.

Failure type          Probability
Weld Metal Failure    0.85
Base Metal Failure    0.10
Unknown Failure       0.05

Complete the probability distribution for X: the number of weld metal caused failures in the 3 observed weld failures. What assumptions are we making in our calculations?
*Consider a weld metal caused failure a Success (S) and all other outcomes a Failure (F).

Meaning                   x    P(X = x)
0 weld metal, 3 other     0    P(X = 0) = P(FFF) = 0.15 * 0.15 * 0.15 = 0.003375
1 weld metal, 2 other     1
2 weld metal, 1 other     2    P(X = 2) = P(SSF or SFS or FSS) = 3 * 0.85^2 * 0.15^1 = 0.325125
                          3

Notice: P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) =
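The table entries can be checked by listing every possible sequence of S and F for the 3 failures and adding up the probabilities of sequences with the same number of S. The sketch below is one way to do that; it assumes the 3 failures are independent and that each has the same 0.85 chance of being weld metal caused, and the function name `pmf_by_enumeration` is hypothetical.

```python
from itertools import product

P_SUCCESS = 0.85   # weld metal caused failure (S), from the notes
P_FAILURE = 0.15   # any other cause (F): 0.10 + 0.05
N_OBSERVED = 3

def pmf_by_enumeration(n, p_s, p_f):
    """Add up the probabilities of all S/F sequences with the same count of S."""
    pmf = {x: 0.0 for x in range(n + 1)}
    for seq in product("SF", repeat=n):
        prob = 1.0
        for outcome in seq:
            # Independence assumption: multiply the per-failure probabilities.
            prob *= p_s if outcome == "S" else p_f
        pmf[seq.count("S")] += prob
    return pmf

pmf = pmf_by_enumeration(N_OBSERVED, P_SUCCESS, P_FAILURE)
for x, p in pmf.items():
    print(f"P(X = {x}) = {p:.6f}")
print("sum =", round(sum(pmf.values()), 10))
```

The printed values for x = 0 and x = 2 should agree with the two entries already worked in the table, and the remaining rows can be compared against hand calculations.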
