Hughes,Madeleine_HSCI_190_Homework3

docx

School

Queens University

Course

HSCI 190

Subject

Statistics

Date

Feb 20, 2024

Type

docx

Pages

8

Uploaded by AmbassadorNeutronWolverine39

Module 03 Homework Problems

Note About Module 03 Homework: Module 03 homework is slightly different from previous modules. This week you do not need to use Excel or SPSS, as these questions are more about connecting concepts and practicing some calculations/logic by hand. For this reason, you only need to submit one summary document for your Module 03 homework. Be sure to show the formulas/your work for each question and upload the document as a PDF.

1a. Explain the differences between independent and dependent events. Give original examples of each (i.e., different than in the module). (1 mark)

Independent events are events in which the occurrence of one does not affect the probability of the other. For example, if a person rolls a die, rolling a two on the first toss has no impact on the result of any subsequent roll; thus, two separate rolls of the same die are independent events. In contrast, dependent events are events in which the occurrence of one does affect the probability of the other. For example, the probability of a person being arrested and sent to jail depends on whether that person has committed a crime; committing a crime increases the probability of that same person being arrested and ending up in jail. Ultimately, the difference between dependent and independent events centres on whether the occurrence of one event changes the probability of the other (dependent) or not (independent).

1b. Explain what it means for an event to be mutually exclusive. (1 mark)

Mutually exclusive events, or disjoint events, are events that cannot occur at the same time.

1c. Provide an original example of two events that are mutually exclusive to one another. (1 mark)

Consider the following events:
- The suit of a randomly drawn card from a given deck is spades.
- The suit of a randomly drawn card from a given deck is clubs.

As it is impossible for a single card from any given deck to belong to more than one suit, the above-mentioned events qualify as mutually exclusive (i.e., they cannot occur simultaneously).
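The ideas in question 1 can be checked by enumerating equally likely outcomes. The Python sketch below is only an illustration: it assumes a fair six-sided die and a standard 52-card deck, and the helper `prob` is a hypothetical name introduced here.

```python
from fractions import Fraction
from itertools import product


def prob(event, outcomes):
    """Probability of an event over a list of equally likely outcomes."""
    favourable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favourable, len(outcomes))


# Two rolls of one fair die: 36 equally likely (first, second) pairs.
rolls = list(product(range(1, 7), repeat=2))
p_first_two = prob(lambda r: r[0] == 2, rolls)
p_second_two = prob(lambda r: r[1] == 2, rolls)
p_both_two = prob(lambda r: r[0] == 2 and r[1] == 2, rolls)

# Independent events satisfy P(A and B) = P(A) * P(B).
independent = p_both_two == p_first_two * p_second_two

# One card drawn from a standard 52-card deck (assumed deck composition).
suits = ["spades", "clubs", "hearts", "diamonds"]
deck = [(rank, suit) for suit in suits for rank in range(1, 14)]
p_spade = prob(lambda c: c[1] == "spades", deck)
p_club = prob(lambda c: c[1] == "clubs", deck)
p_spade_and_club = prob(lambda c: c[1] == "spades" and c[1] == "clubs", deck)
p_spade_or_club = prob(lambda c: c[1] in ("spades", "clubs"), deck)

print(independent)                          # True
print(p_spade_and_club)                     # 0
print(p_spade_or_club == p_spade + p_club)  # True
```

Because the two suit events never occur together, the probability of their intersection is 0, and the probability of either one occurring is simply the sum of the two individual probabilities.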
2. State whether the scenarios below are an example of a complement, an intersection, or a union. Explain your reasoning. (Note: you do not have to solve for the probability) (3 marks)

a. The probability of having hypertension is 21%, and the probability of having acute kidney failure is 15%. What is the probability of having both hypertension and acute kidney failure?

An intersection refers to the probability of two events both occurring. This scenario asks for the probability of having both hypertension and acute kidney failure, that is, the probability that the two events occur together, which makes it an example of an intersection.

b. The probability of having tuberculosis (TB) in Ontario is currently 2%. If anyone in a household has TB, they all must isolate. What is the probability that a household of 3 must isolate?

A union refers to the probability of one event, or another event, or both occurring. A household of 3 must isolate if the first member, or the second, or the third (or more than one of them) has TB. The scenario asks for the probability that at least one of these events occurs and is, therefore, an example of a union.

c. The probability of having type 1 diabetes or type 2 diabetes is 29%. What is the probability of being nondiabetic? (Note: for the purposes of this question, consider type 1 and 2 as the only types of diabetes)

A complement refers to the probability of an event not occurring. If event A is having type 1 or type 2 diabetes, then being nondiabetic is the event that A does not occur. Thus, this is an example of a complement.

3. A researcher is collecting data on maternal smoking and giving birth to a low birth weight (LBW) baby (defined by birth weight <2500g).
The participant data is summarized in the table. Calculate the probabilities for the questions below.
*Note: think of the frequentist definition and use the findings directly from the table to calculate the probabilities.

a. What is the probability of giving birth to a LBW baby? (1 mark)

If event A is giving birth to a LBW baby:
$P(A) = \frac{m}{n} = \frac{39}{600} = 0.065$

b. What is the probability of a mother not being a smoker? (1 mark)

If event B is a mother not being a smoker:
$P(B) = \frac{m}{n} = \frac{502}{600} = 0.837$

c. What is the probability of a mother being a non-smoker and having a LBW baby? (1 mark)

If event C is a mother being a non-smoker and having a LBW baby, the count comes directly from the nonsmoker/LBW cell of the table:
$P(C) = P(A \cap B) = \frac{11}{600} = 0.018$
The product $P(A) \times P(B)$ is not used here because it would only equal $P(A \cap B)$ if smoking status and LBW were independent.

d. Given that a mother is a smoker, what is the probability that they give birth to a LBW baby? (1 mark)

If event D is giving birth to a LBW baby given that the mother is a smoker, only the 98 smokers are considered:
$P(D) = P(A \mid \text{smoker}) = \frac{28}{98} = 0.286$

Table 1. Participant Characteristics

                      LBW   Non-LBW   Total
Maternal Smoker        28        70      98
Maternal Nonsmoker     11       491     502
Total                  39       561     600

4. Compare the Binomial, Poisson, and Gaussian distributions. Identify their similarities, differences, and when each one is used. (1 mark)
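Before moving on, the four probabilities in question 3 can be checked against the cell counts in Table 1. The Python sketch below is one possible way to do this; the dictionary layout and variable names are illustrative assumptions, and only the counts themselves come from the table.

```python
# Cell counts copied from Table 1 (smoking status, birth-weight category).
counts = {
    ("smoker", "lbw"): 28,
    ("smoker", "non_lbw"): 70,
    ("nonsmoker", "lbw"): 11,
    ("nonsmoker", "non_lbw"): 491,
}
n = sum(counts.values())  # total participants (600)

# a. Marginal probability of a LBW baby.
p_lbw = (counts[("smoker", "lbw")] + counts[("nonsmoker", "lbw")]) / n

# b. Marginal probability of the mother being a nonsmoker.
p_nonsmoker = (counts[("nonsmoker", "lbw")] + counts[("nonsmoker", "non_lbw")]) / n

# c. Joint probability: nonsmoker AND LBW, read from a single cell.
p_nonsmoker_and_lbw = counts[("nonsmoker", "lbw")] / n

# d. Conditional probability: LBW GIVEN smoker, so the denominator is smokers only.
n_smokers = counts[("smoker", "lbw")] + counts[("smoker", "non_lbw")]
p_lbw_given_smoker = counts[("smoker", "lbw")] / n_smokers

print(round(p_lbw, 3), round(p_nonsmoker, 3),
      round(p_nonsmoker_and_lbw, 3), round(p_lbw_given_smoker, 3))
# 0.065 0.837 0.018 0.286
```

The joint probability uses the whole sample as its denominator, whereas the conditional probability restricts the denominator to the smoker row.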
The Binomial, Poisson, and Gaussian distributions are all probability distributions: each describes how likely the possible values of a random variable are. Because they describe probabilities, the total probability in each is equal to 1 (the bar heights sum to 1 for the discrete Binomial and Poisson distributions, and the area under the curve is 1 for the continuous Gaussian distribution). Notwithstanding these similarities, each distribution is suited to particular circumstances, mainly with regard to the type of event and the number of independent trials.

- Binomial distributions describe the number of successes for a discrete, dichotomous variable with mutually exclusive outcomes over a fixed number of independent trials (a single such trial is a Bernoulli trial). For example, shooting a basketball at a hoop 10 times can be represented by a binomial distribution; the events of making a basket and of missing the shot are mutually exclusive, each shot presents only 2 possible outcomes (i.e., missing or making the basket), and the number of trials is fixed at 10.
- Poisson distributions also describe counts of dichotomous (i.e., discrete and mutually exclusive) events; however, these events must occur infrequently, and the number of independent trials is not restricted. For example, the number of people struck by lightning in any given year can be represented by a Poisson distribution; for each person there are only two possible outcomes that cannot occur together (i.e., being struck by lightning or not), the number of trials is not restricted, and the event is, in itself, quite rare.
- Gaussian distributions, or normal distributions, describe continuous random variables, which have an infinite number of possible outcomes. The most frequent observations are shown in the middle, and the least frequent observations on either side of these.
For example, the heights of a given population can be plotted as a Gaussian distribution; this variable has an infinite number of possible outcomes, as height is measured on a continuous scale. The most common heights within a population would be shown in the middle of the distribution, with the less frequent heights within the same population shown on either side, which gives Gaussian distributions the appearance of a bell-shaped curve.

5. Compare point estimation, interval estimation, and hypothesis testing. Describe how these concepts are similar and how they are different from one another. (2 marks)

Point estimation, interval estimation, and hypothesis testing all fall under the umbrella of inferential statistics: each uses a sample to identify something about a larger population.

- Estimation looks to estimate an unknown population parameter (i.e., a certain characteristic or calculation of an entire population). This category of inferential statistics can be grouped into two strategies: point and interval estimation.
  - Point estimation uses a single value to estimate a population parameter, for example, using the mean of a sample to estimate the true population mean. This strategy is not always accurate, as samples of a population vary due to random chance, and a point estimate on its own does not convey this sampling variability.
  - Interval estimation uses a range of values that likely contains the population parameter. This range is referred to as a confidence interval, or CI. A confidence interval consists of a point estimate together with a range of values around it that reflects sampling variability to some degree of confidence; typically, 95% is used as the level of confidence. This means that if the sampling were repeated many times, about 95% of the intervals constructed this way would contain the true population mean. Because it reports a range of values along with a degree of confidence, an interval estimate is less likely to be wrong than a point estimate.
- Hypothesis testing (i.e., the other general category of statistical inference) uses a sample of data, along with probability, to make a decision about a claimed value of an unknown population parameter. When using hypothesis testing, one can either reject the null hypothesis (i.e., conclude that the data provide evidence that the true population mean is not equal to the hypothesized population mean), or fail to reject the null hypothesis (i.e., conclude that the data are consistent with the true population mean being equal to the hypothesized population mean).

Ultimately, each of these categories of inferential statistics uses samples to make inferences about the larger population in question.

6. A researcher randomly samples 100 people who had a certain subtype of influenza A at Kingston General Hospital in the last month. For this sample, the mean time of being symptomatic was 8.6 days and the standard deviation was 3.3 days.
A) Assuming this data follows a normal distribution, what is the 95% confidence interval? (1 mark)

$CI = \left( 8.6 - 1.96\left(\frac{3.3}{\sqrt{100}}\right),\ 8.6 + 1.96\left(\frac{3.3}{\sqrt{100}}\right) \right) = (7.95,\ 9.25)$

B) A patient comes into Kingston General Hospital with symptoms similar to the subtype of influenza A. They say they have been symptomatic for 15 days. How do you interpret this information given A? Do you think this patient is from the same population as the study? (1 mark)
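The interval from part A can be reproduced with a few lines of Python before it is interpreted. This is a minimal sketch that assumes the normal critical value of 1.96 for 95% confidence; the variable names are illustrative.

```python
import math

n = 100        # sample size
mean = 8.6     # sample mean of symptomatic days
sd = 3.3       # sample standard deviation in days
z_crit = 1.96  # normal critical value assumed for 95% confidence

standard_error = sd / math.sqrt(n)  # spread of the sample mean
margin = z_crit * standard_error    # half-width of the interval
lower, upper = mean - margin, mean + margin

print(f"({lower:.2f}, {upper:.2f})")  # (7.95, 9.25)
```

The margin of error is about 0.65 days, which is why the interval can also be written as 8.6 ± 0.65 days.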
The 95% confidence interval for the mean symptomatic period of the population of patients with this subtype of influenza A was found to be 8.6 ± 0.65 days, or (7.95, 9.25) days. As such, we can be 95% confident that the true mean symptomatic period for this subtype of influenza A falls between 7.95 days and 9.25 days. The patient in question has been symptomatic for 15 days, which is well above this range. Although the interval describes the population mean rather than individual patients, it suggests that a 15-day symptomatic period is unusual for this subtype, as the average symptomatic period is likely much shorter. The patient has perhaps been infected by a different subtype of influenza A or is not sick with influenza at all. As such, I don't believe the patient is from the same population as the study.

7. You are interested in exploring the relationship between serum homocysteine levels and endometriosis. You collect data from 100 randomly selected people with endometriosis. Your sample has a mean serum homocysteine level of 47 µmol/L. You know that the mean homocysteine for healthy people who can develop endometriosis (i.e., have an endometrium) is 25 µmol/L with a standard deviation of 15 µmol/L. Use hypothesis testing to determine whether your sample (mean 47 µmol/L) has different levels of homocysteine than healthy people who can develop endometriosis (i.e., we want to know whether those with endometriosis have different levels of homocysteine than the population of people who can develop endometriosis but are healthy).

a. Identify a null and alternative hypothesis for a two-sided test. (1 mark)

Null hypothesis: $H_0: \mu = 25$ µmol/L
Alternative hypothesis: $H_1: \mu \neq 25$ µmol/L

b. Calculate a z score for this data and depict the z score on a standard normal curve. You are welcome to draw the distribution on your computer or draw it by hand. Include the image in your summary document. (1 mark)

$z = \frac{\bar{X} - \mu}{\sigma} = \frac{47 - 25}{15} = 1.4\overline{6}$
c. Using the standard normal chart provided below, identify the area to the left and right of your z score. (1 mark)

Area to the left = 0.9279
Area to the right = 1 − Area to the left = 1 − 0.9279 = 0.0721

d. Assuming this is a two-sided test, identify your p value. Based on your findings, identify whether you would reject or fail to reject the null hypothesis with a significance level of α = 0.05. (2 marks)

$p = 0.0721 \times 2 = 0.1442$

Given that p is greater than the significance level of 0.05, I would fail to reject the null hypothesis: there is insufficient evidence to conclude that the mean level of homocysteine in people with endometriosis differs from that of the population of people who can develop endometriosis but are healthy.
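The steps in question 7 can also be traced numerically. The Python sketch below is illustrative only: it follows the z formula used in part b, dividing by σ = 15 directly (dividing by a standard error σ/√n would be a different calculation and is not what is shown here), and it replaces the printed chart with the error function from the standard library. The helper name `standard_normal_cdf` is an assumption introduced for this example.

```python
import math


def standard_normal_cdf(z: float) -> float:
    """Area to the left of z under the standard normal curve."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


sample_mean = 47.0  # µmol/L, sample of people with endometriosis
mu0 = 25.0          # µmol/L, mean under the null hypothesis
sigma = 15.0        # µmol/L, population standard deviation
alpha = 0.05        # significance level

# z score as computed in part b (assumption: sigma used as the divisor).
z = (sample_mean - mu0) / sigma

area_left = standard_normal_cdf(z)
area_right = 1.0 - area_left

# Two-sided test: double the tail area beyond |z|.
p_two_sided = 2.0 * (1.0 - standard_normal_cdf(abs(z)))
reject_null = p_two_sided < alpha

print(round(z, 2), round(area_left, 4), round(area_right, 4),
      round(p_two_sided, 4), reject_null)
```

The printed areas can differ slightly in the third or fourth decimal place from a chart lookup at z = 1.46, because the code does not round z before evaluating the curve; the decision at α = 0.05 is the same.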