Homework1

School

University of Florida

Course

4504

Subject

Statistics

Date

Feb 20, 2024

Pages

9

Uploaded by gogoyes1234

STA4504 Homework 1, Spring 2021

Please turn in your own work, though you may discuss the problems with classmates, the TA, the Professor, the internet, etc. The most important thing is that you understand the problems and how they are solved, as they will prepare you for future exams. Please turn in your work by Thursday, February 4th at midnight.

Question 1: (Textbook problem 1.3) Each of 100 multiple-choice questions on an exam had four possible answers but one correct response. For each question, a student randomly selects one response as the answer.

a. Specify the probability distribution of the student’s number of correct answers on the exam.

Binomial distribution with $n = 100$ and $\pi = 0.25$.

b. Based on the mean and standard deviation of that distribution, would it be surprising if the student made at least 50 correct responses? Explain your reasoning.

$\mu = n\pi = 100 \times 0.25 = 25$
$\sigma = \sqrt{n\pi(1 - \pi)} = \sqrt{100 \times 0.25 \times (1 - 0.25)} = 4.33$
$z = (50 - 25)/4.33 = 5.77$

It would be surprising if the student made at least 50 correct responses, because 50 is 5.77 standard deviations above the mean.

Question 2: (Textbook problem 1.6c) Genotypes AA, Aa, and aa occur with probabilities $(\pi_1, \pi_2, \pi_3)$. For $n = 3$ independent observations, the observed frequencies are $(y_1, y_2, y_3)$. Suppose $(\pi_1, \pi_2, \pi_3) = (0.25, 0.5, 0.25)$. What probability distribution does $y_1$ alone have?

$y_1$ alone has the binomial distribution with sample size $n = 3$ and success probability $\pi_1 = 0.25$.
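The Question 1 arithmetic can be checked numerically. The following is a minimal sketch using only the Python standard library; the variable names are illustrative, and the exact binomial tail probability is an extra check that is not part of the original answer.

```python
from math import comb, sqrt

n, p = 100, 0.25  # number of questions, chance of guessing one correctly

mean = n * p                   # binomial mean n * pi
sd = sqrt(n * p * (1 - p))     # binomial standard deviation
z = (50 - mean) / sd           # how many sd's 50 correct lies above the mean

# Exact binomial tail probability P(Y >= 50), summed term by term
tail = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(50, n + 1))

print(f"mean = {mean}, sd = {sd:.2f}, z = {z:.2f}")
print(f"P(Y >= 50) = {tail:.2e}")
```

The tiny tail probability agrees with the conclusion that 50 or more correct answers from pure guessing would be very surprising.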
Question 3: (Textbook problem 1.8) When the 2010 General Social Survey asked subjects in the US whether they would be willing to accept cuts in their standard of living to protect the environment, 486 of 1374 subjects said yes.

a. Estimate the population proportion who would say yes. Construct and interpret a 99% confidence interval for this proportion.

The sample proportion who would say yes is $\hat{\pi} = 486/1374 = 0.354$.
The standard error is $\sqrt{0.354(1 - 0.354)/1374} = 0.0129$.
$z_{0.005} = 2.58$, based on the standard normal distribution table.
99% confidence interval: $0.354 \pm 2.58 \times 0.0129 = (0.321, 0.387)$.

We are 99% confident that the population proportion who would say yes is between 0.321 and 0.387.

b. Conduct a significance test to determine whether a majority or minority of the population would say yes. Report and interpret the P-value.

Let $H_0: \pi = 0.5$ and $H_a: \pi \neq 0.5$.
Sample proportion who would say yes: 0.354.

$$z = \frac{0.354 - 0.5}{\sqrt{0.5 \times (1 - 0.5)/1374}} = \frac{-0.146}{0.0135} = -10.815$$

Based on the standard normal distribution table, the two-sided P-value for the test statistic $-10.815$ is less than 0.0001. Under the null hypothesis, a result at least as extreme as the one observed would be very unlikely. At the 1% significance level, there is enough evidence to reject the null hypothesis. Because the sample proportion is below 0.5, there is strong evidence that a minority of the population would say yes.
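As a rough check of Question 3, the interval and test statistic can be recomputed in a few lines. This sketch keeps the unrounded sample proportion throughout, so the last digit can differ slightly from the hand calculation, which rounds to 0.354 first. The names are illustrative, and `z_crit = 2.58` is the table value used above.

```python
from math import erfc, sqrt

yes, n = 486, 1374
p_hat = yes / n                      # sample proportion

# 99% Wald confidence interval (estimated standard error)
z_crit = 2.58                        # z_{0.005} from the normal table
se_hat = sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - z_crit * se_hat, p_hat + z_crit * se_hat)

# Test of H0: pi = 0.5 using the null standard error
pi0 = 0.5
se0 = sqrt(pi0 * (1 - pi0) / n)
z = (p_hat - pi0) / se0
p_value = erfc(abs(z) / sqrt(2))     # two-sided: 2 * P(Z > |z|)

print(f"p_hat = {p_hat:.3f}, 99% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"z = {z:.2f}, two-sided P-value = {p_value:.1e}")
```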
Question 4: (Textbook problem 1.12) To collect data in an introductory statistics course, I gave the students a questionnaire. One question asked whether the student was a vegetarian. Of 25 students, 0 answered yes. They were not a random sample, but use these data to illustrate inference for a proportion. Let $\pi$ denote the population proportion who would say yes. Consider $H_0: \pi = 0.50$ and $H_a: \pi \neq 0.50$.

a. What happens when you conduct the Wald test, which uses the estimated standard error in the z test statistic?

Sample proportion who answered yes: $\hat{\pi} = 0/25 = 0$.

$$z = \frac{0 - 0.5}{\sqrt{0 \times (1 - 0)/25}} = -\infty$$

The Wald test does not yield a finite test statistic: the estimated standard error is 0, so the statistic is $-\infty$. Because $-\infty$ lies in the critical region, the null hypothesis would be rejected at any significance level.

b. Find the 95% Wald confidence interval for $\pi$. Is it believable?

95% Wald confidence interval: $0 \pm 1.96 \times 0 = (0, 0)$.
The interval collapses to a single point and has no width, so it is not believable.

c. Conduct the score test, which uses the null standard error in the z test statistic. Report and interpret the P-value.

For the score test,

$$z = \frac{0 - 0.5}{\sqrt{0.5 \times (1 - 0.5)/25}} = \frac{-0.5}{0.1} = -5$$

Two-sided P-value: $2P(Z > |z|) = 2P(Z > 5) \approx 0$ (less than 0.0001). Because the P-value is essentially 0, there is enough evidence to reject the null hypothesis at the 5% significance level.
There is strong evidence that the population proportion who would say yes is not 0.5.

d. Verify that the 95% score confidence interval equals (0.0, 0.133). (This is similar to the interval (0.0, 0.137) obtained with a small-sample method of Section 1.4.3, inverting the binomial test with the mid P-value.)
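The Question 4 quantities, including the score interval asked about in part (d), can be sketched numerically. The closed-form expression below is the standard score (Wilson) interval for a binomial proportion; the variable names are illustrative and the code is a check under those assumptions, not the textbook's own derivation.

```python
from math import erfc, sqrt

y, n = 0, 25          # 0 of 25 students answered yes
pi0 = 0.5             # null value
z_crit = 1.96         # for 95% confidence
p_hat = y / n

# Wald: the estimated standard error is 0, so the z statistic is undefined
se_wald = sqrt(p_hat * (1 - p_hat) / n)
wald_ci = (p_hat - z_crit * se_wald, p_hat + z_crit * se_wald)

# Score test: standard error evaluated at the null value
se_null = sqrt(pi0 * (1 - pi0) / n)
z_score = (p_hat - pi0) / se_null
p_value = erfc(abs(z_score) / sqrt(2))   # two-sided P-value

# Score (Wilson) interval: all pi0 with |p_hat - pi0| / se(pi0) <= z_crit
denom = 1 + z_crit**2 / n
center = (p_hat + z_crit**2 / (2 * n)) / denom
half = z_crit * sqrt(p_hat * (1 - p_hat) / n + z_crit**2 / (4 * n**2)) / denom
score_ci = (center - half, center + half)

print(f"Wald SE = {se_wald}, Wald CI = {wald_ci}")
print(f"score z = {z_score:.2f}, P-value = {p_value:.1e}")
print(f"score CI = ({score_ci[0]:.3f}, {score_ci[1]:.3f})")
```

With $\hat{\pi} = 0$ the upper limit simplifies to $z^2/(n + z^2) = 3.8416/28.8416 \approx 0.133$, matching the interval stated in part (d).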
Question 5: (Textbook problem 2.2) For diagnostic testing, let X = true status (1 = disease, 2 = no disease) and Y = diagnosis (1 = positive, 2 = negative). Let $\pi_1 = P(Y = 1 \mid X = 1)$ and $\pi_2 = P(Y = 1 \mid X = 2)$. Let $\gamma$ denote the probability that a subject has the disease.

a. Given that the diagnosis is positive, use Bayes’ Theorem to show that the probability a subject truly has the disease is $P(X = 1 \mid Y = 1) = \pi_1\gamma / [\pi_1\gamma + \pi_2(1 - \gamma)]$.

With $\pi_1 = P(Y = 1 \mid X = 1)$, $\pi_2 = P(Y = 1 \mid X = 2)$, and $\gamma = P(X = 1)$:

$$P(X = 1 \mid Y = 1) = \frac{P(X = 1, Y = 1)}{P(Y = 1)} = \frac{P(X = 1, Y = 1)}{P(X = 1, Y = 1) + P(X = 2, Y = 1)}$$

$$= \frac{P(X = 1)\,P(Y = 1 \mid X = 1)}{P(X = 1)\,P(Y = 1 \mid X = 1) + P(X = 2)\,P(Y = 1 \mid X = 2)} = \frac{\gamma\pi_1}{\gamma\pi_1 + (1 - \gamma)\pi_2}$$

b. For mammograms for detecting breast cancer, suppose $\gamma = 0.01$, sensitivity $= \pi_1 = 0.86$, and specificity $= 1 - \pi_2 = 0.88$. Find the positive predictive value.

$$P(X = 1 \mid Y = 1) = \frac{\gamma\pi_1}{\gamma\pi_1 + (1 - \gamma)\pi_2} = \frac{0.01 \times 0.86}{0.01 \times 0.86 + (1 - 0.01) \times 0.12} = \frac{0.0086}{0.1274} = 0.0675$$

c. To better understand the answer in (b), find the joint probabilities for the 2 × 2 cross-classification of X and Y. Discuss their relative sizes in the two cells that refer to a positive test result.

$P(X = 1, Y = 1) = P(X = 1)\,P(Y = 1 \mid X = 1) = 0.01 \times 0.86 = 0.0086$
$P(X = 1, Y = 2) = P(X = 1)\,P(Y = 2 \mid X = 1) = 0.01 \times (1 - 0.86) = 0.0014$
$P(X = 2, Y = 1) = P(X = 2)\,P(Y = 1 \mid X = 2) = 0.99 \times (1 - 0.88) = 0.1188$
$P(X = 2, Y = 2) = P(X = 2)\,P(Y = 2 \mid X = 2) = 0.99 \times 0.88 = 0.8712$
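The positive predictive value and the four joint probabilities above can be reproduced with a short script. This is only a sketch; the dictionary layout and variable names are illustrative choices.

```python
gamma = 0.01          # P(X = 1), prevalence of disease
sensitivity = 0.86    # pi_1 = P(Y = 1 | X = 1)
specificity = 0.88    # 1 - pi_2 = P(Y = 2 | X = 2)
pi1, pi2 = sensitivity, 1 - specificity

# Joint probabilities P(X = x, Y = y), keyed by (x, y)
joint = {
    (1, 1): gamma * pi1,
    (1, 2): gamma * (1 - pi1),
    (2, 1): (1 - gamma) * pi2,
    (2, 2): (1 - gamma) * (1 - pi2),
}

p_positive = joint[(1, 1)] + joint[(2, 1)]   # P(Y = 1)
ppv = joint[(1, 1)] / p_positive             # Bayes' theorem
ratio = joint[(2, 1)] / joint[(1, 1)]        # false vs. true positives

for cell, prob in joint.items():
    print(f"P(X={cell[0]}, Y={cell[1]}) = {prob:.4f}")
print(f"PPV = {ppv:.4f}, false/true positive ratio = {ratio:.3f}")
```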
| True status | Y = 1 (positive) | Y = 2 (negative) | Total |
|---|---|---|---|
| X = 1 (disease) | 0.0086 | 0.0014 | 0.01 |
| X = 2 (no disease) | 0.1188 | 0.8712 | 0.99 |
| Total | 0.1274 | 0.8726 | 1 |

Among women who tested positive, more do not have breast cancer than do:

$$\frac{P(X = 2, Y = 1)}{P(X = 1, Y = 1)} = \frac{0.1188}{0.0086} = 13.814$$

Among women who tested positive, the probability of not having breast cancer is 13.814 times that of having breast cancer.

Question 6: (Textbook problem 2.7) For adults who sailed on the Titanic on its fateful voyage, the odds ratio between gender (female, male) and survival (yes, no) was 11.4.

a. What is wrong with the interpretation, “The probability of survival for females was 11.4 times that for males?” Give the correct interpretation.

The value 11.4 is the odds ratio between gender and survival. An odds ratio compares the odds of survival between genders, not the proportions who survived, so “probability” cannot be used in its interpretation. The quoted statement is the interpretation of a relative risk.
The correct interpretation: the odds of survival for females were 11.4 times those for males.

b. The odds of survival for females equaled 2.9. For each gender, find the proportion who survived. Find the value of RR in the interpretation, “The probability of survival for females was RR times that for males.”

Let $\pi$ denote the proportion who survived. Then odds $= \pi/(1 - \pi)$, so $\pi = \text{odds}/(1 + \text{odds})$.
The odds of survival for males: $2.9/11.4 = 0.254$.
The proportion who survived for females: $2.9/(1 + 2.9) = 0.744$.
The proportion who survived for males: $0.254/(1 + 0.254) = 0.203$.
Relative risk: $0.744/0.203 = 3.67$.

Thus, RR = 3.67. The probability of survival for females was 3.67 times that for males.

Question 7: (Textbook problem 2.11) Table 2.11 cross-classifies votes in the 2008 and 2012 US Presidential elections. Estimate and find a 95% confidence interval for the population odds ratio. Interpret.

The sample odds ratio:

$$\hat{\theta} = \frac{802/53}{34/494} = \frac{802 \times 494}{34 \times 53} = 219.86$$

The sample log odds ratio: $\log\hat{\theta} = \log(219.86) = 5.393$.

The standard error of $\log\hat{\theta}$:

$$\sqrt{\frac{1}{802} + \frac{1}{53} + \frac{1}{34} + \frac{1}{494}} = 0.227$$

A 95% confidence interval for $\log\theta$: $5.393 \pm 1.96 \times 0.227 = 5.393 \pm 0.44492 = (4.948, 5.838)$.
The corresponding confidence interval for $\theta$ is $(\exp(4.948), \exp(5.838)) = (140.9, 343.1)$.

We are 95% confident that the population odds ratio lies between 140.9 and 343.1. Because the interval lies far above 1, there is a very strong association between the 2008 and 2012 votes.
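The odds-based calculations in Questions 6 and 7 can be checked with a short script. This is a sketch under stated assumptions: the helper `odds_to_prob` and the cell labels `n11` to `n22` are illustrative names, and the four counts are taken in the order they appear in the sample odds ratio above, since Table 2.11 itself is not reproduced here.

```python
from math import exp, log, sqrt

# Question 6: convert odds to proportions, then form the relative risk
odds_ratio = 11.4
odds_female = 2.9
odds_male = odds_female / odds_ratio   # odds ratio = odds_female / odds_male

def odds_to_prob(odds):
    """Invert odds = pi / (1 - pi)."""
    return odds / (1 + odds)

p_female = odds_to_prob(odds_female)
p_male = odds_to_prob(odds_male)
rr = p_female / p_male

# Question 7: sample odds ratio and 95% CI computed on the log scale
n11, n12, n21, n22 = 802, 53, 34, 494
theta_hat = (n11 * n22) / (n12 * n21)
se_log = sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
lower = exp(log(theta_hat) - 1.96 * se_log)
upper = exp(log(theta_hat) + 1.96 * se_log)

print(f"p_female = {p_female:.3f}, p_male = {p_male:.3f}, RR = {rr:.2f}")
print(f"theta_hat = {theta_hat:.2f}, SE(log) = {se_log:.3f}")
print(f"95% CI for theta = ({lower:.1f}, {upper:.1f})")
```

Building the interval on the log scale and then exponentiating keeps both limits positive, which is why the interval is not symmetric around 219.86.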
Question 8: The odds ratio can be defined as

$$\theta = \frac{P(Y = 1 \mid X = 1)/P(Y = 2 \mid X = 1)}{P(Y = 1 \mid X = 2)/P(Y = 2 \mid X = 2)}$$

In case-control studies we are not able to estimate $P(Y = y \mid X = x)$ because the number of subjects that have each outcome level $y$ is fixed by design. Instead we are able to estimate $P(X = x \mid Y = y)$. Show mathematically why this enables us to estimate odds ratios from case-control studies, i.e. show that the odds ratio can be written in terms of things we can estimate.

By Bayes’ Theorem,

$$P(Y = y \mid X = x) = \frac{P(X = x \mid Y = y)\,P(Y = y)}{P(X = x)}$$

Therefore

$$\theta = \frac{P(Y = 1 \mid X = 1)/P(Y = 2 \mid X = 1)}{P(Y = 1 \mid X = 2)/P(Y = 2 \mid X = 2)} = \frac{P(Y = 1 \mid X = 1)\,P(Y = 2 \mid X = 2)}{P(Y = 1 \mid X = 2)\,P(Y = 2 \mid X = 1)}$$

$$= \frac{\dfrac{P(X = 1 \mid Y = 1)\,P(Y = 1)}{P(X = 1)} \cdot \dfrac{P(X = 2 \mid Y = 2)\,P(Y = 2)}{P(X = 2)}}{\dfrac{P(X = 2 \mid Y = 1)\,P(Y = 1)}{P(X = 2)} \cdot \dfrac{P(X = 1 \mid Y = 2)\,P(Y = 2)}{P(X = 1)}} = \frac{P(X = 1 \mid Y = 1)\,P(X = 2 \mid Y = 2)}{P(X = 2 \mid Y = 1)\,P(X = 1 \mid Y = 2)}$$

The marginal terms $P(Y = 1)$, $P(Y = 2)$, $P(X = 1)$, and $P(X = 2)$ cancel, leaving only conditional probabilities of exposure given outcome. Because a case-control design tells us how many cases and how many controls were exposed or not exposed, each $P(X = x \mid Y = y)$ can be estimated, and so the odds ratio can be estimated.
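The identity derived in Question 8 can also be checked numerically. The sketch below uses a made-up joint distribution (the four probabilities are hypothetical and chosen only for illustration) and confirms that the odds ratio built from $P(Y \mid X)$ equals the one built from $P(X \mid Y)$.

```python
# Hypothetical joint probabilities P(X = x, Y = y), keyed by (x, y);
# the values are illustrative only and sum to 1.
joint = {(1, 1): 0.10, (1, 2): 0.20, (2, 1): 0.05, (2, 2): 0.65}

def p_x(x):
    """Marginal P(X = x)."""
    return joint[(x, 1)] + joint[(x, 2)]

def p_y(y):
    """Marginal P(Y = y)."""
    return joint[(1, y)] + joint[(2, y)]

def y_given_x(y, x):
    """Conditional P(Y = y | X = x), not estimable in a case-control study."""
    return joint[(x, y)] / p_x(x)

def x_given_y(x, y):
    """Conditional P(X = x | Y = y), estimable in a case-control study."""
    return joint[(x, y)] / p_y(y)

# Odds ratio from the definition, using P(Y | X)
theta_prospective = (y_given_x(1, 1) / y_given_x(2, 1)) / (
    y_given_x(1, 2) / y_given_x(2, 2)
)

# Odds ratio from the final expression, using only P(X | Y)
theta_retrospective = (x_given_y(1, 1) * x_given_y(2, 2)) / (
    x_given_y(2, 1) * x_given_y(1, 2)
)

print(theta_prospective, theta_retrospective)
```

Both expressions reduce to the cross-product ratio of the joint probabilities, here $(0.10 \times 0.65)/(0.20 \times 0.05) = 6.5$.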