exam

pdf

School

University of Wisconsin, Madison

Course

319

Subject

Statistics

Date

Apr 3, 2024

Type

pdf

Pages

10

Uploaded by BailiffDiscoveryRam38

Fall 2023 STAT 240 Final Exam

1st Letter of Last/Family Name:
Last/Family Name as in Canvas:
First/Given Name as in Canvas:
Student ID:
Instructor (Circle): Bret Larget / Bi Cheng Wu
Lecture (Circle): MWF 8:50-9:40 / MWF 9:55-10:45 / MWF 2:25-3:15 / MWF 3:30-4:20

Instructions:
1. You may use both sides of two regular sheets of paper with self-prepared notes.
2. You may not consult other resources, your phone, a computer, online info, nor your neighbor's exam.
3. Do all of your work in the space provided. Use the backs of pages if necessary, indicating clearly that you have done so (so the grader can easily find your complete answer).

Scoring

Question | Name/Course | 1-3 | 4-8 | 9-12 | 13 | 14 | 15 | 16 | Total
Points   |             |     |     |      |    |    |    |    |
Possible | 2           | 12  | 20  | 16   | 12 | 13 | 11 | 14 | 100
Multiple Choice and Short Answer. (4 points each)

For each multiple choice problem, circle the letter for all correct answers and cross out the letter for all incorrect answers. Answer briefly for other problems.

Problem 1. Sketch the density plot of a normal distribution, including labeling the x axis, where the mean and median are 100 and the standard deviation is about 20, using a dotted or dashed line. Over this, sketch another density plot using a solid line with the same median and about the same standard deviation, but with a slight/moderate right skewness.

Problem 2. A data set bm has Boston Marathon data from the year 2010 with a row for each runner who completed the race and variables Age, Age_Range, and Time. Identify code that calculates the mean time for all runners between the ages of 35 and 39. Note: Age_Range equals "35-39" when Age is in this range, and code which calculates the desired mean along with other things should be circled as correct. Circle correct answers and cross out incorrect answers.

(a) bm %>% filter(Age_Range == "35-39") %>% summarize(mean = mean(Time))
(b) bm %>% group_by(Age_Range) %>% summarize(mean = mean(Time))
(c) bm %>% mutate(mean = mean(Time)) %>% filter(between(Age, 35, 39))
(d) bm %>% select(Age_Range == "35-39") %>% summarize(mean = mean(Time))

Problem 3. The probability mass function of a discrete random variable X is plotted here. It has a mean $\mu$ and a standard deviation $\sigma$.

[Figure: probability mass function of X. Horizontal axis: x, with ticks at 10, 20, 30, 40, 50; vertical axis from 0.0 to 0.3.]

Write the following four numbers under their corresponding values: 12.7, 33, 40, 70

$\mu$ | $\sigma$ | 0.8 quantile | $100 \times P(X > 20)$
Problems 4 and 5. A data frame matches contains 4746 rows, one for each match between two teams, with the variables index, W, and L, where index is the row number and W and L have the names of one of 332 teams that won and lost a match, respectively. Each of the 332 teams appears at least once in columns W and L. The data frame ncaa has 64 rows and columns Team and Conference, where each value in Team is distinct and is one of the same 332 team names in matches. Conference is another categorical variable with 32 distinct values.

The data frame df is created by the following code.

    df = matches %>%
      pivot_longer(W:L, names_to = "Result", values_to = "Team") %>%
      count(Team, Result) %>%
      pivot_wider(names_from = Result, values_from = n) %>%
      semi_join(ncaa, by = "Team")

Problem 4. How many rows are in df?

(a) 1
(b) 32
(c) 64
(d) 332
(e) 4746

Problem 5. List the column names (in any order) in df.

[Handwritten response: Team, W, L, Conference]

Problem 6. A random variable X is created by adding together the number of heads in five tosses of a fair coin plus the number of tails in a different set of five tosses of the same coin. Circle correct answers and cross out incorrect answers.

(a) X has a binomial distribution.
(b) X is not binomial because the number of trials is not fixed.
(c) X is not binomial because the trial success probability changes.
(d) X is not binomial because the trials are not independent.

Problem 7. When constructing a 95% confidence interval for a single population proportion from a sample of size n = 105, the margin of error is some quantity a times an estimated standard error. How is the value of a determined? Circle all correct answers and cross out incorrect answers.

(a) qnorm(0.95)
(b) qnorm(0.975)
(c) qt(0.975, 104)
(d) qt(0.975, 103)

Problem 8. In a hypothesis test for a population proportion p with $H_0: p = 0.5$ versus $H_a: p > 0.5$, the p-value is equal to 0.043. Circle correct answers and cross out incorrect answers.

(a) We have proven that p > 0.5.
(b) The probability that p > 0.5 is more than 95%.
(c) There is evidence that p > 0.5.
(d) If we had tested with the two-sided alternative hypothesis, the test would have been statistically significant at the $\alpha = 0.05$ level.
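The quantities named in Problems 7 and 8 can be explored with a short R sketch. This is an illustration under stated assumptions, not an official answer key: it assumes the interval for a proportion uses a standard normal multiplier, and that the test statistic has a symmetric null distribution, so that a two-sided p-value is twice the one-sided value when the estimate lies on the side of the alternative. The variable names are made up for the example.

```r
# Problem 7 setting: multiplier for a 95% interval for one proportion.
z_crit <- qnorm(0.975)         # standard normal 0.975 quantile
t_crit <- qt(0.975, df = 104)  # t quantile, shown only for comparison

# Problem 8 setting: one-sided p-value reported as 0.043.
p_one_sided <- 0.043
p_two_sided <- 2 * p_one_sided  # assumes a symmetric null distribution
significant_two_sided <- p_two_sided < 0.05

print(c(z_crit = z_crit, t_crit = t_crit))
print(c(p_two_sided = p_two_sided))
print(significant_two_sided)
```

Comparing z_crit and t_crit shows how little the two multipliers differ at this sample size, while the doubled p-value shows why a one-sided result just under 0.05 need not carry over to a two-sided test.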
Problem 9. Put the following four quantities in order from smallest to largest.

(a) qnorm(0.1)
(b) qt(0.1, 5)
(c) qt(0.1, 10)
(d) qt(0.1, 100)

Problem 10. The correlation coefficient between the average height in inches (plotted on the x axis) and weight in pounds (plotted on the y axis) of a sample of 100 people is r = 0.68. Circle correct answers and cross out incorrect answers.

(a) If height were measured in feet instead of inches, the value of r would be r = 0.68/12.
(b) In this sample, relatively tall people tend to weigh more than relatively short people do.
(c) 68% of the points fall exactly on a straight line.
(d) If we switched the axes, the new correlation coefficient would be equal to -0.68.

Problem 11. A linear regression model predicts weight from height from a sample of 100 people with a mean height of 67 inches. The correlation coefficient is r = 0.68. How much heavier than average is the predicted weight of a person who is 73 inches tall if the standard deviation of heights in the data is 4 inches and the standard deviation of weights is 30 pounds? (Note that 73 is 6 inches above the mean height of 67 inches.) Do not simplify your answer.

Problem 12a. Using the same setting as the previous problem: Which of the following intervals is the widest? Circle the correct answer and cross out the incorrect answers.

(a) A 95% confidence interval for the mean weight of all people who are 65 inches tall.
(b) A 95% confidence interval for the mean weight of all people who are 73 inches tall.
(c) A 95% prediction interval for the weight of a single individual who is 65 inches tall.
(d) A 95% prediction interval for the weight of a single individual who is 73 inches tall.

Problem 12b. Using the same setting as the previous problem: Which of the following intervals is the narrowest? Circle the correct answer and cross out the incorrect answers.

(a) A confidence interval for the mean weight of all people who are 65 inches tall.
(b) A confidence interval for the mean weight of all people who are 73 inches tall.
(c) A prediction interval for the weight of a single individual who is 65 inches tall.
(d) A prediction interval for the weight of a single individual who is 73 inches tall.
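For Problems 9 and 11, a brief R sketch can make the comparisons concrete. It is an illustration rather than an answer key. It relies on the standard fact that the least-squares slope equals r times the ratio of the standard deviations (response over predictor). All variable names are hypothetical.

```r
# Problem 9 setting: the four lower-tail quantiles, sorted smallest to largest.
q <- c("qnorm(0.1)"   = qnorm(0.1),
       "qt(0.1, 5)"   = qt(0.1, 5),
       "qt(0.1, 10)"  = qt(0.1, 10),
       "qt(0.1, 100)" = qt(0.1, 100))
q_sorted <- sort(q)  # sort() keeps the names attached to the values
print(q_sorted)

# Problem 11 setting: predicted weight above average for a given height.
r     <- 0.68  # correlation between height and weight
s_x   <- 4     # standard deviation of heights (inches)
s_y   <- 30    # standard deviation of weights (pounds)
x_bar <- 67    # mean height (inches)
x_new <- 73    # height of interest (inches)

slope        <- r * s_y / s_x           # least-squares slope, pounds per inch
extra_weight <- slope * (x_new - x_bar) # predicted pounds above the mean weight
print(extra_weight)
```

Heavier tails push the 0.1 quantile of a t distribution further below zero, so the t quantiles approach the normal quantile from below as the degrees of freedom grow.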
Problem 13 (12 points)

In a certain genetics experiment with fruit flies and simple recessive traits a and b, the probability of offspring with the double recessive genotype aabb, call the probability p, is expected to be 1/16 = 0.0625 if the traits are unlinked. If the traits are linked, then p < 0.0625. In the experiment, there are 500 offspring. Let X be the number of offspring with genotype aabb. Assume that the genotypes of all offspring are independent of one another.

(a) Is it reasonable to assume that X ~ Binomial(500, p) for some p? Briefly explain.

(b) Assume that X ~ Binomial(500, 0.0625). Write an R expression to calculate the exact probability P(X < 28).

(c) If you wanted to approximate the probability in (b) with an area under a normal curve using the function pnorm(x, m, s), write expressions to calculate the values of x, m, and s to do this calculation accurately. You do not need to simplify these expressions.
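The exact and approximate calculations described in Problem 13 can be sketched in R as follows. This is an illustration, not an official solution. It reads the inequality as strict, as printed in this copy, so that P(X < 28) = P(X <= 27); if the intended event were X <= 28, the cutoffs 27 and 27.5 would become 28 and 28.5. The half-unit shift is the usual continuity correction, and the variable names are made up for the example.

```r
n  <- 500
p0 <- 0.0625

# Exact binomial probability: P(X < 28) is the same event as P(X <= 27).
exact_lt_28 <- pbinom(27, n, p0)

# Normal approximation with a continuity correction.
m <- n * p0                   # binomial mean
s <- sqrt(n * p0 * (1 - p0))  # binomial standard deviation
x <- 27.5                     # halfway between 27 and 28
approx_lt_28 <- pnorm(x, m, s)

print(c(exact = exact_lt_28, approx = approx_lt_28))
```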
Problem 14 (13 points)

Assume the same setting as in Problem 13. Suppose that in the genetic cross, 28 out of 500 offspring have the genotype aabb. For context, 28/500 = 0.056.

(a) Write an expression for a 95% confidence interval for p, the probability of the genotype aabb in the genetic cross, using the Agresti-Coull method. You do not need to simplify any numerical expressions.

(b) In a test of the hypothesis $H_0: p = 0.0625$ versus the one-sided alternative $H_a: p < 0.0625$, write an R expression to calculate the p-value.

(c) Suppose that the calculated p-value is equal to 0.313. Circle the single letter label (A-C, below) of all appropriate conclusions from statistical inference in context and cross out those not supported by the data analysis. Recall that if the traits are unlinked, the probability of genotype aabb is exactly equal to 0.0625 and if the traits are linked, then this probability is smaller than 0.0625.

(A) There is strong evidence that the genetic traits are unlinked.
(B) There is strong evidence that the genetic traits are linked.
(C) The observed data is consistent with the genetic traits being unlinked.

(d) Without doing any numerical calculations, do you think that the upper limit of the 95% confidence interval from the correct solution to (a) above is larger than or smaller than 0.0625? Briefly justify your response by referring to your answer in part (c).
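A hedged R sketch of the calculations in Problem 14 follows. It assumes the "add two successes and two failures" form of the Agresti-Coull interval, which may differ slightly from the variant intended in the course, and it computes the one-sided p-value as the exact lower binomial tail at the observed count. Variable names are hypothetical.

```r
x  <- 28      # observed aabb offspring
n  <- 500     # total offspring
p0 <- 0.0625  # null value of p

# Agresti-Coull style interval (assumed form: add 2 successes and 2 failures).
n_tilde <- n + 4
p_tilde <- (x + 2) / n_tilde
se      <- sqrt(p_tilde * (1 - p_tilde) / n_tilde)
ci      <- p_tilde + c(-1, 1) * qnorm(0.975) * se
print(ci)

# One-sided p-value for H_a: p < 0.0625 is the lower tail P(X <= 28) under H_0.
p_value <- pbinom(x, n, p0)
print(p_value)
```

A value of p0 inside the printed interval and a large one-sided p-value are two views of the same outcome: the data do not rule out p = 0.0625.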
Problem 15 (11 points)

Treat the calendar years from 1870-1899 as a sample of 30 years from a time period in the late 1800s and the calendar years from 1990-2019 as a sample of 30 years in a more recent time period. Consider the expected monthly population mean temperature in December for each of these years, with $\mu_1$ representing a late 1800s population mean December temperature and $\mu_2$ representing a recent population mean December temperature, in each case ignoring the effects of random annual temperature fluctuations, but instead representing unobserved climate conditions. The following table summarizes the sampled temperature data.

period     | n  | mean  | sd
late 1800s | 30 | 22.71 | 7.00
recent     | 30 | 25.17 | 5.25

Here is the output of the function t.test() using these two samples.

        Welch Two Sample t-test

    data:  late_1800s and recent
    t = -1.5399, df = 53.757, p-value = 0.1295
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -5.6636964  0.7432663
    sample estimates:
    mean of x mean of y
     22.71344  25.17366

(a) Using values in the summary data table and other R functions (such as qnorm() or qt()) if needed, write an expression which shows how to compute the upper endpoint of the confidence interval, 0.7432663. Do not simplify or evaluate the expression.

(b) Using values in the summary data table and other R functions (such as qnorm() or qt()) if needed, write an expression which shows how to compute the value t = -1.5399.

(c) Circle the single letter label (A-E, below) of all appropriate conclusions and cross out those not supported by the data analysis.

(A) There is strong evidence that the observed difference in average December temperatures, which is about 2.5 degrees Fahrenheit higher in recent years than in the late 1800s, cannot plausibly be explained by random annual fluctuations in weather, providing evidence that a changing climate is making December warmer in Madison.
(B) There is strong evidence that the mean temperature in December 2023 will be higher than what the mean temperature in Madison was in December 1900.
(C) There is strong evidence that the observed mean temperature in December between 1870 and 1899 is exactly equal to that observed between 1990 and 2019.
(D) The observed data is consistent with random annual temperature variation alone explaining the observed difference in mean December temperature between the 1800s and more recently.
(E) A change in climate could in part explain part of the observed difference in mean December temperatures between the two time periods.
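The t.test() output in Problem 15 can be approximately reproduced from the summary table alone. The sketch below is an illustration, not an official solution: it uses the rounded means and standard deviations from the table, so the last digits differ slightly from the printed output, and it uses the standard Welch-Satterthwaite formula for the degrees of freedom. Variable names are made up for the example.

```r
n1 <- 30; mean1 <- 22.71; sd1 <- 7.00  # late 1800s
n2 <- 30; mean2 <- 25.17; sd2 <- 5.25  # recent

v1 <- sd1^2 / n1   # estimated variance of the first sample mean
v2 <- sd2^2 / n2   # estimated variance of the second sample mean
se <- sqrt(v1 + v2)  # standard error of the difference in means

t_stat <- (mean1 - mean2) / se

# Welch-Satterthwaite approximation to the degrees of freedom.
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# 95% confidence interval for mu_1 - mu_2, and the two-sided p-value.
ci      <- (mean1 - mean2) + c(-1, 1) * qt(0.975, df_welch) * se
upper   <- ci[2]
p_value <- 2 * pt(-abs(t_stat), df_welch)

print(c(t = t_stat, df = df_welch, p_value = p_value))
print(ci)
```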
Problem 16 (14 points)

Researchers have a theory that mammals sleep to heal brain cell damage. This theory suggests a power law relationship between the ratio of the average daily sleep time versus awake time and body mass, or

$$(\text{sleep ratio}) = C \times (\text{body mass})^{-\theta}$$

where $0.16 < \theta < 0.19$ (for reasons argued in the paper). In contrast, if sleep's primary function is for whole body cellular repair, the researchers expect that $\theta$ will be closer to 0.25. Taking natural logs on both sides of the equation results in the equation

$$\ln(\text{sleep ratio}) = \ln C + (-\theta) \times \ln(\text{body mass})$$

(Note that the slope of this equation is $-\theta$.)

Sleep ratio is a positive number and has no units. For example, an animal that sleeps 16 hours and is awake 8 hours per day has a sleep ratio of 2. Body mass is measured in kilograms. All values are averages for entire species or other animal group.

Study data is from 83 animal groups that span several orders of magnitude in body sizes, ranging from small shrews (average body mass equals 0.005 kg or 5 grams) to African elephants (average body mass equals 6654 kg). Small animals tend to sleep much longer (have larger sleep ratios) than larger animals.

The following graph shows the relationship between the natural log of this sleep ratio (y) versus the natural log of the body mass (x) for these 83 animal groups.

[Figure: Sleep ratio versus body mass, log-log plot. Horizontal axis: natural log of body mass (ln(kg)); vertical axis: natural log of sleep ratio.]

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept) -0.17977    0.08046  -2.234   0.0282 *
    x           -0.16071    0.02405  -6.681 2.76e-09 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 0.7093 on 81 degrees of freedom
    Multiple R-squared:  0.3553,    Adjusted R-squared:  0.3473
    F-statistic: 44.64 on 1 and 81 DF,  p-value: 2.764e-09
(a) Write an R expression to calculate the upper and lower limits of a 95% confidence interval for $\theta$. Use numbers from the regression summary above when possible and code when needed (such as using either qnorm() or qt()).

(b) Write an expression to calculate a test statistic for the hypothesis test $H_0: \theta = 0.25$ versus the alternative $H_a: \theta \neq 0.25$.

(c) Assume that the value of the test statistic in (b) equals -3.71. Write an R expression to calculate the p-value of this hypothesis test.

(d) Assume that the p-value for this hypothesis test calculated in (c) is p = 0.00038 and that the numerical limits of the confidence interval calculated in (a) are 0.113 and 0.209. Circle the single-letter label (A-E, below) of all appropriate conclusions from statistical inference in context and cross out those not supported by the data analysis.

(A) The observed data is consistent with $\theta = 0.25$, implying that sleep's primary function in mammals could be whole body cellular repair.
(B) There is strong evidence that $\theta = 0.25$ and that the primary function of sleep in mammals is whole body cellular repair.
(C) There is strong evidence that $\theta < 0.25$, suggesting that whole body cellular repair is not the primary function of sleep in mammals.
(D) There is strong evidence that $0.16 < \theta < 0.19$, implying that the primary function of sleep in mammals is to heal brain cell damage.
(E) The observed data is consistent with $0.16 < \theta < 0.19$, consistent with the biological hypothesis that the primary function of sleep in mammals is to heal brain cell damage.
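The numbers quoted in Problem 16 (the interval 0.113 to 0.209, the statistic -3.71, and the p-value 0.00038) can be checked against the regression summary with a short R sketch. It is an illustration, not an official solution. It assumes the fitted slope estimates $-\theta$, so the estimate of $\theta$ is the negative of the slope with the same standard error, and that inference uses a t distribution with the 81 residual degrees of freedom. Variable names are hypothetical.

```r
slope    <- -0.16071  # estimated coefficient of x (log body mass)
se_slope <- 0.02405   # its standard error
df_resid <- 81        # residual degrees of freedom

theta_hat <- -slope   # the slope estimates -theta, so flip the sign

# (a) 95% confidence interval for theta.
ci <- theta_hat + c(-1, 1) * qt(0.975, df_resid) * se_slope

# (b) Test statistic for H_0: theta = 0.25.
t_stat <- (theta_hat - 0.25) / se_slope

# (c) Two-sided p-value, using the rounded statistic given in the problem.
p_value <- 2 * pt(-abs(-3.71), df_resid)

print(round(ci, 3))
print(round(t_stat, 2))
print(signif(p_value, 2))
```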