Midterm1_PracticeProblems_GH1-7_Solutions

School

University of California, Merced

Course

180

Subject

Statistics

Date

Feb 20, 2024

Multiple Choice Questions:

1. What is the first thing we do when sizing up a data set?
A) Make a graph of the data
B) Calculate numerical descriptors (e.g. mean, median, standard deviation, etc.)
C) Individually check the numerical values of all the data stored in a vector
D) Run an ANOVA

2. The standard deviation (σ) is:
A) The added squared differences of each value from the mean divided by the degrees of freedom
B) The exact center value of the dataset
C) An explanatory variable
D) The square root of the variance

3. Which of the following would you expect to be a categorical variable?
A) The weight of a patient
B) The gender of a patient
C) The height of the patient
D) The drug dose given to a patient

4. In the expression μ = y̅ ± ε for the confidence limits on the estimated mean (y̅) from n measurements, the term “ε” is a value that:
A) Gets bigger as the number of measurements (n) gets bigger
B) Gets smaller as the number of measurements (n) gets bigger
C) Does not depend on the number of measurements (n)
D) We can never estimate by statistics

5. What does the central line in a box plot represent?
A) First Quartile
B) Mean
C) Median
D) Mode

6. The key idea in Analysis of Variance is calculating the ratio of the mean square variation of the group means around the grand mean to the:
A) F value
B) Mean square variation of the data around the group means
C) Number of data points used
D) Mean square variation of the data around the grand mean

7. In linear regression (i.e. a model with one continuous explanatory variable), the best fit line is one that:
A) Has the same number of points below it and above it
B) Minimizes the squared deviations between the fitted and actual values in the horizontal (x) direction
C) Goes through the origin (x=0, y=0)
D) Minimizes the squared deviations between the fitted and actual values in the vertical (y) direction

8. In the General Linear Model the actual statistical model (e.g. linear regression, analysis of variance, etc.) performed is determined by:
A) The user indicating the specific model to be run by entering a key word: “regression”, “analysis_of_variance”, etc.
B) The number and type of the explanatory variables
C) The number (but not the type) of the explanatory variables
D) The total number of data points included in the model

9. A researcher observes that the p-value for a model relating how the photosynthetic rate is affected by available nitrogen (a continuous explanatory variable) is 0.03. What is the null hypothesis for this situation?
A) Photosynthetic rate does not depend on the amount of available nitrogen
B) Photosynthetic rate is positively correlated with the amount of available nitrogen
C) The amount of available nitrogen is positively correlated with the photosynthetic rate
D) The amount of available nitrogen does not depend on the photosynthetic rate

10. A researcher observes that the p-value for a model relating photosynthetic rate to the amount of available nitrogen (a continuous explanatory variable) is 0.03. What is the maximum confidence level at which the researcher can reject the null hypothesis?
A) 93%
B) 30%
C) 0.07%
D) 97%
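For questions 2 and 4, a small numerical sketch may help. The sample values below are invented for illustration (they are not course data), and the function names are hypothetical. The sketch shows that the standard deviation is the square root of the variance, and that the standard error s/√n, to which ε is proportional, shrinks as n grows.

```python
import math
import statistics

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by the degrees of freedom (n - 1)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

def standard_error(std_dev, n):
    """Standard error of the mean, s / sqrt(n). The half-width epsilon scales with it."""
    return std_dev / math.sqrt(n)

# HYPOTHETICAL measurements, invented for this example only.
data = [4.1, 5.0, 5.6, 4.7, 5.3, 4.9]

variance = sample_variance(data)
std_dev = math.sqrt(variance)  # standard deviation = square root of the variance

# Cross-check against the standard library's sample standard deviation.
library_std_dev = statistics.stdev(data)

# Same spread, more measurements: the standard error (and so epsilon) gets smaller.
se_small_n = standard_error(std_dev, 6)
se_large_n = standard_error(std_dev, 600)

print(f"variance = {variance:.4f}, sd = {std_dev:.4f}, library sd = {library_std_dev:.4f}")
print(f"SE with n = 6: {se_small_n:.4f}; SE with n = 600: {se_large_n:.4f}")
```

Because the standard error divides by √n, going from 6 to 600 measurements (100 times more) cuts it by a factor of 10, assuming the spread of the data stays the same.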
Written Answer Questions:

1. A company commissions two studies to determine if there was a correlation between the number of advertisements they put in the local newspaper and the sales of their product (measured in thousands of dollars of sales). Use the graphs, ANOVA and coefficients tables given below for the two studies to answer the questions below.

a. What is the Null Hypothesis that these studies are testing?
   Sales cannot be predicted by the number of advertisements.

b. What amount of the sales variation is explained by the variation in the number of advertisements in each of these two studies?
   74.89% in Study 1 and 99.28% in Study 2.

c. Does the first study allow you to reject the Null Hypothesis at the 90% confidence level? What about at the 95% confidence level?
   Yes; yes

d. Does the second study allow you to reject the Null Hypothesis at the 90% confidence level? What about at the 95% confidence level?
   Yes; no

e. Compare the overall p-values and R^2 values for the models created from these two studies. Does the model that best explains the variation in the sales (Study 2) give you the most significant p-value? (No) If not, describe how/why this result is possible.
   Study 2 has only 3 data points, so there is not enough confidence to produce a low p-value even though the points nearly fall on a straight line (high R squared). Study 1, on the other hand, has a lot of data points, so there is high confidence in the model (low p-value) even though there is more ‘noise’ around the model (lower R squared).

f. Based on the most statistically significant of these models (meaning lowest p-value), approximately how many more sales dollars are generated for each advertisement placed (round answer to the nearest $100)?
   Study 1 model equation: Sales1 (in thousands of dollars) = 5.4348 * advt1 + 31.7499
   For each additional advertisement, 5.4348 * 1000 = $5,434.80 more sales dollars are generated, which rounds to $5,400.

g. Based on the most statistically significant of these models, approximately how many sales dollars are generated if no advertisements are placed?
   Sales1 (in thousands of dollars) = 5.4348 * advt1 + 31.7499
   Sales1 (in thousands of dollars) = 5.4348 * (0) + 31.7499 = 31.7499
   $31,749.90
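The answers to parts e through g of Question 1 can be checked with a short sketch. The function names are hypothetical. The slope, intercept, R squared value, and the fact that Study 2 has 3 data points come from the answers above. The p-value calculation is an assumption-laden shortcut: it only covers the special case n = 3 (1 residual degree of freedom), where the t distribution reduces to the Cauchy distribution, so the result may differ slightly from the R output because R squared is rounded.

```python
import math

def predicted_sales_dollars(adverts, slope=5.4348, intercept=31.7499):
    """Study 1 fitted line. The model is in thousands of dollars, so scale by 1000."""
    return (slope * adverts + intercept) * 1000

def p_value_three_points(r_squared):
    """Two-sided p-value for a one-predictor regression fitted to n = 3 points.

    t = sqrt(R^2 / (1 - R^2) * (n - 2)), and n - 2 = 1 here. With 1 degree of
    freedom the t distribution is Cauchy, so the tail area is 1 - (2/pi) * atan(t).
    """
    t = math.sqrt(r_squared / (1 - r_squared))
    return 1 - (2 / math.pi) * math.atan(t)

# Parts f and g: extra dollars per advertisement, and sales with no advertisements.
per_advert = predicted_sales_dollars(1) - predicted_sales_dollars(0)
per_advert_rounded = round(per_advert, -2)   # nearest $100
no_adverts = predicted_sales_dollars(0)

# Part e: a very high R^2 from only 3 points still leaves p above 0.05.
p_study2 = p_value_three_points(0.9928)

print(f"per advertisement: ${per_advert:,.2f} (rounded: ${per_advert_rounded:,.0f})")
print(f"no advertisements: ${no_adverts:,.2f}")
print(f"Study 2 p-value (n = 3): {p_study2:.4f}")
```

The last line illustrates why Study 2 can be rejected at the 90% level but not at the 95% level despite explaining more than 99% of the variation.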
2. This question involves a study of the cure rates for a particular bacterial infection (measured in terms of percent patients cured) using different dosages (a continuous variable) of three different antibiotics (labeled “A”, “B”, and “C”). Use the information in the following analysis of variance table to answer the questions below the table.

a. Fill in the Sums of Squares and p-values in the following table using the R output above.

   Explanatory variable   Raw Sum of Squares   Adjusted Sum of Squares   p-value from raw SS   p-value from adjusted SS
   Antibiotic             856.71               856.71                    0.001541              0.001541
   Dosage                 1586.77              1586.77                   7.401e-06             7.401e-06

b. Based on your results in part a, what can you say about the experimental design used in this study?
   Raw SS = Adjusted SS, so it must have been an orthogonal experimental design.

c. Based on your results in part a, is dosage or antibiotic responsible for describing the largest fraction of the variation in the cure percentages?
   Dosage (higher Raw SS)

Use the information in the following coefficients table to answer the questions below.
d. Fill in the blanks in the following model:
   Cure % = ___ + [antibiotic A: ___; antibiotic B: ___; antibiotic C: ___] + (2.5320) × dosage

e. What is the overall Null Hypothesis for this model?
   Neither the dosage nor the antibiotic type can be used to predict the cure rates.

f. At what confidence level can you reject that Null Hypothesis?
   At the (1 - 4.361*10^(-6)) * 100% confidence level, i.e. about 99.9996%.

g. If you made a graph of your model from step d, what would the graph look like? Just give a qualitative description, for example: “a plane”, “a pair of nonparallel lines”, “a set of four parallel lines”, etc.
   A set of 3 parallel lines (3 lines because there are 3 different antibiotics, parallel because there are no interactions in this model).

h. How would your answer to g. change if there were an interaction between dosage and antibiotic?
   3 nonparallel lines.
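The "parallel lines" answer in part g can be illustrated with a minimal sketch of an additive model. Only the dosage slope (2.5320) comes from the question. The intercept and the three antibiotic offsets are not reproduced in this document, so the values below are hypothetical placeholders, as are the function names.

```python
DOSAGE_SLOPE = 2.5320  # from the coefficients table in the question

# HYPOTHETICAL placeholders: the real intercept and antibiotic offsets are not
# available here, so these numbers are invented purely for illustration.
INTERCEPT = 10.0
ANTIBIOTIC_OFFSET = {"A": 0.0, "B": 5.0, "C": -3.0}

def cure_percent(antibiotic, dosage):
    """Additive model: intercept + antibiotic offset + slope * dosage (no interaction)."""
    return INTERCEPT + ANTIBIOTIC_OFFSET[antibiotic] + DOSAGE_SLOPE * dosage

def slope_between(antibiotic, dose_low, dose_high):
    """Slope of one antibiotic's line, measured between two dosages."""
    rise = cure_percent(antibiotic, dose_high) - cure_percent(antibiotic, dose_low)
    return rise / (dose_high - dose_low)

# Every antibiotic's line has the same slope, so the three lines are parallel.
slopes = {name: slope_between(name, 1.0, 5.0) for name in ANTIBIOTIC_OFFSET}

# The vertical gap between two antibiotics is the same at every dosage.
gap_at_low = cure_percent("B", 1.0) - cure_percent("A", 1.0)
gap_at_high = cure_percent("B", 5.0) - cure_percent("A", 5.0)

print(slopes)
print(gap_at_low, gap_at_high)
```

An interaction term would give each antibiotic its own dosage slope, which is why the lines would no longer be parallel in part h.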
3. In the following ANOVA and coefficients tables, 4 different p-values are labeled (A, B, C, and D). Use these outputs to answer the questions below.

For each of the four p-values pointed to in the R outputs above, state the Null Hypothesis and indicate whether it can be rejected at the 99% confidence level.

A: QUALITY cannot be used to predict PRICE. Cannot be rejected at the 99% confidence level.
B: QUANTITY cannot be used to predict PRICE, once QUALITY has been taken into account. Can be rejected at the 99% confidence level.
C: QUALITY cannot be used to predict PRICE, or equivalently, the coefficient of QUALITY is actually zero. Can be rejected at the 99% confidence level.
D: Neither QUALITY nor QUANTITY can be used to predict PRICE. Can be rejected at the 99% confidence level.
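The rejection decisions used throughout these questions follow one rule: reject the Null Hypothesis at a given confidence level when the p-value is below 1 minus that level. A minimal sketch is below. The function names are hypothetical, and because the p-values labeled A to D are not reproduced in this document, the examples use p-values quoted elsewhere in it (0.03 from multiple choice question 10, and 0.001541 from the Question 2 table).

```python
def can_reject(p_value, confidence):
    """Reject the null hypothesis when p is below the significance level 1 - confidence."""
    return p_value < 1 - confidence

def max_confidence_percent(p_value):
    """Highest confidence level (in percent) at which the null can still be rejected."""
    return (1 - p_value) * 100

# p = 0.03: rejectable at 95% confidence but not at 99%.
reject_003_at_95 = can_reject(0.03, 0.95)
reject_003_at_99 = can_reject(0.03, 0.99)

# p = 0.001541 (Antibiotic, Question 2): rejectable even at 99%.
reject_antibiotic_at_99 = can_reject(0.001541, 0.99)

print(reject_003_at_95, reject_003_at_99, reject_antibiotic_at_99)
print(f"{max_confidence_percent(0.03):.1f}%")
```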
4. This question involves a data set that describes some biological response (RESPONSE) and two categorical explanatory variables, SEX (Male or Female) and LEVEL (five different levels). The ANOVA tables are given below for model formulae involving both possible orderings of the explanatory variables. Use the information in the table below to fill in the diagram on the next page and answer the questions below the diagram.

a. Fill in this table using the ANOVA tables given above.

   Explanatory variable   Raw Sum of Squares   Adjusted Sum of Squares   p-value from raw SS   p-value from adjusted SS
   SEX                    678567               39630                     5.669e-12             0.09132
   LEVEL                  12696217             12057280                  <2e-16                <2.2e-16

The following is a diagram you saw several times in class, showing two possible paths from the grand mean (M) to a fitted model (F). [Note that the diagram does not show the point corresponding to the response values (Y), which would be directly above the point F.]

b. Fill in the two model formulae on this diagram (to figure out which goes where, use the relative lengths of the raw (unadjusted) sums of squares).
   Y ~ SEX + LEVEL
   Y ~ LEVEL + SEX

c. Fill in the values of the raw and adjusted sums of squares in the blanks given on the diagram.
   678567, 12057280, 12696217, 39630

d. What models do points D, E, and F refer to on the diagram? (Just give the relevant model formulae.)
   D: Y ~ SEX
   E: Y ~ LEVEL
   F: Y ~ SEX + LEVEL or Y ~ LEVEL + SEX

e. Based on the information you’ve added to this diagram, if you had to build a model with just one of the explanatory variables, which would be best?
   LEVEL (higher raw SS; LEVEL gets us much closer to F than SEX does)

5. Name and describe the four principles of experimental design.
   Replication, Randomization, Blocking, Orthogonality (descriptions in lecture…)
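The sums of squares in Question 4 can be sanity-checked with a short sketch: whichever path is taken from the grand mean M to the fitted model F, the raw SS of the first variable plus the adjusted SS of the second should give the same total. The same sketch contrasts this non-orthogonal data set with Question 2, where raw and adjusted SS match. All numbers come from the tables in Questions 2 and 4; the function names are hypothetical.

```python
import math

# Question 4 (SEX and LEVEL): raw and adjusted sums of squares from the table.
RAW_SS = {"SEX": 678567, "LEVEL": 12696217}
ADJUSTED_SS = {"SEX": 39630, "LEVEL": 12057280}

# Question 2 (Antibiotic and Dosage): raw and adjusted sums of squares.
RAW_SS_Q2 = {"Antibiotic": 856.71, "Dosage": 1586.77}
ADJUSTED_SS_Q2 = {"Antibiotic": 856.71, "Dosage": 1586.77}

def path_total(first, second):
    """Model SS reached by fitting `first` alone (raw SS), then adding `second` (adjusted SS)."""
    return RAW_SS[first] + ADJUSTED_SS[second]

def is_orthogonal(raw, adjusted, rel_tol=1e-6):
    """A design looks orthogonal when every raw SS equals its adjusted SS."""
    return all(math.isclose(raw[name], adjusted[name], rel_tol=rel_tol) for name in raw)

via_sex_first = path_total("SEX", "LEVEL")     # M -> D -> F
via_level_first = path_total("LEVEL", "SEX")   # M -> E -> F
best_single = max(RAW_SS, key=RAW_SS.get)      # variable with the largest raw SS

print(via_sex_first, via_level_first)
print(best_single)
print(is_orthogonal(RAW_SS, ADJUSTED_SS), is_orthogonal(RAW_SS_Q2, ADJUSTED_SS_Q2))
```

Both paths add up to 12735847, which confirms that the four values in the table are mutually consistent.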