Demo Questions Only V63

School

Grant MacEwan University *

Course

151

Subject

Statistics

Date

Feb 20, 2024

Type

docx

Pages

80

DEMO QUESTIONS: STATISTICS 151 LAB

Please title, label, and include appropriate scales on graphs, charts, plots and tables.

1. Recently, statistics students at MacEwan completed a survey. Raw data can be found in the file STATISTICSSTUDENTSURVEYFORR. It contains the column ALBBEST, a variable that measures which Alberta political party students think had the best platform (Conservative, Green, Liberal, NDP).
a) What type of variable is ALBBEST?
b) Use SOFTWARE to tally the counts and percents for the various outcomes for this data. Fill in the chart below.

SOFTWARE OUTPUT
Party          Count   Percent
Conservative   21      35.0000%
Green          5       8.3333%
Liberal
NDP
Total          60      99.9999%

c) Use the information to help you fill in the table below (by hand).

               Frequency   Relative Frequency
Conservative   21          21/60 = 0.3500 etc.
Green          5           0.0833
Liberal
NDP
TOTAL          60          0.9999
d) Use the information above to draw a frequency and percent bar chart for describing this data.

Frequency Bar Chart          Percent Bar Chart

e) The mode is the outcome (choice) of a categorical variable that occurs most often (i.e. has the highest count). In this case the mode for our column of ALBBEST is _____ as that bar on the graphs has the highest count of ___ students.

2. Recently, statistics students at MacEwan completed a survey. Raw data can be found in the file STATISTICSSTUDENTSURVEYFORR. It contains the column LIKEENGLISH, a categorical ordinal variable that measures how much students like learning the discipline English at school (1_Dislike_Very_Much, 2_Dislike, 3_Neutral, 4_Like, 5_Like_Very_Much).
a) What kind of a variable is LIKEENGLISH?
b) Use SOFTWARE to create a bar chart and a pie chart to illustrate the counts of the data for the various choices in the LIKEENGLISH variable. Paste them below.
c) The mode is the most common outcome (choice) of a categorical variable. This outcome has the highest bar in its bar chart. Here, we have _____ , as both ________ and ___________ have the _____ count of values (and the _____ height of bar in their bar chart).
d) Do you prefer the bar chart or the pie chart for illustrating this data? Explain your choice.

3. Allan recently recorded the raw scores obtained by 20 of his students on a calculus midterm. You are interested in the number of students in the following classes (also called intervals or bins): [40,45), [45,50), [50,55), [55,60), [60,65), [65,70). Data can be found in the file ALLANSMIDTERMDATA.
a) A [ bracket means that the left end of the interval is closed, while a ) bracket means that the right end of the interval is open. For example, this means that a value of exactly 45 would be assigned to the interval [45,50). To what interval would a value of 60 be assigned?
b) Cutpoints are values that occur at the left [ edge and the right ) edge of each interval. For example, the cutpoints for the interval [45,50) are 45 and 50. What are the cutpoints for the interval [60,65)?
c) The midpoint of an interval is the point in the middle of the interval. For example, the midpoint of the interval [40,45) is 42.5. What is the midpoint of the interval [60,65)?
d) Use SOFTWARE to make a frequency histogram of this data. It will automatically choose the cutpoints on the edges of each interval and midpoints in the middle of each interval, and in this case, it happens to choose the ones that interest Allan. (This is not always the case!)
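The interval rules in Question 3 (closed left end, open right end, midpoint halfway between the cutpoints) can be sketched in a few lines of Python. This is only an illustration, not the lab's SOFTWARE: the names `CUTPOINTS`, `assign_interval`, and `midpoint` are hypothetical, and the cutpoints are simply the class edges listed in the question.

```python
import bisect

# Cutpoints for Allan's classes [40,45), [45,50), ..., [65,70), taken from Question 3.
CUTPOINTS = [40, 45, 50, 55, 60, 65, 70]


def assign_interval(value, cutpoints=CUTPOINTS):
    """Return the (left, right) cutpoints of the class [left, right) that contains value."""
    if not (cutpoints[0] <= value < cutpoints[-1]):
        raise ValueError(f"{value} is outside [{cutpoints[0]}, {cutpoints[-1]})")
    # bisect_right counts the cutpoints that are <= value, so a value equal to a
    # cutpoint falls in the class that STARTS there (closed left end, open right end).
    i = bisect.bisect_right(cutpoints, value)
    return cutpoints[i - 1], cutpoints[i]


def midpoint(left, right):
    """Midpoint of the class [left, right)."""
    return (left + right) / 2


print(assign_interval(45))  # (45, 50), the example given in part a)
print(midpoint(40, 45))     # 42.5, the example given in part c)
```

The same two functions can be applied to any other value or class in the question to check a hand answer.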
4. Recently, statistics students at MacEwan completed a survey. Raw data can be found in the file STATISTICSSTUDENTSURVEYFORR. It contains the column WKHRSNEWS (a variable that measures weekly hours of news consumed by students) and the column RNGOOD (a variable that measures whether the student believes Rachel Notley is doing a good job).
a) What kind of variables are WKHRSNEWS and RNGOOD?
b) Create stacked percent histograms for this data (one histogram for each value of the variable RNGOOD). Paste your output below.
c) Create side by side boxplots for this data. Paste your output below.
d) Calculate descriptive measures of center and dispersion for this data. Paste your output below.
e) What measures of centrality and dispersion are best used to describe this data?
f) STATISTICSSTUDENTSURVEYFORR also contains the column WKHRSMUSIC (a variable that measures weekly hours of music consumed by students) and the column UNDERGORGRAD (a variable that measures whether the student is pursuing an undergraduate or graduate degree). What kind of variables are WKHRSMUSIC and UNDERGORGRAD?
g) Create stacked percent histograms for this data (one histogram for each value of the variable UNDERGORGRAD). Paste your output below.
h) Create side by side boxplots for this data. Paste your output below.
i) Calculate descriptive measures of center and dispersion for this data. Paste your output below.
j) What measures of centrality and dispersion are best used to describe this data?
k) STATISTICSSTUDENTSURVEYFORR also contains the column BEFPULSEMIN (a variable that measures student pulse in beats per minute (bpm) before doing the survey) and the column UNDERGORGRAD (a variable that measures whether the student is pursuing an undergraduate or graduate degree). Create side by side boxplots for this data. Paste your output below.
l) Fill in the blanks or bold your choices in the brackets in the sentences.
The median pulse rate for students pursuing an undergraduate degree is (below, the same as, above) the median pulse rate for students pursuing a graduate/professional degree.
The data for those pursuing a graduate/professional degree has an upper whisker that is (less spread out than, about the same spread as, more spread out than) the upper whisker for those pursuing an undergraduate degree.
The percent of data in the interquartile range for the graduate/professional data is (lower than, the same as, higher than) the percent of data in the interquartile range for the undergraduate data. ___% of the data always lies between Q1 and Q3.

5. Recently, statistics students at MacEwan completed a survey. Raw data can be found in the file STATISTICSSTUDENTSURVEYFORR. It contains the columns BEFPULSEMIN (a variable that measures students’ pulse rate (beats per minute) prior to doing the survey) and BEFBREATHMIN (a variable that measures students’ breaths per minute prior to doing the survey). A scatterplot of these two variables can be found below.
a) What is the range of data for the breaths per minute for the students?
b) What is the range of data for the pulse rate (beats per minute) for the students?
c) Is there a discernible pattern to this data? Why do you think you see what you see?
There is a clear _______ trend to the data. As breaths per minute increase, pulse beats per minute _______.
d) Do you see any discernible outliers (values that do not fit the pattern you see in the data)? Suggest a possible reason for your answer.
There are _____________ outliers. In real life, it is unusual for a person to have a high pulse rate and a low respiration rate, or vice versa. Our data mirrors what we would expect to find.
e) Find the equation for the line that would fit best through the points (line of best fit) for this data. Use 2 decimal places.
OUTPUT:
EQUATION:
f) Predict the pulse rate (beats per minute) for a student who takes 18 breaths per minute. Are you comfortable with this prediction? Why or why not?
Prediction:
Your level of comfort and why:
g) Predict the pulse rate (beats per minute) for a student who takes 5 breaths per minute. Are you comfortable with this prediction? Why or why not?
Prediction:
Your level of comfort and why:
h) Find the correlation for this data. Correlation is a measure between -1 and 1 that indicates how well the points fit to the line. It is positive if the slope of the line of best fit is positive, and negative if the slope of the line of best fit is negative. A number close to 1 or -1 for correlation means a strong correlation. A number closer to 0 means a weak correlation.
Output:
Answer:
i) Bold the correct answers below.
The correlation between these two variables is (mild, moderate, strong) and (positive, negative).
The slope of the line of best fit for this data is (positive, negative).
The correlation of two variables and the slope of the line of best fit through the two variables will always (be different, be the same).

6. Recently, statistics students at MacEwan completed a survey. Raw data can be found in the file STATISTICSSTUDENTSURVEYFORR. It contains the column MOALPHABET (a variable that measures the age (in months) at which students were able to recite the alphabet in their first language).
a) Create a frequency histogram of this data. Comment on its shape. Note any peaks.
The middle 3 bars of the histogram are somewhat ___________, with a slight peak at the _________ month bar. The bars on the left and right of the graph contain very _____ values. Although it might be tempting to view this data as normal, the drop off from the middle of the graph to the left and right is not a tidy tapering bell shape, but a more extreme drop. (or something sensible)
b) Create a boxplot of this data. Explain what the whiskers tell us about the shape of the data. Are there any outliers?
The whiskers are _______ in size to the interquartile range. That is, the range in data between the minimum and the first quartile is _____________ as the range in data between the maximum and the third quartile, and the range in data of the interquartile range is also ___________. ______ outliers are noted.
c) Create a normal probability plot of this data. Comment on your findings.
It would be correct to say that as the points don’t deviate too far from a line, there is not a lot of evidence of non-normality in the data when we examine the probability plot. This answer suffices.
However: It would also be correct to notice the point off the line at the left side and the increased spacing between points on the right side and wonder about “outliers”. It would also be correct to notice the slight spikiness about the line in the middle and suggest it might indicate that the data is not a “perfect” bell over the middle of the data. A sensible answer is what we seek!

7. Recently, statistics students at MacEwan completed a survey. Raw data can be found in the file STATISTICSSTUDENTSURVEYFORR. You will create some crosstab tables from this dataset and calculate some probabilities. For all probability questions, include a probability statement, and a solution in fractions, proportion (3 decimals) and percent (1 decimal).
a) STATISTICSSTUDENTSURVEYFORR contains the columns FEDBEST (a variable that measures what federal party a student views as having the best party platform (Conservative, Green, Liberal, or NDP)) and JTGOOD (a variable that indicates whether a student feels Justin Trudeau is doing a good job as Prime Minister (Yes or No)). Find the totals for each of the outcomes in the FEDBEST column. Find the totals for each of the outcomes in the JTGOOD column. Make a crosstab table of the counts for each of the (JTGOOD, FEDBEST) pairs. Use this information to fill in the provided table by hand. If we randomly select a student, what is the probability that they are a Liberal platform fan?

SOFTWARE OUTPUT:

JT GOOD (JUSTIN TRUDEAU GOOD)   FEDBEST (BEST FEDERAL PARTY)
                                Conservative   Green   Liberal   NDP   Total
No
Yes
Total

P(Liberal) =
b) If we randomly select a student, what is the probability that they are a Liberal party platform fan and think that Justin Trudeau is doing a good job?
P( ) =
c) If we randomly select a student, what is the probability that they are a Liberal platform fan or think that Justin Trudeau is doing a good job?
P(Liberal or Yes) = P(Liberal) + P(Yes) – P(Liberal and Yes) =
d) If we randomly select a Liberal platform fan, what is the probability that they think Justin Trudeau is doing a good job?
P( ) =
e) If we randomly select a Justin Trudeau fan (yes group), what is the probability they are a Liberal platform fan?
P( ) =
f) STATISTICSSTUDENTSURVEYFORR contains the column GENDERIDENTITY (a variable that measures which gender students most identify with (Female, Male, or Other)). We wish to consider just the females in the STATISTICSSTUDENTSURVEYFORR dataset. Make a crosstab table of the counts for each of the (JTGOOD, FEDBEST) pairs for just the females. Paste it below. Notice that it does not match the crosstab table calculated in a). Fill in the cell values and the totals by hand and then solve the problems below.

SOFTWARE OUTPUT FOR FEMALES ONLY

JT GOOD (JUSTIN TRUDEAU GOOD)   FEDBEST (BEST FEDERAL PARTY)
                                Conservative   Green   Liberal   NDP   Total
No
Yes
Total

g) If we randomly select a female student, what is the probability she is a Liberal platform fan and thinks that Justin Trudeau is doing a good job? HINT: Consider just the females in the data set.
For a female student, P( ) =
h) If we randomly select a female student, what is the probability she is a Liberal platform fan or thinks that Justin Trudeau is doing a good job? HINT: Consider just the females in the data set.
For a female student, P( ) =
i) If we randomly select a student, what is the probability they are a Liberal platform fan or think Justin Trudeau is doing a good job, given they are female? FORMAL HINT: (Consider just the females in the data set)
For a female student, P( ) =
j) Consider only the females. If we randomly select a Liberal platform fan, what is the probability she thinks Justin Trudeau is doing a good job? INFORMAL: FOR FEMALES, LOOK FOR YESES IN THE LIBERAL GROUP
k) Consider only the females. If we randomly select a Justin Trudeau fan (yes group), what is the probability she prefers the Liberal platform? INFORMAL: FOR FEMALES, LOOK FOR LIBERALS IN THE YES GROUP
l) If we randomly select a female Liberal platform fan, what is the probability she thinks Justin Trudeau is doing a good job?
P(Yes|Female and Liberal) =
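The probability rules used throughout Question 7 ("and", "or", and conditional probability read from a crosstab) can be sketched as below. The counts are HYPOTHETICAL and chosen only to make the arithmetic easy to follow; they are not the survey results, and the names `counts`, `column_total`, and `report` are illustrative.

```python
from fractions import Fraction

# HYPOTHETICAL crosstab counts for illustration only (NOT the survey data).
# Rows are JTGOOD (No/Yes); columns are FEDBEST.
counts = {
    "No":  {"Conservative": 12, "Green": 3, "Liberal": 4,  "NDP": 6},
    "Yes": {"Conservative": 3,  "Green": 2, "Liberal": 16, "NDP": 4},
}

total = sum(sum(row.values()) for row in counts.values())   # grand total
yes_total = sum(counts["Yes"].values())                     # row total for Yes


def column_total(party):
    """Total count for one FEDBEST outcome, summed over both JTGOOD rows."""
    return sum(row[party] for row in counts.values())


liberal_total = column_total("Liberal")
liberal_and_yes = counts["Yes"]["Liberal"]

p_liberal = Fraction(liberal_total, total)
p_yes = Fraction(yes_total, total)
p_liberal_and_yes = Fraction(liberal_and_yes, total)
# Addition rule: P(Liberal or Yes) = P(Liberal) + P(Yes) - P(Liberal and Yes)
p_liberal_or_yes = p_liberal + p_yes - p_liberal_and_yes
# Conditional probability: restrict the denominator to the given group.
p_yes_given_liberal = Fraction(liberal_and_yes, liberal_total)
p_liberal_given_yes = Fraction(liberal_and_yes, yes_total)


def report(label, p):
    """Print a probability as a fraction, a proportion (3 decimals) and a percent (1 decimal)."""
    print(f"{label} = {p} = {float(p):.3f} = {float(p) * 100:.1f}%")


report("P(Liberal or Yes)", p_liberal_or_yes)
report("P(Yes | Liberal)", p_yes_given_liberal)
report("P(Liberal | Yes)", p_liberal_given_yes)
```

For the females-only parts, the same steps apply after replacing `counts` with the crosstab built from the female rows alone, which is why the two tables give different answers.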
Questions 8 and 9: Students may find the following pictures and summary translations of English wording to Statistics wording useful in doing Questions 8 and 9.

BINOMIAL: Helpful Calculation Pictures

BINOMIAL: Helpful Translations: English to Math
Less than 8                        P(X ≤ 7)
At most 7                          P(X ≤ 7)
No more than 7                     P(X ≤ 7)
7 or less                          P(X ≤ 7)
More than 6                        P(X ≥ 7) = 1 – P(X ≤ 6)
At least 7                         P(X ≥ 7) = 1 – P(X ≤ 6)
No less than 7                     P(X ≥ 7) = 1 – P(X ≤ 6)
7 or more                          P(X ≥ 7) = 1 – P(X ≤ 6)
Between 5 and 9 inclusive          P(5 ≤ X ≤ 9) = P(X ≤ 9) – P(X ≤ 4)
More than 5 but less than 9
(between 5 and 9 not inclusive)    P(6 ≤ X ≤ 8) = P(X ≤ 8) – P(X ≤ 5)

8. Consider the data at https://en.wikipedia.org/wiki/Demographics_of_Canada#Visible_minority_population
Suppose we survey 1000 Canadians in 2016. Find the probabilities below, artificially assuming (for the sake of the problem) that we can view each question as a 2016 binomial experiment with 1000 trials where the probability of success is equal to the probability of belonging to the visible minority population of interest expressed in the problem. Give full answers. Use 6 decimal places in your answer. (BTW, the 2016 Canadian census profile at https://www12.statcan.gc.ca/census-recensement/2016/dp-pd/prof/details/page.cfm?Lang=E&Geo1=PR&Code1=01&Geo2=PR&Code2=01&Data=Count&SearchText=canada&SearchType=Begins&SearchPR=01&B1=All&TABID=1 is a super interesting browse.)

In 2016, find the probability that:
(Answer columns: FULL SOFTWARE OUTPUT; SETUP/PROB STATMT/CALCULATION; CONCLUDING STATEMENT)
a) between 30 and 39 (inclusive) identify as Black.
FOR BIN(1000,0.035), P(30 ≤ X ≤ 39) = P(X ≤ 39) – P(X ≤ 29) = 0.7838592 – 0.1724819 = 0.6113773
The probability that between 30 and 39 (inclusive) identify as Black is 0.6113773.
b) more than 52 but less than 55 identify as Chinese.
FOR BIN(1000,0.051),
The probability that
c) at most 15 identify as Latin American.
FOR BIN(1000,0.013), P(X ≤ 15) =
The probability at
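The translation from an English phrase to cumulative binomial probabilities, as in the worked part a) of Question 8, can be checked with a short Python sketch. This is an independent check written with the standard library, not the lab's SOFTWARE; the function names `binom_pmf`, `binom_cdf`, `binom_between`, and `binom_at_least` are hypothetical.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ BIN(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ BIN(n, p); returns 0 when k < 0."""
    return sum(binom_pmf(i, n, p) for i in range(0, min(k, n) + 1))


def binom_between(lo, hi, n, p):
    """P(lo <= X <= hi) = P(X <= hi) - P(X <= lo - 1)."""
    return binom_cdf(hi, n, p) - binom_cdf(lo - 1, n, p)


def binom_at_least(k, n, p):
    """P(X >= k) = 1 - P(X <= k - 1)."""
    return 1 - binom_cdf(k - 1, n, p)


# Worked part a): BIN(1000, 0.035), between 30 and 39 inclusive.
# The lab reports 0.7838592 - 0.1724819 = 0.6113773 for this probability.
print(f"{binom_between(30, 39, 1000, 0.035):.7f}")
```

Summing the probability mass function directly is adequate for the sample sizes in these questions (n up to 1000); a statistics package would normally use a more numerically robust routine.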
d) at least 22 identify as Filipino.
FOR BIN(1000,0.023), P(X ≥ 22) = 1 – P(X ≤ 21) =
The probability that
e) no less than 5 but no more than 7 identify as Korean.
FOR BIN( ),
The probability that
f) no more than 3 identify as Japanese.
FOR BIN( ),
The probability that
g) no less than 50 identify as Aboriginal.
FOR BIN( ),
The probability

9. Consider a binomial experiment where 100 independently randomly chosen University students are asked a yes/no question where the probability of success, p, is equal to the probability a person answered yes. Use SOFTWARE to answer the following questions fully. Use 7 decimal places in your answer.
(Answer columns: Output; SETUP/PROB STATMT/CALCULATION; CONCLUDING STATEMENT)
a) Yes/No Question: Can you program in C? p = 15% said yes (after all, it is a University!) Find the probability that between 12 and 18 students (inclusive) can program in C.
FOR BIN(100,0.15), P(12 ≤ X ≤ 18) = P(X ≤ 18) – P(X ≤ 11) = 0.8371746 – 0.1634862 = 0.6736884
The probability that between 12 and 18 students (inclusive) can program in C is 0.6736884.
b) Yes/No Question: Do you have a vegetable garden? p = 28% said yes (and everyone who did had too much zucchini and rhubarb!) Find the probability that more than 26 students but less than 31 students have a vegetable garden.
FOR BIN(100,0.28), P(27 ≤ X ≤ 30) = P(X ≤ 30) – P(X ≤ 26) = 0.7149122 –
The probability that more than 26 but less than 31 students have a vegetable garden is
c) Yes/No Question: Have you ever parachuted? p = 12% said yes (everyone else wished they had!). Find the probability that at most 9 students said yes.
FOR BIN(100,0.12),
The probability
d) Yes/No Question: Are you an Oilers fan? p = 82% said yes (some students aren’t from Edmonton!) Find the probability that at least 85 students said yes.
FOR BIN( ), P(X ≥ 85) =
The probability that
e) Yes/No Question: Do you believe that an animal called a “Pink Fairy Armadillo” exists? p = 24% said yes (BTW, it does exist, but students got fooled by the silly name!) Find the probability that no less than 20 but no more than 30 students believe that a “Pink Fairy Armadillo” exists.
FOR BIN( ),
The probability that no less than 20 but no more than 30 students believe a “Pink Fairy Armadillo” exists
f) Yes/No Question: Do you find Jeeves and Wooster a better show than Fawlty Towers? p = 52% said yes (that is a difficult question!) Find the probability that no more than 46 students find Jeeves and Wooster a better show than Fawlty Towers.
FOR BIN( ),
The probability that no more than 46 students find Jeeves and Wooster a better show than Fawlty Towers is
g) Yes/No Question: Have you made a TikTok? p = 63% said yes (everyone else planned to make one!) Find the probability that no less than 60 students have made a TikTok.
FOR BIN( ),
The probability no less than 60 students have made a TikTok is

10A. Assume that the wingspan of a certain type of monarch butterfly is normally distributed with a mean of 10.3 cm and a standard deviation of 0.6 cm. Answer the following. (3m each)
a) Find the probability that a monarch butterfly has a wingspan of no less than 11.2 cm.
b) Find the probability that a monarch butterfly has a wingspan of at most 8 cm.
c) A monarch butterfly with a wingspan in the bottom 2% of wingspans is considered “exquisite”. Find the highest wingspan that a monarch butterfly considered exquisite will have.
d) A monarch butterfly with a wingspan in the top 1% of wingspans is considered “extraordinary”. Find the wingspan that a monarch butterfly would need to have to be considered extraordinary.

(Answer columns: SOFTWARE OUTPUT; SETUP, P STATEMENTS, CALCULATION; CONCLUDING STATEMENT)
a) For N(10.3, 0.6), P(X > 11.2) =
The probability that a monarch butterfly has a wingspan of no less than 11.2 cm is ___________
b) For N(10.3, 0.6), P(X < 8) = _____
The probability that a monarch butterfly has a wingspan of at most 8 cm is _______
c) For N(10.3, 0.6), want x such that P(X < x) = ___
The highest wingspan that an “exquisite” monarch butterfly could have is ____________
d) For N(10.3, 0.6), want x such that P(X > x) = ___, the same x for which P(X < x) = ____
A monarch butterfly will need a wingspan of _________ or greater to be considered “extraordinary”.

10B. To encourage basil plants to be “bushy” and contain many leaves, it is common practice among growers to prune away the top pair of leaves every time a branch grows to contain 3 pairs of leaves: after this, two new branches will form. This pruning pattern is repeated every time three more pairs of leaves grow on any new branch. The time to first pruning of basil plants is known to be normally distributed with an average of 35 days and a standard deviation of 3 days. Answer the following. Include full probability statements, calculations, and conclusions.
Time to first pruning for a basil plant is N(35, 3).
a) Find the probability a randomly chosen basil plant has its first pruning before it is 30 days old.
Commands and Output: Distributions > Continuous Distributions > Normal Distributions > Normal Probabilities; Variable value(s): 30; Mean: 35; Standard deviation: 3
For N(35, 3), find P(X < 30) = 0.04779035
The probability a randomly chosen basil plant has its first pruning before it is 30 days old is 0.04779035.
b) Find the probability a randomly chosen basil plant has its first pruning after it is 39 days old.
Commands and Output
For N(35, 3), find P(X > 39) =
The probability a randomly chosen basil plant has its first pruning after it is 39 days old is __________
c) Kathleen and her students grow basil plants. They collect seeds from early starter basil plants that have their first pruning at a young age (plants in the bottom 5% of days old for their first pruning). Find the number of days (or younger) of an early starter basil plant.
Commands and Output
For N(35, 3), want ? such that P(X < ?) = 0.05
The number of days for an early starter basil plant is _____________ days (or younger).
d) Sometimes a student finds they have a late starter plant that does not reach its age of first pruning until a later age (plants in the top 2% of days old for first pruning). Find the number of days (or older) of a late starter basil plant.
Commands and Output
For N(35, 3), want ? such that P(X > ?) = 0.02
Want ? such that P(X < ?) = ___
The number of days for a late starter basil plant is _____________ (or older).

11. A pizza delivery company advertises that they will give you your pizza for free if it takes them more than a certain number of minutes to deliver it to you. They wish to only have to give away free pizzas at most 0.5% of the time, and their delivery time follows a normal distribution with a mean of 20 minutes and a standard deviation of 3 minutes. They wish to advertise a number in whole minutes.
a) What number of minutes will they advertise? Why? (4 m)
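The three kinds of normal calculation used in Questions 10A to 11 (a probability below a value, a probability above a value, and a cutoff for a given tail area) can be sketched with Python's `statistics.NormalDist`. This is an illustration using the N(35, 3) basil setup from Question 10B, not the lab's SOFTWARE; the helper name `upper_tail_cutoff` is hypothetical.

```python
from statistics import NormalDist

# Time to first pruning for a basil plant, N(35, 3), from Question 10B.
pruning = NormalDist(mu=35, sigma=3)

# P(X < 30): the worked part a), which the lab reports as 0.04779035.
p_before_30 = pruning.cdf(30)

# P(X > 39) is found from the complement: 1 - P(X < 39).
p_after_39 = 1 - pruning.cdf(39)


def lower_tail_cutoff(dist, tail):
    """x such that P(X < x) = tail."""
    return dist.inv_cdf(tail)


def upper_tail_cutoff(dist, tail):
    """x such that P(X > x) = tail, i.e. the same x for which P(X < x) = 1 - tail."""
    return dist.inv_cdf(1 - tail)


print(f"P(X < 30) = {p_before_30:.8f}")
```

An upper-tail cutoff is always converted to a lower-tail statement first, which mirrors the "Want ? such that P(X < ?) = ___" line in the answer templates.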
(Answer columns: SOFTWARE OUTPUT; SETUP, P STATEMENTS, CALCULATION; CONCLUDING STATEMENT)
For N(20, 3), want x such that P(X > x) = 0.005, the same x for which P(X < x) = ______
A pizza delivered
b) What is the probability a pizza will be delivered between 22 and 25 minutes?
(Answer columns: SOFTWARE OUTPUT; SETUP, P STATEMENTS, CALCULATION; CONCLUDING STATEMENT)
For N(20, 3), P(22 < X < 25) =
The probability a pizza will be delivered between 22 and 25 minutes is

12. R/Rcmdr can be used to generate random samples for various probability distributions including the normal distribution. This was done to generate a dataset of 1000 samples of size 4 (that is, randomly sample 1000 samples of size 4) from a normal parent population distribution with a mean µ = 100 and a standard deviation σ = 24. A random seed of 2348 was used. The results were stored in an Excel file called normalmean100sigma24seed2348samples1000n4. The first five lines of this data file are shown below. Open this file to access the resulting sample data so as to answer the following questions. (NOTE: students are not required to know how to generate random data in this fashion, only how to do similar calculations to the problems below.)
a) Recall that a sampling distribution of sample means when taking all possible samples of size 4 from a population with mean µ and standard deviation σ will have a mean of µ = ____ and a standard deviation of σ/√n = ____.
b) Use SOFTWARE to calculate the mean and standard deviation of the column of 1000 sample means that you generated from your samples of size 4. Report your answers.
c) The mean and standard deviation of the column of 1000 sample means will “approximate” the actual mean and standard deviation of a sampling distribution of all possible sample means of size 4 taken from your normal parent population. Are the mean and standard deviation of your 1000 sample means close to a mean of µ = 100 and a standard deviation of σ/√n = 24/√4 = 12?
d) Use SOFTWARE to create a frequency histogram of the column of 1000 sample means in your dataset. This histogram shape will “approximate” the sampling distribution shape of all possible sample means of size 4 from your normal parent population. Paste it below and comment on the shape of it. Does your frequency distribution appear normal? Why or why not?
Graph
Comment:
e) Use SOFTWARE to calculate the median of your column of 1000 sample means that you generated from your samples of size 4. Does the median of the sample dataset appear close to the mean µ = 100 of your normal parent population? (Recall that for a normal distribution, the mean and median of the population are equal.)
f) A boxplot can flag problems that might indicate that a dataset is non-normal (although it cannot indicate if a dataset distribution appears normal as a histogram can). Use SOFTWARE to create a boxplot from the column of 1000 sample means (from the samples of size 4) in your dataset. Paste your boxplot below.
g) We will see if the boxplot gives any indication that your dataset is non-normal.
1) Check the scale on your boxplot and report (roughly) the values of any outliers the boxplot shows. (minimum, Q1, median, Q3, and maximum) of the dataset.
2) Are the lengths of the whiskers relatively equal?
3) Is the box from Q1 to Q3 roughly split in two by the median?
4) Do any of your findings above give you an indication that your dataset might be non-normal?
1) There are a few “outliers” identified. Upper outliers range (roughly) from 135 to 140. Lower outliers range (roughly) from 62 to 68.
2) Yes, the lengths of the whiskers appear the same.
3) Yes, the median appears to split the box from Q1 to Q3 equally.
4) No, there is no flag that indicates that the data may be non-normal. There are only a very few computer identified “outliers” and they are all within 3-4 standard deviations of the middle of the data. The data is balanced about its middle. However, we must remember that a balanced boxplot with equal length whiskers does not “prove” normality.
h) A normal probability plot that generates a relatively straight line of points when generated from a dataset indicates normality of the data in that dataset. Use SOFTWARE to create a normal probability plot from the column of 1000 sample means (from the samples of size 4) in your dataset. Paste your normal probability plot below.
i) Does the normal probability plot you created suggest that the dataset of 1000 sample means that you generated (from samples of size 4) *might* be non-normal? Why or why not?
Answer:
j) Why were the mean and standard deviation of your dataset of 1000 sample means not *necessarily* exactly µ and σ/√n?
k) If we have a normal parent population with mean µ and standard deviation σ, and we examine the shape of the sampling distribution of all possible means of size n taken from that parent population, what will we find? Consider different values of n in your answer.
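The idea behind Question 12 (1000 sample means from samples of size 4 should have a mean near µ = 100 and a standard deviation near σ/√n = 24/√4 = 12, but not exactly those values) can be sketched with a small simulation. This is an assumption-laden illustration in Python: it reuses the seed number 2348 only as a label, and Python's generator will NOT reproduce the values in the lab's Excel file, which were generated in R/Rcmdr.

```python
import random
import statistics

MU, SIGMA = 100, 24        # normal parent population from Question 12
N, SAMPLES = 4, 1000       # 1000 samples, each of size 4

# Same seed number as the lab, but a different generator, so different data.
rng = random.Random(2348)

# One sample mean per simulated sample of size N.
sample_means = [
    statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(N))
    for _ in range(SAMPLES)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
median_of_means = statistics.median(sample_means)
theoretical_sd = SIGMA / N ** 0.5   # sigma / sqrt(n) = 24 / 2 = 12

print(f"mean of sample means:   {mean_of_means:.3f} (theory: {MU})")
print(f"sd of sample means:     {sd_of_means:.3f} (theory: {theoretical_sd})")
print(f"median of sample means: {median_of_means:.3f}")
```

Because only 1000 of the infinitely many possible samples are drawn, the simulated mean and standard deviation land close to, but not exactly on, the theoretical values, which is the point of part j).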
13. The χ² (pronounced Chi-square) distributions form a family of right skewed distributions where a parameter called “degrees of freedom” determines where the peak of the distribution is and how skewed the distribution is. The mean of the χ² distribution is equal to the number of degrees of freedom. The variance of a χ² distribution is equal to two times the number of degrees of freedom. Here is a picture of a χ² distribution with degrees of freedom 5. It has mean µ = 5 and standard deviation σ = √(σ²) = √10 = 3.162278 (to 6 decimals).
R/Rcmdr can be used to generate random samples for various probability distributions including the χ² distribution. The SOFTWARE was used to generate a dataset of 1000 samples of size 4 (that is, randomly sample 1000 samples of size 4) from a χ² parent population distribution with degrees of freedom 5 (hence a mean µ = 5 and a standard deviation σ = 3.162278). A random seed of 6292 was used. The SOFTWARE was also used to generate a column that contains the sample means for each of the 1000 samples. The results are stored in the Excel file chisquaredf5seed6292samples1000n4. The first five lines of this data file are shown below. Open this file to access the resulting sample data so as to answer the following questions. (NOTE: students are not required to know how to generate random data in this fashion, only how to do similar calculations to the problems below.)
a) Recall that a sampling distribution of sample means when taking all possible samples of size 4 from a population with mean µ and standard deviation σ will have a mean of µ = ____ and a standard deviation of σ/√n = ____.
b) Use SOFTWARE to calculate the mean and standard deviation of the column of 1000 sample means that you generated from your samples of size 4. Report your answers.
c) The mean and standard deviation of the column of 1000 sample means will “approximate” the actual mean and standard deviation of a sampling distribution of all possible sample means of size 4 taken from your χ² parent population. Are the mean and standard deviation of your 1000 sample means close to a mean of µ = 5 and a standard deviation of σ/√n = 3.162278/√4 = 1.581139?
d) Use SOFTWARE to create a frequency histogram of the column of 1000 sample means in your dataset. This histogram shape will “approximate” the sampling distribution shape of all possible sample means of size 4 from your χ² parent population. Paste it to the left and comment on the shape of it. Does your frequency distribution appear normal? Why or why not?
Comment:
e) A boxplot flags situations that might indicate that a dataset is non-normal (although it cannot indicate if a dataset distribution appears normal as a histogram can). Use SOFTWARE to create a boxplot from the column of 1000 sample means (from the samples of size 4) in your dataset. Paste your boxplot to the left.
f) We will see if the boxplot gives any indication that your dataset is non-normal.
1) Check the scale on your boxplot and report (roughly) the values of any outliers the boxplot shows. (minimum, Q1, median, Q3, and maximum) of the dataset.
2) Are the lengths of the whiskers relatively equal?
3) Is the box from Q1 to Q3 roughly split in two by the median?
4) Do any of your findings above give you an indication that your dataset might be non-normal?
Answer
g) A normal probability plot that shows a relatively straight line of points indicates that assuming normality of the data in that dataset is reasonable. Use SOFTWARE to create a normal probability plot from the column of 1000 sample means (from the samples of size 4) in your dataset. Paste your normal probability plot to the left.
h) Does the normal probability plot you created suggest that the dataset of 1000 sample means (from samples of size 4) that you generated might not be normal? Why or why not? Answer:
i) Why were the mean and standard deviation of your 1000 sample means (from samples of size 4) not *necessarily* exactly equal to µ = 5 and σ/√n = 3.162278/√4 = 1.581139?
j) If we have a χ² parent population with mean µ and standard deviation σ, and we examine the shape of the sampling distribution of all possible sample means of size n taken from that parent population, what will we find? Consider different values of n in your answer. Did you find what you expected to find when n = 4? What is the name of the theorem you used to get your answer?
14. The SOFTWARE R/Rcmdr was used to generate a dataset of 1000 samples of size 36 (that is, randomly sample 1000 samples of size 36) from a χ² parent population distribution with 5 degrees of freedom (hence a mean µ = 5 and a standard deviation σ = √σ² = √10 = 3.162278). A random seed of 7891 was used. The SOFTWARE was also used to generate a column that contains the sample means for each of the 1000 samples. The results are stored in the Excel file chisquaredf5seed7891samples1000n36. The first five lines of this data file are shown below (not all the columns are shown). Open this file to access the resulting sample data so as to answer the following questions. (NOTE: students are not required to know how to generate random data in this fashion, only how to do calculations similar to the problems below.)
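The sampling experiment described in questions 13 and 14 can be sketched in a few lines. The worksheet uses R/Rcmdr; the Python below is only an illustrative stand-in that uses the standard library. It assumes that a χ² variable with df degrees of freedom can be drawn as a gamma variable with shape df/2 and scale 2. The function names are hypothetical, and although the seeds 6292 and 7891 are reused here, Python's generator will not reproduce the values stored in the course Excel files.

```python
import math
import random
import statistics


def simulate_sample_means(n, df=5, num_samples=1000, seed=6292):
    """Draw num_samples samples of size n from a chi-square(df) population
    and return the list of their sample means."""
    rng = random.Random(seed)
    means = []
    for _ in range(num_samples):
        # chi-square(df) is gamma with shape df/2 and scale 2
        sample = [rng.gammavariate(df / 2, 2.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means


def theoretical_sd_of_mean(n, df=5):
    """sigma / sqrt(n), where sigma = sqrt(2 * df) for a chi-square(df) population."""
    return math.sqrt(2 * df) / math.sqrt(n)


if __name__ == "__main__":
    for n, seed in [(4, 6292), (36, 7891)]:
        means = simulate_sample_means(n, seed=seed)
        print(f"n = {n}")
        print(f"  mean of the 1000 sample means: {statistics.fmean(means):.4f} (theory: 5)")
        print(f"  sd of the 1000 sample means:   {statistics.stdev(means):.4f} "
              f"(theory: {theoretical_sd_of_mean(n):.6f})")
```

Running it for both n = 4 and n = 36 gives simulated values to set beside the theoretical µ and σ/√n, which is the comparison parts b) and c) ask for.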
a) Recall that a sampling distribution of sample means, when taking all possible samples of size 36 from a population with mean µ and standard deviation σ, will have a mean of µ = ____ and a standard deviation of σ/√n = ____.
b) Use SOFTWARE to calculate the mean and standard deviation of the column of 1000 sample means that you generated from your samples of size 36. Report your answers.
c) The mean and standard deviation of the column of 1000 sample means will “approximate” the actual mean and standard deviation of a sampling distribution of all possible sample means of size 36 taken from your χ² parent population. Are the mean and standard deviation of your 1000 sample means close to a mean of µ = 5 and a standard deviation of σ/√n = 3.162278/√36 = 0.527046?
d) Use SOFTWARE to create a frequency histogram of the column of 1000 sample means in your dataset. This histogram shape will “approximate” the sampling distribution shape of all possible sample means of size 36 from your χ² parent population. Paste it below and comment on its shape. Does your frequency distribution appear normal? Why or why not? Comment:
e) A boxplot can flag situations that might indicate that a dataset is non-normal (although it cannot indicate whether a dataset distribution appears normal, as a histogram can). Use SOFTWARE to create a boxplot from the column of 1000 sample means (from the samples of size 36) in your dataset. Paste your boxplot to the left.
f) We will see if the boxplot gives any indication that your dataset is non-normal.
1) Check the scale on your boxplot and report (roughly) the values of any outliers the boxplot shows, along with the five-number summary (minimum, Q1, median, Q3, and maximum) of the dataset.
2) Are the lengths of the whiskers relatively equal?
3) Is the box from Q1 to Q3 roughly split in two by the median?
4) Do any of your findings above give you an indication that your dataset might be non-normal? Answer:
g) A normal probability plot that shows a relatively straight line of points indicates that assuming normality of the data in that dataset is reasonable. Use SOFTWARE to create a normal probability plot from the column of 1000 sample means (from the samples of size 36) in your dataset. Paste your normal probability plot to the left.
h) Does the normal probability plot you created suggest that the dataset of 1000 sample means (from samples of size 36) that you generated might be non-normal? Why or why not? Answer:
i) Why were the mean and standard deviation of your 1000 sample means (from samples of size 36) not *necessarily* exactly equal to µ = 5 and σ/√n = 3.162278/√36 = 0.527046?
j) If we have a χ² parent population with mean µ and standard deviation σ, and we examine the shape of the sampling distribution of all possible sample means of size n taken from that parent population, what will we find? Consider different values of n in your answer. Did you find what you expected to find when n = 36? What is the name of the theorem you used to get your answer?
15. The weekly travel time for a population of students is normal with a mean of 200 minutes and a standard deviation of 10 minutes.
a) What is the probability that a single student drawn from the population travels at least 196 minutes a week?
b) A random sample of 25 students is drawn from the population. What is the probability that the mean travel time of the 25 students is no less than 196 minutes a week?
c) Suppose the population above was of an unknown shape. Explain (using statistical reasoning) why you would or would not feel comfortable calculating the above probabilities. (2m)
SOFTWARE OUTPUT | SETUP/FULL PROBABILITY STATEMENTS AND CALCULATION/CONCLUDING STATEMENT
(a) Normal with mean = ____ and standard deviation = ____. For N( ____ ), P(X > 196) = ____. The probability that the student travels at least 196 minutes a week is ____.
(b) Normal with mean = ____ and standard deviation = ____. For N( ____ ), P(X̄ > 196) = ____. The probability that ____.
(c) Acceptable Answer 1: Comfortable because ____. Acceptable Answer 2: Not comfortable because ____. Acceptable Answer 3: Unsure because ____.
16. A random sample of 1280 students is taken from a larger population and it reveals that the sample mean (average) amount for their student loans is $18,900.00. Suppose it is known that the standard deviation σ of the population is $40,222.00. Find a 90% confidence interval for the true population mean µ.
a) What do you know about the mean, standard deviation, and shape of the sampling distribution of X̄? Why do we know the sampling distribution has the indicated shape?
Mean: µ = ____
Standard Deviation: σ/√n = ____
Approximate Shape of Sampling Distribution: ____
Why: Sample size is large: ____
b) Calculate a 90% confidence interval for the population mean. Interval: ____ Output: ____
c) Interpret your 90% confidence interval. We are 90% confident that ____
d) Fill in the blanks. We (do, do not) know if our particular interval contains the true population mean.
e) For a 100(1 − α)% confidence level, we define a significance level that is equal to 100α%. The confidence level and the significance level add to 100%. Basically, for now, you can think of the significance level as our willingness to be wrong when we seek to determine if there is statistical evidence that a population mean differs from a given amount. What is our significance level for this problem?
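The interval in question 16 b) follows the usual known-σ formula, x̄ ± z*·σ/√n. A minimal Python sketch is given below as an illustration; the course SOFTWARE is R/Rcmdr, and the function name `z_confidence_interval` is hypothetical. It relies on `statistics.NormalDist` from the Python standard library for the critical value.

```python
import math
from statistics import NormalDist


def z_confidence_interval(xbar, sigma, n, conf_level):
    """Two-sided confidence interval for mu when sigma is known."""
    alpha = 1 - conf_level
    # critical value z* with area alpha/2 in the upper tail
    z_star = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z_star * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin


if __name__ == "__main__":
    lower, upper = z_confidence_interval(xbar=18900, sigma=40222, n=1280, conf_level=0.90)
    print(f"90% confidence interval for mu: ({lower:.2f}, {upper:.2f})")
    for value in (20000, 30000):
        inside = lower <= value <= upper
        print(f"${value:,} is {'inside' if inside else 'outside'} the interval")
```

The loop at the end mirrors parts f) and g), which ask whether $20,000 and $30,000 fall inside or outside the interval.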
f) A friend wonders if the population mean differs from $20,000. What do you tell your friend? At the ___% significance level, since $20,000 falls (inside, outside) the 90% confidence interval, we (do, do not) have significant evidence that the population mean differs from $20,000.
g) A friend wonders if the population mean differs from $30,000. What do you tell your friend? At the ___% significance level, since $30,000 falls (inside, outside) the 90% confidence interval, we (do, do not) have significant evidence that the population mean differs from $30,000.
h) Perform a hypothesis test to determine if there is evidence that the population mean for student loans differs from $20,000.00. Use a level of significance of 10%. You can calculate the z value (X̄ − µ)/(σ/√n) by hand, but SOFTWARE will find the p-value for this problem. Output for p-value: ____ Answer: Fill in the table. Be sure to verify z = -0.97843957 and fill in the remaining boxes.
Hypotheses: H₀: ____ Hₐ: ____
Test stat: X̄ = ____ z = (X̄ − µ)/(σ/√n) = ____
P-value: p-value statement: 2P(X̄ > ____) = 2P(Z > ____); p-value = ____
Decision and Reason: (Reject, Do not reject) H₀ because p-value (≤, >) α
Decision in “English”: At the 10% significance level, there (is, is not) significant evidence that the population mean student loan differs from $20,000.
i) Perform a hypothesis test to determine if there is evidence that the population mean for student loans exceeds $17,000.00. Use a level of significance of 1%. You can calculate the z value (X̄ − µ)/(σ/√n) by hand, but SOFTWARE will find the p-value for this problem. Recall x̄ = 18900, σ = 40222, σ/√n = 40222/√1280 = 1124.24. Output for p-value: ____ OR Answer: Fill in the table. Be sure to verify z = 1.69003199 and fill in the remaining boxes.
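The z values quoted in parts h), i), and j) of question 16 can be checked with a short script. The sketch below is illustrative Python, not the R/Rcmdr output the worksheet expects; `z_test` and its `alternative` labels are hypothetical names chosen for this example.

```python
import math
from statistics import NormalDist


def z_test(xbar, mu0, sigma, n, alternative):
    """One-sample z test with known sigma.

    alternative is one of "two-sided", "greater", or "less".
    Returns the pair (z, p_value).
    """
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    std_normal = NormalDist()
    if alternative == "two-sided":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    elif alternative == "greater":
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":
        p_value = std_normal.cdf(z)
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return z, p_value


if __name__ == "__main__":
    # (hypothesized mean, alternative, significance level) for parts h), i), j)
    cases = [(20000, "two-sided", 0.10), (17000, "greater", 0.01), (21000, "less", 0.05)]
    for mu0, alternative, alpha in cases:
        z, p = z_test(18900, mu0, 40222, 1280, alternative)
        decision = "Reject H0" if p <= alpha else "Do not reject H0"
        print(f"mu0 = {mu0}, {alternative}: z = {z:.8f}, p-value = {p:.4f}, {decision}")
```

The decision rule in the loop is the one stated in the tables: reject H₀ when the p-value is at most α.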
USE R TO FIND p-value
Hypotheses: H₀: ____ Hₐ: ____
Test stat: X̄ = ____ z = (X̄ − µ)/(σ/√n) = ____ (by hand)
P-value: p-value statement: P(X̄ > ____) = P(Z > ____) = ____; p-value = ____
Decision and Reason: (Reject, Do not reject) H₀ because p-value (≤, >) α
Decision in “English”: At the 1% significance level, there (is, is not) significant evidence that the population mean student loan exceeds $17,000.
j) Perform a hypothesis test to determine if there is evidence that the population mean for student loans lies below $21,000.00. Use a level of significance of 5%. You can calculate the z value (X̄ − µ)/(σ/√n) by hand, but SOFTWARE will find the p-value for this problem. Recall x̄ = 18900, σ = 40222, σ/√n = 40222/√1280 = 1124.24. Output for p-value: ____ OR Answer: Fill in the table. Be sure to verify z = -1.86793008 and fill in the remaining boxes.
Hypotheses: H₀: ____ Hₐ: ____
Test stat: X̄ = ____ z = (X̄ − µ)/(σ/√n) = ____ (by hand)
P-value: p-value statement: P(X̄ < ____) = P(Z < ____) = ____; p-value = ____
Decision and Reason: (Reject, Do not reject) H₀ because p-value (≤, >) α
Decision in “English”: At the 5% significance level, there (is, is not) significant evidence that the population mean student loan lies below $21,000.
17. A random sample of 1280 students is taken from a larger normal population and it reveals that the sample mean (average) amount for their student loans is $18,900.00 and the sample standard deviation is $45,192.00. You are asked to find a 95% confidence interval for the true population mean.
a) What do you know about the mean, standard deviation, and shape of the sampling distribution of X̄? Why do we know the sampling distribution has the indicated shape?
Mean: µ = unknown
Standard Deviation: σ/√n is unknown, because σ is unknown.
Estimated Standard Deviation: s/√n = 45192/√1280 = 1263.15
Approximate Shape of Sampling Distribution: Normal
(Because the population is normal, X̄ is N(µ, σ/√n) and (X̄ − µ)/(σ/√n) is N(0,1).)
(Because the population is normal, (X̄ − µ)/(s/√n) has a t distribution with n − 1 df.)
(Because n is large, the t distribution is close to the normal shape of a N(0,1) distribution.)
b) Calculate a 95% confidence interval for the population mean. Use software to help. The 95% confidence interval for µ is: ____ Output for the critical value t₀.₀₂₅ with 1279 df: ____
c) Interpret your 95% confidence interval. We are 95% confident that ____
d) Fill in the blanks. We (do, do not) know if our particular interval contains the true population mean.
e) For a 100(1 − α)% confidence level, we define a significance level that is equal to 100α%. The confidence level and the significance level add to 100%. Basically, you can think of the significance level as our willingness to be wrong when we seek to determine if there is significant statistical evidence that a population mean differs from a given amount. What is our significance level for this problem?
f) A friend wonders if the population mean differs from $20,000. What do you tell your friend? At the ___% significance level, since $20,000 falls (inside, outside) the 95% confidence interval, we (do, do not) have significant evidence that the population mean differs from $20,000.
g) A friend wonders if the population mean differs from $30,000. What do you tell your friend? At the ___% significance level, since $30,000 falls (inside, outside) the 95% confidence interval, we (do, do not) have significant evidence that the population mean differs from $30,000.
h) Perform a hypothesis test to determine if there is evidence that the population mean for student loans differs from $20,000.00. Use a level of significance of 5%. You will do most of this problem by hand. You can calculate the test statistic t = (x̄ − µ)/(s/√n) for this data by hand, and your p-value can be found using SOFTWARE. Output for your p-value: ____
Hypotheses: H₀: ____ Hₐ: ____
Test stat/df: t = (X̄ − µ)/(s/√n) = ____ df = ____
P-value: p-value statement: ____ p-value = ____
Decision and Reason: (Reject, Do not reject) H₀ because p-value (≤, >) α
Decision in “English”: At the 5% significance level, there (is, is not) significant evidence that the population mean student loan differs from $20,000.00.
i) Perform a hypothesis test to determine if there is evidence that the population mean for student loans is greater than $17,000.00. Use a level of significance of 5%. You will do most of this problem by hand. You can calculate the test statistic t = (x̄ − µ)/(s/√n) for this data by hand, and your p-value can be found using SOFTWARE. Output for your p-value: ____
Hypotheses: H₀: ____ Hₐ: ____
Test stat/df: t = (X̄ − µ)/(s/√n) = ____ df = ____
P-value: p-value statement: ____ p-value = ____
Decision and Reason: (Reject, Do not reject) H₀ because p-value (≤, >) α
Decision in “English”: At the 5% significance level, there (is, is not) significant evidence that the population mean student loan is greater than $17,000.
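The "by hand" part of the question 17 tests is the calculation of t = (x̄ − µ)/(s/√n) and its degrees of freedom. The short Python sketch below is one way to check that arithmetic, offered as an illustration only. The helper name `t_statistic` is hypothetical, and the p-value is deliberately left to the course SOFTWARE, as the question instructs.

```python
import math


def t_statistic(xbar, mu0, s, n):
    """Return (t, df) for a one-sample t test computed from summary statistics."""
    standard_error = s / math.sqrt(n)  # estimated standard deviation of the sample mean
    return (xbar - mu0) / standard_error, n - 1


if __name__ == "__main__":
    print(f"s / sqrt(n) = {45192 / math.sqrt(1280):.2f}")
    # hypothesized means from parts h), i), and j)
    for mu0 in (20000, 17000, 22000):
        t, df = t_statistic(xbar=18900, mu0=mu0, s=45192, n=1280)
        print(f"mu0 = {mu0}: t = {t:.4f}, df = {df}")
```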
j) Perform a hypothesis test to determine if there is evidence that the population mean for student loans is less than $22,000. Use a level of significance of 5%. You will do most of this problem by hand. You can calculate the test statistic t = (x̄ − µ)/(s/√n) for this data by hand, and your p-value can be found using SOFTWARE. Output for your p-value: ____
Hypotheses: H₀: ____ Hₐ: ____
Test stat/df: t = (X̄ − µ)/(s/√n) = ____ df = ____
P-value: p-value statement: ____ p-value = ____
Decision and Reason: (Reject, Do not reject) H₀ because p-value (≤, >) α = 0.05
Decision in “English”: At the 5% significance level, there (is, is not) significant evidence that the population mean student loan is less than $22,000.
18. In August 2009, an article in Canadian Family Physicians, “Health Practises of Canadian Physicians”, indicated that a large population of Canadian doctors reported, on average, a total of 281 minutes per week of exercise. Today, you take a random sample of 20 doctors in Canada and ask them to report their minutes of exercise per week. Students should enter the data below into a column in SOFTWARE. The file DOCTOREXERCISEDEMO also contains this data.
291 298 287 269 266 298 281 283 292 306 311 268 276 283 285 316 294 296 279 266
a) Use SOFTWARE to create a 90% confidence interval for the average weekly minutes of exercise of doctors in Canada today. At the 10% significance level, is there evidence that the average weekly minutes of exercise for Canadian doctors has changed from 281 minutes? Fill in the table below. Circle or capitalize the answers chosen in the sentences.
Output: ____
Conf Level | Signif Level | Conf Interval | Sample Mean
90% | ____ | ____ | ____
Summary Statement: We can be 90% confident that ____
English Statement: At the ____ level of significance, because 281 minutes is (inside, outside) the 90% confidence interval, there (is, is not) significant evidence that the average weekly minutes of exercise for Canadian doctors has changed from 281 minutes.
b) Fully and completely state the two assumptions that must be met to perform the one-sample t confidence interval test above. (3m)
1.
2.
c) A frequency histogram of the sample data appears below. Examine it, and then highlight or circle your chosen answers in the bracketed parts of the sentences below. (4m)
The shape of the histogram of the sample data for doctor exercise minutes (does, does not) appear normal as the spike of probability at the left side is (too low, too high). The population distribution of weekly doctor exercise minutes is assumed (normal, skewed right, skewed left) to solve the problem. A sample of size 20 (will, will not) give us some indication of the shape of the population, but we’d like it to be (smaller, larger).
d) Indicate whether each of the following statements is true or false. (7m)
i. 90% of doctors exercise weekly for the number of minutes between the endpoints of your interval above.
ii. The width of a 99% confidence interval created from the data above would be larger than the width of a 90% confidence interval.
iii. The shape of a population distribution is always the same as the shape of a test statistic distribution.
iv. There is a probability of 90% that the true mean µ of weekly doctor exercise minutes is in the 90% confidence interval created above.
v. A 90% confidence interval created from a sample of size 1000 from the population of weekly doctor exercise minutes would have a greater chance of capturing the true mean µ of weekly doctor exercise minutes than the 90% confidence interval you created with the sample of size 20 above.
vi. 90% of all intervals created with this formula from many repeated samples of the same size from the population of weekly doctor exercise minutes will contain the true mean µ of weekly doctor exercise minutes.
vii. Today, we can be sure that the true mean µ of weekly doctor exercise minutes is 281 minutes.
e) Perform a hypothesis test to determine if there is evidence that the average weekly minutes of exercise for Canadian doctors has changed from 281 minutes. Use a level of significance of 10%.
Output: ____
Hypotheses: H₀: µ ____ Hₐ: µ ____
Test stat/df: t = ____ df = ____
P-value: p-value statement: ____ p-value = ____
Decision and Reason: (Reject, Do not reject) H₀ because p-value (≤, >) α
Decision in “English”: At the ___% significance level, there (is, is not) significant evidence that ____
f) Perform a hypothesis test to determine if there is evidence that the average weekly minutes of exercise for Canadian doctors exceeds 279 minutes. Use a level of significance of 5%.
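The summary statistics behind question 18 can be computed directly from the 20 values listed in the problem. The Python sketch below is an illustrative check and does not replace the R/Rcmdr output the worksheet asks for. The critical value 1.729 for 19 df with 0.05 in the upper tail is taken from a standard t table and is an input assumed here, not something the script derives; the variable and function names are hypothetical.

```python
import math
import statistics

# weekly exercise minutes for the 20 sampled doctors (from the problem statement)
exercise_minutes = [291, 298, 287, 269, 266, 298, 281, 283, 292, 306,
                    311, 268, 276, 283, 285, 316, 294, 296, 279, 266]

n = len(exercise_minutes)
xbar = statistics.mean(exercise_minutes)
s = statistics.stdev(exercise_minutes)   # sample standard deviation (n - 1 divisor)
standard_error = s / math.sqrt(n)
df = n - 1

# assumed t-table value: upper-tail area 0.05 with 19 df, for a 90% two-sided interval
T_CRIT_90_DF19 = 1.729
ci_low = xbar - T_CRIT_90_DF19 * standard_error
ci_high = xbar + T_CRIT_90_DF19 * standard_error


def t_stat(mu0):
    """One-sample t statistic for the hypothesized mean mu0."""
    return (xbar - mu0) / standard_error


if __name__ == "__main__":
    print(f"n = {n}, mean = {xbar}, s = {s:.4f}, s/sqrt(n) = {standard_error:.4f}, df = {df}")
    print(f"90% confidence interval: ({ci_low:.2f}, {ci_high:.2f})")
    # hypothesized means from parts e), f), and g)
    for mu0 in (281, 279, 295):
        print(f"mu0 = {mu0}: t = {t_stat(mu0):.4f}")
```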
Output: ____
Hypotheses: H₀: µ ____ Hₐ: µ ____
Test stat/df: t = ____ df = ____
P-value: p-value statement: ____ p-value = ____
Decision and Reason: (Reject, Do not reject) H₀ because p-value (≤, >) α
Decision in “English”: At the 5% significance level, there (is, is not) significant evidence that the mean weekly exercise minutes for doctors exceeds 279 min.
g) Perform a hypothesis test to determine if there is evidence that the average weekly minutes of exercise for Canadian doctors lies below 295 minutes. Use a level of significance of 5%.
Output: ____
Hypotheses: H₀: µ ____ Hₐ: µ ____
Test stat/df: t = ____ df = ____
P-value: p-value statement: ____ p-value = ____
Decision and Reason: (Reject, Do not reject) H₀ because p-value (≤, >) α
Decision in “English”: At the 5% significance level, there (is, is not) significant evidence that the mean weekly exercise minutes for doctors lies below 295 min.
h) We examined a histogram of the sample data above, and noted that we assumed normality of the population to do the question even though the sample histogram showed a non-normal bulk of values on the left side.
Create a QQ plot of the sample data, and comment on whether it indicates that the population could be considered normal. (4m)
19. The file ALLANSMIDTERMDATA contains a column of midterm grades obtained by 20 students randomly selected from a Calculus class taught by Allan.
a) Use SOFTWARE to obtain a 95% and a 99% confidence interval for the mean midterm grade. Paste your output below. (2m)
b) Fill in the following tables, and in each case, test whether there is significant evidence that the true average score of all calculus students taking the midterm differs from a score of 49.5. (8m)
Conf Level | Signif Level | Conf Interval | Sample Mean
95% | 5% | ____ | ____
Summary Statement: We can be 95% confident that the true mean grade for the calculus midterm ____
English Statement: At the 5% level of significance, because 49.5 is (inside, outside) the 95% confidence interval, there (is, is not) significant evidence that the true average midterm score on Allan’s calculus midterm differs from a score of 49.5.
Conf Level | Signif Level | Conf Interval | Sample Mean
99% | 1% | ____ | ____
Summary Statement: We can be 99% confident that ____
English Statement: At the 1% level of significance, because 49.5 is (inside, outside) the 99% confidence interval, there (is, is not) significant evidence that the true average midterm score on Allan’s calculus midterm differs from a score of 49.5.
c) Fully and completely state the two assumptions that must be met to perform the one-sample t confidence interval test above. Also indicate whether you will do a z or a t test. (3m)
1.
2.
d) Create a QQ plot of the sample data, and comment on whether it indicates that the population could be considered normal. (4m) From the graph, we can see that ____
e) Perform a hypothesis test to determine if there is significant evidence that the true average score of all calculus students taking the midterm differs from 49.5. Use a level of significance of 1%. (10m)
Output: ____
1. Hypotheses: H₀: µ ____ Hₐ: µ ____ (two-tailed test)
2. α: ____
3. Test stat/df: t = ____ df = ____
4. P-value: p-value statement: ____ p-value = ____
5. Decision and Reason: (Reject, Do not reject) H₀ because p-value (≤, >) α
6. Decision in “English”: At the 1% significance level, there (is, is not) significant evidence that the average grade on Allan’s calculus midterm differs from 49.5.
f) Fill in the following blanks. A 100(1 − α)% confidence interval test will always give _____________ decision as an _____ test of significance.
g) Perform a hypothesis test to determine if there is significant evidence that the true average score of all calculus students taking the midterm is more than 50. Use a level of significance of 5%. (10m)
Output: ____
1. Hypotheses: H₀: µ ____ Hₐ: µ ____ (right-sided test)
2. α: ____
3. Test stat/df: t = ____ df = ____
4. P-value: p-value statement: ____ p-value = ____
5. Decision and Reason: (Reject, Do not reject) H₀ because p-value (≤, >) α
6. Decision in “English”: At the 5% significance level, there (is, is not) significant evidence that the true average score of all calculus students taking the midterm is more than a score of 50.
h) Obtain an upper-tailed 95% confidence interval with a lower bound for this test. Interpret it and explain it in an English sentence. (5m)
Confidence Interval: ____
Interpretation: We can be 95% confident that ____
English Statement: At the ____% level of significance, since the entire interval ______ the score of 50, we (have, do not have) significant evidence that the true average score of all calculus students taking the midterm is more than a score of 50.
i) Perform a hypothesis test to determine if there is significant evidence that the true average score of all calculus students taking the midterm is less than 54. Use a level of significance of 5%. (10m)
Output: ____
1. Hypotheses: H₀: µ ____ Hₐ: µ ____ (left-sided test)
2. α: ____
3. Test stat/df: t = ____ df = ____
4. P-value: p-value statement: ____ p-value = ____
5. Decision and Reason: (Reject, Do not reject) H₀ because p-value (≤, >) α
6. Decision in “English”: At the 5% significance level, there (is, is not) significant evidence that the true average score of all calculus students taking the midterm is less than 54.
j) Obtain a lower-tailed 95% confidence interval with an upper bound for this test. Interpret it and explain it in an English sentence. (5m)
Confidence Interval: ____
Interpretation: We can be 95% confident that ____
English Statement: At the ___% level of significance, since the entire interval _______ 54, we (have, do not have) significant evidence that the true average score of all calculus students taking the midterm is less than 54.
20. Use SOFTWARE with HOURSHOMEWORKGENDER. This file contains weekly hours of homework per classroom hour for two independent random samples of students (35 of whom are male identifying and 35 of whom are female identifying). We will consider only the female column of data (HHF) in the following problems, and reiterate that a random sample was taken of the females. Output is provided below. Students should verify that they can produce it.
a) Is there significant evidence that females did more than 2.8 hours of homework per classroom hour, on average? Use a significance level of 5%. Fill in the blanks in the table below. Highlight or circle answers you choose in the sentences. (10m)
Output: ____
Hypotheses: H₀: ____ Hₐ: ____
Test stat and df: t = ____ df = ____
P-value and statement: p-value statement = ____ p-value = ____
Decision in “Stats”: (Reject, Do not reject) H₀ because p-value (≤, >) α
Decision in “English”: At the ___% significance level, there (is, is not) significant evidence that, on average, female identifying students do more than 2.8 hours of homework per classroom hour.
b) Obtain an upper-tailed 95% confidence interval with a lower bound for this test. Interpret it and explain it in an English sentence. (5m)
Confidence Interval: ____
Interpretation: We can be ___% confident ____
English Statement: At the ___% level of significance, since the entire interval is ______ 2.8 hours, we (have, do not have) significant evidence that female students do more than 2.8 hours of homework per classroom hour.
c) Is there significant evidence that females did less than 3.6 hours of homework per classroom hour, on average? Fill in the blanks in the table below. Use a significance level of 5%. Highlight or circle answers you choose in the sentences. (10m)
Output: ____
Hypotheses: H₀: ____ Hₐ: ____
Test stat and df: t = ____ df = ____
P-value and statement: p-value statement = ____ p-value = ____
Decision in “Stats”: (Reject, Do not reject) H₀ because p-value (≤, >) α
Decision in “English”: At a ____% significance level, there (is, is not) significant evidence that ____
d) Obtain a lower-tailed 95% confidence interval with an upper bound for this test. Interpret it and explain it in an English sentence. (5m)
Confidence Interval: ____
Interpretation: We can be 95% confident that ____
English Statement: At the ___% level of significance, since the entire interval is _______ 3.6 hours, we (have, do not have) significant evidence that female students do less than 3.6 hours of homework per classroom hour.
e) Is there significant evidence that females did an amount of homework that differs from 2.8 hours per classroom hour, on average? Use a significance level of 5%. Fill in the blanks in the table below. Highlight or circle answers you choose in the sentences. (10m)
Output: ____
Hypotheses: H₀: ____ Hₐ: ____
Test stat and df: t = ____ df = ____
P-value and statement: p-value statement = ____ p-value = ____
Decision in “Stats”: (Reject, Do not reject) H₀ because p-value (≤, >) α
Decision in “English”: At a 5% significance level, there (is, is not) significant evidence that ____
f) State the 95% confidence interval for the hours of homework per course done by females. Does the confidence interval provide significant evidence that the female students do an average amount of homework that differs from 3.6 hours?
Output: ____
95% confidence interval: ____
Fill in the following: At the 5% significance level, since 3.6 hours is (within, not within) the confidence interval, we (have, do not have) significant evidence that the average hours of homework done per course by female students differs from 3.6 hours.
g) Highlight or circle answers you choose in the sentences below. (4m) In part a), the test is (left, two, right) sided. In part c), the test is (left, two, right) sided. In part e), the test is (left, two, right) sided.
h) In the problems above, we assumed a random sample of females was taken from a large population of females. We were also able to do the problem because our sample size of 35 was “large” (greater than 30), so we did not have to assume a normal population of female data. Nevertheless, it is of interest to examine a normal probability plot of the female data, as 35 might be considered a borderline “large” sample, and we would like to know that nothing in our sample data suggests a population that is highly non-normal. Create the normal probability plot for HHF and explain whether it suggests that the population of female homework hours is normally distributed.
NORMAL PROBABILITY PLOT FOR HHF (WEEKLY HOURS HOMEWORK FEMALES)
21. A two-week remedial program was designed for students who scored less than 50 on a math aptitude test. Nine students were selected at random, given the remedial course, and then tested again. Here are the results of these tests before and after the remedial course. You will find the data in the file MATHAPTITUDE.
Before: 47 41 40 32 45 47 39 48 45
After: 53 58 58 36 40 49 35 60 45
Difference: -6 -17 -18 -4 5 -2 4 -12 0
a) Find the mean and standard deviation of the differences. Use D = Before − After.
Output: ____
Mean: X̄_D = ____
Standard Deviation: s_D = ____
b) Create a normal probability QQ plot of the sample data, and comment on whether it indicates that the population could be considered non-normal. (4m)

c) State necessary assumptions to perform a test of hypothesis about the mean of the differences (and to create a confidence interval for the mean of differences).
1.
2.

d) Use a hypothesis test to test whether the students performed better, on average, following the remedial program. Assume a significance level of 2%. Use D = Before - After. Here, since we are expecting before numbers to be less than after numbers, we expect the differences to be negative. We do a left sided test.

Hypotheses: H0: µD ____   Ha: µD ____   (D = Before - After)
Level of Significance: α = ____
Test stat and df: t = ____   df = ____
P-value and statement: Pvalue stmt = ____   Pvalue = ____
Decision in “Stats”: (Reject, Do not reject) H0, as Pvalue (≤, >) α
Decision in “English”: At a 2% significance level, there (is, is not) significant evidence that students performed better, on average, following the remedial programme.

By hand verification of test statistic: $t^* = \frac{\bar{d} - 0}{s_d / \sqrt{n_d}}$ with degrees of freedom $= n_d - 1 = 8$

By hand verification of p-value (bracket with t tables): p-value = P(t_8 < ____) = P(t_8 > ____), so ____ < P(t_8 > 1.964) < ____ (tables for 8 df)
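The by-hand formula for the paired test statistic in part d) can be checked numerically. This is a small Python sketch under the assumption that the hypothesized mean difference is 0; the variable names are illustrative, and the p-value is deliberately left to the t tables or to R.

```python
from math import sqrt
from statistics import mean, stdev

# D = Before - After, from the Difference row of the question 21 table.
differences = [-6, -17, -18, -4, 5, -2, 4, -12, 0]

n_d = len(differences)
d_bar = mean(differences)  # mean of the paired differences
s_d = stdev(differences)   # sample standard deviation of the differences

# Paired t statistic under H0: mu_D = 0, with n_d - 1 degrees of freedom.
t_stat = (d_bar - 0) / (s_d / sqrt(n_d))
df = n_d - 1

print("t =", round(t_stat, 3), "df =", df)
```

Because D = Before - After, the statistic is negative; its absolute value is the 1.964 that appears in the p-value bracket above.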
e) Use a hypothesis test to test whether the students performed better, on average, following the remedial program. Assume a significance level of 2%. Use D = After - Before. Here, since we are expecting after numbers to be greater than before numbers, we expect the differences to be positive. We do a right sided test.

Hypotheses: H0: µD ____   Ha: µD ____   (D = After - Before)
Level of Significance: α = ____
Test stat and df: t = ____   df = ____
P-value and statement: Pvalue stmt = ____   Pvalue = ____
Decision in “Stats”: (Reject, Do not reject) H0, as Pvalue (≤, >) α
Decision in “English”: At a 2% significance level, there (is, is not) significant evidence that students performed better, on average, following the remedial programme.

f) Use a hypothesis test to test whether the students performed less well, on average, prior to the remedial program. Assume a significance level of 2%. Use D = Before - After. Here, since we are expecting before numbers to be less than after numbers, we expect the differences to be negative. We do a left sided test.

Hypotheses: H0: ____   Ha: ____   (D = Before - After)
Level of Significance: α = ____
Test stat and df: t = ____   df = ____
P-value and statement: Pvalue stmt = ____   Pvalue = ____
Decision in “Stats”: (Reject, Do not reject) H0, as Pvalue (≤, >) α
Decision in “English”: At a 2% significance level, there (is, is not) significant evidence that students performed less well, on average, prior to the remedial programme.

g) Use a hypothesis test to test whether the students performed less well, on average, prior to the remedial program. Assume a significance level of 2%. Use D = After - Before. Here, since we are expecting after numbers to be greater than before numbers, we expect the differences to be positive. We do a right sided test.

Hypotheses: H0: ____   Ha: ____   (D = After - Before)
Level of Significance: α = ____
Test stat and df: t = ____   df = ____
P-value and statement: Pvalue stmt = ____   Pvalue = ____
Decision in “Stats”: (Reject, Do not reject) H0, as Pvalue (≤, >) α
Decision in “English”: At a 2% significance level, there (is, is not) significant evidence that students performed less well, on average, prior to the remedial programme.

22. Students at a University collected data from a random sample of 12 second year statistics students. Students reported their final grade in their last year of high school and their final grades in their first year of University. Data, including HS grades (HS), UNI grades (UNI) and HS-UNI differences (DIFFERENCE), can be found in the file HSUNIGRADEDEMO.

HS           86   80   78   75   72   92   90   74   84   69   83   87
UNI          84   76   78   75   71   93   86   71   81   70   81   86
DIFFERENCE    2    4    0    0    1   -1    4    3    3   -1    2    1

a) A boxplot of the paired differences between the final grades for high school and University for the students is provided. The difference = “HS - UNI” was used.
i) What 5 descriptive statistical measures can be read from the boxplot? (1m)
ii) Can a boxplot of the differences tell us that the sample data distribution of the differences is normal? Why or why not? (2m)

b) A dot plot of the sample differences appears below. Highlight or circle chosen answers within bracketed parts of the sentences below. (2m)
The dotplot of the paired differences appears to be (uniform, normal) in distribution. We assume the shape of the population of paired differences is (uniform, normal, unknown) in order to perform statistical inference with this data.
c) Create a frequency histogram of the sample data, and comment on whether it appears that the population could be considered normal. (4m)

d) Create a normal probability QQ plot of the sample data, and comment on whether it indicates that the population could be considered non-normal. (4m)

e) State appropriate null and alternative hypotheses for examining the question of whether or not HS grades differ from UNI grades, on average. (2m)
H0: µd ____   Ha: µd ____   where d = ____

f) State necessary assumptions to perform a test of hypothesis about the mean of the differences (and to create a confidence interval for the mean of differences).
1.
2.
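For question 22, the mean of the paired differences and the paired t statistic follow directly from the grades listed in the data table. The Python sketch below is an illustration only, assuming d = HS - UNI and a hypothesized mean difference of 0; the variable names are made up for this example, and the p-value and confidence interval are still meant to come from SOFTWARE.

```python
from math import sqrt
from statistics import mean, stdev

# Grades copied from the question 22 table.
hs = [86, 80, 78, 75, 72, 92, 90, 74, 84, 69, 83, 87]
uni = [84, 76, 78, 75, 71, 93, 86, 71, 81, 70, 81, 86]

# Paired differences, d = HS - UNI (assumed direction, as in the boxplot).
d = [h - u for h, u in zip(hs, uni)]

n = len(d)
d_bar = mean(d)  # mean paired difference
s_d = stdev(d)   # sample standard deviation of the differences

# Paired t statistic under H0: mu_d = 0, with n - 1 degrees of freedom.
t_stat = d_bar / (s_d / sqrt(n))
df = n - 1

print("d =", d)
print("mean =", d_bar, "sd =", round(s_d, 4), "t =", round(t_stat, 3), "df =", df)
```

Comparing the printed list with the DIFFERENCE row is a useful sanity check before trusting the statistic.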
g) Use SOFTWARE to find the test statistic value and the p-value for this test. Report your results. (3m) Use α = 0.10. (Remember to state your α when you do a hypothesis test write-up!)

Output:
Test statistic value:
Pvalue and statement:

h) Should you reject or not reject your null hypothesis if the level of significance is 0.05? Why? Is your result significant or not significant? (3m)
Reject or Not Reject H0?
Why?
Significant or Not?

i) Create a 90% confidence interval for the average paired difference in grades for HS-UNI grades.

Output:
A 90% confidence interval for the average paired difference in grades for HS-UNI grades is ( ____ , ____ ) grade points. We are 90% confident that the average difference in the paired grades is between ____ and ____ grade points.

j) Does your 90% confidence interval indicate that there is significant evidence (at the 10% significance level) that the average of the paired differences differs from 0? Why or why not?
Because the interval (does, does not) contain 0, there (is, is not) significant evidence (at the 10% significance level) that the average of the paired differences differs from 0.

23. Keyhole heart bypass surgery was performed on 8 patients and conventional surgery was performed on 10 patients. The length of hospital stay in days was recorded for all patients and is summarized in the following table.

Sample #   Surgery Method   Sample Size   Mean       Standard Deviation
1          Keyhole          8             3.5 days   1.5 days
2          Conventional     10            8.0 days   2.0 days
We wish to test the hypothesis that recovery from keyhole surgery requires a shorter hospital stay than conventional surgery (that is, test the hypothesis that people undergoing conventional surgery spend longer in hospital, on average). Use a significance level of 1%.

a) State necessary assumptions.
1.
2.
3.
Check Variances:

b) State the null and alternative hypothesis.
Hypotheses: H0: µC - µK ____   Ha: µC - µK ____

c) Verify by hand that the pooled standard deviation is 1.7984, that the test statistic is 5.2752 and that the df = 16.

$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$ = ____ = 1.7984

$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$ = ____ = 5.2752

$df = n_1 + n_2 - 2$ = ____ = 16

d) Use SOFTWARE to find the p-value and then make your decision using a 5% significance level.

Output:
P-value: Pvalue Statement: P(t > 5.28)   pvalue = ____
Decision and Reason: (Reject, Do not reject) H0, as Pvalue (≤, >) α
Decision in “English”: At the 5% significance level, there (is, is not) significant evidence that, on average, keyhole surgery requires a shorter hospital stay (i.e. conventional surgery requires a longer hospital stay).

24. Keyhole heart bypass surgery was performed on 8 patients and conventional surgery was performed on 10 patients. The time spent on breathing tubes was recorded for all patients and is summarized in the following table.

Sample #   Surgery Method   Sample Size   Mean        Standard Deviation
1          Keyhole          8             3.0 hours   1.5 hours
2          Conventional     10            7.0 hours   2.7 hours

Test the hypothesis that keyhole surgery reduces the time spent, on average, with breathing tubes. Use a significance level of 1%.

a) State necessary assumptions.
1.
2.
3.
Check Variances:

b) State the null and alternative hypothesis.
Hypotheses: H0: µK - µC ____   Ha: µK - µC ____

c) Verify by hand that the pooled standard deviation is 2.2550, that the test statistic is -3.7396 and that the df = 16.

$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$ = ____ = 2.2550

$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$ = ____ = -3.7396

$df = n_1 + n_2 - 2$ = ____ = 16

d) Use SOFTWARE to find the p-value and then make your decision using a 1% significance level.

Output:
P-value: Pvalue Statement: ____   pvalue = ____
Decision and Reason: (Reject, Do not reject) H0, as Pvalue (≤, >) α
Decision in “English”: At the 1% significance level, there (is, is not) significant evidence that, on average, keyhole surgery reduces the time spent with breathing tubes.

25. In a study of internet users, the average time spent online per day was determined for a group of college graduates as well as for a group of non-college graduates. The following summary results were obtained. The data can be found in the file DAILYTIMEONLINE.
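Looking back at the by-hand checks in questions 23 c) and 24 c), the pooled standard deviation, test statistic and degrees of freedom can be reproduced from the two summary tables alone. The following is a rough Python sketch, assuming a hypothesized difference of 0; the helper name pooled_t is hypothetical and is not part of R or of the lab materials.

```python
from math import sqrt


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance two-sample t for H0: mu1 - mu2 = 0 (summary data only)."""
    df = n1 + n2 - 2
    # Pooled standard deviation: weighted average of the two sample variances.
    sp = sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    # Test statistic for the difference (sample 1 minus sample 2).
    t = (mean1 - mean2) / (sp * sqrt(1 / n1 + 1 / n2))
    return sp, t, df


# Question 23: conventional minus keyhole (hospital stay, days).
sp23, t23, df23 = pooled_t(8.0, 2.0, 10, 3.5, 1.5, 8)

# Question 24: keyhole minus conventional (breathing tube time, hours).
sp24, t24, df24 = pooled_t(3.0, 1.5, 8, 7.0, 2.7, 10)

print("Q23:", round(sp23, 4), round(t23, 3), df23)
print("Q24:", round(sp24, 4), round(t24, 3), df24)
```

The order of the arguments fixes the sign of the statistic: question 23 uses µC - µK and gives a positive value, while question 24 uses µK - µC and gives a negative one. For question 24 the pooled standard deviation comes out at about 2.255. Small differences in the last decimal place can appear depending on whether the pooled standard deviation is rounded before it is used in the test statistic.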
Sample #   Group                   Sample Size   Mean        Standard Deviation
1          College graduates       14            8.6 hours   1.1 hours
2          Non-college graduates   12            6.3 hours   2.7 hours

a) Fully and completely state the three assumptions that must be made to perform two independent sample t tests with this data.

NOTE: This question provides practice with two independent sample t problems. The literature provides for two possible versions of this test: in one, a “pooled” variance approach is used (the two variances are “close” in value), and in the other, an “unpooled” variance approach is used (the two variances are “not close” in value). Some instructors like to teach the pooled t approach, as the assumption of equal variances is necessary in other tests that compare the means of several populations. Other instructors point out that equal variances is an assumption we would only make if we were very familiar with our background population in general, and that it is better to be safe than sorry. Your instructor will let you know in lecture what approach you should take. In lab, we will do all two independent sample problems with the assumption of “unpooled” variances. R allows for either approach; the default is “unpooled”. For two independent sample t problems, R takes all hypothesized differences of means in alphabetical order of the group names. (Questions 23, 24 and 26 do have examples that use the pooled approach, if students ever need to reference them for any reason other than for lab.)

b) Create side-by-side normal probability plots, histograms, boxplots, and dotplots for daily hours online for the college and non-college grads. In each case, explain whether the graph suggests that the populations can be considered normal or non-normal. (NOTE: histograms will be stacked rather than side-by-side.)

BOXPLOTS
Interpret:
HISTOGRAMS
Interpret:

NORMAL PROBABILITY PLOTS
Interpret:
DOTPLOT
Interpret:

c) BY HAND: Using the summary data from the question preamble, test by hand if there is significant evidence that on average, college graduates are online more than non-college graduates. Use R to find your p-value. Use the alternative Ha: µCG - µNCG > 0. Use a level of significance of 5%.

Hypothesis: H0: ____   Ha: ____
Test statistic: $t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$ = ____
$df = \min(n_1 - 1, n_2 - 1)$ = ____ (conservative formula used here)
Pvalue Statement: P(t > 2.7611)   pvalue = ____
R output:
Decision: (Reject, Do not reject) H0, as Pvalue (≤, >) α
Conclusion: At the 5% significance level, there (is, is not) significant evidence that, on average, college graduates are online more than non-college graduates.

d) Use R to find the summary statistics for daily online hours for the college and non-college graduates for the dataset DAILYTIMEONLINE. Note that we rounded the standard deviations to 1 decimal place in the question preamble. This means that questions done by hand will yield slightly different test statistics and pvalues than questions done in R with the dataset. Keep that in mind below.

e) USING R: Test, using R with the dataset DAILYTIMEONLINE, if there is significant evidence that on average, college graduates are online more than non-college graduates. By default, the µ1 - µ2 that R uses in a two independent samples t test puts the outcome name that starts with the letter that comes earlier in the alphabet first and the outcome name that starts with the letter that comes later in the alphabet second. Here it will use µCG - µNCG, and you can test with the alternative Ha: µCG - µNCG > 0. Use the grouping column EDUCATION. Use a level of significance of 5%.
OUTPUT:
Hypothesis: H0: ____   Ha: ____
Test statistic: (from R) t = ____   df = ____ (Welch’s formula from R used here)
Pvalue Statement: ____   Pvalue = ____ (from R)
Decision: (Reject, Do not reject) H0, as Pvalue (≤, >) α
Conclusion: At the 5% significance level, there (is, is not) significant evidence that, on average, college graduates are online more than non-college graduates.

Students should note that their test statistic and their pvalue will differ slightly when doing the problem with the dataset, for two reasons. We rounded the standard deviations to one decimal place when doing the problem by hand, and the degrees of freedom calculated by the software for an unpooled problem use Welch’s formula (which yields 14.054 degrees of freedom), whereas we used the conservative formula for the degrees of freedom when doing the problem with summary data (which yielded 11 degrees of freedom).

f) BY HAND: Using the summary data from the question preamble, test by hand if there is significant evidence that on average, non-college graduates are online less than college graduates. Use R to find your p-value. Use the alternative Ha: µNCG - µCG < 0. Use a level of significance of 5%.

Hypothesis: H0: ____   Ha: ____
Test statistic: $t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$ = ____
$df = \min(n_1 - 1, n_2 - 1)$ = ____ (conservative formula used here)
Pvalue Statement: ____   pvalue = 0.009259704
R output:
Decision: (Reject, Do not reject) H0, as Pvalue (≤, >) α
Conclusion: At the 5% significance level, there (is, is not) significant evidence that, on average, non-college graduates are online less than college graduates.
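The by-hand work in parts c) and f) can also be sketched in Python rather than R. This is an illustration under stated assumptions, not the lab's method: it uses only the rounded summary values from the question preamble, the helper names (unpooled_t, t_pdf, upper_tail) are made up for this example, and the p-value is approximated by integrating the t density numerically with Simpson's rule instead of asking R for it.

```python
from math import gamma, pi, sqrt


def unpooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Unpooled t statistic for H0: mu1 - mu2 = 0, with the conservative df."""
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    t = (mean1 - mean2) / se
    df = min(n1 - 1, n2 - 1)  # conservative degrees of freedom
    return t, df


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 using Simpson's rule on [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


# Part c): CG minus NCG, right-tailed test.
t_c, df_c = unpooled_t(8.6, 1.1, 14, 6.3, 2.7, 12)
p_c = upper_tail(t_c, df_c)

# Part f): NCG minus CG, left-tailed test; by symmetry P(T < t) = P(T > -t).
t_f, df_f = unpooled_t(6.3, 2.7, 12, 8.6, 1.1, 14)
p_f = upper_tail(-t_f, df_f)

print("c):", round(t_c, 3), df_c, round(p_c, 5))
print("f):", round(t_f, 3), df_f, round(p_f, 5))
```

Swapping the order of the two groups only flips the sign of the statistic, so the right-tailed p-value in part c) and the left-tailed p-value in part f) are the same number. Values from R with the full dataset will differ slightly, for the rounding and degrees-of-freedom reasons given in the note above.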
g) USING R: Test, using R with the dataset DAILYTIMEONLINE, if there is significant evidence that on average, non-college graduates are online less than college graduates. You must use R to solve this problem with a “workaround”. Note that if µNCG - µCG < 0, then µCG - µNCG > 0. So, if using R, you must instead test with the alternative Ha: µCG - µNCG > 0 with the grouping column EDUCATION, but write your conclusion to match the wording of the question. Use a level of significance of 5%.

OUTPUT:
Hypothesis: H0: µNCG - µCG ≥ 0   Ha: µNCG - µCG < 0
But we test the equivalent H0: µCG - µNCG ≤ 0   Ha: µCG - µNCG > 0
Test statistic: (from R) t = ____   df = ____ (Welch’s formula from R used here)
Note: the test statistic of 2.7376 is obtained when we do a right sided test with Ha: µCG - µNCG > 0. If we could do a left sided test with Ha: µNCG - µCG < 0 in R, our test statistic value would be -2.7376.
Pvalue Statement: ____   Pvalue = ____ (verify from R)
Decision: (Reject, Do not reject) H0, as Pvalue (≤, >) α
Conclusion: At the 5% significance level, there (is, is not) significant evidence that, on average, non-college graduates are online less than college graduates.

Students should note that their test statistic and their pvalue will differ slightly when doing the problem with the dataset, for two reasons. We rounded the standard deviations to one decimal place when doing the problem by hand, and the degrees of freedom calculated by the software for an unpooled problem use Welch’s formula (which yields 14.054 degrees of freedom), whereas we used the conservative formula for the degrees of freedom when doing the problem with summary data (which yielded 11 degrees of freedom).

h) USING R: An alternative way to set up the test in part g) is to rewrite your grouping variable outcome names so that the outcome name that comes first alphabetically is the one for non-college graduates. In the column EDREORDER, NCG is relabeled 1NCG and CG is relabeled 2CG. Then, you can ask R to use the column EDREORDER to solve the test with the alternative Ha: µ1NCG - µ2CG < 0. Test, using R with the dataset DAILYTIMEONLINE, if there is significant evidence that on average, non-college graduates are online less than college graduates. Use a level of significance of 5%.
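The relabeling idea in part h) can be illustrated without R. In the Python sketch below, sorting the group labels stands in for the alphabetical ordering that the text attributes to R; that substitution is an assumption made for illustration, as are the helper name first_minus_second and the use of the rounded summary values from the question preamble instead of the dataset.

```python
from math import sqrt

# Summary values from the question 25 preamble: (mean, sd, n).
summary = {
    "CG": (8.6, 1.1, 14),
    "NCG": (6.3, 2.7, 12),
}


def first_minus_second(groups):
    """Unpooled t statistic for (first label) - (second label) in sorted order."""
    first, second = sorted(groups)  # stand-in for alphabetical group ordering
    m1, s1, n1 = groups[first]
    m2, s2, n2 = groups[second]
    t = (m1 - m2) / sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return first, second, t


# Original labels: CG sorts before NCG, so the difference is CG - NCG.
original = first_minus_second(summary)

# Relabeled as in EDREORDER: 1NCG sorts before 2CG, so it is NCG - CG.
relabeled = first_minus_second({"1NCG": summary["NCG"], "2CG": summary["CG"]})

print(original)
print(relabeled)
```

The two calls give statistics of equal size and opposite sign, which is the same relationship as the 2.7376 and -2.7376 reported from R in parts g) and h). The magnitudes here differ slightly from R's because the summary standard deviations are rounded.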
Hypothesis: H0: µ1NCG - µ2CG ≥ 0   Ha: µ1NCG - µ2CG < 0
Test statistic: t = -2.7376   df = 14.054 (Welch’s formula from R used here)
Pvalue Statement: P(t < -2.7376) = 0.007933   Pvalue = 0.007933
Decision: (Reject, Do not reject) H0, as Pvalue (≤, >) α: 0.007933 ≤ 0.05
Conclusion: At the 5% significance level, there (is, is not) significant evidence that, on average, non-college graduates are online less than college graduates.

Students should note that their test statistic and their pvalue will differ slightly when doing the problem with the dataset, for two reasons. We rounded the standard deviations to one decimal place when doing the problem by hand, and the degrees of freedom calculated by the software for an unpooled problem use Welch’s formula (which yields 14.054 degrees of freedom), whereas we used the conservative formula for the degrees of freedom when doing the problem with summary data (which yielded 11 degrees of freedom).

i) BY HAND: Using the summary data from the question preamble, test by hand if there is significant evidence that on average, time online differs for college graduates and non-college graduates. Use R to find your p-value. Use the alternative Ha: µCG - µNCG ≠ 0; that way, you can work with a positive test statistic. Use a level of significance of 5%.

Hypothesis: H0: µCG - µNCG = 0   Ha: µCG - µNCG ≠ 0
Test statistic: $t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$ = ____
$df = \min(n_1 - 1, n_2 - 1)$ = ____ (conservative formula used here)
Pvalue Statement: ____   pvalue = ____
R output:
Decision: (Reject, Do not reject) H0, as Pvalue (≤, >) α
Conclusion: At the 5% significance level, there (is, is not) significant evidence that, on average, the time spent online by college and non-college graduates differs.

j) USING R: Test, using R with the dataset DAILYTIMEONLINE, if there is significant evidence that on average, the amount of online time differs for college and non-college graduates. Use the alternative Ha: µCG - µNCG ≠ 0 with the column EDUCATION; that way, you can work with a positive test statistic. Use a level of significance of 5%.

Hypothesis: H0: µCG - µNCG = 0   Ha: µCG - µNCG ≠ 0
Test statistic: t = ____   df = ____
Pvalue Statement: ____   pvalue = ____
Decision: (Reject, Do not reject) H0, as Pvalue (≤, >) α
Conclusion: At the 5% significance level, there (is, is not) significant evidence that, on average, the time spent online by college and non-college graduates differs.

Students should note that their test statistic and their pvalue will differ slightly when doing the problem with the dataset, for two reasons. We rounded the standard deviations to one decimal place when doing the problem by hand, and the degrees of freedom calculated by the software for an unpooled problem use Welch’s formula (which yields 14.054 degrees of freedom), whereas we used the conservative formula for the degrees of freedom when doing the problem with summary data (which yielded 11 degrees of freedom).

k) BY HAND: Using the summary data from the question preamble, calculate a 95% confidence interval for µCG - µNCG, the difference, on average, between online hours of college and non-college graduates. Use R to calculate your $t_{\alpha/2,\,n-1}$. Is there evidence that time online differs for college graduates and non-college graduates? Why or why not? Answer by filling in the blanks in the sentences below. (4m)

A 95% confidence interval is:
$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\,n-1} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$ = ____ = ____ = (0.4665274, 4.1334726) hours (verify)
You will need $t_{0.025,\,11}$ from R:

A 95% CI for the average difference in online daily hours between college and non-college graduates is ( ____ ) hours. We are 95% confident that the interval contains the true population mean
difference in online daily hours, on average, between college and non-college graduates. At the 5% significance level, since 0 ____ in the confidence interval, there ____ significant evidence that there is a difference in online daily hours between college and non-college graduates.

l) USING R: Using R with the dataset DAILYTIMEONLINE, calculate a 95% confidence interval for µCG - µNCG, the difference, on average, in online hours of college and non-college graduates. Verify that it closely matches the confidence interval you found in part k). Use the column EDUCATION. (2m)

A 95% CI for the average difference in online daily hours between college and non-college graduates is ( ____ ) hours. We are 95% confident that the interval contains the true population mean difference in online daily hours, on average, between college and non-college graduates. At the 5% significance level, since 0 ____ in the confidence interval, there ____ significant evidence that there is a difference in online daily hours between college and non-college graduates.

Students should note that their confidence interval will differ slightly when doing the problem with the dataset, for two reasons. We rounded the standard deviations to one decimal place when doing the problem by hand, and the degrees of freedom calculated by the software for an unpooled problem use Welch’s formula (which yields 14.054 degrees of freedom), whereas we used the conservative formula for the degrees of freedom when doing the problem with summary data (which yielded 11 degrees of freedom).

26. Two independent samples of students are taken, 35 of whom identify as male and 35 of whom identify as female. The number of hours of homework the students do per classroom hour is recorded. You wish to test, at the 5% significance level, whether there is significant evidence that the amount of homework done on average by female identifying students differs from the amount of homework done on average by male identifying students. Data is in the file HOURSHOMEWORKGENDER. SOFTWARE was used to create stacked frequency histograms for the homework hours of each group.

a) Fully and completely state the three assumptions that must be met in order to perform a two sample t confidence interval or
test for the difference µHHM - µHHF. (3m)
1.
2.
3.

b) Highlight or circle correct answers below. (4m)
The shape of the histogram of sample data for females looks (uniform, normal, skewed right, skewed left).
The shape of the histogram of the sample for males appears (uniform, normal, skewed right, skewed left).
The shape of the SAMPLING distribution of the test statistic is (approximately normal, unknown).
We must assume both the female and male student populations are normal in order to create confidence intervals and perform hypothesis tests with this data. (true, false)

c) Create a normal probability QQ plot of the sample data, and comment on whether it indicates that the population could be considered non-normal.

d) Calculate descriptive statistics (sample sizes, means and standard deviations suffice) for the hours of homework performed by the female and male students in our dataset.
Output:

e) Calculate the ratio of the largest standard deviation to the smallest standard deviation for these two groups of students. Are the standard deviations for the two histograms of sample data close enough to consider using a pooled variance when creating confidence intervals and performing hypothesis tests for the difference of means with this set of data? (2m)

f) Calculate a 95% confidence interval for µHHF - µHHM, the difference, on average, in hours of homework per classroom hour, for female and male identifying students using a pooled variance. Verify that you can obtain the following output. Is there evidence that there is a difference in the average homework hours per classroom hour for the two groups? Why or why not? Answer by filling in the blanks in the sentences below. (4m)

Output:
A 95% CI for the average difference in homework hours per classroom hour between female and male students is ____________________ hours. We are 95% confident that the interval contains the true population mean difference in hours of homework per classroom hour, on average, for male and female identifying students. At the ___ significance level, since 0 __________ in the confidence interval, there ______ significant evidence that there is a difference in hours of homework per classroom hour, on average, for male and female identifying students.
g) Calculate a 95% confidence interval for µHHF - µHHM, the difference, on average, in hours of homework per classroom hour, for female and male identifying students using an unpooled variance. Is there evidence that there is a difference in the average homework hours per classroom hour for the two groups? Why or why not? Answer by filling in the blanks in the sentences below. (4m)

Output:
A 95% CI for the average difference in hours of homework per classroom hour between female and male identifying students is ( ____ ) ______. We are 95% confident that the interval contains the true population mean difference in hours of homework per classroom hour, on average, for male and female identifying students. At the ___% significance level, since 0 (is, is not) in the confidence interval, there (is, is not) significant evidence that there is a difference in hours of homework per classroom hour, on average, for male and female identifying students.

h) Use SOFTWARE output to perform a hypothesis test, at the 5% significance level, of whether, on average, the homework hours performed per classroom hour by male and female identifying students in Stat 151 at MacEwan differ. Use the difference µHHF - µHHM with a pooled variance. Verify that you can obtain the following output. Fill in answers below. Highlight or circle chosen answers. (10m)

Output:
Hypotheses: H0: ____   Ha: ____
Level of Significance: α = ____
Test stat/df: t = ____   df = ____
P-value: Pvalue Statement: ____   pvalue = ____
Decision and Reason: (Reject, Do not reject) H0, as Pvalue (≤, >) α
Decision in “English”: At the ___% significance level, there (is, is not) significant evidence that, on average, the homework hours performed per classroom hour by male and female identifying students in Stat 151 at
MacEwan differs.

i) Use SOFTWARE output to perform a hypothesis test, at the 5% significance level, of whether, on average, the homework hours performed per classroom hour by male and female identifying students in Stat 151 at MacEwan differ. Use the difference µHHF - µHHM with an unpooled variance. Fill in answers below. Highlight or circle chosen answers. (10m)

Output:
Hypotheses: H0: ____   Ha: ____
Level of Significance: α = ____
Test stat/df: t = ____   df = ____
P-value: Pvalue Statement: ____   pvalue = ____
Decision and Reason: (Reject, Do not reject) H0, as Pvalue (≤, >) α
Decision in “English”: At the ___% significance level, there (is, is not) significant evidence that, on average, the homework hours performed per classroom hour by male and female identifying students in Stat 151 at MacEwan differs.

j) What are your degrees of freedom with the two independent samples pooled variances t test? What are your degrees of freedom with the two independent samples unpooled variances t test? Are they the same? Why or why not?

Degrees of Freedom: Pooled variances t test = ____
Degrees of Freedom: Unpooled variances t test = ____
Discuss:

Students should note that the nearly identical t-values and p-values obtained with the pooled variances and unpooled variances approaches in this example are not typical. Here, this occurs because the denominator of the test statistic is similar in both cases and because the degrees of freedom are close in both cases. Students should also note that assuming equal variances means that an additional assumption about the populations must be made in order to perform a test. This is not an assumption made lightly, and it would only be made in real life if we were very familiar with our background populations in general. Some instructors and researchers prefer not to teach the pooled t approach, as this extra assumption introduces an additional possibility of error. Some instructors and researchers like to teach the pooled t approach as the assumption of
equal variances is necessary in other tests that compare the means of several populations. Your instructor will let you know which approach she or he prefers.
k) Use SOFTWARE output to perform a hypothesis test, at the 5% significance level, of whether, on average, the homework hours performed per classroom hour by female identifying students are more than the homework hours performed per classroom hour by male identifying students. Use an unpooled variance. Use the difference µHHF − µHHM. Fill in answers below. Highlight or circle chosen answers. (10m)
Output
Hypotheses: H0:      Ha:
Level of Significance: α = 0.05
Test stat/df: t =      df =
P-value: Pvalue statement:      pvalue =
Decision and Reason: (Reject, Do not reject) H0      Pvalue (≤, >) α
Decision in “English”: At the 5% significance level, there (is, is not) significant evidence that, on average, the homework hours performed per classroom hour is greater for female identifying students than male identifying students.
27. Consider the STATISTICSSTUDENTSSURVEYFORR file. For this problem, assume we can consider the students in introductory statistics who were surveyed for the file STATISTICSSTUDENTSSURVEYFORR to be a random sample drawn from a larger population of students in introductory statistics classes in all universities in Alberta.
a) Students who take introductory statistics classes at MacEwan are a mix of students who are pursuing a BS (biology, mathematics, chemistry, etc.) and students who are pursuing a BA (psychology, sociology, etc.). Your instructor found information at Statistics Canada that indicated that if we consider only students pursuing an undergraduate BA or BS degree, 43% of them will be pursuing a BS degree and 57% of them will be pursuing a BA degree. Find a 99% confidence interval to determine whether the proportion of BA students in universities in Alberta differs from 57%. Test using SOFTWARE and then complete the table below by hand. Use the normal approximation (as in your textbook). In your interval, use 3 decimals for your proportions and 1 decimal for your percents.
Output
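The by-hand calculation for a one-proportion interval and test can be sketched as follows. This is a minimal illustration, assuming the counts listed in the worksheet's assumptions check (24 BA students and 36 others, so n = 60); the function names are hypothetical and are not part of the course's commands file. No continuity correction is applied, so software that applies one (R's prop.test does by default) may report slightly different values.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_ci(successes, n, confidence):
    """Normal-approximation interval for a single proportion."""
    p_hat = successes / n
    # Critical value z* that leaves (1 - confidence)/2 in each tail.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def one_proportion_z_test(successes, n, p0):
    """Two-sided z test of H0: p = p0 (no continuity correction)."""
    p_hat = successes / n
    # The standard error uses the null value p0, not p_hat.
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value


# Counts assumed from the worksheet's assumptions check: 24 BA, 36 not BA.
low, high = one_proportion_ci(24, 60, 0.99)
z, p_value = one_proportion_z_test(24, 60, 0.57)

print(f"99% CI for p: ({low:.3f}, {high:.3f})")
print(f"z = {z:.4f}, z squared = {z * z:.4f}, p-value = {p_value:.4f}")
```

The squared z statistic printed here is the uncorrected χ² statistic with 1 degree of freedom, which is the equivalence the worksheet mentions for software that reports χ² instead of z.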
Assumptions (4M): State in full and show work when checking.
ASSUMPTIONS:
1. Simple Random Sample (stated in problem)
2. BA: # successes = 24 ≥ 5, # failures = 36 ≥ 5
3. Normal Approximation
Interval (2M): Confidence Interval: ( , ) = ( , )%
Interpretation (4M): At the _______ significance level, the 99% CI (does, does not) cover 57%, so we (do, do not) have significant evidence that the proportion of introductory statistics university students pursuing a BA in Alberta differs from 57%.
b) Use SOFTWARE to perform a hypothesis test to determine, at a significance level of 1%, whether there is evidence that the proportion of Alberta university introductory statistics students who are pursuing a BA differs from 57%. Post your output and then write up your test in the spaces provided. Use the normal approximation (as in your textbook). (Note that a test of proportion that uses the normal approximation with a Z test statistic is equivalent to a χ² test with degrees of freedom = #outcomes − 1, and that both tests yield identical p-values. Software such as R returns a χ² test statistic value. Your instructor will mention this to you in class and may or may not provide more details according to time constraints.)
Output
Write up by hand:
i) Hypotheses: H0:      versus Ha:
ii) Assumptions: 1.      2.
iii) Test statistic: χ² =
iv) Pvalue statement =      , pvalue =      for _____ degrees of freedom
v) Decision and reason for it: Since pvalue _________________________, we _________ H0.
vi) English Statement: At the 1% significance level, we (do, do not) have significant evidence that the proportion of introductory statistics university students pursuing a BA in Alberta differs from 57%.
c) Assume that this data represents two independent random samples drawn from larger populations of students viewing Rachel Notley as a poor leader (no’s) and Rachel Notley as a good leader (yeses) for Alberta. Calculate and interpret a 95% confidence interval to determine whether the proportion of graduate/professionals in group 1, who viewed Rachel Notley as a poor (not good) leader (no’s) in 2015, differed from the proportion of graduate/professionals in group 2, who viewed Rachel Notley as a good leader (yeses). Bold and italicize chosen answers. Estimate proportions separately, like in your textbook. Use the normal approximation (like in your textbook).
Note: SOFTWARE, like the textbook, always estimates proportions separately for a two-proportion confidence interval. (Note that this confidence interval can also be calculated by doing a normal approximation to a χ² test with degrees of freedom = (#rows − 1)(#columns − 1) and yields an identical confidence interval. Software such as R takes this approach. Your instructor will mention this to you in class and may or may not provide more details according to time constraints. We will use the latter approach here.)
Output 1 (using the two-sample proportion test approach in the software R). This output provides a confidence interval and proportion estimates.
Output 2 (using the two-way table approach in the software R). This output provides the counts.
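The two-proportion interval (separate estimates) and the pooled z test can be sketched by hand as below. The group sizes of 28 ("no") and 32 ("yes") are an assumption inferred from the percentages the worksheet reports for its output (8 individuals = 28.6% and 19 individuals = 59.4%); the function names are hypothetical. No continuity correction is applied, so software that applies one (R's prop.test does by default) may give slightly different numbers.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Interval for p1 - p2 with the proportions estimated separately."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z test of H0: p1 = p2 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, 2 * NormalDist().cdf(-abs(z))


# Assumed counts: 8 of 28 graduate/professional in the "no" group,
# 19 of 32 in the "yes" group (group sizes inferred, not stated).
ci_low, ci_high = two_proportion_ci(8, 28, 19, 32)
z, p_value = two_proportion_z_test(8, 28, 19, 32)

print(f"95% CI for p1 - p2: ({ci_low:.3f}, {ci_high:.3f})")
print(f"z = {z:.4f}, z squared = {z * z:.4f}, p-value = {p_value:.4f}")
```

The interval uses separate estimates while the test uses the pooled estimate, which matches the distinction the worksheet draws between the two procedures.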
Note that the software assigns the number 1 to the “no” group and the number 2 to the “yes” group by default. So for the No group, p1 = pgraduate = 28.6% means that 28.6% (or 8 individuals) of the no group, who thought Rachel Notley would not do a good job (do a poor job), were pursuing graduate/professional degrees, and for the yes group, p2 = pgraduate = 59.4% means that 59.4% (or 19 individuals) of the yes group, who thought Rachel Notley would do a good job, were pursuing graduate/professional degrees. Fill in the table below.
Assumptions (4M): State in full and show work when checking.
ASSUMPTIONS:
1.
2.
3. No: #graduate/professional =      , #undergraduates =      (work shown)
   Yes: #graduate/professional =      , #undergraduates =      (work shown)
Interval (2M): Confidence Interval: ( , ) = ( , )%
Interpretation (4M): At the 5% significance level, the CI (does, does not) cover 0, so we (do, do not) have significant evidence that the proportion of graduate/professionals in group 1, who viewed Rachel Notley as a poor (not good) leader in 2015, differed from the proportion of graduate/professionals in group 2, who viewed Rachel Notley as a good leader.
d) Use SOFTWARE to perform a hypothesis test to determine, at a significance level of 5%, whether there is significant evidence that the proportion of graduate/professionals in group 1, who viewed Rachel Notley as a poor (not good) leader in 2015, differed from the proportion of graduate/professionals in group 2, who viewed Rachel Notley as a good leader.
Note: SOFTWARE can calculate a z test statistic value using either a pooled estimate of the proportion or separate proportion estimates. We will use a pooled proportion as our estimate in order to match the textbook approach and use the normal approximation (like in your textbook). (Note that this Z test can also be done by doing a normal approximation to a χ² test with degrees of freedom = (#rows − 1)(#columns − 1) and that both tests yield identical p-values. Software such as R returns a χ² test statistic value. Your instructor will mention this to you in class and may or may not provide more details according to time constraints. We will use the latter approach here.)
You will need the same output as above. Repaste it here for easy reference.
Output 1 (using the two-sample proportion test approach in the software R). This output provides a confidence interval and proportion estimates.
Output 2 (using the two-way table approach in the software R). This output provides the counts.
Both pieces of output above provide the test statistic value, degrees of freedom, and p-value for the test.
Note that the software assigns the number 1 to the “no” group and the number 2 to the “yes” group by default. So for the No group, p1 = pgraduate = 28.6% means that 28.6% (or 8 individuals) of the no group, who thought Rachel Notley would not do a good job (do a poor job), were pursuing graduate/professional degrees, and for the yes group, p2 = pgraduate = 59.4% means that 59.4% (or 19 individuals) of the yes group, who thought Rachel Notley would do a good job, were pursuing graduate/professional degrees. Fill in the table below.
i) Hypotheses: H0:      versus Ha:
ii) Assumptions
1.
2.
3. (each cell in the contingency table has more than 5 observations)
No:
Yes:
iii) Test statistic: χ² =
iv) Pvalue statement =      pvalue =      df =
v) Decision and reason for it: Since pvalue ___________________, we __________ H0.
vi) English statement: At the 5% significance level, we (do, do not) have significant evidence that the proportion of graduate/professionals in group 1, who viewed Rachel Notley as a poor (not good) leader in 2015, differed from the proportion of graduate/professionals in group 2, who viewed Rachel Notley as a good leader.
28. The overall proportions of all Albertans who voted for the 4 major parties in Canada in the 2015 Federal election follow.
Conservative   Green    Liberal   NDP
0.6062         0.0257   0.2496    0.1185
*The Alberta proportions were calculated by your instructor using data online at Elections Canada. Interested students can google and look at the statistics provided there. These include ONLY Albertans who voted for one of the 4 main parties. This will allow us to make our comparisons sensibly. (Your instructor notes that out of all votes cast, 98.26% of Albertans and 94.53% of Canadians voted for those 4 parties. The Canadian percent is lower because of those voting BQ. No BQ candidates ran in Alberta.)
We consider, somewhat artificially, the Statistics 151 student data in the FULLPOSTELECTIONSURVEYFORR file to be a random sample from a much larger hypothetical population of Alberta students. We make a null hypothesis that the preferred political platform proportions of the population of Alberta students match the overall Albertan voting proportions. Perform a goodness of fit test to test this hypothesis. Choose α = 0.05 as your level of significance.
a) State your null and alternative hypothesis. (2m)
Let 1 be Conservative, 2 Green, 3 Liberal, 4 NDP
H0:
Ha:
b) State your assumptions and comment on their validity. (3m)
1.
2.
Comment:
c) The table below includes the observed counts for chosen best platform (see the BestPlat column) for students in the file FULLPOSTELECTIONSURVEYFORR. Fill in the following by hand. Use 4 decimals.
Test Statistic:
PARTY          p0i      Observed Counts, Oi   Expected Counts, Ei = 446·p0i   (Oi − Ei)² / Ei
Conservative   0.6062   130                   270.3652                        72.8732
Green          0.0257   23
Liberal        0.2496   220
NDP            0.1185   73
TOTALS         1.0000   446                   446                             198.2667
Test statistic = total of cell contributions = χ²* = Σ (Oi − Ei)² / Ei = 198.2667
Degrees of Freedom = # choices − 1 = k − 1 = 4 − 1 = 3
d) Run the SOFTWARE to verify your test statistic and degrees of freedom and to find your p-value. Paste it below. (4m)
Output (Counts and Test):
e) Report your test statistic, degrees of freedom, and your p-value. Include a full p-value statement. (4m)
Test Statistic: χ²* =
Degrees of freedom:
P Statement:
PValue:
f) Does the evidence lead to rejection of your null hypothesis? Why or why not? (2m)
Reject or Fail to Reject H0?
Why or why not?
g) Fill in the following sentence. (3m)
At the 5% significance level, we (have, do not have) significant evidence that ____________________________________________________________________________________
h) Comment on the results. What cells contributed greatly to the test statistic value? What cells did not? How did observed counts differ from expected counts?
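The by-hand table in part c) can be checked with a short script. This is a sketch only: it uses the observed counts and null proportions given in the worksheet, while the helper names (chi2_sf, goodness_of_fit) are hypothetical and the p-value helper is a closed-form recurrence for integer degrees of freedom, not the routine any particular statistics package uses.

```python
from math import erfc, exp, gamma, sqrt


def chi2_sf(x, df):
    """P(chi-square with integer df > x), built up with the recurrence
    Q(df + 2) = Q(df) + (x/2)^(df/2) * exp(-x/2) / Gamma(df/2 + 1)."""
    half = x / 2
    if df % 2 == 1:
        q, k = erfc(sqrt(half)), 1   # exact tail area for df = 1
    else:
        q, k = exp(-half), 2         # exact tail area for df = 2
    while k < df:
        q += half ** (k / 2) * exp(-half) / gamma(k / 2 + 1)
        k += 2
    return q


def goodness_of_fit(observed, null_props):
    """Expected counts, cell contributions, statistic, df and p-value."""
    n = sum(observed)
    expected = [n * p for p in null_props]
    contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
    stat = sum(contributions)
    df = len(observed) - 1
    return expected, contributions, stat, df, chi2_sf(stat, df)


parties = ["Conservative", "Green", "Liberal", "NDP"]
observed = [130, 23, 220, 73]                 # counts from the worksheet table
null_props = [0.6062, 0.0257, 0.2496, 0.1185]

expected, contributions, stat, df, p_value = goodness_of_fit(observed, null_props)
for name, e, c in zip(parties, expected, contributions):
    print(f"{name:<13} E = {e:9.4f}   contribution = {c:9.4f}")
print(f"chi-square = {stat:.4f}, df = {df}, p-value = {p_value:.3g}")
```

Printing the per-cell contributions alongside the total makes it easier to answer part h), since the cells that dominate the statistic are visible directly.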
29. The proportions of blood types O, A, B and AB in the general population of a particular country are in the ratio 49:38:9:4, respectively. This means that if a person is randomly selected from this country, the probability of having type O is 49/100 = 0.49, the probability of having type A is 38/100 = 0.38, the probability of having type B is 9/100 = 0.09 and the probability of having type AB is 4/100 = 0.04. A research team investigating a small isolated community in this country obtained the following frequencies of blood type: O for 87 individuals, A for 59 individuals, B for 20 individuals and AB for 4 individuals. Test the hypothesis that the proportions in this community do not differ significantly from those in the general population. Use a significance level of 0.05. You are provided with a data file of individual responses called BLOODTYPES.xls in which the column labeled BloodType contains individual blood types for the 170 people in the research study.
a) State your null and alternative hypothesis. (2m)
Let 1 be type A, 2 be type AB, 3 be type B, and 4 be type O
H0: p0TypeA =      , p0TypeAB =      , p0TypeB =      , p0TypeO =
Ha: H0 is not true (at least one of …..
b) State your assumptions and comment on their validity. (3m)
1.
2.
Comment:
c) Use SOFTWARE to find your observed counts and perform a goodness of fit test to test this hypothesis. Paste your output below. (4m)
Output: (Counts and Test)
d) Report your test statistic, degrees of freedom, and your p-value. Include a full p-value statement. (4m)
Test Statistic: χ²* =
Degrees of freedom: df =
P Statement: P(χ² >      )
PValue:
e) Does the evidence lead to rejection of your null hypothesis? Why or why not? (2m)
Reject or Fail to Reject H0?
Why or why not?
f) Fill in the following sentence. (3m)
At the      % significance level, we _____________ significant evidence that _______________________________________________________________________________________
g) The table below includes the observed counts for the blood types for people in the small community that can be found in the BLOODTYPES Excel file. Fill in the following by hand to verify your SOFTWARE results. Use 4 decimals.
Test Statistic:
BLOOD TYPE   p0i    Observed Counts, Oi   Expected Counts, Ei = 170·p0i   (Oi − Ei)² / Ei
Type A       0.38   59
Type AB      0.04   4
Type B       0.09   20
Type O       0.49   87
TOTALS       1      170
Test statistic = total of cell contributions = χ²* = Σ (Oi − Ei)² / Ei =
Degrees of Freedom = # choices − 1 =
h) Comment on the results. What cells contributed greatly to the test statistic value? What cells did not? How did observed counts differ from expected counts?
30. A random sample of people are surveyed and then classified according to their age group (below 40, 40 or older) and the likelihood that they would purchase an electronic device that could be calibrated to help them find lost keys. Perform a test of independence to determine if age group and likelihood of purchasing the device are related. Use a significance level of 5%.
a) State your null and alternative hypothesis. (2m)
H0:
Ha:
b) State your assumptions and comment on their validity. (4m)
1.
2. at
Comment:
c) Set up your problem in SOFTWARE. Check that it matches the data in the file KEYS, as below.
AGE GROUP   VERY UNLIKELY   UNLIKELY   NEUTRAL   LIKELY   VERY LIKELY   TOTALS
<40         6               12         20        30       15            83
40+         5               8          26        42       30            111
TOTALS      11              20         46        72       45            194
d) From SOFTWARE, paste your output and then fill in the following tables.
Expected Counts Output
Expected Values
AGE GROUP   VERY UNLIKELY   UNLIKELY   NEUTRAL   LIKELY   VERY LIKELY   Totals
<40         4.71            8.56       19.68     30.80                  83
40+         6.29            11.44      26.32     41.20                  111
Totals      11              20         46        72       45            194
Cell Contributions Output
Cell Contributions
AGE GROUP   VERY UNLIKELY   UNLIKELY   NEUTRAL   LIKELY   VERY LIKELY
<40         0.36            1.39       0.01      0.02
40+         0.27            1.04       0.00      0.02
e) Paste test output and then report your test statistic, degrees of freedom, and pvalue in the table. Include a full p-value statement. (4m)
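The expected counts and cell contributions in part d) follow from the row and column totals, and can be sketched as below using the KEYS counts from part c). The function names are hypothetical, and the p-value helper is a closed-form recurrence for integer degrees of freedom rather than the method used by any particular package.

```python
from math import erfc, exp, gamma, sqrt


def chi2_sf(x, df):
    """P(chi-square with integer df > x), built up with the recurrence
    Q(df + 2) = Q(df) + (x/2)^(df/2) * exp(-x/2) / Gamma(df/2 + 1)."""
    half = x / 2
    if df % 2 == 1:
        q, k = erfc(sqrt(half)), 1
    else:
        q, k = exp(-half), 2
    while k < df:
        q += half ** (k / 2) * exp(-half) / gamma(k / 2 + 1)
        k += 2
    return q


def independence_test(table):
    """Chi-square test of independence for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count = (row total * column total) / grand total.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    contributions = [
        [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(table, expected)
    ]
    stat = sum(sum(row) for row in contributions)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, contributions, stat, df, chi2_sf(stat, df)


# Rows: <40, 40+. Columns: very unlikely, unlikely, neutral, likely, very likely.
keys = [[6, 12, 20, 30, 15],
        [5, 8, 26, 42, 30]]

expected, contributions, stat, df, p_value = independence_test(keys)
for label, e_row, c_row in zip(["<40", "40+"], expected, contributions):
    print(label, [round(e, 2) for e in e_row], [round(c, 2) for c in c_row])
print(f"chi-square = {stat:.4f}, df = {df}, p-value = {p_value:.4f}")
```

The rounded expected counts and contributions printed by this sketch can be compared entry by entry against the partially completed tables in part d).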
Test Output
Test Statistic: χ²* =
Degrees of freedom:
P Statement: P(      )
PValue: Pvalue =
f) Does the evidence lead to rejection of your null hypothesis? Why or why not? (2m)
Reject or Fail to Reject H0?
Why or why not?
g) Fill in the following sentence. (3m)
At the _____ significance level, there _____ significant evidence that _____________________________________________________________________________________
31. Joseph Lister was a 19th century surgeon who pioneered the use of disinfectants to reduce infection rates. Over the years, he performed 75 amputations: 40 using carbolic acid as a disinfectant and 35 without any disinfectant. The following results were obtained:
                        Patient Died   Patient Survived   Total
With carbolic acid      6              34                 40
Without carbolic acid   16             19                 35
Total                   22             53                 75
At the 98% confidence level, determine if the patient survival or death is independent of whether or not carbolic acid was used during surgery.
a) State your null and alternative hypothesis. (2m)
H0:
Ha:
b) State your assumptions and comment on their validity. (4m)
1.
2.
Comment: (both hold for this example)
c) Use SOFTWARE with the file LISTER to obtain the expected values, the cell contributions to the test statistic, and the test output. Use Disinfectant as the row variable and Patient_Status as the column variable. (If using R, students should note that they can do this problem using the data file or, alternatively, by typing data counts into a two-way table, and that they do not need the data file. Please see the commands file for further details.) Paste output in the provided boxes below. (3m)
Expected Counts Output
Cell Contributions Output
Test Output
d) Fill in the following table to perform the test. (3m)
Test stat/df: χ² =      df =
P-value: Pvalue statement:      Pvalue =
Decision and Reason: (Reject, Do not reject) H0      Pvalue (≤, >) α
Decision in “English”: At the 2% significance level, there (is, is not) significant evidence that the status of the patient (survive or die) is not independent of whether or not the carbolic acid was used during surgery. That is, the patient status is dependent on whether the carbolic acid is used.
32. Consider the data set STATISTICSSTUDENTSSURVEYFORR.
a) Create a scatter plot to illustrate the relationship between BEFPULSEMIN and BEFBREATHMIN. Let Y = BEFPULSEMIN be the dependent (response) variable and X = BEFBREATHMIN be the independent (explanatory) variable.
Scatterplot
b) Comment on the strength, direction and form you see on the scatterplot.
c) Find the line of best fit (regression line) for this data.
Output:
Line of Best fit:
d) Give an interpretation of intercept and slope in the context of this problem.
Intercept:
Slope:
e) Find the correlation between the two variables BEFPULSEMIN and BEFBREATHMIN.
r =
r² =      (from output above)
r = __________________ from the correlation matrix output below
f) Perform a test of the null hypothesis that there is no linear relationship between pre survey breath rate and pre survey pulse rate against the two-sided alternative. That is, determine if there is significant evidence that the slope is non-zero. Use the output above to help you fill in some of the answer. Use a significance level of 5%. You do not have to state assumptions. (SEE OUTPUT ABOVE)
Hypotheses: H0: β = 0      Ha: β ≠ 0
Test statistic and degrees of freedom: t* =      df =
Pvalue statement and Pvalue: For df =
Decision: Reject or Fail to Reject H0. Why?
English conclusion (significance or non significance): At the 5% significance level, there (is, is not) significant evidence to support the alternative that the slope is nonzero.
33. Students in three treatment groups (no fertilizer, liquid fertilizer, pellet fertilizer) monitored and recorded the number of days that it took until they could perform the first pruning for a basil plant they were growing in a class experiment. They then calculated the average and standard deviation of days to the time of first pruning for their treatment groups (generally between about 35 and 49 days). Data can be found in the file BASIL. We will perform an ANOVA test to determine if the mean number of days to first pruning differs significantly in at least one of the groups. Our level of significance will be 5%.
a) State your null and alternative hypotheses and your level of significance. Draw a picture of the background populations to show what you mean.
H0:      versus Ha:
α =
b) State your assumptions.
1.
2.
3.
4.
c) Use SOFTWARE to obtain an ANOVA table from which you can determine if at least one mean differs from the others. Paste your output below.
Output
d) State your test statistic, degrees of freedom and p-value.
F* =
Degrees of freedom:
p-value statement and p-value:
e) Use software to find the critical value for your level of significance, paste your output, and state the critical region.
Critical Region:
f) Does the evidence lead you to reject or fail to reject your null hypothesis? Explain.
Critical Value approach:
Pvalue Approach:
g) State whether you have significant evidence to conclude that at least one of the population treatment means differs from the others.
We conclude that there (is, is not) significant evidence that at least one of the population treatment means (mean days to first pruning for the 3 treatments (fertilizers)) differs from the others.
34. Allan taught three sections of Stat 151 labs during Fall 2019. The lab times were Tuesday AM, Thursday AM, and Monday AM. At the end of the semester, all students in Stat 151 wrote a common lab final exam worth 61 marks. The lab grades for all the students in all three of Allan’s sections can be found in the Excel file ALLANSLABEXAMGRADESFALL2019. The column TimeOfDay contains the time the exam was written (TuesdayAM, ThursdayAM or MondayAM) and the column Grades contains the student’s grade on the lab exam out of 61. Allan was curious to see if the mean grades from the lab exams on different days would be the same. Assuming these results are representative of populations of students writing at the indicated times, perform a one-way ANOVA test to determine if the mean grades are equal at these various times. Use a 0.05 significance level. Use SOFTWARE to compute the ANOVA table including the test statistic and p-value.
a) State your null and alternative hypotheses and your α.
H0:      versus Ha:
α =
b) State your assumptions.
1.
2.
3.
4.
c) Use SOFTWARE to obtain an ANOVA table from which you can determine if at least one mean differs from the others. Paste your output below.
Output
d) Use the software output to fill in the following ANOVA table.
Test stat/df: F =      df =
P-value: Pvalue statement: P(F >      )      Pvalue =
Decision and Reason: (Reject, Do not reject) H0      Pvalue (≤, >) α
Decision in “English”: At the 5% significance level, there (is, is not) significant evidence
e) Use software to find the critical value for your level of significance, paste your output, and state the critical region.
Critical Region: Reject H0 if
f) Does the evidence lead you to reject or fail to reject your null hypothesis? Explain.
Critical Value approach:
Pvalue Approach:
g) State whether you have significant evidence to conclude that at least one of the population treatment means differs from the others.
We conclude that there (is, is not) significant evidence that __________________________________________________________________________________________
(It is interesting to observe that the Tuesday AM class didn’t do as well as the Monday AM or Thursday AM class.)
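The arithmetic behind a one-way ANOVA table (sums of squares, degrees of freedom, mean squares and F) can be sketched as follows. The three groups below are hypothetical values chosen only to make the arithmetic easy to follow; they are not taken from the BASIL or ALLANSLABEXAMGRADESFALL2019 files, and the function name is an assumption. The p-value and critical value require the F distribution and are left to the software output.

```python
def one_way_anova(groups):
    """Between/within sums of squares, degrees of freedom and F statistic."""
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-groups variation: group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-groups variation: observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat


# Hypothetical data for three groups (not from the course data files).
groups = [[40, 42, 44], [45, 47, 49], [38, 40, 42]]

ss_between, ss_within, df_between, df_within, f_stat = one_way_anova(groups)
print(f"SS between = {ss_between}, df = {df_between}")
print(f"SS within  = {ss_within}, df = {df_within}")
print(f"F = {f_stat}")
```

With these made-up values the group means are 42, 47 and 40 around a grand mean of 43, which gives SS between = 78 on 2 degrees of freedom, SS within = 24 on 6 degrees of freedom, and F = 39 / 4 = 9.75.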