Demo Questions and Solutions V63 (1)
School: Carleton University | Course: 102 | Subject: Statistics | Date: Feb 20, 2024 | Pages: 80
DEMO QUESTIONS and Solutions: STATISTICS 151 LAB
Please title, label, and include appropriate scales on graphs, charts, plots and tables.
1. Recently, statistics students at MacEwan completed a survey. Raw data can be found in the file STATISTICSSTUDENTSURVEYFORR. It contains the column ALBBEST, a variable that measures which Alberta political party students think had the best platform (Conservative, Green, Liberal, NDP).
a) What type of variable is ALBBEST?
Categorical (not ordinal)
b) Use SOFTWARE to tally the counts and percents for the various outcomes for this data. Fill in the chart below.
SOFTWARE OUTPUT
Party | Count | Percent
Conservative | 21 | 35.0000%
Green | 5 | 8.3333%
Liberal | 5 | 8.3333%
NDP | 29 (HIGHEST COUNT) (MODE) | 48.3333%
Total | 60 | 99.9999%
c) Use the information to help you fill in the table below (by hand).
Party | Frequency | Relative Frequency (f/n) (Proportion)
Conservative | 21 | 21/60 = 0.3500
Green | 5 | 0.0833
Liberal | 5 | 0.0833
NDP | 29 | 0.4833
TOTAL | 60 | 0.9999 ≈ 1
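The tally and the f/n column can also be checked with a few lines of code. This is only a sketch: the raw survey file is not reproduced here, so the list of responses is rebuilt from the counts in the table, and the variable names are illustrative.

```python
from collections import Counter

# Counts copied from the ALBBEST table; the response list is rebuilt from
# them because the raw survey file is not available here (an assumption).
counts = {"Conservative": 21, "Green": 5, "Liberal": 5, "NDP": 29}
responses = [party for party, k in counts.items() for _ in range(k)]

tally = Counter(responses)   # frequency f of each outcome
n = sum(tally.values())      # total number of responses

# Relative frequency = f / n, rounded to 4 decimals as in the table
rel_freq = {party: round(f / n, 4) for party, f in tally.items()}

# The mode is the outcome with the highest count
mode = max(tally, key=tally.get)

for party in counts:
    print(f"{party:<13}{tally[party]:>4}{rel_freq[party]:>9.4f}")
print("Total", n, "Mode:", mode)
```

The rounded proportions add to 0.9999 rather than 1 only because each one was rounded to 4 decimals; the unrounded values f/n add to exactly 1.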
d) Use the information above to draw a frequency and percent bar chart for describing this data.
Frequency Bar Chart
Percent Bar Chart
e) The mode is the outcome (choice) of a categorical variable that occurs most often (i.e. has the highest count). In this case the mode for our column of ALBBEST is NDP, as that bar on the graphs has the highest count of 29 students.
2. Recently, statistics students at MacEwan completed a survey. Raw data can be found in the file STATISTICSSTUDENTSURVEYFORR. It contains the column LIKEENGLISH, a categorical ordinal variable that measures how much students like learning the discipline English at school (1_Dislike_Very_Much, 2_Dislike, 3_Neutral, 4_Like, 5_Like_Very_Much).
a) What kind of a variable is LIKEENGLISH?
Categorical Ordinal
b) Use SOFTWARE to create a bar chart and a pie chart to illustrate the counts of the data for the various choices in the LIKEENGLISH variable. Paste them below.
c) The mode is the most common outcome (choice) of a categorical variable. This outcome has the highest bar in its bar chart. Here, we have a tie, as both 2_Dislike and 3_Neutral have the same count of values (and the same height of bar in the bar chart).
d) Do you prefer the bar chart or the pie chart for illustrating this data? Explain your choice.
The bar chart works nicely for ordinal data, as most people readily think from least to most (or lowest to highest) (or 1_dislike_very_much to 5_like_very_much). It requires a bit more cognitive work to move your mind among the outcomes (choices) of the categorical variable when we look at the pie chart.
3. Allan recently recorded the raw scores obtained by 20 of his students on a calculus midterm. You are interested in the number of students in the following classes (also called intervals or bins). [40,45), [45, 50), [50, 55), [55,60), [60,65), [65,70). Data can be found in the file ALLANSMIDTERMDATA.
a) A [ bracket means that the left end of the interval is closed, while a ) bracket means that the right end of the interval is open. For example, this means that a value of 45 exactly would be assigned to the interval [45, 50). To what interval would a value of 60 be assigned?
[60,65)
b) Cutpoints are values that occur at the left [ edge and the right ) edge of each interval. For example, the cutpoints for the interval [45,50) are 45 and 50. What are the cutpoints for the interval [60,65)?
60 and 65
c) The midpoint of an interval is the point in the middle of the interval. For example, the midpoint of the interval [40,45) is 42.5. What is the midpoint of the interval [60,65)?
62.5
R, by default, makes intervals of the form (a,b]. But the default used by the textbook (and in lecture) if making a histogram by hand, is intervals of the form [a,b). Histograms will be similar, but not exactly the same, depending on whether they are generated by R or by hand.
d) Use SOFTWARE to make a frequency histogram of this data. It will automatically choose the cutpoints on the edges of each interval and midpoints in the middle of each interval, and in this case, it happens to choose the ones that interest Allan. (This is not always the case!) Allan’s midterm data is pasted below. Students can verify that R uses intervals (a,b]. That is, R uses the intervals (40,45], (45, 50], (50, 55], (55,60], (60,65], (65,70]
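The difference between the by-hand convention [a,b) and the (a,b] convention can be made concrete with a small sketch. The function names are illustrative, and the code applies only the plain interval rules described above; any special handling a particular software package gives to the lowest cutpoint is not modeled here.

```python
from bisect import bisect_left, bisect_right

# Cutpoints for Allan's classes [40,45), [45,50), ..., [65,70)
cutpoints = [40, 45, 50, 55, 60, 65, 70]

def interval_left_closed(x, cuts):
    """Return the cutpoints (a, b) of the [a, b) interval holding x, or None."""
    i = bisect_right(cuts, x) - 1
    if 0 <= i < len(cuts) - 1:
        return (cuts[i], cuts[i + 1])
    return None  # x lies outside every interval

def interval_right_closed(x, cuts):
    """Return the cutpoints (a, b) of the (a, b] interval holding x, or None."""
    i = bisect_left(cuts, x) - 1
    if 0 <= i < len(cuts) - 1:
        return (cuts[i], cuts[i + 1])
    return None  # x lies outside every interval

def midpoint(interval):
    """Midpoint of an interval given by its two cutpoints."""
    a, b = interval
    return (a + b) / 2

print(interval_left_closed(60, cutpoints))   # by-hand convention [a, b)
print(interval_right_closed(60, cutpoints))  # convention (a, b]
print(midpoint((60, 65)))
```

A score that sits exactly on a cutpoint, such as 60, is the only kind of value that changes bins between the two conventions, which is why the two histograms are similar but not always identical.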
4. Recently, statistics students at MacEwan completed a survey. Raw data can be found in the file STATISTICSSTUDENTSURVEYFORR. It contains the column WKHRSNEWS (a variable that measures weekly hours of news consumed by students) and the column RNGOOD (a variable that measures whether the student believes Rachel Notley is doing a good job).
a) What kind of variables are WKHRSNEWS and RNGOOD?
WKHRSNEWS is numerical (continuous) and RNGOOD is categorical (non-ordinal).
b) Create stacked percent histograms for this data (one histogram for each value of the variable RNGOOD). Paste your output below.
c) Create side by side boxplots for this data. Paste your output below.
d) Calculate descriptive measures of center and dispersion for this data. Paste your output below.
e) What measures of centrality and dispersion are best used to describe this data?
Since the data is somewhat symmetric and does not exhibit notable skew or show outliers, it is appropriate to use the means and standard deviations as descriptive measures for this data.
f) STATISTICSSTUDENTSURVEYFORR also contains the column WKHRSMUSIC (a variable that measures weekly hours of music consumed by students) and the column UNDERGORGRAD (a variable that measures whether the student is pursuing an undergraduate or graduate degree). What kind of variables are WKHRSMUSIC and UNDERGORGRAD?
WKHRSMUSIC is numerical (continuous) and UNDERGORGRAD is categorical (non-ordinal).
g) Create stacked percent histograms for this data (one histogram for each value of the variable UNDERGORGRAD). Paste your output below.
h) Create side by side boxplots for this data. Paste your output below.
i) Calculate descriptive measures of center and dispersion for this data. Paste your output below.
j) What measures of centrality and dispersion are best used to describe this data?
Since the data is notably right skewed, it is appropriate to use the medians, IQRs (and ranges) to describe the data.
k) STATISTICSSTUDENTSURVEYFORR also contains the column BEFPULSEMIN (a variable that measures student pulse in beats per min (bpm) before doing the survey) and the column UNDERGORGRAD (a variable that measures whether the student is pursuing an undergraduate or graduate degree). Create side by side boxplots for this data. Paste your output below.
Before Pulse Minutes for those pursuing Undergrad or Grad Studies
l) Fill in the blanks or bold your choices in the brackets in the sentences.
The median pulse rate for students pursuing an undergraduate degree is (below, the same as, above) the median pulse rate for students pursuing a graduate/professional degree.
The data for those pursuing a graduate/professional degree has an upper whisker that is (less spread out than, about the same spread as, more spread out than) the upper whisker for those pursuing an undergraduate degree.
The percent of data in the interquartile range for the graduate/professional data is (lower than, the same as, higher than) the percent of data in the interquartile range for the undergraduate data.
5. Recently, statistics students at MacEwan completed a survey. Raw data can be found in the file STATISTICSSTUDENTSURVEYFORR. It contains the columns BEFPULSEMIN (a variable that measures students’ pulse rate (beats per minute) prior to doing the survey) and BEFBREATHMIN (a variable that measures students’ breaths per minute prior to doing the survey). A scatterplot of these two variables can be found below.
a) What is the range of data for the breaths per minute for the students? 12 to 24
b) What is the range of data for the pulse rate (beats per minute) for the students? 62 to 85
c) Is there a discernable pattern to this data? Why do you think you see what you see?
There is a clear upward trend to the data. As breaths per minute increase, pulse beats per minute increase. It makes sense that people who are hurrying to arrive at class would have higher pulse and respiration rates together, whereas people who are more relaxed would have lower pulse and respiration rates together.
d) Do you see any discernable outliers (values that do not fit to the pattern you see in the data)? Suggest a possible reason for your answer.
There are no discernable outliers. In real life, it is unusual for a person to have a high pulse rate and a low respiration rate, or vice versa. Our data mirrors what we would expect to find.
e) Find the equation for the line that would fit best through the points (line of best fit) for this data. Use 2 decimal places.
OUTPUT:
EQUATION: BEFPULSEMIN = 46.19 + 1.51 BEFBREATHMIN
f) Predict the pulse rate (beats per minute) for a student who takes 18 breaths per minute. Are you comfortable with this prediction? Why or why not?
BEFPULSEMIN = 46.19 + 1.51 (18) = 73.37 BEATS PER MINUTE
We are comfortable because the fit of the points to the line appears good, and 18 breaths per minute is in the middle of the range of predictor values (12 to 24) for the BEFBREATHMIN predictor variable.
g) Predict the pulse rate (beats per minute) for a student who takes 5 breaths per minute. Are you comfortable with this prediction? Why or why not?
BEFPULSEMIN = 46.19 + 1.51 (5) = 53.74 BEATS PER MINUTE
The range of the predictor values of BEFBREATHMIN is only from 12 to 24 for our data, and 5 breaths per minute is far below the lowest value of 12 in our range. We should not extrapolate so far outside our range of values, and we are not comfortable with our prediction.
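The prediction step and the extrapolation check in f) and g) can be sketched as below. The coefficients and the 12 to 24 range come from the worksheet; the function name and the constant names are illustrative only.

```python
# Fitted line from the worksheet: BEFPULSEMIN = 46.19 + 1.51 * BEFBREATHMIN
INTERCEPT = 46.19
SLOPE = 1.51

# Observed range of the predictor (breaths per minute) in the survey data
X_MIN, X_MAX = 12, 24

def predict_pulse(breaths_per_min):
    """Return (predicted pulse, whether x lies inside the observed range)."""
    prediction = INTERCEPT + SLOPE * breaths_per_min
    within_range = X_MIN <= breaths_per_min <= X_MAX
    return prediction, within_range

for x in (18, 5):
    pulse, ok = predict_pulse(x)
    note = "inside observed range" if ok else "extrapolation, do not trust"
    print(f"{x} breaths/min -> {pulse:.2f} bpm ({note})")
```

The range flag is the whole point of the check: the arithmetic works for any x, but the line was only fitted to students breathing between 12 and 24 times per minute.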
h) Find the correlation for this data. Correlation is a measure between -1 and 1 that indicates how well the points fit to the line. It is positive if the slope of the line of best fit is positive, and negative if the slope of the line of best fit is negative. A number close to 1 or -1 for correlation means a strong correlation. A number closer to 0 means a weak correlation.
Output:
Answer: r = 0.828
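i) Bold the correct answers below.
The correlation between these two variables is (mild, moderate, strong) and (positive, negative). Answer: strong and positive (r = 0.828).
The slope of the line of best fit for this data is (positive, negative). Answer: positive (the slope is 1.51).
The correlation of two variables and the slope of the line of best fit through the two variables will always (be different, be the same). Answer: be the same in sign; a positive slope always goes with a positive correlation, and a negative slope with a negative correlation.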
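The link between the sign of the correlation and the sign of the slope follows from the formulas: both share the same numerator (the sum of cross-products), and their denominators are always positive. A minimal sketch, using made-up (breaths, pulse) pairs rather than the survey data, which is not reproduced here:

```python
from math import sqrt

def slope_and_correlation(xs, ys):
    """Least-squares slope and Pearson correlation r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx            # same numerator as r
    r = sxy / sqrt(sxx * syy)    # denominator is positive
    return slope, r

# Hypothetical data for illustration only (NOT the survey values)
breaths = [12, 14, 16, 18, 20, 22, 24]
pulse = [64, 70, 69, 74, 75, 80, 83]

slope, r = slope_and_correlation(breaths, pulse)
print(f"slope = {slope:.2f}, r = {r:.3f}")
```

Because the values of the slope and r are generally different numbers, "the same" can only refer to their sign.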
6. Recently, statistics students at MacEwan completed a survey. Raw data can be found in the file STATISTICSSTUDENTSURVEYFORR. It contains the column MOALPHABET (a variable that measures the age (in months) at which students were able to recite the alphabet in their first language).
a) Create a frequency histogram of this data. Comment on its shape. Note any peaks.
The middle 3 bars of the histogram are somewhat uniform, with a slight peak at the 30 to 35 month bar. The bars on the left and right of the graph contain very few values. Although it might be tempting to view this data as normal, the drop off from the middle of the graph to the left and right is not a tidy tapering bell shape, but more extreme. (or something sensible)
b) Create a boxplot of this data. Explain what the whiskers tell us about the shape of the data. Are there any outliers?
The whiskers are _______ in size to the interquartile range. That is, the range in data between the minimum and the first quartile is _____________ as the range in data between the maximum and the third quartile and the range in data of the interquartile range is also ___________. ______outliers are noted.
c) Create a normal probability plot of this data. Comment on your findings.
It would be correct to say that as the points don’t deviate too far from a line, there is not a lot of evidence of non-normality in the data when we examine the probability plot. This answer suffices.
However:
It would also be correct to notice the point off the line at the left side and the increased spacing between points on the right side and wonder about “outliers”.
It would also be correct to notice the slight spikiness about the line in the middle and suggest it might indicate that the data is not a “perfect” bell over the middle of the data. A sensible answer is what we seek!
7. Recently, statistics students at MacEwan completed a survey. Raw data can be found in the file STATISTICSSTUDENTSURVEYFORR. You will create some crosstab tables from this dataset and calculate some probabilities. For all probability questions, include a probability statement, and a solution in fractions, proportion (3 decimals) and percent (1 decimal).
a) STATISTICSSTUDENTSURVEYFORR contains the columns FEDBEST (a variable that measures what federal party a student views as having the best party platform (Conservative, Green, Liberals, or NDP)) and JTGOOD (a variable that indicates whether a student feels Justin Trudeau is doing a good job as Prime Minister (Yes or No)). Find the totals for each of the outcomes in the FEDBEST column. Find the totals for each of the outcomes in the JTGOOD column. Make a crosstab table of the counts for each of the (JTGOOD, FEDBEST) pairs. Use this information to fill in the provided table by hand. If we randomly select a student, what is the probability that they are a Liberal platform fan?
Rows: JTGOOD (JUSTIN TRUDEAU GOOD). Columns: FEDBEST (BEST FEDERAL PARTY).
JTGOOD | Conservative | Green | Liberal | NDP | Total
No | 10 | 3 | 1 | 11 | 25
Yes | 5 | 2 | 23 (Yes and Liberal) | 5 | 35
Total | 15 | 5 | 24 | 16 | 60 overall
P(Liberal) = 24/60 = 0.400 = 40.0%
b) If we randomly select a student, what is the probability that they are a Liberal party platform fan and think that Justin Trudeau is doing a good job? INTERSECTION – BOTH OCCUR TOGETHER
P(Liberal and Yes) = 23/60 = 0.383 = 38.3%
c) If we randomly select a student, what is the probability that they are a Liberal platform fan or think that Justin Trudeau is doing a good job? UNION – ONE OR THE OTHER OR BOTH
P(Liberal or Yes) = P(Liberal) + P(Yes) – P(Liberal and Yes) = 24/60 + 35/60 - 23/60 = 36/60 = 0.600 = 60.0%
d) If we randomly select a Liberal platform fan, what is the probability that they think Justin Trudeau is doing a good job? INFORMAL: LOOKING FOR YESES IN THE LIBERAL GROUP
P(Yes|Liberal) = 23/24 = 0.958 = 95.8%
e) If we randomly select a Justin Trudeau fan (yes group), what is the probability they are a Liberal platform fan? INFORMAL: LOOKING FOR LIBERALS IN THE YES GROUP
P(Liberal|Yes) = 23/35 = 0.657 = 65.7%
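The five probabilities in a) to e) all come from the same four numbers in the crosstab: the overall total, the Liberal column total, the Yes row total, and the (Yes, Liberal) cell. A short sketch using exact fractions; the counts are copied from the table, while the dictionary layout and variable names are illustrative.

```python
from fractions import Fraction

# Crosstab counts copied from the worksheet: rows JTGOOD, columns FEDBEST
table = {
    "No":  {"Conservative": 10, "Green": 3, "Liberal": 1,  "NDP": 11},
    "Yes": {"Conservative": 5,  "Green": 2, "Liberal": 23, "NDP": 5},
}

total = sum(sum(row.values()) for row in table.values())    # all students
n_liberal = sum(row["Liberal"] for row in table.values())   # column total
n_yes = sum(table["Yes"].values())                          # row total
n_both = table["Yes"]["Liberal"]                            # intersection cell

p_liberal = Fraction(n_liberal, total)                      # a) marginal
p_both = Fraction(n_both, total)                            # b) intersection
p_either = Fraction(n_liberal + n_yes - n_both, total)      # c) union
p_yes_given_liberal = Fraction(n_both, n_liberal)           # d) conditional
p_liberal_given_yes = Fraction(n_both, n_yes)               # e) conditional

for label, p in [("P(Liberal)", p_liberal),
                 ("P(Liberal and Yes)", p_both),
                 ("P(Liberal or Yes)", p_either),
                 ("P(Yes|Liberal)", p_yes_given_liberal),
                 ("P(Liberal|Yes)", p_liberal_given_yes)]:
    print(f"{label} = {p} = {float(p):.3f} = {float(p):.1%}")
```

Note that the two conditional probabilities share a numerator but divide by different group sizes (24 Liberal fans versus 35 Yes answers), which is why they differ.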
f) STATISTICSSTUDENTSURVEYFORR contains the column GENDERIDENTITY (a variable that measures what gender students most identify with (Female or Male or Other)). We wish to consider just the females in the STATISTICSSTUDENTSURVEYFORR dataset. Make a crosstab table of the counts for each of the (JTGOOD, FEDBEST) pairs for just the females. Paste it below. Notice that it does not match the crosstab table calculated in a). Fill in the cell values and the totals by hand and then solve the problems below.
SOFTWARE OUTPUT FOR FEMALES ONLY
Rows: JTGOOD (JUSTIN TRUDEAU GOOD). Columns: FEDBEST (BEST FEDERAL PARTY).
JTGOOD | Conservative | Green | Liberal | NDP | Total
No | 6 | 3 | 1 | 7 | 17
Yes | 3 | 2 | 14 | 3 | 22
Total | 9 | 5 | 15 | 10 | 39
CONTROL: GENDERIDENTITY, ROW JTGOOD, COL FEDBEST
g) If we randomly select a female student, what is the probability she is a Liberal platform fan and thinks that Justin Trudeau is doing a good job? HINT: Consider just the females in the data set.
For a female student, P(Liberal and Yes) = 14/39 = 0.359 = 35.9%
h) If we randomly select a female student, what is the probability she is a Liberal platform fan or thinks that Justin Trudeau is doing a good job? HINT: Consider just the females in the data set.
For a female student, P(Liberal or Yes) = (15 + 22 – 14)/39 = 23/39 = 0.590 = 59.0%
i) If we randomly select a student, what is the probability they are a Liberal platform fan or think Justin Trudeau is doing a good job, given they are female? FORMAL HINT: (Consider just the females in the data set)
For a female student, P(Liberal or Yes) = (15 + 22 – 14)/39 = 23/39 = 0.590 = 59.0%
Students may write P((Liberal or Yes)|Female); that would be fine.
j) Consider only the females. If we randomly select a Liberal platform fan, what is the probability she thinks Justin Trudeau is doing a good job? INFORMAL: FOR FEMALES, LOOK FOR YESES IN THE LIBERAL GROUP
For a female student, P(Yes|Liberal) = 14/15 = 0.933 = 93.3%
k) Consider only the females. If we randomly select a Justin Trudeau fan (yes group), what is the probability she prefers the Liberal platform? INFORMAL: FOR FEMALES, LOOK FOR LIBERALS IN THE YES GROUP
For a female student, P(Liberal|Yes) = 14/22 = 0.636 = 63.6%
l) If we randomly select a female Liberal platform fan, what is the probability she thinks Justin Trudeau is doing a good job?
P(Yes|Female and Liberal) = 14/(1+14) = 14/15 = 0.933 = 93.3%
(note: there may be other ways to solve this problem)
Questions 8 and 9: Students may find the following pictures and summary translations of English wording to Statistics wording useful in doing Questions 8 and 9.
BINOMIAL: Helpful Calculation Pictures
BINOMIAL: Helpful Translations: English to Math
Less than 8 | P(X ≤ 7)
At most 7 | P(X ≤ 7)
No more than 7 | P(X ≤ 7)
7 or less | P(X ≤ 7)
More than 6 | P(X ≥ 7) = 1 – P(X ≤ 6)
At least 7 | P(X ≥ 7) = 1 – P(X ≤ 6)
No less than 7 | P(X ≥ 7) = 1 – P(X ≤ 6)
7 or more | P(X ≥ 7) = 1 – P(X ≤ 6)
Between 5 and 9 inclusive | P(5 ≤ X ≤ 9) = P(X ≤ 9) – P(X ≤ 4)
More than 5 but less than 9 (between 5 and 9 not inclusive) | P(6 ≤ X ≤ 8) = P(X ≤ 8) – P(X ≤ 5)
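Every translation above reduces to one or two cumulative probabilities P(X ≤ k). The sketch below builds those from the binomial formula using only the standard library; the function names are illustrative and are not taken from any particular statistics package. Direct summation like this is fine for the sample sizes in Questions 8 and 9, but it is not intended for very large n.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ BIN(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(k, n, p):
    """P(X <= k): 'at most k', 'no more than k', 'k or less'."""
    if k < 0:
        return 0.0
    return sum(binom_pmf(i, n, p) for i in range(min(k, n) + 1))

def prob_at_least(k, n, p):
    """P(X >= k) = 1 - P(X <= k - 1): 'at least k', 'no less than k'."""
    return 1 - binom_cdf(k - 1, n, p)

def prob_between(lo, hi, n, p):
    """P(lo <= X <= hi) = P(X <= hi) - P(X <= lo - 1), both ends inclusive."""
    return binom_cdf(hi, n, p) - binom_cdf(lo - 1, n, p)

# Small check case that can be verified by hand: X ~ BIN(10, 0.5)
print(binom_cdf(7, 10, 0.5))        # at most 7
print(prob_at_least(7, 10, 0.5))    # at least 7
print(prob_between(5, 9, 10, 0.5))  # between 5 and 9 inclusive
```

For wording that excludes the endpoints, such as "more than 5 but less than 9", convert to inclusive bounds first and call `prob_between(6, 8, n, p)`.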
8. Consider the data at https://en.wikipedia.org/wiki/Demographics_of_Canada#Visible_minority_population
Suppose we survey 1000 Canadians in 2016. Find the probabilities below, artificially assuming (for the sake of the problem) that we can view each question as a 2016 binomial experiment with 1000 trials where the probability of success is equal to the probability of belonging to the visible minority population of interest expressed in the problem. Give full answers. Use 6 decimal places in your answer. (BTW, the 2016 Canadian census profile at https://www12.statcan.gc.ca/census-recensement/2016/dp-pd/prof/details/page.cfm?Lang=E&Geo1=PR&Code1=01&Geo2=PR&Code2=01&Data=Count&SearchText=canada&SearchType=Begins&SearchPR=01&B1=All&TABID=1 is a super interesting browse)
In 2016, find the probability that:
FULL SOFTWARE OUTPUT | SETUP / PROB STATMT / CALCULATION | CONCLUDING STATEMENT
a) between 30 and 39 (inclusive) identify as Black.
FOR BIN(1000, 0.035), P(30 ≤ X ≤ 39) = P(X ≤ 39) – P(X ≤ 29) = 0.7838592 – 0.1724819 = 0.6113773
The probability that between 30 and 39 (inclusive) identify as Black is 0.6113773.
b) more than 52 but less than 55 identify as Chinese.
FOR BIN(1000, 0.051), P(53 ≤ X ≤ 54) = P(X ≤ 54) – P(X ≤ 52) = 0.6981531 – 0.5933113 = 0.1048418
The probability that more than 52 but less than 55 identify as Chinese is 0.1048418.
c) at most 15 identify as Latin American.
FOR BIN(1000, 0.013), P(X ≤ 15) = 0.7647689
The probability at most 15 identify as Latin American is 0.7647689.
d) at least 22 identify as Filipino.
FOR BIN(1000, 0.023), P(X ≥ 22) = 1 – P(X ≤ 21) = 1 – 0.3875311 = 0.6124689
The probability that at least 22 identify as Filipino is 0.6124689.
e) no less than 5 but no more than 7 identify as Korean.
FOR BIN(1000, 0.005), P(5 ≤ X ≤ 7) = P(X ≤ 7) – P(X ≤ 4) = 0.8671524 – 0.4400534 = 0.427099
The probability that no less than 5 but no more than 7 identify as Korean is 0.427099.
f) no more than 3 identify as Japanese.
FOR BIN(1000, 0.003), P(X ≤ 3) = 0.6472321
The probability that no more than 3 identify as Japanese is 0.6472321.
g) no less than 50 identify as aboriginal.
FOR BIN(1000, 0.049), P(X ≥ 50) = 1 – P(X ≤ 49) = 1 – 0.5379076 = 0.4620924
The probability no less than 50 identify as aboriginal is 0.4620924.
9. Consider a binomial experiment where 100 independently randomly chosen University students are asked a yes/no question where the probability of success, p, is equal to the probability a person answered yes. Use SOFTWARE to answer the following questions fully. Use 7 decimal places in your answer.
Output | SETUP / PROB STATMT / CALCULATION | CONCLUDING STATEMENT
a) Yes/No Question: Can you program in C? p = 15% said yes (after all, it is a University!) Find the probability that between 12 and 18 students (inclusive) can program in C.
FOR BIN(100, 0.15), P(12 ≤ X ≤ 18) = P(X ≤ 18) – P(X ≤ 11) = 0.8371746 – 0.1634862 = 0.6736884
The probability that between 12 and 18 students (inclusive) can program in C is 0.6736884.
b) Yes/No Question: Do you have a vegetable garden? p = 28% said yes (and everyone who did had too much zucchini and rhubarb!) Find the probability that more than 26 students but less than 31 students have a vegetable garden.
FOR BIN(100, 0.28), P(27 ≤ X ≤ 30) = P(X ≤ 30) – P(X ≤ 26) = 0.7149122 – 0.3748065 = 0.3401057
The probability that more than 26 but less than 31 students have a vegetable garden is 0.3401057.
c) Yes/No Question: Have you ever parachuted? p = 12% said yes (everyone else wished they had!). Find the probability that at most 9 students said yes.
FOR BIN(100, 0.12), P(X ≤ 9) = 0.2256519
The probability at most 9 students have ever parachuted is 0.2256519.
d) Yes/No Question: Are you an Oilers fan? p = 82% said yes (some students aren’t from Edmonton!) Find the probability that at least 85 students said yes.
FOR BIN(100, 0.82), P(X ≥ 85) = 1 – P(X ≤ 84) = 1 – 0.7370156 = 0.2629844
The probability that at least 85 students are Oilers fans is 0.2629844.
e) Yes/No Question: Do you believe that an animal called a “Pink Fairy Armadillo” exists? p = 24% said yes (BTW, it does exist, but students got fooled by the silly name!) Find the probability that no less than 20 but no more than 30 students believe that a “Pink Fairy Armadillo” exists.
FOR BIN(100, 0.24), P(20 ≤ X ≤ 30) = P(X ≤ 30) – P(X ≤ 19) = 0.9331086 – 0.1453154 = 0.7877932
The probability that no less than 20 but no more than 30 students believe a “Pink Fairy Armadillo” exists is 0.7877932.
f) Yes/No Question: Do you find Jeeves and Wooster a better show than Fawlty Towers? p = 52% said yes (that is a difficult question!) Find the probability that no more than 46 students find Jeeves and Wooster a better show than Fawlty Towers.
FOR BIN(100, 0.52), P(X ≤ 46) = 0.1354949
The probability that no more than 46 students find Jeeves and Wooster a better show than Fawlty Towers is 0.1354949.
g) Yes/No Question: Have you made a TikTok? p = 63% said yes (everyone else planned to make one!) Find the probability that no less than 60 students have made a TikTok.
FOR BIN(100, 0.63), P(X ≥ 60) = 1 – P(X ≤ 59) = 1 – 0.2330151 = 0.7669849
The probability no less than 60 students have made a TikTok is 0.7669849.
10A. Assume that the wingspan of a certain type of monarch butterfly is normally distributed with a mean of 10.3 cm and a standard deviation of 0.6 cm. Answer the following. (3m each)
a) Find the probability that a monarch butterfly has a wingspan of no less than 11.2 cm.
b) Find the probability that a monarch butterfly has a wingspan of at most 8 cm.
c) A monarch butterfly with a wingspan in the bottom 2% of wingspans is considered “exquisite”. Find the highest wingspan that a monarch butterfly considered exquisite will have.
d) A monarch butterfly with a wingspan in the top 1% of wingspans is considered “extraordinary”. Find the wingspan that a monarch butterfly would need to have to be considered extraordinary.
SOFTWARE OUTPUT | SETUP, PSTATEMENTS, CALCULATION | CONCLUDING STATEMENT
a) For N(10.3, 0.6), P(X > 11.2) = 1 – P(X < 11.2) = 1 – 0.9331928 = 0.0668072
The probability that a monarch butterfly has a wingspan of no less than 11.2 cm is 0.066807.
b) For N(10.3, 0.6), P(X < 8) = 0.00006320923
The probability that a monarch butterfly has a wingspan of at most 8 cm is 0.000063.
c) For N(10.3, 0.6), want x such that P(X < x) = 0.02
The highest wingspan that an “exquisite” monarch butterfly could have is 9.067751 cm.
d) For N(10.3, 0.6), want x such that P(X > x) = 0.01, the same x for which P(X < x) = 0.99
A monarch butterfly will need a wingspan of 11.69581 cm or greater to be considered “extraordinary”.
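Parts a) and b) ask for a probability given a value (a cumulative probability), while c) and d) go the other way and ask for a value given a probability (a quantile). One way to sketch both directions is with `NormalDist` from Python's standard `statistics` module; the variable names below are illustrative.

```python
from statistics import NormalDist

# Wingspan model from the worksheet: N(10.3, 0.6), in cm
wingspan = NormalDist(mu=10.3, sigma=0.6)

# a) "no less than 11.2": upper tail, so take the complement of the cdf
p_a = 1 - wingspan.cdf(11.2)

# b) "at most 8": lower tail, the cdf directly
p_b = wingspan.cdf(8)

# c) bottom 2%: the value x with P(X < x) = 0.02
x_c = wingspan.inv_cdf(0.02)

# d) top 1%: P(X > x) = 0.01 is the same x as P(X < x) = 0.99
x_d = wingspan.inv_cdf(0.99)

print(f"a) P(X > 11.2) = {p_a:.7f}")
print(f"b) P(X < 8)    = {p_b:.10f}")
print(f"c) x = {x_c:.6f} cm")
print(f"d) x = {x_d:.5f} cm")
```

The quantile function always works with area to the left, which is why a "top 1%" question has to be restated as a 0.99 lower-tail probability before calling it.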
10B. To encourage basil plants to be “bushy” and contain many leaves, it is common practice among growers to prune away the top pair of leaves every time a branch grows to contain 3 pairs of leaves: after this, two new branches will form. This pruning pattern is repeated every time three more pairs of leaves grow on any new branch. The time to first pruning of basil plants is known to be normally distributed with an average of 35 days and a standard deviation of 3 days. Answer the following. Include full probability statements, calculations, and conclusions.
Time to first pruning for a basil plant is N(35, 3)
a) Find the probability a randomly chosen basil plant has its first pruning before it is 30 days old.
Distributions>Continuous Distributions >Normal Distributions>Normal Probabilities
Variable value(s): 30
Mean: 35
Standard deviation: 3
For N(35, 3) Find P(X < 30) = 0.04779035
The probability a randomly chosen basil plant has its first pruning before it is 30 days old is 0.04779035.
b) Find the probability a randomly chosen basil plant has its first pruning after it is 39 days old.
Distributions>Continuous Distributions >Normal Distributions>Normal Probabilities
Variable value(s): 39
Mean: 35
Standard deviation: 3
For N(35, 3) Find P(X > 39) = 1 – P(X < 39) = 1 – 0.9087888 = 0.0912112
The probability a randomly chosen basil plant has its first pruning after it is 39 days old is 0.0912112.
c) Kathleen and her students grow basil plants. They collect seeds from early starter basil plants that have their first pruning at a young age (plants in the bottom 5% of days old for their first pruning). Find the number of days (or younger) of an early starter basil plant.
Distributions>Continuous Distributions >Normal Distributions>Normal Quantiles
Probabilities: 0.05
Mean: 35
Standard deviation: 3
For N(35, 3) Want ? such that P(X < ?) = 0.05
The number of days for an early starter basil plant is ? = 30.06544 days (or younger).
d) Sometimes a student finds they have a late starter plant that does not reach its age of first pruning until a later age (plants in the top 2% of days old for first pruning). Find the number of days (or older) of a late starter basil plant.
Distributions>Continuous Distributions >Normal Distributions>Normal Quantiles
Probabilities: 0.98
Mean: 35
Standard deviation: 3
For N(35, 3) Want ? such that P(X > ?) = 0.02
Want ? such that P(X < ?) = 0.98
The number of days for a late starter basil plant is 41.16125 days (or older).
11. A pizza delivery company advertises that they will give you your pizza for free if it takes them more than a certain number of minutes to deliver it to you. They wish to only have to give away free pizzas at most 0.5% of the time, and their delivery time follows a normal distribution with a mean of 20 minutes and a standard deviation of 3 minutes. They wish to advertise a number in whole minutes.
a) What number of minutes will they advertise? Why? (4 m)
SOFTWARE OUTPUT | SETUP, PSTATEMENTS, CALCULATION | CONCLUDING STATEMENT
For N(20, 3), want x such that P(X > x) = 0.005, the same x for which P(X < x) = 0.995.
A pizza delivered after 27.72749 minutes would be late. Since the company wants to advertise whole minutes and they want to give away free pizzas at most 0.5% of the time, they should advertise 28 minutes. (Rounding down to 27 minutes would mean giving away free pizzas more than 0.5% of the time.)
b) What is the probability a pizza will be delivered between 22 and 25 minutes?
SOFTWARE OUTPUT | SETUP, PSTATEMENTS, CALCULATION | CONCLUDING STATEMENT
For N(20, 3), P(22 < X < 25) = P(X < 25) – P(X < 22) = 0.9522096 – 0.7475075 = 0.2047021
The probability a pizza will be delivered between 22 and 25 minutes is 0.2047021.
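The choice of a whole number of minutes in a) depends on the direction of rounding, which can be checked by computing the free-pizza rate for the candidate cutoffs on either side of the exact quantile. A brief sketch using the standard library, with illustrative variable names:

```python
from math import ceil
from statistics import NormalDist

# Delivery time model from the worksheet: N(20, 3), in minutes
delivery = NormalDist(mu=20, sigma=3)

# Exact cutoff with 0.5% of deliveries above it
cutoff = delivery.inv_cdf(0.995)

# Round UP so the free-pizza rate stays at or below 0.5%
advertised = ceil(cutoff)
p_free = 1 - delivery.cdf(advertised)
p_free_if_rounded_down = 1 - delivery.cdf(advertised - 1)

# b) probability of delivery between 22 and 25 minutes
p_between = delivery.cdf(25) - delivery.cdf(22)

print(f"exact cutoff: {cutoff:.5f} minutes, advertise {advertised}")
print(f"free-pizza rate at {advertised} min: {p_free:.4%}")
print(f"free-pizza rate at {advertised - 1} min: {p_free_if_rounded_down:.4%}")
print(f"P(22 < X < 25) = {p_between:.7f}")
```

Rounding to the nearest minute would happen to give the same answer here, but the rule that protects the company is to round up, since any cutoff below the exact quantile pushes the free-pizza rate over the target.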
12. R/Rcmdr can be used to generate random samples for various probability distributions including the normal distribution. This was done to generate a dataset of 1000 samples of size 4 (that is, randomly sample 1000 samples of size 4) from a normal parent population distribution with a mean µ = 100 and a standard deviation σ = 24. A random seed of 2348 was used. The results were stored in an Excel file called normalmean100sigma24seed2348samples1000n4. The first five lines of this data file are shown below.
Open this file to access the resulting sample data so as to answer the following questions. (NOTE: students are not required to know how to generate random data in this fashion – only how to do similar calculations to the problems below).
a) Recall that a sampling distribution of sample means when taking all possible samples of size 4 from a population with mean µ and standard deviation σ will have a mean of µ = 100 and a standard deviation of σ/√n = 24/√4 = 12.
b) Use SOFTWARE to calculate the mean and standard deviation of the column of 1000 sample means that you generated from your samples of size 4. Report your answers.
c) The mean and standard deviation of the column of 1000 sample means will “approximate” the actual mean and standard deviation of a sampling distribution of all possible sample means of size 4 taken from your normal parent population. Are the mean and standard deviation of your 1000 sample means close to a mean of µ = 100 and a standard deviation of σ/√n = 24/√4 = 12?
The mean of 100.3076 is out by 0.3076 units. The standard deviation of 11.6581 is out by 0.3419 units. They are pretty close.
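The experiment behind this question can be imitated in a few lines. This is a sketch only: it reuses the seed number 2348, but Python's random number generator is not the same as R's, so the simulated values will not reproduce the worksheet's file or its 100.3076 and 11.6581. The constant names are illustrative.

```python
import random
from math import sqrt
from statistics import mean, stdev

MU, SIGMA = 100, 24     # parent population N(100, 24)
N, REPS = 4, 1000       # 1000 samples, each of size 4

# Same seed number as the worksheet, but a different generator than R,
# so the draws differ from the worksheet's Excel file (an assumption to note).
rng = random.Random(2348)

# One sample mean per simulated sample of size N
sample_means = [mean(rng.gauss(MU, SIGMA) for _ in range(N))
                for _ in range(REPS)]

theoretical_sd = SIGMA / sqrt(N)   # sigma / sqrt(n) = 12

print("mean of sample means:", round(mean(sample_means), 4), "vs", MU)
print("sd of sample means:  ", round(stdev(sample_means), 4),
      "vs", theoretical_sd)
```

Changing the seed gives a different set of 1000 sample means each time, and the summary values move a little around 100 and 12 without landing on them exactly.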
d) Use SOFTWARE to create a frequency histogram of the column of 1000 sample means in your dataset. This histogram shape will “approximate” the sampling distribution shape of all possible sample means of size 4 from your normal parent population. Paste it below and comment on the shape of it. Does your frequency distribution appear normal? Why or why not? Graph
Comment: Yes, the histogram looks quite normal. It has a somewhat symmetric bell shape (the peak is a little out) with tails on both sides that are not too long.
e) Use SOFTWARE to calculate the median of your column of 1000 sample means that you generated from your samples of size 4. Does the median of the sample dataset appear close to the mean µ = 100 of your normal parent population? (Recall that for a normal distribution, the mean and median of the population are equal.)
The median of 99.97404 is close to 100. It is 0.02596 units apart.
f) A boxplot can flag problems that might indicate that a dataset is non-normal (although it cannot indicate if a dataset distribution appears normal as a histogram can). Use SOFTWARE to create a boxplot from the column of 1000 sample means (from the samples of size 4) in your dataset. Paste your boxplot below.
g) We will see if the boxplot gives any indication that your dataset is non-normal.
1) Check the scale on your boxplot and report (roughly) the values of any outliers the boxplot shows. (minimum, Q1, median, Q3, and maximum) of the dataset. 2) Are the lengths of the whiskers relatively equal? 3) Is the box from Q1 to Q3 roughly split in two by the median? 4) Do any of your findings above give you an indication that your dataset might be non-normal?
1) There are a few “outliers” identified. Upper outliers range (roughly) from 135 to 140. Lower outliers range (roughly) from 62 to 68. 2) Yes, the lengths of the whiskers appear the same.
3) Yes, the median appears to split the box from Q1 to Q3 equally.
4) No, there is no flag that indicates that the data may be non-normal. There are only a very few computer identified “outliers” and they are all within 3-4 standard deviations of the middle of the data. The data is balanced about its middle. However, we must remember that a balanced boxplot with equal length whiskers does not “prove” normality.
h) A normal probability plot that generates a relatively straight line of points when generated from a dataset indicates normality of the data in that dataset. Use SOFTWARE to create a normal probability plot from the column of 1000 sample means (from the samples of size 4) in your dataset. Paste your normal probability plot below.
i) Does the normal probability plot you created suggest that the sample means dataset of 1000 sample means that you generated (from samples of size 4) is non-normal? Why or why not?
There is no notable evidence that the generated dataset is not close to normal. The points are all close
to the line. The data does not suggest overt non-normality.
j) Why were the mean and standard deviation of your dataset of 1000 sample means not *necessarily* exactly µ and σ/√n?
We only took 1000 random samples of size 4 from the parent population. We did not sample all possible samples of size 4 in the parent population.
k) If we have a normal parent population with mean µ and standard deviation σ, and we examine the shape of the sampling distribution of all possible means of size n taken from that parent population, what will we find? Consider different values of n in your answer.
The shape of the sampling distribution of sample means will be normal no matter the sample size n of the samples we take.
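The simulation behind this question can be sketched in a few lines. This is a minimal illustration in Python rather than R/Rcmdr, assuming the normal parent with µ = 100 and σ = 24 used above; the helper name `simulate_sample_means` and the seed are hypothetical, so the numbers will not match the course data file.

```python
import random
import statistics

def simulate_sample_means(draw, n, reps=1000, seed=0):
    """Return `reps` sample means, each computed from n values produced by draw(rng)."""
    rng = random.Random(seed)  # hypothetical seed, not the one used for the course file
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]

# Normal parent population with mu = 100 and sigma = 24, samples of size 4.
means = simulate_sample_means(lambda rng: rng.gauss(100, 24), n=4)

print("mean of sample means:  ", round(statistics.fmean(means), 4))   # theory: 100
print("sd of sample means:    ", round(statistics.stdev(means), 4))   # theory: 24/sqrt(4) = 12
print("median of sample means:", round(statistics.median(means), 4))  # theory: 100
```

The printed values should land near 100, 12 and 100 without hitting them exactly, for the reason given in part j): only 1000 samples were drawn, not all possible samples.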
13. The χ² (pronounced Chi-square) distributions form a family of right skewed distributions where a parameter called “degrees of freedom” determines where the peak of the distribution is and how skewed the distribution is. The mean of the χ² distribution is equal to the number of degrees of freedom. The variance of a χ² distribution is equal to two times the number of degrees of freedom. Here is a picture of a χ² distribution with degrees of freedom 5. It has mean µ = 5 and standard deviation σ = √σ² = √10 = 3.162278 (to 6 decimals).
R/Rcmdr can be used to generate random samples for various probability distributions including the χ² distribution. The SOFTWARE was used to generate a dataset of 1000 samples of size 4 (that is, randomly sample 1000 samples of size 4) from a χ² parent population distribution with degrees of freedom 5 (hence a mean µ = 5 and a standard deviation σ = 3.162278). A random seed of 6292 was used. The SOFTWARE was also used to generate a column that contains the sample means for each of the 1000 samples. The results are stored in the Excel file chisquaredf5seed6292samples1000n4. The first five lines of this data file are shown below.
Open this file to access the resulting sample data so as to answer the following questions. (NOTE: students are not required to know how to generate random data in this fashion – only how to do similar calculations to the problems below).
a) Recall that a sampling distribution of sample means when taking all possible samples of size 4 from a population with mean µ and standard deviation σ will have a mean of µ = 5 and a standard deviation of σ/√n = 3.162278/√4 = 1.581139.
b) Use SOFTWARE to calculate the mean and standard deviation of the column of 1000 sample means that you generated from your samples of size 4. Report your answers.
c) The mean and standard deviation of the column of 1000 sample means will “approximate” the actual mean and standard deviation of a sampling distribution of all possible sample means of size 4 taken from your χ² parent population. Are the mean and standard deviation of your 1000 sample means close to a mean of µ = 5 and a standard deviation of σ/√n = 3.162278/√4 = 1.581139?
4.968728 is out by 0.031272 units. 1.553921 is out by 0.027218 units. They are pretty close.
d) Use SOFTWARE to create a frequency histogram of the column of 1000 sample means in your dataset. This histogram shape will “approximate” the sampling distribution shape of all possible sample means of size 4 from your χ² parent population. Paste it to the left and comment on the shape of it. Does your frequency distribution appear normal? Why or why not?
Comment: The histogram is right skewed with a shorter tail than the parent population. It is not normal (bell shaped).
e) A boxplot flags situations that might indicate that a dataset is non-normal (although it cannot indicate if a dataset distribution appears normal as a histogram can). Use SOFTWARE to create a boxplot from the column of 1000 sample means (from the samples of size 4) in your dataset. Paste your boxplot to the left.
f) We will see if the boxplot gives any indication that your dataset is non-normal.
1) Check the scale on your boxplot and report (roughly) the values of any outliers the boxplot shows. (minimum, Q1, median, Q3, and maximum) of the dataset. 2) Are the lengths of the whiskers relatively equal? 3) Is the box from Q1 to Q3 roughly split in two by the median? 4) Do any of your findings above give you an indication that your dataset might be non-normal?
1) There are many “outliers” identified. They range from about 9 to 12.
2) No, the lengths of the whiskers are different. The upper whisker is slightly but noticeably longer than the lower whisker.
3) No, the median does not split the box from Q1 to Q3 equally.
4) Yes, the many outliers, the whiskers of differing lengths and the size of the box above and below the median being of different heights all flag non-normal data. The long upper tail (and its outliers) suggests the sample data may be right skewed.
g) A normal probability plot that generates a relatively straight line of points when generated from a dataset indicates that normality of the data in that dataset is an assumption that is not untoward. Use SOFTWARE to create a normal probability plot from the column of 1000 sample means (from the samples of size 4) in your dataset. Paste your normal probability plot to the left.
h) Does the normal probability plot you created suggest that the sample means dataset of 1000 sample means (from samples of size 4) that you generated might not be normal? Why or why not?
The points in the normal probability plot are not close to the line. This suggests that the data is non-normal. The shape of the normal probability plot that we see indicates the dataset is right skewed.
i) Why were the mean and standard deviation of your 1000 sample means (from samples of size 4) not *necessarily* exactly equal to µ = 5 and σ/√n = 3.162278/√4 = 1.581139?
We only took 1000 random samples of size 4 from the parent population. We did not sample all possible samples of size 4 in the parent population.
j) If we have a χ² parent population with mean µ and standard deviation σ, and we examine the shape of the sampling distribution of all possible means of size n taken from that parent population, what will we find? Consider different values of n in your answer. Did you find what you expected to find when n = 4? What is the name of the theorem you used to get your answer?
The shape of the sampling distribution of sample means will be “approximately” normal only when n is large, say above 30. In our case, our n is 4, and our sampling distribution of sample means still retains the shape of the parent population, as expected. The name of the theorem we used to get this answer is the Central Limit Theorem.
14. The SOFTWARE R/Rcmdr was used to generate a dataset of 1000 samples of size 36 (that is, randomly sample 1000 samples of size 36) from a χ² parent population distribution with degrees of freedom 5 (hence a mean µ = 5 and a standard deviation σ = √σ² = 3.162278). A random seed of 7891 was used. The SOFTWARE was also used to generate a column that contains the sample means for each of the 1000 samples. The results are stored in the Excel file chisquaredf5seed7891samples1000n36. The first five lines of this data file are shown below (not all the columns are shown).
Open this file to access the resulting sample data so as to answer the following questions. (NOTE: students are not required to know how to generate random data in this fashion – only how to do similar calculations to the problems below).
a) Recall that a sampling distribution of sample means when taking all possible samples of size 36 from a population with mean µ and standard deviation σ will have a mean of µ = 5 and a standard deviation of σ/√n = 3.162278/√36 = 0.527046.
b) Use SOFTWARE to calculate the mean and standard deviation of the column of 1000 sample means that you generated from your samples of size 36. Report your answers.
c) The mean and standard deviation of the column of 1000 sample means will “approximate” the actual mean and standard deviation of a sampling distribution of all possible sample means of size 36 taken from your χ² parent population. Are the mean and standard deviation of your 1000 sample means close to a mean of µ = 5 and a standard deviation of σ/√n = 3.162278/√36 = 0.527046?
5.013517 is out by 0.013517 units. 0.5268158 is out by 0.00023 units. They are pretty close.
d) Use SOFTWARE to create a frequency histogram of the column of 1000 sample means in your dataset. This histogram shape will “approximate” the sampling distribution shape of all possible sample means of size 36 from your χ² parent population. Paste it below and comment on the shape of it. Does your frequency distribution appear normal? Why or why not?
Comment: The histogram looks quite normal (bell shaped), with one noted outlier a bit off the right tail.
e) A boxplot can flag situations that might indicate that a dataset is non-normal (although it cannot indicate if a dataset distribution appears normal as a histogram can). Use SOFTWARE to create a boxplot from the column of 1000 sample means (from the samples of size 36) in your dataset. Paste your boxplot to the left.
f) We will see if the boxplot gives any indication that your dataset is non-normal.
1) Check the scale on your boxplot and report (roughly) the values of any outliers the boxplot shows. (minimum, Q1, median, Q3, and maximum) of the dataset. 2) Are the lengths of the whiskers relatively equal? 3) Is the box from Q1 to Q3 roughly split in two by the median? 4) Do any of your findings above give you an indication that your dataset might be non-normal?
1) There are three “outliers” identified. One has a value of 3.5 while the others range from about 5.5 to 7.3.
2) The lengths of the whiskers are very similar.
3) The median does split the box from Q1 to Q3 equally.
4) No, there is no flag that indicates that the data may be non-normal. There are only a very few computer identified “outliers” and they are all within 3-5 standard deviations of the middle of the data. The data is balanced about its middle. However, we must remember that a balanced boxplot with equal length whiskers does not “prove” normality.
g) A normal probability plot that generates a relatively straight line of points when generated from a dataset indicates that normality of the data in that dataset is an assumption that is not untoward. Use SOFTWARE to create a normal probability plot from the column of 1000 sample means (from the samples of size 36) in your dataset. Paste your normal probability plot to the left.
h) Does the normal probability plot you created suggest that the sample means dataset of 1000 sample means (from samples of size 36) that you generated is non-normal? Why or why not?
The normal probability plot does not suggest that the dataset is non-normal. The points are all close to the line with the exception of the very few identified “outliers” that are not extreme outliers.
i) Why were the mean and standard deviation of your 1000 sample means (from samples of size 36) not *necessarily* exactly equal to µ = 5 and σ/√n = 3.162278/√36 = 0.527046?
We only took 1000 random samples of size 36 from the parent chi-square population. We did not sample all
possible samples of size 36 in the parent chi-square population.
j) If we have a χ² parent population with mean µ and standard deviation σ, and we examine the shape of the sampling distribution of all possible means of size n taken from that parent population, what will we find? Consider different values of n in your answer. Did you find what you expected to find when n = 36? What is the name of the theorem you used to get your answer?
The sampling distribution of sample means will have mean µ and standard deviation σ/√n, and will be “approximately” normal when n is above 30. In our case, our n is 36, and our sampling distribution of sample means is approximately normal, as expected. The name of the theorem we used to get this answer is the Central Limit Theorem.
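The contrast between n = 4 and n = 36 can be explored with a small simulation. The sketch below is in Python rather than R/Rcmdr and rests on two assumptions that are not part of the original exercise: a χ² variable with df degrees of freedom is drawn as a Gamma(df/2, scale 2) variable, and a simple moment-based skewness is used as a rough numeric stand-in for "looking at the histogram". The function names and seed are hypothetical.

```python
import random
import statistics

def chi_square_sample_means(df, n, reps=1000, seed=0):
    """Return `reps` sample means of size-n samples from a chi-square(df) parent."""
    rng = random.Random(seed)  # hypothetical seed, not the course seeds 6292 / 7891
    # A chi-square variable with df degrees of freedom is Gamma(shape=df/2, scale=2).
    return [statistics.fmean(rng.gammavariate(df / 2, 2) for _ in range(n))
            for _ in range(reps)]

def skewness(xs):
    """Moment-based sample skewness (0 for a perfectly symmetric dataset)."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)

results = {}
for n in (4, 36):
    means = chi_square_sample_means(df=5, n=n)
    results[n] = (statistics.fmean(means), statistics.stdev(means), skewness(means))
    print(f"n = {n:2d}: mean = {results[n][0]:.3f}, "
          f"sd = {results[n][1]:.3f}, skewness = {results[n][2]:.3f}")
```

The means should sit near 5 and the standard deviations near 1.581 (n = 4) and 0.527 (n = 36). The skewness for n = 4 should be clearly positive, in line with the right-skewed histogram described in question 13.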
15. The weekly travel time for a population of students is normal with a mean of 200 minutes and a standard deviation of 10 minutes.
a) What is the probability that a single student drawn from the population travels at least 196 minutes a week?
b) A random sample of 25 students is drawn from the population. What is the probability that the mean travel time of the 25 students is no less than 196 minutes a week?
c) Suppose the population above was of an unknown shape. Explain (using statistical reasoning) why or why not you would feel comfortable calculating the above probabilities. (2m)
SOFTWARE OUTPUT SETUP/FULL PROBABILITY STATEMENTS AND CALCULATION/CONCLUDING STATEMENT
(a) Normal with mean = 200 and standard deviation=10
For N(200,10), P(X > 196) = 1 – P(X < 196) = 1 – 0.3445783 = 0.6554217
The probability that the student travels at least 196 minutes a week is 0.6554217.
(b) Normal with mean = 200 and standard deviation = 10/√25 = 10/5 = 2
For N(200,2), P(X̄ > 196) = 1 – P(X̄ < 196) = 1 – 0.02275013 = 0.97724987
The probability that the average travel time of 25 randomly drawn students is no less than 196 minutes a week is 0.97724987.
(c) Acceptable Answer 1: Comfortable because n = 25 is a large enough sample size to apply the CLT.
Acceptable Answer 2: Not comfortable because n = 25 is not a large enough sample size to apply the CLT.
Acceptable Answer 3: Unsure because 25 is borderline to be a large enough sample size to apply the CLT.
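The two probabilities in parts (a) and (b) can be checked with a short calculation. This is a sketch in Python using the standard library's `statistics.NormalDist` in place of the course SOFTWARE; the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 200, 10, 25

# (a) One student: X ~ N(200, 10)
p_single = 1 - NormalDist(mu, sigma).cdf(196)

# (b) Mean of 25 students: X-bar ~ N(200, 10/sqrt(25)) = N(200, 2)
p_mean = 1 - NormalDist(mu, sigma / sqrt(n)).cdf(196)

print("P(X > 196)     =", round(p_single, 7))
print("P(X-bar > 196) =", round(p_mean, 8))
```

The only change between the two lines is the standard deviation: the sample mean uses σ/√n instead of σ, which is why the second probability is much closer to 1.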
16. A random sample of 1280 students is taken from a larger population and it reveals that the sample mean (average) amount for their student loans is $18,900.00. Suppose it is known that the standard deviation σ of the population is $40,222.00. Find a 90% confidence interval for the true population mean µ.
a) What do you know about the mean, standard deviation, and shape of the sampling distribution of X̄? Why do we know the sampling distribution has the indicated shape?
Mean: µ = Unknown
Standard Deviation: σ/√n = $40,222.00/√1280 = $1,124.24
Approximate Shape of Sampling Distribution: Normal
Why: Sample Size is large (Central Limit Theorem)
b) Calculate a 90% confidence interval for the population mean.
The 90% confidence interval for μ is: x̄ ± z_{α/2} · σ/√n
90% confidence implies α = 0.1 → α/2 = 0.05, so that z_{α/2} = z_{0.05} = the z value that captures the top 5% and bottom 95% of the Z-distribution. SOFTWARE gives z_{0.05} = 1.644854. Hence the CI is:
x̄ ± z_{0.05} · σ/√n = 18900 ± 1.644854 · 1124.24 = 18900 ± 1849.21 = ($17,050.79, $20,749.21)
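The interval above can be reproduced as follows. This is a minimal Python sketch (the course uses R/Rcmdr); `NormalDist().inv_cdf` supplies the critical value, and the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

xbar, sigma, n = 18900, 40222, 1280
conf_level = 0.90

alpha = 1 - conf_level
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # z value with 5% in the upper tail
std_error = sigma / sqrt(n)                    # sigma / sqrt(n)
margin = z_crit * std_error

lower, upper = xbar - margin, xbar + margin
print(f"z = {z_crit:.6f}, standard error = {std_error:.2f}, margin = {margin:.2f}")
print(f"90% CI: ({lower:.2f}, {upper:.2f})")
```

Raising `conf_level` to 0.95 or 0.99 increases the critical value and therefore widens the interval.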
c) Interpret your 90% confidence interval.
We are 90% confident that the true population mean falls within the interval ($17,050.79, $20,749.21).
d) Fill in the blanks
We (do, DO NOT) know if our particular interval contains the true population mean.
e) For a (1-α)% confidence level, we define a significance level that is equal to α%. The confidence level and the significance level add to 100%. Basically, for now, you can think of the significance level as our willingness to be wrong when we seek to determine if there is statistical evidence that a population mean differs from a given amount. What is our significance level for this problem?
(1-α)% = 90%, so α% = 10%
f) A friend wonders if the population mean differs from $20,000. What do you tell your friend?
At the 10% significance level, since $20,000 falls (INSIDE, outside) the 90% confidence interval, we (do, DO NOT) have significant evidence that the population mean differs from $20,000.
g) A friend wonders if the population mean differs from $30,000. What do you tell your friend?
At the 10% significance level, since $30,000 falls (inside, OUTSIDE) the 90% confidence interval, we (DO, do not) have significant evidence that the population mean differs from $30,000.
h) Perform a hypothesis test to determine if there is evidence that the population mean for student loans differs from $20,000.00. Use a level of significance of 10%.
You can calculate the z value z = (x̄ − μ)/(σ/√n) by hand, but SOFTWARE will find the p-value for this problem.
Output for p-value
Answer: Fill in the table. Be sure to verify z = -0.97843957 and fill in the remaining boxes.
Hypotheses: H0: µ = 20000; Ha: µ ≠ 20000
Test stat: x̄ = 18900; z = (x̄ − μ)/(σ/√n) = (18900 − 20000)/(40222/√1280) = -0.97843957 (by hand)
P-value: Pvalue Statement: 2P(X̄ < 18900) = 2(0.1639287) = 0.3278574, or 2P(Z < -0.97843957) = 2(0.1639285) = 0.3278570; pvalue = 0.3278574
Decision and Reason: (Reject, DO NOT REJECT) H0, because pvalue = 0.3278574 > α = 0.10
Decision in “English”: At the 10% significance level, there (is, IS NOT) significant evidence that the average population student loan differs from $20,000.
i) Perform a hypothesis test to determine if there is evidence that the population mean for student loans exceeds $17,000.00. Use a level of significance of 1%. You can calculate the z value z = (x̄ − μ)/(σ/√n) by hand, but SOFTWARE will find the p-value for this problem. Recall x̄ = 18900, σ = 40222, σ/√n = 40222/√1280 = 1124.24.
Output for p-value
Answer: Fill in the table. Be sure to verify z = 1.69003199 and fill in the remaining boxes.
USE R TO FIND p-value
Hypotheses: H0: µ = 17000; Ha: µ > 17000
Test stat: x̄ = 18900; z = (x̄ − μ)/(σ/√n) = (18900 − 17000)/(40222/√1280) = 1.69003199 (by hand)
P-value: Pvalue Statement: P(X̄ > 18900) = 0.04553406, or P(Z > 1.69003199) = 0.04551092; pvalue = 0.04553406
Decision and Reason: (Reject, DO NOT REJECT) H0, because pvalue = 0.04553406 > α = 0.01
Decision in “English”: At the 1% significance level, there (is, IS NOT) significant evidence that the population mean student loan exceeds $17,000.
j) Perform a hypothesis test to determine if there is evidence that the population mean for student loans lies below $21,000.00. Use a level of significance of 5%.
You can calculate the z value z = (x̄ − μ)/(σ/√n) by hand, but SOFTWARE will find the p-value for this problem. Recall x̄ = 18900, σ = 40222, σ/√n = 40222/√1280 = 1124.24.
Output for p-value
Answer: Fill in the table. Be sure to verify z = -1.86793008 and fill in the remaining boxes.
Hypotheses: H0: µ = 21000; Ha: µ < 21000
Test stat: x̄ = 18900; z = (x̄ − μ)/(σ/√n) = (18900 − 21000)/(40222/√1280) = -1.86793008 (by hand)
P-value: Pvalue Statement: P(X̄ < 18900) = 0.03090455, equivalently P(Z < -1.86793008); pvalue = 0.03090455
Decision and Reason: (REJECT, Do not reject) H0, because pvalue = 0.03090455 < α = 0.05
Decision in “English”: At the 5% significance level, there (IS, is not) significant evidence that the population mean student loan lies below $21,000.
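The three z tests in parts h), i) and j) follow the same recipe, which can be sketched in Python as below. The course itself uses R/Rcmdr; the helper `z_test` and its `alternative` labels are hypothetical names chosen for this illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n, alternative):
    """Return (z, p_value) for a one-sample z test with known sigma."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    lower_tail = NormalDist().cdf(z)            # P(Z < z)
    if alternative == "two-sided":
        p_value = 2 * min(lower_tail, 1 - lower_tail)
    elif alternative == "greater":
        p_value = 1 - lower_tail
    elif alternative == "less":
        p_value = lower_tail
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z, p_value

xbar, sigma, n = 18900, 40222, 1280
tests = [
    ("h", 20000, "two-sided", 0.10),
    ("i", 17000, "greater", 0.01),
    ("j", 21000, "less", 0.05),
]

results = {}
for part, mu0, alternative, alpha in tests:
    z, p = z_test(xbar, mu0, sigma, n, alternative)
    decision = "reject H0" if p < alpha else "do not reject H0"
    results[part] = (z, p, decision)
    print(f"{part}) z = {z:.8f}, p-value = {p:.7f}, alpha = {alpha}: {decision}")
```

Only the tail used for the p-value changes with the alternative hypothesis; the test statistic formula is the same in all three parts.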
17. A random sample of 1280 students is taken from a larger normal population and it reveals that the sample mean (average) amount for their student loans is $18,900.00 and the sample standard deviation is $45,192.00.
You are asked to find a 95% confidence interval for the true population mean.
a) What do you know about the mean, standard deviation, and shape of the sampling distribution of X̄? Why do we know the sampling distribution has the indicated shape?
(BY HAND)
Mean: µ = unknown
Standard Deviation: σ/√n is unknown, because σ is unknown; Estimated Standard Deviation: s/√n = 45192/√1280 = 1263.15
Approximate Shape of Sampling Distribution: Normal
(Because the population is normal, X̄ is N(µ, σ/√n) and (X̄ − μ)/(σ/√n) is N(0,1))
(Because the population is normal, (X̄ − μ)/(s/√n) has a t distribution with n – 1 df)
(Because n is large, the t distribution is close to the normal shape of a N(0,1) distribution)
b) Calculate a 95% confidence interval for the population mean. Use software to help.
The 95% confidence interval for μ is: x̄ ± t_{n−1, α/2} · s/√n
Note: Use the critical value from the T-distribution since we are estimating σ with s. n = 1280 implies df = n − 1 = 1279; 95% confidence implies α = 0.05 → α/2 = 0.025. So t_{n−1, α/2} = t_{1279, 0.025} = the value that captures the top 2.5% and bottom 97.5% of the T-distribution with df = 1279. SOFTWARE gives t_{1279, 0.025} = 1.96182. Hence the CI is:
x̄ ± t_{1279, 0.025} · s/√n = 18900 ± 1.96182 · 1263.15 = 18900 ± 2478.07 = ($16,421.93, $21,378.07)
Output for t_{0.025, 1279}
c) Interpret your 95% confidence interval.
We are 95% confident that the true population mean falls within the interval ($ 16,421.93 , $ 21,378.07).
d) Fill in the blanks
We (do, DO NOT) know if our particular interval contains the true population mean.
e) For a (1-α)% confidence level, we define a significance level that is equal to α%. The confidence level and the significance level add to 100%. Basically, you can think of the significance level as our willingness to be wrong when we seek to determine if there is significant statistical evidence that a population mean differs from a given amount. What is our significance level for this problem?
(1-α)% = 95%, so α% = 5%
f) A friend wonders if the population mean differs from $20,000. What do you tell your friend?
At the 5% significance level, since $20,000 falls (INSIDE, outside) the 95% confidence interval, we (do, DO NOT) have significant evidence that the population mean differs from $20,000.
g) A friend wonders if the population mean differs from $30,000. What do you tell your friend?
At the 5% significance level, since $30,000 falls (inside, OUTSIDE) the 95% confidence interval, we (DO, do not) have significant evidence that the population mean differs from $30,000.
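The link between the 95% interval and the two-sided questions in parts f) and g) can be illustrated with a short Python sketch. It assumes the critical value t = 1.96182 reported by SOFTWARE in part b) rather than computing it, since the Python standard library has no t distribution. Because it uses the unrounded standard error, the endpoints may differ from the hand calculation by about a cent.

```python
from math import sqrt

xbar, s, n = 18900, 45192, 1280
t_crit = 1.96182   # t(1279, 0.025) as reported by SOFTWARE in part b), taken as given

margin = t_crit * s / sqrt(n)
lower, upper = xbar - margin, xbar + margin
print(f"95% CI: ({lower:.2f}, {upper:.2f})")

# A hypothesized mean inside the interval is not rejected by the
# two-sided test at the 5% significance level; one outside is rejected.
inside = {}
for mu0 in (20000, 30000):
    inside[mu0] = lower <= mu0 <= upper
    verdict = "no significant evidence" if inside[mu0] else "significant evidence"
    print(f"mu0 = {mu0}: inside = {inside[mu0]}, {verdict} that the mean differs")
```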
h) Perform a hypothesis test to determine if there is evidence that the population mean for student loans differs from $20,000.00. Use a level of significance of 5%. You will do most of this problem by hand. You can calculate the test statistic t = (x̄ − μ)/(s/√n) for this data by hand, and your pvalue can be found using SOFTWARE.
Output for your pvalue
Hypotheses: H0: µ = 20000; Ha: µ ≠ 20000
Test stat/df: t = (x̄ − μ)/(s/√n) = (18900 − 20000)/(45192/√1280) = -0.8708355; df = 1279
P-value: Pvalue Statement: 2P(t > 0.8708355) = 2P(t < -0.8708355); pvalue = 2(0.1920037) = 0.3840074
Decision and Reason: (Reject, DO NOT REJECT) H0, because pvalue = 0.3840074 > α = 0.05
Decision in “English”: At the 5% significance level, there (is, IS NOT) significant evidence that the average population student loan differs from $20,000.00.
i) Perform a hypothesis test to determine if there is evidence that the population mean for student loans is greater than $17,000.00. Use a level of significance of 5%. You will do most of this problem by hand. You can calculate the test statistic t = (x̄ − μ)/(s/√n) for this data by hand, and your pvalue can be found using SOFTWARE.
Output for your pvalue
Hypotheses: H0: µ ≤ 17000; Ha: µ > 17000
Test stat/df: t = (x̄ − μ)/(s/√n) = (18900 − 17000)/(45192/√1280) = 1.50417; df = 1279
P-value: Pvalue Statement: P(t > 1.50417); pvalue = 0.06639224
Decision and Reason: (Reject, DO NOT REJECT) H0, because pvalue = 0.06639224 > α = 0.05
Decision in “English”: At the 5% significance level, there (is, IS NOT) significant evidence that the average population student loan is greater than $17,000.
j) Perform a hypothesis test to determine if there is evidence that the population mean for student loans is less than $22,000. Use a level of significance of 5%. You will do most of this problem by hand. You can calculate the test statistic t = (x̄ − μ)/(s/√n) for this data by hand, and your pvalue can be found using SOFTWARE.
Output for your pvalue
Hypotheses: H0: µ ≥ 22000; Ha: µ < 22000
Test stat/df: t = (x̄ − μ)/(s/√n) = (18900 − 22000)/(45192/√1280) = -2.454173; df = 1279
P-value: Pvalue Statement: P(t < -2.454173); pvalue = 0.007126715
Decision and Reason: (REJECT, Do not reject) H0, because pvalue = 0.007126715 < α = 0.05
Decision in “English”: At the 5% significance level, there (IS, is not) significant evidence that the average population student loan is less than $22,000.
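The three t statistics in parts h), i) and j) can be verified with the Python sketch below. One caveat: the Python standard library has no t distribution, so the p-values here use the standard normal as a stand-in. That is an assumption of this sketch, not the method used by SOFTWARE; it is reasonable only because df = 1279 is large, and the results agree with the reported p-values to about three decimal places.

```python
from math import sqrt
from statistics import NormalDist

xbar, s, n = 18900, 45192, 1280
std_error = s / sqrt(n)              # estimated standard deviation of x-bar
z = NormalDist()                     # normal stand-in for t with df = 1279

def t_stat(mu0):
    """One-sample t statistic (xbar - mu0) / (s / sqrt(n))."""
    return (xbar - mu0) / std_error

t_h, t_i, t_j = t_stat(20000), t_stat(17000), t_stat(22000)

# Approximate p-values (normal approximation to the t distribution).
p_h = 2 * z.cdf(-abs(t_h))           # two-sided: Ha: mu != 20000
p_i = 1 - z.cdf(t_i)                 # upper tail: Ha: mu > 17000
p_j = z.cdf(t_j)                     # lower tail: Ha: mu < 22000

for label, t, p in (("h", t_h, p_h), ("i", t_i, p_i), ("j", t_j, p_j)):
    print(f"{label}) t = {t:.6f}, approximate p-value = {p:.5f}")
```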
18. In August 2009, an article in Canadian Family Physicians, “Health Practises of Canadian Physicians” indicated that a large population of Canadian doctors reported, on average, a total of 281 minutes per week of exercise. Today, you take a random sample of 20 doctors in Canada, and ask them to report their minutes of exercise per week. Students should enter the data below into a column in SOFTWARE. The file DOCTOREXERCISEDEMO also contains this data.
291 298 287 269 266 298 281 283 292 306
311 268 276 283 285 316 294 296 279 266
a) Use SOFTWARE to create a 90% confidence interval for the average weekly minutes of exercise of doctors in Canada today. At the 10% significance level, is there evidence that the average weekly minutes of exercise for Canadian doctors has changed from 281 minutes? Fill in the table below. Circle or capitalize the answers chosen in the sentences.
Output
Conf Level: 90%
Signif Level: 10%
Conf Interval: (281.63, 292.87) minutes
Sample Mean: 287.25 minutes
Summary Statement: We can be 90% confident that the average weekly minutes of exercise taken by Canadian doctors today falls in this interval.
English Statement: At the 10% level of significance, because 281 minutes is (inside, OUTSIDE) the 90% confidence interval, there (IS, is not) significant evidence that the average weekly minutes of exercise for Canadian doctors has changed from 281 minutes.
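The interval in the table can be rebuilt from the raw data. This is a Python sketch, not the SOFTWARE output. The critical value t(19, 0.05) = 1.729133 is an assumption taken from a standard t table, since the standard library cannot compute it, and it does not appear in the original solution.

```python
from math import sqrt
import statistics

minutes = [291, 298, 287, 269, 266, 298, 281, 283, 292, 306,
           311, 268, 276, 283, 285, 316, 294, 296, 279, 266]

n = len(minutes)
xbar = statistics.fmean(minutes)          # sample mean
s = statistics.stdev(minutes)             # sample standard deviation (n - 1 divisor)
std_error = s / sqrt(n)

t_crit = 1.729133   # assumed t-table value for df = 19, upper-tail area 0.05
margin = t_crit * std_error
lower, upper = xbar - margin, xbar + margin

print(f"n = {n}, mean = {xbar:.2f}, s = {s:.4f}, standard error = {std_error:.4f}")
print(f"90% CI: ({lower:.2f}, {upper:.2f})")
print("281 inside the interval:", lower <= 281 <= upper)
```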
b) Fully and completely state the two assumptions that must be met to perform the one sample t confidence interval test above. (3m)
1. simple random sample
2. a) large sample (above 30) OR b) the population from which the sample was taken is normal OR c) both.
c) A frequency histogram of the sample data appears below. Examine it, and then highlight or circle your chosen answers in the bracketed parts of the sentences below. (4m)
The shape of the histogram of the sample data for doctor exercise minutes (does, DOES NOT) appear normal as the spike of probability at the left side is (too low, TOO HIGH). The population distribution of weekly doctor exercise minutes is assumed (NORMAL, skewed right, skewed left) to solve the problem. A sample of size 20 (WILL, will not) give us some indication of the shape of the population, but we’d like it to be (smaller, LARGER).
d) Indicate whether each of the following statements is true or false. (7m)
i. 90% of doctors exercise weekly for the number of minutes between the endpoints of your interval above.
False
ii. The width of a 99% Confidence Interval created from the data above would be larger than the width of a 90% CI.
True
iii. The shape of a population distribution is always the same as the shape of a test statistic distribution.
False
iv. There is a probability of 90% that the true mean µ of weekly doctor exercise minutes is in the 90% confidence interval created above.
False
v. A 90% confidence interval created from a sample of size 1000 from the population of weekly doctor exercise minutes would have a greater chance of capturing the true mean µ of weekly doctor exercise minutes than the 90% confidence interval you created with the sample of size 20 above.
False
vi. 90% of all intervals created with this formula with many many repeated samples of the same size from the population of weekly doctor exercise minutes will contain the true mean µ of weekly doctor exercise minutes.
True
vii. Today, we can be sure that the true mean µ of weekly doctor exercise minutes is 281 minutes.
False
e) Perform a hypothesis test to determine if there is evidence that the average weekly minutes of exercise for Canadian doctors has changed from 281 minutes. Use a level of significance of 10%.
Output
Hypotheses: H0: µ = 281; Ha: µ ≠ 281
Test stat/df: t = 1.9231; df = 19
P-value: Pvalue Statement: 2P(t > 1.9231); pvalue = 0.06959
Decision and Reason: (REJECT, Do not reject) H0, because pvalue = 0.06959 < α = 0.10
Decision in “English”: At the 10% significance level, there (IS, is not) significant evidence that the mean weekly exercise minutes for doctors differs from 281 min.
f) Perform a hypothesis test to determine if there is evidence that the average weekly minutes of exercise for Canadian doctors exceeds 279 minutes. Use a level of significance of 5%.
Output
Hypotheses: H0: µ ≤ 279; Ha: µ > 279
Test stat/df: t = 2.5385, df = 19
P-value: Pvalue Statement P(t > 2.5385), pvalue = 0.01002
Decision and Reason: (*Reject*, Do not reject) H0 because Pvalue ≤ α, 0.01002 ≤ 0.05
Decision in “English”: At the 5% significance level, there (*is*, is not) significant evidence that the mean weekly exercise minutes for doctors exceeds 279 min.
g) Perform a hypothesis test to determine if there is evidence that the average weekly minutes of exercise for Canadian doctors lies below 295 minutes. Use a level of significance of 5%.
Output
Hypotheses: H0: µ ≥ 295; Ha: µ < 295
Test stat/df: t = -2.3846, df = 19
P-value: Pvalue Statement P(t < -2.3846), pvalue = 0.01384
Decision and Reason: (*Reject*, Do not reject) H0 because Pvalue ≤ α, 0.01384 ≤ 0.05
Decision in “English”: At the 5% significance level, there (*is*, is not) significant evidence that the mean weekly exercise minutes for doctors lies below 295 min.
h) We found a histogram of the sample data above, and noted that we assumed normality of the population to do the question even though the sample histogram indicated non-normal distribution bulk in the left side. Create a QQ plot of the sample data, and comment on whether it indicates that the population could be considered normal. (4m)
Although the points on the normal probability plot are all within the bounds of the dotted lines and somewhat close to the line, the plot does flag the bulk in the left of the sample distribution as the points spread out and go below and above the line, with the farthest point far from the others. There is more probability on the right end of the distribution than expected with the points spreading out away from each other towards the right end of the line. The sample data does suggest non-normality, but it does not flag outliers or extreme skewness (which would be most concerning). It’s *possible* the population might be normal, but we do certainly wonder, and it seems more likely it is non-normal.
19. The file ALLANSMIDTERMDATA contains a column of midterm grades obtained by 20 students randomly selected from a Calculus class taught by Allan. a) Use SOFTWARE to obtain a 95% and 99% confidence interval for the midterm grades. Paste your output below. (2m)
b) Fill in the following tables, and in each case, test whether there is significant evidence that the true average score of all calculus students taking the midterm differs from a score of 49.5. (8m)
Conf Level: 95%
Signif Level: 5%
Conf Interval: (49.9, 55.0)
Sample Mean: 52.46
Summary Statement: We can be 95% confident that the true mean grade for the calculus midterm is between 49.9 and 55.0.
English Statement: At the 5% level of significance, because 49.5 is (inside, *outside*) the 95% confidence interval, there (*is*, is not) significant evidence that the true average midterm score on Allan’s calculus midterm differs from a score of 49.5.
Conf Level: 99%
Signif Level: 1%
Conf Interval: (49.0, 55.9)
Sample Mean: 52.46
Summary Statement: We can be 99% confident that the true mean grade for the calculus midterm is between 49.0 and 55.9.
English Statement: At the 1% level of significance, because 49.5 is (*inside*, outside) the 99% confidence interval, there (is, *is not*) significant evidence that the true average midterm score on Allan’s calculus midterm differs from a score of 49.5.
c) Fully and completely state the two assumptions that must be met to perform the one sample t confidence interval test above. Also indicate whether you will do a z or a t test. (3m)
1. simple random sample
2. a) large sample (above 30 units) OR b) the population from which the sample was taken is normal OR c) both. IN THIS CASE, WE ASSUME A NORMAL POPULATION AS n is below 30.
We state whether the data indicates we need a σ or an s, and hence whether it is a Z or a t problem. We use an s to do our problem as the data is raw data and σ is not provided in the problem. s = 5.452223 can be found with the Statistics>Summaries>Numerical Summaries path in R with the MIDTERM column.
d) Create a QQ plot of the sample data, and comment on whether it indicates that the population could be considered normal. (4m)
[Figure: QQ plot of Calculus Midterm Grades; x-axis: norm quantiles, y-axis: Midterm Grades]
From the graph, we can see that most of the grades follow a straight-line pattern, so the data may be following a normal distribution. (Note the “stray” points at the ends of the line, however, which challenge us a bit about believing this.) Thus we *might* feel comfortable assuming the population is normally distributed, which implies that the sample means are also normally distributed (for a normal population this holds at any sample size). Hence, we will apply the one sample t-interval procedure to find confidence intervals for the data.
e) Perform a hypothesis test to determine if there is significant evidence that the true average score of all calculus students taking the midterm differs from 49.5. Use a level of significance of 1%. (10m)
Output
1. Hypotheses: H0: μ = 49.5; Ha: μ ≠ 49.5 (two tailed test)
2. α: 0.01
3. Test stat/df: t = 2.4279, df = n – 1 = 20 – 1 = 19
4. P-value: Pvalue Statement 2P(t > 2.4279), pvalue = 0.02529
5. Decision and Reason: (Reject, *Do not reject*) H0 because Pvalue > α, 0.02529 > 0.01
6. Decision in “English”: At the 1% significance level, there (is, *is not*) significant evidence that the average grade on Allan’s calculus midterm differs from 49.5.
f) Fill in the following blanks. A (1 – α)% confidence interval test will always give the same
decision as an α%
test of significance.
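The agreement between the interval and the test in this question can be checked numerically. The following is a minimal Python sketch, not part of the lab's R workflow. It assumes the summary values reported above (n = 20, sample mean 52.46, s = 5.452223) and critical values 2.093 (95%) and 2.861 (99%) read from a t table for 19 df; the function name `t_interval` is illustrative.

```python
import math

def t_interval(mean, s, n, t_crit):
    """Two-sided t confidence interval from summary statistics."""
    margin = t_crit * s / math.sqrt(n)
    return mean - margin, mean + margin

n, mean, s, mu0 = 20, 52.46, 5.452223, 49.5

# Test statistic for H0: mu = 49.5
t_stat = (mean - mu0) / (s / math.sqrt(n))

# Critical values for 19 df are taken from a t table (assumed, not computed here)
ci95 = t_interval(mean, s, n, 2.093)
ci99 = t_interval(mean, s, n, 2.861)

# A two-sided test at level alpha rejects H0 exactly when mu0 lies outside
# the matching (1 - alpha) confidence interval
reject_5 = not (ci95[0] <= mu0 <= ci95[1])
reject_1 = not (ci99[0] <= mu0 <= ci99[1])

print(round(t_stat, 4), ci95, ci99, reject_5, reject_1)
```

Under these assumptions the 95% interval excludes 49.5 while the 99% interval includes it, which matches the "reject at 5%, do not reject at 1%" pattern found in parts b) and e).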
g) Perform a hypothesis test to determine whether there is significant evidence that the true average score of all calculus students taking the midterm is more than 50. Use a level of significance of 5%. (10m)
Output
1. Hypotheses: H0: µ ≤ 50; Ha: µ > 50 (right-sided test)
2. α: 0.05
3. Test stat/df: t = 2.0178, df = 19
4. P-value: Pvalue Statement P(t > 2.0178), Pvalue = 0.02898
5. Decision and Reason: (*Reject*, Do not reject) H0 because Pvalue ≤ α, 0.02898 ≤ 0.05
6. Decision in “English”: At the 5% significance level, there (*is*, is not) significant evidence that the true average score of all calculus students taking the midterm is more than a score of 50.
h) Obtain an upper tailed 95% confidence interval with lower bound for this test. Interpret it and explain it in an English sentence. (5m)
Confidence Interval: (50.35192, ∞)
Interpretation: We can be 95% confident that the interval (50.35192, ∞) contains the true average score of all calculus students taking the midterm.
English Statement: At the 5% level of significance, since the entire interval is above the score of 50, we (*have*, do not have) significant evidence that the true average score of all calculus students taking the midterm is more than a score of 50.
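As a rough by-hand check of the lower bound, assuming the sample mean 52.46 and s = 5.452223 from the earlier parts of this question and the table value t ≈ 1.729 for an upper-tail area of 0.05 with 19 df:

```latex
\bar{x} - t_{0.05,\,19}\,\frac{s}{\sqrt{n}}
  \approx 52.46 - 1.729 \cdot \frac{5.452223}{\sqrt{20}}
  \approx 52.46 - 2.108
  \approx 50.352
```

This agrees with the software value up to rounding of the critical value.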
i) Perform a hypothesis test to determine whether there is significant evidence that the true average score of all calculus students taking the midterm is less than 54. Use a level of significance of 5%. (10m)
Output
1. Hypotheses: H0: µ ≥ 54; Ha: µ < 54 (left sided test)
2. α: 0.05
3. Test stat/df: t = -1.2632, df = 19
4. P-value: Pvalue Statement P(t < -1.2632), Pvalue = 0.1109
5. Decision and Reason: (Reject, *Do not reject*) H0 because Pvalue > α, 0.1109 > 0.05
6. Decision in “English”: At the 5% significance level, there (is, *is not*) significant evidence that the true average score of all calculus students taking the midterm is less than 54.
j) Obtain a lower tailed 95% confidence interval with upper bound for this test. Interpret it and explain it in an English sentence. (5m)
Confidence Interval: (−∞, 54.56808)
Interpretation: We can be 95% confident that the interval (−∞, 54.56808) contains the true average score of all calculus students taking the midterm.
English Statement: At the 5% level of significance, since the interval (*contains*, does not contain) 54, we (have, *do not have*) significant evidence that the true average score of all calculus students taking the midterm is less than 54.
20. Use SOFTWARE with HOURSHOMEWORKGENDER. This file contains weekly hours of homework per classroom hour for two independent random samples of students (35 of whom are male identifying and 35 of whom are female identifying). We will consider only the female column of data (HHF) in the following problems, and reiterate that a random sample was taken of the females. Output is provided below. Students should verify that they can produce it.
a) Is there significant evidence that females did more than 2.8 hours of homework per classroom hour, on average? Use a significance level of 5%. Fill in the blanks in the table below. Highlight or circle answers you choose in the sentences. (10m)
Output
Hypotheses: H0: µHHF ≤ 2.8; Ha: µHHF > 2.8
Test stat and df: t = 1.8924, df = 34
P-value and statement: Pvalue stmt P(t > 1.8924), Pvalue = 0.03349
Decision in “Stats”: (*Reject*, Do not reject) H0 because Pvalue ≤ α, 0.03349 ≤ 0.05
Decision in “English”: At the 5% significance level, there (*is*, is not) significant evidence that, on average, female identifying students do more than 2.8 hours of homework per classroom hour.
b) Obtain an upper tailed 95% confidence interval with lower bound for this test. Interpret it and explain it in an English sentence. (5m)
Confidence Interval: (2.838018, ∞) hours
Interpretation: We can be 95% confident that the interval (2.838018, ∞) hours contains the true population mean µ of homework hours of female students per course.
English Statement: At the 5% level of significance, since the entire interval is above 2.8 hours, we (*have*, do not have) significant evidence that female students do more than 2.8 hours of homework per classroom hour.
c) Is there significant evidence females did less than 3.6 hrs of homework per classroom hr, on average? Fill in the blanks in the table below. Use a significance level of 5%. Highlight or circle answers you choose in the sentences. (10m)
Output
Hypotheses: H0: µHHF ≥ 3.6; Ha: µHHF < 3.6
Test stat and df: t = -2.3465, df = 34
P-value and statement: Pvalue stmt P(t < -2.3465), Pvalue = 0.01246
Decision in “Stats”: (*Reject*, Do not reject) H0 because Pvalue ≤ α, 0.01246 ≤ 0.05
Decision in “English”: At a 5% significance level, there (*is*, is not) significant evidence that, on average, female students do less homework per classroom hr than 3.6 hrs.
d) Obtain a lower tailed 95% confidence interval with upper bound for this test. Interpret it and explain it in an English sentence. (5m)
Confidence Interval: (−∞, 3.476268) hours
Interpretation: We can be 95% confident that the interval (−∞, 3.476268) hours contains the true population mean µ of homework hours of female students per course.
English Statement: At the 5% level of significance, since the entire interval is below 3.6 hours, we (*have*, do not have) significant evidence that female students do less than 3.6 hours of homework per classroom hour.
e) Is there significant evidence females did a differing amount than 2.8 hrs of homework per classroom hr, on average? Use a significance level of 5%. Fill in the blanks in the table below. Highlight or circle answers you choose in the sentences. (10m)
Output
Hypotheses: H0: µHHF = 2.8; Ha: µHHF ≠ 2.8
Test stat and df: t = 1.8924, df = 34
P-value and statement: Pvalue stmt 2P(t > 1.8924), Pvalue = 0.06698
Decision in “Stats”: (Reject, *Do not reject*) H0 because Pvalue > α, 0.06698 > 0.05
Decision in “English”: At a 5% significance level, there (is, *is not*) significant evidence that, on average, female students do a differing amount of homework per classroom hr than 2.8 hrs.
f) State the 95% confidence interval for the hours of homework per course done by females. Does the confidence interval provide significant evidence that the female students do an average amount of homework that differs from 3.6 hrs?
Output
95% confidence interval: (2.773601, 3.540684) hours
Fill in the following: At the 5% significance level, since 3.6 hours is (within, *not within*) the confidence interval, we (*have*, do not have) significant evidence that the average hours of homework done per course by female students differs from 3.6 hours.
g) Highlight or circle answers you choose in the sentences below. (4m)
In part a), the test is (left, two, *right*) sided. In part c), the test is (*left*, two, right) sided. In part e), the test is (left, *two*, right) sided.
h) In the problems above, we assumed a random sample of females was taken from a large population of females. We also were able to do the problem because our sample size of 35 was “large” (greater than 30) and we did not have to assume a normal population of female data. Nevertheless, it is of interest to examine a normal probability plot of the female data, as 35 might be considered a borderline “large” sample, and we would like to know that there is nothing flagged in our sample data that suggests a population that is highly non-normal. Create the normal probability plot for HHF and explain whether it suggests that the population of female homework hours is not normally distributed.
NORMAL PROBABILITY PLOT FOR HHF (WEEKLY HOURS HOMEWORK FEMALES)
The data in the normal probability plot of the sample data does not suggest non-normality; it is close to the line of the normal probability plot. We see nothing here to concern us greatly, but note that most of the points on the left of the graph fall below the line. This indicates a heavier left tail for the sample data than would be expected with normal sample data (students can verify this by looking at the HHF sample histogram, if they wish). Nevertheless, there is nothing in this graph that would lead us to great concern about proceeding with our test, and none of our points stray outside the dotted lines. We do not see any outliers to concern us. The data does not suggest non-normality.
21. A two-week remedial program was designed for students who scored less than 50 on a math aptitude test.
Nine students were selected at random, given the remedial course, and then tested again. Here are the results of these tests before and after the remedial course. You will find the data in the file MATHAPTITUDE.
Before      47   41   40   32   45   47   39   48   45
After       53   58   58   36   40   49   35   60   45
Difference  -6  -17  -18   -4    5   -2    4  -12    0
a) Find the mean and standard deviation of the differences. Use D = Before – After.
Output
Mean x̄D = -5.56, Standard Deviation sD = 8.49
b) Create a normal probability QQ plot of the sample data, and comment on whether it indicates that the population could be considered non-normal. (4m)
Although there are not very many points (only 9), all of them fall fairly close to the line. The normal probability plot does not suggest that the sample data could be overtly non-normal. The population distribution could perhaps be assumed to be normal.
c) State necessary assumptions to perform a test of hypothesis about the mean of the differences (and to create a confidence interval for the mean of differences).
1. Simple random sample of paired differences
2. The population distribution of the paired differences is normal, or nd is large (>30)
Since our nd is 9, we must assume the distribution of the paired differences is normal. Examination of the normal probability graph above indicates that we are moderately comfortable with this assumption.
d) Use a hypothesis test to test whether the students performed better, on average, following the remedial program. Assume a significance level of 2%. Use D = Before - After. Here, since we are expecting before numbers to be less than after numbers, we expect the differences to be negative. We do a left sided test.
Hypotheses: H0: µD ≥ 0; Ha: µD < 0, where D = Before - After
Level of Significance: α = 0.02
Test stat and df: t = -1.9638, df = 8
P-value and statement: Pvalue stmt P(t < -1.9638), Pvalue = 0.04258
Decision in “Stats”: (Reject, *Do not reject*) H0 because Pvalue > α, 0.04258 > 0.02
Decision in “English”: At a 2% significance level, there (is, *is not*) significant evidence that students performed better, on average, following the remedial programme.
By hand verification of test statistic
$$t_d^* = \frac{\bar{d} - 0}{s_d/\sqrt{n_d}} = \frac{-5.56 - 0}{8.49/\sqrt{9}} \approx -1.964$$
with degrees of freedom = n_d – 1 = 8
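The by-hand calculation can also be reproduced from the raw scores. The following is a minimal Python sketch using only the standard library, with the Before and After values from the table in this question; the helper name `paired_t` is illustrative and is not part of the lab's R workflow. It also shows that swapping the order of subtraction only flips the sign of the test statistic.

```python
import math
import statistics

before = [47, 41, 40, 32, 45, 47, 39, 48, 45]
after = [53, 58, 58, 36, 40, 49, 35, 60, 45]

def paired_t(x, y):
    """Return (mean, sd, t, df) for the paired differences x - y, testing mu_D = 0."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    mean_d = statistics.mean(d)
    sd_d = statistics.stdev(d)  # sample standard deviation (n - 1 divisor)
    t = mean_d / (sd_d / math.sqrt(n))
    return mean_d, sd_d, t, n - 1

mean_ba, sd_ba, t_ba, df = paired_t(before, after)  # D = Before - After
mean_ab, sd_ab, t_ab, _ = paired_t(after, before)   # D = After - Before

print(round(mean_ba, 2), round(sd_ba, 2), round(t_ba, 4), df)
print(round(t_ab, 4))
```

The p-value itself still needs a t table or software, since the standard library has no t distribution.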
By hand verification of p-value (bracket with t tables)
$$\text{p-value} = P(t_8 < -1.964) = P(t_8 > 1.964)$$
$$0.025 < P(t_8 > 1.964) < 0.05 \quad \text{(tables for 8 df)}$$
e) Use a hypothesis test to test whether the students performed better, on average, following the remedial program. Assume a significance level of 2%. Use D = After - Before. Here, since we are expecting after numbers to be greater than before numbers, we expect the differences to be positive. We do a right sided test.
Hypotheses: H0: µD ≤ 0; Ha: µD > 0, where D = After - Before
Level of Significance: α = 0.02
Test stat and df: t = 1.9638, df = 8
P-value and statement: Pvalue stmt P(t > 1.9638), Pvalue = 0.04258
Decision in “Stats”: (Reject, *Do not reject*) H0 because Pvalue > α, 0.04258 > 0.02
Decision in “English”: At a 2% significance level, there (is, *is not*) significant evidence that students performed better, on average, following the remedial programme.
f) Use a hypothesis test to test whether the students performed less well, on average, prior to the remedial program. Assume a significance level of 2%. Use D = Before – After. Here, since we are expecting before numbers to be less than after numbers, we expect the differences to be negative. We do a left sided test.
Hypotheses: H0: µD ≥ 0; Ha: µD < 0, where D = Before - After
Level of Significance: α = 0.02
Test stat and df: t = -1.9638, df = 8
P-value and statement: Pvalue stmt P(t < -1.9638), Pvalue = 0.04258
Decision in “Stats”: (Reject, *Do not reject*) H0 because Pvalue > α, 0.04258 > 0.02
Decision in “English”: At a 2% significance level, there (is, *is not*) significant evidence that students performed less well, on average, prior to the remedial programme.
g) Use a hypothesis test to test whether the students performed less well, on average, prior to the remedial program. Assume a significance level of 2%. Use D = After – Before. Here, since we are expecting after numbers to be greater than before numbers, we expect the differences to be positive. We do a right sided test.
Hypotheses: H0: µD ≤ 0; Ha: µD > 0, where D = After - Before
Level of Significance: α = 0.02
Test stat and df: t = 1.9638, df = 8
P-value and statement: Pvalue stmt P(t > 1.9638), Pvalue = 0.04258
Decision in “Stats”: (Reject, *Do not reject*) H0 because Pvalue > α, 0.04258 > 0.02
Decision in “English”: At a 2% significance level, there (is, *is not*) significant evidence that students performed less well, on average, prior to the remedial programme.
22. Students at a University collected data from a random sample of 12 second year statistics students. Students reported their final grade in their last year of high school and their final grades in their first year of University. Data, including HS grades (HS), UNI grades (UNI) and HS-UNI differences (DIFFERENCE) can be found in the file HSUNIGRADEDEMO.
HS          86  80  78  75  72  92  90  74  84  69  83  87
UNI         84  76  78  75  71  93  86  71  81  70  81  86
DIFFERENCE   2   4   0   0   1  -1   4   3   3  -1   2   1
a) A boxplot of the paired differences between the final grades for high school and University for the students is provided. The difference = “HS – UNI” was used. i) What 5 descriptive statistical measures can be read from the boxplot? (1m)
Minimum, Q1, Median, Q3, and Maximum
ii) Can a boxplot of the differences tell us that the sample data distribution of the differences is normal? Why or why not? (2m)
No, it is possible to get a sample boxplot that looks like the one to the left and have a symmetric v shaped sample histogram (distribution), for example.
b) A dot plot of the sample differences appears below. Highlight or circle chosen answers within bracketed parts of the sentences below. (2m)
The dotplot of the paired differences appears to be (*uniform*, normal) in distribution.
We assume the shape of the population of paired differences is (uniform, *normal*, unknown) in order to perform statistical inference with this data.
c) Create a frequency histogram of the sample data, and comment on whether it appears that the population could be considered normal. (4m)
The shape of the histogram of the sample data for differences data appears rather odd. It looks like it might be right skewed. Note that this does not match what the dot plot shows above. The issue here is that the small sample size of only 12 values and the decision of the software to make 5 groupings does not give us very good information. It is suspect to assume normality of the paired differences sample data here. We don’t have a higher peak in the middle of our histogram with tails going out equally to either side. The normality assumption needs to hold in order for us to proceed. It may be that a larger sample size would produce a normal shape for the sample differences that suggested a normal shape for the population of differences.
d) Create a normal probability QQ plot of the sample data, and comment on whether it indicates that the population could be considered non-normal. (4m)
Most of the differences follow a straight-line pattern so there is nothing to suggest that the data is overtly non-normal. Note the “stray” points at the ends of the line (these are caused by the more uniform nature of this data with heavier tails), but note that nevertheless these points lie within the dotted lines. Thus we *might* feel comfortable assuming normality of the population differences. Recall that the t test is robust to a population “close” to normal.
e) State appropriate null and alternative hypotheses for examining the question of whether or not HS grades differ from UNI grades, on average. (2m)
H0: µd = 0, where d = HS – UNI
Ha: µd ≠ 0
f) State necessary assumptions to perform a test of hypothesis about the mean of the differences (and to create a confidence interval for the mean of differences).
1. Simple random sample of paired differences
2. The population distribution of the paired differences is normal, or nd is large (>30)
Since our nd is 12, we must assume the distribution of the paired differences is normal. Examination of the normal probability graph above indicates that we are moderately comfortable with this assumption.
g) Use SOFTWARE to find the test statistic value and the p-value for this test. Report your results. (3m)
Use α = 0.10. (Remember to state your α when you do a hypothesis test write-up!)
Output
Test statistic value
t* = 2.913 , df = 11
Pvalue and statement
2P(t > 2.913) = 0.01411
h) Should you reject or not reject your null hypothesis if the level of significance is 0.10? Why? Is your result significant or not significant? (3m)
Reject or Not Reject H0? Why? Reject H0 because Pvalue ≤ α, 0.01411 ≤ 0.10
Significant or Not? Yes, we have significant evidence at the 10% significance level that the average of the paired differences differs from 0.
i) Create a 90% confidence interval for the average paired difference in grades for HS-UNI grades
Output
A 90% confidence interval for the average paired difference in grades for HS-UNI grades is (0.575, 2.425) grade points. We are 90% confident that the average difference in the paired grades is between 0.575 and 2.425 grade points.
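The test statistic and the interval can be reproduced from the raw grades. The following is a minimal Python sketch using only the standard library. It assumes the HS and UNI values listed in the table for this question, and a critical value of 1.796 read from a t table for 90% confidence with 11 df (the standard library has no t distribution, so this value is typed in rather than computed).

```python
import math
import statistics

hs = [86, 80, 78, 75, 72, 92, 90, 74, 84, 69, 83, 87]
uni = [84, 76, 78, 75, 71, 93, 86, 71, 81, 70, 81, 86]
diff = [h - u for h, u in zip(hs, uni)]  # d = HS - UNI

n = len(diff)
mean_d = statistics.mean(diff)
se_d = statistics.stdev(diff) / math.sqrt(n)  # standard error of the mean difference
t_stat = mean_d / se_d                        # test statistic for H0: mu_d = 0

t_crit = 1.796  # t table value, 90% confidence, 11 df (assumed, not computed)
ci = (mean_d - t_crit * se_d, mean_d + t_crit * se_d)

print(round(t_stat, 3), tuple(round(b, 3) for b in ci))
```

Because 0 lies below the lower end of this interval, the interval and the two-sided test at the 10% level point the same way.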
j) Does your 90% confidence interval indicate that there is significant evidence (at the 10% significance level) that the average of the paired differences differs from 0? Why or why not?
Because the interval (does, *does not*) contain 0, there (*is*, is not) significant evidence (at the 10% significance level) that the average of the paired differences differs from 0.
23. Keyhole heart bypass surgery was performed on 8 patients and conventional surgery was performed on 10 patients. The length of hospital stay in days was recorded for all patients and is summarized in the following table.
Sample #   Surgery Method   Sample Size   Mean       Standard Deviation
1          Keyhole          8             3.5 days   1.5 days
2          Conventional     10            8.0 days   2.0 days
We wish to test the hypothesis that recovery from keyhole surgery requires a shorter hospital stay than conventional surgery (that is, test the hypothesis that people undergoing conventional surgery spend longer in hospital, on average). Use a significance level of 1%.
a) State necessary assumptions.
1. Two independent random samples
2. Two Normal Populations or two large samples (n1>30 and n2>30) or both
In this case, we must assume two normal populations, as n1 and n2 are both too small
3. Check Variances: the ratio of the larger to the smaller sample standard deviation is 2.0/1.5 = 1.33 < 2, so equal variances are assumed (pooled approaches)
b) State the null and alternative hypothesis
Hypotheses
H0: µC - µK ≤ 0
Ha: µC - µK > 0
c) Verify by hand that the pooled standard deviation is 1.7984, that the test statistic is 5.2752 and the df = 16
$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{(10 - 1)(2)^2 + (8 - 1)(1.5)^2}{10 + 8 - 2}} = 1.7984$$
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{(8.0 - 3.5) - 0}{(1.7984)\sqrt{\frac{1}{10} + \frac{1}{8}}} = 5.2752$$
$$df = n_1 + n_2 - 2 = 10 + 8 - 2 = 16$$
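The same verification can be sketched in a few lines of Python. This is a minimal illustration working from the summary statistics in the table for this question, with conventional surgery taken as sample 1 to match the hypotheses; the function name `pooled_t` is hypothetical and is not part of the lab's R workflow.

```python
import math

def pooled_t(n1, mean1, s1, n2, mean2, s2):
    """Pooled-variance two-sample t statistic for H0: mu1 - mu2 = 0."""
    df = n1 + n2 - 2
    # Pooled standard deviation: weighted average of the two sample variances
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    t = (mean1 - mean2) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return sp, t, df

# Conventional (n = 10) minus keyhole (n = 8), hospital stay in days
sp, t, df = pooled_t(10, 8.0, 2.0, 8, 3.5, 1.5)
print(round(sp, 4), round(t, 3), df)
```

Carrying full precision for the pooled standard deviation gives a test statistic that differs from the by-hand value only in the fourth decimal place.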
d) Use SOFTWARE to find the p-value and then make your decision using a 5% significance level.
Output
P-value: Pvalue Statement P(t > 5.2752), pvalue = 0.00003772842
Decision and Reason: (*Reject*, Do not reject) H0 because Pvalue ≤ α, 0.00003772842 ≤ 0.05
Decision in “English”: At the 5% significance level, there (*is*, is not) significant evidence that, on average, keyhole surgery requires a shorter hospital stay (i.e. conventional requires a longer hospital stay).
24. Keyhole heart bypass surgery was performed on 8 patients and conventional surgery was performed on 10 patients. The time spent on breathing tubes was recorded for all patients and is summarized in the following table.
Sample #   Surgery Method   Sample Size   Mean        Standard Deviation
1          Keyhole          8             3.0 hours   1.5 hours
2          Conventional     10            7.0 hours   2.7 hours
Test the hypothesis that keyhole surgery reduces the time spent, on average, with breathing tubes. Use a significance level of 1%.
a) State necessary assumptions.
1. Two independent random samples
2. Two Normal Populations or two large samples (n1>30 and n2>30) or both
In this case, we must assume two normal populations, as n1 and n2 are both too small
3. Check Variances: the ratio of the larger to the smaller sample standard deviation is 2.7/1.5 = 1.8 < 2, so equal variances are assumed
b) State the null and alternative hypothesis
Hypotheses
H0: µK - µC ≥ 0
Ha: µK - µC < 0
c) Verify by hand that the pooled standard deviation is 2.2550, the test statistic is -3.7396 and the df = 16
$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{(8 - 1)(1.5)^2 + (10 - 1)(2.7)^2}{8 + 10 - 2}} = 2.2550$$
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{(3.0 - 7.0) - 0}{(2.255)\sqrt{\frac{1}{8} + \frac{1}{10}}} = -3.7396$$
$$df = n_1 + n_2 - 2 = 8 + 10 - 2 = 16$$
d) Use SOFTWARE to find the p-value and then make your decision using a 5% significance level.
Output
P-value: Pvalue Statement P(t < -3.7396), pvalue = 0.0008932788
Decision and Reason: (*Reject*, Do not reject) H0 because Pvalue ≤ α, 0.0008932788 ≤ 0.05
Decision in “English”: At the 5% significance level, there (*is*, is not) significant evidence that, on average, keyhole surgery reduces the time spent with breathing tubes.
25. In a study of internet users, the average time spent online per day was determined for a group of college graduates as well as for a group of non-college graduates. The following summary results were obtained. The data can be found in the file DAILYTIMEONLINE.
Sample #   Group                   Sample Size   Mean        Standard Deviation
1          College Graduates       14            8.6 hours   1.1 hours
2          Non-college graduates   12            6.3 hours   2.7 hours
a)Fully and completely state the three assumptions that must be made to perform two independent sample t tests with this data.
1. simple random samples 2. independent random samples 3. a)large samples (both above 30) or b)the two populations from which the samples were taken were normal or c) both. Here since our sample sizes are both below 30, we must investigate the sample data shape to see if it suggests that assuming normal populations to do this problem makes sense.
NOTE: This question provides practice with two independent sample t problems. The literature provides for two possible versions of this test; in one case, a “pooled” variance approach is made (the two variances are “close” in value) and in the other case an “unpooled” variance approach is made (the two variances are “not close” in value). Some instructors like to teach the pooled t approach as the assumption of equal variances is necessary in other tests that look at comparing the means of several populations. Other instructors point out that assuming equal variances is an assumption we would only make if we were very familiar with our background population in general, and it is better to be safe than sorry. Your instructor will let you know what approach you should take in lecture. In lab, we will do all two independent sample problems with the assumption of “unpooled” variances. R allows for either approach; the default is “unpooled”. For two independent sample t problems, R does all hypothesized differences of means in alphabetical order. (Questions 23, 24 and 26 do have examples that use the pooled approach if students ever need to reference them for any reason other than for lab.)
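To make the “unpooled” version concrete, the following is a minimal Python sketch of the standard Welch calculation, applied to the summary statistics in the table for this question. The function name `welch_t` and the choice of direction (college graduates minus non-college graduates) are assumptions made for illustration; they are not taken from the lab output.

```python
import math

def welch_t(n1, mean1, s1, n2, mean2, s2):
    """Unpooled (Welch) t statistic and Welch-Satterthwaite df for H0: mu1 - mu2 = 0."""
    v1, v2 = s1**2 / n1, s2**2 / n2  # estimated variance of each sample mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# College graduates vs non-college graduates, daily hours online
t, df = welch_t(14, 8.6, 1.1, 12, 6.3, 2.7)
print(round(t, 3), round(df, 2))
```

The unpooled degrees of freedom are usually not a whole number and come out smaller than the pooled value of n1 + n2 − 2 = 24, which is the price paid for not assuming equal variances.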
b) Create side-by-side normal probability plots, histograms, boxplots, and dotplots for daily hours online for the college and non-college grads. In each case, explain whether the graph suggests that the populations can be considered normal or non-normal. (NOTE: histograms will be stacked rather than side-by-side.)
The boxplots tell us that the data in both sample groups is spread fairly evenly about its median, and that the tails are of equal length for each of the two distributions. The boxplots flag no outliers. Neither boxplot flags any concerns about assuming normality of the populations from which we sampled. But we must remember that just because a boxplot has a nice bulk in the middle and even length tails, it doesn’t mean the data it represents is normal. We examine the histograms and normal probability plots for more information.
The sample data histogram for CG is a nice small bell. The sample data histogram for NCG indicates a bump in the middle and longer tails. Neither of these histograms flags any concerns about assuming normality of the populations from which we sampled.
DAILY HOURS ONLINE: COLLEGE AND NON-COLLEGE GRADS
Points seem fairly close to the lines for both the CG and the NCG sample data groups. The probability plot does not suggest overt non-normality of the data. It would not be untoward to assume normal populations for the data from which we sampled.
DAILY HOURS ONLINE: COLLEGE AND NON-COLLEGE GRADS
There aren’t enough values for the dotplots to be particularly useful in suggesting the distribution shapes of the sample datasets. But we can see that the data tends to be clustered towards the middle in the CG group and a bit more evenly spread out in the NCG group. Neither of these dotplots flags any concerns about assuming normality of the populations from which we sampled.
c) BY HAND: Using the summary data from the question preamble, test by hand if there is significant evidence that, on average, college graduates are online more than non-college graduates. Use R to find your p-value. Use the alternative $H_a: \mu_{CG} - \mu_{NCG} > 0$. Use a level of significance of 5%.
Hypotheses: $H_0: \mu_{CG} - \mu_{NCG} \le 0$, $H_a: \mu_{CG} - \mu_{NCG} > 0$
Test statistic: $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} = \dfrac{(8.6 - 6.3) - 0}{\sqrt{(1.1)^2/14 + (2.7)^2/12}} = 2.7611$
$df = \min(n_1 - 1,\ n_2 - 1) = \min(14 - 1,\ 12 - 1) = \min(13,\ 11) = 11$ (conservative formula used here)
P-value statement: P(t > 2.7611) for 11 df
p-value = 0.009259704
R output:
Decision: (REJECT, Do not reject) $H_0$, since p-value ≤ α: 0.009259704 ≤ 0.05
Conclusion: At the 5% significance level, there (IS, is not) significant evidence that, on average, college graduates are online more than non-college graduates.
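The by-hand steps above can be sketched in R. This is only a minimal illustration that works from the rounded summary statistics in the question preamble; the variable names (`xbar1`, `s1`, `t_stat`, and so on) are chosen here for illustration and are not part of the lab files.

```r
# Summary statistics from the question preamble (standard deviations rounded to 1 decimal)
n1 <- 14; xbar1 <- 8.6; s1 <- 1.1   # college graduates (CG)
n2 <- 12; xbar2 <- 6.3; s2 <- 2.7   # non-college graduates (NCG)

# Unpooled standard error of the difference in sample means
se <- sqrt(s1^2 / n1 + s2^2 / n2)

# Test statistic for H0: mu_CG - mu_NCG <= 0 (hypothesized difference 0)
t_stat <- ((xbar1 - xbar2) - 0) / se

# Conservative degrees of freedom: the smaller of n1 - 1 and n2 - 1
df_cons <- min(n1 - 1, n2 - 1)

# Right-tailed p-value, P(t > t_stat)
p_right <- pt(t_stat, df = df_cons, lower.tail = FALSE)

print(c(t = t_stat, df = df_cons, p.value = p_right))
```

The test statistic should come out near 2.761 and the p-value near 0.0093, in line with the hand calculation.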
d) Use R to find the summary statistics for daily online hours for the college and non-college graduates for the dataset DAILYTIMEONLINE. Note that we rounded the standard deviations to 1 decimal place in the question preamble. This means that questions done by hand will yield slightly different test statistics and p-values than questions done in R with the dataset. Keep that in mind below.
e) USING R: Test, using R with the dataset DAILYTIMEONLINE, if there is significant evidence that, on average, college graduates are online more than non-college graduates. By default, in the difference µ1 - µ2 that R uses in a two independent samples t test, the outcome name that comes earlier in the alphabet goes first and the outcome name that comes later in the alphabet goes second. Here it will use µCG - µNCG, and you can test with the alternative $H_a: \mu_{CG} - \mu_{NCG} > 0$. Use the grouping column EDUCATION. Use a level of significance of 5%.
OUTPUT:
Hypotheses: $H_0: \mu_{CG} - \mu_{NCG} \le 0$, $H_a: \mu_{CG} - \mu_{NCG} > 0$
Test statistic: t = 2.7376 (from R)
df = 14.054 (Welch’s formula from R used here)
P-value statement: P(t > 2.7376)
p-value = 0.007933 (from R)
Decision: (REJECT, Do not reject) $H_0$, since p-value ≤ α: 0.007933 ≤ 0.05
Conclusion: At the 5% significance level, there (IS, is not) significant evidence that, on average, college graduates are online more than non-college graduates.
Students should note that their test statistic and p-value will differ slightly when doing the problem with the dataset, for two reasons. First, we rounded the standard deviations to one decimal place when doing the problem by hand. Second, for an unpooled problem the software calculates the degrees of freedom with Welch’s formula (which yields 14.054 degrees of freedom), whereas we used the conservative formula when doing the problem with summary data (which yielded 11 degrees of freedom).
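To see where the two degrees-of-freedom values come from, the sketch below evaluates the usual Welch-Satterthwaite form, $df = \dfrac{(s_1^2/n_1 + s_2^2/n_2)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$, next to the conservative formula. It assumes only the rounded summary statistics from the preamble, so the Welch value it gives (about 14.1) is close to, but not exactly, the 14.054 that R reports from the raw data. The helper name `welch_df` is hypothetical.

```r
# Welch-Satterthwaite degrees of freedom from summary statistics
# (welch_df is an illustrative helper, not a built-in R function)
welch_df <- function(s1, n1, s2, n2) {
  v1 <- s1^2 / n1                      # estimated variance of the first sample mean
  v2 <- s2^2 / n2                      # estimated variance of the second sample mean
  (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
}

df_welch <- welch_df(s1 = 1.1, n1 = 14, s2 = 2.7, n2 = 12)
df_cons  <- min(14 - 1, 12 - 1)        # conservative choice used by hand

print(c(welch = df_welch, conservative = df_cons))
```

The Welch value is never smaller than the conservative one, which is why the by-hand p-value is a little larger than the one from R.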
f) BY HAND: Using the summary data from the question preamble, test by hand if there is significant evidence that, on average, non-college graduates are online less than college graduates. Use R to find your p-value. Use the alternative $H_a: \mu_{NCG} - \mu_{CG} < 0$. Use a level of significance of 5%.
Hypotheses: $H_0: \mu_{NCG} - \mu_{CG} \ge 0$, $H_a: \mu_{NCG} - \mu_{CG} < 0$
Test statistic: $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} = \dfrac{(6.3 - 8.6) - 0}{\sqrt{(1.1)^2/14 + (2.7)^2/12}} = -2.7611$
$df = \min(n_1 - 1,\ n_2 - 1) = \min(14 - 1,\ 12 - 1) = \min(13,\ 11) = 11$ (conservative formula used here)
P-value statement: P(t < -2.7611) with 11 df
p-value = 0.009259704
R output:
Decision: (REJECT, Do not reject) $H_0$, since p-value ≤ α: 0.009259704 ≤ 0.05
Conclusion: At the 5% significance level, there (IS, is not) significant evidence that, on average, non-college graduates are online less than college graduates.
g) USING R: Test, using R with the dataset DAILYTIMEONLINE, if there is significant evidence that, on average, non-college graduates are online less than college graduates. You must use R to solve this problem with a “workaround”. Note that if µNCG - µCG < 0, then µCG - µNCG > 0. So, if using R, you must instead test with the alternative $H_a: \mu_{CG} - \mu_{NCG} > 0$ with the grouping column EDUCATION, but write your conclusion to match the wording of the question. Use a level of significance of 5%.
OUTPUT:
Hypotheses: $H_0: \mu_{NCG} - \mu_{CG} \ge 0$, $H_a: \mu_{NCG} - \mu_{CG} < 0$
But we test the equivalent $H_0: \mu_{CG} - \mu_{NCG} \le 0$, $H_a: \mu_{CG} - \mu_{NCG} > 0$
Test statistic: t = 2.7376 (from R)
df = 14.054 (Welch’s formula from R used here)
Note: the test statistic of 2.7376 is obtained when we do a right sided test with $H_a: \mu_{CG} - \mu_{NCG} > 0$. If we could do a left sided test with $H_a: \mu_{NCG} - \mu_{CG} < 0$ in R, our test statistic value would be -2.7376.
P-value statement: P(t < -2.7376) = P(t > 2.7376)
p-value = 0.007933 (verify from R)
Decision: (REJECT, Do not reject) $H_0$, since p-value ≤ α: 0.007933 ≤ 0.05
Conclusion: At the 5% significance level, there (IS, is not) significant evidence that, on average, non-college graduates are online less than college graduates.
Students should note that their test statistic and p-value will differ slightly when doing the problem with the dataset, for two reasons. First, we rounded the standard deviations to one decimal place when doing the problem by hand. Second, for an unpooled problem the software calculates the degrees of freedom with Welch’s formula (which yields 14.054 degrees of freedom), whereas we used the conservative formula when doing the problem with summary data (which yielded 11 degrees of freedom).
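The workaround rests on the symmetry of the t distribution: the left tail below -t has the same area as the right tail above t. A short check in R, using the test statistic and Welch degrees of freedom reported above (the variable names are illustrative only):

```r
# Values reported by R in the solution above
t_stat <- 2.7376
df     <- 14.054

# Right-tailed p-value for Ha: mu_CG - mu_NCG > 0
p_right <- pt(t_stat, df = df, lower.tail = FALSE)

# Left-tailed p-value for Ha: mu_NCG - mu_CG < 0 (test statistic changes sign)
p_left <- pt(-t_stat, df = df)

print(c(right_tail = p_right, left_tail = p_left))
```

Both tail areas agree, so either way of writing the hypotheses leads to the same p-value and the same decision.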
h) USING R: An alternative way to set up the test in part g) is to rewrite your grouping variable outcome names so that the outcome name that comes first alphabetically is the one for non-college graduates. In the column EDREORDER, NCG is relabeled 1NCG and CG is relabeled 2CG. Then, you can ask R to use the column EDREORDER to solve the test with the alternative $H_a: \mu_{1NCG} - \mu_{2CG} < 0$. Test, using R with the dataset DAILYTIMEONLINE, if there is significant evidence that, on average, non-college graduates are online less than college graduates. Use a level of significance of 5%.
Hypotheses: $H_0: \mu_{1NCG} - \mu_{2CG} \ge 0$, $H_a: \mu_{1NCG} - \mu_{2CG} < 0$
Test statistic: t = -2.7376
df = 14.054 (Welch’s formula from R used here)
P-value statement: P(t < -2.7376) = 0.007933
p-value = 0.007933
Decision: (REJECT, Do not reject) $H_0$, since p-value ≤ α: 0.007933 ≤ 0.05
Conclusion: At the 5% significance level, there (IS, is not) significant evidence that, on average, non-college graduates are online less than college graduates.
Students should note that their test statistic and p-value will differ slightly when doing the problem with the dataset, for two reasons. First, we rounded the standard deviations to one decimal place when doing the problem by hand. Second, for an unpooled problem the software calculates the degrees of freedom with Welch’s formula (which yields 14.054 degrees of freedom), whereas we used the conservative formula when doing the problem with summary data (which yielded 11 degrees of freedom).
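The relabeling trick works because R orders the levels of a grouping factor alphabetically unless told otherwise. The sketch below shows this on a few placeholder labels (the vector `education` is illustrative only and is not the lab data); it also shows that the same ordering can be requested directly through the `levels` argument of `factor()`.

```r
# Placeholder group labels, for illustration only (not the DAILYTIMEONLINE data)
education <- c("CG", "NCG", "CG", "NCG")

# Default: levels are sorted alphabetically, so CG comes first
default_order <- levels(factor(education))

# Relabeling as in the EDREORDER column puts the non-college group first
relabeled       <- factor(ifelse(education == "NCG", "1NCG", "2CG"))
relabeled_order <- levels(relabeled)

# Alternative: keep the labels and state the level order explicitly
explicit_order <- levels(factor(education, levels = c("NCG", "CG")))

print(default_order)
print(relabeled_order)
print(explicit_order)
```

Whichever level comes first plays the role of group 1 in the difference of means.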
i) BY HAND: Using the summary data from the question preamble, test by hand if there is significant evidence that, on average, time online differs for college graduates and non-college graduates. Use R to find your p-value. Use the alternative $H_a: \mu_{CG} - \mu_{NCG} \ne 0$; that way, you can work with a positive test statistic. Use a level of significance of 5%.
Hypotheses: $H_0: \mu_{CG} - \mu_{NCG} = 0$, $H_a: \mu_{CG} - \mu_{NCG} \ne 0$
Test statistic: $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} = \dfrac{(8.6 - 6.3) - 0}{\sqrt{(1.1)^2/14 + (2.7)^2/12}} = 2.7611$
$df = \min(n_1 - 1,\ n_2 - 1) = \min(14 - 1,\ 12 - 1) = \min(13,\ 11) = 11$ (conservative formula used here)
P-value statement: 2P(t > 2.7611) = 2(0.009259704)
p-value = 0.018519408
R output:
Decision: (REJECT, Do not reject) $H_0$, since p-value ≤ α: 0.018519408 ≤ 0.05
Conclusion: At the 5% significance level, there (IS, is not) significant evidence that, on average, the time spent online by college and non-college graduates differs.
j) USING R: Test, using R with the dataset DAILYTIMEONLINE, if there is significant evidence that, on average, the amount of online time differs for college and non-college graduates. Use the alternative $H_a: \mu_{CG} - \mu_{NCG} \ne 0$ with the column EDUCATION; that way, you can work with a positive test statistic. Use a level of significance of 5%.
Hypotheses: $H_0: \mu_{CG} - \mu_{NCG} = 0$, $H_a: \mu_{CG} - \mu_{NCG} \ne 0$
Test statistic: t = 2.7376
df = 14.054
P-value statement: 2P(t > 2.7376) = 2(0.007933)
p-value = 0.01587
Decision: (REJECT, Do not reject) $H_0$, since p-value ≤ α: 0.01587 ≤ 0.05
Conclusion: At the 5% significance level, there (IS, is not) significant evidence that, on average, the time spent online by college and non-college graduates differs.
Students should note that their test statistic and p-value will differ slightly when doing the problem with the dataset, for two reasons. First, we rounded the standard deviations to one decimal place when doing the problem by hand. Second, for an unpooled problem the software calculates the degrees of freedom with Welch’s formula (which yields 14.054 degrees of freedom), whereas we used the conservative formula when doing the problem with summary data (which yielded 11 degrees of freedom).
k) BY HAND: Using the summary data from the question preamble, calculate a 95% confidence interval for µCG - µNCG, the difference, on average, between online hours of college and non-college graduates. Use R to calculate your $t_{\alpha/2,\,n-1}$. Is there evidence that time online differs for college graduates and non-college graduates? Why or why not? Answer by filling in the blanks in the sentences below. (4m)
A 95% confidence interval is:
$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\,n-1}\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$
$= (8.6 - 6.3) \pm t_{0.025,\,11}\sqrt{\dfrac{1.1^2}{14} + \dfrac{2.7^2}{12}}$
$= (8.6 - 6.3) \pm 2.200985 \times \sqrt{\dfrac{1.1^2}{14} + \dfrac{2.7^2}{12}}$
$= 2.3 \pm 2.200985 \times 0.8330237$
$= 2.3 \pm 1.8334726$
$= (0.4665274,\ 4.1334726)$ hours (verify)
You will need $t_{0.025,\,11}$ from R:
Note: remember that by hand we use df = 11 (the conservative choice) and α = 0.05, so α/2 = 0.025.
A 95% CI for the average difference in online daily hours between college and non-college graduates is (0.4665274, 4.1334726) hours. We are 95% confident that the interval contains the true population mean difference in online daily hours, on average, between college and non-college graduates. At the 5% significance level, since 0 is not in the confidence interval, there is significant evidence that there is a difference in online daily hours between college and non-college graduates.
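The interval above can be checked with a few lines of R. This is a minimal sketch from the rounded summary statistics and the conservative df = 11; the variable names are chosen here for illustration.

```r
# Summary statistics from the question preamble
n1 <- 14; xbar1 <- 8.6; s1 <- 1.1   # college graduates (CG)
n2 <- 12; xbar2 <- 6.3; s2 <- 2.7   # non-college graduates (NCG)

alpha   <- 0.05
df_cons <- min(n1 - 1, n2 - 1)              # conservative df = 11
t_crit  <- qt(1 - alpha / 2, df = df_cons)  # t_{0.025, 11}

se     <- sqrt(s1^2 / n1 + s2^2 / n2)       # unpooled standard error
margin <- t_crit * se                       # margin of error

ci <- c(lower = (xbar1 - xbar2) - margin,
        upper = (xbar1 - xbar2) + margin)
print(ci)
```

Because the lower limit is above 0, the interval supports the same conclusion as the two-sided test in part i).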
l) USING R: Using R with the dataset DAILYTIMEONLINE, calculate a 95% confidence interval for µCG - µNCG, the difference, on average, in online hours of college and non-college graduates. Verify that it closely matches the confidence interval you found in part k). Use the column EDUCATION. (2m)
A 95% CI for the average difference in online daily hours between college and non-college graduates is (0.4986789, 4.1013211) hours. We are 95% confident that the interval contains the true population mean difference in online daily hours, on average, between college and non-college graduates. At the 5% significance level, since 0 is not in the confidence interval, there is significant evidence that there is a difference in online daily hours between college and non-college graduates.
Students should note that their confidence interval will differ slightly when doing the problem with the dataset, for two reasons. First, we rounded the standard deviations to one decimal place when doing the problem by hand. Second, for an unpooled problem the software calculates the degrees of freedom with Welch’s formula (which yields 14.054 degrees of freedom), whereas we used the conservative formula when doing the problem with summary data (which yielded 11 degrees of freedom).
26. Two independent samples of students are taken, 35 of whom identify as male and 35 of whom identify as female. The number of hours of homework the students do per classroom hour is recorded. You wish to test, at the 5% significance level, whether there is significant evidence that the amount of homework done on average by female identifying students differs from the amount of homework done on average by male identifying students. Data is in the file HOURSHOMEWORKGENDER. SOFTWARE was used to create stacked frequency histograms for the homework hours of each group.
a) Fully and completely state the three assumptions that must be met in order to construct a two sample t confidence interval or perform a two sample t test for the difference µHHM - µHHF. (3m)
1. Simple random samples
2. Independent random samples
3. Either a) large samples (both above 30), or b) the two populations from which the samples were taken are normal, or c) both. HERE SAMPLES ARE LARGE (BOTH ABOVE 30)
b) Highlight or circle correct answers below. (4m)
The shape of the histogram of sample data for females looks (uniform, NORMAL, skewed right, skewed left).
The shape of the histogram of the sample for males appears (uniform, normal, SKEWED RIGHT, skewed left).
The shape of the SAMPLING distribution of the test statistic is (APPROXIMATELY NORMAL, unknown).
We must assume both the female and male student populations are normal in order to create confidence intervals and perform hypothesis tests with this data. (true, FALSE)
c) Create a normal probability QQ plot of the sample data, and comment on whether it indicates that the population could be considered non-normal.
Most of the “hours homework” points follow a straight-line pattern for both the female and the male gender QQ plots for the sample data, so there is not an overt suggestion that the data is non-normal for each group of sample data. Note the occasional “stray” points at the ends of the lines, but also note that nevertheless these points lie within the dotted lines. We *might* feel fairly comfortable assuming that the populations for the two groups are normally distributed because the QQ plots of the sample distributions do not suggest non-normality.
d) Calculate descriptive statistics (sample sizes, mean and standard deviations suffice) for the hours of homework performed by the female and male students in our dataset. Output
e) Calculate the ratio of the largest standard deviation to the smallest standard deviation for these two groups of students. Are the standard deviations for the two histograms of sample data close enough to consider using a pooled variance when creating confidence intervals and performing hypothesis tests for the difference of means with this set of data? (2m)
1.275/1.1165 ≈ 1.14 < 2. We can certainly consider these standard deviations close enough together to use a pooled variance.
f) Calculate a 95% confidence interval for µHHF - µHHM, the difference, on average, in hours of homework per classroom hour, for female and male identifying students using a pooled variance. Verify that you can obtain the following output. Is there evidence that there is a difference in the average homework hours per classroom hour for the two groups? Why or why not? Answer by filling in the blanks in the sentences below. (4m)
Output
A 95% CI for the average difference in homework hours per classroom hour between female and male students is (0.05119424, 1.19452005) hours. We are 95% confident that the interval contains the true population mean difference in hours of homework per classroom hour, on average, for male and female identifying students. At the 5% significance level, since 0 is not in the confidence interval, there is significant evidence that there is a difference in hours of homework per classroom hour, on average, for male and female identifying students.
g) Calculate a 95% confidence interval for µHHF - µHHM, the difference, on average, in hours of homework per classroom hour, for female and male identifying students using an unpooled variance. Is there evidence that there is a difference in the average homework hours per classroom hour for the two groups? Why or why not? Answer by filling in the blanks in the sentences below. (4m)
Output
A 95% CI for the average difference in hours of homework per classroom hour between female and male students is (0.05101378, 1.19470050) hours. We are 95% confident that the interval contains the true population mean difference in hours of homework per classroom hour, on average, for male and female identifying students. At the 5% significance level, since 0 is not in the confidence interval, there is significant evidence that there is a difference in hours of homework per classroom hour, on average, for male and female identifying students.
h) Use SOFTWARE output to perform a hypothesis test to test, at the 5% significance level, whether, on average, the homework hours performed per classroom hour by male and female identifying students in Stat 151 at MacEwan differ. Use the difference µHHF - µHHM with a pooled variance. Verify that you can obtain the following output. Fill in answers below. Highlight or circle chosen answers. (10m)
Output
Hypotheses: $H_0: \mu_{HHF} - \mu_{HHM} = 0$, $H_a: \mu_{HHF} - \mu_{HHM} \ne 0$
Level of Significance: α = 0.05
Test stat/df: t = 2.1742, df = 68
P-value: P-value statement 2P(t > 2.1742); p-value = 0.03318
Decision and Reason: (REJECT, Do not reject) $H_0$, since p-value ≤ α: 0.03318 ≤ 0.05
Decision in “English”: At the 5% significance level, there (IS, is not) significant evidence that, on average, the homework hours performed per classroom hour by male and female identifying students in Stat 151 at MacEwan differ.
i) Use SOFTWARE output to perform a hypothesis test to test, at the 5% significance level, whether, on average, the homework hours performed per classroom hour by male and female identifying students in Stat 151 at MacEwan differ. Use the difference µHHF - µHHM with an unpooled variance. Fill in answers below. Highlight or circle chosen answers. (10m)
Hypotheses: $H_0: \mu_{HHF} - \mu_{HHM} = 0$, $H_a: \mu_{HHF} - \mu_{HHM} \ne 0$
Level of Significance: α = 0.05
Test stat/df: t = 2.1742, df = 66
P-value: P-value statement 2P(t > 2.1742); p-value = 0.03324
Decision and Reason: (REJECT, Do not reject) $H_0$, since p-value ≤ α: 0.03324 ≤ 0.05
Decision in “English”: At the 5% significance level, there (IS, is not) significant evidence that, on average, the homework hours performed per classroom hour by male and female identifying students in Stat 151 at MacEwan differ.
j) What are your degrees of freedom with the two independent samples pooled variances t test? What are your degrees of freedom with the two independent samples unpooled variances t test? Are they the same? Why or why not?
The degrees of freedom for the pooled t test are 68 = 35 + 35 - 2. The degrees of freedom for the unpooled t test are 66 (calculated from an intricate and messy formula learned in lecture). They differ because they are calculated with different formulas.
Students should note that the nearly identical t-values and p-values obtained with the pooled variances and non-pooled variances approaches in this example are not typical. Here, this occurs because the denominator of the test statistic is similar for both cases and because the degrees of freedom are close for both cases. Students should also note that assuming equal variances means that an additional assumption about the populations must be made in order to perform a test. This is not an assumption made lightly, and would only be made in real life if we were very familiar with our background populations in general. Some instructors and researchers prefer not to teach the pooled t approach because this extra assumption introduces an additional possibility of error. Some instructors and researchers like to teach the pooled t approach because the assumption of equal variances is necessary in other tests that compare the means of several populations. Your instructor will let you know which approach she or he prefers.
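The comparison in this note can be sketched in R using only the two sample standard deviations from part e) and the sample sizes. The names `s_large`, `s_small`, and the other variables are illustrative; with these rounded standard deviations the Welch degrees of freedom come out near 66.8, which the solution above reports as 66.

```r
# Sample sizes and the two sample standard deviations from part e)
n1 <- 35; n2 <- 35
s_large <- 1.275
s_small <- 1.1165

# Rule of thumb from part e): ratio of largest to smallest SD below 2
sd_ratio <- s_large / s_small

# Pooled approach: pooled SD, standard error, and df = n1 + n2 - 2
df_pooled <- n1 + n2 - 2
sp        <- sqrt(((n1 - 1) * s_large^2 + (n2 - 1) * s_small^2) / df_pooled)
se_pooled <- sp * sqrt(1 / n1 + 1 / n2)

# Unpooled approach: standard error and Welch degrees of freedom
v1 <- s_large^2 / n1
v2 <- s_small^2 / n2
se_unpooled <- sqrt(v1 + v2)
df_welch    <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

print(c(ratio = sd_ratio, se_pooled = se_pooled, se_unpooled = se_unpooled,
        df_pooled = df_pooled, df_welch = df_welch))
```

With equal sample sizes the two standard errors coincide, so the t statistics match and only the degrees of freedom differ, which is why the pooled and unpooled results in this question are so close.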
k) Use SOFTWARE output to perform a hypothesis test to test, at the 5% significance level, whether, on average, the homework hours performed per classroom hour by female identifying students are more than the homework hours performed per classroom hour by male identifying students. Use an unpooled variance. Use the difference µHHF - µHHM. Fill in answers below. Highlight or circle chosen answers. (10m)
Output
Hypotheses: $H_0: \mu_{HHF} - \mu_{HHM} \le 0$, $H_a: \mu_{HHF} - \mu_{HHM} > 0$
Level of Significance: α = 0.05
Test stat/df: t = 2.17, df = 66
P-value: P-value statement P(t > 2.17); p-value = 0.0166
Decision and Reason: (REJECT, Do not reject) $H_0$, since p-value ≤ α: 0.0166 ≤ 0.05
Decision in “English”: At the 5% significance level, there (IS, is not) significant evidence that, on average, the homework hours performed per classroom hour are greater for female identifying students than for male identifying students.
27. Consider the STATISTICSSTUDENTSSURVEYFORR file. For this problem, assume we can consider the students in introductory statistics who were surveyed for the file STATISTICSSTUDENTSSURVEYFORR to be a random sample drawn from a larger population of students in introductory statistics classes in all universities in Alberta. a) Students who take introductory statistics classes at MacEwan are a mix of students who are pursuing a BS (biology, mathematics, chemistry, etc) and students who are pursuing a BA (psychology, sociology, etc) . Your instructor found information at Statistics Canada that indicated that if we consider only students pursuing an undergraduate BA or BS degree, 43% of them will be pursuing a BS degree and 57% of them will be pursuing a BA degree. Find a 99% confidence interval to determine whether the proportion of BA students in universities in Alberta differs from 57%. Test using SOFTWARE and then complete the table below by hand. Use the normal approximation (as in your textbook). In your interval, use 3 decimals for your proportions and 1 decimal for your percents.
Output
Assumptions (4M)
State in full and show work when checking.
ASSUMPTIONS:
1. Simple Random Sample (stated in problem)
2. BA: # successes = 24 > 5, # failures = 36 > 5
3. Normal Approximation
Interval (2M)
Confidence Interval: (0.255, 0.564) = (25.5, 56.4)%
Interpretation (4M)
At the 1% significance level, the 99% CI (does, DOES NOT) cover 57%, so we (DO, do not) have significant evidence that the proportion of introductory statistics university students pursuing a BA in Alberta differs from 57%.
b) Use SOFTWARE to perform a hypothesis test to determine, at a significance level of 1%, whether there is evidence that the proportion of Alberta university introductory statistics students who are pursuing a BA differs from 57%. Post your output and then write up your test in the spaces provided. Use the normal approximation (as in your textbook). (Note that a test of proportion that uses the normal approximation with a Z test statistic is equivalent to a χ² test with degrees of freedom = #outcomes - 1, and that both tests yield identical p-values. Software such as R returns a χ² test statistic value. Your instructor will mention this to you in class and may or may not provide more details according to time constraints.)
Output
Write up by hand:
i) Hypotheses: $H_0: p = 0.57$ versus $H_a: p \ne 0.57$
ii) Assumptions: 1. Simple Random Sample (stated in problem) 2. BA: # successes = 24 > 5, BS: # failures = 60 - 24 = 36 > 5
iii) Test statistic: χ² = 7.0747
iv) P-value statement and p-value: P(χ² > 7.0747) for 1 degree of freedom = 0.007818
v) Decision and reason for it: Since the p-value of 0.007818 ≤ 0.01, we reject $H_0$.
vi) English Statement: At the 1% significance level, we (DO, do not) have significant evidence that the proportion of introductory statistics university students pursuing a BA in Alberta differs from 57%.
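One way to reproduce parts a) and b) is with R's `prop.test()`. The sketch below assumes the counts stated above (24 BA students out of 60) and turns off the continuity correction so that the χ² statistic equals the square of the textbook Z statistic. Note that the interval `prop.test()` returns is a score-type interval, so its limits can differ slightly from a hand-computed normal-approximation interval.

```r
# Counts stated in the assumptions: 24 BA students out of n = 60
x <- 24
n <- 60
p0 <- 0.57

# Two-sided test of H0: p = 0.57 with a 99% interval, no continuity correction
res <- prop.test(x = x, n = n, p = p0, alternative = "two.sided",
                 conf.level = 0.99, correct = FALSE)

# Textbook Z statistic for comparison: its square equals the chi-square statistic
z <- (x / n - p0) / sqrt(p0 * (1 - p0) / n)

print(res$statistic)   # chi-square statistic, 1 df
print(z^2)
print(res$p.value)
print(res$conf.int)
```

The upper limit of the 99% interval falls below 0.57, which agrees with rejecting $H_0$ at the 1% level.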
c) Assume that this data represents two independent random samples drawn from larger populations of students viewing Rachel Notley as a poor leader (no’s) and Rachel Notley as a good leader (yeses) for Alberta. Calculate and interpret a 95% confidence interval to determine whether the proportion of graduate/professionals in group 1, who viewed Rachel Notley as a poor (not good) leader (no’s) in 2015, differed from the proportion of graduate/professionals in group 2, who viewed Rachel Notley as a good leader (yeses). Bold and italicize chosen answers. Estimate proportions separately, like in your textbook. Use the normal approximation (like in your textbook). Note: SOFTWARE, like the textbook, always estimates proportions separately for a two-proportion confidence interval. (Note that this confidence interval can also be calculated by doing a normal approximation to a χ² test with degrees of freedom = (#rows - 1)(#columns - 1), which yields an identical confidence interval. Software such as R takes this approach. Your instructor will mention this to you in class and may or may not provide more details according to time constraints. We will use the latter approach here.)
Output 1 (using the two-sample proportion test approach in the software R). This output provides a confidence interval and proportion estimates.
Output 2 (using the two-way table approach in the software R). This output provides the counts.
Note that the software assigns the label 1 to the “no” group and the label 2 to the “yes” group by default. So for the No group, $p_1$ = pgraduate = 28.6% means that 28.6% (or 8 individuals) of the no group, who thought Rachel Notley would not do a good job (do a poor job), were pursuing graduate/professional degrees, and for the yes group, $p_2$ = pgraduate = 59.4% means that 59.4% (or 19 individuals) of the yes group, who thought Rachel Notley would do a good job, were pursuing graduate/professional degrees. Fill in the table below.
Assumptions (4M)
State in full and show work when checking.
ASSUMPTIONS:
1. Simple Random Samples (stated in problem)
2. Independent Random Samples (stated in problem)
3. Each cell in the contingency table has more than 5 observations:
No: #graduate/professional = 28(0.286) = 8 > 5, #undergraduates = 28(0.714) = 20 > 5 (work shown)
Yes: #graduate/professional = 32(0.594) = 19 > 5, #undergraduates = 32(0.406) = 13 > 5 (work shown)
Interval (2M)
Confidence Interval: (-0.547, -0.069) = (-54.7%, -6.9%)
Interpretation (4M)
At the 5% significance level, the CI (does, DOES NOT) cover 0, so we (DO, do not) have significant evidence that the proportion of graduate/professionals in group 1, who viewed Rachel Notley as a poor (not good) leader in 2015, differed from the proportion of graduate/professionals in group 2, who viewed Rachel Notley as a good leader.
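A possible way to obtain this interval (and the test statistic used in the next part) is R's two-sample `prop.test()`. The sketch assumes the counts worked out above, 8 of 28 in the "no" group and 19 of 32 in the "yes" group, and turns off the continuity correction to match the textbook normal approximation; the object name `res2` is illustrative.

```r
# Graduate/professional counts and group sizes from the assumptions check
grads  <- c(no = 8, yes = 19)
totals <- c(no = 28, yes = 32)

# Two-sample test of H0: p1 - p2 = 0, 95% interval for p1 - p2, no continuity correction
res2 <- prop.test(x = grads, n = totals, conf.level = 0.95, correct = FALSE)

print(res2$estimate)    # sample proportions for the two groups
print(res2$conf.int)    # interval for p1 - p2, proportions estimated separately
print(res2$statistic)   # chi-square statistic with (2 - 1)(2 - 1) = 1 df
print(res2$p.value)
```

The interval lies entirely below 0, so it does not cover 0, which matches the interpretation above.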
d. Use SOFTWARE to perform a hypothesis test to determine, at a significance level of 5%, whether there is significant evidence that the proportion of graduate/professional students in group 1, who viewed Rachel Notley as a poor (not good) leader in 2015, differed from the proportion of graduate/professional students in group 2, who viewed Rachel Notley as a good leader.
Note: SOFTWARE can calculate a z test statistic value using either a pooled estimate of the proportion or separate proportion estimates. We will use a pooled proportion as our estimate in order to match the textbook approach and use the normal approximation. (Note that this z test can equivalently be done as a χ² test with degrees of freedom = (#rows − 1)(#columns − 1); for a 2×2 table the χ² statistic is the square of the pooled z statistic, and both tests yield identical p-values. Software such as R returns a χ² test statistic value. Your instructor will mention this to you in class and may or may not provide more details according to time constraints. We will use the latter approach here.)
You will need the same output as above. Repaste it here for easy reference.
Output 1 (using the two-sample proportion test approach in the software R): this output provides a confidence interval and proportion estimates.
Output 2 (using the two-way table approach in the software R): this output provides the counts.
Both pieces of output above provide the test statistic value, degrees of freedom, and p-value for the test. Note that the software assigns the number 1 to the “no” group and the number 2 to the “yes” group by default. So for the No group, p1 = pgraduate = 28.6% means that 28.6% (8 individuals) of the No group, who thought Rachel Notley would not do a good job (would do a poor job), were pursuing graduate/professional degrees. For the Yes group, p2 = pgraduate = 59.4% means that 59.4% (19 individuals) of the Yes group, who thought Rachel Notley would do a good job, were pursuing graduate/professional degrees. Fill in the table below.
i) Hypotheses: H0: p1 – p2 = 0 versus Ha: p1 – p2 ≠ 0
ii) Assumptions
1. Simple Random Samples (stated in problem)
2. Independent Random Samples (stated in problem)
3. Large samples (each cell in the contingency table has more than 5 observations)
No: #graduate/professional = 28(0.286) = 8 > 5, #undergraduates = 28(0.714) = 20 > 5 (work shown)
Yes: #graduate/professional = 32(0.594) = 19 > 5, #undergraduates = 32(0.406) = 13 > 5 (work shown)
iii) Test statistic: χ² = 5.7251
iv) Pvalue statement and pvalue: P(χ² > 5.7251) = 0.01672 for (r-1)(c-1) = 1 df for r = 2 rows and c = 2 columns
v) Decision and reason for it: Since the pvalue of 0.01672 < 0.05, we reject H0.
vi) English statement: At the 5% significance level, we do have significant evidence that the proportion of graduate/professional students in group 1, who viewed Rachel Notley as a poor (not good) leader in 2015, differed from the proportion of graduate/professional students in group 2, who viewed Rachel Notley as a good leader.
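The link between the pooled z test and the χ² test can be illustrated with a short Python sketch. It assumes the uncorrected pooled two-proportion z statistic and uses the counts from the assumptions check (8 of 28 and 19 of 32); the function name is hypothetical. For 1 degree of freedom, the upper-tail χ² probability equals the two-sided normal tail probability, which the standard library's `math.erfc` can supply.

```python
import math

def pooled_two_proportion_test(x1, n1, x2, n2):
    """Pooled z test for H0: p1 - p2 = 0, plus the equivalent chi-square value."""
    p1, p2 = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    chi_sq = z ** 2  # for a 2x2 table, chi-square = z squared
    # Two-sided normal tail area, identical to P(chi-square with 1 df > z^2).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, chi_sq, p_value

z, chi_sq, p_value = pooled_two_proportion_test(8, 28, 19, 32)
print(f"z = {z:.4f}, chi-square = {chi_sq:.4f}, p-value = {p_value:.5f}")
```

The sign of z only reflects which group is labelled 1; squaring it removes the sign, which is why the χ² version is always a two-sided test.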
28. The overall proportions of all Albertans who voted for the 4 major parties in Canada in the 2015 Federal election follow.
Conservative | Green | Liberal | NDP
0.6062 | 0.0257 | 0.2496 | 0.1185
*The Alberta proportions were calculated by your instructor using data online at Elections Canada. Interested students can google and look at the statistics provided there. These include ONLY Albertans who voted for one of the 4 main parties. This will allow us to make our comparisons sensibly. (Your instructor notes that out of all votes cast, 98.26% of Albertans and 94.53% of Canadians voted for those 4 parties. The Canadian percent is lower because of those voting BQ. No BQ candidates ran in Alberta.)
We consider, somewhat artificially, the Statistics 151 student data in the FULLPOSTELECTIONSURVEYFORR file to be a random sample from a much larger hypothetical population of Alberta students. We make a null hypothesis that the preferred political platform proportions of the population of Alberta students match the overall Albertan voting proportions. Perform a goodness of fit test to test this hypothesis. Choose α = 0.05 as your level of significance.
a) State your null and alternative hypotheses. (2m)
Let 1 be Conservative, 2 Green, 3 Liberal, 4 NDP.
H0: p0Conservative = 0.6062, p0Green = 0.0257, p0Liberal = 0.2496, p0NDP = 0.1185
Ha: H0 is not true
b) State your assumptions and comment on their validity. (3m)
1. SIMPLE RANDOM SAMPLE
2. ALL EXPECTED COUNTS EXCEED 5
Comment: (THESE BOTH HOLD – SEE TABLE BELOW)
c) The table below includes the observed counts for chosen best platform (see the BestPlat column) for students in the file FULLPOSTELECTIONSURVEYFORR. Fill in the following by hand. Use 4 decimals.
Test Statistic:
PARTY | p0i | Observed Counts, Oi | Expected Counts, Ei = 446p0i | (Oi − Ei)²/Ei
Conservative | 0.6062 | 130 | 270.3652 | 72.8732
Green | 0.0257 | 23 | 11.4622 | 11.6139
Liberal | 0.2496 | 220 | 111.3216 | 106.0980
NDP | 0.1185 | 73 | 52.8510 | 7.6816
TOTALS | 1.0000 | 446 | 446 | 198.2667
Test statistic = total of cell contributions = χ²* = Σ (Oi − Ei)²/Ei = 198.2667
Degrees of Freedom = # choices – 1 = k – 1 = 4 – 1 = 3
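The hand calculation for this goodness of fit test can be sketched in Python as follows. The observed counts and null proportions are the ones given in the problem; the function and variable names are illustrative only.

```python
def goodness_of_fit_table(observed, null_proportions):
    """Return per-category (expected count, cell contribution), the chi-square total, and df."""
    n = sum(observed.values())
    rows = {}
    for category, obs in observed.items():
        expected = n * null_proportions[category]          # Ei = n * p0i
        rows[category] = (expected, (obs - expected) ** 2 / expected)
    chi_sq = sum(contribution for _, contribution in rows.values())
    return rows, chi_sq, len(observed) - 1

observed = {"Conservative": 130, "Green": 23, "Liberal": 220, "NDP": 73}
null_proportions = {"Conservative": 0.6062, "Green": 0.0257, "Liberal": 0.2496, "NDP": 0.1185}

rows, chi_sq, df = goodness_of_fit_table(observed, null_proportions)
for category, (expected, contribution) in rows.items():
    print(f"{category:12s} E = {expected:9.4f}  contribution = {contribution:9.4f}")
print(f"chi-square = {chi_sq:.4f}, df = {df}")
```

Printing the contributions per category also makes it easy to see which cells drive the test statistic.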
d) Run the SOFTWARE to verify your test statistic and degrees of freedom and to find your p-value. Paste it below. (4m)
Output Counts and Test:
e) Report your test statistic, degrees of freedom, and your p-value. Include a full p-value statement. (4m)
Test Statistic | Degrees of freedom | P Statement | PValue
χ²* = 198.27 | 3 | P(χ² > 198.27) | < 2.2e-16
f) Does the evidence lead to rejection of your null hypothesis? Why or why not? (2m)
Reject or Fail to Reject H0? | Why or why not?
Reject H0 | Since pvalue < α: pvalue < 2.2e-16 < 0.05
g) Fill in the following sentence. (3m)
At the 5% significance level, we have significant evidence that the students’ proportions do not fit the actual proportions.
h) Comment on the results. What cells contributed greatly to the test statistic value? What cells did not? How did observed counts differ from expected counts?
The Liberal cell contributed the most; the Conservative, Green and NDP cells, in that order from largest to smallest, also contributed quite a bit. More young people preferred the Liberal, Green and NDP platforms than expected (220, 23 and 73 observed versus about 111, 11 and 53 expected). Far fewer young people preferred the Conservative platform than expected (130 observed versus about 270 expected).
29. The proportions of blood types O, A, B and AB in a general population of a particular country are in the ratio 49:38:9:4, respectively. This means if a person is randomly selected from this country, the probability of having type O is 49/100 = 0.49, the probability of having type A is 38/100 = 0.38, the probability of having type
B is 9/100 = 0.09 and the probability of having type AB is 4/100 = 0.04. A research team investigating a small isolated community in this country obtained the following frequencies of blood type: O for 87 individuals, A for 59 individuals, B for 20 individuals and AB for 4 individuals. Test the hypothesis that the proportions in this
community do not differ significantly from those in the general population. Use a significance level of 0.05. You are provided with a data file of individual responses called BLOODTYPES.xls in which the column labeled BloodType contains individual blood types for the 170 people in the research study.
a) State your null and alternative hypotheses. (2m)
Let 1 be type A, 2 be type AB, 3 be type B, and 4 be type O.
H0: p0TypeA = 0.38, p0TypeAB = 0.04, p0TypeB = 0.09, p0TypeO = 0.49
Ha: H0 is not true (at least one proportion in the small community differs from its corresponding specified value in the null hypothesis)
b) State your assumptions and comment on their validity. (3m)
1. SIMPLE RANDOM SAMPLE
2. ALL EXPECTED COUNTS EXCEED 5
Comment: (THESE ALL HOLD – SEE TABLE BELOW)
Note: STEP 2 α = 0.05
c) Use SOFTWARE to find your observed counts and perform a goodness of fit test to test this hypothesis. Paste your output below. (4m)
Output: (Counts and Test)
d) Report your test statistic, degrees of freedom, and your p-value. Include a full p-value statement. (4m)
Test Statistic | Degrees of freedom | P Statement | PValue
χ²* = 3.2465 | df = 4 – 1 = 3 | P(χ² > 3.2465) | 0.3552
e) Does the evidence lead to rejection of your null hypothesis? Why or why not? (2m)
Reject or Fail to Reject H0? | Why or why not?
Fail to Reject H0 | Since pvalue > α, 0.3552 > 0.05
f) Fill in the following sentence. (3m)
At the 5% significance level, we do not have significant evidence that the blood types of people in the community differ from the null proportions of the general population.
g) The table below includes the observed counts for the blood types for people in the small community that can be found in the BLOODTYPE Excel file. Fill in the following by hand to verify your SOFTWARE results. Use 4 decimals.
Test Statistic:
BLOOD TYPE | p0i | Observed Counts, Oi | Expected Counts, Ei = 170p0i | (Oi − Ei)²/Ei
Type A | 0.38 | 59 | 64.6 | 0.4854
Type AB | 0.04 | 4 | 6.8 | 1.1529
Type B | 0.09 | 20 | 15.3 | 1.4438
Type O | 0.49 | 87 | 83.3 | 0.1643
TOTALS | 1 | 170 | 170.0 | 3.2464
Test statistic = total of cell contributions = χ²* = Σ (Oi − Ei)²/Ei = 3.2464
Degrees of Freedom = # choices – 1 = 4 – 1 = 3
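The p-value statement for this test can also be evaluated without statistical tables. The Python sketch below recomputes the unrounded test statistic from the blood type counts and then uses a closed-form expression for the upper tail of a χ² distribution that is valid only for 3 degrees of freedom; the function name is illustrative, and a statistics package would normally be used instead.

```python
import math

def chi_sq_upper_tail_df3(x):
    """P(chi-square with exactly 3 df > x); this closed form does not apply to other df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

observed = {"A": 59, "AB": 4, "B": 20, "O": 87}
null_proportions = {"A": 0.38, "AB": 0.04, "B": 0.09, "O": 0.49}
n = sum(observed.values())

# Sum of (O - E)^2 / E with E = n * p0, using unrounded contributions.
chi_sq = sum(
    (observed[t] - n * null_proportions[t]) ** 2 / (n * null_proportions[t])
    for t in observed
)
p_value = chi_sq_upper_tail_df3(chi_sq)
print(f"chi-square = {chi_sq:.4f}, P(chi-square > {chi_sq:.4f}) = {p_value:.4f}")
```

The small difference between 3.2464 (sum of rounded contributions) and 3.2465 (software) is a rounding effect only.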
h) Comment on the results. What cells contributed greatly to the test statistic value? What cells did not? How did observed counts differ from expected counts?
The Type AB and Type B cells contributed the most to the test statistic, but even these values were below 2, and generally it is only when a cell contribution value exceeds 2 that it contributes a good amount towards bringing a test statistic value high enough to be significant. The Type AB cell is of note here as it had so few observations that the assumption that required its expected value to be above 5 was barely met, so any contribution it made to the test statistic value needs to be interpreted warily. We can say that we observed slightly fewer Type A and Type AB people than expected, and slightly more Type B and Type O people than expected, but the differences were not significant.
30. A random sample of people is surveyed and then classified according to their age group (below 40, 40 or older) and the likelihood that they would purchase an electronic device that could be calibrated to help them find lost keys. Perform a test of independence to determine if age group and likelihood of purchasing the device are related. Use a significance level of 5%.
a) State your null and alternative hypotheses. (2m)
H0: age group and likelihood of purchasing the device are independent (not related)
Ha: age group and likelihood of purchasing the device are dependent (related)
b) State your assumptions and comment on their validity. (4m)
1. A random sample of people,
2. at least 80% of Ei ≥ 5, and all Ei ≥ 1.
Comment: (both hold for this example: 9 of the 10 expected counts are at least 5, and all are at least 1)
c) Set up your problem in SOFTWARE. Check that it matches the data in the file KEYS, as below.
AGE GROUP | VERY UNLIKELY | UNLIKELY | NEUTRAL | LIKELY | VERY LIKELY | TOTALS
<40 | 6 | 12 | 20 | 30 | 15 | 83
40+ | 5 | 8 | 26 | 42 | 30 | 111
TOTALS | 11 | 20 | 46 | 72 | 45 | 194
d) From SOFTWARE, paste your output and then fill in the following tables.
Expected Counts Output
Expected Values
AGE GROUP | VERY UNLIKELY | UNLIKELY | NEUTRAL | LIKELY | VERY LIKELY | Totals
<40 | 4.71 | 8.56 | 19.68 | 30.80 | 19.25 | 83
40+ | 6.29 | 11.44 | 26.32 | 41.20 | 25.75 | 111
Totals | 11 | 20 | 46 | 72 | 45 | 194
Cell Contributions Output
Cell Contributions
AGE GROUP | VERY UNLIKELY | UNLIKELY | NEUTRAL | LIKELY | VERY LIKELY
<40 | 0.36 | 1.39 | 0.01 | 0.02 | 0.94
40+ | 0.27 | 1.04 | 0.00 | 0.02 | 0.70
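The expected counts and cell contributions for this test of independence follow from the row and column totals of the observed table. A minimal Python sketch, using the observed counts from the KEYS table, is given here; the variable names are illustrative, and the p-value line relies on a closed form for the χ² upper tail that holds only for 4 degrees of freedom.

```python
import math

observed = [
    [6, 12, 20, 30, 15],   # age group <40
    [5, 8, 26, 42, 30],    # age group 40+
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count = (row total)(column total) / n
expected = [[r * c / n for c in col_totals] for r in row_totals]
# Cell contribution = (O - E)^2 / E
contributions = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]

chi_sq = sum(sum(row) for row in contributions)
df = (len(observed) - 1) * (len(observed[0]) - 1)
# Closed form valid only when df = 4: P(chi-square > x) = exp(-x/2) * (1 + x/2)
p_value = math.exp(-chi_sq / 2) * (1 + chi_sq / 2)

for exp_row, con_row in zip(expected, contributions):
    print([round(e, 2) for e in exp_row], [round(c, 2) for c in con_row])
print(f"chi-square = {chi_sq:.4f}, df = {df}, p-value = {p_value:.4f}")
```

The ten cell contributions should add up to the test statistic, which gives a quick consistency check on a hand-filled table.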
e) Paste test output and then report your test statistic, degrees of freedom, and pvalue in the table. Include a full p-value statement. (4m)
Test Output
Test Statistic | Degrees of freedom | P Statement | PValue
χ²* = 4.7308 | 4 | P(χ² > 4.7308) | Pvalue = 0.316
f) Does the evidence lead to rejection of your null hypothesis? Why or why not? (2m)
Reject or Fail to Reject H0? | Why or why not?
Fail to Reject H0. | Pvalue > α, 0.3160 > 0.05
g) Fill in the following sentence. (3m):
At the 5% significance level, there is not significant evidence that age group and likelihood of purchasing the device are dependent (related).
31. Joseph Lister was a 19th century surgeon who pioneered the use of disinfectants to reduce infection rates. Over the years, he performed 75 amputations: 40 using carbolic acid as a disinfectant and 35 without any disinfectant. The following results were obtained:
 | Patient Died | Patient Survived | Total
With carbolic acid | 6 | 34 | 40
Without carbolic acid | 16 | 19 | 35
Total | 22 | 53 | 75
At the 98% confidence level (that is, at a 2% significance level), determine if patient survival or death is independent of whether or not carbolic acid was used during surgery.
a) State your null and alternative hypotheses. (2m)
H0: Patient status (survived or died) is independent of whether or not carbolic acid was used during surgery.
Ha: Patient status (survived or died) is not independent of whether or not carbolic acid was used during surgery.
b) State your assumptions and comment on their validity. (4m)
1. A random sample of patients,
2. at least 80% of Ei ≥ 5, and all Ei ≥ 1.
Comment: (both hold for this example).
c) Use SOFTWARE with the file LISTER to obtain the expected values, the cell contributions to the test statistic, and the test output. Use Disinfectant as the row variable and Patient_Status as the column variable. (If using R, students should note that they can do this problem using the data file or, alternatively, by typing data counts into a two-way table, and that they do not need the data file. Please see the commands file for further details.) Paste output in the provided boxes below. (3m)
Expected Counts Output
Cell Contributions Output
Test Output
d) Fill in the following table to perform the test (3m).
Test stat/df: χ² = 8.4952, df = 1
P-value: Pvalue Statement P(χ² > 8.4952); Pvalue = 0.003561
Decision and Reason: Reject H0, since Pvalue < α (0.003561 < 0.02)
Decision in “English”: At the 2% significance level, there is significant evidence that the status of the patient (survive or die) is not independent of whether or not the carbolic acid was used during surgery. That is, the patient status is dependent on whether the carbolic acid is used.
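For a 2×2 table such as Lister's, the χ² statistic can also be obtained from a shortcut formula, n(ad − bc)² / (product of the four marginal totals). The Python sketch below applies it to the counts given in the problem, assuming no continuity correction is used; the function name is hypothetical.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi_sq = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_sq / 2))
    return chi_sq, p_value

# Rows: with / without carbolic acid; columns: died / survived.
chi_sq, p_value = chi_square_2x2(6, 34, 16, 19)
alpha = 0.02
reject_h0 = p_value < alpha
print(f"chi-square = {chi_sq:.4f}, p-value = {p_value:.6f}, reject H0: {reject_h0}")
```

The shortcut gives the same value as summing the four cell contributions, so it is mainly useful as a cross-check on software output.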
32. Consider the data set STATISTICSSTUDENTSSURVEYFORR.
a) Create a scatter plot to illustrate the relationship between BEFPULSEMIN and BEFBREATHMIN. Let Y = BEFPULSEMIN be the dependent (response) variable and X = BEFBREATHMIN be the independent (explanatory) variable.
Scatterplot
b) Comment on the strength, direction and form you see on the scatterplot.
We see a strong positive linear relationship.
c) Find the line of best fit (regression line) for this data.
Output:
Line of Best fit: BEFPULSEMIN = 46.1865 + 1.5101(BEFBREATHMIN) (ŷ = 46.1865 + 1.5101x)
d) Give an interpretation of intercept and slope in the context of this problem.
Intercept: If BEFBREATHMIN were 0 breaths per minute, the predicted BEFPULSEMIN would be 46.1865 beats per minute. Because the line is fit to x values that range from 12 to 24, x = 0 lies far outside the data and this interpretation is not useful. The point (0, 46.1865) might be useful for graphing purposes when drawing the line if our graph origin was (0,0) and scale was maintained.
Slope: For every increase of 1 breath per minute in BEFBREATHMIN, the predicted BEFPULSEMIN increases by 1.5101 beats per minute. As BEFBREATHMIN increases, so does BEFPULSEMIN. Slope is positive.
e) Find the correlation between the two variables BEFPULSEMIN and BEFBREATHMIN.
r = √r² = √0.6851 = 0.8277 (from output above; the positive root is taken because the slope is positive)
r = 0.8276997 from the correlation matrix output below
f) Perform a test to test the null hypothesis that there is no linear relationship between pre survey breath rate and pre survey pulse rate against the two-sided alternative. That is, determine if there is significant evidence that the slope is non-zero. Use the output above to help you fill in some of the answer. Use a significance level of 5%. You do not have to state assumptions. (SEE OUTPUT ABOVE)
Hypotheses: H0: β1 = 0, Ha: β1 ≠ 0
Test statistic and degrees of freedom: t* = 11.23, df = n − 2 = 60 − 2 = 58
Pvalue statement and Pvalue: For df = 58, 2P(t > 11.23) = 3.51e-16
Decision: Reject or Fail to Reject H0. Why? Reject H0 since Pvalue < 0.05
English conclusion: significance or non significance. At the 5% significance level, there is significant evidence to support the alternative that the slope is nonzero.
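In simple linear regression, the t statistic for the slope can be recovered from the correlation alone through t = r√(n − 2) / √(1 − r²). The Python sketch below applies this identity to the values reported in this question (r² = 0.6851, n = 60) as a check on the software output; the function name is illustrative.

```python
import math

def slope_t_from_r_squared(r_squared, n, slope_is_positive=True):
    """t statistic for H0: beta1 = 0 in simple linear regression, computed from r^2 and n."""
    r = math.sqrt(r_squared)
    if not slope_is_positive:
        r = -r  # r carries the sign of the slope
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r_squared)
    return r, t, n - 2

r, t, df = slope_t_from_r_squared(0.6851, 60)
print(f"r = {r:.4f}, t* = {t:.2f}, df = {df}")
```

Because r² is rounded to 4 decimals in the output, the result agrees with the software's t* only to about two decimal places.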
33. Students in three treatment groups (no fertilizer, liquid fertilizer, pellet fertilizer) monitored and recorded the number of days that it took until they could perform the first pruning for a basil plant they were growing in a class experiment. They then calculated the average and standard deviation of days to the time of first pruning for their treatment groups (generally between about 35 days and 49 days). Data can be found in the file BASIL. We will perform an ANOVA test to determine if the mean number of days to first pruning differs significantly in at least one of the groups. Our level of significance will be 5%.
a) State your null and alternative hypotheses and your level of significance. Draw a picture of the background populations to show what you mean.
H0: µNONE = µLIQU = µPELL versus Ha: at least one of the µj's differs
α = 0.05
b) State your assumptions.
1. Random samples of observations (numerical values) are drawn from treatment populations
2. These samples are independent of each other
3. The treatment populations are normally distributed
4. The treatment populations all have the same variance, σ²
(c) Use SOFTWARE to obtain an ANOVA table from which you can determine if at least one mean differs from the others. Paste your output below.
Output
(d) State your test statistic, degrees of freedom and p-value.
F* = 10.15, Degrees of freedom = (2, 57)
p-value statement and p-value: P(F > 10.15) = 0.000169
(e) Use software to find the critical value for your level of significance, paste your output, and state the critical region.
Critical Region: Reject H0 if F* > 3.158843
(f) Does the evidence lead you to reject or fail to reject your null hypothesis? Explain.
Critical Value approach: Since 10.15 > 3.158843 and is in the rejection region, we reject H0.
Pvalue Approach: Since 0.000169 < 0.05, we reject H0.
(g) State whether you have significant evidence to conclude that at least one of the population treatment means differs from the others.
We conclude that there is significant evidence that at least one of the population treatment means (mean days to first pruning for the 3 treatments (fertilizers)) differs from the others.
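To show where the F statistic and its degrees of freedom in an ANOVA table come from, here is a minimal Python sketch of the one-way ANOVA calculation. The BASIL data are not reproduced in this document, so the three small groups used below are made-up toy numbers for illustration only, and the function name is hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples, one list per treatment."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]
    # Between-groups sum of squares: group size times squared distance of group mean from grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-groups sum of squares: squared distances of observations from their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Toy data only (NOT the BASIL measurements).
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df1, df2 = one_way_anova_f(toy_groups)
print(f"F* = {f_stat:.2f}, df = ({df1}, {df2})")
```

With three treatments and 60 observations in total, the same formulas give the degrees of freedom (3 − 1, 60 − 3) = (2, 57) reported for the basil experiment.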
34. Allan taught three sections of Stat 151 labs during Fall 2019. The lab times were Tuesday AM, Thursday AM, and Monday AM. At the end of the semester, all students in Stat 151 wrote a common lab final exam worth 61 marks. The lab grades for all the students in all three of Allan’s sections can be found in the Excel file ALLANSLABEXAMGRADESFALL2019. The column TimeOfDay contains the time the exam was written (TuesdayAM, ThursdayAM or MondayAM) and the column Grades contains the student’s grade on the lab exam out of 61. Allan was curious to see if the mean grades from the lab exams on different days would be the same. Assuming these results are representative of populations of students writing at the indicated times, perform a one-way ANOVA test to determine if the mean grades are equal at these various times. Use a 0.05 significance level. Use SOFTWARE to compute the ANOVA table including the test statistic and p-value.
a) State your null and alternative hypotheses and your α.
H0: µTuesday AM = µThursday AM = µMonday AM versus Ha: at least one of the µj's differs
α = 0.05
b) State your assumptions.
1. Random samples of observations (numerical values) are drawn from treatment populations
2. These samples are independent of each other
3. The treatment populations are normally distributed
4. The treatment populations all have the same variance, σ²
(c) Use SOFTWARE to obtain an ANOVA table from which you can determine if at least one mean differs from the others. Paste your output below.
Output
d) Use the software output to fill in the following table.
Test stat/df: F = 5.514, df = 2, 75
P-value: Pvalue Statement P(F > 5.514); Pvalue = 0.00583
Decision and Reason: Reject H0, since Pvalue < α (0.00583 < 0.05)
Decision in “English”: At the 5% significance level, there is significant evidence that there is a difference in the mean lab exam grades among the three labs with exams written at different times.
e) Use software to find the critical value for your level of significance, paste your output, and state the critical region.
Critical Region: Reject H0 if F* > 3.118642
f) Does the evidence lead you to reject or fail to reject your null hypothesis? Explain.
Critical Value approach: Since 5.514 > 3.118642 and is in the rejection region, we reject H0.
Pvalue Approach: Since 0.00583 < 0.05, we reject H0.
g) State whether you have significant evidence to conclude that at least one of the population treatment means differs from the others.
We conclude that there is significant evidence that at least one of the population means (average grade on final lab exam for the 3 treatments (time of day)) differs from the others. (It is interesting to observe that the Monday AM class didn’t do as well as the Tuesday AM or Thursday AM class.)