Midterm-AppliedStats
Importing Libraries
1. State the differences between ‘descriptive statistics’ and ‘inferential statistics’.
2. Determine the level of data measurement and explain your reasoning. Highest grade level completed:
3. Determine the level of data measurement and explain your reasoning. Number of years at last job:
4. Determine the method for collecting data and explain your reasoning. To study the effect of music on driving habits, 8 drivers drove 500 miles while listening to music. a) observational b) census c) experimental d) simulation
5. Determine the method for collecting data and explain your reasoning. Determining the average household income of homes in Salt Lake City.
6. Determine the type of sampling and explain your reasoning. Paul is the Vice President for Sacred Heart University. He is responsible for the capital campaign to raise money for the new student services building. Paul selects the first 100 alumni listed on a web-based social networking site for the University. He intends to contact these individuals regarding possible donations. His sample is a ___.
7. Identify whether the data are a population or a sample and explain your reasoning. 62 of the 97 passengers aboard an American Airlines flight survived its explosion.
8. Determine the type of sampling and explain your reasoning. Chosen at random, 300 people who received care at the University Hospital participated in a survey.
9. Determine whether the data are qualitative or quantitative and the level of data measurement. Explain your reasoning. Telephone numbers in a directory.
10. Explain ‘Placebo Effect’
Suppose that a random sample of size 35 is to be selected from a population with a mean of 70 and a standard deviation of 10. Write your Python code for the following calculations! Decimal numbers should be rounded to the nearest thousandth (3 decimal places).
11. Probability of getting x̄ above 65 and below 85
12. Probability of getting x̄ below 60 or above 90
Probability of getting x̄ above 65 and below 85: 0.998
Probability of getting x̄ below 60 or above 90: 0.0
13-15) 40% of Americans say they are confident that passenger trips to the moon will occur during their lifetime. You randomly select 200 Americans and ask whether he or she thinks passenger trips to the moon will occur in his or her lifetime. For the following questions, use (1) the normal approximation with correction for continuity and (2) the binomial distribution. Use 4 decimal places.
13. What is the probability that at most 150 people will say yes?
14. What is the probability that exactly 75 people will say yes?
15. What is the probability that more than 70 and no more than 90 people will say yes?
Question 13:
Probability (Normal Approximation): 1.0
Probability (Binomial Distribution): 1.0
Question 14:
Probability (Binomial Distribution): 0.0448
Question 15:
Probability (Normal Approximation): 0.85
Probability (Binomial Distribution): 0.8501
16. Using the ‘salaries’ data set in the exam folder, construct the 90%, 95%, and 99% confidence intervals for the average salary of male professors and female professors, respectively. Interpret the
results and compare the widths of the confidence intervals.
90.0% Confidence Intervals:
Male Professors: (112444.43903175788, 117736.39895706894)
Female Professors: (94166.94757793451, 107837.87293488599)
95.0% Confidence Intervals:
Male Professors: (111937.53940340012, 118243.2985854267)
Female Professors: (92857.45410471286, 109147.36640810764)
99.0% Confidence Intervals:
Male Professors: (110946.83283407938, 119234.00515474744)
Female Professors: (90298.12339341138, 111706.69711940912)
Interpretation: At each level, we are that confident the interval contains the true mean salary for the group. As the confidence level rises from 90% to 99%, every interval widens, and at every level the interval for female professors is wider than the one for male professors, reflecting the larger standard error (greater variability and/or smaller sample) for that group.
17. In a random sample of eight people, the mean commute time to work was 35.5 minutes and the sample standard deviation was 7.2 minutes. Assume the population is normally distributed.
Construct a 95% confidence interval for the population mean. Interpret the result.
95% Confidence Interval for the population mean commute time to work: (29.48 minutes, 41.52 minutes)
Interpretation:
We are 95% confident that the true population mean commute time to work lies between 29.48 and 41.52 minutes, that is, within 35.5 minutes plus or minus the margin of error.
18. Suppose the population of all public universities shows the annual parking fee per student is $110 with a standard deviation of $18. If a random sample of size 49 is drawn from the population, the probability of drawing a sample with a mean of more than $115 is ___.
Given:
Population mean (μ) = $110, population standard deviation (σ) = $18, sample size (n) = 49, sample mean (x̄) = $115. We'll calculate the z-score for the sample mean and then find the probability using the standard normal distribution.
First, calculate the standard error of the mean: SE = σ / √n = 18 / √49 = 18 / 7 ≈ 2.571.
Then calculate the z-score: z = (x̄ - μ) / SE = (115 - 110) / 2.571 ≈ 1.944.
Finally, find the probability of drawing a sample with a mean of more than $115 from the standard normal distribution: P(x̄ > 115) = P(Z > 1.944).
Probability of drawing a sample with a mean of more than $115: 0.0259
19. Use the dataset of Minutes Spent on the Phone (10 points).
102 124 108 86 103 82 58 78 93 90 35 71 104 112 118 87 95 130 45 95 57 78 103 116 85 122 87 100 120 97 39 133 184 105 97 107 67 78 125 49 86 97 88 103 109 99 105 99 101 92 293 149 82 204 192
(1) Draw the box-and-whisker plot with the 5-number summary on the plot. (2) Calculate the population S.D. and sample S.D. using the formula; show all steps. (3) Calculate the population S.D. and sample S.D. using any Python library and its built-in functions. (4) Draw the histogram using 10-point bins [20, 30), [30, 40), and so on. (5) Construct an extended frequency table with the columns Class, Frequency, Midpoint, Relative Frequency, and Cumulative Frequency. (6) Test whether the dataset is normally distributed.
To calculate the population standard deviation (σ), we follow these steps:
Step 1: Calculate the deviations from the mean (xᵢ - μ) by subtracting the population mean (μ) from each data point (xᵢ).
Step 2: Square the deviations ((xᵢ - μ)²).
Step 3: Sum all the squared deviations (Σ(xᵢ - μ)²).
Step 4: Divide the sum of squared deviations by the total number of observations (N).
Step 5: Take the square root of the result from Step 4 to obtain the population standard deviation (σ).
For the sample standard deviation (s), the steps are similar:
Step 1: Calculate the deviations from the sample mean (xᵢ - x̄) by subtracting the sample mean (x̄) from each data point (xᵢ).
Step 2: Square the deviations ((xᵢ - x̄)²).
Step 3: Sum all the squared deviations (Σ(xᵢ - x̄)²).
Step 4: Divide the sum of squared deviations by the number of observations minus one (n - 1).
Step 5: Take the square root of the result from Step 4 to obtain the sample standard deviation (s).
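To show the steps in code as well, here is a minimal sketch (assuming data is the list of 55 minute values defined in cell In [4] further down) that applies the two formulas directly, with no statistics library:

import math

n = len(data)
mean = sum(data) / n

# Steps 1-3: sum of squared deviations from the mean
sum_sq_dev = sum((x - mean) ** 2 for x in data)

# Steps 4-5: divide by N (population) or n - 1 (sample), then take the square root
population_sd = math.sqrt(sum_sq_dev / n)
sample_sd = math.sqrt(sum_sq_dev / (n - 1))

print("Population S.D. (manual):", population_sd)  # should match np.std(data)
print("Sample S.D. (manual):", sample_sd)          # should match np.std(data, ddof=1)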
Population Standard Deviation (using Python library): 41.16330709794427
Sample Standard Deviation (using Python library): 41.542700436971586
Extended Frequency Table:
Class     Frequency   Midpoint   Relative Frequency   Cumulative Frequency
20-30     0           25.0       0.0000               0
30-40     2           35.0       0.0364               2
40-50     2           45.0       0.0364               4
50-60     2           55.0       0.0364               6
60-70     1           65.0       0.0182               7
70-80     4           75.0       0.0727               11
80-90     8           85.0       0.1455               19
90-100    10          95.0       0.1818               29
100-110   12          105.0      0.2182               41
110-120   3           115.0      0.0545               44
120-130   4           125.0      0.0727               48
130-140   2           135.0      0.0364               50
140-150   1           145.0      0.0182               51
150-160   0           155.0      0.0000               51
160-170   0           165.0      0.0000               51
170-180   0           175.0      0.0000               51
180-190   1           185.0      0.0182               52
190-200   1           195.0      0.0182               53
200-210   1           205.0      0.0182               54
210-220   0           215.0      0.0000               54
220-230   0           225.0      0.0000               54
230-240   0           235.0      0.0000               54
240-250   0           245.0      0.0000               54
250-260   0           255.0      0.0000               54
260-270   0           265.0      0.0000               54
270-280   0           275.0      0.0000               54
280-290   0           285.0      0.0000               54
290-300   1           295.0      0.0182               55
Shapiro-Wilk test statistic: 0.8130385279655457
p-value: 7.040425202831102e-07
Since p-value < 0.05, we reject the null hypothesis that the data is normally distributed.
20. A survey reports that the average price for a gallon of regular unleaded gasoline is $3.56. You believe that the actual price in the Northeast area is not equal to this price. You decide to test this claim using 24 randomly surveyed prices: 3.87 3.54 3.90 3.33 2.99 3.25 3.48 3.52 3.39 4.24 3.95 3.28 3.48 3.27 3.58 3.39 3.29 3.52 3.55 3.91 2.88 3.02 3.26 3.74. Do the hypothesis testing using both the rejection region and the p-value for α = 0.01. Show all steps! (20 points)
Hypothesis Testing for Gasoline Price in the Northeast Area
Rejection Region Method:
Step 1: Define Hypotheses:
Null Hypothesis (H₀): The actual price for a gallon of regular unleaded gasoline in the Northeast area is equal to $3.56.
Alternative Hypothesis (H₁): The actual price for a gallon of regular unleaded gasoline in the Northeast area is not equal to $3.56.
Step 2: Set the Significance Level: α = 0.01
Step 3: Calculate the Test Statistic: Given sample: 3.87, 3.54, 3.90, 3.33, 2.99, 3.25, 3.48, 3.52, 3.39, 4.24, 3.95, 3.28, 3.48, 3.27, 3.58, 3.39, 3.29, 3.52, 3.55, 3.91, 2.88, 3.02, 3.26, 3.74. Sample mean (x̄) ≈ 3.4846, sample standard deviation (s) ≈ 0.328, sample size (n) = 24. Because σ is unknown and n is small, we use the t statistic: t = (x̄ - μ) / (s / √n) = (3.4846 - 3.56) / (0.328 / √24) ≈ -1.127.
Step 4: Determine the Rejection Region: Since it's a two-tailed test, we divide α by 2: α/2 = 0.005. With df = n - 1 = 23, the critical t-values are -2.807 and 2.807.
Step 5: Make a Decision: Since -1.127 does not fall in the rejection region (it lies between -2.807 and 2.807), we fail to reject the null hypothesis.
Let's proceed with the calculations (computed in cell In [12] below):
Sample Mean: 3.484583333333333
Test Statistic (t): -1.1268349418452264
Critical Values (rejection bounds for the sample mean): 3.372110991928031 3.747889008071969
P-value: 0.27143192185245585
Fail to reject the null hypothesis.
Conclusion of Hypothesis Test
Based on the hypothesis test conducted for the average price of gasoline in the Northeast area at a significance level of 0.01:
Sample Mean Price: The sample mean price of gasoline in the Northeast area is approximately $3.48 per gallon.
Test Statistic (t): The calculated test statistic (t) is approximately -1.127.
Critical Values: The rejection bounds for the sample mean in the two-tailed test at α = 0.01 are approximately 3.372 and 3.748.
P-value: The calculated p-value is approximately 0.271.
Conclusion: Since the test statistic falls inside the non-rejection region and the p-value (0.271) is greater than the significance level (0.01), we fail to reject the null hypothesis. Therefore, we do not have sufficient evidence to conclude that the average price of gasoline in the Northeast area is different from $3.56 per gallon at the 0.01 significance level.
In [ ]:
In [14]:
import scipy.stats as stats
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
In [ ]:
Descriptive Statistics:
* Describes the basic features of a dataset.
* Provides insights into the characteristics of the data, such as central tendency, variability, and distribution.
* Examples include mean, median, mode, standard deviation, range, and percentiles.
* Used to summarize and understand the data within a sample.
* Does not involve making inferences or predictions about a larger population.
Inferential Statistics:
* Makes inferences or predictions about a population based on sample data.
* Involves testing hypotheses and making predictions about population parameters.
* Examples include hypothesis testing, confidence intervals, regression analysis, and ANOVA.
* Used to draw conclusions about the population from which the sample was drawn.
* Results are subject to uncertainty and estimation due to sampling variability.
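As a small illustrative sketch (hypothetical numbers, not exam data), the contrast can be seen in code: the descriptive lines only summarize the sample itself, while the inferential line uses a t-based confidence interval to make a claim about the unseen population:

import numpy as np
from scipy import stats

sample = np.array([12.1, 14.3, 11.8, 13.5, 12.9, 15.0, 13.2, 12.4])  # hypothetical sample

# Descriptive: summarizes this sample only
print("mean:", np.mean(sample), "median:", np.median(sample), "s:", np.std(sample, ddof=1))

# Inferential: 95% t confidence interval for the *population* mean
ci = stats.t.interval(0.95, df=len(sample) - 1, loc=np.mean(sample), scale=stats.sem(sample))
print("95% CI for the population mean:", ci)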
In [ ]:
The variable "Highest grade level completed" typically represents an ordinal level of measurement
.
This level of measurement indicates the ranking or
ordering of categories
, where the categories have a meaningful order but the differences between them are not
necessarily uniform or
measurable
.
In the case of "Highest grade level completed," the
categories represent different levels of education
, such as
"Elementary school," "High school," "Some college," "Bachelor's
degree
,
" "
Master
's degree," and so on. These categories have a clear order from lower to higher education levels, but the
differences between them may not
be consistent
.
For example
, the difference in
educational attainment between "Some college"
and
"Bachelor's degree" is
not
necessarily the same as
the difference between "Elementary school" and
"High school."
"Highest grade level completed" falls under the ordinal level because it represents a ranking or
ordering of educational attainment levels without implying that the differences between these levels are equal or
measurable
.
Additionally
, individuals can be ranked based on their highest completed grade level
, but precise numerical differences between these
levels do not
exist
.
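For illustration only, an ordinal variable like this can be encoded in pandas as an ordered categorical (the labels below are hypothetical); ranking comparisons are meaningful, but arithmetic differences between levels are not defined:

import pandas as pd

levels = ["Elementary school", "High school", "Some college",
          "Bachelor's degree", "Master's degree"]
s = pd.Series(["High school", "Master's degree", "Some college"],
              dtype=pd.CategoricalDtype(categories=levels, ordered=True))

print(s < "Bachelor's degree")  # elementwise ordinal comparison works
print(s.min(), "->", s.max())   # ranking works; there is no meaningful "difference" between levels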
In [ ]:
The variable "Number of years at last job" represents a ratio level of measurement
.
This level of measurement possesses all the characteristics of the interval level
, but it also has a true zero point
, indicating the absence of the quantity being measured
.
Ratios between measurements are meaningful and
can be calculated
,
and
operations such as
addition
, subtraction
, multiplication
, and
division can be performed
.
Common examples include measurements such as
weight
, height
, time
, and
counts
.
In the case of "Number of years at last job," it represents a count of years spent at a particular job
, and
it has a true zero point
, which implies the absence of time spent at the job
.
"Number of years at last job" falls under the ratio level because it satisfies the criteria for
this level of measurement
:
It possesses a true zero point
: A value of zero indicates the absence of years spent at the last job
.
Ratios between measurements are meaningful
: For instance
, if
one person spent 4 years at their last job and
another spent 8 years
,
the second person spent twice as
long as
the first person at their last job
.
Arithmetic operations such as
addition
, subtraction
, multiplication
, and
division are applicable and
meaningful
.
In [ ]:
The method for collecting data in this scenario is experimental.
Experimental methods involve manipulating one or more variables to observe the effect on another variable, while controlling other factors that could influence the outcome. Here, the researchers are studying the effect of music (the independent variable) on driving habits (the dependent variable). They conducted an experiment in which a group of drivers was exposed to music while driving. By having the drivers listen to music during their 500-mile journey, the researchers directly manipulate the independent variable (presence of music) to observe its effect. The driving habits of the participants (such as speed, reaction time, and attention) are the dependent variables being measured.
The experiment allows researchers to compare the participants' driving habits while listening to music with their driving habits in the absence of music (a control condition). By controlling other factors (such as the type of music, driving conditions, and vehicle type), the researchers can isolate the effect of music on driving habits. The experiment provides data that can be analyzed to determine whether there is a statistically significant difference in driving habits between the music and no-music conditions.
In [ ]:
The method for collecting data in this scenario is a census.
The census method involves collecting data from every member of the population of interest. In this case, the population of interest is all households in Salt Lake City, and the objective is to determine the average household income of all homes in Salt Lake City. To achieve this, data would be collected from every household in Salt Lake City, ensuring that no households are left out.
By collecting data from every household, the census method provides a comprehensive and accurate picture of the average household income in Salt Lake City. Census data can be obtained through various means such as surveys, administrative records, or other data collection methods. Once the data are collected from all households, the average household income can be calculated by summing the incomes of all households and dividing by the number of households.
In [ ]:
The type of sampling used in this scenario is convenience sampling.
Convenience sampling involves selecting individuals who are readily available or easily accessible to the researcher. In this case, Paul selects the first 100 alumni listed on a web-based social networking site for the University. The selection is based on convenience and accessibility rather than on a random or systematic method. Paul's decision to select alumni from a web-based social networking site suggests that he chose individuals who were easily reachable through this platform without considering other factors.
While convenience sampling is quick and easy to implement, it may not provide a representative sample of the population because it may exclude certain groups, such as alumni who do not use the site. In this scenario, Paul's sample of the first 100 alumni listed on the social networking site may therefore not accurately represent all alumni of the university.
In [ ]:
In this scenario, the data represent a sample.
The data provided (62 survivors out of 97 passengers) represent a subset of the total passengers aboard the American Airlines flight during the explosion. A population refers to the entire group that is the subject of interest in a study, while a sample is a subset of the population selected for observation and analysis.
Since the data only describe a portion of the total passengers (97) aboard the American Airlines flight during the explosion, they constitute a sample rather than the entire population. If the data had provided information about all passengers aboard the flight, they would represent the population; since they only provide information about a subset of passengers, they are considered a sample. The sample data can be analyzed to draw conclusions about the characteristics of the survivors and the overall survival rate, but they may not necessarily represent the entire population.
In [ ]:
The type of sampling used in this scenario is simple random sampling.
Simple random sampling involves randomly selecting individuals from the population without any specific criteria or systematic method. In this scenario, 300 people who received care at the University Hospital were chosen at random to participate in the survey, so each individual in the population of people who received care at the University Hospital has an equal chance of being selected.
Random selection ensures that every member of the population has an equal chance of being included in the sample, making the sample representative of the population. Simple random sampling is considered one of the most unbiased and reliable sampling methods because it eliminates the potential for selection bias. By using simple random sampling, the researchers can obtain a sample that accurately represents the population of people who received care at the University Hospital.
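To make the contrast with the convenience sample in question 6 concrete, here is a small sketch using a hypothetical list of patient IDs; the first selection is convenience-based, the second is a simple random sample:

import random

patients = list(range(1, 1001))  # hypothetical population of 1000 patient IDs

# Convenience sampling: take whoever is easiest to reach (e.g., the first 100 listed)
convenience_sample = patients[:100]

# Simple random sampling: every patient has an equal chance of selection
random.seed(42)  # fixed seed so the sketch is reproducible
srs_sample = random.sample(patients, 100)

print(convenience_sample[:5], srs_sample[:5])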
In [ ]:
Telephone numbers in a directory are typically considered qualitative data. This is because telephone numbers serve as identifiers or labels for individuals or organizations and do not represent numerical quantities that can be subject to mathematical operations.
In terms of the level of data measurement, telephone numbers are considered nominal data. Nominal data represent categories or labels without any inherent order or numerical value. Each telephone number uniquely identifies a specific entity (person or organization) but does not imply any particular ranking or order among them.
In [ ]:
The placebo effect refers to the phenomenon where a person experiences a perceived improvement in their condition or symptoms after receiving a treatment that has no therapeutic effect. In other words, the person's belief in the effectiveness of the treatment, rather than the treatment itself, leads to a positive response.
Psychological Response: The placebo effect is primarily a psychological response, influenced by a person's expectations, beliefs, and perceptions about the treatment.
Mechanism: The exact mechanism underlying the placebo effect is not fully understood, but it is believed to involve complex interactions between the brain, nervous system, and body.
Placebo Control: In clinical research, a placebo control is often used to distinguish between the effects of a treatment and the placebo effect. Participants in the control group receive an inert treatment that is indistinguishable from the real one.
Variability: The placebo effect can vary widely among individuals and across different conditions. Some people may be highly responsive to placebos, experiencing noticeable improvement, while others show little or no response.
Ethical Considerations: While the placebo effect can be beneficial in some cases, it also raises ethical considerations, particularly in clinical practice and research.
In [1]:
population_mean = 70
population_std = 10
sample_size = 35

# Standard error of the sample mean (standard deviation of the sampling distribution)
standard_error = population_std / (sample_size ** 0.5)

# Z-scores for the given sample-mean thresholds
z_score_65 = (65 - population_mean) / standard_error
z_score_85 = (85 - population_mean) / standard_error
z_score_60 = (60 - population_mean) / standard_error
z_score_90 = (90 - population_mean) / standard_error

# Probabilities from the cumulative distribution function (CDF) of the standard normal distribution
probability_xbar_65_to_85 = stats.norm.cdf(z_score_85) - stats.norm.cdf(z_score_65)
probability_xbar_below_60_or_above_90 = 1 - (stats.norm.cdf(z_score_90) - stats.norm.cdf(z_score_60))

# Round the probabilities to 3 decimal places
probability_xbar_65_to_85 = round(probability_xbar_65_to_85, 3)
probability_xbar_below_60_or_above_90 = round(probability_xbar_below_60_or_above_90, 3)

# Print the results
print("Probability of getting x̄ above 65 and below 85:", probability_xbar_65_to_85)
print("Probability of getting x̄ below 60 or above 90:", probability_xbar_below_60_or_above_90)
In [3]:
p = 0.40
n = 200

# Normal approximation parameters
mean = n * p
std_dev = (n * p * (1 - p)) ** 0.5

# Question 13: Probability that at most 150 people will say yes
# Normal approximation with continuity correction
z_score_150 = (150 + 0.5 - mean) / std_dev
prob_at_most_150_normal = stats.norm.cdf(z_score_150)
# Binomial distribution
prob_at_most_150_binomial = stats.binom.cdf(150, n, p)

# Question 14: Probability that exactly 75 people will say yes
# Binomial distribution
prob_exactly_75_binomial = stats.binom.pmf(75, n, p)

# Question 15: Probability that more than 70 and no more than 90 people will say yes
# Normal approximation with continuity correction
z_score_70 = (70 + 0.5 - mean) / std_dev
z_score_90 = (90 + 0.5 - mean) / std_dev
prob_greater_than_70_and_no_more_than_90_normal = stats.norm.cdf(z_score_90) - stats.norm.cdf(z_score_70)
# Binomial distribution
prob_greater_than_70_and_no_more_than_90_binomial = stats.binom.cdf(90, n, p) - stats.binom.cdf(70, n, p)

# Print the results
print("Question 13:")
print("Probability (Normal Approximation):", round(prob_at_most_150_normal, 4))
print("Probability (Binomial Distribution):", round(prob_at_most_150_binomial, 4))
print("\nQuestion 14:")
print("Probability (Binomial Distribution):", round(prob_exactly_75_binomial, 4))
print("\nQuestion 15:")
print("Probability (Normal Approximation):", round(prob_greater_than_70_and_no_more_than_90_normal, 4))
print("Probability (Binomial Distribution):", round(prob_greater_than_70_and_no_more_than_90_binomial, 4))
In [7]:
# Load dataset
data = pd.read_csv('Salaries (1).csv')

# Define a function to calculate confidence intervals
def calculate_confidence_intervals(data, confidence_levels):
    # Separate data for male and female professors
    male_data = data[data['sex'] == 'Male']['salary']
    female_data = data[data['sex'] == 'Female']['salary']

    # Calculate mean and standard deviation for both groups
    mean_male = np.mean(male_data)
    mean_female = np.mean(female_data)
    std_male = np.std(male_data, ddof=1)      # ddof=1 for sample standard deviation
    std_female = np.std(female_data, ddof=1)

    # Sample sizes
    n_male = len(male_data)
    n_female = len(female_data)

    # Dictionary to store results
    results = {}

    # Calculate confidence intervals for both groups
    for confidence_level in confidence_levels:
        # Critical value
        z_score = stats.norm.ppf((1 + confidence_level) / 2)

        # Margins of error
        margin_of_error_male = z_score * (std_male / np.sqrt(n_male))
        margin_of_error_female = z_score * (std_female / np.sqrt(n_female))

        # Confidence intervals
        confidence_interval_male = (mean_male - margin_of_error_male, mean_male + margin_of_error_male)
        confidence_interval_female = (mean_female - margin_of_error_female, mean_female + margin_of_error_female)

        # Store the results in the dictionary
        results[confidence_level] = {
            'Male Professors': confidence_interval_male,
            'Female Professors': confidence_interval_female
        }
    return results

# Set the confidence levels
confidence_levels = [0.90, 0.95, 0.99]

# Calculate confidence intervals using the defined function
confidence_intervals = calculate_confidence_intervals(data, confidence_levels)

# Print the results
for confidence_level, intervals in confidence_intervals.items():
    print(f"{confidence_level * 100}% Confidence Intervals:")
    for group, interval in intervals.items():
        print(f"{group}: {interval}")
    print()
In [8]:
# Given data
sample_mean = 35.5
sample_std = 7.2
sample_size = 8

# Critical value (t) for a 95% confidence level with (n - 1) degrees of freedom
t_critical = stats.t.ppf(0.975, df=sample_size - 1)

# Margin of error
margin_of_error = t_critical * (sample_std / np.sqrt(sample_size))

# Confidence interval
lower_bound = sample_mean - margin_of_error
upper_bound = sample_mean + margin_of_error

# Print the results
print(f"95% Confidence Interval for the population mean commute time to work: ({lower_bound:.2f} minutes, {upper_bound:.2f} minutes)")
In [13]:
population_mean = 110
population_std_dev = 18
sample_size = 49
sample_mean = 115

# Standard error
standard_error = population_std_dev / (sample_size ** 0.5)

# z-score
z_score = (sample_mean - population_mean) / standard_error

# Probability from the cumulative distribution function (CDF)
probability_more_than_115 = 1 - stats.norm.cdf(z_score)

# Print the result rounded to 4 decimal places
print("Probability of drawing a sample with a mean of more than $115:", round(probability_more_than_115, 4))
In [4]:
# Dataset
data = [102, 124, 108, 86, 103, 82, 58, 78, 93, 90, 35, 71, 104, 112, 118, 87, 95, 130, 45, 95, 57, 78,
        103, 116, 85, 122, 87, 100, 120, 97, 39, 133, 184, 105, 97, 107, 67, 78, 125, 49, 86, 97, 88, 103,
        109, 99, 105, 99, 101, 92, 293, 149, 82, 204, 192]

# Custom colors for box plot elements
boxprops = dict(color="orange")
whiskerprops = dict(color="red")
medianprops = dict(color="blue")
meanprops = dict(marker='o', markerfacecolor='green', markersize=8, linestyle='none')

# Create horizontal box plot (showmeans=True so the meanprops marker is actually drawn)
plt.boxplot(data, vert=False, showmeans=True, boxprops=boxprops, whiskerprops=whiskerprops,
            medianprops=medianprops, meanprops=meanprops)

# Add title and labels
plt.title('Horizontal Box Plot of Minutes Spent on the Phone')
plt.xlabel('Minutes')
plt.ylabel('Data')

# Show plot
plt.show()
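The cell above does not print the five-number summary requested in part (1); a minimal follow-up sketch (assuming data from the cell above) computes it with numpy:

# Five-number summary: min, Q1, median, Q3, max
minimum, q1, median, q3, maximum = np.percentile(data, [0, 25, 50, 75, 100])
print("5-number summary:")
print("Min:", minimum, "Q1:", q1, "Median:", median, "Q3:", q3, "Max:", maximum)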
In [15]:
# Calculate Population S.D. using numpy
population_std_dev = np.std(data)

# Calculate Sample S.D. using numpy (ddof=1 for the sample formula)
sample_std_dev = np.std(data, ddof=1)

print("Population Standard Deviation (using Python library):", population_std_dev)
print("Sample Standard Deviation (using Python library):", sample_std_dev)
In [16]:
# Plot histogram with 10-point bins
plt.hist(data, bins=range(20, 301, 10), edgecolor='black')
plt.title('Histogram of Minutes Spent on the Phone')
plt.xlabel('Minutes')
plt.ylabel('Frequency')
plt.show()
In [17]:
# Define bin edges for 10-point classes from 20 to 300
bins = [20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170,
        180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300]

# Compute frequency table
frequency_table, _ = np.histogram(data, bins=bins)
midpoints = [(bins[i] + bins[i + 1]) / 2 for i in range(len(bins) - 1)]
relative_frequency = frequency_table / len(data)
cumulative_frequency = np.cumsum(frequency_table)

# Print extended frequency table
print("Extended Frequency Table:")
print("Class\tFrequency\tMidpoint\tRelative Frequency\tCumulative Frequency")
for i in range(len(midpoints)):
    print(f"{bins[i]}-{bins[i + 1]}\t{frequency_table[i]}\t\t{midpoints[i]}\t\t{relative_frequency[i]:.4f}\t\t\t{cumulative_frequency[i]}")
In [18]:
# Shapiro-Wilk test for normality
statistic, p_value = stats.shapiro(data)

print("Shapiro-Wilk test statistic:", statistic)
print("p-value:", p_value)

if p_value > 0.05:
    print("Since p-value > 0.05, we fail to reject the null hypothesis that the data is normally distributed.")
else:
    print("Since p-value < 0.05, we reject the null hypothesis that the data is normally distributed.")
In [12]:
import numpy as np
from scipy import stats

# Given data
prices = np.array([3.87, 3.54, 3.90, 3.33, 2.99, 3.25, 3.48, 3.52, 3.39, 4.24, 3.95, 3.28,
                   3.48, 3.27, 3.58, 3.39, 3.29, 3.52, 3.55, 3.91, 2.88, 3.02, 3.26, 3.74])

# Given parameters
population_mean = 3.56
sample_size = len(prices)
alpha = 0.01

# Sample mean and standard deviation
sample_mean = np.mean(prices)
sample_std = np.std(prices, ddof=1)

# Test statistic
t_statistic = (sample_mean - population_mean) / (sample_std / np.sqrt(sample_size))

# Critical value for a two-tailed test
t_critical = stats.t.ppf(1 - alpha / 2, df=sample_size - 1)

# Rejection region expressed in terms of the sample mean
lower_critical = population_mean - t_critical * (sample_std / np.sqrt(sample_size))
upper_critical = population_mean + t_critical * (sample_std / np.sqrt(sample_size))

# Two-tailed p-value
p_value = 2 * stats.t.cdf(-np.abs(t_statistic), df=sample_size - 1)

# Print results
print("Sample Mean:", sample_mean)
print("Test Statistic (t):", t_statistic)
print("Critical Values:", lower_critical, upper_critical)
print("P-value:", p_value)

# Make a decision
if np.abs(t_statistic) > t_critical or p_value < alpha:
    print("Reject the null hypothesis.")
else:
    print("Fail to reject the null hypothesis.")