Statistics_Final
School: Institute of Health Profession Education & Research, KMU Peshawar
Course: 177
Subject: Mathematics
Date: Nov 24, 2024
Uploaded by CountAntelope14965
MAT 177 – Statistics
Final Project (100 Points)
The primary objective of this project is to employ statistical methodologies to analyze and interpret data, focusing on computing confidence intervals for the mean, conducting hypothesis tests, and performing regression analysis. Through rigorous statistical analysis, we aim to provide valuable insights into the relationships within the data, assess the significance of observed trends, and enhance decision-making processes. This project will involve the application of statistical techniques to ensure the accuracy and reliability of our findings, ultimately contributing to a deeper understanding of the underlying patterns and factors influencing the variables under investigation.
Dataset: The project is in four parts: Parts 1, 2, 3, and 4. Parts 1 and 4 have data in the attached Excel file. There are two worksheets in that Excel workbook. When you open the Excel file, you will see the names of the two worksheets at the bottom left corner: Heights and Cars. Click on Cars to access the cars data, and on Heights to access the heights data.
Completing and Uploading your Project
Use Excel, TI-83/84, StatCrunch, or any other computer software to compute the descriptive statistics and draw the scatter diagram. You can either work on the Word document or print and complete the project. If you decide to work on the Word document, convert your document to a PDF file before uploading it to Blackboard. If you print out the project, you still have to create the scatter diagram using software and attach it on Blackboard. Do not draw by hand.
Instructions for scanning and uploading files are in the folder on how to upload the completed project on Blackboard.
PART 1
(Use the data titled Heights for this part) (20 points)
(a)
Find the following from the dataset titled Heights in the attached Excel file.
Sample size, n = 50
Mean height = 145.42 cm
Standard deviation = 15.71 cm
(b)
The histogram below represents the height data. Does the histogram suggest a normal distribution? Explain.
[Figure: Histogram for Heights of Students. Horizontal axis: Height (bins from 120 to 200, plus "More"); vertical axis: frequency (0 to 14).]
Yes. The histogram is roughly bell-shaped and symmetric, which is a key characteristic of a normal distribution.
(c)
Using the mean and standard deviation obtained in (a) above as point estimates for the population parameters, calculate the z-score for a student who is 175 cm tall.
$Z = \frac{175 - 145.42}{15.71} \approx 1.88$
So, the z-score for a student with a height of 175 cm is approximately 1.88.
(d)
Interpret the z-score computed in the context of the data.
A z-score of 1.88 means that the student's height is 1.88 standard deviations above the mean. In the context of the data, this indicates that the student is considerably taller than the average student in the dataset.
(e)
What is the probability that a randomly selected student from the dataset has a height greater than 175 cm? Use the properties of the normal distribution to calculate this probability.
$P(X > 175) = P(Z > 1.88) = 1 - P(Z < 1.88) = 1 - 0.96995 \approx 0.0301$
Therefore, the probability that a randomly selected student has a height greater than 175 cm is approximately 0.0301 or 3.01%.
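The z-score and tail probability from parts (c) and (e) can be checked with a short Python sketch. It relies only on the standard library's `statistics.NormalDist`; the variable names are illustrative, and the z-score is rounded to two decimals to mirror a normal table lookup.

```python
from statistics import NormalDist

# Point estimates taken from Part 1(a)
mean_height = 145.42
sd_height = 15.71
student_height = 175

# z-score, rounded to two decimals as a normal table would require
z = round((student_height - mean_height) / sd_height, 2)

# Upper-tail probability P(Z > z) under the standard normal distribution
p_taller = 1 - NormalDist().cdf(z)

print(f"z = {z}")
print(f"P(X > {student_height}) = {p_taller:.4f}")
```

Using the unrounded z-score instead would change the probability only in the fourth decimal place.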
(f)
If the heights of students follow a normal distribution, how might this information be useful in designing a new classroom doorway? The current doorway is 175 cm.
If the heights of students follow a normal distribution, the mean and standard deviation of the dataset can be used to estimate what share of students a given doorway height accommodates. A height of 175 cm has a z-score of approximately 1.88, so about 3% of students are expected to be taller than the current 175 cm doorway, while roughly 97% fit under it. A new doorway can therefore be sized to cover a chosen percentage of students. For example, a height two standard deviations above the mean (145.42 + 2 × 15.71 ≈ 176.8 cm) would cover about 97.7% of students, while three standard deviations above the mean (≈ 192.6 cm) would cover about 99.9%, making the doorway more inclusive of the tallest students.
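As a rough illustration of the design idea in (f), the sketch below inverts the normal model to find the doorway height that a chosen share of students would clear. The 99% target is a hypothetical design goal, not something stated in the project.

```python
from statistics import NormalDist

# Normal model of student heights, using the Part 1(a) estimates
heights = NormalDist(mu=145.42, sigma=15.71)

# Share of students expected to fit under the current 175 cm doorway
share_fitting = heights.cdf(175)

# Hypothetical design target: a doorway that 99% of students clear
target_share = 0.99
required_height = heights.inv_cdf(target_share)

print(f"Share fitting under 175 cm: {share_fitting:.3f}")
print(f"Height covering {target_share:.0%} of students: {required_height:.1f} cm")
```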
PART 2
(Confidence Interval Estimate for a Population Mean) (15 points)
Jobs and productivity! How do banks rate? One way to answer this question is to examine annual profits per employee. Forbes Top Companies gave the following data about annual profits per employee (in units of one thousand dollars per employee) for representative companies in financial services. Companies such as Wells Fargo, First Bank System, and Key Banks were included. Assume that σ = 10.2 and that, for a sample of 42 annual profits, x̄ = 36.0.
(a)
Let us say that the preceding data are representative of the entire sector of (successful) financial services corporations. Find a 90% confidence interval for µ, the mean annual profit per employee for all successful banks.
Given:
• x̄ = 36.0
• z (for 90% confidence) is approximately 1.645
• σ = 10.2
• n = 42
Confidence interval:
$\bar{x} \pm z \left( \frac{\sigma}{\sqrt{n}} \right) = 36 \pm 1.645 \left( \frac{10.2}{\sqrt{42}} \right) = 36 \pm 2.589$
Lower bound = 36 − 2.589 = 33.411
Upper bound = 36 + 2.589 = 38.589
The 90% confidence interval is therefore (33.411, 38.589), in thousands of dollars per employee.
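The interval above can be reproduced with a small Python sketch. The helper name `z_confidence_interval` is hypothetical; it obtains the critical value from the standard normal distribution rather than a table, so the bounds may differ from the hand calculation in the last decimal place.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence):
    """Two-sided z interval for a mean when the population sigma is known."""
    # Critical value z such that the central area equals `confidence`
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Values given in Part 2: x-bar = 36.0, sigma = 10.2, n = 42, 90% confidence
lower, upper = z_confidence_interval(36.0, 10.2, 42, 0.90)
print(f"90% CI: ({lower:.3f}, {upper:.3f})")
```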
(b)
Let us say that you are the manager of a local bank with a large number of employees. Suppose the annual profits per employee are less than $30 thousand per employee. Do you think this might be somewhat low compared with other successful financial institutions? Explain by referring to the confidence interval you computed in part (a).
Yes. Based on the confidence interval computed in part (a), (33.411, 38.589), annual profits per employee of less than $30 thousand might be considered somewhat low compared to other successful financial institutions. This is because $30 thousand is below the lower bound of the confidence interval, suggesting that the mean annual profit per employee for all successful banks is likely to be higher than $30 thousand.
(c)
Let us say that you are the manager of a local bank with a large number of employees. Suppose the annual profits per employee are less than $40 thousand per employee. Do you think this might be somewhat low compared with other successful financial institutions? Explain by referring to the confidence interval you computed in part (a).
The 90% confidence interval from part (a) for the mean annual profit per employee for all successful banks is (33.411, 38.589). Since $40 thousand is above the upper bound of the interval (38.589), knowing only that the local bank's profits are less than $40 thousand does not show that they are low: the value could still fall within, or even above, the interval. Therefore, annual profits per employee of less than $40 thousand would not necessarily be considered low compared to other successful financial institutions.
PART 3
Constructing a Statistical Test for µ (normal distribution) (25 points)
The Environmental Protection Agency has been studying Miller Creek regarding ammonia nitrogen concentration. For many years, the concentration has been 2.3 mg/L. However, a new golf course and new housing developments are raising concern that the concentration may have changed because of lawn fertilizer. Any change (either an increase or a decrease) in the ammonia nitrogen concentration can affect plant and animal life in and around the creek. Let x be a random variable representing ammonia nitrogen concentration (in mg/L). Based on recent studies of Miller Creek, we may assume that x has a normal distribution with σ = 0.30. Recently, a random sample of eight water tests from the creek gave the following x values:
2.1 2.5 2.2 2.8 3.0 2.2 2.4 2.9
The sample mean is x̄ = 2.51.
Construct a statistical test to examine the claim that the concentration of ammonia nitrogen has changed from 2.3 mg/L. Use level of significance α = 0.01.
(a)
State the null hypothesis.
Null Hypothesis (H0): The concentration of ammonia nitrogen in Miller Creek has not changed, and the mean concentration is still 2.3 mg/L.
H0: μ = 2.3 mg/L
State the alternative hypothesis.
Alternative Hypothesis (H1): The concentration of ammonia nitrogen in Miller Creek has changed, either increased or decreased.
H1: μ ≠ 2.3 mg/L
(b)
Is this a right-tailed, left-tailed, or two-tailed test?
This is a two-tailed test because we are interested in whether the concentration has changed in either direction (increase or decrease), not just one specific direction.
(c)
What sampling distribution shall we use? The Student's t-distribution or the normal z-distribution? Explain.
Since x is assumed to be normally distributed and the population standard deviation σ is known, we use the normal z-distribution.
(d)
Compute the sample test statistic.
x̄ = 2.51, μ = 2.3, σ = 0.3, n = 8
$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{2.51 - 2.3}{0.3 / \sqrt{8}} \approx 1.98$
z = 1.98
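(e)
Find the P-value of the test.
Because the test is two-tailed, the P-value is twice the upper-tail area:
P-value = 2 × P(Z > 1.98) = 2 × 0.023852 = 0.047704 (from the normal distribution table)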
(f)
Compare the significance level (α) and the P-value. What is your conclusion?
• P-value = 0.047704
• α = 0.01
Since the P-value > α, we fail to reject the null hypothesis.
(g)
Interpret your results in the context of this problem.
There is not enough evidence at the 0.01 significance level to conclude that the concentration of ammonia nitrogen in Miller Creek has changed from the assumed value of 2.3 mg/L. The results do not provide sufficient statistical evidence to reject the null hypothesis.
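The test in Part 3 can be retraced with the following Python sketch. It is a minimal illustration with assumed variable names; the sample mean is rounded to 2.51 to match the hand calculation. With the unrounded mean of 2.5125, z is about 2.00 and the P-value about 0.045, which leads to the same conclusion.

```python
from math import sqrt
from statistics import NormalDist, mean

# Eight water-test readings from Part 3 (mg/L)
samples = [2.1, 2.5, 2.2, 2.8, 3.0, 2.2, 2.4, 2.9]

mu_0 = 2.3     # historical mean under the null hypothesis
sigma = 0.30   # known population standard deviation
alpha = 0.01   # level of significance

# Rounded to two decimals to match the worked solution
x_bar = round(mean(samples), 2)

# z test statistic for a mean with known sigma
z = (x_bar - mu_0) / (sigma / sqrt(len(samples)))

# Two-tailed P-value: twice the area beyond |z|
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

reject_null = p_value < alpha
print(f"z = {z:.2f}, P-value = {p_value:.4f}, reject H0: {reject_null}")
```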
PART 4
(Use data in the Excel file titled Cars for this part) (40 points)
The Quick Sell car dealership has been using 1-minute spot ads on a local TV station. The ads always occur during the evening hours and advertise the different models and price ranges of cars on the lot that week. During a 10-week period, a Quick Sell dealer kept a weekly record of the number x of TV ads versus the number y of cars sold. The results are given in the table below. The same data are in the attached Excel file titled Cars.
Number of TV ads, x:     6  20   0  14  26  16  28  18  10   8
Number of cars sold, y: 15  31  10  16  28  20  40  25  12  15
(a)
Draw a scatter diagram with the number of TV ads, x, and the number of cars sold, y. Copy and paste the diagram below. Do not draw by hand.
(b)
Based on the scatterplot, does a correlation appear to exist between x and y? If so, describe the correlation (strength, direction).
Yes, there is a correlation between the number of TV ads (x) and the number of cars sold (y). The correlation is strong and positive: the number of cars sold tends to increase as the number of TV ads increases, and the points lie close to the regression line.
(c)
Use your calculator or any software to find the correlation coefficient, r.
[Figure: Scatterplot. Horizontal axis: Number of TV ads (0 to 30); vertical axis: Number of cars sold (0 to 45).]
The correlation coefficient, r, is 0.919.
(d)
What does the value of r tell you about the relationship between the number of TV ads and the number of cars sold?
The positive value of r (0.919) implies that as the number of TV ads increases, the number of cars sold also tends to increase. Because r is close to 1, the linear relationship is very strong.
(e)
Use your calculator or any software to compute the coefficient of determination, r².
The coefficient of determination, r², is 0.8449 or 84.49%.
(f)
Interpret the value of r² in the context of this problem.
The coefficient of determination (r²) tells us the proportion of the variability in the dependent variable (y) that can be explained by the independent variable (x). In this case, 84.49% of the variability in the number of cars sold can be explained by the number of TV ads.
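The values of r and r² can be checked with a short Python sketch. The helper `pearson_r` is a hypothetical name, and the data are the table values as printed in Part 4, since the Excel worksheet itself is not reproduced here. Computed from the printed table, r comes out near 0.914 (r² near 0.835), slightly below the reported 0.919 and 0.8449. The reported figures are reproduced if the fifth ad count is 25 rather than 26, which may point to a transcription difference between the table and the worksheet; this is an inference, not something the project states.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient from centered sums of products."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Table values as printed in Part 4
ads = [6, 20, 0, 14, 26, 16, 28, 18, 10, 8]
cars = [15, 31, 10, 16, 28, 20, 40, 25, 12, 15]
r_printed = pearson_r(ads, cars)

# Assumption: the worksheet holds 25 (not 26) as the fifth ad count
ads_assumed = ads.copy()
ads_assumed[4] = 25
r_assumed = pearson_r(ads_assumed, cars)

print(f"printed table:  r = {r_printed:.3f}, r^2 = {r_printed ** 2:.4f}")
print(f"assumed values: r = {r_assumed:.3f}, r^2 = {r_assumed ** 2:.4f}")
```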
(g)
Using your calculator or any software, find the regression equation for this data set.
The regression equation is y = 1.011x + 6.5407.
(h)
Identify the slope of the regression equation.
The slope of the regression equation is the coefficient of x, which is 1.011.
(i)
What does the slope represent in the context of this problem?
For each additional TV ad (x), the number of cars sold (y) is expected to increase by about 1.011 cars on average.
(j)
Draw the regression equation on your scatter diagram.
(k)
Use your model (regression equation) to predict the number of cars expected to sell if 12 ads per week are aired on TV.
To predict the number of cars sold (y) when x = 12, substitute x = 12 into the regression equation:
y = 1.011(12) + 6.5407
y = 18.6727
So, if 12 ads per week are aired on TV, the model predicts that approximately 19 cars will be sold.
[Figure: Scatterplot with the fitted regression line y = 1.011x + 6.5407 (R² = 0.8449). Horizontal axis: Number of TV ads (0 to 30); vertical axis: Number of cars sold (0 to 45).]
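The regression equation in (g) and the prediction in (k) can be retraced with a least-squares sketch in Python. The helper `fit_line` is a hypothetical name. The ad counts follow the printed table except that the fifth value is taken as 25 instead of 26, an assumption that reproduces the reported coefficients; with 26, the fit is roughly y = 0.989x + 6.758, and the prediction for 12 ads still rounds to 19 cars.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Assumption: fifth ad count is 25 (the printed table shows 26)
ads = [6, 20, 0, 14, 25, 16, 28, 18, 10, 8]
cars = [15, 31, 10, 16, 28, 20, 40, 25, 12, 15]

slope, intercept = fit_line(ads, cars)

# Predicted weekly sales when 12 ads are aired
predicted_cars = slope * 12 + intercept

print(f"y = {slope:.3f}x + {intercept:.4f}")
print(f"Predicted cars sold for 12 ads: {predicted_cars:.2f}")
```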