Interpret the 95% confidence interval given in the t.test() output from Lesson 10, in context. Does this interval agree with the conclusion you reached in the hypothesis test by using the p-value?
t.test(bedrooms ~ waterfront, data = house_prices)
##
## Welch Two Sample t-test
##
## data: bedrooms by waterfront
## t = 0.83568, df = 163.83, p-value = 0.4046
## alternative hypothesis: true difference in means is not equal to 0
Test Statistic
Let's revisit our sample statistics:
favstats(bedrooms ~ waterfront, data = house_prices)
##   waterfront min Q1 median Q3 max     mean        sd     n missing
## 1      FALSE   0  3      3  4  33 3.371375 0.9288559 21450       0
## 2       TRUE   1  3      3  4   6 3.300613 1.0780353   163       0
We have the following:
• $\bar{x}_f = 3.37$, $\bar{x}_t = 3.30$, $\bar{x}_f - \bar{x}_t = 0.07$
• $s_f = 0.93$, $s_t = 1.08$
• $n_f = 21450$, $n_t = 163$
We'll use all these in calculating the test statistic:
$$T = \frac{(\bar{x}_f - \bar{x}_t) - 0}{\sqrt{\dfrac{s_f^2}{n_f} + \dfrac{s_t^2}{n_t}}}$$
Using R as a calculator:
(3.37 - 3.30)/sqrt(0.93^2/21450 + 1.08^2/163)
## [1] 0.8251786
Therefore, the test statistic is T = 0.83.
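As a rough check on the test statistic, the same calculation can be repeated with the unrounded values from the favstats() output. This is only a sketch: the variable names (mean_f, sd_f, and so on) are made up for illustration, and the numbers are copied by hand from the summary table rather than recomputed from house_prices.

```r
# Summary statistics copied from the favstats() output (not recomputed from the data)
mean_f <- 3.371375; sd_f <- 0.9288559; n_f <- 21450  # waterfront == FALSE
mean_t <- 3.300613; sd_t <- 1.0780353; n_t <- 163    # waterfront == TRUE

# Standard error of the difference in sample means (unpooled variances)
se_diff <- sqrt(sd_f^2 / n_f + sd_t^2 / n_t)

# Test statistic under H0: mu_f - mu_t = 0
t_stat <- ((mean_f - mean_t) - 0) / se_diff
t_stat
```

With unrounded inputs the result should land close to 0.836 rather than 0.825, which suggests the small gap comes from rounding the means and standard deviations to two decimals.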
p-value
Because the alternative hypothesis, $H_A: \mu_f - \mu_t \neq 0$, does not specify a direction, the p-value must account for
statistically significant "evidence" being in either direction (in favor of waterfront homes or non-waterfront homes). Therefore,
we need to multiply the regular one-sided p-value by 2:
# Take 1 - pt() because T > 0
2 * (1 - pt(0.83, df = 162))
## [1] 0.4077602
The p-value is 0.41. Because this is greater than 0.05, we fail to reject the null hypothesis.
Conclusion: We do not have enough evidence to conclude that the average number of bedrooms for waterfront homes is
different from the average number of bedrooms for non-waterfront homes.
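The doubling step can also be written so that it does not depend on the sign of T. The sketch below is one way to do that, using abs() and the lower.tail argument of pt(). The names t_stat and df_cons are hypothetical. The df value of 162 is taken from the calculation above, where it matches n_t - 1 = 163 - 1.

```r
t_stat  <- 0.83   # rounded test statistic from the by-hand calculation
df_cons <- 162    # df used in the by-hand calculation (n_t - 1)

# Upper-tail area beyond |T|, doubled for a two-sided alternative
p_value <- 2 * pt(abs(t_stat), df = df_cons, lower.tail = FALSE)
p_value

# Decision at the 5% significance level: TRUE would mean "reject H0"
p_value < 0.05
```

Using abs() means the same line works whether the sample difference comes out positive or negative, so there is no need to choose between pt() and 1 - pt() by hand.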
t.test()
Because the two-sample t-test is such a frequently-used statistical procedure, R has a built-in function that does all of these
"by-hand" calculations for us! Though, it is still important to be able to read and interpret the output.
The t.test() syntax follows some of the usual mosaic syntax that we've seen throughout the course:
t.test(bedrooms ~ waterfront, data = house_prices)
##
## Welch Two Sample t-test
##
## data: bedrooms by waterfront
## t = 0.83568, df = 163.83, p-value = 0.4046
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.09643533 0.23795892
## sample estimates:
## mean in group FALSE mean in group TRUE
## 3.371375 3.300613
This gives the test statistic (t), the p-value, and a 95% confidence interval all in the same output! Note that the t and
p-value we found "by-hand" were slightly different because we rounded the summary statistics and used df = 162 rather than
the Welch df of 163.83. We can use this output to reach the same conclusion as above: not enough evidence to conclude there
is a difference in mean bedrooms between waterfront and non-waterfront homes.
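The non-integer df = 163.83 in the output comes from the Welch-Satterthwaite approximation that the Welch test uses. The following is a minimal sketch of that formula, assuming the summary statistics copied by hand from favstats(). The names v_f, v_t, df_welch, and p_welch are illustrative only.

```r
# Summary statistics copied from the favstats() output
sd_f <- 0.9288559; n_f <- 21450   # waterfront == FALSE
sd_t <- 1.0780353; n_t <- 163     # waterfront == TRUE

# Estimated variance of each group's sample mean
v_f <- sd_f^2 / n_f
v_t <- sd_t^2 / n_t

# Welch-Satterthwaite approximate degrees of freedom
df_welch <- (v_f + v_t)^2 / (v_f^2 / (n_f - 1) + v_t^2 / (n_t - 1))
df_welch

# Two-sided p-value using the t statistic reported by t.test()
p_welch <- 2 * pt(0.83568, df = df_welch, lower.tail = FALSE)
p_welch
```

Because the waterfront group is so much smaller, its variance term dominates and the approximate df ends up only slightly above n_t - 1 = 162.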
Expert Solution
Step 1
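Concept: A confidence interval is a range of plausible values for a population parameter, built so that, at a chosen confidence level (here 95%), the method captures the true parameter in that percentage of repeated samples.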
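To connect this concept to the t.test() output, the 95% interval for the difference in means can be rebuilt from the summary statistics as estimate plus or minus critical value times standard error. This is a sketch under stated assumptions: the inputs are copied by hand from the favstats() and t.test() output, the df is the rounded value 163.83, and the variable names are hypothetical.

```r
# Values copied from the favstats() table and the t.test() output
mean_f <- 3.371375; sd_f <- 0.9288559; n_f <- 21450  # waterfront == FALSE
mean_t <- 3.300613; sd_t <- 1.0780353; n_t <- 163    # waterfront == TRUE
df_welch <- 163.83                                   # rounded df from t.test()

diff_means <- mean_f - mean_t
se_diff    <- sqrt(sd_f^2 / n_f + sd_t^2 / n_t)

# 95% interval: estimate +/- critical value * standard error
t_crit <- qt(0.975, df = df_welch)
ci <- diff_means + c(-1, 1) * t_crit * se_diff
ci

# Does the interval contain the null value 0?
contains_zero <- ci[1] < 0 && 0 < ci[2]
contains_zero
```

The endpoints should agree closely with the reported interval of -0.096 to 0.238 bedrooms (non-waterfront minus waterfront). Because 0 lies inside that interval, it is consistent with the hypothesis test, where a p-value of about 0.40 led to failing to reject the null hypothesis at the 5% level.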