The scatter plot, whether the provided values of ∑x, ∑y, ∑x², ∑y², and ∑xy are correct, and the correlation coefficient.

Chapter 4.1, Problem 23P

(a)

To determine

The scatter plot, whether the provided values of ∑x, ∑y, ∑x², ∑y², and ∑xy are correct, and the correlation coefficient.

Answer to Problem 23P

Solution: The provided values, ∑x = 154, ∑y = 249, ∑x² = 3712, ∑y² = 9959, and ∑xy = 6067, are correct, and the value of r is approximately 0.991.

Explanation of Solution

Given: The provided table consists of values of x and y, where x represents the average annual hours spent by a person in traffic delay, y represents the average annual gallons of fuel wasted per person due to traffic delay. The data consists of 8 data pairs, thus n is 8.

Calculation: Follow the steps given below in MS Excel to obtain the scatter plot of the data.

Step 1: Enter the data into an MS Excel sheet. The screenshot is given below.

[Screenshot: the data entered in an Excel sheet]

Step 2: Select the data and click on ‘Insert’. Go to charts and select the chart type ‘Scatter’.

[Screenshot: selecting the ‘Scatter’ chart type from the ‘Insert’ tab]

Step 3: Select the first plot and then click ‘add chart element’ provided in the left corner of the menu bar. Insert the ‘Axis titles’ and ‘Chart title’. The scatter plot for the provided data is shown below:

[Figure: scatter plot of y (gallons of fuel wasted) against x (hours in traffic delay) for the first data set]

To calculate ∑x, ∑y, ∑x², ∑y², and ∑xy, it is convenient to arrange the data in a table of five columns:

x	y	x²	y²	xy
28	48	784	2304	1344
5	3	25	9	15
20	34	400	1156	680
35	55	1225	3025	1925
20	34	400	1156	680
23	38	529	1444	874
18	28	324	784	504
5	9	25	81	45
∑x=154	∑y=249	∑x²=3712	∑y²=9959	∑xy=6067

The provided values, ∑x = 154, ∑y = 249, ∑x² = 3712, ∑y² = 9959, and ∑xy = 6067, have been verified.
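The column totals can also be checked programmatically. The following is a minimal Python sketch, not part of the original solution; the variable names are illustrative, and the pairs are copied from the table for the first data set.

```python
# Data set 1: (x, y) pairs copied from the five-column table.
pairs = [(28, 48), (5, 3), (20, 34), (35, 55),
         (20, 34), (23, 38), (18, 28), (5, 9)]

n = len(pairs)
sum_x = sum(x for x, _ in pairs)        # total of the x column
sum_y = sum(y for _, y in pairs)        # total of the y column
sum_x2 = sum(x * x for x, _ in pairs)   # total of the squared x values
sum_y2 = sum(y * y for _, y in pairs)   # total of the squared y values
sum_xy = sum(x * y for x, y in pairs)   # total of the products x*y

print(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy)
# prints: 8 154 249 3712 9959 6067
```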

Now, the value of r can be calculated by using the formula below:

$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}}$$

Substituting the values into the formula gives:

$$r = \frac{8(6067) - (154)(249)}{\sqrt{8(3712) - (154)^2}\,\sqrt{8(9959) - (249)^2}} = \frac{10190}{\sqrt{5980}\,\sqrt{17671}} \approx 0.991$$

Therefore, the correlation coefficient is 0.991.
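As a cross-check on the hand calculation, the computational formula for r can be written as a small Python function. This is a sketch under the assumption that the five sums and n are already known; the function name `pearson_r_from_sums` is hypothetical and not from the textbook.

```python
import math

def pearson_r_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Sample correlation coefficient from the computational formula."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Sums for the first data set (n = 8 pairs).
r1 = pearson_r_from_sums(8, 154, 249, 3712, 9959, 6067)
print(round(r1, 3))  # prints: 0.991
```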

(b)

To determine

The averages x̄ and ȳ and the standard deviations s_x and s_y for both data sets, a comparison of the standard deviations of the two samples, and the reason why r tends to increase for smaller standard deviations s_x and s_y.

Answer to Problem 23P

Solution: The values for data set 1 are x̄ = 19.25, ȳ ≈ 31.13, s_x ≈ 10.33, and s_y ≈ 17.76.

The values for data set 2 are x̄ ≈ 20.13, ȳ ≈ 31.88, s_x ≈ 13.85, and s_y ≈ 25.18.

Explanation of Solution

Given: The provided table consists of values of x and y, where x represents the average annual hours spent by a person in traffic delay, y represents the average annual gallons of fuel wasted per person due to traffic delay.

The second table consists of x and y values, where x represents the annual hours lost by an individual in traffic delay and y represents the annual gallons of fuel wasted by that individual in traffic delay.

The data sets consist of 8 data pairs, thus n is 8 for both the data sets.

The provided values for data set 1 are ∑x = 154, ∑y = 249, ∑x² = 3712, ∑y² = 9959, and ∑xy = 6067.

The provided values for data set 2 are ∑x = 161, ∑y = 255, ∑x² = 4583, ∑y² = 12565, and ∑xy = 7071.

Calculation:

The value of x¯ for data set 1 can be calculated as follows:

$$\bar{x} = \frac{\sum x}{n} = \frac{154}{8} = 19.25$$

The value of y¯ for data set 1 can be calculated as follows:

$$\bar{y} = \frac{\sum y}{n} = \frac{249}{8} = 31.125$$

The standard deviation of x for data set 1 can be calculated as,

$$s_x = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1}} = \sqrt{\frac{3712 - \frac{154^2}{8}}{8-1}} \approx 10.33$$

The standard deviation of y for data set 1 can be calculated as,

$$s_y = \sqrt{\frac{\sum y^2 - \frac{(\sum y)^2}{n}}{n-1}} = \sqrt{\frac{9959 - \frac{249^2}{8}}{8-1}} \approx 17.76$$

The value of x¯ for data set 2 can be calculated as follows:

$$\bar{x} = \frac{\sum x}{n} = \frac{161}{8} = 20.125 \approx 20.13$$

The value of y¯ for data set 2 can be calculated as follows:

$$\bar{y} = \frac{\sum y}{n} = \frac{255}{8} = 31.875 \approx 31.88$$

The standard deviation of x for data set 2 can be calculated as,

$$s_x = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1}} = \sqrt{\frac{4583 - \frac{161^2}{8}}{8-1}} \approx 13.85$$

The standard deviation of y for data set 2 can be calculated as,

$$s_y = \sqrt{\frac{\sum y^2 - \frac{(\sum y)^2}{n}}{n-1}} = \sqrt{\frac{12565 - \frac{255^2}{8}}{8-1}} \approx 25.18$$
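The means and sample standard deviations above can be reproduced from the sums alone. The sketch below is illustrative rather than part of the original solution; the helper `mean_and_sd` and the dictionary layout are assumptions made for this example.

```python
import math

def mean_and_sd(n, total, total_sq):
    """Mean and sample standard deviation from a sum and a sum of squares."""
    mean = total / n
    variance = (total_sq - total ** 2 / n) / (n - 1)  # computational form
    return mean, math.sqrt(variance)

n = 8
# (sum, sum of squares) for each variable, taken from the provided values.
sums = {
    "data set 1": {"x": (154, 3712), "y": (249, 9959)},
    "data set 2": {"x": (161, 4583), "y": (255, 12565)},
}

results = {}
for name, variables in sums.items():
    for var, (total, total_sq) in variables.items():
        mean, sd = mean_and_sd(n, total, total_sq)
        results[(name, var)] = (mean, sd)
        print(f"{name}: mean of {var} = {mean:.3f}, s_{var} = {sd:.3f}")
```

Printing three decimal places makes it easier to see how each value rounds to the two-decimal figures quoted in the solution.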

For the second data set, that is, for the variables based on single individuals, the standard deviations s_x and s_y are larger.

The values s_x and s_y appear in the denominator of the formula for calculating r. Dividing by smaller values of s_x and s_y tends to increase the value of r.
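For reference, the form of the correlation coefficient in which s_x and s_y appear explicitly in the denominator is the standard defining formula, which is algebraically equivalent to the computational formula used in parts (a) and (c):

$$r = \frac{1}{n-1}\sum \frac{(x - \bar{x})}{s_x}\cdot\frac{(y - \bar{y})}{s_y}$$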

(c)

To determine

The scatter plot, whether the provided values of ∑x, ∑y, ∑x², ∑y², and ∑xy are correct, and the correlation coefficient.

Answer to Problem 23P

Solution: The provided values, ∑x = 161, ∑y = 255, ∑x² = 4583, ∑y² = 12565, and ∑xy = 7071, are correct, and the value of r is approximately 0.794.

Explanation of Solution

Given: The provided table consists of values of x and y, where x represents the annual hours lost by an individual in traffic delay and y represents the annual gallons of fuel wasted by that individual in traffic delay.

The data set consists of 8 data pairs, thus n is 8.

Calculation: Follow the steps given below in MS Excel to obtain the scatter plot of the data.

Step 1: Enter the data into an MS Excel sheet. The screenshot is given below.

[Screenshot: the data entered in an Excel sheet]

Step 2: Select the data and click on ‘Insert’. Go to charts and select the chart type ‘Scatter’.

[Screenshot: selecting the ‘Scatter’ chart type from the ‘Insert’ tab]

Step 3: Select the first plot and then click ‘add chart element’ provided in the left corner of the menu bar. Insert the ‘Axis titles’ and ‘Chart title’. The scatter plot for the provided data is shown below:

[Figure: scatter plot of y (gallons of fuel wasted) against x (hours in traffic delay) for the second data set]

The calculation of ∑x, ∑y, ∑x², ∑y², and ∑xy is shown in the table below:

x	y	x²	y²	xy
20	60	400	3600	1200
4	8	16	64	32
18	12	324	144	216
42	50	1764	2500	2100
15	21	225	441	315
25	30	625	900	750
2	4	4	16	8
35	70	1225	4900	2450
∑x=161	∑y=255	∑x²=4583	∑y²=12565	∑xy=7071

The provided values, ∑x = 161, ∑y = 255, ∑x² = 4583, ∑y² = 12565, and ∑xy = 7071, have been verified.

Now, the value of r can be calculated by using the formula below:

$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}}$$

Substituting the values into the formula gives:

$$r = \frac{8(7071) - (161)(255)}{\sqrt{8(4583) - (161)^2}\,\sqrt{8(12565) - (255)^2}} = \frac{15513}{\sqrt{10743}\,\sqrt{35495}} \approx 0.794$$

Therefore, the correlation coefficient is 0.794.
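The result can be double-checked from the raw pairs using the deviation form of the formula rather than the computational form used above. This Python sketch is an independent cross-check, not the textbook's method; the variable names are illustrative.

```python
import math

# Data set 2: (x, y) pairs copied from the five-column table.
pairs = [(20, 60), (4, 8), (18, 12), (42, 50),
         (15, 21), (25, 30), (2, 4), (35, 70)]

n = len(pairs)
mean_x = sum(x for x, _ in pairs) / n
mean_y = sum(y for _, y in pairs) / n

# Sums of squared deviations and of cross-products about the means.
ss_xx = sum((x - mean_x) ** 2 for x, _ in pairs)
ss_yy = sum((y - mean_y) ** 2 for _, y in pairs)
ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)

r2 = ss_xy / math.sqrt(ss_xx * ss_yy)
print(round(r2, 3))  # prints: 0.794
```

Both forms are algebraically equivalent, so agreement to three decimal places is expected.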

(d)

To determine

A comparison of the values of r calculated in part (a) and part (c), whether the data based on averages have a higher correlation coefficient than the data based on individual measurements, and the reason for the difference.

Answer to Problem 23P

Solution: Yes, the data based on averages have a higher correlation coefficient than the data based on individual measurements because, according to the central limit theorem, the standard deviation of averages is smaller than the standard deviation of individual values.

Explanation of Solution

Given: The values of the correlation coefficient from part (a) and part (c) are 0.991 and 0.794, respectively.

It can be seen that 0.991 > 0.794. The data based on averages have a higher correlation coefficient than the data based on individual measurements. This is because the standard deviations for the averages are smaller than the standard deviations for the individual measurements.

According to the central limit theorem, the standard deviation of the x̄ distribution is smaller than that of the corresponding x distribution.
