1.42 Solver and least squares fits 1.43 Error and statistics The objectives of this section are to: 1. Create error bars within a chart in a spreadsheet. 2. Apply statistics-related functions for a data series. 3. Calculate and distinguish between standard deviation and quartiles for a data series. 4. Complete a linear regression within a spreadsheet using functions or built-in regression tools. Error and error bars An error is commonly known as a mistake, but in a spreadsheet, an error usually quantifies a statistical difference. Ex: A residual is the difference between the data and model, which is detailed in another section, and is one example of a quantifiable error. When plotting data and models in charts, error bars provide one method to visualize quantified error. The animation below shows several types of error bars. PARTICIPATION ACTIVITY 1.43.1: Types of error bars. Start 2x speed 100 y-axis error bar Property 80 with cap Error value y-axis 60- error bar without cap x and y axis Fixed value 1.0 error Percentage 1.5 bars 40 Standard deviation 1.0 x-axis y-axis error error 20 bars bars HAH 0 2 6 8 x Captions ^ 1. A data point is plotted (blue triangle) and error bars are added as lines extending from the point, with or without a cap. 2. A property window allows formatting of error bars by entering values or selecting values in cells. Increasing the percentage makes larger error bars. 3. Error bars may be used on the x or y-axis only, or both. PARTICIPATION ACTIVITY 1.43.2: Basics of error bars. 1) Error bars demonstrate the magnitude of the error for data points within a chart. ○ True O False 2) The magnitude of error bars can only be a percentage of the magnitude of the data point. True O False 3) Error bars are only used on scatter charts. O True O False Some statistics Feedback? Feedback? When collecting data, multiple measurements are made to quantify the error of the measurement. First, important quantities related to sets of data are presented, followed by definitions and animations related to distributions. Table 1.43.1: Summarizing data terms. Definition Sample calculation using data set [2.4, 2.9, 3.3, 2.7,2.6] An average, or mean, sums the data and divides by =(2.4+2.9 +3.3 +2.7 the number of data points. A median is the data point separating the larger half from the smaller half of a data series. When a series contains an even number of data points, the average of the two data points separating the larger and smaller halves can be used. +2.6)/5 = 2.78 Sorting from smallest to largest-2.4, 2.6, 2.7, 2.9, 3.3-finds the median equals 2.7. A maximum is the largest data point of the data 3.3 series. A minimum is the smallest data point of the data 2.4 series. Feedback? Table 1.43.2: Data terms to spreadsheet functions. The cell addresses and data set are: A2-2.4, A3-2.9, A4-3.3, A5-2.7, A6-2.6. Manual spreadsheet actions Spreadsheet Term function AVERAGE(cell Average =(A2+A3+A4+A5+A6)/5 group) Spreadsheet calculation using function =AVERAGE(A2:A6) MEDIAN(cell Median group) Maximum MAX(cell group) Sort data/cells from smallest to largest, identify data/cell dividing -MEDIAN(A2:A6) larger and smaller halves of data series. Sort data/cells from largest to smallest, first cell is maximum. Sort data/cells from smallest to largest, first =MAX(A2:A6) =MIN(A2:A6) MIN(cell Minimum group) cell is minimum. Feedback? A number of methods for quantifying error exist and are covered in a statistics course. Quantifying error usually begins by identifying the type of distribution found in the data. A distribution is the mathematical representation of a data series. While many types of distributions are covered in a statistics course, a normal distribution and quartiles are covered here. A normal distribution represents data symmetrically spread around the average. A normal distribution is also known as a bell curve or Gaussian distribution. Ex: Many natural things fit a normal distribution, including humans' height or snowflakes' size. Standard deviation is one common method for quantifying error. The definition, spreadsheet function, and use in a chart are demonstrated below. Table 1.43.3: Quantifying error terms. Definition Equation Sample calculation using data set (2.4, 2.9, 3.3, 2.7, 2.6] A standard deviation (sd) is a measure of the variation within a data series and returns a value with the same units as the data sd = (-average) N-1 where N is the number of data =(((2.4-2.78)^2 +(2.9-2.78)^2+ (3.3-2.78)^2+ (2.7-2.78)^2+ points and i represents each data (2.6-2.78)^2 point in the series. )/4)^0.5 0.34 series. Feedback? When a data series does not follow a normal distribution, quartiles can represent data in a chart. A quartile divides a data series into four equal parts. The first quartile divides the smallest 25% of the data series from the rest. A second quartile divides the data series into equal parts, which is also the median. The third quartile divides the smallest 75% of the data series from the rest. Ex: Exam scores do not fit a normal distribution in many cases, so reporting first quartile, median, and third quartile scores provide a method to compare one student's exam to an entire class. A box-whisker chart is a type of bar chart where a data series is bounded with error bars for the maximum and minimum, with a box bounding the first quartile and third quartile. The box is divided by the median, and the box encompasses the middle 50% of the data within the series. Spreadsheet functions and animations expand on the definitions of standard deviation and quartiles below. Table 1.43.4: Quantifying error terms using spreadsheet functions. The data set and cell addresses are: A2-2.4, A3-2.9, A4-3.3, A5-2.7, A6-2.6. Value returned by spreadsheet Spreadsheet Term function Standard deviation 1st Quartile 3rd Quartile Spreadsheet formula using function STDEV(cell group) =STDEV(A2:A6) QUARTILE(cell group, quartile number) QUARTILE(cell group, quartile number) 0.34 =QUARTILE(A2 A6, 1) 2.6 =QUARTILE(A2:A6, 3) 2.9 PARTICIPATION ACTIVITY 1.43.3: Visualizing a normal distribution. Start 2x speed 0.4 0.3- Normal distribution Scatter chart Average = 0 sd=1 1 sd (68.2%) 0.2- 0.1 -4 -2 0 2 x Minimum Medan 1st quartile 3rd quartile, (50%) 2 sd (95.4%) formula Box-whisker chart Feedback? Captions A distribution is plotted with an average of 0 and standard 1. A normal deviation of 1. 2. One standard deviation accounts for ~68% of the distribution while two SD covers over 95%. 3. A box-whisker chart also represents a distribution with 50% of the data accounted for in the box, divided by the median. PARTICIPATION ACTIVITY 1.43.4: Visualizing standard deviation and quartiles. Start 10 Measurement (units) 2x speed OOOOOO H Raw Average ± Box- data whisker standard deviation Data representation Feedback? Raw, unsorted data 6.5 4.5 7.1 8.3 7.8 7.3 9.4 Captions A 1. Seven data points are recorded and will be plotted three ways. 2. First, the raw data are plotted as individual data points (circles). 3. Next, the average (triangle) with error bars of one standard deviation are plotted 4. Finally, a box shows 1st, 2nd, and 3rd quartiles and minimum/maximum with error bars (whiskers). PARTICIPATION ACTIVITY 1.43.5: Distributions. 1) The median and average are always the same number. O True O False 2) The second quartile is also the O average O median O standard deviation 3) A data series with 100 data points follows a normal distribution. The number of data points within one standard deviation of the average is O 68 95 99 4) The median divides the two boxes in a box-whisker chart. The percentage of data points accounted for within either box is 50 75 =zyBooks ▾ AUTIVITY Feedback? Feedback? 584450.1699390 qay? Jump to level 1 Enter a formula in Cell D3 for the average of only the data point: average in Cell D4. A 1 Data 2 64 3 73 4 68 5 89 6 55 7 63 8 37 9 36 10 14 Copy sheet Check Next B C D =AVERAG 71 X Each incorrect answer is highlighted. Expected: D3-AVERAGE(A2:A10) and D4-55.4 An average sums the data points and divides by the number of data point: using the AVERAGE spreadsheet function. >-0-0-0-0- Feedback? Linear regression A linear regression is a model that fits data with a slope and intercept from a linear model, such as y=mx+ Linear regression can be done for multip dependent (x) variables, which is covered elsewhere by following the link in the Exploring Further section below. Linear regression can be completed using spreadsheet functions, as shown in the table below, or using a regression analysis tool, as shown in the animations below. Therefore, linear regression is a another method that can be applied within a spreadsheet to model data with a linear model; trendlines and least squares fits also create optimized linear models, which are covered in other sections. Table 1.43.5: Linear regression spreadsheet functions. A data series with four data points will be used in the definitions below. A B 1 x data y data 21 20 3 3 37 45 57 57 78 Spreadsheet Spreadsheet formula using Term function function Value returned using spreadsheet above Slope SLOPE(y cell range, x cell range) INTERCEPT(y =SLOPE(B2:B5A2:A5) 9.7 Intercept cell range, x cell -INTERCEPT(B2:B5,A2:A5) 9.2 R- squared range) RSQ (y cell range, x cell range) =RSQ(B2:B5A2:A5) 0.998 Feedback? A confidence interval provides the range which a value will fall between a lower and upper bound. Ex: A 95% confidence interval predicts that 95% of values will fall within the bounds. Spreadsheets have data analysis packages to perform linear regression and provide 95% confidence intervals for all parameters. Ex: A slope of 5.3 may have a 95% confidence interval between 3.9 and 6.7. The animations below demonstrate using linear regression and data analysis packages in a spreadsheet and a chart. PARTICIPATION ACTIVITY 1.43.6: Entering values into a regression analysis tool. Start 2x speed Tx A B C D 1 x data y data 2 1 20 3 3 37 4 5 57 5 7 78 Regression analysis Input y range: B2:B5 Input x range: A2:A5 Output range: E1:H10 Other options Residuals Residual plots OK R-squared Upper bound 95% confidence interval Lower bound 95% confidence interval Parameters and many other statistical measures Captions A 1. A spreadsheet is filled with x and y data for four data points. ranges entered, analysis options selected, and OK is clicked. 3. Many statistical measures are outputted from the Regression analysis 2. Next, the regression analysis tool is located, selected, appropriate cell tool, such as R-squared and optimized parameters. Results are shown in another animation. PARTICIPATION ACTIVITY 1.43.7: Visualizing a confidence interval. Start 2x speed 100 80 60- 40- 20 y=9.7x+9.2 R=0.998 Upper 95%: y 11.1x+15.6 Lower 95%: y = 8.3x+2.8 0 2 4 6 x Captions A Linear model: y = mx + b Feedback? Regression analysis tool Optimized parameters and R-squared and upper and lower 95% confidence interval with other statistical measures available 1. Four data points are plotted and the regression analysis tool is used within a spreadsheet. 2. First, the optimized line is plotted in green and the R-squared is 0.998. Then the upper and lower 95% confidence interval lines are added. PARTICIPATION ACTIVITY 1.43.8: Linear regression. 1) A linear model is found to have a slope of 97. The 95% confidence interval is 97 ± 14, and the 98% confidence interval is 97 ± 5. O True ○ False 2) A data set includes 35 data points of pressure as a function of temperature, which is fit by a linear model: P mT+b. Two engineers find m and b differently. One engineer uses SLOPE and INTERCEPT spreadsheet functions, while another engineer uses the linear regression data analysis tool. Both engineers find the same values for m and b. ○ True O False 3) Vivian and Alexander have a data set with pressure, temperature, and volume. They can use the data analysis tool within their spreadsheet to find a 95% confidence interval for a linear model. ○ True O False Feedback? Feedback? Linearizing an equation Linear regression can also be applied to non-linear models using a process called linearization. A linearization converts a non-linear model into a linear model. While many models cannot be linearized, some guidelines can direct the linearization process. The equation should be separable, so y and x expressions can be placed on different sides of the equation. The linear model of y = mx +b with two constants still applies, but now expands to account for a function of y instead of y, and a function of x instead of x. The animations below demonstrate the linearization process in both a spreadsheet and chart. PARTICIPATION ACTIVITY 1.43.9: Linearization of an exponential model. Start 2x speed fx A 1 A= B 2 C D E From 6.9 least B= 0.35 3 x y squares In(A)= 1.9 INTERCEPT B= 0.35 SLOPE(E4:E linearized x linearized y 4 1 10 1 2.30 5 4 29 4 3.35 6 6 55 6 4.05 7 7 84 7 4.40 Exponential model: Linearized model: y = A exp(Bx) Fit using: Least squares fit ог Trendline Linearization In(y) Bx + In(A) I is wry. Least squares fit or Trendline or Functions or Linear regression data analysis to Captions A 1. Four data points can be modeled by an exponential model. 2. The exponential model can be optimized using a least squares fit or trendline, which were covered elsewhere. 3. Linearization converts the model into a linear form. In this case, linearized x is still x while linearized y is in y. 4. Finally, fits can be completed by at least four methods covered in this section and other sections. For example, functions are used in cells E1 and E2. PARTICIPATION ACTIVITY 1.43.10: Visualizing a linearized model. Start 2x speed 100 80 60- 40- 20 y=6.9 exp(0.35x) Linearization Feedback? y = 0.35'x' +1.9 Non-linear model Linearized model: y= A exp(Bx) In(y) Bx+In(A) 0 2 4 8 0 2 4 x x Captions ^ 1. Four data points are plotted and a non-linear model fits the data. 2. Linearization alters only the y-axis for this model. The data points are replotted and a linearized model fits the data. Note: Parameter A converts as In(6.9)=1.9 PARTICIPATION ACTIVITY 1.43.11: Match the linearized equations. If unable to drag and drop, refresh the page y = e(4+4) y = A. eBz A.x y = B+x Feedback? y = A. B In(y) =Bx+ In(A) In(y) B. In(x) + In(A) 1 B 1 1 = =+ y A I A In(y) = B + A Reset CHALLENGE ACTIVITY 1.43.2: More and more error and statistics calculations. 584450.1699390 qv3zay? Start Converts a non-linear model into a linear model. Choose the correct term. Check Next How was this 10 section? 91 Provide section feedback Feedback? 4 Feedback? 1.44 Spreadsheet resources

MATLAB: An Introduction with Applications
6th Edition
ISBN:9781119256830
Author:Amos Gilat
Publisher:Amos Gilat
Chapter1: Starting With Matlab
Section: Chapter Questions
Problem 1P
icon
Related questions
Question
1.42 Solver and least squares fits
1.43 Error and statistics
The objectives of this section are to: 1. Create error bars within a chart in a
spreadsheet. 2. Apply statistics-related functions for a data series. 3. Calculate and
distinguish between standard deviation and quartiles for a data series. 4.
Complete a linear regression within a spreadsheet using functions or built-in
regression tools.
Error and error bars
An error is commonly known as a mistake, but in a spreadsheet, an error usually
quantifies a statistical difference. Ex: A residual is the difference between the data
and model, which is detailed in another section, and is one example of a
quantifiable error. When plotting data and models in charts, error bars provide one
method to visualize quantified error. The animation below shows several types of
error bars.
PARTICIPATION
ACTIVITY
1.43.1: Types of error bars.
Start 2x speed
100
y-axis
error bar
Property
80 with cap
Error value
y-axis
60-
error bar
without cap
x and y
axis
Fixed value
1.0
error
Percentage
1.5
bars
40
Standard deviation
1.0
x-axis
y-axis
error
error
20
bars
bars
HAH
0
2
6
8
x
Captions ^
1. A data point is plotted (blue triangle) and error bars are added as lines
extending from the point, with or without a cap.
2. A property window allows formatting of error bars by entering values or
selecting values in cells. Increasing the percentage makes larger error
bars.
3. Error bars may be used on the x or y-axis only, or both.
PARTICIPATION
ACTIVITY
1.43.2: Basics of error bars.
1) Error bars demonstrate the
magnitude of the error for
data points within a chart.
○ True
O False
2) The magnitude of error bars
can only be a percentage of
the magnitude of the data
point.
True
O False
3) Error bars are only used on
scatter charts.
O True
O False
Some statistics
Feedback?
Feedback?
When collecting data, multiple measurements are made to quantify the error of the
measurement. First, important quantities related to sets of data are presented,
followed by definitions and animations related to distributions.
Table 1.43.1: Summarizing data terms.
Definition
Sample calculation
using data set [2.4, 2.9,
3.3, 2.7,2.6]
An average, or mean, sums the data and divides by =(2.4+2.9 +3.3 +2.7
the number of data points.
A median is the data point separating the larger
half from the smaller half of a data series. When a
series contains an even number of data points, the
average of the two data points separating the
larger and smaller halves can be used.
+2.6)/5 = 2.78
Sorting from smallest
to largest-2.4, 2.6, 2.7,
2.9, 3.3-finds the
median equals 2.7.
A maximum is the largest data point of the data
3.3
series.
A minimum is the smallest data point of the data
2.4
series.
Feedback?
Table 1.43.2: Data terms to spreadsheet functions.
The cell addresses and data set are: A2-2.4, A3-2.9, A4-3.3, A5-2.7, A6-2.6.
Manual spreadsheet
actions
Spreadsheet
Term
function
AVERAGE(cell
Average
=(A2+A3+A4+A5+A6)/5
group)
Spreadsheet
calculation using
function
=AVERAGE(A2:A6)
MEDIAN(cell
Median
group)
Maximum
MAX(cell
group)
Sort data/cells from
smallest to largest,
identify data/cell dividing -MEDIAN(A2:A6)
larger and smaller halves
of data series.
Sort data/cells from
largest to smallest, first
cell is maximum.
Sort data/cells from
smallest to largest, first
=MAX(A2:A6)
=MIN(A2:A6)
MIN(cell
Minimum
group)
cell is minimum.
Feedback?
A number of methods for quantifying error exist and are covered in a statistics
course. Quantifying error usually begins by identifying the type of distribution
found in the data. A distribution is the mathematical representation of a data
series. While many types of distributions are covered in a statistics course, a
normal distribution and quartiles are covered here.
A normal distribution represents data symmetrically spread around the average. A
normal distribution is also known as a bell curve or Gaussian distribution. Ex:
Many natural things fit a normal distribution, including humans' height or
snowflakes' size. Standard deviation is one common method for quantifying error.
The definition, spreadsheet function, and use in a chart are demonstrated below.
Table 1.43.3: Quantifying error terms.
Definition
Equation
Sample
calculation using
data set (2.4, 2.9,
3.3, 2.7, 2.6]
A standard
deviation (sd) is a
measure of the
variation within a
data series and
returns a value
with the same
units as the data
sd =
(-average)
N-1
where N is the number of data
=(((2.4-2.78)^2
+(2.9-2.78)^2+
(3.3-2.78)^2+
(2.7-2.78)^2+
points and i represents each data
(2.6-2.78)^2
point in the series.
)/4)^0.5 0.34
series.
Feedback?
When a data series does not follow a normal distribution, quartiles can represent
data in a chart. A quartile divides a data series into four equal parts. The first
quartile divides the smallest 25% of the data series from the rest. A second
quartile divides the data series into equal parts, which is also the median. The third
quartile divides the smallest 75% of the data series from the rest. Ex: Exam scores
do not fit a normal distribution in many cases, so reporting first quartile, median,
and third quartile scores provide a method to compare one student's exam to an
entire class. A box-whisker chart is a type of bar chart where a data series is
bounded with error bars for the maximum and minimum, with a box bounding the
first quartile and third quartile. The box is divided by the median, and the box
encompasses the middle 50% of the data within the series. Spreadsheet functions
and animations expand on the definitions of standard deviation and quartiles
below.
Table 1.43.4: Quantifying error terms using spreadsheet
functions.
The data set and cell addresses are: A2-2.4, A3-2.9, A4-3.3, A5-2.7, A6-2.6.
Value returned by
spreadsheet
Spreadsheet
Term
function
Standard
deviation
1st
Quartile
3rd
Quartile
Spreadsheet formula
using function
STDEV(cell group) =STDEV(A2:A6)
QUARTILE(cell
group, quartile
number)
QUARTILE(cell
group, quartile
number)
0.34
=QUARTILE(A2 A6, 1) 2.6
=QUARTILE(A2:A6, 3) 2.9
PARTICIPATION
ACTIVITY
1.43.3: Visualizing a normal distribution.
Start
2x speed
0.4
0.3-
Normal distribution
Scatter chart
Average = 0
sd=1
1 sd (68.2%)
0.2-
0.1
-4
-2
0
2
x
Minimum
Medan
1st quartile
3rd quartile,
(50%)
2 sd (95.4%)
formula
Box-whisker
chart
Feedback?
Captions A
distribution is plotted with an average of 0 and standard
1. A normal
deviation of 1.
2. One standard deviation accounts for ~68% of the distribution while two
SD covers over 95%.
3. A box-whisker chart also represents a distribution with 50% of the data
accounted for in the box, divided by the median.
PARTICIPATION
ACTIVITY
1.43.4: Visualizing standard deviation and quartiles.
Start
10
Measurement (units)
2x speed
OOOOOO
H
Raw Average ±
Box-
data
whisker
standard
deviation
Data representation
Feedback?
Raw, unsorted data
6.5
4.5
7.1
8.3
7.8
7.3
9.4
Captions A
1. Seven data points are recorded and will be plotted three ways.
2. First, the raw data are plotted as individual data points (circles).
3. Next, the average (triangle) with error bars of one standard deviation are
plotted
4. Finally, a box shows 1st, 2nd, and 3rd quartiles and minimum/maximum
with error bars (whiskers).
PARTICIPATION
ACTIVITY
1.43.5: Distributions.
1) The median and average are
always the same number.
O True
O False
2) The second quartile is also
the
O average
O median
O standard deviation
3) A data series with 100 data
points follows a normal
distribution. The number of
data points within one
standard deviation of the
average is
O 68
95
99
4) The median divides the two
boxes in a box-whisker chart.
The percentage of data
points accounted for within
either box is
50
75
=zyBooks ▾
AUTIVITY
Feedback?
Feedback?
584450.1699390 qay?
Jump to level 1
Enter a formula in Cell D3 for the average of only the data point:
average in Cell D4.
A
1
Data
2
64
3
73
4
68
5
89
6
55
7
63
8
37
9
36
10
14
Copy sheet
Check
Next
B
C
D
=AVERAG
71
X Each incorrect answer is highlighted.
Expected: D3-AVERAGE(A2:A10) and D4-55.4
An average sums the data points and divides by the number of data point:
using the AVERAGE spreadsheet function.
>-0-0-0-0-
Feedback?
Linear regression
A linear regression is a model that fits data with a slope and intercept from a linear
model, such as y=mx+ Linear regression can be done for multip dependent
(x) variables, which is covered elsewhere by following the link in the Exploring
Further section below. Linear regression can be completed using spreadsheet
functions, as shown in the table below, or using a regression analysis tool, as
shown in the animations below. Therefore, linear regression is a another method
that can be applied within a spreadsheet to model data with a linear model;
trendlines and least squares fits also create optimized linear models, which are
covered in other sections.
Table 1.43.5: Linear regression spreadsheet functions.
A data series with four data points will be used in the definitions below.
A
B
1 x data
y data
21
20
3 3
37
45
57
57
78
Spreadsheet Spreadsheet formula using
Term
function
function
Value returned
using
spreadsheet
above
Slope
SLOPE(y cell
range, x cell
range)
INTERCEPT(y
=SLOPE(B2:B5A2:A5)
9.7
Intercept cell range, x cell -INTERCEPT(B2:B5,A2:A5) 9.2
R-
squared
range)
RSQ (y cell
range, x cell
range)
=RSQ(B2:B5A2:A5)
0.998
Feedback?
A confidence interval provides the range which a value will fall between a lower
and upper bound. Ex: A 95% confidence interval predicts that 95% of values will fall
within the bounds. Spreadsheets have data analysis packages to perform linear
regression and provide 95% confidence intervals for all parameters. Ex: A slope of
5.3 may have a 95% confidence interval between 3.9 and 6.7. The animations
below demonstrate using linear regression and data analysis packages in a
spreadsheet and a chart.
PARTICIPATION
ACTIVITY
1.43.6: Entering values into a regression analysis tool.
Start
2x speed
Tx
A
B
C
D
1
x data
y data
2
1
20
3
3
37
4
5
57
5
7
78
Regression analysis
Input y range: B2:B5
Input x range: A2:A5
Output range: E1:H10
Other options
Residuals
Residual plots
OK
R-squared
Upper bound 95% confidence interval
Lower bound 95% confidence interval
Parameters
and many other statistical measures
Captions A
1. A spreadsheet is filled with x and y data for four data points.
ranges entered, analysis options selected, and OK is clicked.
3. Many statistical measures are outputted from the Regression analysis
2. Next, the regression analysis tool is located, selected, appropriate cell
tool, such as R-squared and optimized parameters. Results are shown in
another animation.
PARTICIPATION
ACTIVITY
1.43.7: Visualizing a confidence interval.
Start
2x speed
100
80
60-
40-
20
y=9.7x+9.2
R=0.998
Upper 95%: y 11.1x+15.6
Lower 95%: y = 8.3x+2.8
0
2
4
6
x
Captions A
Linear model: y = mx + b
Feedback?
Regression analysis tool
Optimized parameters and R-squared
and
upper and lower 95% confidence interval
with
other statistical measures available
1. Four data points are plotted and the regression analysis tool is used
within a spreadsheet.
2. First, the optimized line is plotted in green and the R-squared is 0.998.
Then the upper and lower 95% confidence interval lines are added.
PARTICIPATION
ACTIVITY
1.43.8: Linear regression.
1) A linear model is found to
have a slope of 97. The 95%
confidence interval is 97 ± 14,
and the 98% confidence
interval is 97 ± 5.
O True
○ False
2) A data set includes 35 data
points of pressure as a
function of temperature,
which is fit by a linear model:
P mT+b. Two engineers
find m and b differently. One
engineer uses SLOPE and
INTERCEPT spreadsheet
functions, while another
engineer uses the linear
regression data analysis tool.
Both engineers find the same
values for m and b.
○ True
O False
3) Vivian and Alexander have a
data set with pressure,
temperature, and volume.
They can use the data
analysis tool within their
spreadsheet to find a 95%
confidence interval for a
linear model.
○ True
O False
Feedback?
Feedback?
Linearizing an equation
Linear regression can also be applied to non-linear models using a process called
linearization. A linearization converts a non-linear model into a linear model. While
many models cannot be linearized, some guidelines can direct the linearization
process. The equation should be separable, so y and x expressions can be placed
on different sides of the equation. The linear model of y = mx +b with two
constants still applies, but now expands to account for a function of y instead of y,
and a function of x instead of x. The animations below demonstrate the
linearization process in both a spreadsheet and chart.
PARTICIPATION
ACTIVITY
1.43.9: Linearization of an exponential model.
Start
2x speed
fx
A
1
A=
B
2
C
D
E
From
6.9
least
B= 0.35
3
x
y
squares
In(A)= 1.9 INTERCEPT
B= 0.35 SLOPE(E4:E
linearized x linearized y
4
1
10
1
2.30
5
4
29
4
3.35
6
6
55
6
4.05
7
7
84
7
4.40
Exponential model:
Linearized model:
y = A exp(Bx)
Fit using:
Least squares fit
ог
Trendline
Linearization
In(y) Bx + In(A)
I is wry.
Least squares fit or
Trendline or
Functions or
Linear regression data analysis to
Captions A
1. Four data points can be modeled by an exponential model.
2. The exponential model can be optimized using a least squares fit or
trendline, which were covered elsewhere.
3. Linearization converts the model into a linear form. In this case,
linearized x is still x while linearized y is in y.
4. Finally, fits can be completed by at least four methods covered in this
section and other sections. For example, functions are used in cells E1
and E2.
PARTICIPATION
ACTIVITY
1.43.10: Visualizing a linearized model.
Start
2x speed
100
80
60-
40-
20
y=6.9 exp(0.35x)
Linearization
Feedback?
y = 0.35'x' +1.9
Non-linear model
Linearized model:
y= A exp(Bx)
In(y) Bx+In(A)
0
2
4
8
0
2
4
x
x
Captions ^
1. Four data points are plotted and a non-linear model fits the data.
2. Linearization alters only the y-axis for this model. The data points are
replotted and a linearized model fits the data. Note: Parameter A
converts as In(6.9)=1.9
PARTICIPATION
ACTIVITY
1.43.11: Match the linearized equations.
If unable to drag and drop, refresh the page
y =
e(4+4)
y = A. eBz
A.x
y =
B+x
Feedback?
y = A. B
In(y) =Bx+ In(A)
In(y) B. In(x) + In(A)
1
B 1
1
=
=+
y
A I
A
In(y) =
B
+ A
Reset
CHALLENGE
ACTIVITY
1.43.2: More and more error and statistics calculations.
584450.1699390 qv3zay?
Start
Converts a non-linear model into a linear model.
Choose the correct term.
Check
Next
How was
this
10
section?
91
Provide section feedback
Feedback?
4
Feedback?
1.44 Spreadsheet resources
Transcribed Image Text:1.42 Solver and least squares fits 1.43 Error and statistics The objectives of this section are to: 1. Create error bars within a chart in a spreadsheet. 2. Apply statistics-related functions for a data series. 3. Calculate and distinguish between standard deviation and quartiles for a data series. 4. Complete a linear regression within a spreadsheet using functions or built-in regression tools. Error and error bars An error is commonly known as a mistake, but in a spreadsheet, an error usually quantifies a statistical difference. Ex: A residual is the difference between the data and model, which is detailed in another section, and is one example of a quantifiable error. When plotting data and models in charts, error bars provide one method to visualize quantified error. The animation below shows several types of error bars. PARTICIPATION ACTIVITY 1.43.1: Types of error bars. Start 2x speed 100 y-axis error bar Property 80 with cap Error value y-axis 60- error bar without cap x and y axis Fixed value 1.0 error Percentage 1.5 bars 40 Standard deviation 1.0 x-axis y-axis error error 20 bars bars HAH 0 2 6 8 x Captions ^ 1. A data point is plotted (blue triangle) and error bars are added as lines extending from the point, with or without a cap. 2. A property window allows formatting of error bars by entering values or selecting values in cells. Increasing the percentage makes larger error bars. 3. Error bars may be used on the x or y-axis only, or both. PARTICIPATION ACTIVITY 1.43.2: Basics of error bars. 1) Error bars demonstrate the magnitude of the error for data points within a chart. ○ True O False 2) The magnitude of error bars can only be a percentage of the magnitude of the data point. True O False 3) Error bars are only used on scatter charts. O True O False Some statistics Feedback? Feedback? When collecting data, multiple measurements are made to quantify the error of the measurement. First, important quantities related to sets of data are presented, followed by definitions and animations related to distributions. Table 1.43.1: Summarizing data terms. Definition Sample calculation using data set [2.4, 2.9, 3.3, 2.7,2.6] An average, or mean, sums the data and divides by =(2.4+2.9 +3.3 +2.7 the number of data points. A median is the data point separating the larger half from the smaller half of a data series. When a series contains an even number of data points, the average of the two data points separating the larger and smaller halves can be used. +2.6)/5 = 2.78 Sorting from smallest to largest-2.4, 2.6, 2.7, 2.9, 3.3-finds the median equals 2.7. A maximum is the largest data point of the data 3.3 series. A minimum is the smallest data point of the data 2.4 series. Feedback? Table 1.43.2: Data terms to spreadsheet functions. The cell addresses and data set are: A2-2.4, A3-2.9, A4-3.3, A5-2.7, A6-2.6. Manual spreadsheet actions Spreadsheet Term function AVERAGE(cell Average =(A2+A3+A4+A5+A6)/5 group) Spreadsheet calculation using function =AVERAGE(A2:A6) MEDIAN(cell Median group) Maximum MAX(cell group) Sort data/cells from smallest to largest, identify data/cell dividing -MEDIAN(A2:A6) larger and smaller halves of data series. Sort data/cells from largest to smallest, first cell is maximum. Sort data/cells from smallest to largest, first =MAX(A2:A6) =MIN(A2:A6) MIN(cell Minimum group) cell is minimum. Feedback? A number of methods for quantifying error exist and are covered in a statistics course. Quantifying error usually begins by identifying the type of distribution found in the data. A distribution is the mathematical representation of a data series. While many types of distributions are covered in a statistics course, a normal distribution and quartiles are covered here. A normal distribution represents data symmetrically spread around the average. A normal distribution is also known as a bell curve or Gaussian distribution. Ex: Many natural things fit a normal distribution, including humans' height or snowflakes' size. Standard deviation is one common method for quantifying error. The definition, spreadsheet function, and use in a chart are demonstrated below. Table 1.43.3: Quantifying error terms. Definition Equation Sample calculation using data set (2.4, 2.9, 3.3, 2.7, 2.6] A standard deviation (sd) is a measure of the variation within a data series and returns a value with the same units as the data sd = (-average) N-1 where N is the number of data =(((2.4-2.78)^2 +(2.9-2.78)^2+ (3.3-2.78)^2+ (2.7-2.78)^2+ points and i represents each data (2.6-2.78)^2 point in the series. )/4)^0.5 0.34 series. Feedback? When a data series does not follow a normal distribution, quartiles can represent data in a chart. A quartile divides a data series into four equal parts. The first quartile divides the smallest 25% of the data series from the rest. A second quartile divides the data series into equal parts, which is also the median. The third quartile divides the smallest 75% of the data series from the rest. Ex: Exam scores do not fit a normal distribution in many cases, so reporting first quartile, median, and third quartile scores provide a method to compare one student's exam to an entire class. A box-whisker chart is a type of bar chart where a data series is bounded with error bars for the maximum and minimum, with a box bounding the first quartile and third quartile. The box is divided by the median, and the box encompasses the middle 50% of the data within the series. Spreadsheet functions and animations expand on the definitions of standard deviation and quartiles below. Table 1.43.4: Quantifying error terms using spreadsheet functions. The data set and cell addresses are: A2-2.4, A3-2.9, A4-3.3, A5-2.7, A6-2.6. Value returned by spreadsheet Spreadsheet Term function Standard deviation 1st Quartile 3rd Quartile Spreadsheet formula using function STDEV(cell group) =STDEV(A2:A6) QUARTILE(cell group, quartile number) QUARTILE(cell group, quartile number) 0.34 =QUARTILE(A2 A6, 1) 2.6 =QUARTILE(A2:A6, 3) 2.9 PARTICIPATION ACTIVITY 1.43.3: Visualizing a normal distribution. Start 2x speed 0.4 0.3- Normal distribution Scatter chart Average = 0 sd=1 1 sd (68.2%) 0.2- 0.1 -4 -2 0 2 x Minimum Medan 1st quartile 3rd quartile, (50%) 2 sd (95.4%) formula Box-whisker chart Feedback? Captions A distribution is plotted with an average of 0 and standard 1. A normal deviation of 1. 2. One standard deviation accounts for ~68% of the distribution while two SD covers over 95%. 3. A box-whisker chart also represents a distribution with 50% of the data accounted for in the box, divided by the median. PARTICIPATION ACTIVITY 1.43.4: Visualizing standard deviation and quartiles. Start 10 Measurement (units) 2x speed OOOOOO H Raw Average ± Box- data whisker standard deviation Data representation Feedback? Raw, unsorted data 6.5 4.5 7.1 8.3 7.8 7.3 9.4 Captions A 1. Seven data points are recorded and will be plotted three ways. 2. First, the raw data are plotted as individual data points (circles). 3. Next, the average (triangle) with error bars of one standard deviation are plotted 4. Finally, a box shows 1st, 2nd, and 3rd quartiles and minimum/maximum with error bars (whiskers). PARTICIPATION ACTIVITY 1.43.5: Distributions. 1) The median and average are always the same number. O True O False 2) The second quartile is also the O average O median O standard deviation 3) A data series with 100 data points follows a normal distribution. The number of data points within one standard deviation of the average is O 68 95 99 4) The median divides the two boxes in a box-whisker chart. The percentage of data points accounted for within either box is 50 75 =zyBooks ▾ AUTIVITY Feedback? Feedback? 584450.1699390 qay? Jump to level 1 Enter a formula in Cell D3 for the average of only the data point: average in Cell D4. A 1 Data 2 64 3 73 4 68 5 89 6 55 7 63 8 37 9 36 10 14 Copy sheet Check Next B C D =AVERAG 71 X Each incorrect answer is highlighted. Expected: D3-AVERAGE(A2:A10) and D4-55.4 An average sums the data points and divides by the number of data point: using the AVERAGE spreadsheet function. >-0-0-0-0- Feedback? Linear regression A linear regression is a model that fits data with a slope and intercept from a linear model, such as y=mx+ Linear regression can be done for multip dependent (x) variables, which is covered elsewhere by following the link in the Exploring Further section below. Linear regression can be completed using spreadsheet functions, as shown in the table below, or using a regression analysis tool, as shown in the animations below. Therefore, linear regression is a another method that can be applied within a spreadsheet to model data with a linear model; trendlines and least squares fits also create optimized linear models, which are covered in other sections. Table 1.43.5: Linear regression spreadsheet functions. A data series with four data points will be used in the definitions below. A B 1 x data y data 21 20 3 3 37 45 57 57 78 Spreadsheet Spreadsheet formula using Term function function Value returned using spreadsheet above Slope SLOPE(y cell range, x cell range) INTERCEPT(y =SLOPE(B2:B5A2:A5) 9.7 Intercept cell range, x cell -INTERCEPT(B2:B5,A2:A5) 9.2 R- squared range) RSQ (y cell range, x cell range) =RSQ(B2:B5A2:A5) 0.998 Feedback? A confidence interval provides the range which a value will fall between a lower and upper bound. Ex: A 95% confidence interval predicts that 95% of values will fall within the bounds. Spreadsheets have data analysis packages to perform linear regression and provide 95% confidence intervals for all parameters. Ex: A slope of 5.3 may have a 95% confidence interval between 3.9 and 6.7. The animations below demonstrate using linear regression and data analysis packages in a spreadsheet and a chart. PARTICIPATION ACTIVITY 1.43.6: Entering values into a regression analysis tool. Start 2x speed Tx A B C D 1 x data y data 2 1 20 3 3 37 4 5 57 5 7 78 Regression analysis Input y range: B2:B5 Input x range: A2:A5 Output range: E1:H10 Other options Residuals Residual plots OK R-squared Upper bound 95% confidence interval Lower bound 95% confidence interval Parameters and many other statistical measures Captions A 1. A spreadsheet is filled with x and y data for four data points. ranges entered, analysis options selected, and OK is clicked. 3. Many statistical measures are outputted from the Regression analysis 2. Next, the regression analysis tool is located, selected, appropriate cell tool, such as R-squared and optimized parameters. Results are shown in another animation. PARTICIPATION ACTIVITY 1.43.7: Visualizing a confidence interval. Start 2x speed 100 80 60- 40- 20 y=9.7x+9.2 R=0.998 Upper 95%: y 11.1x+15.6 Lower 95%: y = 8.3x+2.8 0 2 4 6 x Captions A Linear model: y = mx + b Feedback? Regression analysis tool Optimized parameters and R-squared and upper and lower 95% confidence interval with other statistical measures available 1. Four data points are plotted and the regression analysis tool is used within a spreadsheet. 2. First, the optimized line is plotted in green and the R-squared is 0.998. Then the upper and lower 95% confidence interval lines are added. PARTICIPATION ACTIVITY 1.43.8: Linear regression. 1) A linear model is found to have a slope of 97. The 95% confidence interval is 97 ± 14, and the 98% confidence interval is 97 ± 5. O True ○ False 2) A data set includes 35 data points of pressure as a function of temperature, which is fit by a linear model: P mT+b. Two engineers find m and b differently. One engineer uses SLOPE and INTERCEPT spreadsheet functions, while another engineer uses the linear regression data analysis tool. Both engineers find the same values for m and b. ○ True O False 3) Vivian and Alexander have a data set with pressure, temperature, and volume. They can use the data analysis tool within their spreadsheet to find a 95% confidence interval for a linear model. ○ True O False Feedback? Feedback? Linearizing an equation Linear regression can also be applied to non-linear models using a process called linearization. A linearization converts a non-linear model into a linear model. While many models cannot be linearized, some guidelines can direct the linearization process. The equation should be separable, so y and x expressions can be placed on different sides of the equation. The linear model of y = mx +b with two constants still applies, but now expands to account for a function of y instead of y, and a function of x instead of x. The animations below demonstrate the linearization process in both a spreadsheet and chart. PARTICIPATION ACTIVITY 1.43.9: Linearization of an exponential model. Start 2x speed fx A 1 A= B 2 C D E From 6.9 least B= 0.35 3 x y squares In(A)= 1.9 INTERCEPT B= 0.35 SLOPE(E4:E linearized x linearized y 4 1 10 1 2.30 5 4 29 4 3.35 6 6 55 6 4.05 7 7 84 7 4.40 Exponential model: Linearized model: y = A exp(Bx) Fit using: Least squares fit ог Trendline Linearization In(y) Bx + In(A) I is wry. Least squares fit or Trendline or Functions or Linear regression data analysis to Captions A 1. Four data points can be modeled by an exponential model. 2. The exponential model can be optimized using a least squares fit or trendline, which were covered elsewhere. 3. Linearization converts the model into a linear form. In this case, linearized x is still x while linearized y is in y. 4. Finally, fits can be completed by at least four methods covered in this section and other sections. For example, functions are used in cells E1 and E2. PARTICIPATION ACTIVITY 1.43.10: Visualizing a linearized model. Start 2x speed 100 80 60- 40- 20 y=6.9 exp(0.35x) Linearization Feedback? y = 0.35'x' +1.9 Non-linear model Linearized model: y= A exp(Bx) In(y) Bx+In(A) 0 2 4 8 0 2 4 x x Captions ^ 1. Four data points are plotted and a non-linear model fits the data. 2. Linearization alters only the y-axis for this model. The data points are replotted and a linearized model fits the data. Note: Parameter A converts as In(6.9)=1.9 PARTICIPATION ACTIVITY 1.43.11: Match the linearized equations. If unable to drag and drop, refresh the page y = e(4+4) y = A. eBz A.x y = B+x Feedback? y = A. B In(y) =Bx+ In(A) In(y) B. In(x) + In(A) 1 B 1 1 = =+ y A I A In(y) = B + A Reset CHALLENGE ACTIVITY 1.43.2: More and more error and statistics calculations. 584450.1699390 qv3zay? Start Converts a non-linear model into a linear model. Choose the correct term. Check Next How was this 10 section? 91 Provide section feedback Feedback? 4 Feedback? 1.44 Spreadsheet resources
Expert Solution
steps

Step by step

Solved in 2 steps

Blurred answer
Similar questions
Recommended textbooks for you
MATLAB: An Introduction with Applications
MATLAB: An Introduction with Applications
Statistics
ISBN:
9781119256830
Author:
Amos Gilat
Publisher:
John Wiley & Sons Inc
Probability and Statistics for Engineering and th…
Probability and Statistics for Engineering and th…
Statistics
ISBN:
9781305251809
Author:
Jay L. Devore
Publisher:
Cengage Learning
Statistics for The Behavioral Sciences (MindTap C…
Statistics for The Behavioral Sciences (MindTap C…
Statistics
ISBN:
9781305504912
Author:
Frederick J Gravetter, Larry B. Wallnau
Publisher:
Cengage Learning
Elementary Statistics: Picturing the World (7th E…
Elementary Statistics: Picturing the World (7th E…
Statistics
ISBN:
9780134683416
Author:
Ron Larson, Betsy Farber
Publisher:
PEARSON
The Basic Practice of Statistics
The Basic Practice of Statistics
Statistics
ISBN:
9781319042578
Author:
David S. Moore, William I. Notz, Michael A. Fligner
Publisher:
W. H. Freeman
Introduction to the Practice of Statistics
Introduction to the Practice of Statistics
Statistics
ISBN:
9781319013387
Author:
David S. Moore, George P. McCabe, Bruce A. Craig
Publisher:
W. H. Freeman