BSTAT smartbook
School: University of Texas, Arlington
Course: 2305
Subject: Statistics
Date: Feb 20, 2024
Pages: 11
Uploaded by ColonelLyrebird2080
Chapter 1
Statistics: the science that deals with the collection, preparation, analysis, interpretation & presentation of data
DESCRIPTIVE STATISTICS: branch of statistics that summarizes important aspects of a data set
INFERENTIAL STATISTICS: branch of statistics that draws conclusions about a large set of data based on a smaller set of data
Samples are primarily used to make inferences about population parameters
A sample is a (measured) subset of a population
Sampling rather than surveying an entire population can offer some substantial benefits, such as saving money and time
Everyday consumers & businesses use data from various sources to help make decisions
Cross-sectional data: data collected about many subjects at the same point in time, or without regard to differences in time
A company wants to estimate the mean price of oil over the past 10 years. What type of data does the company need? TIME SERIES DATA
A sales invoice is STRUCTURED data
Time series data can include:
- Hourly
- Daily
- Weekly
- Monthly
- Quarterly
- Annual observations
Structured data reside in a predefined row-column format
A continuous variable is characterized by uncountable values within an interval
Levels of measurement: Nominal, Ordinal, Interval, Ratio
Variety: data come in all types, forms & granularity, both structured and unstructured
A qualitative variable is also known as a categorical variable
Qualitative variable: described with labels/names rather than numerically
Nominal: observations differ merely by name/label
Ordinal: observations can be categorized & ranked; however, differences between ranked observations are meaningless
Interval: observations can be categorized & ranked, and differences between observations are meaningful
Ratio: observations have all of the characteristics of interval-scaled data as well as a true zero point
Veracity: credibility/quality of data
Rating products from 1 to 5 stars generates ordinal data
Variable: a characteristic of interest that differs among various observations
Nominal scale: least sophisticated level of measurement
Sorting data allows us to review the range of values for each variable
Chapter 2
What describes a frequency distribution for qualitative data? It groups data into categories & records the # of observations in each category
A frequency distribution is a way to organize qualitative data into categories & record the # of observations in each category
One method of graphical presentation for qualitative data is a bar chart
A bar chart is a useful graphical tool for qualitative data
A pie chart is a segmented circle whose segments portray the relative frequencies of the categories of some qualitative variable
Relative frequency distributions are generally more useful than frequency distributions when comparing data sets of different sizes
Stacked column charts help us summarize the relationship between 2 categorical variables
A contingency table shows frequencies for 2 categorical variables
A bar chart is sometimes referred to as a column chart
Info in a contingency table can be shown graphically using a stacked column chart
A pie chart is a segmented circle whose segments add up to 360 degrees
In a frequency distribution for a numerical variable, intervals are exhaustive
The cumulative relative frequency for a particular interval indicates the proportion of observations that fall below the upper limit of that interval
In a frequency distribution for a numerical variable, intervals are mutually exclusive
A histogram is best used to display the relative frequency of grouped quantitative data
In a given cumulative frequency distribution, the "cumulative frequency" column value for the 3rd class represents the total number of observations in the 1st, 2nd & 3rd classes
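A minimal Python sketch of a frequency distribution, a relative frequency distribution and a contingency table for qualitative data. The soda-brand and store data are hypothetical, and `collections.Counter` is just one convenient way to do the counting.

```python
from collections import Counter

# Hypothetical qualitative data: soda brand bought and store location for 8 purchases
brands = ["Coke", "Pepsi", "Coke", "Sprite", "Coke", "Pepsi", "Sprite", "Coke"]
stores = ["North", "North", "South", "South", "North", "South", "North", "South"]

# Frequency distribution: number of observations in each category
freq = Counter(brands)

# Relative frequency: share of all observations in each category (sums to 1)
n = len(brands)
rel_freq = {brand: count / n for brand, count in freq.items()}

# Contingency table: joint frequencies of two categorical variables
contingency = Counter(zip(brands, stores))

for brand in sorted(freq):
    row = {store: contingency[(brand, store)] for store in ("North", "South")}
    print(f"{brand:<7} freq={freq[brand]} rel={rel_freq[brand]:.3f} by store={row}")
```

Each row of the printed contingency table is what one column of a stacked column chart would show.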
When constructing classes for a frequency distribution for quantitative data, which of the following statements is LEAST accurate? The # of classes should equal the # of observations
Generally, in a frequency distribution, the width of each interval is the same for every interval
A cumulative relative frequency distribution for quantitative data identifies the proportion of observations that fall below the upper limit of each class
When constructing a histogram, what values/labels go on the horizontal (x) and vertical (y) axes?
- Quantitative class limits on the horizontal axis
- Frequency or relative frequency on the vertical axis
When a researcher examines quantitative data & wants to know the # of observations that fall below the upper limit of a particular class, the researcher is BEST served by creating a cumulative frequency distribution
A polygon gives a general idea of the SHAPE of a distribution
A relative frequency distribution for quantitative data identifies the proportion of observations that occur in each class
A histogram is a series of rectangles where the width & height of each rectangle represent the interval width & frequency of the respective interval
An ogive connects a series of neighboring points where each point represents the upper limit of a particular interval & its associated cumulative frequency or cumulative relative frequency
A scatterplot is a graphical tool that helps in determining whether or not two numerical variables are related in some systematic way
A polygon connects a series of neighboring points where each point represents the midpoint of a particular class & its associated frequency or relative frequency
A scatterplot with a categorical variable needs 3 variables
Which of the following graphical depictions displays cumulative data? Ogive
A line chart with 3 lines requires 4 variables
With a scatterplot, if we have a 3rd variable in the data set, we can incorporate this categorical variable within the scatterplot by using different colors
An ogive is a graph that plots cumulative frequency, or cumulative relative frequency, against the upper limit of the corresponding class
A line chart tends to be used to track changes in a variable over time.
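For quantitative data the classes must be mutually exclusive and exhaustive. A minimal sketch, assuming hypothetical scores and equal-width classes that each cover lower ≤ x < upper, that builds the frequency, relative frequency and cumulative relative frequency columns. The (midpoint, frequency) pairs are what a polygon plots, and the (upper limit, cumulative relative frequency) pairs are what an ogive plots.

```python
# Hypothetical quantitative data (e.g., exam scores)
data = [52, 58, 61, 64, 67, 70, 71, 73, 75, 78, 82, 85, 88, 91, 97]

# Assumed classes: equal width, each covering lower <= x < upper,
# chosen so that every observation falls in exactly one class
lower, width, k = 50, 10, 5
classes = [(lower + i * width, lower + (i + 1) * width) for i in range(k)]

n = len(data)
cumulative = 0
table = []
for lo, hi in classes:
    freq = sum(1 for x in data if lo <= x < hi)  # frequency of this class
    cumulative += freq                             # running total up to this class
    table.append({
        "class": (lo, hi),
        "midpoint": (lo + hi) / 2,                 # x-coordinate for a polygon
        "freq": freq,
        "rel_freq": freq / n,
        "cum_rel_freq": cumulative / n,            # ogive y-value at upper limit hi
    })

for row in table:
    print(row)
```

The last class always has a cumulative relative frequency of 1, because every observation lies below its upper limit.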
Chapter 3
Central location: relates to the way data tend to cluster around some middle/central value
TRUE: the mean is the most commonly used measure of central location for quantitative data
N: population size
The notation μ represents the population mean
For which data set would the arithmetic mean NOT be a good measure of central location? 7, 8, 8, 9, 25
Variance is NOT a measure of central location
We refer to the arithmetic mean simply as the mean or average
The measure of central location that can BEST be labeled as the midpoint of a data set is the MEDIAN
The notation x̄ represents the sample mean
The arithmetic mean is usually NOT a good measure of central location if an outlier exists
In a neighborhood, there are five houses listed for sale for the following amounts: $250,000; $275,000; $280,000; $295,000; and $515,000. What is the BEST measure of central location for the price of a house in the neighborhood? MEDIAN
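A quick check of the house-price example with Python's `statistics` module: the $515,000 outlier pulls the mean up to $323,000, while the median stays at $280,000, the middle price.

```python
from statistics import mean, median

# House prices from the neighborhood example
prices = [250_000, 275_000, 280_000, 295_000, 515_000]

print(mean(prices))    # pulled up by the $515,000 outlier
print(median(prices))  # middle value, unaffected by the outlier
```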
Most widely used measure of central location? Mean
The measure of central location where half the values of the data set lie above it & half lie below it is known as the median
When there is an odd number of observations & the observations are in order from smallest to largest, the median is the middle observation
If a variable has two or more modes, we say it is multimodal
The median is the best measure of central location when outliers are present
An owner of a grocery store wants to determine the brands of soda that customers purchase at the store. When summarizing the data about soda brand purchases, the meaningful measure of central location is the mode
When there is an even number of observations, and the observations are in order from smallest to largest, the median is the average of the 2 middle observations
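The odd/even median rules above can be sketched directly; the function name `median` is hypothetical here (Python's `statistics.median` already applies the same rules).

```python
def median(values):
    """Sort the data, then take the middle value (odd n)
    or the average of the two middle values (even n)."""
    ordered = sorted(values)  # step 1: place the data in numerical order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:            # odd number of observations
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even number of observations

print(median([9, 7, 25, 8, 8]))  # odd n: middle of 7, 8, 8, 9, 25
print(median([4, 1, 3, 2]))      # even n: average of 2 and 3
```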
The mode is a measure of central location that is the most frequently occurring value in a data set
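Because a variable can have more than one mode, a sketch that returns every most-frequent value is more useful than one that returns a single value. The helper name `modes` and the purchase data are hypothetical; Python 3.8+ also ships `statistics.multimode`, which behaves the same way.

```python
from collections import Counter

def modes(values):
    """Return every most frequently occurring value (more than one => multimodal)."""
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]

# Hypothetical qualitative data: soda brands purchased
purchases = ["Coke", "Pepsi", "Coke", "Sprite", "Pepsi"]
print(modes(purchases))  # Coke and Pepsi tie, so the variable has two modes
```

This also works for qualitative data, which is why the mode is the measure of central location used for categories such as soda brands.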
The mean is usually greater than the median when data are positively skewed
When summarizing a qualitative data set, the mode is the best measure of central location
The 1st step to determine the median is to place the data in numerical order
The mode's usefulness as a measure of central location tends to diminish with variables that have more than 3 modes
A skewness coefficient of 0 indicates observations are evenly distributed on both sides of the mean.
The function to find the mean of a subset in R is tapply
When a mean is calculated and some observations are given greater importance than others, we refer to this measure of central location as a weighted mean
The mode is defined as the most frequently occurring value of a data set.
A boxplot is a visual representation of particular percentiles
Function to find the mean of a subset in Excel is AVERAGEIF(range, criteria, [average_range])
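For readers working in Python rather than R or Excel, a rough analogue of `tapply` (a mean per group) and of `AVERAGEIF` (a mean over the rows that meet one criterion) is sketched below. The region names and sales amounts are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical data: (region, sale amount)
sales = [("East", 120), ("West", 90), ("East", 80), ("West", 130), ("East", 100)]

# AVERAGEIF-style: mean of the amounts whose region equals "East"
east_mean = mean(amount for region, amount in sales if region == "East")

# tapply-style: mean of the amounts within each region
groups = defaultdict(list)
for region, amount in sales:
    groups[region].append(amount)
group_means = {region: mean(amounts) for region, amounts in groups.items()}

print(east_mean, group_means)
```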
The formula for the weighted mean is $\bar{x} = \sum_i w_i x_i$. Using this formula, what is the restriction on the weights? They must sum to 1
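A minimal sketch of the weighted mean formula, including the check that the weights sum to 1. The course-grade weights are hypothetical: 0.2 × 80 + 0.3 × 90 + 0.5 × 70 = 16 + 27 + 35 gives a weighted mean of 78.

```python
def weighted_mean(values, weights):
    """x-bar = sum of w_i * x_i, where the weights must sum to 1."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if abs(sum(weights) - 1) > 1e-9:  # allow for floating-point rounding
        raise ValueError("weights must sum to 1")
    return sum(w * x for w, x in zip(weights, values))

# Hypothetical course grade: homework 20%, midterm 30%, final 50%
scores = [80, 90, 70]
weights = [0.2, 0.3, 0.5]
print(weighted_mean(scores, weights))
```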
Henry's score on an accounting exam placed him in the 85th percentile in the class. What percentage of students scored higher than Henry? 15%
Median is a measure of central location that divides the observations for a variable in half.
When calculating a percentile, the first step is to arrange the data set in ascending order (from least to greatest)
Q1: 25th percentile
Q2: 50th percentile
Q3: 75th percentile
Q4: 100th percentile
The pth percentile divides a variable into two parts. What percentage of the observations is greater than the pth percentile? Approximately (100 − p) percent
Which of the following are included in a five-number summary? Minimum value, Q1, Q2, Q3, Maximum value
Quartiles divide the data into 4 equal parts
A box-and-whisker plot is another name for a boxplot
Which of the following values are included in a box plot? 1st quartile, 2nd quartile
When a box plot is constructed, an outlier is a data point that lies more than 1.5 × IQR below Q1 or above Q3
If the median price for a home is $200,000, then 50% of the homes cost less than $200,000.
The interquartile range (IQR) of a data set is the difference between the 3rd & 1st quartiles: IQR = Q3 − Q1.
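A minimal sketch of the five-number summary, the IQR and the 1.5 × IQR outlier rule. It assumes quartiles are located with the (n + 1)p/100 position convention, which is what Python's `statistics.quantiles` uses by default (`method="exclusive"`); other software uses slightly different conventions, so quartile values can differ a little. The data are hypothetical.

```python
from statistics import quantiles

data = [12, 15, 17, 18, 19, 21, 22, 24, 60]  # hypothetical data

# Default method="exclusive": quartiles at the (n + 1)p/100 positions
q1, q2, q3 = quantiles(data, n=4)
five_number_summary = (min(data), q1, q2, q3, max(data))

iqr = q3 - q1                  # interquartile range
lower_fence = q1 - 1.5 * iqr   # points below this are outliers
upper_fence = q3 + 1.5 * iqr   # points above this are outliers
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(five_number_summary, iqr, outliers)
```

Here Q1 = 16, Q3 = 23 and IQR = 7, so the fences sit at 5.5 and 33.5 and only 60 is flagged as an outlier.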
In a box plot, if the median is in the center of the box and the left and right whiskers are equidistant from their respective quartiles, then the distribution is symmetric
The pth percentile divides a variable into two parts. What percentage is less than the pth percentile? Approximately p percent
Geometric mean is the multiplicative average of a data set.
The geometric mean return accurately captures a negative annual return from an investment.
The appropriate measure for evaluating investment returns over several years is the geometric mean
Which of the following statements regarding the geometric mean is MOST accurate? The geometric mean is less sensitive to extreme values than the arithmetic mean.
We interpret the geometric mean return as the annualized return that you will earn from an investment
Diane wants to calculate her average annual return on an investment that she made three years ago. What value will provide Diane with the most accurate measure of the average annual return? Geometric mean
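A sketch of why the geometric mean return suits multi-year investments, using the standard formula $[(1 + R_1)(1 + R_2)\cdots(1 + R_n)]^{1/n} - 1$ with returns as decimals. The two-year returns are hypothetical: +50% then −50% turns $1 into $0.75, yet the arithmetic mean reports 0%, while the geometric mean return is about −13.4% per year.

```python
import math

def geometric_mean_return(returns):
    """Annualized return: [(1 + R1)(1 + R2)...(1 + Rn)]^(1/n) - 1.
    Returns are decimals, e.g. -0.50 for a 50% loss."""
    growth = math.prod(1 + r for r in returns)  # total growth factor
    return growth ** (1 / len(returns)) - 1

# Hypothetical two-year investment: +50% in year 1, -50% in year 2
returns = [0.50, -0.50]
arithmetic = sum(returns) / len(returns)
geometric = geometric_mean_return(returns)
print(arithmetic, geometric)
```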
Chebyshev’s theorem is applicable when the data have any shape.
What is the relationship between the variance and the standard deviation? The standard deviation is the positive square root of the variance.
When calculating average growth rates, we apply the formula for the geometric mean
The geometric mean is the appropriate measure to analyze multi-year investments. True
What is the symbol for the average growth rate? $G_g$
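The average growth rate applies the same geometric idea to a series of levels rather than returns. A sketch assuming the form $G_g = (x_n / x_1)^{1/(n-1)} - 1$ for values $x_1, \dots, x_n$; the function name and revenue figures are hypothetical.

```python
def average_growth_rate(values):
    """G_g = (x_n / x_1)^(1/(n-1)) - 1 for a series x_1, ..., x_n."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    return (values[-1] / values[0]) ** (1 / (len(values) - 1)) - 1

# Hypothetical annual revenue (in $ millions) over 3 years
revenue = [100, 110, 121]
print(average_growth_rate(revenue))  # about 0.10, i.e. 10% per year
```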
In a box plot, if the median is right of center and the left whisker is longer than the right whisker, then the distribution is negatively skewed
The range is not considered a good measure of dispersion because it focuses solely on the extreme values and ignores every other observation in the sample or the population.
We use the geometric mean when we calculate an average growth rate
Which of the following statements about the mean absolute deviation (MAD) is MOST accurate? MAD is denominated in the same units as the original data.
The range is the difference between the largest & smallest values
Two widely used measures of dispersion are variance & standard deviation.
Chebyshev's theorem provides the minimum proportion of observations that lie within k standard deviations of the mean. The value k must be ______. Greater than 1
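Chebyshev's theorem states that at least $1 - 1/k^2$ of the observations lie within k standard deviations of the mean, for data of any shape. The sketch below (the function name is hypothetical) shows why the bounds are conservative: at least 75% within 2 standard deviations and about 88.9% within 3, versus roughly 95% and 99.7% under the empirical rule for bell-shaped data.

```python
def chebyshev_min_proportion(k):
    """Chebyshev's theorem: at least 1 - 1/k^2 of the observations lie
    within k standard deviations of the mean, for any shape (k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k**2

for k in (1.5, 2, 3):
    print(k, round(chebyshev_min_proportion(k), 4))
```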
Converting observations into z-scores is also called standardizing the observations
The notation σ² represents the population variance
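Standardizing converts each observation to $z = (x - \mu)/\sigma$, the number of standard deviations it lies from the mean. A minimal sketch treating a hypothetical data set as a population (mean 5, population standard deviation 2):

```python
from statistics import mean, pstdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical population

mu = mean(data)       # population mean
sigma = pstdev(data)  # population standard deviation
z_scores = [(x - mu) / sigma for x in data]  # z = (x - mu) / sigma

print(mu, sigma, z_scores)
```

For example, 9 lies 2 standard deviations above the mean, so its z-score is 2.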
The average of the absolute differences between the values of a data set & the mean is the mean absolute deviation
Which of the following statements about variance is MOST accurate? Variance is the average of the squared deviations from the mean.
The empirical rule should be applied to data sets that are approximately bell-shaped. True
Chebyshev’s theorem results in conservative bounds for the percentage of observations falling in a particular interval.
The average of the squared differences from the mean is the population variance
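A minimal sketch of the dispersion measures in this chapter: range, MAD, variance and standard deviation. The data are hypothetical, and the `population` flag is an illustrative assumption: dividing the sum of squared deviations by N gives the population variance described above, while dividing by n − 1 gives the usual sample variance.

```python
import math

def dispersion(data, population=True):
    """Range, mean absolute deviation, variance and standard deviation.
    population=True divides squared deviations by N; False divides by n - 1."""
    n = len(data)
    m = sum(data) / n
    data_range = max(data) - min(data)
    mad = sum(abs(x - m) for x in data) / n  # same units as the data
    ss = sum((x - m) ** 2 for x in data)     # sum of squared deviations
    variance = ss / n if population else ss / (n - 1)  # squared units
    std_dev = math.sqrt(variance)            # back in the data's units
    return data_range, mad, variance, std_dev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
print(dispersion(data))                    # treated as a population
print(dispersion(data, population=False))  # treated as a sample
```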
The empirical rule is appropriate when the distribution of a variable is symmetric and bell-shaped