BSTAT smartbook

School: University of Texas, Arlington
Course: 2305
Subject: Statistics
Date: Feb 20, 2024
Pages: 11
Uploaded by ColonelLyrebird2080

Chapter 1

Statistics: the science that deals with the collection, preparation, analysis, interpretation, and presentation of data.
Descriptive statistics: the branch of statistics that summarizes important aspects of a data set.
Inferential statistics: the branch of statistics that draws conclusions about a large set of data based on a smaller set of data.
Samples are primarily used to make inferences about population parameters.
A sample is a (measured) subset of a population.
Sampling rather than surveying an entire population can offer substantial benefits, such as saving money and time.
Every day, consumers and businesses use data from various sources to help make decisions.
Cross-sectional data: data collected about many subjects at the same point in time, or without regard to differences in time.
A company wants to estimate the mean price of oil over the past 10 years. What type of data does the company need? Time series data.
A sales invoice is structured data.
Time series data can include hourly, daily, weekly, monthly, quarterly, or annual observations.
Structured data reside in a predefined row-column format.
A continuous variable is characterized by uncountable values within an interval.
Measurement scales: nominal, ordinal, interval, ratio.
Variety: data come in all types, forms, and granularity, both structured and unstructured.
A qualitative variable is also known as a categorical variable.
Qualitative variable: described with labels or names rather than numerically.
Nominal: observations differ merely by name or label.
Ordinal: observations can be categorized and ranked, but differences between ranked observations are meaningless.
Interval: observations can be categorized and ranked, and differences between observations are meaningful.
Ratio: observations have all of the characteristics of interval-scaled data as well as a true zero point.
Veracity: the credibility and quality of data.
Rating products from 1 to 5 stars generates ordinal data.
Variable: a characteristic of interest that differs among various observations.
Nominal scale: the least sophisticated level of measurement.
Sorting data allows us to review the range of values for each variable.
Chapter 2

What describes a frequency distribution for qualitative data? It groups data into categories and records the number of observations in each category.
A frequency distribution is a way to organize qualitative data into categories and record the number of observations in each category.
One method of graphical presentation for qualitative data is a bar chart.
A bar chart is a useful graphical tool for qualitative data.
A bar chart is sometimes referred to as a column chart.
A pie chart is a segmented circle whose segments portray the relative frequencies of the categories of some qualitative variable.
A pie chart is a segmented circle whose segments add up to 360 degrees.
Relative frequency distributions are generally more useful than frequency distributions when comparing data sets of different sizes.
A contingency table shows frequencies for two categorical variables.
Stacked column charts help us summarize the relationship between two categorical variables.
The information in a contingency table can be shown graphically using a stacked column chart.
In a frequency distribution for a numerical variable, the intervals are exhaustive.
In a frequency distribution for a numerical variable, the intervals are mutually exclusive.
The cumulative relative frequency for a particular interval indicates the proportion of observations that fall below the upper limit of that interval.
A histogram is best used to display the relative frequency of grouped, quantitative data.
In a given cumulative frequency distribution, the "cumulative frequency" column value for the 3rd class represents the sum of the observations in the 1st, 2nd, and 3rd classes.
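
The frequency, relative frequency, and contingency-table ideas above can be sketched in a few lines of Python. This is only an illustration: the soda and age-group data and the function names are made up for the example and do not come from the course material.

```python
from collections import Counter

# Hypothetical survey data (illustrative only, not from the notes).
responses = ["Cola", "Lemon", "Cola", "Root beer", "Cola", "Lemon"]
age_group = ["Under 30", "30+", "Under 30", "30+", "Under 30", "30+"]

def frequency_distribution(values):
    """Count the number of observations in each category."""
    return dict(Counter(values))

def relative_frequencies(values):
    """Express each category count as a proportion of all observations."""
    total = len(values)
    return {category: n / total for category, n in Counter(values).items()}

def contingency_table(rows, cols):
    """Cross-tabulate two categorical variables as {(row, col): count}."""
    return dict(Counter(zip(rows, cols)))

freq = frequency_distribution(responses)
rel = relative_frequencies(responses)
table = contingency_table(age_group, responses)

print(freq)
print(rel)
print(table)
```

Relative frequencies always sum to 1, which is why they are the better choice when comparing data sets of different sizes.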
When constructing classes for a frequency distribution for quantitative data, which of the following statements is LEAST accurate? The number of classes should equal the number of observations.
Generally, for a frequency distribution, the width of each interval is the same.
A cumulative relative frequency distribution for quantitative data identifies the proportion of observations that fall below the upper limit of each class.
When constructing a histogram, what values or labels go on the horizontal (x) and vertical (y) axes?
- Quantitative class limits on the horizontal axis
- Frequency or relative frequency on the vertical axis
When a researcher examines quantitative data and wants to know the number of observations that fall below the upper limit of a particular class, the researcher is BEST served by creating a cumulative frequency distribution.
A polygon gives a general idea of the SHAPE of a distribution.
A relative frequency distribution for quantitative data identifies the proportion of observations that occur in each class.
A histogram is a series of rectangles where the width and height of each rectangle represent the interval width and the frequency of the respective interval.
An ogive connects a series of neighboring points where each point represents the upper limit of a particular interval and its associated cumulative frequency or cumulative relative frequency.
A scatterplot is a graphical tool that helps in determining whether or not two numerical variables are related in some systematic way.
A polygon connects a series of neighboring points where each point represents the midpoint of a particular class and its associated frequency or relative frequency.
A scatterplot with a categorical variable needs 3 variables.
Which of the following graphical depictions displays cumulative data? Ogive.
A line chart with 3 lines requires 4 variables.
With a scatterplot, if we have a 3rd variable in the data set, we can incorporate this categorical variable within the scatterplot by using different colors.
An ogive is a graph that plots the cumulative frequency, or cumulative relative frequency, against the upper limit of the corresponding class.
A line chart tends to be used to track changes of a variable over time.
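
As a rough sketch of how the points of an ogive are obtained, the Python below builds class frequencies, cumulative frequencies, and cumulative relative frequencies for a small data set. The data, the class limits, and the convention that each class includes its upper limit are assumptions made for this example.

```python
from itertools import accumulate

# Hypothetical data and class limits (illustrative only).
data = [12, 15, 21, 24, 27, 33, 35, 38, 41, 48]
lower_bound = 10
upper_limits = [20, 30, 40, 50]  # classes (10,20], (20,30], (30,40], (40,50]

def class_frequencies(values, lower, uppers):
    """Count observations in each class (lower, upper].

    The classes are mutually exclusive and exhaustive for data in range,
    so every observation is counted exactly once.
    """
    counts = []
    lo = lower
    for hi in uppers:
        counts.append(sum(1 for v in values if lo < v <= hi))
        lo = hi
    return counts

freqs = class_frequencies(data, lower_bound, upper_limits)
cum_freqs = list(accumulate(freqs))                  # running totals
cum_rel_freqs = [c / len(data) for c in cum_freqs]   # proportions
ogive_points = list(zip(upper_limits, cum_rel_freqs))

print(freqs)         # frequencies per class
print(cum_freqs)     # third entry = sum of the first three classes
print(ogive_points)  # (upper limit, cumulative relative frequency)
```

Plotting `ogive_points` and connecting neighboring points would give the ogive; plotting class midpoints against `freqs` would give the polygon.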
Chapter 3

Central location relates to the way data tend to cluster around some middle or central value. True.
The mean is the most widely used measure of central location for quantitative data.
N: population size.
μ: population mean.
x̄: sample mean.
For which data set would the arithmetic mean NOT be a good measure of central location? 7, 8, 8, 9, 25.
Variance is NOT a measure of central location.
We refer to the arithmetic mean as simply the mean or average.
The measure of central location that can BEST be labeled as the midpoint of the data set is the median.
The arithmetic mean is usually NOT a good measure of central location if an outlier exists.
In a neighborhood there are five houses listed for sale for the following amounts: $250,000; $275,000; $280,000; $295,000; and $515,000. What is the BEST measure of central location for the price of a house in the neighborhood? The median.
What is the most widely used measure of central location? The mean.
The measure of central location where half the values of the data set lie above it and half lie below it is known as the median.
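
The two examples above (the data set 7, 8, 8, 9, 25 and the five house prices) can be checked with Python's standard `statistics` module. This is a small illustrative sketch; the variable names are chosen for the example.

```python
from statistics import mean, median

# Data taken from the two examples in the notes.
small_set = [7, 8, 8, 9, 25]
house_prices = [250_000, 275_000, 280_000, 295_000, 515_000]

for label, values in [("small set", small_set), ("house prices", house_prices)]:
    # The single large value pulls the mean upward but leaves the median alone.
    print(label, "mean:", mean(values), "median:", median(values))
```

In both cases the mean lands above four of the five observations, while the median stays at the middle observation, which is why the median is the better summary when an outlier is present.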
When there is an odd number of observations and the observations are in order from smallest to largest, the median is the middle observation.
When there is an even number of observations and the observations are in order from smallest to largest, the median is the average of the two middle observations.
The first step in determining the median is to place the data in numerical order.
The median is the best measure of central location when outliers are present.
The measure of central location that can BEST be labeled as the midpoint of the data set is the median.
The mode is a measure of central location that is the most frequently occurring value in a data set.
If a variable has two or more modes, we say it is multimodal.
The mode's usefulness as a measure of central location tends to diminish with variables that have more than 3 modes.
When summarizing a qualitative data set, the mode is the best measure of central location.
An owner of a grocery store wants to determine the brands of soda that customers purchase at the store. When summarizing data about soda brand purchases, the meaningful measure of central location is the mode.
The mean is usually greater than the median when data are positively skewed.
A skewness coefficient of 0 indicates that observations are evenly distributed on both sides of the mean.
When a mean is calculated and some observations are given greater importance than others, we refer to this measure of central location as a weighted mean.
A boxplot is a visual representation of particular percentiles.
The function to find the mean of a subset in R is tapply.
The function to find the mean of a subset in Excel is AVERAGEIF(range, criteria, [average_range]).
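
The notes name the R and Excel functions for a subset mean; as a hedged Python counterpart, the sketch below shows the mode(s) of a categorical variable, a weighted mean with weights that sum to 1, and a mean computed per group. The helper names `weighted_mean` and `mean_by_group` and all sample data are hypothetical, introduced only for this example.

```python
from collections import defaultdict
from statistics import multimode

def weighted_mean(values, weights):
    """Weighted mean as the sum of w_i * x_i; weights are assumed to sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * x for w, x in zip(weights, values))

def mean_by_group(values, groups):
    """Mean of values within each group (similar in spirit to a subset mean)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for value, group in zip(values, groups):
        totals[group] += value
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}

# Hypothetical data (illustrative only).
sodas = ["Cola", "Lemon", "Cola", "Root beer", "Lemon"]
modes = multimode(sodas)  # more than one value here, so the data are multimodal

course_grade = weighted_mean([80, 90, 70], [0.5, 0.3, 0.2])
regional_means = mean_by_group([10, 20, 30, 40], ["N", "S", "N", "S"])

print(modes, course_grade, regional_means)
```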
The formula for the weighted mean is $\bar{x} = \sum w_i x_i$. Using this formula, what is the restriction on the weights? They must sum to 1.
Henry's score on an accounting exam placed him in the 85th percentile in the class. What percentage of students scored higher than Henry? 15%.
The median is a measure of central location that divides the observations for a variable in half.
When calculating a percentile, the first step is to arrange the data set in ascending order (from least to greatest).
Q1: 25th percentile.
Q2: 50th percentile.
Q3: 75th percentile.
Q4: 100th percentile.
The pth percentile divides a variable into two parts. What percentage is greater than the pth percentile? (100 - p) percent.
The pth percentile divides a variable into two parts. What percentage is less than the pth percentile? Approximately p percent.
Which of the following are included in a five-number summary?
- Q3
- Maximum value
- Minimum value
- Q2
- Q1
Quartiles divide the data into 4 equal parts.
A box-and-whisker plot is another name for a boxplot.
Which of the following values are included in a box plot?
- 2nd quartile
- 1st quartile
When a box plot is constructed, an outlier is a data point that is farther than 1.5 × IQR from either Q1 or Q3.
If the median price for a home is $200,000, then 50% of the homes cost less than $200,000.
The interquartile range (IQR) of a data set is the difference between the 3rd and 1st quartiles.
In a box plot, if the median is in the center of the box and the left and right whiskers are equidistant from their respective quartiles, then the distribution is symmetric.
The geometric mean is the multiplicative average of a data set.
The geometric mean return accurately captures a negative annual return from an investment.
The appropriate measure for evaluating investment returns over several years is the geometric mean.
Which of the following statements regarding the geometric mean is MOST accurate? The geometric mean is less sensitive to extreme values than the arithmetic mean.
We interpret the geometric mean return as the annual return that you will earn from an investment.
Diane wants to calculate her average annual return on an investment that she made three years ago. What value will provide Diane with the most accurate measure of the average annual return? The geometric mean.
When calculating average growth rates, we apply the formula for the geometric mean.
Chebyshev's theorem is applicable when the data have any shape.
What is the relationship between the variance and the standard deviation? The standard deviation is the positive square root of the variance.
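
The five-number summary and the 1.5 × IQR outlier rule from these notes can be sketched as follows. Quartile conventions differ between textbooks and software; this example assumes the "exclusive" method of Python's `statistics.quantiles`, and the data set and function names are made up for illustration.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (minimum, Q1, Q2, Q3, maximum).

    Quartiles use the 'exclusive' method, which is one of several
    common conventions and may differ from a given textbook or tool.
    """
    ordered = sorted(values)  # first step: arrange data in ascending order
    q1, q2, q3 = quantiles(ordered, n=4, method="exclusive")
    return ordered[0], q1, q2, q3, ordered[-1]

def boxplot_outliers(values):
    """Return points farther than 1.5 * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

# Hypothetical data (illustrative only).
data = [2, 4, 5, 7, 8, 10, 30]
summary = five_number_summary(data)
outliers = boxplot_outliers(data)

print(summary)
print(outliers)
```

With these seven values the quartiles fall exactly on observations, so the interquartile range is 6, the upper fence is 19, and only the value 30 is flagged.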
The geometric mean is the appropriate measure to analyze multi-year investments. True.
What is the symbol for the average growth rate? $G_g$.
We use the geometric mean when we calculate an average growth rate.
When calculating average growth rates, we apply the formula for the geometric mean.
In a box plot, if the median is right of center and the left whisker is longer than the right whisker, then the distribution is negatively skewed.
The range is the difference between the largest and smallest values.
The range is not considered a good measure of dispersion because it focuses solely on the extreme values and ignores every other observation in the sample or the population.
Which of the following statements about the mean absolute deviation (MAD) is MOST accurate? MAD is denominated in the same units as the original data.
The average of the absolute differences between the values of the data set and the mean is the mean absolute deviation.
Two widely used measures of dispersion are the variance and the standard deviation.
The notation σ² represents the population variance.
Which of the following statements about variance is MOST accurate? Variance is the average of the squared deviations from the mean.
Converting observations into z-scores is also called standardizing the observations.
Chebyshev's theorem provides the proportion of observations that lie within k standard deviations of the mean. The value k must be greater than 1.
Chebyshev's theorem results in conservative bounds for the percentage of observations falling in a particular interval.
The empirical rule should be applied to data sets that are approximately bell-shaped. True.
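
The notes state when to use the geometric mean but not how it is computed. A minimal sketch, assuming the standard definitions (geometric mean return as the nth root of the product of the growth factors minus 1, and average growth rate as the nth root of the ratio of the last value to the first minus 1), is shown below. The function names and the sample returns and sales figures are hypothetical.

```python
from math import prod

def geometric_mean_return(returns):
    """Multiplicative average of periodic returns given as decimals."""
    n = len(returns)
    return prod(1 + r for r in returns) ** (1 / n) - 1

def average_growth_rate(levels):
    """Average growth rate per period from a series of levels (e.g. sales)."""
    n = len(levels) - 1  # number of growth periods
    return (levels[-1] / levels[0]) ** (1 / n) - 1

# Hypothetical three-year returns, including a negative year.
annual_returns = [0.10, -0.20, 0.25]
g_return = geometric_mean_return(annual_returns)
arithmetic = sum(annual_returns) / len(annual_returns)

# Hypothetical sales levels over two growth periods.
g_growth = average_growth_rate([100, 110, 121])

print(round(g_return, 4), round(arithmetic, 4), round(g_growth, 4))
```

Compounding `g_return` over the three years reproduces the actual ending value of the investment, which the arithmetic mean of the returns would overstate.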
What is the relationship between variance and standard deviation? The standard deviation is the positive square root of the variance.
The average of the sum of squared differences from the mean is the population variance.
The empirical rule is appropriate when the distribution of a variable is symmetric and bell-shaped.
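
To tie the dispersion measures together, the sketch below computes the range, mean absolute deviation, population variance, standard deviation, and z-scores for a small made-up data set, along with the Chebyshev lower bound, assumed here in its standard form 1 - 1/k² for k > 1. All function names and data are illustrative; a sample variance would divide by n - 1 instead of N.

```python
from math import sqrt

def population_variance(values):
    """Average of the squared deviations from the mean (divides by N)."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

def mean_absolute_deviation(values):
    """Average of the absolute deviations from the mean (same units as data)."""
    mu = sum(values) / len(values)
    return sum(abs(x - mu) for x in values) / len(values)

def z_scores(values):
    """Standardize observations: (x - mean) / standard deviation."""
    mu = sum(values) / len(values)
    sigma = sqrt(population_variance(values))
    return [(x - mu) / sigma for x in values]

def chebyshev_lower_bound(k):
    """Minimum proportion within k standard deviations of the mean, k > 1."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

# Hypothetical data (illustrative only).
data = [2, 4, 4, 4, 5, 5, 7, 9]
data_range = max(data) - min(data)
mad = mean_absolute_deviation(data)
variance = population_variance(data)
std_dev = sqrt(variance)  # positive square root of the variance
standardized = z_scores(data)

print(data_range, mad, variance, std_dev)
print(chebyshev_lower_bound(2))  # at least 75% within 2 standard deviations
```

The Chebyshev bound holds for any shape of distribution, which is why it is more conservative than the empirical rule's figure of roughly 95% within two standard deviations for bell-shaped data.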