Chapter 2

School

Camosun College

Course

116

Subject

Statistics

Date

Feb 20, 2024

Type

pdf

Pages

15

Uploaded by MateKookaburaPerson921

Ch2-page 1

Chapter 2 Summarizing Data

2.1 Examining numerical data

1. Graphs

Scatterplot: a plot for describing the relationship between two _______________ variables

Example 1.
a) Description of the plot:
b) Description of the plot:

Ch2-page 2

Dotplots for a single numerical variable: each dot represents an ________________. Suitable for small data sets.

Example 2. Describe the shape, centre, and variability of the distribution of Interest Rate based on the dotplot:

Time Series Plot (text page 78)

The histogram and box plots below show the distribution of finishing times for male and female winners of the New York Marathon between 1970 and 1999. Describe the trend of the finishing times for males and females:

Class EX1. Describe the trend of the total number of reported COVID-19 cases in BC
https://experience.arcgis.com/experience/a6f23959a8b14bfa989e3cda29297ded

Ch2-page 3

Frequency table for a single variable:

Example 3. Frequency table of the loan50 data set
Classes/bins/intervals:
Frequency:
Relative frequency:

Q: What are the most common interest rates?

Histograms for a single numerical variable: the taller the bars, the higher the _________ of data. Suitable for large data sets.

Example 4. Describe the shape, centre, and variability of the distribution of Interest Rate based on the histogram:

A mode is represented by a prominent ____________ in the distribution.

Ch2-page 4

______________ ______________ ________________

Class EX2. Describe the shape, mode, and spread of the distribution of COVID cases reported in BC

2. Descriptive Statistics

Example 5. Highest daily temperature of 5 summer days in Victoria
Data: 17 24 26 26 28
Notation: n = 5

Measures of the Centre of a distribution:

1) Mean (the balance point): the sample mean of $x_1, x_2, \dots, x_n$ is
$$\bar{x} = \frac{x_1 + \dots + x_n}{n}$$

Ch2-page 5

Q1. What is the mean of the 5 sample temperatures in the example? Use the STAT 0 mode of the Sharp EL-531 calculator to compute.

Computing mean using STAT 0:
Step 1: Set Stat 0 mode: MODE 1 0
Step 2: Enter data using the “M+” key as “Enter”: 17 M+ 24 M+ 26 M+ 26 M+ 28 M+
Step 3: Compute the mean: RCL $\bar{x}$ (button 4)

2) Median: the value in the middle, or the average of the two middle values, in a sorted set of values.
Q2. What is the median of the 5 sample temperatures in the example?
Class EX3. Find the median of 17, 24, 28, 26

3) Mode: the value(s) that occur(s) most often. In the example, mode = ______

4) Weighted mean
Example 6. A student’s course grade is determined as follows: HW = 20%, tests = 40%, final = 40%. Sarah has 85 in HW, 70 in tests, and 75 in final. Compute Sarah’s course average.

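As a cross-check on the calculator steps and definitions above, here is a minimal Python sketch (Python is not part of the original handout). It uses the standard-library `statistics` module on the Example 5 temperatures and the Class EX3 data, and computes Sarah's weighted mean from Example 6. The variable names and the `weighted_mean` helper are illustrative assumptions, not notation from the course.

```python
from statistics import mean, median, multimode

# Example 5: highest daily temperature of 5 summer days in Victoria
temps = [17, 24, 26, 26, 28]

temp_mean = mean(temps)        # sum of the values divided by n
temp_median = median(temps)    # middle value of the sorted data
temp_modes = multimode(temps)  # list of the most frequent value(s)

# Class EX3: with an even number of values, the median is the
# average of the two middle values after sorting
ex3_median = median([17, 24, 28, 26])


def weighted_mean(values, weights):
    """Weighted mean: sum of weight * value, divided by the sum of the weights."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


# Example 6: HW = 20%, tests = 40%, final = 40%
sarah_average = weighted_mean([85, 70, 75], [0.20, 0.40, 0.40])

print(temp_mean, temp_median, temp_modes, ex3_median, round(sarah_average, 2))
```

Dividing by the sum of the weights means the helper also works when the weights are given as 20, 40, 40 instead of proportions that add to 1.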
Ch2-page 6

Mean vs Median and skewness
a) If mean >> median, then the distribution is _____________ skewed
b) If mean << median, then the distribution is _____________ skewed
c) If mean ≈ median, then the distribution is nearly ______________

Choose ____________ over mean when distribution is very ______________
Use ____________ when distribution is nearly _______________

Example 7.
Data A: 10, 15, 20, 26, 50    mean = _______________________, median = ____________
Data B: 10, 15, 20, 26, 200   mean = _______________________, median = ____________

Measures of the Spread (or variability) of a distribution:

a) Sample variance for $x_1, x_2, \dots, x_n$:
$$S^2 = \frac{(x_1 - \bar{x})^2 + \dots + (x_n - \bar{x})^2}{n - 1}$$

b) Sample Standard Deviation (SD) for $x_1, x_2, \dots, x_n$:
$$S = \sqrt{\frac{(x_1 - \bar{x})^2 + \dots + (x_n - \bar{x})^2}{n - 1}}$$

SD = $\sqrt{\text{Var}}$ = _____________

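The two formulas above translate directly into code. The following Python sketch (an illustration added here, not part of the handout) implements them term by term and applies them to Data A and Data B from Example 7. The function names `sample_variance` and `sample_sd` are assumptions made for this example.

```python
from math import sqrt
from statistics import mean, median


def sample_variance(data):
    """Sample variance: squared deviations from the mean, summed, over n - 1."""
    x_bar = mean(data)
    return sum((x - x_bar) ** 2 for x in data) / (len(data) - 1)


def sample_sd(data):
    """Sample standard deviation: square root of the sample variance."""
    return sqrt(sample_variance(data))


# Example 7: the two data sets differ only in their largest value
data_a = [10, 15, 20, 26, 50]
data_b = [10, 15, 20, 26, 200]

for name, data in (("A", data_a), ("B", data_b)):
    print(name, "mean =", mean(data), "median =", median(data),
          "SD =", round(sample_sd(data), 2))
```

Running it shows that replacing 50 with 200 changes the mean and the SD but leaves the median where it was, which is the comparison Example 7 is built around.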
Ch2-page 7

Example 8. At a birthday lunch, five friends’ meal costs (in dollars) are: 17, 18, 18, 22, 27
Calculate the sample variance and standard deviation by hand using the formulas.

Computing sample SD and variance using STAT 0 (if data are already entered, go to Steps 3 and 4):
Step 1: Set Stat 0 mode: MODE 1 0
Step 2: Enter data using the “M+” key as “Enter”
Step 3: Calculate the sample SD: RCL Sx (button 5)
Step 4: Calculate the sample variance: while Sx is displayed on the screen, press the x² key and =

Note: Population SD and Sample SD are different. We compute sample SD only.

Class EX4. Use STAT 0 to calculate the mean, sample standard deviation $S$, and sample variance $S^2$ for data: 5, 6, 11.

Notations
Population mean: $\mu$
Sample mean: $\bar{x}$
Population SD: $\sigma$
Sample SD: $S$

Measure of Position:

1) Z-score: measures how many SDs a value is away from the mean.
Z = 1.5 means x is _________ SDs _________ the mean, and Z = -1.5 implies x is _________ SDs _________ the mean.

Ch2-page 8

$Z = \frac{x - \mu}{\sigma}$: Z-score of x in a ________________ with mean _____ and SD ________

$Z = \frac{x - \bar{x}}{S}$: Z-score of x in a ___________ with mean ______ and SD _______

Z-score Rule of Thumb: X is an extreme value if its Z < -2 or Z > 2.

Example 9. Suppose you earned 80% on a test and the class mean is 67% and class SD is 10%. What’s your relative position in the class?

Example 10. Sarah’s test 1 score is 70 and the class has a mean of 67 and SD of 10. Sarah’s test 2 score is 68 and the class has a mean of 60 and SD of 12. Has Sarah’s standing in the class changed? If so, for better or worse?

2) Quartiles: Q1, Q2, Q3 divide data into approximately 4 equal quarters

Draw

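Returning to the Z-score formulas on this page, the short Python sketch below (added for illustration, not part of the handout) evaluates them for Examples 9 and 10 and applies the rule of thumb. The function name `z_score` and the variable names are assumptions.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


# Example 9: test score of 80 in a class with mean 67 and SD 10
z_test = z_score(80, 67, 10)

# Example 10: Sarah's two tests, each compared with its own class
z_test1 = z_score(70, 67, 10)
z_test2 = z_score(68, 60, 12)

for label, z in (("Example 9", z_test), ("Test 1", z_test1), ("Test 2", z_test2)):
    # Rule of thumb from the notes: Z < -2 or Z > 2 marks an extreme value
    print(label, round(z, 2), "extreme" if abs(z) > 2 else "not extreme")
```

Because each score is divided by its own class SD, the two test results in Example 10 end up on a common scale and can be compared directly.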
Ch2-page 9

Five summary statistics: min, Q1, Q2, Q3, max

Interquartile range = IQR = Q3 - Q1

Example 11. Find the 3 quartiles and IQR of the following 11 major earthquake magnitudes (sorted):
5.4 6.2 6.2 6.4 6.4 6.5 7.0 7.2 7.2 7.7 8.0

Class EX5. Find the 5 summary statistics and IQR for
6.2 6.4 6.5 7.0 7.2 7.2 7.7 8.0

Robust measure of spread: IQR
Use IQR, instead of SD, to measure variability or ___ ______ when extreme values are present.

An outlier is an observation that appears extreme relative to the rest of the data.

1.5(IQR) rule: X is an outlier if
X < Q1 - 1.5(IQR), i.e., more than 1.5 × IQR below Q1, or
X > Q3 + 1.5(IQR), i.e., more than 1.5 × IQR above Q3

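The five-number summary and the 1.5(IQR) rule can be sketched in Python as follows (an illustration, not part of the handout). Quartile conventions differ between textbooks and software. This sketch assumes the median-of-halves method, where Q1 and Q3 are the medians of the lower and upper halves and the overall median is left out of both halves when n is odd. The function names are hypothetical.

```python
from statistics import median


def five_number_summary(data):
    """Return (min, Q1, Q2, Q3, max) using the median-of-halves method."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]            # values below the middle position
    upper = values[half + n % 2:]    # values above it (skips the median if n is odd)
    return values[0], median(lower), median(values), median(upper), values[-1]


def outliers(data):
    """Values that fall outside the 1.5(IQR) fences."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]


# Example 11: magnitudes of 11 major earthquakes
quakes = [5.4, 6.2, 6.2, 6.4, 6.4, 6.5, 7.0, 7.2, 7.2, 7.7, 8.0]
summary = five_number_summary(quakes)
iqr = summary[3] - summary[1]
print(summary, round(iqr, 2), outliers(quakes))
```

Statistical software often interpolates between data points instead, so its quartiles may differ slightly from hand calculations done this way.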
Ch2-page 10

Example 12. Are there any outliers in the 11 earthquake data points?

Boxplots: Draw a simple boxplot by hand (label the boxplot with the 5 number summary)

Class EX6. Sketch a boxplot for the 11 earthquake data points: min = 5.4, Q1 = 6.2, Q2 = 6.5, Q3 = 7.2, max = 8.0
1st: draw a scaled line
2nd: locate the 5 number summary
3rd: make a boxplot

Ch2-page 11

Computer programs usually draw boxplots that help identify outliers, e.g.,

Identify shape of distributions based on boxplots

Ch2-page 12

2.2 Considering categorical data

1. Describing a single categorical variable with a Frequency Table and a Bar Graph

Example 13. Frequency table for homeownership

Bar graph and relative frequency bar graph. Notice there are usually gaps between bars in a bar graph, unlike a histogram, which has no gaps between bins.

Interpret the homeownership bar graphs:

Ch2-page 13

2. Describe two categorical variables in a contingency table

Example 14. Contingency table of homeownership and application type.

Terms: cell, row, column, row total, column total, grand total

To explore whether or not two categorical variables in a contingency table are related, we examine row proportions (or column proportions).

Row proportions of the contingency table of homeownership and app_type are:

                     homeownership
app_type      Rent    Mortgage    Own    Total
individual
joint
total

Now, focus on one column at a time. If there is no association between the two variables, then the row proportions within each column should be nearly the ________________.

Are homeownership and app_type associated?

Ch2-page 14

Class EX7. Consider the two-way table below. Find the row proportions and fill them into the table given.

                          outcome
treatment (simulated)     infection    No infection    Total
vaccine
placebo
total

Now, focus on one column at a time. If there is no association between the two variables, then the row proportions within each column should be nearly the ________________.

Do Treatment and Outcome appear to be associated?

Ch2-page 15

Example 15. Gender and Sports Preference
A researcher wished to see if sport preference and gender are related. She selected a random sample of 80 individuals and asked them which of three sports was their favorite. The results are shown in a 2×3 contingency table below:

Gender    Football    Baseball    Hockey    Total
Male         18          10          4        32
Female       20          16         12        48
Total        38          26         16        80

a) What percent of the respondents prefer Hockey?
b) What percent of the females prefer Hockey?
c) What percent of the males prefer Hockey?
d) Based on the answers for b) and c), would you conclude that gender and sport preference seem to be associated? Explain.

Class EX8. Cont’d with Example 15
e) What percent of the respondents are females?
f) What percent of the respondents who prefer Football are females?
g) What percent of the respondents who prefer Baseball are females?
h) Based on the answers for f) and g), would you conclude that gender and sport preference seem to be associated? Explain.

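The percentages asked for in Example 15 and Class EX8 are row and column proportions of the table. The Python sketch below (added for illustration, not part of the handout) computes them from the counts given above. The dictionary layout and variable names are assumptions chosen for readability.

```python
# Counts from the 2x3 contingency table in Example 15
counts = {
    "Male":   {"Football": 18, "Baseball": 10, "Hockey": 4},
    "Female": {"Football": 20, "Baseball": 16, "Hockey": 12},
}
sports = ["Football", "Baseball", "Hockey"]

row_totals = {g: sum(row.values()) for g, row in counts.items()}
col_totals = {s: sum(counts[g][s] for g in counts) for s in sports}
grand_total = sum(row_totals.values())

# Row proportions: each cell divided by its row total
row_props = {g: {s: counts[g][s] / row_totals[g] for s in sports} for g in counts}

# Column proportions: each cell divided by its column total
col_props = {g: {s: counts[g][s] / col_totals[s] for s in sports} for g in counts}

print(f"Hockey overall: {col_totals['Hockey'] / grand_total:.1%}")
for g in counts:
    print(f"Hockey among {g}: {row_props[g]['Hockey']:.1%}")
print(f"Female overall: {row_totals['Female'] / grand_total:.1%}")
for s in ("Football", "Baseball"):
    print(f"Female among {s} fans: {col_props['Female'][s]:.1%}")
```

Parts b) and c) use row proportions (the denominator is a gender total), while parts f) and g) use column proportions (the denominator is a sport total). Choosing the right denominator is the main thing to check in each question.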