STAT200+2023+GE2.pdf

School

University of Delaware

Course

200

Subject

Statistics

Date

Apr 3, 2024

Type

pdf

Pages

6

STAT 200 Guided Exercise 2

Be sure to:
  • Submit your answers in a Word or PDF file to Canvas at the place you downloaded the file. You can paste Excel/JMP output into a Word file.
  • Submit only one file for the assignment.
  • It is OK to do problems by hand. However, you will need to scan or take a picture of your work.
  • Guided Exercises are not graded, but we check the work.

Key Topics
  • Measures of Central Tendency
  • Stem & Leaf Plot and describing distributions
  • Using Excel to graph data

1. Let's finish up the Academy Award winners for best actor (and actress) since 1996, the data set given in Assignment 1, now that we have command of both central tendency and variability. Each year the Academy of Motion Picture Arts and Sciences gives an award for the best actor and actress in a motion picture. We have recorded the name and age of each winner since 1996. The data for males and females are given below (the sample size, n = 27). The sum of their ages as well as the sum of their ages squared are also given.

YEAR  ACTOR                   MALE AGE  ACTRESS            FEMALE AGE
1996  Geoffrey Rush           45        Frances McDormand  39
1997  Jack Nicholson          60        Helen Hunt         34
1998  Roberto Benigni         46        Gwyneth Paltrow    26
1999  Kevin Spacey            40        Hilary Swank       25
2000  Russell Crowe           36        Julia Roberts      33
2001  Denzel Washington       47        Halle Berry        35
2002  Adrien Brody            29        Nicole Kidman      35
2003  Sean Penn               43        Charlize Theron    28
2004  Jamie Foxx              37        Hilary Swank       30
2005  Philip Seymour Hoffman  38        Reese Witherspoon  29
2006  Forest Whitaker         45        Helen Mirren       61
2007  Daniel Day-Lewis        50        Marion Cotillard   32
2008  Sean Penn               48        Kate Winslet       33
2009  Jeff Bridges            60        Sandra Bullock     45
2010  Colin Firth             50        Natalie Portman    29
2011  Jean Dujardin           39        Meryl Streep       62
2012  Daniel Day-Lewis        55        Jennifer Lawrence  22
2013  Matthew McConaughey     44        Cate Blanchett     44
2014  Eddie Redmayne          32        Julianne Moore     54
2015  Leonardo DiCaprio       41        Brie Larson        26
2016  Casey Affleck           41        Emma Stone         28
2017  Gary Oldman             59        Frances McDormand  60
2018  Rami Malek              37        Olivia Colman      45
2019  Joaquin Phoenix         45        Renee Zellweger    50
2020  Anthony Hopkins         83        Frances McDormand  63
2021  Will Smith              53        Jessica Chastain   44
2022  Brendan Fraser          54        Michelle Yeoh      60

               Males   Females
Sum X          1257    1072
Sum X-squared  61635   47012
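One way to turn the sums above into summary statistics is the computational (shortcut) formula for the variance. The sketch below assumes the sample form with n - 1 in the denominator, which the handout does not state explicitly; the function name `summary_from_sums` and its dictionary keys are illustrative, not part of the course files.

```python
import math


def summary_from_sums(n, sum_x, sum_x2):
    """Mean, variance, standard deviation, and CV from n, Sum X, and Sum X-squared.

    Assumption: sample variance (divide by n - 1), not population variance.
    """
    mean = sum_x / n
    # Shortcut formula: sum of squared deviations = Sum X^2 - (Sum X)^2 / n
    variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)
    std_dev = math.sqrt(variance)
    cv_percent = 100 * std_dev / mean  # standard deviation as a percent of the mean
    return {
        "mean": mean,
        "variance": variance,
        "std_dev": std_dev,
        "cv_percent": cv_percent,
    }


males = summary_from_sums(27, 1257, 61635)
females = summary_from_sums(27, 1072, 47012)

for label, stats in (("Males", males), ("Females", females)):
    print(label, {name: round(value, 2) for name, value in stats.items()})
```

The median, mode, and range cannot be recovered from the sums alone; those still come from the sorted ages.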
Here are the stem and leaf plots for each group, to compare the distributions. Each stem is split into two rows: leaves 0 to 4 on the first row and leaves 5 to 9 on the second.

Male Actor Age                  Female Actor Age
Stem | Leaf          Count      Stem | Leaf            Count
  2  |                            2  | 2               1
  2  | 9             1            2  | 5 6 6 8 8 9 9   7
  3  | 2             1            3  | 0 2 3 3 4       5
  3  | 6 7 7 8 9     5            3  | 5 5 9           3
  4  | 0 1 1 3 4     5            4  | 4 4             2
  4  | 5 5 5 6 7 8   6            4  | 5 5             2
  5  | 0 0 3 4       4            5  | 0 4             2
  5  | 5 9           2            5  |
  6  | 0 0           2            6  | 0 0 1 2 3       5
  6  |                            6  |
  7  |                            7  |
  7  |                            7  |
  8  | 3             1            8  |
4|5 is 45 years old             4|5 is 45 years old

a. Calculate the measures of central tendency and variability for each group.

                          Males   Females
Mean
Median
Mode
Range
Variance
Standard Deviation
Coefficient of Variation

b. Briefly compare the two distributions, with an emphasis on the measures of central tendency and variability.

c. For both men and women there are a few outliers. For men there is one individual with a value of 83. For women there are winners aged 60, 61, 62, and 63. Calculate z-scores for these values and interpret their meaning.

d. The value of 83 for males is a large outlier. Sometimes we decide to remove an outlier from an analysis. That should never be done lightly. However, large outliers can have a large impact on summary statistics based on the mean. Let's remove the value of 83 from the data and see what happens. We want to see whether the outlier has much influence on the mean, median, standard deviation, and CV.

How do you calculate things without the outlier? If you are using the data in Basic Stats.xlsx, just delete the outlier. The file will immediately recalculate for the other 26 values. If you are using the sums of X and X-squared, here are the values you need: Sum(X) = 1174; Sum(X-squared) = 54746; N = 26.

          Full Data   Outlier Removed
Mean
Median
Std Dev
CV

2. Below are the data on infant mortality for 37 OECD countries in 2020. The Organization for Economic Co-operation and Development (OECD) is an international economic organization of 37 countries, founded in 1961 to stimulate economic progress and world trade. It is a forum of countries describing themselves as committed to democracy and the market economy, providing a platform to compare policy experiences, seek answers to common problems, identify good practices, and coordinate domestic and international policies of its members (Wikipedia).

OECD's web site provided some data on infant mortality for 37 countries. Infant mortality (the rate of death of children under 1 year of age per 1,000 live births) is a measure of development. The table below has the data for the 37 countries; the stem and leaf plot lists the same values sorted from smallest to largest (https://data.oecd.org/healthstat/infant-mortality-rates.htm). The actual data, a histogram (from JMP), and the stem and leaf plot for these data are given below. Use the stem and leaf values for some calculations, such as the min and max. For other calculations, Sum(X) and Sum(X-squared) are given.

COUNTRY  TIME  IMR
AUS      2020  3.2
AUT      2020  3.1
BEL      2020  3.3
CAN      2020  4.5
CZE      2020  2.3
DNK      2020  2.4
FIN      2020  1.8
FRA      2020  3.6
DEU      2020  3.1
GRC      2020  3.2
HUN      2020  3.4
ISL      2020  2.9
IRL      2020  3.0
ITA      2020  2.4
JPN      2020  1.8
KOR      2020  2.5
LUX      2020  4.5
MEX      2020  12.3
NLD      2020  3.8
NOR      2020  1.6
POL      2020  3.6
PRT      2020  2.4
SVK      2020  5.1
ESP      2020  2.6
SWE      2020  2.4
CHE      2020  3.6
TUR      2020  8.5
GBR      2020  3.8
USA      2020  5.4
CHL      2020  5.6
EST      2020  1.4
ISR      2020  2.4
SVN      2020  2.2
COL      2020  16.8
LVA      2020  3.5
LTU      2020  2.8
CRI      2020  7.9

Stem | Leaf                        Count
   1 | 4 6 8 8                     4
   2 | 2 3 4 4 4 4 4 5 6 8 9       11
   3 | 0 1 1 2 2 3 4 5 6 6 6 8 8   13
   4 | 5 5                         2
   5 | 1 4 6                       3
   6 |                             0
   7 | 9                           1
   8 | 5                           1
   9 |                             0
  10 |                             0
  11 |                             0
  12 | 3                           1
  13 |                             0
  14 |                             0
  15 |                             0
  16 | 8                           1
A value of 8|5 is an infant mortality of 8.5

Sum(X)          148.700
Sum(X-squared)  925.570
N               37
Q1              2.40
Q3              4.15
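Reading values off the stem and leaf display can be sketched in code as follows. The dictionary `STEM_LEAF` is a hand transcription of the display above (empty stems omitted), and the helper name `expand` is hypothetical. The median-position rule (n + 1) / 2 used here assumes an odd sample size, which holds for n = 37.

```python
# Stem-and-leaf display for infant mortality, transcribed by hand from the text.
# Key: stem 8 with leaf 5 means 8.5 deaths per 1,000 live births.
STEM_LEAF = {
    1: [4, 6, 8, 8],
    2: [2, 3, 4, 4, 4, 4, 4, 5, 6, 8, 9],
    3: [0, 1, 1, 2, 2, 3, 4, 5, 6, 6, 6, 8, 8],
    4: [5, 5],
    5: [1, 4, 6],
    7: [9],
    8: [5],
    12: [3],
    16: [8],
}


def expand(stem_leaf):
    """Rebuild the sorted data values from a stem -> leaves mapping."""
    return [
        (10 * stem + leaf) / 10
        for stem in sorted(stem_leaf)
        for leaf in sorted(stem_leaf[stem])
    ]


values = expand(STEM_LEAF)
n = len(values)

minimum, maximum = values[0], values[-1]  # first and last leaves of the display
median_position = (n + 1) // 2            # valid because n is odd here
median = values[median_position - 1]      # positions count from 1, lists from 0

print("n =", n)
print("min =", minimum, "max =", maximum, "range =", round(maximum - minimum, 1))
print("median is value number", median_position, "=", median)
# Cross-check the transcription against the sums supplied with the data.
print("Sum(X) =", round(sum(values), 3))
print("Sum(X-squared) =", round(sum(v * v for v in values), 3))
```

Checking the rebuilt sums against the printed Sum(X) and Sum(X-squared) is a quick way to catch a mistyped leaf.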
a. Calculate the:
   Mean =
   Median =
   Mode =
   Maximum =
   Minimum =
   Range =
   Inter-Quartile Range =
   Variance =
   Standard Deviation =
   Coefficient of Variation =

b. What is the position of the median value for this data?

c. Does the mode make sense as a measure of Central Tendency for this data?

d. Calculate a z-score for an infant mortality rate of 16.8.

3. Answer the following questions about variability of data sets:

a. How would you describe the variance and standard deviation in words, rather than a formula? Think of what you are calculating and how it might be useful in describing a variable.

b. What is the primary advantage of using the inter-quartile range compared with the range when describing the variability of a variable?

c. Can the standard deviation ever be larger than the variance? Explain.

d. Can the variance ever be negative? Why or why not?

e. Show the formula for the Coefficient of Variation and explain what it is and how it can be useful in comparing the variability of different variables.
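As a worked illustration of the arithmetic behind question 2d, the block below derives the z-score for 16.8 from the Sum(X), Sum(X-squared), and N values supplied with the infant mortality data. It assumes the sample (n - 1) form of the variance, which the exercise does not state explicitly, and the symbols \bar{x}, s^2, s, and z are conventional notation rather than the handout's own.

```latex
\begin{align*}
\bar{x} &= \frac{\sum x}{n} = \frac{148.7}{37} \approx 4.019 \\
s^2 &= \frac{\sum x^2 - \left(\sum x\right)^2 / n}{n - 1}
     = \frac{925.57 - 148.7^2 / 37}{36} \approx 9.110 \\
s &= \sqrt{s^2} \approx 3.018 \\
z &= \frac{x - \bar{x}}{s} = \frac{16.8 - 4.019}{3.018} \approx 4.23
\end{align*}
```

A z-score near 4.2 places the value a little more than four standard deviations above the mean.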
4. Two banks use alternative methods of waiting in line for a teller. Both banks use three tellers. Bank A uses separate lines for each teller, so a customer must pick which line she or he thinks is best. This approach does allow a customer to pick his or her favorite teller. In contrast, Bank B uses a single waiting line, which leads customers to the next available teller out of all tellers available. We take a random sample of 15 customers from each bank and record the waiting time in minutes. We are asked to analyze the data and determine the differences we note between the approaches of the two banks. Use graphs and summary measures of central tendency and variability to explain the differences. In the end, I am asking that you summarize your findings in words and not just numbers.

Here are the data. The data are given below (not sorted), and I provided the Sum(x) and the Sum(x^2). This is an excellent opportunity to use Basic Stats.xlsx. You can copy these two columns of data directly into that Excel file.

          Bank A            Bank B
          (separate lines)  (single line)
Sum(x)    71.50             70.00
Sum(x^2)  360.19            330.68

          5.3               5.0
          2.5               3.8
          5.9               4.9
          4.1               4.3
          5.4               5.0
          3.8               4.7
          5.1               3.9
          4.1               5.4
          4.1               5.1
          5.0               4.0
          5.1               4.1
          5.7               4.5
          3.0               5.1
          5.3               5.4
          7.1               4.8

a. Graph the two banks using stem and leaf plots. Describe the results of your graphs.

Stem and Leaf Plot of Waiting Time at Two Banks

Bank A            Bank B
Stem | Leaf       Stem | Leaf

b. Calculate the following for each bank:

                          Bank A   Bank B
Mean
Median
Mode
Variance
Std Deviation
Minimum
Maximum
Range
Coefficient of Variation

c. Summarize your results in a paragraph.
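For readers working outside Excel, the stem and leaf plots and summary measures for the two banks could be sketched as below. This is one possible approach, not the method in Basic Stats.xlsx; the helper names `stem_and_leaf` and `describe` are hypothetical, and `statistics.stdev` uses the sample (n - 1) form, which is assumed to match the course convention.

```python
import statistics

# Waiting times in minutes, copied from the two data columns above.
bank_a = [5.3, 2.5, 5.9, 4.1, 5.4, 3.8, 5.1, 4.1, 4.1, 5.0, 5.1, 5.7, 3.0, 5.3, 7.1]
bank_b = [5.0, 3.8, 4.9, 4.3, 5.0, 4.7, 3.9, 5.4, 5.1, 4.0, 4.1, 4.5, 5.1, 5.4, 4.8]


def stem_and_leaf(values):
    """Map each whole-minute stem to its sorted tenths-of-a-minute leaves."""
    tenths = sorted(round(v * 10) for v in values)  # 5.3 -> 53, avoiding float noise
    # Include empty stems between the smallest and largest so gaps stay visible.
    stems = {stem: [] for stem in range(tenths[0] // 10, tenths[-1] // 10 + 1)}
    for t in tenths:
        stem, leaf = divmod(t, 10)
        stems[stem].append(leaf)
    return stems


def describe(values):
    """Summary measures of central tendency and variability (sample formulas)."""
    mean = statistics.mean(values)
    std_dev = statistics.stdev(values)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "std_dev": std_dev,
        "range": max(values) - min(values),
        "cv_percent": 100 * std_dev / mean,
    }


for name, data in (("Bank A", bank_a), ("Bank B", bank_b)):
    print(name)
    for stem, leaves in stem_and_leaf(data).items():
        print(f"  {stem} | {' '.join(str(leaf) for leaf in leaves)}")
    print("  ", {key: round(value, 2) for key, value in describe(data).items()})
```

Printing the two displays one after the other makes the difference in spread easy to see before any numbers are compared.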