5A2 Fundamentals of Statistics (Filled) 8e

School

Lansing Community College

Course

119

Subject

Mathematics

Date

Feb 20, 2024

Uploaded by HighnessGiraffeMaster1026

Math 119 5A Fundamentals of Statistics

Student Learning Outcomes:
Select and use appropriate statistical methods to summarize and analyze data in context.
Develop and evaluate inferences and predictions that are based on data in context.

Learning Objectives:
Define statistics, population, sample, population parameters, and sample statistics.
Describe 5 common sampling methods.
Define bias and explain how it can affect a statistical study.
Distinguish between observational studies, experiments, and retrospective studies.
For experiments, distinguish between a treatment group and a control group.
Understand the placebo effect and the importance of anonymizing experiments.
Define and interpret a margin of error and confidence interval.

Lead-in question:
In this section we will be discussing statistics and how we gather data. Humans have learned a lot about how to gather data properly so that our conclusions are valid. You do not want to gather data in a way that might lead you to false results.
Statistics (singular) is the science of collecting, organizing, and interpreting data.
Statistics (plural) are the data that describe or summarize something.
One of the things that we will discuss is the placebo effect. Check out the two videos below.
BBC Documentary: https://www.youtube.com/watch?v=HqGSeFOUsLI
Placebo Effect: https://www.youtube.com/watch?v=z03FQGlGgo0
Write a sentence explaining why someone who is collecting data or running an experiment needs to be aware of the placebo effect.
The placebo effect can lead us to incorrect conclusions if we are not aware of it. People perceive positive effects of treatments even when there might not be any, so we have to control for this when collecting data or running an experiment.

I. How Statistics Works
The population in a statistical study is the complete set of people or things being studied (the what/who we are trying to learn about; its description will often contain the word “all”).
The sample is the subset of the population from which the raw data are actually obtained (the what/who we have data on; its description will often give the exact number).
Parameters are specific characteristics of the population that a statistical study is designed to estimate.
Statistics are numbers or observations that summarize the raw data; since the sample is what we have in our hands to work with, we can compute these.
In general, we compute statistics from the sample in order to estimate the corresponding parameters of the population. To help with remembering this terminology, the s’s go together (sample statistic) and the p’s go together (population parameter).

Example: Let’s say we wanted to study the LCC student body.
a. This is our population.
b. Are the students in our Math 119 class a sample? If so, are they a good sample?
Yes, our Math 119 class is a subset (small collection) of the students who attend LCC. One section of Math 119, however, is probably not representative of the LCC student body as a whole, since the people in our section may have certain traits in common that differ from the general population. (Do we prefer online or face-to-face classes? Do we prefer to meet at certain times during the day? Which days do we prefer to meet? Etc.) In order to get a sample that accurately represents the entire student body, we would have to be strategic about how we selected the students in our sample.
c. Will we be able to extend what we learn from the sample to the population in this case?
We cannot generalize our results to the entire LCC student body since our sample does not represent the student body as a whole. However, our sample might be useful to say something about Math 119 students who prefer online or face-to-face classes.

Example: In order to gauge public opinion on the President’s plan to contain Iran’s nuclear program, the Pew Research Center surveyed 1001 Americans by telephone to find the percentage that favor the plan.
a. Identify the population and sample.
Since the sample is typically described in more detail in the wording of a problem, I suggest you start with that.
Sample: the 1001 Americans surveyed by telephone (the what/who we have data on, and thus what we have in our hands to work with)
Population: the set of all Americans (the what/who we are trying to learn about)
b. Describe in words the population parameter and the sample statistic.
The sample statistic is the percentage of the 1001 Americans surveyed (our sample) who favor the President’s plan. We have the data, so we would be able to compute this percentage.
The parameter would be something like the percentage of all Americans who favor the President’s plan. We would take the sample statistic and use it as an estimate for the whole population.
Note that the parameter and the statistic are always the same type of number (percentage, average, etc.).

II. Choosing a Sample
If the sample fairly represents the population as a whole, then it’s reasonable to make inferences from the study. If the sample is NOT representative, then there’s little hope of drawing accurate conclusions about the population.
A representative sample is a sample in which the relevant characteristics of the sample members match those of the population.

Common Sampling Methods
Simple random sampling: We choose a sample of items in such a way that every sample of the same size has an equal chance of being selected.
Systematic sampling: We use a simple system to choose the sample, such as selecting every 10th or every 50th member of the population.
Convenience sampling: We choose a sample that is convenient to select, such as people who happen to be in the same classroom.
Cluster sampling: We first divide the population into groups, or clusters, and select some of these clusters at random. We then obtain the sample by choosing all the members within each of the selected clusters.
Stratified sampling: We use this method when we are concerned about differences among subgroups, or strata, within a population. We first identify the subgroups and then draw a simple random sample within each subgroup.
The total sample consists of all the samples from the individual subgroups.
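The five sampling methods can be compared side by side in a short Python sketch. This is only an illustration under stated assumptions: the population of 4 class sections with 10 students each, the function names, and the seed are all hypothetical and do not come from the worksheet.

```python
import random


def simple_random_sample(population, n, rng):
    """Every subset of size n is equally likely to be chosen."""
    return rng.sample(population, n)


def systematic_sample(population, step, start=0):
    """Take every `step`-th member, beginning at index `start`."""
    return population[start::step]


def convenience_sample(population, n):
    """Take whoever is easiest to reach; here, simply the first n members."""
    return population[:n]


def cluster_sample(groups, k, rng):
    """Pick k whole groups at random and keep every member of each."""
    chosen = rng.sample(sorted(groups), k)
    return [member for name in chosen for member in groups[name]]


def stratified_sample(groups, n_per_group, rng):
    """Draw a simple random sample of size n_per_group inside every group."""
    sample = []
    for name in sorted(groups):
        sample.extend(rng.sample(groups[name], n_per_group))
    return sample


# Hypothetical population: 4 class sections with 10 students each.
groups = {
    f"section{s}": [f"section{s}_student{i}" for i in range(10)]
    for s in range(1, 5)
}
population = [m for name in sorted(groups) for m in groups[name]]
rng = random.Random(119)  # fixed seed so a run can be repeated

srs = simple_random_sample(population, 8, rng)
systematic = systematic_sample(population, 5)
convenience = convenience_sample(population, 8)
cluster = cluster_sample(groups, 2, rng)
stratified = stratified_sample(groups, 2, rng)
```

Note how the last two functions differ only in where the randomness is applied: `cluster_sample` randomly chooses some groups and keeps everyone in them, while `stratified_sample` visits every group and randomly chooses some members of each.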
Usually, it is “cluster” and “stratified” that are the most difficult for students to distinguish. In both of them, we group the population. But in cluster sampling, we take all members of some of the groups. In stratified sampling, we take some members of every group.
The study can be successful only if the sample is representative of the population.
Sample size is important, because a large well-chosen sample has a better chance of being representative than a small one. However, the selection process is even more important: A small well-chosen sample is likely to give better results than a large poorly chosen sample.

Example: Identify the type of sampling in each of the following cases.
a. You are conducting a survey of students in a dormitory. You choose your sample by knocking on the door of every fifth room.
Systematic
b. To survey opinions on a proposed new water line, a research firm randomly draws the addresses of 200 homeowners from a public list of all homeowners.
Simple Random
c. To see how people like its new product, a small beverage company conducts a taste test, offering the taste test to customers at a local grocery store.
Convenience
d. Agricultural inspectors for Jefferson County check the levels of residue from three common pesticides on 25 ears of corn from each of the 104 corn-producing farms in the county.
Stratified (grouped by corn-producing farm, and they have checked 25 ears from each farm)
e. Anthropologists determine the average brain size of early Neanderthals in Europe by measuring all of the skulls found at three selected sites in Europe. These three sites were selected randomly from the many known Neanderthal sites.
Cluster (grouped by Neanderthal site, and they have measured all skulls from the selected sites)

A statistical study suffers from bias if its design or conduct tends to favor certain results.

III. Types of Statistical Study
In an observational study, researchers observe or measure characteristics of the sample members but do not attempt to influence or modify these characteristics.
Example of an observational study: During the COVID-19 pandemic, public health officials seek to determine whether hospitals are overwhelmed by contacting hospital administrations and asking what fraction of their regular and intensive care unit (ICU) beds are occupied.
A retrospective study is an observational study that uses data from the past, such as official records or past interviews. Sometimes it may be impractical or unethical to conduct an experiment. A retrospective study is observational because the researchers do not change the behavior of the participants, but it resembles an experiment because the cases effectively represent a treatment group and the controls represent a control group. The study is observational, but natural groups are formed.
Example of a retrospective study: Suppose we want to study how marijuana use during pregnancy affects newborn babies.
Because it is already known that marijuana can be harmful during pregnancy, it would be unethical to divide a sample of pregnant mothers randomly into two groups and then force the members of one group to use marijuana. The cases consist of mothers who used marijuana (by choice) during a past pregnancy, and the controls consist of mothers who did not use marijuana.
In an experiment, researchers apply a treatment to some or all of the sample members and then look to see whether the treatment has any effects.
Example of an experiment:
In a study about the effects of vitamin C, researchers gave pills to 10,000 people: 5,000 in a treatment group received vitamin C and 5,000 in a control group received a sugar pill. Researchers then looked for differences in the numbers of colds among people in the two groups.
o The treatment group in an experiment is the group of sample members who receive the treatment being tested.
o The control group in an experiment is the group of sample members who do not receive the treatment being tested.
o It is important for the treatment and control groups to be selected randomly and to be alike in all respects except for treatment.
o For example, if the treatment group consisted of active people with good diets and the control group consisted of sedentary people with poor diets, we could not attribute any differences in colds to vitamin C alone.

IV. Placebo Effect and Anonymizing
For an experiment involving people, using a treatment and control group may not be enough. People can be affected by their beliefs as well as by real treatments. For example, stress and other psychological factors have been shown to affect resistance to colds.
A placebo lacks the active ingredients of a treatment being tested in a study but looks or feels enough like the treatment so that participants cannot determine whether they are receiving the placebo or the real treatment.
The placebo effect refers to the situation in which patients improve simply because they believe they are receiving a useful treatment.
In statistical terminology, the practice of keeping people in the dark about who is in the treatment group and who is in the control group is called anonymizing. You may also hear of this as “blinding.”
An experiment is single-anonymizing if the participants do not know whether they are members of the treatment group or members of the control group, but the experimenters do know. You may also hear of this as single-blind.
An experiment is double-anonymizing if neither the participants nor the experimenters (people administering the treatment) know who belongs to the treatment group and who belongs to the control group. You may also hear of this as double-blind. Example: Determine whether each of the studies described is observational or an experiment. If the study is an experiment, identify the control and treatment groups and discuss whether making the study single- or double-anonymous is necessary.
If the study is observational, state whether it is a retrospective study, and if so, identify the cases and controls.
a. What is the average income of stockbrokers?
Observational, not retrospective
b. Do seat belts save lives?
This could be done as an experiment, but only if we did not use real humans! Crash test dummies are used to look at effects like this. Check out this one at a speed of about 25 miles per hour: https://www.youtube.com/watch?v=9_Af8w2SAT4
If we wanted actual human data, this would have to be observational and retrospective.
Cases: People in car accidents who were wearing seat belts.
Controls: People in car accidents who were not wearing seat belts.
We would then examine the fatality rate in each group. (Spoiler alert: seat belts save lives!)
c. Can lifting weights improve runners’ times in a 10-kilometer race?
Experiment
Treatment: Runners who lifted weights
Control: Runners who did not lift weights
We would then compare time improvements between the two groups. This experiment cannot be anonymized since runners will know whether they are lifting weights or not.
d. Can a new herbal remedy reduce the severity of colds?
Experiment
Treatment: People who use the herbal remedy for a cold.
Control: People who don’t use the herbal remedy for a cold.
This experiment can be double-anonymized. We can prepare a placebo that does not contain the actual remedy for our control group to take. We can also arrange for the person administering the treatment not to know whether the patient is taking the placebo or the actual remedy.

V. Confidence Intervals and Margin of Error
When collecting data, we are never 100% certain that our survey results actually match the population. Survey and poll results that estimate an average or a proportion usually include a margin of error. Think of this like “wiggle room” that we give ourselves when reporting our results. We use this wiggle room to construct what is called a confidence interval.
A confidence interval is a range of values that we believe contains the population parameter. It captures the likely values of the population parameter. We treat every value in the interval as plausible, no matter where in the interval it lies.
A confidence interval is: from (sample statistic − margin of error) to (sample statistic + margin of error).

Example: Suppose you work for a major news station and hope to report who will win an election before the other major networks. An election eve poll finds that 52% of surveyed voters plan to vote for Smith, and she needs a majority (more than 50%) to win without a runoff. The margin of error in the poll is 3 percentage points.
A) Construct a confidence interval and write a sentence interpreting it.
(52% − 3%, 52% + 3%) = (49%, 55%)
Based on our confidence interval, we project Smith will receive between 49% and 55% of the vote.
B) Should you report that Smith is going to win the election?
No! Since the lower bound of our confidence interval is below 50%, there is a possibility that Smith receives less than 50% of the vote.

Does this make sense? In presidential elections, a candidate typically just needs the highest percentage of votes in a state to win that state. This means they don’t need over 50% to win; they just need the highest percentage of the vote.
In a 2016 exit poll, CNN reported with a 3.5% margin of error that Hillary Clinton would win the state of Wisconsin and receive 3.9% more votes than Donald Trump.
A) What is the population being studied?
Population: all voters in Wisconsin
B) Construct a confidence interval using the sample statistic and margin of error. Write a sentence interpreting the interval.
(3.9% − 3.5%, 3.9% + 3.5%) = (0.4%, 7.4%)
Based on our confidence interval, we project Hillary Clinton will receive between 0.4% and 7.4% more of the vote than Donald Trump.
C) Based on the poll and the margin of error, should you expect that Hillary Clinton won the state of Wisconsin?
Yes, since our entire confidence interval is above 0, we would expect Hillary Clinton to have won the state of Wisconsin by at least 0.4% of the vote (and by at most 7.4% of the vote). Note that if any part of our confidence interval had been negative, meaning that 0 lies within the interval, then we would expect a Donald Trump win to be possible.
D) In 2016, Donald Trump won the state of Wisconsin by receiving 47.9% of the vote compared to Hillary Clinton’s 46.9% (a difference of about 1 percentage point). Does this mean that CNN did something wrong when collecting or reporting the data?
Most of the major news networks reported Hillary Clinton was going to win Wisconsin, so this was not a reporting error. Check out some of the polls that the projections were based on here: (https://projects.fivethirtyeight.com/2016-election-forecast/wisconsin/)
This does indicate that the samples taken may not have been representative of the population! This caused many pollsters to change their methods of data collection (especially when this phenomenon repeated itself in the 2020 election!). You can check out this article to see how polling has changed since the 2016 election: https://www.pewresearch.org/methods/2023/04/19/how-public-polling-has-changed-in-the-21st-century/
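The interval arithmetic in Section V can be sketched in a few lines of Python. This is a minimal illustration, not a full statistical procedure: the function names `confidence_interval` and `clears_threshold` are hypothetical, the margin of error is taken as given rather than computed from the sample, and values are rounded to one decimal place to avoid floating-point noise.

```python
def confidence_interval(statistic, margin_of_error, ndigits=1):
    """Return (lower, upper) = statistic minus/plus the margin of error."""
    lower = round(statistic - margin_of_error, ndigits)
    upper = round(statistic + margin_of_error, ndigits)
    return (lower, upper)


def clears_threshold(interval, threshold):
    """True only when the whole interval lies above the threshold."""
    lower, _upper = interval
    return lower > threshold


# Election-eve poll: 52% for Smith, margin of error 3 points,
# and she needs more than 50% to win without a runoff.
smith = confidence_interval(52.0, 3.0)
smith_safe = clears_threshold(smith, 50.0)

# Wisconsin exit poll: projected lead of 3.9 points, margin of error 3.5.
# A lead above 0 for the whole interval points to a win.
lead = confidence_interval(3.9, 3.5)
lead_safe = clears_threshold(lead, 0.0)

print("Smith interval:", smith, "safe to call:", smith_safe)
print("Clinton lead interval:", lead, "safe to call:", lead_safe)
```

The check compares only the lower bound with the threshold, which mirrors the reasoning in both examples: a result is projected only when every value in the interval is on the winning side. As the Wisconsin outcome shows, passing this check does not protect against a sample that is not representative.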