Ch1 Sampling and Data

School: California State University, San Marcos
Subject: Statistics
Date: Jan 9, 2024
Uploaded by DukeUniverse6135

Chapter 1: Sampling and Data

Statistics is the science of conducting studies to collect, organize, summarize, analyze, and draw conclusions from data.

Data are collections of observations, such as measurements, genders, or survey responses. (A single data value is called a datum, a term that is rarely used.)

A population consists of all subjects (human or otherwise) that are being studied.
A sample is a group of subjects selected from a population.
A census is the collection of data from every member of the population.
A variable is a characteristic or attribute that can assume different values.

Types of Variables

The persons or things that variables are assigned to are called observational units. Variables can be classified into two types:

Qualitative or Categorical Data: Consists of names or labels that are not numbers representing counts or measurements; it places each subject into one of several groups or categories.
Quantitative Data: Consists of numbers representing counts or measurements.

Types of Quantitative Data

Discrete: Possible values are only whole or “countable” numbers. (Household size, number of courses taken)
Continuous: Possible values are infinite and without gaps on some range. (Weight, GPA)

Descriptive statistics consists of the collection, organization, summarization, and presentation of data.

Inferential statistics consists of generalizing from samples to populations, performing estimations and hypothesis tests, determining relationships among variables, and making predictions.

Levels of Measurement

A. The nominal level of measurement is characterized by data that consist of names, labels, or categories only. The data cannot be arranged in an ordering scheme (such as low to high).

Example:

B. Data are at the ordinal level of measurement if they can be arranged in some order, but differences (obtained by subtraction) between data values either cannot be determined or are meaningless.

Example:

C. Data are at the interval level of measurement if they can be arranged in order, and differences between data values can be found and are meaningful. Data at this level do not have a natural zero starting point at which none of the quantity is present.

Example:

[Figure: diagram relating data types (qualitative, quantitative, continuous) to levels of measurement (nominal, ordinal, interval, ratio).]

D. Data are at the ratio level of measurement if they can be arranged in order, differences can be found and are meaningful, and there is a natural zero starting point (where zero indicates none of the quantity is present). For data at this level, differences and ratios are both meaningful.

Example:

Populations & Samples

The population in a statistical study is the entire group of individuals we want information about.
A sample is a part of the population from which we actually collect information. We use a sample to draw conclusions about the entire population.
A parameter is a numerical measurement describing population data.
A statistic is a numerical measurement describing sample data.
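
The distinction between a parameter and a statistic can be sketched in a few lines of Python. The population below is made-up data (hypothetical ages), and the sample size of 50 is an arbitrary choice; the point is only that the same summary (here, a mean) is a parameter when computed on the whole population and a statistic when computed on a sample.

```python
import random
import statistics

# Hypothetical population: ages of 1,000 individuals (made-up data for illustration).
rng = random.Random(42)
population = [rng.randint(18, 90) for _ in range(1000)]

# Parameter: a numerical summary of the entire population.
parameter_mean = statistics.mean(population)

# Statistic: the same summary computed from a sample drawn from that population.
sample = rng.sample(population, 50)
statistic_mean = statistics.mean(sample)

print(f"Population mean (parameter): {parameter_mean:.2f}")
print(f"Sample mean (statistic):     {statistic_mean:.2f}")
```

Drawing a second sample would generally give a slightly different statistic, while the parameter stays fixed.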

Example: In October of 2010, CNN surveyed 888 registered voters in California and found that 53% of them were opposed to Prop 19 (legalization of marijuana) with a margin of error of 3.5%. (Prop 19 was defeated with 54% voting “No.”)

Population:
Sample:
Parameter:
Statistic:

Example: For each of the following values, determine if you have been given a statistic or a parameter.

A. The actual average height of all adult human males in the US is 5' 9.4".
B. 60% of the households sampled from the US own more than one car.
C. Currently there are 18,759 houses in Santee, CA.
D. The average age of a sample of men in Orlando, FL was 43 years.
E. The average SAT score in CA in 1990 was 897.
F. A mall survey found 36% of women prefer lipstick to lip balm.
G. SANDAG estimates that, in 2008, the household median income of San Diego was $66,715.
H. In 2006, 10.1% of deaths in California’s prisons were ruled suicides.

What were the key words that signaled you were dealing with a statistic instead of a parameter?

Example: A political scientist wants to know how college students feel about the Social Security system. She obtains a list of the 3456 undergraduates at her college and mails a questionnaire to 250 students selected at random. Only 104 questionnaires are returned.

a) What is the population in this study? Be careful: what group does she want information about?
b) What is the sample? Be careful: from what group does she actually obtain information?
c) What are some reasons this sample is not representative of the actual population that she’s interested in?

Sampling Methods

The best possible sample would be the entire population of interest. In order to get the most accurate picture of the US population, the government conducts the census.

Example: What are some drawbacks or limitations of the census?

Sometimes complete population data are easy to collect. When that’s not the case, we need to choose a representative sample from our population. This section focuses on ways to effectively gather data in the real world.

Example: What would you want in a good sample?

How do we select a sample from the population?

First, we select a sampling frame: the list of items or subjects we wish to sample from. Ideally it is the same as the entire population of interest, but we may not be able to include the whole population in our sampling frame. Then we use one of the methods described below to choose our sample.

No matter how fair our sampling method is or how large our sample is, there will be sampling variability associated with our sample. This is because each sample will select different people and, therefore, different values for the measured variables (no two samples will be identical).

Good Sampling Methods

Simple Random Sample (SRS): Each member of the population, and every possible sample of size n, has an equal chance of being selected. Examples: drawing names from a hat, using a random number table.

Stratified Random Sample: In a stratified sample the sampling frame is divided into non-overlapping groups, or strata, that have similar characteristics (e.g., geographical areas, age groups, genders). A random sample is taken from each stratum, and then these smaller samples are combined to form the entire sample.
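
The two definitions above can be sketched with Python's standard `random` module. Everything concrete here is an assumption for illustration: the sampling frame of 40 subjects, the two age-group strata, and the sample sizes are invented and do not come from the notes.

```python
import random

rng = random.Random(0)

# Hypothetical sampling frame: 40 subjects, each tagged with an age group (stratum).
frame = [(f"subject_{i}", "under_30" if i < 25 else "30_and_over") for i in range(40)]

# Simple random sample: every subset of size 8 is equally likely to be chosen.
srs = rng.sample(frame, 8)

# Stratified random sample: split the frame into non-overlapping strata,
# take a random sample from each stratum, then combine the pieces.
strata = {}
for subject, group in frame:
    strata.setdefault(group, []).append((subject, group))

per_stratum = 4  # assumed sample size per stratum
stratified = []
for group, members in strata.items():
    stratified.extend(rng.sample(members, per_stratum))

print("SRS:       ", [s for s, _ in srs])
print("Stratified:", [s for s, _ in stratified])
```

The simple random sample may, by chance, over-represent one age group; the stratified sample is guaranteed to contain four subjects from each.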

Example: When surveying to find the average closing cost of recent home sales in San Diego County, we may be concerned that different areas of the county have different home costs (e.g., beach communities have much more expensive homes than those farther east). To protect against getting a sample composed entirely of one community of San Diego, we could first divide the county by community or zip code, then sample 10 homes from each zip code.

Cluster Sampling: The population is divided into groups called (heterogeneous) clusters. We then randomly select clusters and measure all of the individuals within the clusters that have been selected.

Example: To measure customer satisfaction, airlines often randomly sample a set of flights (serving as clusters), let’s say 10 from a possible 200 flights, and distribute a survey to every person on the selected flights.

Systematic Random Sample: This is random sampling with a system! From the sampling frame, a starting point is chosen at random, and thereafter items are selected at regular intervals: every kth item or individual is included in the sample.

Example: Suppose you want to sample 8 houses from a street of 118 houses using a systematic random sample. You choose a starting point using a random number generator, e.g., the 5th house. After that, you choose every 15th house to be in the sample.

Multistage Random Sample: Combining a variety of sampling methods.

Poor (Biased) Sampling Methods

Voluntary Response Sampling: Individuals are asked to provide information, and all who respond are counted.
Convenience Sampling: Selects individuals that are easiest to reach.

Example: A survey is to be taken to ascertain student opinions about the quality of teaching at a high school. Below are some possible methods for picking 100 students out of the 2000 students registered at the school. For each, indicate what kind of sampling strategy is used.

a) Using an official school roster of the 2000 students, pick every 20th name.
b) Separate the students by class (freshman, sophomore, junior, senior). Pick a simple random sample of size 25 from each of the four groups, and then combine these students into one sample of size 100.
c) Using a random number table, 100 names are chosen from the list of 2000 students registered at the high school.
d) Separate the students by assigned home-room (each home-room has 25 students). Pick a random sample of four home-rooms. If a home-room is selected, all students in that home-room are included in the sample.

Sampling Bias

Sampling bias is a tendency to favor selecting people or elements that have a particular characteristic or set of characteristics. A poor sampling plan is usually to blame for sampling bias.

Reducing Bias

The best strategy is to select individuals from our population at random. On average, the sample we take will look very similar to the population we drew from.
The more individuals you take in your sample, the more information you will have, and the better your estimation will be.
Note: If your sampling method is flawed, taking more people into your sample will not reduce bias.

Types of Sampling Bias

Undercoverage: When some groups in the population are left out of the process of choosing the sample.
Nonresponse Bias: When an individual chosen for the sample can’t be contacted or refuses to cooperate.
Response Bias: The behavior of the respondent or of the interviewer influences the outcome of the survey or questionnaire.
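
The note that a larger sample does not repair a flawed sampling method can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population is synthetic (two equally sized groups with different typical values), and the flawed method is modeled as undercoverage, where one group never appears in the sampling frame.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical population of 10,000: two equally sized groups with different typical values.
group_a = [rng.gauss(50, 10) for _ in range(5000)]
group_b = [rng.gauss(70, 10) for _ in range(5000)]
population = group_a + group_b
true_mean = statistics.mean(population)

def estimate(frame, n):
    """Mean of a simple random sample of size n drawn from the given frame."""
    return statistics.mean(rng.sample(frame, n))

for n in (100, 2000):
    fair = estimate(population, n)   # frame covers the whole population
    biased = estimate(group_a, n)    # undercoverage: group B is never sampled
    print(f"n={n:5d}  fair error={abs(fair - true_mean):6.2f}  "
          f"biased error={abs(biased - true_mean):6.2f}")
```

With the full frame, the error tends to shrink as n grows. With the undercovered frame, the estimate settles near the mean of group A alone, so the error stays large no matter how many individuals are sampled.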

Experiments and Observational Studies

Experiment vs. Observational Study

An observational study observes individuals and measures variables of interest but does not attempt to influence the responses. The purpose of an observational study is to describe some group or situation.

An experiment deliberately imposes some treatment on individuals in order to observe their responses. The purpose of an experiment is to study whether the treatment causes a change in the response.

Example: Recently a group of adults who swim regularly for exercise were evaluated for depression. It turned out that these swimmers were less likely to be depressed than the general population. In a second study, a group of 100 volunteers was randomly divided into two groups. The first group was asked to swim twice a week for six months; the second group did not follow an exercise plan. Which of the following is correct?

A. The first study was an experiment while the second was an observational study.
B. The first study was an observational study while the second was an experiment.
C. Both studies were observational studies.
D. Both studies were experiments.
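
The random division of 100 volunteers described in the second study can be sketched as a shuffle followed by a split. The volunteer ID numbers and the equal 50/50 split are assumptions; the notes say only that the volunteers were randomly divided into two groups.

```python
import random

rng = random.Random(7)

# 100 volunteers, identified here by hypothetical ID numbers.
volunteers = list(range(1, 101))

# Shuffle, then split: chance alone decides who lands in which group.
shuffled = volunteers[:]
rng.shuffle(shuffled)
swim_group = shuffled[:50]      # asked to swim twice a week
control_group = shuffled[50:]   # no exercise plan (assumed equal group sizes)

print("Swim group:   ", sorted(swim_group)[:10], "...")
print("Control group:", sorted(control_group)[:10], "...")
```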

Basic Terminology

Experimental units: The items or subjects being given the treatment. These CAN differ from observational units!
Factor: An explanatory (independent) variable that is thought to influence the response (outcome/dependent) variable being studied.
Levels: The specific values chosen for a factor.
Treatment: A specific condition applied to the subjects/experimental units, created from the combination of specific levels for all the factors.
Control Group: The group of individuals/experimental units given no treatment.
Placebo: A dummy treatment, sometimes a sugar pill, distilled water, or saline solution. Why is it necessary to have a placebo as opposed to no treatment?
Blinding: In an experiment, it is desirable to keep the information about the treatments hidden from the patients and anyone involved with evaluating the patients. This is known as blinding (“double” if both patients and evaluators are unaware, “single” if only the patient is unaware). Blinding prevents conscious or subconscious biases or expectations from influencing the outcome of the study.
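
The relationship between factors, levels, and treatments defined above can be sketched by enumerating every combination of levels. The two factors and their levels below are hypothetical, invented purely for illustration.

```python
from itertools import product

# Hypothetical factors and levels, invented for illustration only.
factors = {
    "drug": ["A", "B"],
    "dose_mg": [10, 20, 40],
}

# Each treatment is one combination of a level from every factor.
treatments = [dict(zip(factors, combo)) for combo in product(*factors.values())]

for treatment in treatments:
    print(treatment)
print(len(treatments), "treatments")
```

Two levels of one factor and three levels of the other give 2 x 3 = 6 treatments.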

Basic Principles of Experimental Design

Randomization: To ensure that we do not impose personal bias in the selection process, and use enough subjects in each group to reduce chance variation in the results.
Control Groups: Using control groups helps to ensure that we account for known factors that could affect a study’s results. Researchers, however, may be unaware of important factors and not account for them in the experiment.
Replication: To ensure we get the same results repeatedly (and not just by chance).

Lurking Versus Confounding Variables

Lurking variable: A variable that is not among the explanatory or response variables but influences the interpretation of the relationship between them.

Example: Ice cream consumption and shark attacks are highly correlated! Does ice cream consumption cause an increase in the number of shark attacks, or do shark attacks increase the demand for ice cream? What could be the lurking variable?

Confounding variable: An additional explanatory variable that affects the response but is not considered when exploring the explanatory/response relationship.

Example: Smoking and lung cancer. In a certain city, men were developing lung cancer at a much higher rate than in surrounding cities. Many of the men worked in the local asbestos mines. They were therefore exposed to asbestos, which is a known risk for lung cancer. It is also known that, because of the stress miners are exposed to, they tend to smoke more, especially when working underground. Smoking is related to lung cancer, and mining is related to lung cancer as well.

Example: Identify either the confounding or lurking variable. You gather a sample of people of various heights and ask them to shoot baskets from the foul line. The number of baskets they make out of ten shots will be your measure of basketball ability. You find that tall people did make better basketball players than short people. Is height the only factor that affects ability?
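
How a lurking variable can produce a strong correlation between two variables that never influence each other can be sketched with synthetic data. The variables z, x, and y below are generic and hypothetical (they are not tied to any of the examples above), and the noise levels are arbitrary assumptions.

```python
import math
import random

rng = random.Random(3)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)

# Hypothetical lurking variable z drives both x and y; x and y never affect each other.
z = [rng.gauss(0, 1) for _ in range(1000)]
x = [zi + rng.gauss(0, 0.5) for zi in z]
y = [zi + rng.gauss(0, 0.5) for zi in z]

r_observed = pearson(x, y)

# Remove z's contribution: what remains of x and y is unrelated noise.
r_without_z = pearson([xi - zi for xi, zi in zip(x, z)],
                      [yi - zi for yi, zi in zip(y, z)])

print(f"Correlation of x and y:          {r_observed:.2f}")
print(f"Correlation after accounting z:  {r_without_z:.2f}")
```

The first correlation is strong even though neither x nor y causes the other; once the shared influence of z is removed, the association largely disappears.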