Ch1 Sampling and Data
School: California State University, San Marcos
Course: MISC
Subject: Statistics
Date: Jan 9, 2024
Type: docx
Pages: 11
Uploaded by DukeUniverse6135
Chapter 1: Sampling and Data
Statistics is the science of conducting studies to collect, organize, summarize, analyze, and draw conclusions from data.
Data are collections of observations, such as measurements, genders, or survey responses. (A single data value is called a datum, a term that does not see very much use.)
A population consists of all subjects (human or otherwise) that are being studied.
A sample is a group of subjects selected from a population.
A census is the collection of data from every member of the population.
A variable is a characteristic or attribute that can assume different values.
Types of Variables
The people or things to which these variables are assigned are called observational units.
Variables can be classified into two types:
Qualitative or Categorical Data: Consists of names or labels that are not numbers representing counts or measurements; each subject is placed into one of several groups or categories.
Quantitative Data: Consists of numbers representing counts or measurements.
Types of Quantitative Data
Discrete: Possible values are only whole or “countable” numbers. (Household size, number of courses taken)
Continuous: Possible values are infinitely many and cover some range without gaps. (Weight, GPA)
Descriptive statistics consists of the collection, organization, summarization, and presentation of data.
Inferential statistics consists of generalizing from samples to populations, performing estimations and hypothesis tests, determining relationships among variables, and making predictions.
Levels of Measurement:
A. The nominal level of measurement is characterized by data that consist of names, labels, or categories only. The data cannot be arranged in an ordering scheme (such as low to high).
Example
B. Data are at the ordinal level of measurement if they can be arranged in some order, but differences (obtained by subtraction) between data values either cannot be determined or are meaningless.
Example
C. Data are at the interval level of measurement if they can be arranged in order, and differences between data values can be found and are meaningful. Data at this level do not have a natural zero starting point at which none of the quantity is present.
Example
[Figure: diagram of data types. Recoverable labels: Data, Qualitative, Quantitative, Nominal, Ordinal, Continuous, Interval, Ratio]
D. Data are at the ratio level of measurement if they can be arranged in order, differences can be found and are meaningful, and there is a natural zero starting point (where zero indicates none of the quantity is present). For data at this level, differences and ratios are both meaningful.
Example
Populations & Samples
The population in a statistical study is the entire group of individuals we want information about.
A sample is a part of the population from which we actually collect information. We use a sample to draw conclusions about the entire population.
A parameter is a numerical measurement describing population data.
A statistic is a numerical measurement describing sample data.
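The distinction between a parameter and a statistic can be illustrated with a small simulation. This is only a sketch: the population of heights below is made up for illustration (the mean of 69.4 inches, the spread of 3 inches, and the sizes 10,000 and 100 are all assumptions, not data from these notes).

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: heights in inches of 10,000 people (made-up values).
population = [rng.gauss(69.4, 3.0) for _ in range(10_000)]

# Parameter: a number computed from every member of the population.
parameter = statistics.mean(population)

# Statistic: a number computed from a sample drawn from that population.
sample = rng.sample(population, 100)
statistic = statistics.mean(sample)

print(f"parameter (population mean): {parameter:.2f}")
print(f"statistic (sample mean):     {statistic:.2f}")
```

The two printed values will usually be close but not equal, which is why a statistic is treated as an estimate of the parameter.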
Example: In October of 2010, CNN surveyed 888 registered voters in California and found that 53% of them were opposed to Prop 19 (legalization of marijuana), with a margin of error of 3.5%. (Prop 19 was defeated with 54% voting “No.”)
Population:
Sample:
Parameter:
Statistic:
Example: For each of the following values, determine if you have been given a statistic or a parameter.
A. The actual average height of all adult human males in the US is 5' 9.4"
B. 60% of the households sampled from the US own more than one car.
C. Currently there are 18,759 houses in Santee, CA.
D. The average age of a sample of men in Orlando, FL was 43 years.
E. The average SAT score in CA in 1990 was 897.
F. A mall survey found 36% of women prefer lipstick to lip balm.
G. SANDAG estimates that, in 2008, the household median income of San Diego was $66,715.
H. In 2006, 10.1% of deaths in California’s prisons were ruled a suicide.
What were the key words to signal you were dealing with a statistic instead of a parameter?
Example: A political scientist wants to know how college students feel about the Social Security system. She obtains a list of the 3456 undergraduates at her college and mails a questionnaire to 250 students selected at random. Only 104 questionnaires are returned.
a) What is the population in this study? Be careful: what group does she want information about?
b) What is the sample? Be careful: from what group does she actually obtain information?
c) What are some reasons this sample is not representative of the actual population that she’s interested in?
Sampling Methods
The best possible sample would be the entire population of interest. In order to get the most accurate picture of the US population, the government conducts the census.
Example: What are some drawbacks or limitations of the census?
There are times that complete population data is easy to collect. When that’s not the case, we
need to choose a representative sample from our population. This section focuses on ways to
effectively gather data in the real world.
Example: What would you want in a good sample?
How do we select a sample from the population?
First, we select a sampling frame: the list of items or subjects we wish to sample from. Ideally it is the same as the entire population of interest, but we may not be able to include the whole population in our sampling frame. Then we use one of the methods described below to choose our sample.
Regardless of how fair our sampling method is or how large our sample is, there will be sampling variability associated with our sample. This is because each sample will select different people and therefore produce different values for the measured variables (two samples will almost never be identical).
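Sampling variability can be seen by drawing several samples from the same population and comparing their means. The sketch below uses a made-up population of ages; the population size, the age range, and the sample size of 50 are assumptions chosen only for illustration.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population: ages of 5,000 people (made-up values).
population = [rng.randint(18, 90) for _ in range(5_000)]

def sample_mean(n):
    """Draw a simple random sample of size n and return its mean."""
    return statistics.mean(rng.sample(population, n))

# Five samples of the same size from the same population.
means = [sample_mean(50) for _ in range(5)]

for i, m in enumerate(means, start=1):
    print(f"sample {i}: mean age = {m:.2f}")
print(f"population mean age = {statistics.mean(population):.2f}")
```

Each sample contains different people, so each sample mean differs somewhat from the others and from the population mean.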
Good Sampling Methods
Simple Random Sample (SRS): Each member of the population has an equal chance of being selected, and every possible sample of size n is equally likely to be chosen.
Examples: Drawing names from a hat, using a random number table
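A software version of drawing names from a hat might look like the following sketch. The sampling frame of 2,000 student labels and the sample size of 100 are hypothetical; `random.sample` draws without replacement, so no one is picked twice.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical sampling frame: labels for 2,000 students.
frame = [f"student_{i:04d}" for i in range(1, 2001)]

# Simple random sample: every subset of 100 students is equally likely.
srs = rng.sample(frame, 100)

print(srs[:5])
```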
Stratified Random Sample: In a stratified sample the sampling frame is divided into non-overlapping groups or strata that have similar characteristics (e.g. geographical areas, age groups, genders). A random sample is taken from each stratum and then these smaller samples are combined to form the entire sample.
Example: When surveying to find the average closing cost of recent home sales in San Diego county, we may be concerned that different areas of the county have different home costs (e.g. beach communities have much more expensive homes than those further east). In order to protect against possibly getting a sample that is entirely composed of a certain community of San Diego, we could first divide the county up by community or zip code, then sample 10 homes from each zip code.
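The home-sales example can be sketched in code as follows. The frame of 1,000 sales and the four zip-code labels are hypothetical placeholders, not real data; only the idea of sampling 10 homes from each stratum comes from the example above.

```python
import random
from collections import defaultdict

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical sampling frame: (sale_id, zip_label) pairs, 250 sales per zip.
zip_labels = ["zip_1", "zip_2", "zip_3", "zip_4"]
frame = [(i, zip_labels[i % 4]) for i in range(1000)]

# Step 1: split the frame into non-overlapping strata (one per zip code).
strata = defaultdict(list)
for sale_id, zip_label in frame:
    strata[zip_label].append(sale_id)

# Step 2: take a simple random sample of 10 sales from each stratum.
per_stratum = 10
sample = []
for zip_label, members in strata.items():
    for sale_id in rng.sample(members, per_stratum):
        sample.append((sale_id, zip_label))

print(len(sample), "sales sampled in total")
```

Every zip code is guaranteed to appear in the final sample, which is the protection the example describes.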
Cluster Sampling: The population is divided into groups called (heterogeneous) clusters. We then randomly select clusters and measure all of the individuals within the clusters that have been selected.
Example: To measure customer satisfaction, airlines often randomly sample a set of flights, say 10 (serving as clusters), from a possible 200 flights, and distribute a survey to every person on the selected flights.
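The airline example can be sketched as follows. The passenger lists are invented for illustration (each hypothetical flight gets between 50 and 180 made-up passenger labels); the 10-out-of-200 design comes from the example above.

```python
import random

rng = random.Random(3)  # fixed seed so the run is repeatable

# Hypothetical data: 200 flights (clusters), each with a list of passengers.
flights = {
    f"flight_{i:03d}": [f"f{i:03d}_p{j}" for j in range(rng.randint(50, 180))]
    for i in range(1, 201)
}

# Step 1: randomly select 10 whole clusters.
chosen = rng.sample(sorted(flights), 10)

# Step 2: survey every individual in each selected cluster.
sample = [passenger for f in chosen for passenger in flights[f]]

print("flights chosen:", chosen)
print("passengers surveyed:", len(sample))
```

Unlike stratified sampling, only some groups are used, but everyone inside a selected group is measured.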
Systematic Random Sample: This is random sampling with a system! From the sampling frame, a starting point is chosen at random, and items are then selected at regular intervals: every kth item or individual after the starting point is included in the sample.
Example: Suppose you want to sample 8 houses from a street of 118 houses using a systematic random sample. You choose a starting point using a random number generator, for example the 5th house. After that, you choose every 15th house to be in the sample.
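The house example can be checked with a few lines of code. The helper name `systematic_sample` is made up for this sketch; the numbers (118 houses, start at house 5, every 15th house) are the ones from the example.

```python
def systematic_sample(n_items, start, k):
    """Return the 1-based positions start, start + k, start + 2k, ...
    that do not exceed n_items."""
    return list(range(start, n_items + 1, k))

# Start at the 5th house, then take every 15th house on a street of 118.
houses = systematic_sample(118, start=5, k=15)
print(houses)  # [5, 20, 35, 50, 65, 80, 95, 110]
```

With k = 15 on this street, a starting house from 1 through 13 yields 8 houses, while a start of 14 or 15 would yield only 7, so the range allowed for the random start matters when a fixed sample size is required.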
Multistage Random Sample: Combining a variety of sampling methods.
Poor (Biased) Sampling Methods
Voluntary Response Sampling: Individuals are asked to provide information, and all who respond are counted.
Convenience Sampling: Selects individuals that are easiest to reach.
Example: A survey is to be taken to ascertain student opinions about the quality of teaching at a high school. Below are some possible methods for picking 100 students out of the 2000 students registered at the school. For each, indicate what kind of sampling strategy is used.
a) Using an official school roster of the 2000 students, pick every 20th name.
b) Separate the students by class (freshman, sophomore, junior, senior). Pick a simple random sample of size 25 from each of the four groups, and then combine these students into one sample of size 100.
c) Using a random number table, 100 names are chosen from the list of 2000 students registered at the high school.
d) Separate the students by home-room assigned (each home-room has 25 students). Pick a random sample of four home-rooms. If a home-room is selected, all students in that home-room are included in the sample.
Sampling Bias
Sampling Bias is a tendency to favor selecting people or elements that have a particular characteristic or set of characteristics. A poor sampling plan is usually to blame for sampling bias.
Reducing Bias
The best strategy is to select individuals from our population at random. On average, the sample we take will look very similar to the population we drew from.
The more individuals you take in your sample, the more information you will have, and the better your estimation will be.
Note: If your sampling method is flawed, taking more people into your sample will not reduce bias.
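The note above can be illustrated with a small simulation. Everything in it is an assumption made for illustration: a population of 10,000 people, of whom 6,000 can be reached by the survey (70% of them support some proposal) and 4,000 cannot (30% of them support it), so the true proportion is 54%.

```python
import random

rng = random.Random(11)  # fixed seed so the run is repeatable

# Hypothetical population (1 = supports the proposal, 0 = does not).
reachable = [1] * 4200 + [0] * 1800     # 6,000 people, 70% support
unreachable = [1] * 1200 + [0] * 2800   # 4,000 people, 30% support
population = reachable + unreachable
true_proportion = sum(population) / len(population)  # 0.54

def proportion(sample):
    """Fraction of the sample that supports the proposal."""
    return sum(sample) / len(sample)

for n in (100, 1000, 4000):
    biased = proportion(rng.sample(reachable, n))   # flawed frame: a group is left out
    fair = proportion(rng.sample(population, n))    # simple random sample of everyone
    print(f"n={n:5d}  flawed frame: {biased:.3f}  random sample: {fair:.3f}")

print(f"true proportion: {true_proportion:.3f}")
```

As n grows, the random-sample estimate settles near 0.54, while the estimate from the flawed frame settles near 0.70: a larger sample makes the wrong answer more precise, not more correct.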
Types of Sampling Bias
Undercoverage: When some groups in the population are left out of the process of choosing the sample.
Nonresponse Bias: When an individual chosen for the sample can’t be contacted or refuses to cooperate.
Response Bias: The behavior of the respondent or of the interviewer influences the outcome of the survey or questionnaire.
Experiments and Observational Studies
Experiment vs. Observational Study
An observational study observes individuals and measures variables of interest but does not attempt to influence the responses. The purpose of an observational study is to describe some group or situation.
An experiment deliberately imposes some treatment on individuals in order to observe their responses. The purpose of an experiment is to study whether the treatment causes a change in the response.
Example: Recently a group of adults who swim regularly for exercise were evaluated for depression. It turned out that these swimmers were less likely to be depressed than the general population. In a second study, a group of 100 volunteers was randomly divided into two groups. The first group was asked to swim twice a week for six months; the second group did not follow an exercise plan. Which of the following is correct?
A. The first study was an experiment while the second was an observational study.
B. The first study was an observational study while the second was an experiment.
C. Both studies were observational studies.
D. Both studies were experiments.
Basic Terminology
Experimental units: The items or subjects being given the treatment. These CAN vary from observational units!
Factor: The explanatory (independent) variables that are thought to influence the response (outcome/dependent) variable studied.
Levels: The specific values chosen for a factor.
Treatment: A specific condition applied to the subjects/experimental units, created from a combination of specific levels of all the factors.
Control Group: The group of individuals/experimental units given no treatment.
Placebo: A dummy treatment, sometimes a sugar pill, distilled water or saline solution.
Why is it necessary to have a placebo as opposed to no treatment?
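To see how factors, levels, and treatments relate, the sketch below lists every treatment for a made-up experiment. The two factors and their levels are hypothetical and chosen only for illustration; each treatment is one combination of a level from every factor.

```python
from itertools import product

# Hypothetical factors, each with its chosen levels.
factors = {
    "exercise": ["none", "swim twice a week"],
    "duration_months": [3, 6],
}

# A treatment is one combination of a level from every factor.
treatments = [
    dict(zip(factors, combo)) for combo in product(*factors.values())
]

for t in treatments:
    print(t)
```

Two factors with two levels each give 2 x 2 = 4 treatments.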
Blinding: In an experiment, it is desirable to keep the information about the treatments hidden from the patients and anyone involved with evaluating the patient. This is known as blinding. (“Double” if both patients and evaluators are unaware, “Single” if only the patient is unaware.) Blinding prevents conscious or subconscious biases or expectations from influencing the outcome of the study.
Basic Principles of Experimental Design
Randomization: Assign subjects to groups at random to ensure that we do not impose personal bias in the selection process, and use enough subjects in each group to reduce chance variation in the results.
Control Groups: Using control groups helps to ensure that we account for known factors that could affect a study’s results. Researchers, however, may be unaware of important factors and not account for them in the experiment.
Replication: Repeat the experiment to ensure we get the same results repeatedly (and not just by chance).
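Randomization can be carried out with a shuffle. The sketch below splits 100 hypothetical volunteers into a treatment group and a control group of 50 each; the volunteer labels and the even split are assumptions made for illustration.

```python
import random

rng = random.Random(2024)  # fixed seed so the run is repeatable

# Hypothetical list of 100 volunteers.
volunteers = [f"volunteer_{i:03d}" for i in range(1, 101)]

# Shuffle a copy, then split it in half: chance, not the researcher,
# decides who ends up in each group.
shuffled = volunteers[:]
rng.shuffle(shuffled)
treatment_group = shuffled[:50]
control_group = shuffled[50:]

print("treatment:", treatment_group[:3], "...")
print("control:  ", control_group[:3], "...")
```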
Lurking Versus Confounding Variables
Lurking variable: A variable that is not among the explanatory or response variables, but influences the interpretation of the relationship between them.
Example: Ice cream consumption and shark attacks are highly correlated! Does ice cream consumption cause an increase in the number of shark attacks, or do shark attacks increase the demand for ice cream? What could be the lurking variable?
Confounding variable: Additional explanatory variable that affects the response but is not considered when exploring the explanatory/response relationship.
Example: Smoking and lung cancer. In a certain city, many men were developing lung cancer at a much higher rate compared to other surrounding cities. Many of the men worked in the local asbestos mines. They were therefore exposed to asbestos, which is a known risk for lung cancer. It is also known that, because of the stress miners are exposed to, they tend to smoke more, especially when working underground. Smoking is related to lung cancer, mining is related to lung cancer as well.
Example: Identify either the confounding or lurking variable. You gather a sample of people of various heights and ask them to shoot baskets from the foul line. The number of baskets they make out of ten shots will be your measure of basketball ability. You find that tall people did make better basketball players than short people. Is height the only factor that affects ability?