Topic 3.3_ MATHX402-032 Math for Management
School
University of California, Los Angeles *
*We aren’t endorsed by this school
Course
X402
Subject
Mathematics
Date
Feb 20, 2024
Type
Pages
10
Uploaded by LieutenantSalmonPerson861
10/9/23, 6:55 AM
Topic 3.3: MATHX402-032 Math for Management
https://onlinelearning.berkeley.edu/courses/2072192/pages/topic-3-dot-3?module_item_id=96528811
1/10
Topic 3.3
Topic 3.3: Data Collection
Introduction
When conducting a study, you want your results to be as valid and meaningful as possible.
To extract meaningful information from your data, you must be sure you’ve gathered it
using sound methods. When you design a study and gather data rigorously,
your results are more valid.
There are six steps to good data collection. In sections 3.1 and 3.2, we looked at the first three in
more depth. Recall these were:
1. Defining the objective of the study and identifying the type of research question;
2. Determining the population of interest; and
3. Identifying the variables (quantitative or categorical) that will be used in the study, as well as the
response and explanatory variables.
In this section, we’ll look at the last three steps to good data collection.
4. Determining whether to use a research technique (observational study) or experimental technique
(experimental study);
5. Determining the survey/sampling technique; and
6. Collecting data appropriately by avoiding error and bias.
Types Of Research Studies
There are two types of research studies: observational and experimental. Depending on the goals of
a statistical study, we choose the appropriate type to use.
An observational study is a study in which the researcher observes (much like the name suggests)
and records the sample data. With regard to how the sample is selected, the sample should be a
simple random sample. A random sample (or random selection) is a sample in which every entity in
the population has the same chance of being chosen. For example, using a
computer to randomly generate the names of 200 people in a town is considered a simple random
sample because each person has an equal chance of being selected. The sample has to be
representative of the population of interest; otherwise the study has no basis for accurate or
valid results. Note that in an observational study, the researcher is not allowed to alter
any variable for those being sampled; the researcher can only observe and record.
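The town example above can be sketched in a few lines of Python. This is only an illustration: the population of 5,000 made-up resident names, the function name `simple_random_sample`, and the seed are assumptions, not part of the course material.

```python
import random

# Hypothetical population: 5,000 town residents identified by made-up names.
population = [f"resident_{i:04d}" for i in range(5000)]

def simple_random_sample(units, n, seed=None):
    """Draw n distinct units so that every unit is equally likely to be chosen."""
    rng = random.Random(seed)    # a fixed seed makes the draw reproducible
    return rng.sample(units, n)  # sampling without replacement

sample = simple_random_sample(population, 200, seed=42)
print(len(sample), sample[:3])
```

Because `random.Random.sample` draws without replacement, no resident can appear twice in the sample.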
Example from this module’s scenario: John, our researcher from the scenario, wishes to
determine whether the profit achieved is due to the incurred investment or not. An observational
study can only establish an association between the explanatory variable (investment) and the
response variable (profit achieved). We can’t manipulate the explanatory variable in an
observational study, and therefore we can’t control the confounding variables that might be
present. For instance, here the capital assets are a confounding variable: some profit might
have been achieved due to them, but we don’t know for sure. Since we cannot manipulate the
data, we can only determine whether there is an association between the two variables
(investment and profit achieved); we cannot definitively conclude whether the investment
causes the profit or vice versa.
An experimental study is a study in which a researcher manipulates the research scenario and then
collects the data according to the objective of the research. As in observational studies, the sample should be
a simple random sample. From the definitions, you should see that the basic difference between the
two types of studies is that in an observational study, the researcher is not allowed to manipulate
anything; they can only watch and record. (Think: how many businesses have at least 75
parking spaces for customers?) In an experimental study, the researcher is allowed to intervene and change a
variable in order to gather data. (Think: if we hire a company to do landscaping around twenty banks,
does the number of customers increase?)
Example from this module’s scenario: John could plan to collect customized data from the
companies in his targeted population. He could direct a group of companies to use only
monetary investments (rather than capital assets) for industry transactions/ventures for ten years
in order to determine the cause and effect between the explanatory variable (investment) and
response variable (profit achieved). Here the researcher has controlled the confounding
variable from the original dataset, so he has changed or manipulated a variable. He has equalized
the effect of the confounding variable for each group.
A double-blind study is generally superior to a single-blind study. The double-blind design keeps both
researchers and participants in the dark as to who is receiving which treatment. This last part is
important because it prevents the researchers from unintentionally tipping off the study participants,
or unconsciously biasing their evaluation of the results.
A randomized control experiment is a randomized experimental study that has a case control
structure. The sample is divided into two groups: the case group is the group that is exposed to the
treatment whose effect is being assessed, and the control group is a group that is not exposed to the
treatment being assessed and is left unchanged.
To conduct a randomized control experiment, randomization is used twice. First, a random sample
is chosen from the population (random selection). Then the members of the sample are randomly
assigned to the case and control groups (random assignment). Random selection makes the sample
more indicative of the entire population, and random assignment makes the two groups comparable
to each other. Had the sample or the groups not been randomly chosen, the results of the study
wouldn’t be as valid.
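The two uses of randomization can be sketched as follows. This is a minimal illustration, assuming a hypothetical list of 500 company names; the function name `select_and_assign`, the sample size, and the seed are made up for the example.

```python
import random

def select_and_assign(population, sample_size, seed=None):
    """Random selection from the population, then random assignment to two groups."""
    rng = random.Random(seed)

    # Step 1: random selection. Every unit has the same chance of entering the sample.
    sample = rng.sample(population, sample_size)

    # Step 2: random assignment. Shuffle the sample and split it in half.
    shuffled = sample[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    case_group = shuffled[:half]      # will receive the treatment
    control_group = shuffled[half:]   # will not receive the treatment
    return case_group, control_group

# Hypothetical population of 500 companies.
companies = [f"company_{i}" for i in range(500)]
case_group, control_group = select_and_assign(companies, 40, seed=7)
print(len(case_group), len(control_group))
```

No company can end up in both groups, because the two groups are disjoint slices of the same shuffled sample.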
A randomized controlled experiment is widely regarded as the most reliable methodology for establishing
cause and effect. For this reason, randomized controlled experiments are common,
especially in the medical field.
Example from this module’s scenario: If John wishes to determine whether investments are
contributing to profits, he can instruct one group of companies to use investments and capital assets for
company transactions (case group) and instruct the other group to use only capital assets for
transactions (control group). Since all the other variables except investments are the same, any
difference in the profit accrued would be due to the incurred investment. Since he randomly assigned
companies to these two groups, the groups are comparable.
In the chart below you can see the differences and similarities between observational and
experimental studies. When deciding which to use, you should consider whether a variable needs to
be manipulated. If the answer is yes, then use an experiment. If the answer is no, then an
observational study will suffice.
Research Studies Comparison Chart

| Characteristics | Observational Studies | Experimental Studies |
| --- | --- | --- |
| Manipulation of variables | The data is collected for the research study passively, without any manipulation of explanatory variables. The corresponding response variable value is recorded. | The data is collected for the research study in experimental settings, where the explanatory variables are actively manipulated. The corresponding response variable value is recorded. |
| Example | Collecting data to analyze the amount of aspirin consumed and heart attack incidence in a group of individuals. | Changing the amount of aspirin taken by a group of individuals and monitoring the heart attack incidence in this group. |
| Causation and Correlation | The sample should be randomly selected. An observational study does not provide proof of causation (cause and effect); it only describes an association between two variables. With a random sample, conclusions drawn from the sample can be generalized to the population. Ethical concerns can require researchers to conduct observational studies as opposed to experimental ones, because some studies need to be done without making people change their lifestyle (for instance, making them smoke to check their weight gain). | If we wish to determine the effect of aspirin on incidence of heart attack, then various confounding variables like weight, smoking, age, gender, etc., have to be adjusted for in order to determine a causal relationship. If the sample is selected randomly, then we can generalize sample-generated conclusions to the population. If the sample is randomly assigned to case and control groups, then we can attribute a causal relationship with a degree of confidence. |
Progress Check
Consider this module’s scenario and John’s main goal to find out which of the four companies has the
best investment profile. Would his analysis be an observational study or an experimental study?
Answer: Observational Study
Progress Check
In this module’s scenario, could maximum profit sector be a confounding variable?
Answer: Yes
Progress Check
Using random selection and random assignment allows you to do what two things?
Answer: 1) You can generalize the sample conclusion to the population. 2) You can attribute cause
and effect to the study.
Sampling
As we touched on in Section 3.1, sampling involves taking a sample from the population, whereas
taking a census means to survey every item in the population.
For sampling to be effective:
The sample must have a sound design for us to trust the analysis obtained from it. This will allow
the conclusion to be generalized to the population. An inappropriate sample can cause an entire
study to be invalid, so we try to create the best sample possible. Internet, phone, and other
technologies can make it difficult to conduct effective sampling because they can introduce types of
bias (we will discuss this later). Newer technologies, like social media, have made getting a
completely unbiased sample more and more difficult.
We must accurately define the population that is being surveyed for the study
and the exact quantities to be measured, such as the mean, median, or a percentile. Knowing these
features will allow us to choose the best sample possible, because we will know which samples
best represent the entire population.
To determine the sample, we generally complete the following steps.
Sampling steps
1. Determine (in detail) the population to be surveyed.
2. Determine which variables to measure.
3. Decide the sampling design for collecting data.
The sampling design (or sampling technique) is the method we use to choose the sample from
the population. This can range from the simple random sampling that we’ve discussed to
cluster sampling, where only selected groups are used as the sample.
Types of Sampling
We’ve discussed before the concept of a simple random sample. Recall that this is a sample in
which every individual in the population has an equal chance of being selected. There are a few
reasons why simple random sampling is so important:
This sampling technique is an unbiased one (more on bias below).
It is trustworthy for inference purposes by the laws of probability. Each member of the population has
an equal chance of being selected, so the sample tends to be a good representation.
Random samples come with a margin of error due to sample-to-sample variation. The
margin of error for random samples is often smaller than with other sampling techniques.
Larger samples give better information about the population: the larger our random sample,
the smaller the margin of error.
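The effect of sample size on sample-to-sample variation can be checked with a small simulation. The sketch below is an illustration under stated assumptions: the population of 100,000 values (drawn from a normal distribution with mean 50 and standard deviation 10), the sample sizes 25 and 400, and the function name `spread_of_sample_means` are all made up for the example.

```python
import random
import statistics

# Hypothetical population of 100,000 numeric values (for example, monthly profits).
pop_rng = random.Random(0)
population = [pop_rng.gauss(50, 10) for _ in range(100_000)]

def spread_of_sample_means(population, n, repeats=200, seed=1):
    """Draw many simple random samples of size n and measure how much
    their means vary from one sample to the next."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(repeats)]
    return statistics.stdev(means)

small = spread_of_sample_means(population, n=25)
large = spread_of_sample_means(population, n=400)
print(f"spread of sample means, n=25:  {small:.2f}")
print(f"spread of sample means, n=400: {large:.2f}")
```

The means of the larger samples cluster more tightly around the population mean, which is what a smaller margin of error looks like in practice.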
Sampling techniques can be categorized as one of two types:
One category of sampling is probability sampling. This is sampling in which individuals are
chosen by a random mechanism, so every individual in the population has a known chance of being
selected (in the simplest case, the same chance as any other individual). These include: Simple
Random Sampling, Stratified Random Sampling, Cluster Random Sampling, Systematic Random
Sampling, and Multistage Sampling. The overarching idea is
probability sampling, and then within that idea we have the different sampling techniques.
The other category of sampling is non-probability sampling. This is sampling in which
individuals are not chosen by a random mechanism, so the chance of selecting any given individual
is unknown and generally differs from one individual to another. These include: Convenience
Sampling and Voluntary Sampling. Typically,
non-probability sampling is seen as the lesser of the two because it might not give the best
representation of the population. However, for certain situations, such as needing to ask for
volunteers, non-probability sampling might be the only choice.
Below we describe the different sampling types along with an example and the pros and cons of
each. Although it may seem like some techniques should never be used, in certain instances, the
researcher has no choice.
Probability Sampling Techniques

| Technique | Definition | Example | Advantages and Disadvantages |
| --- | --- | --- | --- |
| Simple random sampling | Every member in the population has an equal chance of being selected in the sample. | The lottery method or using random numbers. | This is an unbiased way of sampling, though it is not effective if, by chance, the major population is not represented. |
| Systematic random sampling | The population is numbered and then every kth element is chosen to create a systematic sample. | Checking every 10th potato chip bag from the assembly line. | This is an unbiased way of sampling, though it is not effective if the selection follows a pattern. |
| Stratified random sampling | The population is divided into groups (strata). From each stratum a random sample is taken. | Poll data collected from all states. In this case, each state is one stratum. | Useful if the strata are different, but the elements are similar within each stratum. All strata need to be well represented. |
| Cluster random sampling | The population is divided into clusters, where each cluster has similar elements. A few clusters are then chosen using random sampling. | Poll data collected from a few states, but not all. In this case, each state is a cluster. | Is economically advantageous, but it might ignore the diversity in the cluster itself if a simple random sample is taken from a cluster. |
| Multistage random sampling | Two or more random sampling methods are used for sampling purposes. | If we are surveying SAT scores of a state, we might use a stratified sample to choose districts and then, within the chosen districts, we might use a cluster sample to choose the schools. Then finally we might use the simple random sample technique to choose a subset of schools for the final analysis. | In situations where one sampling type doesn't "fit", multistage random sampling can be used to customize a sampling type. However, this method might be overused and result in a biased sample. |
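Three of the techniques in the table can be sketched in Python. This is a minimal illustration, not a definitive implementation: the population of 1,000 voters in 10 made-up states, the function names, and the seeds are all assumptions chosen for the example.

```python
import random
from collections import defaultdict

def systematic_sample(units, k, seed=None):
    """Pick a random start among the first k units, then take every k-th unit."""
    rng = random.Random(seed)
    start = rng.randrange(k)
    return units[start::k]

def stratified_sample(units, stratum_of, per_stratum, seed=None):
    """Take a simple random sample of fixed size from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for name in sorted(strata):
        sample.extend(rng.sample(strata[name], per_stratum))
    return sample

def cluster_sample(units, cluster_of, n_clusters, seed=None):
    """Randomly choose a few clusters and keep every unit in the chosen clusters."""
    rng = random.Random(seed)
    clusters = defaultdict(list)
    for unit in units:
        clusters[cluster_of(unit)].append(unit)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

def state(voter):
    """Return the state label of a (name, state) pair."""
    return voter[1]

# Hypothetical population: 1,000 voters listed so that states repeat every 10 rows.
voters = [(f"voter_{i}", f"state_{i % 10}") for i in range(1000)]

every_10th = systematic_sample(voters, k=10, seed=3)
by_state = stratified_sample(voters, state, per_stratum=5, seed=3)
few_states = cluster_sample(voters, state, n_clusters=3, seed=3)

print(len(every_10th), len({state(v) for v in every_10th}))
print(len(by_state), len(few_states))
```

Note what happens to the systematic sample here: because the made-up list cycles through the states with the same period as the sampling interval, all 100 selected voters come from a single state. This is the "selection follows a pattern" weakness listed in the table. The stratified sample contains 5 voters from each of the 10 states, and the cluster sample contains all 100 voters from each of 3 randomly chosen states.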
Non-Probability Sampling Techniques

| Technique | Definition | Example | Advantages and Disadvantages |
| --- | --- | --- | --- |
| Voluntary response sampling | The samples from the population are chosen as per the voluntary response of the participants. | Voluntary presidential election poll. | Useful if expense is the criterion, but this method may not be representative of the population. Personal bias could come into play. |
| Convenience sampling | The samples from the population are chosen as per the convenience of the researcher. | Choosing the first 10 students in the hallway for an opinion poll. | Useful if there is no intrinsic difference in the population and the sample. Personal bias could affect the sample entities chosen. |
Inference
The process of drawing conclusions about the population from the sample statistics is
called Inference
. You may have noticed that most statistics books are titled “Inferential Statistics”.
This is because the goal is to take a sample, measure the desired variables, and then draw
conclusions. The process of inference plays a large part in how studies are designed and statistics
are formed.
Sampling Errors
When taking samples, we expect sampling errors
. This is the error due to the difference between
samples and can be attributed to chance variation. Because no two samples are equal, and no
sample is perfect, we do expect some error to occur.
Chance variations are variations that are not statistically significant and therefore cannot be attributed to the
treatment. In this module’s scenario, returning to our experimental design using randomized control, if
the profit due to investment and capital assets (case group) is significantly higher than the profit
obtained due to just capital assets (control group), then the variations are statistically significant and
not due to minor sample-to-sample chance variations. As with this example, it is customary and
important to analyze why and to what degree these variations occur.
Another type of error is a non-sampling error. These are errors due to inaccurate data collection
(sampling technique), recording, or analysis. An example would be a researcher
recording incorrect values of a variable. In this case, the researcher is at fault.
Progress Check
If we grouped the companies in this module’s scenario by business sector, what sampling technique
would that be?
Answer: Stratified sampling
Progress Check
If we collect the business data from only the companies in our state because they were the easiest to
contact, what sampling technique would that be?
Answer: Convenience sampling
Types of Bias
Most of us know the word “bias” from its everyday meaning: “a tendency to believe that some people,
ideas, etc., are better than others that usually results in treating some people unfairly” (Merriam-
Webster Dictionary). In statistics, the concept is very similar (Math doesn’t strive to change English!).
We say that a study is biased when it underestimates or overestimates the true values. This bias
could be due to any one of a number of reasons.
Self Interest Bias is a bias that can be identified when the researcher collects data in a partial
way that will benefit their cause, often at a cost to others. One example is a vehicle
insurance worker collecting a sample from colleagues at the insurance provider to study the
fairness of current insurance rates. We need to be careful of self-interest bias, as many people
will conduct a study with their own personal gains in mind, which can render the study completely
useless.
Voluntary Response Bias is a bias that arises when the sample consists only of individuals who
choose to respond, since they may differ from those who do not. A closely related problem arises
when data is collected in a manner that is convenient for the researcher, for example checking student opinions
regarding university policies only from students who enter the library on one day of the week. You
might wonder, why would a researcher ever use voluntary response? The answer is that
sometimes it is the only choice. Consider a satisfaction survey sent out to an entire company.
The researcher typically can’t require all employees to fill out the survey, so they must rely on
voluntary response.
Non Response Bias is a bias that is seen when the sample collected contains unanswered
responses, since those who do not answer may differ from those who do. For example, if a researcher
collects a multiple choice survey from students regarding
the strengths and weaknesses of the instructor and the survey has unanswered questions when
collected, this causes non response bias.
Leading Question Bias is a bias that arises if the wording of the survey questions pushes individuals toward
answers they would not otherwise give. An example of a leading survey question might be: Do you
prefer a vegetarian diet or a non vegetarian diet? A vegan person might not be able to answer this question
appropriately. Sometimes these types of questions are there because researchers are
hoping for a certain outcome. Whether intentional or not, the result is biased.
Social Acceptability Bias is a bias that is observed when a sample subject does not answer a
question honestly due to the fear of social disapproval. For example, a person might not honestly
answer a question like, “Do you support abortion?” because it might not be socially acceptable to
answer how they truly feel.
Sampling Bias is a bias that occurs if the sample collected is not representative of the
population. For example, if the behavior of whales is being monitored and the sample does not
include whales from a representative set of regions of the world, this sample would be biased due
to sampling. The sample always needs to be the best representation that is possible.
Undercoverage Bias happens when some groups in the population are not included in the
sampling process. For instance, a home phone response survey would miss individuals who don’t
have a landline, creating bias in the study.
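The landline example can be turned into a small simulation that shows how undercoverage shifts an estimate. Everything in this sketch is hypothetical: the population of 20,000 people, the assumption that older people are more likely to own a landline, and the sample size of 500 are made up purely for illustration.

```python
import random
import statistics

rng = random.Random(11)

# Hypothetical population: each person has an age and may or may not own a landline.
# Assumption for illustration only: the chance of owning a landline rises with age.
people = []
for _ in range(20_000):
    age = rng.randint(18, 90)
    has_landline = rng.random() < (age - 18) / 72
    people.append((age, has_landline))

true_mean_age = statistics.fmean(age for age, _ in people)

# A home-phone survey can only reach people who own a landline.
landline_owners = [person for person in people if person[1]]
phone_sample = rng.sample(landline_owners, 500)

# A simple random sample drawn from the whole population, for comparison.
full_sample = rng.sample(people, 500)

phone_estimate = statistics.fmean(age for age, _ in phone_sample)
srs_estimate = statistics.fmean(age for age, _ in full_sample)

print(f"true mean age:            {true_mean_age:.1f}")
print(f"landline-only estimate:   {phone_estimate:.1f}")
print(f"simple random sample:     {srs_estimate:.1f}")
```

Under these assumptions the landline-only estimate overestimates the mean age, and taking a larger phone sample would not fix it, because the people without a landline can never be selected. The simple random sample has only ordinary sampling error.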
From this list you can see there are many types of bias. A quality study attempts to remove as much
bias as possible so that the results are a more accurate representation of the population. That being
said, it seems that it’s almost impossible to create a study or experiment that is 100% bias free. The
goal of a good researcher is to make the bias as minimal as possible.
Progress Check
Q: In this module’s scenario, if we collect a sample for only one division (say, IT), which type of bias is
in our study?
Answer: Undercoverage bias because we have not chosen a sample that represents the population.
Summary: Data Collection Steps
In summary, these are the steps to gathering data intelligently in order to extract meaningful
information:
1. Define the objective of the study and identify the type of research question.
2. Determine the population of interest.
3. Identify the variables (quantitative or categorical) that will be used in the study and identify the
response and explanatory variables as well.
4. Determine whether to use an observational study or experimental study.
5. Determine the survey/sampling technique.
6. Collect data appropriately to make as representative a sample as possible and to minimize bias.