Topic 3.3_ MATHX402-032 Math for Management

School

University of California, Los Angeles

Course

X402

Subject

Mathematics

Date

Feb 20, 2024

Type

pdf

Pages

10

Uploaded by LieutenantSalmonPerson861

10/9/23, 6:55 AM Topic 3.3: MATHX402-032 Math for Management https://onlinelearning.berkeley.edu/courses/2072192/pages/topic-3-dot-3?module_item_id=96528811 1/10

Topic 3.3: Data Collection

Introduction

When attempting to conduct a study, you want your results to be as valid and meaningful as possible. To extract meaningful information from your data, you must be sure you’ve gathered it using sound methods. When you conduct a study and gather data carefully and intelligently, your results hold more validity.

There are six steps to good data collection. In sections 3.1 and 3.2, we looked at the first three in more depth. Recall these were:

1. Defining the objective of the study and identifying the type of research question;
2. Determining the population of interest; and
3. Identifying the variables (quantitative or categorical) that will be used in the study, as well as the response and explanatory variables.

In this section, we’ll look at the last three steps to good data collection.

4. Determining whether to use an observational technique (observational study) or an experimental technique (experimental study);
5. Determining the survey/sampling technique; and
6. Collecting data appropriately by avoiding error and bias.

Types Of Research Studies

There are two types of research studies: observational and experimental. Depending on the goals of a statistical study, we choose the appropriate type to use.

An observational study is a study in which the researcher observes (much like the name suggests) and records the sample data. The sample should be a simple random sample. A random sample or random selection is a sample in which every entity in the population has the same chance of being chosen.

For example, using a computer to randomly generate the names of 200 people in a town is considered a simple random sample because each person has an equal chance of being selected. The sample has to be credibly representative of the population of interest; otherwise the study has no basis for accurate or valid results. It should be noted that in an observational study, the researcher does not alter any variable for the individuals being sampled; the researcher can only observe and record.
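
The computer-generated selection of 200 people described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the town size of 5,000, the resident IDs, and the fixed seed are all made up for the example and do not come from the course material.

```python
import random

# Hypothetical population: 5,000 town residents with made-up ID strings.
population = [f"resident_{i:04d}" for i in range(5000)]

# A fixed seed is used only so that this sketch is repeatable.
rng = random.Random(42)

# random.sample draws without replacement, and every resident
# has the same chance of ending up in the sample.
sample = rng.sample(population, k=200)

print(len(sample))       # 200
print(len(set(sample)))  # 200, so nobody was selected twice
```

Because the draw is made without replacement, nobody can appear in the sample twice, which matches the idea of choosing 200 distinct people from the town.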

Example from this module’s scenario: John, our researcher from the scenario, wishes to determine whether the profit achieved is due to the incurred investment or not. An observational study can only establish an association between the explanatory variable (investment) and the response variable (profit achieved). We can’t manipulate the explanatory variable in an observational study, and therefore we can’t adjust for the confounding variables that might be present. For instance, here the capital assets are a confounding variable and some profit might have been achieved due to them, but we don’t know for sure. Since we cannot manipulate the explanatory variable, we can only determine whether there is an association between the two variables (investment and profit achieved); we cannot definitively conclude whether the investment causes the profit or vice versa.

An experimental study is a study in which a researcher manipulates the research scenario and then collects the data according to the objective of the research. As in observational studies, the sample should be a simple random sample.

From the definitions, you should see that the basic difference between the two types of studies is that in an observational study, the researcher is not allowed to manipulate anything and can only watch and record. (Think: how many businesses have at least 75 parking spaces for customers?) In an experimental study, the researcher is allowed to intervene and change a variable in order to gather data. (Think: if we hire a company to do landscaping around twenty banks, does the number of customers increase?)

Example from this module’s scenario: John could plan to collect customized data from the companies in his targeted population.
He could direct a group of certain companies to use only monetary investments (rather than capital assets) for industry transactions/ventures for ten years in order to determine the cause and effect between the explanatory variable (investment) and the response variable (profit achieved). Here the researcher has adjusted for the confounding variable in the original dataset, so he has changed or manipulated a variable. He has equalized the effect of the confounding variable for each group.

A double-blind study is generally superior to a single-blind study, in which typically only the participants are unaware of which treatment they receive. The double-blind design keeps both researchers and participants in the dark as to who is receiving which treatment. Blinding the researchers is important because it prevents them from unintentionally tipping off the study participants or unconsciously biasing their evaluation of the results.

A randomized control experiment is a randomized experimental study that has a case control structure. The sample is broken into two groups: the case group, which is exposed to the treatment whose effect is being assessed, and the control group, which is not exposed to that treatment and is kept stable.

To conduct a randomized control experiment, randomization is used twice. First, a random sample is chosen from the population. Then the members of the sample are randomly assigned to the case and control groups. We do this because then each group is more indicative of the entire population. Had the sample or the groups not been randomly selected, the results of the study wouldn’t be as valid. A randomized controlled experiment is widely regarded as the strongest design for establishing cause and effect. For this reason, randomized controlled experiments are common, especially in the medical field.

Example from this module’s scenario: If John wishes to determine whether investments are contributing to profits, he can instruct one group of companies to use investments and capital assets for company transactions (case group) and instruct the other group to use only capital assets for transactions (control group). Since all the other variables except investments are the same, any difference in the profit accrued would be due to the incurred investment. Since he randomly assigned the companies to these two groups, the groups are comparable.

In the chart below you can see the differences and similarities between observational and experimental studies. When deciding which to use, you should consider whether a variable needs to be manipulated. If the answer is yes, then use an experiment. If the answer is no, then an observational study will suffice.

Research Studies Comparison Chart

Manipulation of variables
Observational studies: The data is collected passively, without any manipulation of the explanatory variables. The corresponding response variable value is recorded.
Experimental studies: The data is collected in experimental settings, where the explanatory variables are actively manipulated. The corresponding response variable value is recorded.

Example
Observational studies: Collecting data to analyze the amount of aspirin consumed and heart attack incidence in a group of individuals.
Experimental studies: Changing the amount of aspirin in a group of individuals and monitoring the heart attack incidence in this group.

Causation and Correlation
Observational studies: The sample should be randomly selected. An observational study does not provide proof of causation (cause and effect). It allows sample-generated conclusions to be generalized to the population. The observational study only describes an association between two variables. Ethical concerns, where studies need to be done without making people change their lifestyle (for instance, making them smoke to check their weight gain), require researchers to conduct observational studies as opposed to experimental ones.
Experimental studies: If we wish to determine the effect of aspirin on incidence of heart attack, then various confounding variables like weight, smoking, age, gender, etc., have to be adjusted for in order to determine a causal relationship. If the sample is selected randomly, then we can generalize sample-generated conclusions to the population. If the sample is assigned randomly to the case and control groups, then we can attribute a causal relationship with a degree of confidence.

Progress Check

Consider this module’s scenario and John’s main goal to find out which of the four companies has the best investment profile. Would his analysis be an observational study or an experimental study?

Answer: Observational Study

Progress Check

In this module’s scenario, could maximum profit sector be a confounding variable?

Answer: Yes

Progress Check

Using random selection and random assignment allows you to do what two things?

Answer: 1) You can generalize the sample conclusion to the population. 2) You can attribute cause and effect to the study.

Sampling

As we touched on in Section 3.1, sampling involves taking a sample from the population, whereas taking a census means surveying every item in the population. For sampling to be effective:

  • The sample must have a sound design for us to trust the analysis obtained from it. This will allow the conclusion to be generalized to the population.
  • An inappropriate sample can cause an entire study to be invalid, so we try to create the best sample possible.
  • Internet, phone, and other technologies make it difficult to conduct effective sampling because they can introduce types of bias (we will discuss this later). New technologies, like social media, have made getting a completely unbiased sample more and more difficult.

We must also have an accurate determination of the population that is being surveyed for the study and of the exact quantities to be measured, such as the mean, median, or a percentile. Knowing these features will allow us to choose the best sample possible, because we will know which samples best represent the entire population.

To determine the sample, we generally complete the following steps.

Sampling steps

1. Determine (in detail) the population to be surveyed.
2. Determine which variables to measure.
3. Decide the sampling design for collecting data.

The sampling design (or sampling technique) is the method we use to choose the sample from the population. This can range from the simple random sampling that we’ve discussed to a clustered sample where only select groups are used as a sample.

Types of Sampling

We’ve discussed before the concept of a simple random sample. Recall that this is a sample in which every individual in the population has an equal chance of being selected. There are a few reasons why simple random sampling is so important:

  • This sampling technique is an unbiased one (more on bias below).
  • It is trustworthy for inference purposes by the laws of probability.
  • Each member of the population has an equal chance of being selected, so the sample is a good representation.
  • Random samples come with a margin of error due to sample-to-sample variations. The margin of error for random samples is often smaller than with other sampling techniques.
  • Larger samples give better information about the population, meaning that the larger our random sample is, the better.

Sampling techniques fall into one of two categories:

One category of sampling is probability sampling. This is sampling in which individuals are chosen by a random mechanism, so that every individual in the population has a known, nonzero chance of being selected; in the simplest designs, that chance is the same for every individual. These include: Simple Random Sample, Stratified Random Sample, Cluster Random Sampling, Systematic Random Sampling, and Multistage Sampling. The overarching idea is probability sampling, and then within that idea we have the different sampling techniques.

The other category of sampling is non-probability sampling. This is sampling in which individuals are not chosen by a random mechanism, so the chance of selecting one individual is not the same as the chance of selecting any other individual in the population and is often unknown. These include: Convenience Sampling and Voluntary Sampling.

Typically, non-probability sampling is seen as the lesser of the two because it might not give the best representation of the population. However, for certain situations, such as needing to ask for volunteers, non-probability sampling might be the only choice.
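
Three of the probability sampling techniques named above (systematic, stratified, and cluster sampling) can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the helper function names, the 120 companies, the four state labels, and the seed are all hypothetical and chosen only for the example.

```python
import random
from collections import defaultdict


def systematic_sample(units, k, rng):
    """Pick a random start among the first k units, then take every k-th unit."""
    start = rng.randrange(k)
    return units[start::k]


def stratified_sample(units, stratum_of, per_stratum, rng):
    """Group units into strata, then draw a simple random sample in each stratum."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    chosen = []
    for members in strata.values():
        chosen.extend(rng.sample(members, min(per_stratum, len(members))))
    return chosen


def cluster_sample(units, cluster_of, n_clusters, rng):
    """Randomly choose whole clusters and keep every unit in the chosen ones."""
    clusters = defaultdict(list)
    for unit in units:
        clusters[cluster_of(unit)].append(unit)
    picked = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in picked for unit in clusters[name]]


# Hypothetical population: 120 companies spread evenly over 4 made-up states.
states = ["A", "B", "C", "D"]
companies = [(f"company_{i:03d}", states[i % 4]) for i in range(120)]
rng = random.Random(7)  # fixed seed only for repeatability

every_10th = systematic_sample(companies, 10, rng)               # every 10th company
by_state = stratified_sample(companies, lambda c: c[1], 5, rng)  # 5 from each state
two_states = cluster_sample(companies, lambda c: c[1], 2, rng)   # all of 2 states

print(len(every_10th), len(by_state), len(two_states))  # 12 20 60
```

The stratified sample touches every state, while the cluster sample covers only the two states that happened to be drawn, which is why cluster sampling is cheaper but can miss part of the population's diversity.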

Below we describe the different sampling types along with an example and the pros and cons of each. Although it may seem like some techniques should never be used, in certain instances the researcher has no choice.

Probability Sampling Techniques

SIMPLE RANDOM SAMPLING
Definition: Every member of the population has an equal chance of being selected in the sample.
Example: The lottery method or using random numbers.
Advantages and disadvantages: This is an unbiased way of sampling, though it is not effective if, by chance, a major part of the population is not represented.

SYSTEMATIC RANDOM SAMPLING
Definition: The population is numbered and then every kth element is chosen to create a systematic sample.
Example: Checking every 10th potato chip bag from the assembly line.
Advantages and disadvantages: This is an unbiased way of sampling, though it is not effective if the ordering of the population follows a pattern that lines up with the selection interval.

STRATIFIED RANDOM SAMPLING
Definition: The population is divided into groups (strata). From each stratum a random sample is taken.
Example: Poll data collected from all states. In this case, each state is one stratum.
Advantages and disadvantages: Useful if the strata differ from one another but the elements within each stratum are similar. All strata need to be well represented.

CLUSTER RANDOM SAMPLING
Definition: The population is divided into clusters, where each cluster has similar elements. A few clusters are then chosen using random sampling.
Example: Poll data collected from a few states, but not all. In this case, each state is a cluster.
Advantages and disadvantages: It is economically advantageous, but it might ignore the diversity within a cluster if a simple random sample is taken from the cluster.

MULTISTAGE RANDOM SAMPLING
Definition: Two or more random sampling methods are used in succession.
Example: If we are surveying SAT scores of a state, we might use a stratified sample to choose districts and then, within the chosen districts, use a cluster sample to choose the schools. Finally, we might use simple random sampling to choose a subset of schools for the final analysis.
Advantages and disadvantages: In situations where one sampling type doesn’t “fit”, multistage random sampling can be used to customize a sampling type. However, this method might be overused and result in a biased sample.

Non-Probability Sampling Techniques

VOLUNTARY RESPONSE SAMPLING
Definition: The sample is made up of the members of the population who volunteer to respond.
Example: Voluntary presidential election poll.
Advantages and disadvantages: Useful if cost is the main constraint, but this method may not be representative of the population. Personal bias could come into play.

CONVENIENCE SAMPLING
Definition: The sample is chosen from the population according to the convenience of the researcher.
Example: Choosing the first 10 students in the hallway for an opinion poll.
Advantages and disadvantages: Useful if there is no intrinsic difference between the population and the sample. Personal bias could affect the sample entities chosen.

Inference

The process of drawing conclusions about the population from the sample statistics is called inference. You may have noticed that many statistics books are titled “Inferential Statistics”. This is because the goal is to take a sample, measure the desired variables, and then draw conclusions. The process of inference plays a large part in how studies are designed and statistics are computed.

Sampling Errors

When taking samples, we expect sampling errors. This is the error due to the difference between samples, and it can be attributed to chance variation. Because no two samples are equal, and no sample is perfect, we do expect some error to occur. Chance variations are variations that are not significant and do not indicate a difference due to the treatment. In this module’s scenario, returning to our experimental design using randomized control, if the profit due to investment and capital assets (case group) is significantly higher than the profit obtained due to just capital assets (control group), then the variations are statistically significant and not due to minor sample-to-sample chance variations. As with this example, it is customary and important to analyze why and to what degree these variations occur.

Another type of error is a non-sampling error. These are the errors due to inaccurate data collection (sampling technique), recording, and analysis. An example would be if a researcher recorded incorrect values of a variable. In this case, the researcher is at fault.

Progress Check

If we grouped the companies in this module’s scenario by business sector and took a random sample from each sector, what sampling technique would that be?

Answer: Stratified sampling

Progress Check

If we collect the business data from only the companies in our state because they were the easiest to contact, what sampling technique would that be?

Answer: Convenience sampling

Types of Bias

Most of us know the word “bias” from its everyday meaning: “a tendency to believe that some people, ideas, etc., are better than others that usually results in treating some people unfairly” (Merriam-Webster Dictionary). In statistics, the concept is very similar (Math doesn’t strive to change English!). We say that a study is biased when it systematically underestimates or overestimates the true values. This bias could be due to any one of a number of reasons.

Self Interest Bias is a bias that arises when the researcher collects data in a partial way that will benefit his or her cause, often at a cost to others. One example is a vehicle insurance worker collecting a sample from his colleagues at the insurance provider to study the fairness of current insurance rates.
We need to be careful of self-interest bias, as many people will conduct a study with their own personal gains in mind, which can render the study completely useless.

Voluntary Response Bias is a bias that arises when the sample is made up of individuals who decide for themselves whether to take part, so people with strong opinions tend to be overrepresented. An example is a researcher who gauges student opinion of university policies using only the students who choose to stop and respond as they enter the library on one day of the week. You might wonder, why would a researcher ever use voluntary response? The answer is that sometimes it’s the only choice. Consider a satisfaction survey sent out to an entire company. The researcher typically can’t require all employees to fill out the survey, so he or she must rely on voluntary response.

Non Response Bias is a bias that is seen when the sample collected contains unanswered questions or missing responses. For example, if a researcher collects a multiple choice survey from students regarding the strengths and weaknesses of the instructor and the survey has unanswered questions when collected, this causes non response bias.

Leading Question Bias is a bias that arises if the survey questions lead individuals to answer in a way they otherwise would not. An example of a leading survey question might be: Do you prefer a vegetarian diet or a non vegetarian diet? A vegan person might not be able to answer this question accurately. Sometimes these types of questions are there because researchers are really hoping for a certain outcome. Either way, this is biased.

Social Acceptability Bias is a bias that is observed when a sample subject does not answer a question honestly due to the fear of social disapproval. For example, a person might not honestly answer a question like, “Do you support abortion?” because it might not be socially acceptable to answer how they truly feel.

Sampling Bias is a bias that occurs if the sample collected is not representative of the population. For example, if the behavior of whales is being monitored and the sample does not include whales from a representative set of regions of the world, this sample would be biased due to sampling. The sample always needs to be the best representation that is possible.

Undercoverage Bias happens when some groups in the population are not included in the sampling process. For instance, a home phone response survey would miss individuals who don’t have a landline, creating bias in the study.

From this list you can see there are many types of bias.
A quality study attempts to remove as much bias as possible so that the results are a more accurate representation of the population. That being said, it is almost impossible to create a study or experiment that is 100% bias free. The goal of a good researcher is to keep bias to a minimum.

Progress Check

In this module’s scenario, if we collect a sample for only one division (say, IT), which type of bias is in our study?

Answer: Undercoverage bias, because we have not chosen a sample that represents the population.

Summary: Data Collection Steps

In summary, these are the steps to gathering data intelligently in order to extract meaningful information:

1. Define the objective of the study and identify the type of research question.
2. Determine the population of interest.
3. Identify the variables (quantitative or categorical) that will be used in the study and identify the response and explanatory variables as well.
4. Determine whether to use an observational study or experimental study.
5. Determine the survey/sampling technique.
6. Collect data appropriately to make as representative a sample as possible and to minimize bias.
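
As a closing illustration of steps 4 through 6, the two uses of randomization in a randomized control experiment (random selection of the sample, then random assignment to the case and control groups) can be sketched in Python. This is a hedged sketch under assumptions: the 1,000 company names, the sample size of 40, the even 20/20 split, and the seed are all hypothetical and are not taken from the module's scenario.

```python
import random

# Hypothetical population of 1,000 companies; the names are made up.
population = [f"company_{i:04d}" for i in range(1000)]
rng = random.Random(2024)  # fixed seed only for repeatability

# Stage 1, random selection: draw the study sample from the population.
# This is what supports generalizing conclusions to the population.
sample = rng.sample(population, k=40)

# Stage 2, random assignment: shuffle the sample and split it in half.
# This is what supports attributing cause and effect to the treatment.
shuffled = sample[:]
rng.shuffle(shuffled)
case_group = shuffled[:20]     # exposed to the treatment
control_group = shuffled[20:]  # not exposed to the treatment

print(len(case_group), len(control_group))   # 20 20
print(set(case_group) & set(control_group))  # set(), no company is in both groups
```

Shuffling a copy of the sample before splitting it means no company's characteristics influence which group it lands in, which is the point of random assignment.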