NAME: _________________________________ I.D.: _____________________________________

FINAL
PROBABILITY AND STATISTICS, ECON 15B
MARCH 16, 2022

1. For the following seven (7) experiments/questions, pick the most appropriate statistical test. You have the following statistical tests as choices: some may be used more than once, others not at all. Assume homogeneity of variance (where applicable) and the validity of parametric tests (where applicable), unless something is directly stated (e.g., “the data are not at all normal”) or otherwise indicated (viz., by the inspection of the data) which would indicate a strong and obvious violation of an assumption. This means you must inspect the data for violations of all assumptions. Unless it is stated that the population parameter is known, assume it isn’t. Please do not concern yourself with any intervening variable that you may perceive. Finally, please don’t think about what the data would look like in reality. Assume that the question represents reality.

One-sample z-test: used to determine whether the mean of a single sample differs significantly from a known population mean when the population standard deviation is known.
One-sample t-test: used to determine whether the mean of a single sample differs significantly from a known population mean when the population standard deviation is unknown and must be estimated from the sample.
t-test for the difference between means for two related samples: used to determine whether the means of two related samples differ significantly. It is typically used when the samples are related in some way, such as before and after measurements or matched pairs.
t-test for the difference between means for two independent samples with homogeneity of variance: used to determine whether the means of two independent samples differ significantly. It assumes that the variances of the two populations are equal.
t-test for the difference between means for two independent samples with heterogeneity of variance: used to determine whether the means of two independent samples differ significantly. It does NOT assume equal variances between the populations.
One-sample z-test for proportions: used to determine whether the proportion in a single sample differs significantly from a known population proportion. It is commonly used in hypothesis testing for categorical data with two categories.
Chi-square goodness of fit: used to determine whether the distribution of categorical data differs significantly from a hypothesized distribution. It is often used when dealing with categorical variables with more than two categories.
Two-sample z-test for the difference between proportions: used to determine whether the proportions in two independent samples differ significantly. It is commonly used in comparing proportions between two groups.
Chi-square test of independence: used to determine whether there is a significant association between two categorical variables. It is often used to assess relationships between variables in contingency tables.
Simple regression: used to model the relationship between a single independent variable and a dependent variable. It helps to understand how changes in the independent variable are associated with changes in the dependent variable.
Multiple regression: used to model the relationship between multiple independent variables and a dependent variable. It allows for the examination of the combined association of several predictors with the outcome variable.

Please simply write the letter for the test as your answer. Here are the tests:
A: one sample z-test
B: one-sample t-test
C: t-test for the difference between means for two related samples
D: t-test for the difference between means for two independent samples with homogeneity of variance
E: t-test for the difference between means for two independent samples with heterogeneity of variance
F: a one sample z-test for proportions (or a chi-square goodness of fit)
G: chi-square goodness of fit only (where a one sample z-test of proportions isn’t appropriate)
H: a two-sample z-test for the difference between proportions (or a chi-square test of independence)
I: chi-square test of independence only (where a two-sample z-test for the difference between proportions isn’t appropriate)
J: simple regression
K: multiple regression
L: none of the above

An experimenter wants to conduct a test on whether people who work in cubicles have more office friendships than people who work in private offices. The experimenter takes a random sample of people from a large company and records whether they work in a cubicle or private office and how many co-workers they have had non-work-related conversations with in the past year.
Cubicle workers: 6 26 29 23 5 32 9 3 34 8
Private office workers: 3 19 18 22 3 5 16 2 17 4
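The instructions for question 1 ask for the data to be inspected for violations of assumptions. The following is a minimal sketch in Python (standard library only) of one way to do that for the office data, assuming the two groups are treated as independent samples and the outcome as a count per person. The function name `pooled_t` is an illustrative choice, not something from the exam.

```python
from math import sqrt
from statistics import mean, variance

# Data copied from the question
cubicle = [6, 26, 29, 23, 5, 32, 9, 3, 34, 8]
private = [3, 19, 18, 22, 3, 5, 16, 2, 17, 4]


def pooled_t(x, y):
    """Return (t, df) for the equal-variance two-sample t-test."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    # Pooled variance: sample variances weighted by their degrees of freedom
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean(x) - mean(y)) / se, df


# Ratio of sample variances as a rough check on homogeneity of variance
var_ratio = variance(cubicle) / variance(private)
t_stat, df = pooled_t(cubicle, private)

print(f"variance ratio = {var_ratio:.2f}")
print(f"t = {t_stat:.2f} on {df} degrees of freedom")
```

A variance ratio well above common rules of thumb (for example, 4) would point toward the unequal-variance version of the test instead; where to draw that line is a judgment call.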
D: t-test for the difference between means for two independent samples with homogeneity of variance. Each person contributes a count of co-workers, so the comparison is between the mean number of office friendships in two groups, not between two proportions. For that reason neither the two-sample z-test for proportions nor the chi-square test of independence fits. The two groups are independent (different people, no pairing), and the population standard deviations are unknown. The sample variances (about 153 for cubicle workers and about 65 for private office workers, a ratio of roughly 2.3 with equal group sizes) do not show a strong and obvious violation of homogeneity of variance.
H0: Cubicle workers have, on average, no more office friendships than private office workers. (μ1 ≤ μ2)
HA: Cubicle workers have, on average, more office friendships than private office workers. (μ1 > μ2)

2. Please simply write the letter for the test as your answer.

Dr. Smith was a great believer in blood pressure as a key indicator of overall health. Her new assistant, Nurse Ben, believed that a person’s height and weight affect that person’s blood pressure. Nurse Ben wanted to see if this is true.

K: multiple regression. There are two predictors (height and weight) and one outcome (blood pressure). Nurse Ben can examine the coefficients associated with height and weight to determine their individual effects on blood pressure, while also assessing the overall fit of the model to the data. Multiple regression also gives the strength and direction of each relationship while holding the other predictor constant.

3. Please simply write the letter for the test as your answer.

A scientist has discovered a cave of 7 dinosaur skulls and wants to know if they belong to the rare Zotosaurus family. On average, the Zotosaurus has a 470-inch-long skull. The dinosaur skulls in the cave are 395 inches on average.

B: one-sample t-test. A one-sample t-test is appropriate in this scenario because the scientist wants to compare the average skull length of the dinosaur skulls found in the cave (395 inches) with the known average skull length of the Zotosaurus family (470 inches). The test assesses whether the observed difference between the sample mean and the known population mean is statistically significant, helping the scientist determine if the skulls found in the cave likely belong to the rare Zotosaurus family.

Why not a z-test? A one-sample z-test uses the population standard deviation, σ, so it is suitable only when σ is known (or when the sample is large enough that the sample standard deviation closely approximates it). Here σ is not given and the sample has only 7 skulls, so σ must be estimated from the sample and the appropriate test is a one-sample t-test.

4. Please simply write the letter for the test as your answer.

Does the presence of more species of plants increase the productivity of a natural area, as measured by the total mass of plant material? The number of plant species was recorded as well as the productivity.

J: simple regression. Simple regression is the appropriate test in this scenario because it assesses the relationship between two quantitative variables: the number of plant species (independent variable) and the productivity of the natural area (dependent variable).
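Question 4 gives no data, so the values below are purely hypothetical and serve only to illustrate what fitting a simple regression involves. This sketch computes the least-squares slope and intercept by hand; the function name `fit_line` and both data lists are assumptions made for illustration.

```python
from statistics import mean

# HYPOTHETICAL values, not from the exam: species counts and plant mass per plot
species = [1, 2, 3, 4, 5]
productivity = [2, 4, 5, 4, 5]


def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


slope, intercept = fit_line(species, productivity)
print(f"productivity = {intercept:.1f} + {slope:.1f} * species")
```

A positive slope in such a fit would be consistent with productivity rising as species are added; whether it is statistically significant requires a test on the slope, which this sketch does not perform.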
5. Please simply write the letter for the test as your answer.

While traveling through South America, a student decides to analyze which country is the friendliest. He measures friendliness by seeing whether or not a person, when asked for directions, gives a reply. For each country, he gathers data on hundreds of people. The four countries analyzed are Argentina, Chile, Brazil, and Uruguay.

I: chi-square test of independence only. A chi-square test of independence is appropriate in this scenario because the student wants to analyze whether there is an association between the country a person is from (Argentina, Chile, Brazil, Uruguay) and whether that person gives a reply when asked for directions. Both variables are categorical, and because there are four countries rather than two, a two-sample z-test for the difference between proportions is not appropriate.
H0: There is no association between the country a person is from and their likelihood of giving a reply when asked for directions.
HA: There is an association between the country a person is from and their likelihood of giving a reply when asked for directions.

6. Please simply write the letter for the test as your answer.

Zoe takes a sample of eight UCI students to see if test anxiety exists. She measures their pulse a week before a test and then again 10 minutes before the test. The data are as follows:
Week Before pulse: 65 68 90 78 59 80 80 76 72
10 Minutes Before pulse: 70 75 96 87 65 87 87 84 80

C: t-test for the difference between means for two related samples (paired samples t-test). The test determines whether there is a significant difference between the mean pulse rates of the students measured a week before the test and those measured 10 minutes before the test. It is appropriate because each student is measured twice, so the data are paired. Working with the within-student differences lets Zoe assess whether pulse rate changes significantly between a week before and shortly before a test, which could indicate the presence of test anxiety.
H0: There is no difference between the mean pulse rates of UCI students measured a week before the test and those measured 10 minutes before the test. (μ1 = μ2)
HA: There is a difference between the mean pulse rates of UCI students measured a week before the test and those measured 10 minutes before the test. (μ1 ≠ μ2)

7. Please simply write the letter for the test as your answer.

The UCI Study Abroad Center is interested in whether UCI students are different from US college students in where they study abroad. They thus take a sample of 500 UCI students. It is known that among US college students, 60% study abroad in Europe, 20% in Asia & Oceania, 10% in the Americas, and 10% in the Middle East and Africa. Among the sample of UCI students, 60% study abroad in Europe, 33% in Asia & Oceania, 5% in the Americas and 2% in the Middle East and Africa.

G: chi-square goodness of fit only. The Study Abroad Center takes a single sample of 500 UCI students and compares its distribution across four regions with a known population distribution (the US percentages). Because there are more than two categories, a one-sample z-test for proportions is not appropriate. Because the US figures are known population values rather than a second sample, a two-sample z-test for the difference between proportions does not apply either.
H0: The distribution of study abroad regions among UCI students matches the US distribution (60%, 20%, 10%, 10%).
HA: The distribution of study abroad regions among UCI students differs from the US distribution for at least one region.
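For question 7, the US percentages are stated as known, which suggests comparing the single UCI sample against that fixed distribution. The following is a minimal sketch of the chi-square goodness-of-fit statistic in Python, using the percentages and sample size from the question. The dictionary names are illustrative, and the critical value is copied from a standard chi-square table rather than computed.

```python
n = 500  # UCI sample size from the question

# Known US distribution and observed UCI sample distribution
us_share = {"Europe": 0.60, "Asia & Oceania": 0.20,
            "Americas": 0.10, "Middle East & Africa": 0.10}
uci_share = {"Europe": 0.60, "Asia & Oceania": 0.33,
             "Americas": 0.05, "Middle East & Africa": 0.02}

# Chi-square statistic: sum over regions of (observed - expected)^2 / expected
chi2 = sum(
    (n * uci_share[region] - n * us_share[region]) ** 2 / (n * us_share[region])
    for region in us_share
)
df = len(us_share) - 1  # categories minus one

CRITICAL_05_DF3 = 7.815  # table value for alpha = 0.05 with 3 degrees of freedom

print(f"chi-square = {chi2:.2f} on {df} degrees of freedom")
print("exceeds the 0.05 critical value:", chi2 > CRITICAL_05_DF3)
```

The expected counts here (300, 100, 50, 50) are all well above 5, so the usual chi-square approximation should be reasonable.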
Chi-square test: The chi-square test of independence does not apply here, because it needs a sample from each group and only UCI students were sampled; the US percentages are known population values. With a single sample and four categories, the chi-square test in play is the goodness of fit test.

8. There is a double log function. What will “a” and “b” equal if the function is a line with a slope of 2 and an intercept of 0?

a = 0, b = 2

9. True or false: Generally speaking, increasing the sample size minimizes bias. Please explain.

False. Bias refers to the systematic error introduced by a sampling or estimation procedure that leads the sample statistic to deviate from the true population parameter. Increasing the sample size reduces random variability (i.e., reduces the standard error), leading to more precise estimates of population parameters. However, bias persists at any sample size if the sampling or estimation procedure is inherently flawed.
Example: Suppose a researcher wants to estimate the average income of a population by conducting a survey. If lower-income individuals are more likely to respond, the survey oversamples them. Increasing the sample size simply adds more data points in the same skewed proportions, so the estimate of average income stays too low and only looks more precise.

10. The concept of an iid is important when discussing sampling. Discuss how it relates to sampling (and which sample in particular). Specifically, how does it help us understand the sampling distribution.

Independent: The selection of one sample unit does not influence the selection of another sample unit. For example, in a simple random sample, each individual in the population has an equal chance of being selected, and the selection of one individual does not affect the probability of selecting another individual.
Identically distributed: Each sample unit is drawn from the same probability distribution. In other words, each observation in the sample comes from the same population with the same parameters. For instance, if we are sampling individuals' heights from a population, every height in the sample is drawn from the same distribution of heights in the population.
IID (independent and identically distributed) observations, as produced by a simple random sample, help us understand the sampling distribution in three ways.
Shape: by the central limit theorem, as the sample size increases, the sampling distribution of the sample mean approaches a normal distribution.
Precision: the variability of the sample statistic (e.g., the standard error) can be estimated from a single sample. This allows us to quantify the uncertainty associated with our sample statistic and make more precise inferences about the population parameters.
Validity: the tests covered in this course assume iid observations, so meeting that assumption leads to valid and reliable results.

11. We are doing an experiment to see if Psych majors do better on a test than Econ majors. The null hypothesis is that they do equally well on the test. We get the results, and Psych majors and Econ majors do equally well. What is our conclusion?

Fail to reject the null hypothesis: there is no statistically significant difference in test performance between the two majors. This does not prove that the two groups perform equally well; it only means the data give no evidence against that claim.

12. True or false and explain why or why not. You are more likely to make Type II error with a t-test than with a comparable z-test.

True. A Type II error (false negative) occurs when we fail to reject a null hypothesis that is actually false. In the context of hypothesis testing, it means we incorrectly conclude that there is no effect or difference when, in fact, there is one.
The reason is the wider confidence intervals associated with the t-distribution compared to the standard normal distribution (z-distribution). The t-distribution has fatter tails than the normal distribution, reflecting the extra uncertainty that comes from estimating the population standard deviation from a small sample. For the same significance level, the critical values are therefore larger, the confidence intervals are wider, and the power of the test is lower, making it more likely to fail to detect a true effect when one exists. In that sense the t-test is the more conservative of the two. The z-test, which uses a known σ, has smaller critical values and narrower confidence intervals, so it has greater power to detect differences when they are present and a lower likelihood of a Type II error under comparable conditions. The gap shrinks as the sample size grows, because the t-distribution approaches the normal distribution.
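The comparison in question 12 can be made concrete by setting two-tailed 5% critical values side by side. In the sketch below the t critical values are copied from a standard t table (they are not computed), and the z critical value comes from Python's standard library. The variable names are illustrative.

```python
from statistics import NormalDist

# Two-tailed alpha = 0.05 critical value for the standard normal distribution
z_crit = NormalDist().inv_cdf(0.975)

# Two-tailed alpha = 0.05 critical values from a standard t table, keyed by df
t_crit = {5: 2.571, 10: 2.228, 30: 2.042, 120: 1.980}

print(f"z critical value: {z_crit:.3f}")
for df, t in t_crit.items():
    # A larger critical value means a wider interval and a harder-to-reach
    # rejection region, hence lower power for the same data
    print(f"df = {df:>3}: t = {t:.3f}, {t / z_crit:.2f} times the z value")
```

Every t value exceeds the z value, and the ratio falls toward 1 as the degrees of freedom increase.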
13. Explain why we have N1 + N2 – 2 degrees of freedom when we pool the variances. (In other words, why do we gain degrees of freedom when we pool?)

Degrees of freedom: the maximum number of logically independent values that are free to vary in a data sample.
Pooling: combining information from multiple sources to estimate a common population parameter.
Before pooling: In a two-sample t-test, we have two independent samples, one from each population. Each sample loses one degree of freedom because its own mean is estimated, so the degrees of freedom for the two variance estimates are N1 – 1 and N2 – 1.
After pooling: Both samples are used to estimate a single common variance, so the degrees of freedom add: (N1 – 1) + (N2 – 1) = N1 + N2 – 2. This is more than either sample has on its own, which is the gain from pooling.
Example: Consider a data sample consisting of five positive integers whose values must have an average of six. If four items within the data set are {3, 8, 5, 4}, the fifth number must be 10. Because the first four numbers can be chosen freely, the degrees of freedom are 4. In another data sample, the values of five integers must have an average of 5. If four items are {2, 4, 1, 8}, the fifth number must be 10. Because the first four numbers can be chosen freely, the degrees of freedom are 4. If pooled, the degrees of freedom would be 4 + 4 = 8, which is N1 + N2 – 2 = 5 + 5 – 2.

14. Here is some data for two variables: “Number of Office Hours Visits” and “Score in the Class.” The covariance is 6.375. The mean number of visits is five with a standard deviation of 3.54. The mean score is 87.2 with a standard deviation of 11.3. What proportion of the variability (sum of squares) in “Score” is directly attributable to the variability in “Number of Visits”? (In other words, what is the proportion of explained squared error?)

Visits  Score
3       68
1       95
4       89
10      88
7       96

15. Traditionally, I give one free question on a final. It almost always has nothing at all to do with the class. This is it. Here are ten points. You need not answer this question to get points, but if you wish to, you may. Last quarter, I asked those of you familiar with Marvel movies and characters who the most fun character is. Now I am asking which character you could do without. Which character do you never need to see again in a Marvel movie?
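For question 14, the proportion of explained variability is the squared correlation, r² = (covariance / (SD of visits × SD of score))². The following is a minimal sketch that computes it from the summary statistics stated in the question. As a cross-check it also recomputes the covariance from the five data rows. The rows as transcribed here give a sample covariance of 6.75 rather than the stated 6.375, so both results are reported and neither is presented as the official answer. The function name `sample_cov` is illustrative.

```python
from statistics import mean

# Summary statistics exactly as stated in the question
cov_stated, sd_visits, sd_score = 6.375, 3.54, 11.3
r_stated = cov_stated / (sd_visits * sd_score)
r2_stated = r_stated ** 2

# Raw data rows as listed in the question
visits = [3, 1, 4, 10, 7]
scores = [68, 95, 89, 88, 96]


def sample_cov(x, y):
    """Sample covariance with an n - 1 denominator."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


cov_data = sample_cov(visits, scores)
# r^2 = cov^2 / (var_x * var_y); the variance is the covariance of a list with itself
r2_data = cov_data ** 2 / (sample_cov(visits, visits) * sample_cov(scores, scores))

print(f"r^2 from stated summary statistics: {r2_stated:.4f}")
print(f"covariance recomputed from the rows: {cov_data:.3f}")
print(f"r^2 recomputed from the rows: {r2_data:.4f}")
```

Either way, the proportion comes out at under 3%, so visits account for very little of the variability in score.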