LAB 5 Finished

docx

School

The University of Oklahoma *

*We aren’t endorsed by this school

Course

2013

Subject

Industrial Engineering

Date

Feb 20, 2024

Type

docx

Pages

7

Uploaded by ElderRiver12471

Report
LAB 5: Testing Hypotheses using statistics (difference of 2 proportions & chi-square) 10/23/23 DUE BY 10/23/23 at 11:59 PM 5 points off each day it’s late Submit this document (or another document with your answers) and excel, Graded out of 100 points, Excel file is worth 40 points It has been shown that students with parents who graduated from college, are more likely to graduate from college themselves. The ELS (Education Longitudinal Study) is a nationally representative survey of students across the United States; it surveys students, parents, and teachers. Suppose we want to understand the mechanisms by which parents’ education status impacts their students’ educational attainment. One approach is to think about whether parents with higher levels of educational attainment have higher expectations or aspirations for their children, in terms of educational attainment. For example, is a parent with a bachelor’s degree more likely to expect their child to earn a bachelor’s degree than a parent who did not go to college? This is especially important to understand as we try to uncover what impacts students’ college-going and degree attainment. In the dataset, parents are asked: How far do you expect your 10 th grader to go? And are given the following options to choose from: Don’t know, Less than HS graduation, High school graduation or GED only, Attend or complete 2-year college, Attend college (4-year incomplete), Graduate from college (4-year), Obtain Master’s or equivalent, Obtain PhD or MD or other advanced degree Codebook: Parent_expect_collgrad= 0 when a parent does not expect at least a 4-year college degree, and 1 if they expect at least a 4-year degree (or more). Parent_collgrad= 0 when parent has less than a 4-year college degree, and 1 if a parent has a BA degree or more, Question #1: Examine the construct validity of this measure (parent expectations). Use full sentences. Construct validity: refers to the degree to which a test of measurement accurately assesses the underlying theoretical construct it intends to measure. In this case- does the chosen measure actually measure parent expectations (something that is hard to measure). 5 points
- I think that this measure has a strong construct validity to a strong degree. Problems that can arise from this is that this sample is taken while students are in 10 th grade. This leaves still 2-3 years for students to grow and parents to be able to understand the path that is best fitting for their child. The hypotheses are: Null: There is no difference in parent expectations of students for parents with and without a bachelor’s degree. H 0 : p 1 P 2 = 0 Alternative: There is a relationship between parent expectations of students and parent educational attainment H 0 : p 1 P 2 = 0 Question #2: What do p 1 and P 2 represent in the hypotheses above? 4 points (2 each) - P1 represents the first population sample taken within the hypothesis and the P2 is the second population sample taken within the hypothesis. Part A: Difference of 2 proportions Use Part A sheet Step 1: Calculate proportions Let’s start by creating a table that shows the raw numbers for each proportion. Enter this table into excel in an empty set of cells (do not include Value1-4 in table, this is just to label where to put certain equations). Expects student to graduate college? Yes No Parent has BA Value 1 Value 3 Parent does not have BA Value 2 Value 4 Now let’s fill in the table in excel as follows: We want to calculate counts using the =COUNTIFS function. The COUNTIFS function tell us to count the number of cells that have a particular value in a certain cell range =COUNTIFS(starting cell : ending cell, value). If we want to count based on two conditions we just add a comma after the first specified value, and then include another starting cell : ending cell, value, like so: =COUNTIFS(start cell 1: end cell 1, value 1, start cell 2: end cell 2, value 2)
For Value 1, we want to write =COUNTIFS(A2:A12263, 1, B2:B12263, 1) Value 2, =COUNTIFS(A2:A12263, 1, B2:B12263, 0) Value 3 =COUNTIFS(A2:A12263, 0, B2:B12263, 1) Value 4 =COUNTIFS(A2:A12263, 0, B2:B12263, 0) Question 3: Explain what one of the COUNTIFS equations is doing. For example, in Value 1 we are… 5 points - For value 1 we are counting if the parent does have a BA degree and they expect the students to graduate. Now we can generate the pooled proportion we need to test the success-failure condition We want to calculate: Totalsuccesses Total cases Total ¿ of parentsthat expect kids ¿ earnBA ( at least ) ¿ Total ¿ of parents ¿ Value 1 + Value 2 Value 1 + Value 2 + Value 3 + Value 4 Question 4: What is the pooled proportion (the ratio/number found)? 3 points - 0.77092 Step 2: Success-failure condition We want to make sure we meet the success/failure condition to ensure we have a sufficient sample size. We can do this in excel! The formula is: n p 0 10 n ( 1 p 0 ) 10 n = sample size p 0 = null hypothesis proportion We have to do this for both proportions In excel, in an empty cell, we can multiply the sample size by the null proportion, as follows: 1. (1-.77) * 5426 2. (1-.77) * 7016 Question 5: How did I get 5246 and 7016? 5 points What estimates did you find? 4 points (2 each) Does this satisfy the success/failure condition? 2 points
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
a.) We got 5426 from the total number of parents that have a BA degree, and 7016 from the total number of parents that do not have a BA degree b.) 1206.58 and 1613.68 c.) This satisfies the success/failure condition with both numbers exceeding 10 minimum, but exceeding 30 strengthens the case. Step 3: Find point estimate of the difference in expectations rate between parents with and without BA ^ p BA ^ p noBA = 4718 4718 + 528 4735 4735 + 2281 We can do this in excel using =(4718/(4718+528))-(4735/(4735+2281)) Question 6: What point estimate do you find? 3 points - Point estimate = 0.224466 Step 4: Calculate standard error SE= ^ p pooled ( 1 ^ p pooled ) n 1 + ^ p pooled ( 1 ^ p pooled ) n 2 SE= .77 ( 1 .77 ) 5426 + .77 ( 1 .77 ) 7016 We can do this in excel by writing in an empty cell: =SQRT((0.77*(1-0.77)/5246)+((0.77*(1-0.77)/7016))) Question 7: What is your SE? 3 points - Standard Error = 0.00767 Step 5: Now using excel let’s calculate our Z-score and our p-value. Z-score: In an empty cell write =(0.224 – 0)/select the cell with your SE This comes from =point estimate difference – null/SE P-value: in another empty cell write, =NORM.S.DIST(select cell with zscore, FALSE)
Format cells to show number with up to 3 decimal places. We did this on midterm but with percentages. Select format cells select number increase to 3 decimal places Question 8: What is your z-score? 3 points - 29.223 Question 9: What is your p-value? 3 points - 0.000 Question 10: Do you reject null? 2 points - We reject this null because it is lower than the significance levels p-value that is standing at 0.005 at 95% Question 11: Interpret this test of 2 proportions. 5 points - It would be right on the borderline of what is acceptable and what is not. It is just right below 30 so it probably would not be as strong of an argument compared to Part B: Chi Squared Testing On Sheet B we will be conducting a chi square test. We looked at chi square on Thursday to determine whether the distribution of random scholarships was in fact random…it was not. Today we will be using Chi Square to test to if categorical variables are related (which is what you may do for your research project). We can use the same dataset here as well and the same table we created above. Go to sheet B, where I have a table of the observed counts of parents with a BA by the counts of parents that expect their students to earn a BA. You will create an “expected values” table like so, where: Value 1 =(B5*D3)/D5 Value 2 =(B5*D4)/D5 Value 3 =(C5*D3)/D5 Value 4 =(C5*D4)/D5 Expects student will earn BA degree Expects student will not earn BA degree Total
Parent has BA degree Value 1 Value 3 Value 1+ Value 3 Parent does not have BA degree Value 2 Value 4 Value 2 + Value 4 Total Value 1 + Value 2 Value 3 + Value 4 Value 1 + Value 2 + Value 3 + Value 4 Next, we want to create another table like so, where we look at the difference between the observed value (the one I gave you) and the expected value (the values you just generated): Values 1 – 4 =(O-E)^2/E Using the tables you already have we can do this by: Value 5: =(4718-select value 1)^2/select value 1 Value 6: =(4735-select value 2)^2/select value 2 Value 7: =(528-select value 3)^2/select value 3 Value 8: =(2281-select value 4)^2/select value 4 Expects student will earn BA degree Expects student will not earn BA degree Parent has BA degree Value 5 Value 7 Parent does not have BA degree Value 6 Value 8 Chi-Squared Value Next we need to compute the test statistics (chi-squared). We can do this in excel by writing in an empty cell (DO NOT WRITE VALUE 5, instead select in excel the cell with value 5 in it): =SUM(Value 5+Value6+Value7+Value8) Question 12: What is the chi-squared value? 3 points - The Chi-Squared Value = 856.3643 Critical Value Next we need to compute the critical value. We can do this in excel using the following equation: =CHISQ.DIST.RT(0.05, 1) 0.05 because that is our p value 1 because that is the degrees of freedom (i.e., how many categories we have minus 1) Question 13: What is the critical value? 3 points - 0.823
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
Question 14: What can you conclude from these numbers/this chi-squared test? 4 points - We can conclude that there was a major difference in the data as we had 856.36 as our Chi-Squared number. This means that the difference in the actual data is a pretty substantial difference between our actual and expected data. This can mean that either a, our population did not act as our data would imply or b, that our data implies unreliable information. Question 15: Why did we use a chi-squared test? 3 points - We used it to see if the sampling that we are observing was truly the normal data we would see when observing these types of observations or if they were random. The bigger the Chi-Squared number, the bigger the difference is from the estimated data and the actual data.