LAB 5 Finished
docx
keyboard_arrow_up
School
The University of Oklahoma *
*We aren’t endorsed by this school
Course
2013
Subject
Industrial Engineering
Date
Feb 20, 2024
Type
docx
Pages
7
Uploaded by ElderRiver12471
LAB 5: Testing Hypotheses using statistics (difference of 2 proportions & chi-square)
10/23/23
DUE BY 10/23/23 at 11:59 PM
5 points off each day it’s late
Submit this document (or another document with your answers) and excel, Graded out of 100 points, Excel file is worth 40 points
It has been shown that students with parents who graduated from college, are more likely to
graduate from college themselves. The ELS (Education Longitudinal Study) is a nationally
representative survey of students across the United States; it surveys students, parents, and
teachers. Suppose we want to understand the mechanisms by which parents’ education status impacts their
students’ educational attainment. One approach is to think about whether parents with higher
levels of educational attainment have higher expectations or aspirations for their children, in
terms of educational attainment. For example, is a parent with a bachelor’s degree more likely to
expect their child to earn a bachelor’s degree than a parent who did not go to college? This is
especially important to understand as we try to uncover what impacts students’ college-going
and degree attainment. In the dataset, parents are asked: How far do you expect your 10
th
grader to go? And are given
the following options to choose from: Don’t know, Less than HS graduation, High school graduation or GED only, Attend or complete 2-year college, Attend college (4-year incomplete), Graduate from college (4-year), Obtain Master’s or equivalent, Obtain PhD or MD or other advanced degree
Codebook: Parent_expect_collgrad= 0 when a parent does not expect at least a 4-year college degree, and 1
if they expect at least a 4-year degree (or more). Parent_collgrad= 0 when parent has less than a 4-year college degree, and 1 if a parent has a BA
degree or more, Question #1: Examine the construct validity of this measure (parent expectations). Use full sentences.
Construct validity: refers to the degree to which a test of measurement accurately assesses the underlying theoretical construct it intends to measure. In this case- does the chosen measure actually measure parent expectations (something that is hard to measure). 5 points
-
I think that this measure has a strong construct validity to a strong degree. Problems that
can arise from this is that this sample is taken while students are in 10
th
grade. This leaves
still 2-3 years for students to grow and parents to be able to understand the path that is
best fitting for their child. The hypotheses are: Null: There is no difference in parent expectations of students for parents with and without a
bachelor’s degree.
H
0
: p
1
−
P
2
=
0
Alternative: There is a relationship between parent expectations of students and parent
educational attainment
H
0
: p
1
−
P
2
=
0
Question #2:
What do p
1
and P
2
represent in the hypotheses above? 4 points (2 each)
-
P1 represents the first population sample taken within the hypothesis and the P2 is
the second population sample taken within the hypothesis. Part A: Difference of 2 proportions Use Part A sheet
Step 1: Calculate proportions
Let’s start by creating a table that shows the raw numbers for each proportion. Enter this table
into excel in an empty set of cells (do not include Value1-4 in table, this is just to label where to
put certain equations).
Expects student to graduate college?
Yes
No
Parent has BA
Value 1
Value 3
Parent does not have BA
Value 2
Value 4
Now let’s fill in the table in excel as follows: We want to calculate counts using the =COUNTIFS function. The COUNTIFS function tell us to count the number of cells that have a particular value in a
certain cell range
=COUNTIFS(starting cell : ending cell, value). If we want to count based
on two conditions we just add a comma after the first specified value, and then include another
starting cell : ending cell, value, like so: =COUNTIFS(start cell 1: end cell 1, value 1, start cell
2: end cell 2, value 2)
For Value 1, we want to write =COUNTIFS(A2:A12263, 1, B2:B12263, 1)
Value 2, =COUNTIFS(A2:A12263, 1, B2:B12263, 0)
Value 3 =COUNTIFS(A2:A12263, 0, B2:B12263, 1)
Value 4 =COUNTIFS(A2:A12263, 0, B2:B12263, 0)
Question 3: Explain what one of the COUNTIFS equations is doing. For example, in Value 1 we
are…
5 points
-
For value 1 we are counting if the parent does have a BA degree and they expect the
students to graduate. Now we can generate the pooled proportion we need to test the success-failure condition
We
want
to
calculate:
Totalsuccesses
Total cases
Total
¿
of parentsthat expect kids
¿
earnBA
(
at least
)
¿
Total
¿
of parents
¿
Value
1
+
Value
2
Value
1
+
Value
2
+
Value
3
+
Value
4
Question 4: What is the pooled proportion (the ratio/number found)? 3 points
-
0.77092
Step 2: Success-failure condition
We want to make sure we meet the success/failure condition to ensure we have a sufficient
sample size. We can do this in excel!
The formula is: n p
0
≥
10
∨
n
(
1
−
p
0
)
≥
10
n = sample size
p
0
= null hypothesis proportion
We have to do this for both proportions
In excel, in an empty cell, we can multiply the sample size by the null proportion, as follows: 1.
(1-.77) * 5426
2.
(1-.77) * 7016
Question 5: How did I get 5246 and 7016? 5 points
What estimates did you find? 4 points (2 each)
Does this satisfy the success/failure condition? 2 points
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
a.)
We got 5426 from the total number of parents that have a BA degree, and 7016 from
the total number of parents that do not have a BA degree
b.)
1206.58 and 1613.68
c.)
This satisfies the success/failure condition with both numbers exceeding 10 minimum,
but exceeding 30 strengthens the case.
Step 3: Find point estimate of the difference in expectations rate between parents with and
without BA
^
p
BA
−
^
p
noBA
=
4718
4718
+
528
−
4735
4735
+
2281
We can do this in excel using
=(4718/(4718+528))-(4735/(4735+2281))
Question 6: What point estimate do you find? 3 points
-
Point estimate = 0.224466
Step 4: Calculate standard error
SE=
√
^
p pooled
(
1
−
^
p pooled
)
n
1
+
^
p pooled
(
1
−
^
p pooled
)
n
2
SE=
√
.77
(
1
−
.77
)
5426
+
.77
(
1
−
.77
)
7016
We can do this in excel by writing in an empty cell:
=SQRT((0.77*(1-0.77)/5246)+((0.77*(1-0.77)/7016)))
Question 7: What is your SE? 3 points
-
Standard Error = 0.00767
Step 5: Now using excel let’s calculate our Z-score and our p-value. Z-score: In an empty cell write
=(0.224 – 0)/select the cell with your SE
This comes from =point estimate difference – null/SE
P-value: in another empty cell write, =NORM.S.DIST(select cell with zscore, FALSE)
Format cells to show number with up to 3 decimal places. We did this on midterm but with
percentages. Select format cells
select number
increase to 3 decimal places
Question 8: What is your z-score? 3 points
-
29.223
Question 9: What is your p-value? 3 points
-
0.000
Question 10: Do you reject null? 2 points
-
We reject this null because it is lower than the significance levels p-value that is standing
at 0.005 at 95%
Question 11: Interpret this test of 2 proportions. 5 points
-
It would be right on the borderline of what is acceptable and what is not. It is just right
below 30 so it probably would not be as strong of an argument compared to Part B: Chi Squared Testing On Sheet B we will be conducting a chi square test. We looked at chi square on Thursday to
determine whether the distribution of random scholarships was in fact random…it was not.
Today we will be using Chi Square to test to if categorical variables are related (which is what
you may do for your research project). We can use the same dataset here as well and the same table we created above. Go to sheet B,
where I have a table of the observed counts of parents with a BA by the counts of parents that
expect their students to earn a BA. You will create an “expected values” table like so, where: Value 1 =(B5*D3)/D5
Value 2 =(B5*D4)/D5
Value 3 =(C5*D3)/D5
Value 4 =(C5*D4)/D5
Expects student will
earn BA degree
Expects student will
not earn BA degree
Total
Parent has BA degree
Value 1
Value 3
Value 1+ Value 3
Parent does not have
BA degree
Value 2
Value 4
Value 2 + Value 4
Total
Value 1 + Value 2
Value 3 + Value 4
Value 1 + Value 2 +
Value 3 + Value 4
Next, we want to create another table like so, where we look at the difference between the
observed value (the one I gave you) and the expected value (the values you just generated):
Values 1 – 4 =(O-E)^2/E
Using the tables you already have we can do this by:
Value 5: =(4718-select value 1)^2/select value 1
Value 6: =(4735-select value 2)^2/select value 2
Value 7: =(528-select value 3)^2/select value 3
Value 8: =(2281-select value 4)^2/select value 4
Expects student will
earn BA degree
Expects student will
not earn BA degree
Parent has BA degree
Value 5
Value 7
Parent does not have
BA degree
Value 6
Value 8
Chi-Squared Value
Next we need to compute the test statistics (chi-squared). We can do this in excel by writing in
an empty cell (DO NOT WRITE VALUE 5, instead select in excel the cell with value 5 in it): =SUM(Value 5+Value6+Value7+Value8)
Question 12:
What is the chi-squared value? 3 points
-
The Chi-Squared Value = 856.3643
Critical Value
Next we need to compute the critical value. We can do this in excel using the following equation:
=CHISQ.DIST.RT(0.05, 1)
0.05 because that is our p value
1 because that is the degrees of freedom (i.e., how many categories we have minus 1)
Question 13:
What is the critical value? 3 points
-
0.823
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
Question 14: What can you conclude from these numbers/this chi-squared test? 4 points
-
We can conclude that there was a major difference in the data as we had 856.36 as our
Chi-Squared number. This means that the difference in the actual data is a pretty
substantial difference between our actual and expected data. This can mean that either a,
our population did not act as our data would imply or b, that our data implies unreliable
information. Question 15: Why did we use a chi-squared test? 3 points
-
We used it to see if the sampling that we are observing was truly the normal data we
would see when observing these types of observations or if they were random. The bigger
the Chi-Squared number, the bigger the difference is from the estimated data and the
actual data.