Day 18 Chi-Square Association NOTES

School

Rochester Institute of Technology

Course

146

Subject

Statistics

Date

Apr 3, 2024

Pages

12

Uploaded by AdmiralSeaLion3703

STAT 146 Intro to Statistics II
Day 18: Chi-Square Test for Association Notes

Table of Contents
I. The Test for Association (also called Test for Independence)
II. Minitab steps for generating a Test for Association Pearson Chi-square and P-value
III. Complete testing process for the Test for Association
IV. Requirements for a Chi-Square Test
V. How to calculate the expected frequency by hand
VI. Examples and Completed Examples

I. Test for Association (also called the Test of Independence)

Use the test for association/independence when you are working with one random sample of data (taken from one population) and you are interested in studying two categorical variables from that one sample. Since the two categorical variables came from the same sample, this statistical test looks for an association between the variables. In other words, use this test to determine whether there is a significant association/relationship between two categorical variables from the same population of data.

  • The null hypothesis is the assumption that the two variables are independent; that is, there is no relationship/association.
  • The alternative hypothesis is that the two variables are associated; that is, knowing the level of Variable A can help you predict the level of Variable B.

Note: Support for the alternative hypothesis suggests that the variables are related, but the relationship is not necessarily causal, in the sense that one variable "causes" the other.

II. Minitab steps for generating a Test for Association Pearson Chi-square and P-value

Use the appropriate Chi-square test in Minitab: Stat > Tables > Chi-Square Test for Association

If you have 2 columns of raw data…
  1. If you have raw data (2 columns of categorical data in Minitab), use the default option: ‘Raw data (categorical variables)’.
  2. Place one variable in Rows and one variable in Columns (it does not matter which variable goes where).
  3. Click on ‘Statistics’ and select ‘Each cell’s contribution to chi-square’.
  4. Always report the Pearson chi-square, not the Likelihood Ratio.

If you have summarized data in a two-way table…
  1. If you have summarized data in a two-way table (data that has already been counted and put into a contingency table), choose ‘Summarized data’.
  2. Select all columns of counted data.
  3. Labels for the table (optional). Rows: bring in Column 1, if you want. Columns: provide a heading for your columns, if you want.
  4. Click on ‘Statistics’ and select ‘Each cell’s contribution to chi-square’.
  5. Always report the Pearson chi-square, not the Likelihood Ratio.

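To see how raw data and summarized data relate, here is a minimal Python sketch (Python is not part of these notes, and Minitab does this counting for you). It counts a small, made-up list of raw observations into a two-way table. The data and the function name cross_tabulate are hypothetical, used only for illustration.

```python
from collections import Counter

# Hypothetical raw data (NOT taken from any file in these notes):
# one (row variable level, column variable level) pair per person.
raw_pairs = [
    ("Female", "About Right"),
    ("Female", "Not Harsh Enough"),
    ("Female", "Not Harsh Enough"),
    ("Male", "Not Harsh Enough"),
    ("Male", "Too Harsh"),
    ("Male", "Not Harsh Enough"),
    ("Female", "About Right"),
]


def cross_tabulate(pairs):
    """Count raw (row, column) pairs into a two-way table of counts."""
    counts = Counter(pairs)
    row_levels = sorted({row for row, _ in pairs})
    col_levels = sorted({col for _, col in pairs})
    # A Counter returns 0 for combinations that never occur.
    table = [[counts[(r, c)] for c in col_levels] for r in row_levels]
    return row_levels, col_levels, table


row_levels, col_levels, table = cross_tabulate(raw_pairs)
print(col_levels)
for level, counts_row in zip(row_levels, table):
    print(level, counts_row)
```

The resulting table of counts is what you would type into Minitab if you chose the summarized-data option instead.
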
III. The Complete Testing Process for the Test for Association

Optional: (The two categorical variables being studied are _____ and _____.)

The goal is to find evidence to show that ________ and _______ are associated.

Ho: ______ (variable 1) and ______ (variable 2) are independent. (In other words, the variables are not associated.)
Ha: ______ (variable 1) and ______ (variable 2) are associated. (In other words, the variables are not independent.)

Alpha = ____

Use the appropriate Chi-square test in Minitab: Stat > Tables > Chi-Square Test for Association

Check the assumption: Look at your Minitab output and find the expected counts. Are they all greater than 5? If so, you have met the requirement/assumption.

RESULTS
Chi-square = ___, df = ____
P-value = _____
Is the P-value less than alpha? Decide if you CAN or CANNOT reject the null.

At the __% level of significance, the sample does/does not provide sufficient support to say that ______ and ______ are associated. In other words…

Follow-up Sentence
If you conclude that the variables are associated/related, report the greatest contributor to Chi-square and whether the observed value is greater than or less than the expected value for that cell.
  1. Look at the Minitab output and identify the cell that has the greatest contribution to Chi-square.
  2. State this in a complete sentence. Then, identify whether the observed count is less than or greater than what was expected in that cell.

IV. Requirements for a Chi-Square Test

Have the requirements been met? The test is valid if:
  • all expected frequencies are > 1, and
  • no more than 20% of the expected frequencies are less than 5.

Look at your Minitab output and find the expected counts. Are they all greater than 5? If so, you have met both requirements above.

If there are cells with expected counts less than 5, Minitab will alert you with this message:

* NOTE * 2 cells with expected counts less than 5

If Minitab shows this * NOTE *, it does not automatically mean that the requirements have failed. You first need to determine whether the number of cells reported is more than 20% of all the cells in the table. I have demonstrated this in Example 1 later in these notes.

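The 20% check can be sketched in a few lines of Python. This is an illustration only, not a Minitab feature; the function name check_requirements and its parameter names are made up here. The thresholds follow the wording of these notes (every expected count greater than 1, no more than 20% below 5), and the expected counts are the ones printed in the Minitab output of Completed Example 1.

```python
def check_requirements(expected, min_count=1.0, small=5.0, max_small_share=0.20):
    """Apply the two chi-square requirements to a table of expected counts."""
    cells = [value for row in expected for value in row]
    cells_below_5 = sum(1 for value in cells if value < small)
    share_below_5 = cells_below_5 / len(cells)
    # Requirement 1 as worded in these notes: every expected count is > 1.
    all_above_min = all(value > min_count for value in cells)
    return {
        "cells_below_5": cells_below_5,
        "share_below_5": share_below_5,
        "met": all_above_min and share_below_5 <= max_small_share,
    }


# Expected counts copied from the Minitab output in Completed Example 1.
example_1_expected = [
    [18.92, 148.49, 4.59],  # Female
    [14.08, 110.51, 3.41],  # Male
]

result = check_requirements(example_1_expected)
print(result)
```

Here 2 of the 6 cells fall below 5, which is more than 20% of the cells, so the sketch reports that the requirements are not met.
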
V. How to calculate the expected frequency by hand

You may always use Minitab to generate the Chi-Square test statistic and P-value. However, to get a deeper understanding, you should be able to calculate the Chi-Square test statistic by hand.

  1. Compute the row and column totals. Also compute the table total (the sample size).
  2. Compute the relative marginal frequencies for the row variable (take the row total divided by the table total).
  3. Compute the relative marginal frequencies for the column variable (take the column total divided by the table total).
  4. Use the Multiplication Rule for Independent Events to compute the expected proportion of observations within each cell (assuming independence).
  5. Multiply the proportions by the sample size to obtain the expected counts within each cell.

Short-cut formula for finding the expected frequency:

$$\text{Expected Frequency} = \frac{(\text{row total})(\text{column total})}{\text{table total}}$$

You may have studied probability rules in the past…

$$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}, \quad \text{and if } A \text{ and } B \text{ are independent, then } P(A \mid B) = P(A)$$

The Multiplication Rule for Independent Events:

$$P(A \text{ and } B) = P(A) \cdot P(B)$$

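The steps above can be sketched in Python to show that the step-by-step method and the short-cut formula give the same expected counts. This is only an illustration; the function name expected_counts is made up, and the observed counts are the ones from the Minitab output in Example 2 of these notes.

```python
def expected_counts(observed):
    """Return expected counts two ways: by marginal proportions and by short-cut."""
    # Step 1: row totals, column totals, and the table total (sample size).
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    # Steps 2-3: relative marginal frequencies.
    row_props = [r / n for r in row_totals]
    col_props = [c / n for c in col_totals]

    # Steps 4-5: multiply the proportions (independence), then scale by n.
    by_steps = [[rp * cp * n for cp in col_props] for rp in row_props]

    # Short-cut: (row total)(column total) / table total.
    shortcut = [[r * c / n for c in col_totals] for r in row_totals]
    return by_steps, shortcut


# Observed counts from Example 2 (rows: No Pre-school, Yes Pre-school;
# columns: Below Grade Level, At Grade Level, Advanced).
observed = [
    [29, 32, 9],
    [15, 44, 21],
]

by_steps, shortcut = expected_counts(observed)
for row in shortcut:
    print([round(value, 2) for value in row])
```

For instance, the No Pre-school / Below Grade Level cell is (70)(44) / 150, which rounds to the 20.53 shown in the Minitab output.
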
VI. Examples and Completed Examples

Example 1) A random sample of 300 individuals was asked if they think the courts are dealing harshly enough with criminals. Gender and the response to the courts question are provided in the Minitab file CourtsGender.mtw. Conduct the appropriate hypothesis test to determine if there is a relationship between gender and the response to the court question.

a. Is this a Chi-Square Test of Independence/Association or a Chi-Square Goodness of Fit Test? Explain.

b. Get to know the data. Study the 100% stacked bar chart and state whether you believe there is a relationship between gender and the court response.

[Figure: 100% stacked bar chart, "Chart of Gender, Court Opinion". One bar each for Male and Female; segments for Too Harsh, Not Harsh Enough, and About Right; vertical axis is Percent (0 to 100). Percent is calculated within levels of Gender.]

c. Complete the appropriate hypothesis test (show the complete testing process). If you reject the null, write a follow-up sentence that identifies the greatest contributor to Chi-Square and why.

Completed Example 1) A random sample of 300 individuals was asked if they think the courts are dealing harshly enough with criminals. Gender and the response to the courts question are provided in the Minitab file. Conduct the appropriate hypothesis test to determine if there is a relationship between gender and the response to the court question.

a. Is this a Chi-Square Test of Independence (Association) or a Goodness of Fit Test? Explain.

This is a test of independence since ONE sample was taken and TWO categorical variables are studied.

b. Get to know the data. Study the 100% stacked bar chart and state whether you believe there is a relationship between gender and the court response.

[Figure: 100% stacked bar chart, "Chart of Gender, Court Opinion". One bar each for Male and Female; segments for Too Harsh, Not Harsh Enough, and About Right; vertical axis is Percent (0 to 100). Percent is calculated within levels of Gender.]

A larger percentage of males than females feel that the courts are about right. Maybe there is a relationship?

c. Complete the appropriate hypothesis test (complete testing process). Don’t forget to explain any differences if you reject the null by writing a follow-up sentence.

Optional: (Two categorical variables are being studied: Gender and Court Opinion.)

The goal is to find evidence to show that gender and opinion of the courts are associated.

Ho: Gender and the opinion of the courts are independent (in other words, the variables are not associated).
Ha: Gender and the opinion of the courts are associated.

Alpha = .05

                  About Right   Not Harsh Enough   Too Harsh   All
Female                     16                150           6   172
                        18.92             148.49        4.59
                       0.4507             0.0153      0.4355
Male                       17                109           2   128
                        14.08             110.51        3.41
                       0.6056             0.0205      0.5852
All                        33                259           8   300

Cell Contents: Count / Expected count / Contribution to Chi-square

Chi-Square Test
                  Chi-Square   DF   P-Value
Pearson                2.113    2     0.348
Likelihood Ratio       2.164    2     0.339

2 cell(s) with expected counts less than 5.

Check the assumptions: 2 out of the 6 cells have expected counts less than 5. 2/6 = .33. Therefore, 33% of the cells have expected counts less than 5, which is more than the 20% allowed. We have NOT met the requirements of a Chi-square test.

Chi-Square = 2.113, df = 2, P-value = .348

Is the P-value less than alpha? NO. We CANNOT reject the null.

At the 5% level of significance, the sample does not provide sufficient support to say that gender and opinion of the courts are associated. In other words, we do not have evidence that gender and court opinion are related. I am not confident in these test results since we have not met the chi-square requirements.

[NOTE: There is no need for a follow-up sentence…we only write one if we DO reject Ho.]

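If you would like to check the Pearson chi-square and P-value from Example 1 by hand, here is a minimal Python sketch built from the counts in the table. Two things in it are not from these notes and should be treated as assumptions of the sketch: the degrees of freedom are computed with the standard rule (rows - 1)(columns - 1), and the P-value uses the closed form exp(-x/2), which is the upper-tail area of the chi-square distribution only when df = 2.

```python
import math

# Observed counts from Example 1 (rows: Female, Male;
# columns: About Right, Not Harsh Enough, Too Harsh).
observed = [
    [16, 150, 6],
    [17, 109, 2],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for each cell: (row total)(column total) / table total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Each cell's contribution to chi-square: (observed - expected)^2 / expected.
contributions = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]
chi_square = sum(sum(row) for row in contributions)

# Standard rule for a two-way table (an assumption of this sketch).
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Valid ONLY for df = 2: the chi-square upper-tail area equals exp(-x / 2).
p_value = math.exp(-chi_square / 2)

print(round(chi_square, 3), df, round(p_value, 3))
```

The rounded values agree with the Pearson row of the Minitab output (2.113, df = 2, P-value 0.348). For tables with a different df, the exp(-x/2) shortcut does not apply and you would rely on Minitab or a statistics library instead.
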
Example 2) Pre-school Attendance and Pre-algebra Achievement (these are contrived data, based on a real study)

In these times of educational reform, attention has been focused on pre-school for all children. Since many districts are facing budget cuts, funding pre-school programs may impact other offerings. Before making their recommendations, administrators in a large urban district take a random sample of 150 seventh graders and compare the pre-algebra achievement levels of those who attended pre-school and those who did not. If achievement is independent of attending pre-school, then the proportions at each level should be equal. Use the counts in the frequency table to determine if there is an association between attending pre-school and pre-algebra achievement.

[NOTE: You will have to type the data into Minitab. The image above shows how it should look!]

a. Is this a test of Independence/Association or a Goodness of Fit Test? Explain.

b. Study the 100% stacked bar chart. Do you believe it shows a relationship between grade level and pre-school attendance?

[Figure: 100% stacked bar chart, "Chart of Below Grade Level, At Grade Level, Advanced". One bar each for Below Grade Level, At Grade Level, and Advanced; segments for Yes Pre-school and No Pre-school; vertical axis runs from 0 to 100. Percent is calculated within variables.]

c. Complete the appropriate hypothesis test (show the complete testing process). If you reject the null, identify the greatest contributor to Chi-Square and why.

Example 2 Completed)

a. Is this a test of Independence (Association) or a Goodness of Fit Test? Explain.

This is a test of independence since one sample of data was collected and two categorical variables are being studied.

b. Study the 100% stacked bar chart. Do you believe it shows a relationship between grade level and pre-school attendance?

[Figure: 100% stacked bar chart, "Chart of Below Grade Level, At Grade Level, Advanced". One bar each for Below Grade Level, At Grade Level, and Advanced; segments for Yes Pre-school and No Pre-school; vertical axis runs from 0 to 100. Percent is calculated within variables.]

Here is just one example of many true statements that are reasonable to make: More students who did NOT attend preschool are below grade level in math achievement. Yes, there may be a relationship.

c. Complete the appropriate hypothesis test (show the complete testing process). Don’t forget to explain any differences if you reject the null.

(Two categorical variables are being studied: Pre-Algebra achievement and preschool attendance.)

The goal is to find evidence to show that Pre-Algebra achievement and preschool attendance are associated.

Ho: Pre-Algebra achievement and preschool attendance are independent.
Ha: Pre-Algebra achievement and preschool attendance are associated.

Alpha = .05

                  Below Grade Level   At Grade Level   Advanced   All
No Pre-school                    29               32          9    70
                              20.53            35.47      14.00
                              3.491            0.339      1.786
Yes Pre-school                   15               44         21    80
                              23.47            40.53      16.00
                              3.055            0.296      1.563
All                              44               76         30   150

Cell Contents: Count / Expected count / Contribution to Chi-square

Chi-Square Test
                  Chi-Square   DF   P-Value
Pearson               10.529    2     0.005
Likelihood Ratio      10.705    2     0.005

Check the assumption: Yes, we have met the assumption since all expected counts are greater than 5.

Chi-square = 10.529, df = 2, P-value = .005

Yes, our P-value is less than alpha. We CAN reject the null.

At the 5% level of significance, the sample data DOES provide sufficient support to say that achievement in pre-algebra and whether they attended preschool are associated. In other words, there is an association.

The greatest contributor to Chi-square is the students who did NOT attend preschool and are BELOW grade level. In that cell, what was observed is GREATER than what was expected.

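The follow-up sentence for Example 2 can be double-checked with a short Python sketch that recomputes each cell's contribution from the counts in the table and picks out the largest one. This is an illustration only; the variable names are made up, and Minitab already reports these contributions for you.

```python
row_labels = ["No Pre-school", "Yes Pre-school"]
col_labels = ["Below Grade Level", "At Grade Level", "Advanced"]

# Observed counts from the Example 2 table.
observed = [
    [29, 32, 9],
    [15, 44, 21],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Build one record per cell: observed, expected, contribution to chi-square.
cells = []
for i, row_label in enumerate(row_labels):
    for j, col_label in enumerate(col_labels):
        obs = observed[i][j]
        exp = row_totals[i] * col_totals[j] / n
        cells.append({
            "row": row_label,
            "col": col_label,
            "observed": obs,
            "expected": exp,
            "contribution": (obs - exp) ** 2 / exp,
        })

chi_square = sum(cell["contribution"] for cell in cells)

# The greatest contributor is the cell with the largest contribution.
top = max(cells, key=lambda cell: cell["contribution"])
direction = "greater than" if top["observed"] > top["expected"] else "less than"

print(f"Chi-square = {chi_square:.3f}")
print(f"Greatest contributor: {top['row']} / {top['col']}; "
      f"observed ({top['observed']}) is {direction} expected ({top['expected']:.2f}).")
```

The sketch points to the same cell as the follow-up sentence above: students who did not attend preschool and are below grade level, with an observed count above the expected count.
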