Chi-Squared Tests for Categorical Data in BUSN 3000

BUSN 3000 Unit 9: Models for Categorical Data - Chi-squared Tests

Goodness-of-fit test: one categorical variable, more than two categories

Example: Where do BUSN 3000 students live? Is the distribution the same as the UGA population?

Housing                                                    UGA population   Observed counts (sample data)   Expected counts (if H0 true)
Off-campus (not UGA-owned or affiliated)                   66%              22                              25.08
Fraternity or sorority housing                             5.5%             14                              2.09
On-campus dorms or other UGA-owned or affiliated housing   28.5%            2                               10.83
Total                                                      100%             38                              38

Each expected count is the sample size times the null proportion, e.g. 38 × 0.66 = 25.08.

Could these observed counts have occurred just by chance if the distribution of housing choices for BUSN 3000 students were really the same as the UGA population?

$H_0$: $p_{\text{off-campus}} = 0.66$, $p_{\text{Greek}} = 0.055$, $p_{\text{on-campus}} = 0.285$ (BUSN 3000 has the same distribution as the UGA population)
$H_A$: at least one $p$ is different from what's given

The chi-squared ($\chi^2$) test statistic compares the observed counts to the expected counts.

$\chi^2 = \sum \frac{(\text{Observed count} - \text{Expected count})^2}{\text{Expected count}}$
$= \frac{(22 - 25.08)^2}{25.08} + \frac{(14 - 2.09)^2}{2.09} + \frac{(2 - 10.83)^2}{10.83} \approx 75.45$

- When the observed counts are somewhat close to the expected counts, $\chi^2$ is small and the evidence against the null hypothesis is weak.
- When the observed counts differ greatly from the expected counts, $\chi^2$ is large and the evidence against the null hypothesis is strong.
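The arithmetic for the housing example can be checked with a short script. This is only a sketch in plain Python (the course itself uses JMP), and the variable names are illustrative assumptions. It rebuilds the expected counts from the null proportions and adds up each category's contribution to the statistic.

```python
# Goodness-of-fit statistic for the housing example.
# Proportions and counts come from the table above; names are illustrative.
null_proportions = {"off_campus": 0.66, "greek": 0.055, "on_campus": 0.285}
observed = {"off_campus": 22, "greek": 14, "on_campus": 2}

n = sum(observed.values())  # sample size

# Expected count under H0 = sample size * null proportion
expected = {k: n * p for k, p in null_proportions.items()}

# Each category contributes (observed - expected)^2 / expected
contributions = {
    k: (observed[k] - expected[k]) ** 2 / expected[k] for k in observed
}
chi_squared = sum(contributions.values())

for k in observed:
    print(
        f"{k:>10}: observed={observed[k]:>2}  "
        f"expected={expected[k]:6.2f}  contribution={contributions[k]:7.3f}"
    )
print(f"chi-squared = {chi_squared:.2f}")
```

Printing the per-category contributions shows that the fraternity or sorority category accounts for most of the statistic.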

Student housing (continued)

$H_0$: $p_{\text{off-campus}} = 0.66$, $p_{\text{Greek}} = 0.055$, $p_{\text{on-campus}} = 0.285$
$H_A$: At least one $p$ is different from these values.

P-values for chi-squared tests

How large must the chi-squared statistic be to convince us that the null hypothesis is not true?

df = number of categories - 1 = 3 - 1 = 2 (always choose second option for chi squared)

- If the distribution of housing choices for BUSN students were really the same as that of UGA students overall ($H_0$ true), sample results like ours would be unlikely.
- For $\alpha = 0.05$, state your conclusion in context.
  o There is sufficient evidence to conclude that the distribution of housing choices for BUSN students is different from that of the overall UGA population.

Conducting a goodness-of-fit test using Analyze > Distribution in JMP
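As a cross-check on the JMP output, the p-value for the housing example can be sketched in plain Python. This relies on a shortcut that holds only when df = 2: the chi-squared upper-tail probability is then exactly exp(-x/2). For other degrees of freedom a statistics library would be needed (for instance `scipy.stats.chi2.sf`, which is not used here). Variable names are illustrative assumptions.

```python
import math

# Housing example: observed counts and null proportions from the notes.
observed = [22, 14, 2]
null_proportions = [0.66, 0.055, 0.285]

n = sum(observed)
expected = [n * p for p in null_proportions]
chi_squared = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

df = len(observed) - 1  # number of categories - 1
if df != 2:
    raise ValueError("the closed form below is only valid for df = 2")

# For df = 2, P(chi-squared > x) = exp(-x / 2)
p_value = math.exp(-chi_squared / 2)

alpha = 0.05
reject_null = p_value < alpha

print(f"chi-squared = {chi_squared:.2f}, df = {df}, p-value = {p_value:.3g}")
print("reject H0" if reject_null else "fail to reject H0")
```

A p-value this far below 0.05 matches the "unlikely" answer above, although the sample-size condition should still be checked before trusting it.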

Using residuals as a follow-up analysis

- Why is this follow-up necessary? Rejecting $H_0$ only tells us that at least one $p$ is different from what is given. It does not say which ones, or by how much. The residuals show which individual categories have large differences between observed and expected counts.

$\text{residual} = \frac{\text{observed count} - \text{expected count}}{\sqrt{\text{expected count}}}$

- A positive residual means the observed count is larger than expected.
- A negative residual means the observed count is smaller than expected.
- Values less than -2 or greater than 2 are unusual.

Housing                                                    UGA population   Observed counts   Expected counts   Deviation (obs - exp)   Standardized residual
Off-campus (not UGA-owned or affiliated)                   66%              22                25.08             -3.08                   -0.615
Fraternity or sorority housing                             5.5%             14                2.09              11.91                   8.238
On-campus dorms or other UGA-owned or affiliated housing   28.5%            2                 10.83             -8.83                   -2.683
Total                                                      100%             38                38                0

Checking conditions for a chi-squared test

1. Random: random selection means we can generalize to the population; random assignment means we can infer causation. (Our sample may not be representative of the BUSN 3000 population.)
2. Sample size large enough: expected counts must all be at least 5. (The sample size condition is not met because our smallest expected count is 2.09.)

Two-way tables and segmented bar graphs
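Before moving on to two-way tables, the residual column and the expected-count condition for the housing example can be reproduced with a few lines of Python. This is a minimal sketch, not the JMP workflow used in class, and the category labels and variable names are illustrative assumptions.

```python
import math

# Housing example: counts taken from the residual table in the notes.
categories = ["Off-campus", "Greek housing", "On-campus"]
observed = [22, 14, 2]
expected = [25.08, 2.09, 10.83]

# Standardized residual = (observed - expected) / sqrt(expected)
residuals = [(o - e) / math.sqrt(e) for o, e in zip(observed, expected)]

# Rule of thumb from the notes: residuals beyond +/-2 are unusual
unusual = [c for c, r in zip(categories, residuals) if abs(r) > 2]

# Sample-size condition: every expected count must be at least 5
all_expected_at_least_5 = min(expected) >= 5

for c, r in zip(categories, residuals):
    print(f"{c:>14}: residual = {r:+.3f}")
print("unusual categories:", unusual)
print("sample-size condition met:", all_expected_at_least_5)
```

The sign of each residual shows the direction of the difference: Greek housing is over-represented in the sample and on-campus housing is under-represented.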

Example: Is more choice always better?

A researcher set up a tasting booth for jams in a grocery store. She alternated the choice set: sometimes the booth would feature 6 different jams and sometimes 24.

                         Customer stopped?
                         Yes    No     Total
Small choice set (6)     63     94     157
Large choice set (24)    98     65     163
Total                    161    159    320

- Is there any evidence of a relationship between size of the choice set and whether the customer stopped?
  $\hat{p}_{\text{small}} = 63/157 = 0.4013$
  $\hat{p}_{\text{large}} = 98/163 = 0.6012$
  There is a relationship between size of the choice set and stopping in this sample.

Chi-squared tests for independence

To test the relationship between two categorical variables, we use the chi-squared test of independence.

$H_0$: size of choice set is not related to customers stopping in the population
$H_A$: size of choice set is related to customers stopping in the population

- What counts would you expect if there were no relationship between size of the choice set and whether the customer stopped?
  Overall: 161/320 = 50.31% stopped
  50.31% of 157 = 78.99
  expected count = (row total)(column total) / (overall total)

Chi-squared test statistic and p-value

$\chi^2 = \sum \frac{(\text{Observed count} - \text{Expected count})^2}{\text{Expected count}}$
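The expected-count rule and the test statistic for the jam example can be worked through in code. The following is a rough sketch in plain Python rather than JMP, with illustrative variable names. It applies (row total)(column total)/(overall total) to every cell and then sums the four contributions.

```python
# Jam example: rows are choice-set size (small, large),
# columns are whether the customer stopped (yes, no).
observed = [
    [63, 94],  # small choice set (6 jams)
    [98, 65],  # large choice set (24 jams)
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected count for each cell = (row total)(column total) / (overall total)
expected = [[r * c / total for c in col_totals] for r in row_totals]

# Sum (observed - expected)^2 / expected over all four cells
chi_squared = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

for label, exp_row in zip(["small", "large"], expected):
    print(f"{label}: expected yes = {exp_row[0]:.2f}, expected no = {exp_row[1]:.2f}")
print(f"chi-squared = {chi_squared:.2f}")
```

The first expected cell reproduces the 78.99 computed by hand above, and every expected count is well above 5, so the sample-size condition holds for this table.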

- If the chances of stopping at the tasting booth were really ________________ for large and small choice sets, sample results like ours would be…
- For $\alpha = 0.05$, state your conclusion in context.*
  o There is sufficient / insufficient evidence to conclude that

Conducting a test of independence using Analyze > Fit Y by X in JMP

- Click the red arrow below the graph (next to Contingency Table) to show counts, percentages, etc.

* This isn't the end of the story. Try using JMP to investigate how size of the choice set affects whether the customer ultimately purchased one of the jams.
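To compare against the JMP output for the jam example (customer stopped, not purchased), the p-value can be sketched in plain Python. Two facts not spelled out in the notes are assumed here: a two-way table has df = (rows - 1)(columns - 1), and for df = 1 the chi-squared upper tail equals erfc(sqrt(x/2)), because a chi-squared variable with one degree of freedom is a squared standard normal. Variable names are illustrative.

```python
import math

# Jam example: rows are choice-set size (small, large),
# columns are whether the customer stopped (yes, no).
observed = [[63, 94], [98, 65]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

expected = [[r * c / total for c in col_totals] for r in row_totals]
chi_squared = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom for a two-way table: (rows - 1) * (columns - 1)
df = (len(observed) - 1) * (len(observed[0]) - 1)
if df != 1:
    raise ValueError("the closed form below is only valid for df = 1")

# For df = 1, P(chi-squared > x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(chi_squared / 2))

alpha = 0.05
reject_null = p_value < alpha

print(f"chi-squared = {chi_squared:.2f}, df = {df}, p-value = {p_value:.5f}")
print("reject H0" if reject_null else "fail to reject H0")
```

The same script could be rerun with purchase counts in place of the stopping counts for the follow-up question, once those counts are available.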

