Unit 9- BUSN 3000

School: University of Georgia
Course: BUSN 3000
Subject: Statistics
Date: Apr 3, 2024

Uploaded by AdmiralField13659

BUSN 3000 Unit 9: Models for Categorical Data

Unit 9: Models for Categorical Data - Chi-squared Tests

Goodness-of-fit test - one categorical variable, more than two categories

Example: Where do BUSN 3000 students live? Is the distribution the same as the UGA population?

Housing | UGA population | Observed counts (sample data) | Expected counts (if H0 true)
Off-campus (not UGA-owned or affiliated) | 66% | 22 | 25.08
Fraternity or Sorority housing | 5.5% | 14 | 2.09
On-campus dorms or other UGA-owned or affiliated housing | 28.5% | 2 | 10.83
Total | 100% | 38 | 38

Could these observed counts have occurred just by chance if the distribution of housing choices for BUSN 3000 students were really the same as the UGA population?

$H_0: p_{\text{off campus}} = 0.66,\ p_{\text{Greek}} = 0.055,\ p_{\text{on campus}} = 0.285$ (BUSN 3000 has the same distribution as the UGA population)
$H_A$: at least one p is different from what's given

The chi-squared ($\chi^2$) test statistic compares the observed counts to the expected counts.

\[
\chi^2 = \sum \frac{(\text{Observed count} - \text{Expected count})^2}{\text{Expected count}}
       = \frac{(22 - 25.08)^2}{25.08} + \frac{(14 - 2.09)^2}{2.09} + \frac{(2 - 10.83)^2}{10.83}
       \approx 0.378 + 67.870 + 7.199 \approx 75.45
\]

When the observed counts are somewhat close to the expected counts, $\chi^2$ is small and the evidence against the null hypothesis is weak.

When the observed counts differ greatly from the expected counts, $\chi^2$ is large and the evidence against the null hypothesis is strong.
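The expected counts and the test statistic for the housing example can be reproduced with a short Python sketch. This is only an illustration of the arithmetic, not part of the course materials; the variable names (`null_props`, `observed`, `contributions`) are made up for this example, and the numbers come from the housing table.

```python
# Proportions claimed by H0 (the UGA population) and the observed sample counts.
null_props = {"off_campus": 0.66, "greek": 0.055, "on_campus": 0.285}
observed = {"off_campus": 22, "greek": 14, "on_campus": 2}

n = sum(observed.values())  # total sample size

# Expected count in each category if H0 were true: n times the null proportion.
expected = {k: n * p for k, p in null_props.items()}

# Each category's contribution: (observed - expected)^2 / expected.
contributions = {
    k: (observed[k] - expected[k]) ** 2 / expected[k] for k in observed
}

# The chi-squared statistic is the sum of the contributions.
chi_sq = sum(contributions.values())

for k in observed:
    print(f"{k}: expected = {expected[k]:.2f}, contribution = {contributions[k]:.3f}")
print(f"chi-squared = {chi_sq:.2f}")
```

Most of the statistic comes from the Greek-housing category, where the observed count (14) is far above the expected count (2.09).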
Student housing (continued)

$H_0: p_{\text{off campus}} = 0.66,\ p_{\text{Greek}} = 0.055,\ p_{\text{on campus}} = 0.285$
$H_A$: At least one p is different from these values.

P-values for chi-squared tests

How large must the chi-squared statistic be to convince us that the null hypothesis is not true?

df = number of categories - 1, so df = 3 - 1 = 2 for the housing example (always choose second option for chi squared)

If the distribution of housing choices for BUSN students were really the same as that of UGA students overall ($H_0$ true), sample results like ours would be unlikely.

For $\alpha = 0.05$, state your conclusion in context.
  o There is sufficient evidence to conclude that the distribution of housing choices for BUSN students is different from the overall UGA population.

Conducting a goodness-of-fit test using Analyze > Distribution in JMP
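Outside JMP, the p-value for the housing example can be checked by hand in Python. The sketch below relies on one assumption worth flagging: when df = 2 (and only then), the upper-tail probability of the chi-squared distribution has the closed form exp(-x/2), so no statistics library is needed. The variable names are illustrative, not from the course.

```python
import math

# Chi-squared statistic for the housing example, summed from the three categories.
chi_sq = (
    (22 - 25.08) ** 2 / 25.08
    + (14 - 2.09) ** 2 / 2.09
    + (2 - 10.83) ** 2 / 10.83
)

num_categories = 3
df = num_categories - 1  # df = number of categories - 1

# The shortcut below is valid ONLY for df = 2: P(X > x) = exp(-x / 2).
assert df == 2
p_value = math.exp(-chi_sq / 2)

alpha = 0.05
reject_null = p_value < alpha

print(f"chi-squared = {chi_sq:.2f}, df = {df}, p-value = {p_value:.2e}")
print("Sufficient evidence against H0" if reject_null else "Insufficient evidence against H0")
```

For other degrees of freedom, a statistics package (or JMP itself) would be the safer route.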
Using residuals as a follow-up analysis

Why is this follow-up necessary? The test only tells us that at least one p is different from what is given. Which ones, and by how much?

The residuals show which individual categories have large differences between observed and expected counts.

\[
\text{residual} = \frac{\text{observed count} - \text{expected count}}{\sqrt{\text{expected count}}}
\]

A positive residual means the observed count is larger than expected.
A negative residual means the observed count is smaller than expected.
Values less than -2 or greater than 2 are unusual.

Housing | UGA population | Observed counts | Expected counts | Deviation (obs - exp) | Standardized residual
Off-campus (not UGA-owned or affiliated) | 66% | 22 | 25.08 | -3.08 | -0.615
Fraternity or Sorority housing | 5.5% | 14 | 2.09 | 11.91 | 8.238
On-campus dorms or other UGA-owned or affiliated housing | 28.5% | 2 | 10.83 | -8.83 | -2.683
Total | 100% | 38 | 38 | 0 |

Checking conditions for a chi-squared test

1. Random - random selection means we can generalize to the population (our sample may not be representative of the BUSN 3000 population). Random assignment, by contrast, supports conclusions about causation.
2. Sample size large enough - expected counts must all be at least 5 (the sample size condition is not met because our smallest expected count is 2.09).

Two-way tables and segmented bar graphs
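Looking back at the housing follow-up analysis, the standardized residuals and the expected-count condition can be checked with a few lines of Python. This is a sketch under the assumption that the standardized residual is (observed - expected) divided by the square root of the expected count, which matches the values in the housing table; the names `residuals`, `unusual`, and `size_condition_met` are hypothetical.

```python
import math

# Observed and expected counts from the housing table.
observed = {"off_campus": 22, "greek": 14, "on_campus": 2}
expected = {"off_campus": 25.08, "greek": 2.09, "on_campus": 10.83}

# Standardized residual: (observed - expected) / sqrt(expected).
residuals = {
    k: (observed[k] - expected[k]) / math.sqrt(expected[k]) for k in observed
}

# Categories whose residual is below -2 or above 2 are flagged as unusual.
unusual = [k for k, r in residuals.items() if abs(r) > 2]

# Sample size condition: every expected count must be at least 5.
size_condition_met = all(e >= 5 for e in expected.values())

for k, r in residuals.items():
    print(f"{k}: residual = {r:.3f}")
print("Unusual categories:", unusual)
print("Expected counts all at least 5:", size_condition_met)
```

The flagged categories are the ones that drive the rejection of the null hypothesis, and the failed size check is a reminder to treat the p-value with caution.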
Example: Is more choice always better?

A researcher set up a tasting booth for jams in a grocery store. She alternated the choice set: sometimes the booth would feature 6 different jams and sometimes 24. Is there any evidence of a relationship between size of the choice set and whether the customer stopped?

Customer stopped? | Yes | No | Total
Small choice set (6) | 63 | 94 | 157
Large choice set (24) | 98 | 65 | 163
Total | 161 | 159 | 320

$\hat{p}_{\text{small}} = 63/157 = 0.4013$
$\hat{p}_{\text{large}} = 98/163 = 0.6012$

There is a relationship between size of the choice set and stopping in this sample.

Chi-squared tests for independence

To test the relationship between two categorical variables, we use the chi-squared test of independence.

$H_0$: size of choice set is not related to customers stopping in the population
$H_A$: size of choice set is related to customers stopping in the population

What counts would you expect if there were no relationship between size of the choice set and whether the customer stopped?

Overall, 161/320 = 50.31% stopped, so for the small choice set we expect 50.31% of 157 = 78.99 customers to stop.

expected count = (row total)(col total)/(overall total)

Chi-squared test statistic and p-value

\[
\chi^2 = \sum \frac{(\text{Observed count} - \text{Expected count})^2}{\text{Expected count}}
\]
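The expected counts and the chi-squared statistic for the jam table can be worked out with a short Python sketch. It applies the (row total)(col total)/(overall total) rule to every cell; the list layout and variable names are assumptions made for this illustration.

```python
# Observed counts from the jam table.
# Rows: small choice set (6), large choice set (24). Columns: stopped yes, stopped no.
observed = [[63, 94], [98, 65]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Sample proportion of customers who stopped, for each choice set.
stop_rates = [row[0] / row_total for row, row_total in zip(observed, row_totals)]

# Expected count for each cell if there were no relationship:
# (row total)(col total) / (overall total).
expected = [[r * c / total for c in col_totals] for r in row_totals]

# Chi-squared statistic: sum of (observed - expected)^2 / expected over all cells.
chi_sq = sum(
    (obs - exp) ** 2 / exp
    for obs_row, exp_row in zip(observed, expected)
    for obs, exp in zip(obs_row, exp_row)
)

print("Stop rates (small, large):", [round(p, 4) for p in stop_rates])
print("Expected counts:", [[round(e, 2) for e in row] for row in expected])
print(f"chi-squared = {chi_sq:.2f}")
```

All four expected counts are well above 5, so the sample size condition is met for this table.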
If the chances of stopping at the tasting booth were really ________________ for large and small choice sets, sample results like ours would be…

For $\alpha = 0.05$, state your conclusion in context. *
  o There is sufficient / insufficient evidence to conclude that

Conducting a test of independence using Analyze > Fit Y by X in JMP

Click the red arrow below the graph (next to Contingency Table) to show counts, percentages, etc.

* This isn't the end of the story. Try using JMP to investigate how size of the choice set affects whether the customer ultimately purchased one of the jams.
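As a cross-check on the JMP output for the jam data, the p-value can be approximated in plain Python. Two points here are not stated in the notes and should be treated as assumptions of this sketch: a test of independence uses df = (rows - 1)(columns - 1), and for df = 1 the upper-tail chi-squared probability equals erfc(sqrt(x/2)). Variable names are illustrative.

```python
import math

# Observed counts from the jam table.
# Rows: small choice set (6), large choice set (24). Columns: stopped yes, stopped no.
observed = [[63, 94], [98, 65]]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Chi-squared statistic with expected count = (row total)(col total) / (overall total).
chi_sq = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        exp = row_totals[i] * col_totals[j] / total
        chi_sq += (obs - exp) ** 2 / exp

# Degrees of freedom for a two-way table (assumption noted in the lead-in).
df = (len(observed) - 1) * (len(observed[0]) - 1)

# The shortcut below is valid ONLY for df = 1: P(X > x) = erfc(sqrt(x / 2)).
assert df == 1
p_value = math.erfc(math.sqrt(chi_sq / 2))

alpha = 0.05
print(f"chi-squared = {chi_sq:.2f}, df = {df}, p-value = {p_value:.5f}")
print("p-value below alpha:", p_value < alpha)
```

A p-value below 0.05 would mean that results like these are unlikely if the chances of stopping were the same for both choice sets.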