Lab 6 Chi-square Tests

School

Drexel University

Course

411

Subject

Mathematics

Date

Apr 3, 2024

Math 410 Lab 6: Chi-Square Tests

1 Overview

In this lab, we'll learn how to implement Chi-Square tests in R. Your lab report should contain your solutions for all of the exercises. In all problems, make sure to clearly state your null and alternative hypotheses, as well as your final conclusion. You should include your R code/scripts and the output, along with any hand calculations you are asked to perform.

2 Goodness of Fit Test

To perform a goodness of fit test in R, we first need to define the observed and expected frequencies. The observed frequencies should be entered as counts, while the expected values should be entered as the probabilities of each event occurring under the null hypothesis. For instance, suppose we roll a die 100 times, with the following results:

    Result      1   2   3   4   5   6
    Frequency  15  16  24  12  20  13

We want to test whether the die is fair, so if the null hypothesis were true, each outcome would have a probability of 1/6. Therefore, in R we'd enter:

    observed=c(15,16,24,12,20,13)
    expected=c(1/6,1/6,1/6,1/6,1/6,1/6)

Now, to run the Chi-Square test, enter:

    chisq.test(x=observed,p=expected,correct=FALSE)

R will report the value of χ², the number of degrees of freedom, and the p-value. You can then reach a conclusion by comparing the p-value to α as usual. The p-value you get should be well above the usual choices of α, so we wouldn't reject the null hypothesis here. One advantage of using R is that it gives us the exact p-value, whereas the table only lets us bracket it between two values.

Note that correct=FALSE prevents R from using a continuity correction, which we have not discussed. R applies that correction only to 2×2 tables, so the argument has no effect on a goodness of fit test, but it does matter for the test of association in Section 3.

Exercises

1. One hundred tweetie birds were given a choice of either striped sunflower seeds or black sunflower seeds. Seventy-five chose black seeds. May we conclude that the population from which the sample was taken has a preference for black sunflower seeds over striped sunflower seeds? First do the calculations by hand, then verify with R.

2. In a previous lab, you worked through a problem similar to 5.38 in the text, and should have determined the Poisson distribution was not a good fit for the eel data. Test that conclusion now using the Chi-Square test with α = .05. Hint: If you aren't careful, R will give you a warning when you run the Chi-Square test. Think about why it is doing so, and adjust your analysis accordingly. Also, while R can give you the correct χ² statistic, you need to be careful with the df and p-value here.

3. Suppose a disease affects approximately 22% of the population. In 500 randomly selected families of four people, the number of people with the disease is given below.

    Number of People    0   1   2   3   4
    Frequency         292  30  55  70  53

A scientist proposes using the Binomial distribution to predict the number of people with the disease in a family. Use a Chi-Square test to show this is a horrible proposal for this data. Explain why the Binomial model is not effective in this context.

3 Test of Association

Now we consider the test of association. For this test, we give R the data in the form of a matrix. First, define the rows of a contingency table as usual:

    R1=c(1,10)
    R2=c(9,2)

Next, put the data into a matrix and run a Chi-Square test on the matrix:

    data=matrix(c(R1,R2),nrow=2,byrow=TRUE)
    chisq.test(data,correct=FALSE)

Exercises

1. White-throated sparrows occur in two distinct color morphs, referred to as brown and white. It was suspected that females select mates of the opposite morph (i.e., white females select brown males and brown females select white males). This phenomenon is known as negative assortative mating. In 30 mated pairs, the color combinations were as follows. Do the results support the hypothesis that negative assortative mating occurs in the species? Solve it on your own first, then verify the answer using R.

                     Males
                     White  Brown
    Females  White       7     23
             Brown      14      5

2. For the mosquito fish length data from lab 3, test whether there is an association between gender and length using R. Hint: First, divide the data into bins. I recommend starting with bins of width 4, beginning at 16 and extending to 56. You may then want to pool the 48-52 and 52-56 bins to get high enough expected frequencies in each cell.
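As a check on the goodness of fit test in Section 2, the χ² statistic for the die example can be reproduced directly from its definition, the sum of (observed - expected)² / expected over all categories. The following is only a sketch: the names p_null, expected_counts, chi_sq, df, p_value and result are illustrative and do not appear in the lab itself.

```r
# Observed counts from the die example in Section 2
observed <- c(15, 16, 24, 12, 20, 13)
p_null <- rep(1/6, 6)                 # fair-die probabilities under the null hypothesis

n <- sum(observed)                    # total number of rolls (100)
expected_counts <- n * p_null         # expected count for each face under the null

# Chi-square statistic: sum over categories of (O - E)^2 / E
chi_sq <- sum((observed - expected_counts)^2 / expected_counts)   # 6.2

# Degrees of freedom: number of categories minus 1
df <- length(observed) - 1

# Upper-tail probability of the chi-square distribution
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

# The built-in test should report the same statistic and p-value
result <- chisq.test(x = observed, p = p_null)
print(c(by_hand = chi_sq, from_R = unname(result$statistic)))
```

Working through the statistic this way mirrors the hand calculation requested in the exercises, and makes it easy to see which categories contribute most to χ².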
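The hint in Section 2 about the warning, the df, and the p-value can be illustrated on made-up data. The sketch below uses hypothetical counts (not the eel data) and assumes a Poisson model whose mean is estimated from the sample, with the last category treated as "5 or more". It pools cells so that every expected count is at least 5, then recomputes the p-value with one extra degree of freedom removed for the estimated parameter. All variable names are illustrative.

```r
# Hypothetical counts (NOT the eel data): number of events per sampling unit
k_values <- 0:5                        # assumption: last category means "5 or more"
observed <- c(30, 42, 35, 20, 9, 4)
n <- sum(observed)

# Estimate the Poisson mean from the data (this is one estimated parameter)
lambda_hat <- sum(k_values * observed) / n

# Cell probabilities for 0..4, plus the upper tail "5 or more"
p_fit <- c(dpois(0:4, lambda_hat), ppois(4, lambda_hat, lower.tail = FALSE))
expected_counts <- n * p_fit           # the last expected count falls below 5 here

# Pool the last two cells so every expected count is at least 5
obs_pooled <- c(observed[1:4], sum(observed[5:6]))
p_pooled <- c(p_fit[1:4], sum(p_fit[5:6]))

result <- chisq.test(x = obs_pooled, p = p_pooled)

# chisq.test uses df = (number of cells - 1); subtract 1 more for the estimated mean
df_adjusted <- length(obs_pooled) - 1 - 1
p_adjusted <- pchisq(unname(result$statistic), df = df_adjusted, lower.tail = FALSE)

print(c(df_from_R = unname(result$parameter), df_adjusted = df_adjusted))
print(c(p_from_R = result$p.value, p_adjusted = p_adjusted))
```

The χ² statistic reported by chisq.test is unchanged by this adjustment; only the reference distribution, and therefore the p-value, differs.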
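For the test of association in Section 3, the expected counts come from the row and column totals: each cell's expected count is (row total × column total) / grand total. A minimal sketch using the example matrix from that section is given below; the names expected_counts, chi_sq, df, p_value and result are illustrative, not part of the lab.

```r
# Example contingency table from Section 3
R1 <- c(1, 10)
R2 <- c(9, 2)
data <- matrix(c(R1, R2), nrow = 2, byrow = TRUE)

# Expected count in each cell: (row total * column total) / grand total
expected_counts <- outer(rowSums(data), colSums(data)) / sum(data)

# Chi-square statistic without continuity correction
chi_sq <- sum((data - expected_counts)^2 / expected_counts)

# Degrees of freedom: (rows - 1) * (columns - 1)
df <- (nrow(data) - 1) * (ncol(data) - 1)
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

# Built-in test; its $expected component holds the same expected counts
result <- chisq.test(data, correct = FALSE)
print(result$expected)
print(c(by_hand = chi_sq, from_R = unname(result$statistic)))
```

Inspecting result$expected is also a quick way to check whether any cell has a low expected frequency before deciding to pool bins.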