School: Drexel University | Course: 411 | Subject: Mathematics | Date: Apr 3, 2024
Math 410 Lab 6: Chi-Square Tests
1 Overview
In this lab, we'll learn how to implement Chi-Square tests in R. Your lab report should contain your solutions for all of the exercises. In all problems, make sure to clearly state your null and alternative hypotheses, as well as your final conclusion. You should include your R code/scripts and the output, along with any hand calculations you are asked to perform.
2 Goodness of Fit Test
To perform a goodness of fit test in R, we first need to define the observed and expected frequencies. The observed frequencies should be entered as counts, while the expected values should be entered as the probabilities of each outcome under the null hypothesis (R converts these to expected counts itself). For instance, suppose we roll a die 100 times, with the following results:
Result      1   2   3   4   5   6
Frequency  15  16  24  12  20  13
We want to test whether the die is fair, so if the null hypothesis were true, each outcome would have a probability of 1/6. Therefore, in R we'd enter:

observed=c(15,16,24,12,20,13)
expected=c(1/6,1/6,1/6,1/6,1/6,1/6)

Now, to run the Chi-Square test, enter:

chisq.test(x=observed,p=expected,correct=F)

R will report the value of χ², the p-value, and the number of degrees of freedom. You can then reach a conclusion by comparing the p-value to α as usual. The p-value you get should be fairly large (well above the usual α = .05), so we wouldn't reject the null hypothesis here. One advantage of using R is that it gives us the exact p-value, whereas with the table we could only estimate it. Note that correct=F prevents R from applying a continuity correction, which we have not discussed. R only applies that correction to 2×2 tables, so the option matters for the test of association in Section 3 rather than for a goodness of fit test.

Exercises
1. One hundred tweetie birds were given a choice of either striped sunflower seeds or black sunflower seeds. Seventy-five chose black seeds. May we conclude that the population from which the sample was taken has a preference for black sunflower seeds over striped sunflower seeds? First do the calculations by hand, then verify with R.
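To see what chisq.test computes, the die example can be worked by hand in R. This is a sketch, not the only way to do it: it forms the expected counts E_i = n·p_i, the statistic χ² = Σ (O_i − E_i)²/E_i, the degrees of freedom k − 1 − m (where m is the number of parameters estimated from the data, so m = 0 for the fair die), and the upper-tail p-value with pchisq. The helper name gof_by_hand and its n_estimated argument are hypothetical, written for illustration only.

```r
# Hand computation of a chi-square goodness of fit test.
# gof_by_hand() is a hypothetical helper for this lab, not a base R function.
gof_by_hand <- function(observed, p, n_estimated = 0) {
  n <- sum(observed)
  expected <- n * p                                     # expected counts E_i = n * p_i
  statistic <- sum((observed - expected)^2 / expected)  # sum of (O - E)^2 / E
  df <- length(observed) - 1 - n_estimated              # one df lost per estimated parameter
  p_value <- pchisq(statistic, df = df, lower.tail = FALSE)  # upper-tail area
  list(expected = expected, statistic = statistic, df = df, p.value = p_value)
}

observed <- c(15, 16, 24, 12, 20, 13)
expected <- rep(1/6, 6)

by_hand <- gof_by_hand(observed, expected)
by_r <- chisq.test(x = observed, p = expected, correct = FALSE)

print(by_hand$statistic)  # 6.2
print(by_hand$df)         # 5

# Both routes should agree (unname() drops R's "X-squared" label)
all.equal(by_hand$statistic, unname(by_r$statistic))
all.equal(by_hand$p.value, unname(by_r$p.value))
```

For the die data this gives χ² = 6.2 on 5 df, with a p-value of roughly 0.29. When parameters of the hypothesized distribution are estimated from the same data, the statistic from chisq.test can still be used, but the df and p-value have to be recomputed with pchisq, which is what the n_estimated argument is for.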
2. In a previous lab, you worked through a problem similar to 5.38 in the text, and should have determined that the Poisson distribution was not a good fit for the eel data. Test that conclusion now using the Chi-Square test with α = .05. Hint: If you aren't careful, R will give you a warning when you run the Chi-Square test. Think about why it is doing so, and adjust your analysis accordingly. Also, while R can give you the correct χ² statistic, you need to be careful with the df and p-value here.

3. Suppose a disease affects approximately 22% of the population. In 500 randomly selected families of four people, the number of people with the disease is given below.
Number of people with the disease    0    1    2    3    4
Frequency                          292   30   55   70   53
A scientist proposes using the Binomial distribution to predict the number of people with the disease in a family. Use a Chi-Square test to show this is a horrible proposal for this data. Explain why the Binomial model is not effective in this context.

3 Test of Association
Now we consider the test of association. For this test, we give R the data in the form of a matrix. First, define the rows of a contingency table as usual:

R1=c(1,10)
R2=c(9,2)

Next, put the data into a matrix and run a Chi-Square test on the matrix:
data=matrix(c(R1,R2),nrow=2,byrow=T)
chisq.test(data, correct=F)

Exercises
1. White-throated sparrows occur in two distinct color morphs, referred to as brown and white. It was suspected that females select mates of the opposite morph (i.e., white females select brown males and brown females select white males). This phenomenon is known as negative assortative mating. In 49 mated pairs, the color combinations were as follows. Do the results support the hypothesis that negative assortative mating occurs in the species? Solve it on your own first, then verify the answer using R.
                 Males
                 White   Brown
Females  White     7       23
         Brown    14        5
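As with the goodness of fit test, the test of association can be reproduced by hand. Under independence, each expected count is (row total × column total) / grand total, and the degrees of freedom are (r − 1)(c − 1). The sketch below applies this to the R1/R2 example from the start of this section (not to the sparrow data) and checks the result against chisq.test; the variable names expected, stat, df, p_value, and res are illustrative choices.

```r
# Rows of the example contingency table from this section
R1 <- c(1, 10)
R2 <- c(9, 2)
data <- matrix(c(R1, R2), nrow = 2, byrow = TRUE)

n <- sum(data)
# Expected counts under independence: E_ij = (row total i) * (column total j) / n
expected <- outer(rowSums(data), colSums(data)) / n
stat <- sum((data - expected)^2 / expected)
df <- (nrow(data) - 1) * (ncol(data) - 1)
p_value <- pchisq(stat, df = df, lower.tail = FALSE)

print(expected)  # every cell is at least 5, so R should not warn here
print(c(statistic = stat, df = df, p.value = p_value))

# Cross-check against R's built-in test (no continuity correction)
res <- chisq.test(data, correct = FALSE)
all.equal(stat, unname(res$statistic))
all.equal(expected, res$expected)
```

Here every expected count is 5 or 6, and the statistic works out to 176/15 ≈ 11.73 on 1 df.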
2. For the mosquito fish length data from Lab 3, test whether there is an association between gender and length using R.
Hint: First, divide the data into bins. I recommend starting with bins of width 4, starting at 16 and extending to 56. You may then want to pool the 48-52 and 52-56 bins to get high enough expected frequencies in each cell.
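The binning step in the hint can be sketched with cut() and table(). This assumes the Lab 3 data have been loaded as two vectors, here given the hypothetical names fish_length (numeric) and gender; the helper bin_and_test() is likewise hypothetical, since the data are not reproduced in this handout. The breaks follow the hint: width-4 bins from 16 up to 48, with the 48-52 and 52-56 bins pooled into one 48-56 bin. Closing the bins on the left is an assumed convention; the hint does not specify which endpoint belongs to which bin.

```r
# Hypothetical helper: bin a numeric variable, cross-tabulate it against a
# grouping factor, and run the chi-square test of association.
# fish_length and gender are placeholder names for the Lab 3 data.
bin_and_test <- function(fish_length, gender) {
  # Width-4 bins from 16 to 48, then one pooled bin for 48-56
  breaks <- c(seq(16, 48, by = 4), 56)
  # Assumed convention: bins closed on the left, e.g. [16,20); with right = FALSE,
  # include.lowest = TRUE also keeps a length of exactly 56 in the last bin.
  # Lengths outside 16-56 become NA and are left out of the table.
  length_bin <- cut(fish_length, breaks = breaks, right = FALSE, include.lowest = TRUE)
  # Drop bins with no fish at all, since their expected counts would be zero
  length_bin <- droplevels(length_bin)
  counts <- table(gender, length_bin)
  test <- chisq.test(counts, correct = FALSE)
  # Check the expected counts: R warns when any of them is below 5
  list(counts = counts, expected = test$expected, test = test)
}
```

With the data loaded, calling out <- bin_and_test(fish_length, gender) and then printing out$expected and out$test shows the expected count in each cell and the test result. If some expected counts are still below 5, pool neighbouring bins further by editing breaks.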