Section 07.2 Shared Lab

School

Pennsylvania State University

Course

200

Subject

Statistics

Date

Feb 20, 2024

© Pennsylvania State University
Lab 7.2: Testing for an association between two categorical variables

STAT 200: Lab Activity for Section 7.2
Testing for an association between two categorical variables

Learning objectives:
- Formulate correct hypotheses.
- With the theory-based Chi-Square Test of Association approach:
  1. calculate and interpret the expected counts
  2. understand the relationship between the chi-square contributions and the chi-square statistic's final value
  3. use statistical software such as Minitab to perform a hypothesis test
  4. recognize when it is appropriate to use the theory-based approach
- Obtain a p-value using the Randomization (simulation) Chi-Square Test of Association approach when conditions are not met for the theory-based approach.

Activity 0: Just notice these formulas!

Expected count = [(row total) * (column total)] / sample size
Residual = observed count - expected count
Chi-square contribution = (observed count - expected count)^2 / expected count
Chi-square statistic = sum of all chi-square contributions
df = (r - 1) * (c - 1), where r = the number of rows and c = the number of columns

Activity 1: Theory-based chi-square test, expected counts and cell contributions

Data from four different Pew Research Surveys that looked at Facebook usage are summarized in Table 1.
Source: https://www.pewresearch.org/

The survey question of interest, for each of the four years, is: How often do you visit Facebook?
A. Less often than once a day
B. About once a day
C. Several times a day

Table 1: Summarized "Observed" Data from the Four Pew Research Surveys

Year     Less    Once    Several    Total
2013      221     230        384      835
2018      521     465       1012     1998
2019      170     230        510      910
2021      436     330        736     1502
Total    1348    1255       2642     5245

Exploratory Data Analysis

1. Which type of data is summarized in the 4×3 contingency table?
   A. categorical
   B. quantitative
2. Identify the variables:
   Explanatory: year
   Response: how often people visit Facebook

Analysis

We are interested in testing the hypotheses:
H0: There is no association between year and frequency of visiting Facebook.
Ha: There is an association between year and frequency of visiting Facebook.
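As a rough cross-check on the hand and Minitab calculations, the Activity 0 formulas can be applied directly to the observed counts in Table 1. The sketch below is plain Python and is not part of the lab; the function names are illustrative assumptions, and only the counts come from Table 1.

```python
# Observed counts from Table 1.
# Rows: 2013, 2018, 2019, 2021. Columns: Less, Once, Several.
observed = [
    [221, 230, 384],
    [521, 465, 1012],
    [170, 230, 510],
    [436, 330, 736],
]


def expected_counts(table):
    """Expected count = (row total * column total) / sample size."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_contributions(table):
    """Contribution = (observed - expected)^2 / expected, cell by cell."""
    expected = expected_counts(table)
    return [
        [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(table, expected)
    ]


def chi_square_statistic(table):
    """Chi-square statistic = sum of all cell contributions."""
    return sum(sum(row) for row in chi_square_contributions(table))


def degrees_of_freedom(table):
    """df = (r - 1) * (c - 1)."""
    return (len(table) - 1) * (len(table[0]) - 1)


expected = expected_counts(observed)
statistic = chi_square_statistic(observed)
df = degrees_of_freedom(observed)

for year, row in zip([2013, 2018, 2019, 2021], expected):
    print(year, [round(e, 1) for e in row])
print("chi-square statistic:", round(statistic, 2), "df:", df)
```

Because the statistic is just the sum of the cell contributions, printing `chi_square_contributions(observed)` alongside it shows which cells drive the final value.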
The survey data is displayed on the graph found below.

3. Descriptively, does the graph suggest that there is an association between year and frequency of visiting Facebook? Provide reasoning for your answer.

The expected counts are displayed below in Table 2.

Table 2: Expected Counts for the Four Pew Research Surveys

Year     Less     Once     Several
2013     214.6    199.8    ______
2018     513.5    478.1    1006.4
2019     233.9    217.7     458.4
2021     386.0    359.4     756.6

4. Calculate the expected count for the missing cell in this table. Report to one decimal place. Refer to Table 1.
5. Interpret the expected count for the cell: Visiting Facebook several times a day for the Year 2013.
   We would ___________ around _________ participants from the 2013 survey to say they visited Facebook at least several times a day, if there is _________ association between the two variables.
   Blank 1: (observe, expect)
   Blank 2: (fill in calculated count)
   Blank 3: (an, no)

Let's now use Minitab to complete some of the calculations by first typing the "summarized" observed data from Table 1 into a Minitab worksheet.

6. Use Minitab to obtain both the observed and expected counts. Check whether the expected counts from Minitab match those in Table 2, including the expected count that you calculated by hand.
7. Have we met the conditions for the theory-based Chi-Square Test approach? Include reasoning. If met, use Minitab to obtain the chi-square statistic.
8. What is the chi-square statistic?
9. What are the degrees of freedom for the relevant distribution? How were they calculated?
10. What is the p-value (Pearson)? Sketch a picture of the p-value with labelling. Look at the lecture notes or verify with StatKey when using theoretical distributions.
11. Write out an interpretation of the p-value.
12. What is the appropriate conclusion for our hypothesis test?

Activity 2: Perform a theory-based chi-square test

Heart failure is a common event caused by cardiovascular disease. The raw data, which includes 11 clinical features that can be used to predict heart disease, is in a file called HeartDisease. The complete codebook can be found at the website provided.
Source: https://www.kaggle.com/fedesoriano/heart-failure-prediction

Consider two variables from this data set.
Variable 1: Resting Electrocardiogram (ECG) result:
- Normal
- ST-T wave abnormality
- LVH: left ventricular hypertrophy
Variable 2: Patient diagnosed with Heart Disease:
- Yes (Heart Disease)
- No (Normal)

Exploratory Data Analysis

1. How many cases are found in this dataset?
   Note: if you look at the data, some variables have missing observations, but not the two under consideration. Real data sets often have missing data.

Analysis

2. We want to perform a hypothesis test to see whether the two variables are associated. Write the null and alternative hypotheses of the test.
   Null Hypothesis:
   Alternative Hypothesis:

Use Minitab to get the necessary information to answer the questions below.

3. Have we met the conditions for the theory-based Chi-Square Test approach? Include reasoning.
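The condition check and the p-value questions in this lab can be sketched in a few lines of Python. This is an illustration, not the lab's Minitab procedure, and it rests on two assumptions that should be checked against the course notes: the rule of thumb that every expected count should be at least 5, and a closed-form right-tail area that is only valid when the degrees of freedom are even (which happens to cover 4×3, 3×2 and 2×3 tables). The function names and the small sparse table are hypothetical.

```python
import math


def count_small_cells(expected, minimum=5.0):
    """Number of cells whose expected count falls below `minimum`."""
    return sum(1 for row in expected for e in row if e < minimum)


def conditions_met(expected, minimum=5.0):
    """Assumed rule of thumb: every expected count is at least `minimum`."""
    return count_small_cells(expected, minimum) == 0


def chi_square_right_tail(statistic, df):
    """Right-tail area of a chi-square distribution with EVEN df.

    For df = 2m the area is exp(-x/2) * sum_{i=0}^{m-1} (x/2)^i / i!.
    Odd df needs the incomplete gamma function, so use software there.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only handles positive even df")
    half = statistic / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Expected counts from Table 2; the missing cell is computed from the
# Table 1 totals as (row total * column total) / sample size.
table2_expected = [
    [214.6, 199.8, 835 * 2642 / 5245],
    [513.5, 478.1, 1006.4],
    [233.9, 217.7, 458.4],
    [386.0, 359.4, 756.6],
]
print("Table 2 conditions met:", conditions_met(table2_expected))

# Hypothetical sparse table (made-up numbers) with two problematic cells.
sparse_expected = [[3.2, 6.8], [4.8, 10.2]]
print("problematic cells:", count_small_cells(sparse_expected))

# 5.991 is the familiar 5% critical value for df = 2.
print("right-tail area:", round(chi_square_right_tail(5.991, 2), 3))
```

The p-value is always a right-tail area here because larger chi-square statistics indicate larger departures from the expected counts.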
If the conditions are met, use Minitab to obtain the chi-square test statistic, along with each cell's contribution to the chi-square statistic.

4. When considering each cell's contribution to the chi-square statistic, which cell contributes the ...
   i. most? (Is there "over" or "under" representation in this cell?)
   ii. least? (Is there "over" or "under" representation in this cell?)
5. What is the chi-square statistic? What are the degrees of freedom?
6. What is the p-value? Sketch a picture of the p-value with labelling.
7. What is the conclusion for our hypothesis test?
8. Can we perform a chi-square test with other variables found in this data set?

Table 3: Variable Pairs from the HeartDisease Data Set

Pair 1
  Variable 1: Resting BP (blood pressure), mm Hg
  Variable 2: HeartDisease (yes, no)
  Can you perform a chi-square test? Include reasoning:

Pair 2
  Variable 1: Age (years)
  Variable 2: MaxHR (maximum heart rate), beats/minute
  Can you perform a chi-square test? Include reasoning:

Pair 3
  Variable 1: Resting Electrocardiogram (ECG) result (Normal, ST-T wave abnormality, left ventricular hypertrophy (LVH))
  Variable 2: ChestPain Type (TA: typical angina, ATA: atypical angina, ..., ASYS (asymptomatic))
  Can you perform a chi-square test? Include reasoning:

Activity 3: MBAT

The purpose of this study was to gather data on the efficacy of a newly developed psychosocial group intervention for cancer patients, called mindfulness-based art therapy (MBAT). One hundred and eleven women with a variety of cancer diagnoses were randomized to either an eight-week MBAT intervention treatment group or a wait-list control group. Female subjects were recruited from a diversity of referral sources throughout the Jefferson Cancer Network, which includes 16 hospitals within the Philadelphia region, although the majority of the subjects were directly referred by Jefferson's Kimmel Cancer Center. Each subject was beyond four months and within two years of an original diagnosis of cancer (or cancer recurrence).

When looking at the demographic information on race (Caucasian, African American, other), is there an association between the group a woman was randomly assigned to and her race? The raw data is in the file OncologyStudy.

Analysis
1. Use Minitab to see whether we have met the conditions for the theory-based Chi-Square Test. Include reasoning.
2. If there is a problem with meeting the conditions, how many cells are problematic?
3. What is the chi-square statistic (Hint: may still need to use)?

If we have not met the conditions for the theory-based approach, we need to do a simulation (Randomization Chi-Square Test of Association) in StatKey.

4. Enter the data file into StatKey and then complete at least 5000 simulations to obtain the p-value.
5. The test will be [ ]:
   A. Left Tail
   B. Two-Tail
   C. Right Tail
6. Find the p-value using the same approach that you used in Chapter 4: simulation-based inference.
7. What is your conclusion? Why should you not be surprised by this conclusion?

Below is a summary of the observed data from this study. Use the summarized data to answer the following questions:

8. Within the treatment group, what are the odds that the woman was "Caucasian"?
9. For an "African American" woman, what are the odds of being randomly assigned to the control group?

Source: https://doi.org/10.1002/pon.988
Daniel A. Monti, Caroline Peterson, Elisabeth J. Shakin Kunkel, Walter W. Hauck, Edward Pequignot, Lora Rhodes, George C. Brainard. A randomized, controlled trial of mindfulness-based art therapy (MBAT) for women with cancer. Psycho-Oncology, 15: 363-373 (2006).
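The randomization approach used in Activity 3 can be sketched in plain Python. This is a minimal illustration of the general idea (shuffle one variable, recompute the chi-square statistic, and count how often the simulated statistic is at least as large as the observed one), not a reproduction of StatKey's implementation. The function names, the seed, and the tiny data set are hypothetical; the data are made up and are not the OncologyStudy values.

```python
import random
from collections import Counter


def chi_square_stat(groups, outcomes):
    """Chi-square statistic for two parallel lists of category labels."""
    n = len(groups)
    row_totals = Counter(groups)
    col_totals = Counter(outcomes)
    observed = Counter(zip(groups, outcomes))
    stat = 0.0
    for g, r in row_totals.items():
        for o, c in col_totals.items():
            expected = r * c / n
            stat += (observed[(g, o)] - expected) ** 2 / expected
    return stat


def randomization_p_value(groups, outcomes, n_sims=5000, seed=0):
    """Right-tail proportion of shuffled statistics >= the observed one."""
    rng = random.Random(seed)
    observed_stat = chi_square_stat(groups, outcomes)
    shuffled = list(outcomes)
    at_least_as_large = 0
    for _ in range(n_sims):
        # Shuffling breaks any link between group and outcome,
        # which mimics the null hypothesis of no association.
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point ties.
        if chi_square_stat(groups, shuffled) >= observed_stat - 1e-12:
            at_least_as_large += 1
    return observed_stat, at_least_as_large / n_sims


# Hypothetical data: 10 cases per group, three made-up categories.
groups = ["treatment"] * 10 + ["control"] * 10
outcomes = ["A"] * 7 + ["B"] * 2 + ["C"] * 1 + ["A"] * 6 + ["B"] * 3 + ["C"] * 1

stat, p_value = randomization_p_value(groups, outcomes)
print("observed statistic:", round(stat, 3), "p-value:", p_value)
```

Only the right tail is counted because a larger chi-square statistic always means a larger departure from the expected counts, whichever direction the departure takes.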