Problem set 7- 11:01:2023

pdf

School

Northeastern University *


Course

101

Subject

Statistics

Date

Feb 20, 2024

Type

pdf

Pages

11

Uploaded by BarristerElk4126

PH717 Fall 2023 ©Boston University School of Public Health

Problem Set 7: Chi Square Test [100 points]

Please adhere to the “Rules for Collaborating”. For most questions, you can provide a full and thoughtful answer in 1-2 short sentences. Include the code you use in RStudio to generate your responses. You will want to use the epitools R package for some of the problems.

Milwaukee Area Renters Study (MARS). Matthew Desmond was the principal investigator for this study, conducted during the period (2009-2011) when he was researching his book Evicted. This is an in-person survey of 1086 households in the city of Milwaukee, all of which are renters. An adult respondent from each sampled household was interviewed about their current and previous rental situations, the current health status of members of the household, and their demographic characteristics.

The data set used for this assignment is from the MARS survey dataset. It is restricted to the current rental situation of households that include 1 or more children. In addition, any rows containing missing values, non-responses, or refusals to answer have been removed from these data. The data are in a file called MARS_revised.csv, which has 205 observations and 13 variables, described in the table below:

Variable name   Description                                                         Coding details
csid            Survey ID Number                                                    1027 60208
long_houseprob  Major problem in current residence left unresolved by landlord      0=no 1=yes
                for several days (e.g. mice, no hot water, broken window)
child_asthma    At least one child in household has asthma                          no, yes
child_lead      At least one child in household has lead poisoning                  no, yes
child_ADHD      At least one child in household has ADHD                            no, yes
Child_diab      At least one child in household has diabetes                        No, yes
child_learn     At least one child in household has a non-ADHD learning disability  no, yes
self_health     Self-rated health of adult respondent                               poor, fair, good, very good, excellent
Race            Race of respondent                                                  black, other, white
Age             Age of respondent                                                   Years of age
ever_evicted    Has the respondent ever been evicted from a residence               no, yes
Gender          Gender of the respondent                                            female, male
Hispanic        Is the respondent Hispanic or not                                   No, yes

1. Read the dataset into R and fill in the following table with appropriate summary statistics. For dichotomous/categorical/ordinal variables enter frequency and cumulative frequency (%); for continuous variables enter mean and SD. Also, provide total (n) for all the columns. (15 points)

The R code:

table(Edi_3$Race, Edi_3$ever_evicted)
table(Edi_3$Race)
prop.table(table(Edi_3$Race, Edi_3$ever_evicted), margin=1)*100
table(Edi_3$Race, Edi_3$long_houseprob)
prop.table(table(Edi_3$Race, Edi_3$long_houseprob), margin=1)*100
table(Edi_3$Race, Edi_3$child_learn)
prop.table(table(Edi_3$Race, Edi_3$child_learn), margin=1)*100
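As a rough illustration of what `margin = 1` does in the `prop.table()` calls above, the sketch below rebuilds the race-by-eviction table from the counts printed in the `riskratio.wald()` output for question 4B, rather than from `Edi_3` itself. The object names `evicted`, `pct_by_race`, and `n_by_race` are made up for this example.

```r
# Counts of ever_evicted by Race, copied from the $data block of the
# riskratio.wald() output shown in question 4B
evicted <- matrix(c(91, 26,
                    24,  6,
                    43, 15),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(Race = c("black", "other", "white"),
                                  ever_evicted = c("no", "yes")))

# margin = 1 divides each cell by its row total, so the percentages
# within each race add up to 100
pct_by_race <- round(prop.table(evicted, margin = 1) * 100, 2)
pct_by_race

# Group sizes (n) for the column headers of the summary table
n_by_race <- rowSums(evicted)
n_by_race
```

Because race is the row variable in `table()`, these row percentages become the column percentages of the summary table, where race is laid out across the columns.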
Association between race and multiple outcomes

                                                                                Race
Outcome                                                                         Black (n=117)  Others (n=30)  White (n=58)
Has the respondent ever been evicted from a residence? (yes)                    26 (22.22%)    6 (20%)        15 (25.86%)
Major problem in current residence being left unresolved by landlord
for several days (yes)                                                          68 (58.12%)    17 (56.67%)    29 (50%)
At least one child in household has a non-ADHD learning disability (yes)        25 (21.37%)    6 (20%)        11 (18.97%)

*Note: Percentages should be column percentages (for example: among those who were white, what percentage had ever been evicted from a residence?)

2. Is the adult respondent's gender related to whether a major problem in the current residence was left unresolved by the landlord for several days?

A) Test the hypothesis using a level of significance of 0.05. Report the null and alternative hypotheses (in words), test statistic, degrees of freedom, and conclusion (compute the test statistic and degrees of freedom by HAND).

Observed Values
        Female  Male  Total
Yes     88      26    114
No      73      18    91
Total   161     44    205
Expected Values
        Female                  Male                   Total
Yes     (114*161)/205 = 89.53   (114*44)/205 = 24.47   114
No      (91*161)/205 = 71.47    (91*44)/205 = 19.53    91
Total   161                     44                     205

$H_0$: Having a major problem in the current residence left unresolved by the landlord for several days and gender are independent.

$H_a$: Having a major problem in the current residence left unresolved by the landlord for several days and gender are not independent.

$$\chi^2 = \frac{(88 - 89.53)^2}{89.53} + \frac{(26 - 24.47)^2}{24.47} + \frac{(73 - 71.47)^2}{71.47} + \frac{(18 - 19.53)^2}{19.53} = 0.026 + 0.096 + 0.033 + 0.120 = 0.275$$

df = (2-1)*(2-1) = 1

$\chi^2_{critical,\, df=1} = 3.84$

Because the test statistic (0.275) is less than the critical value (3.84), we fail to reject the null hypothesis. There is not sufficient evidence to conclude that having a major problem left unresolved by the landlord for several days is associated with gender.

B) Compute the prevalence ratio (and 95% confidence interval) by HAND comparing the proportion of respondents whose major problem in the current residence was left unresolved by the landlord for several days among males as compared to females. (Major problem is the outcome variable and gender is the exposure variable.) Interpret the prevalence ratio. (You may use RStudio to check your work.) (25 pts)

Prevalence ratio = (26/44)/(88/161) = 1.08
The prevalence of a major problem in the current residence left unresolved by the landlord for several days among males is 1.08 times that among females.

To get the 95% confidence interval from page 115 of the textbook:

$$\ln(1.08) \pm 1.96 \sqrt{\frac{18/26}{44} + \frac{73/88}{161}} = 0.077 \pm 0.283 = (-0.206,\ 0.360)$$

Now exponentiating this interval,

$$(e^{-0.206},\ e^{0.360}) = (0.814,\ 1.434)$$

Thus, we are 95% confident that the prevalence ratio of an unresolved major problem (long_houseprob) in males as compared to females is between 0.814 and 1.434. The null, or no difference, value for a prevalence ratio is 1. Because this CI includes the null value, we cannot conclude that there is a statistically significant difference in the prevalence of unresolved major problems between males and females.
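The hand calculations for question 2 can be checked in R along the following lines. This is only a sketch: it types the observed counts in directly instead of reading `MARS_revised.csv`, and the object names (`obs`, `test`, `pr`, `se_log_pr`, `ci`) are chosen for this example. The standard error uses the same log-scale formula as the hand calculation.

```r
# Observed counts from question 2:
# rows = gender (exposure), columns = unresolved major problem (outcome)
obs <- matrix(c(26, 18,
                88, 73),
              nrow = 2, byrow = TRUE,
              dimnames = list(Gender = c("male", "female"),
                              long_houseprob = c("yes", "no")))

# correct = FALSE switches off the Yates continuity correction that
# chisq.test() applies to 2 x 2 tables by default, so the statistic is
# comparable to the hand-computed Pearson chi-square
test <- chisq.test(obs, correct = FALSE)
round(unname(test$statistic), 3)
test$parameter  # degrees of freedom

# Prevalence ratio (males vs. females) with a 95% CI built on the log scale
p_male   <- obs["male", "yes"] / sum(obs["male", ])
p_female <- obs["female", "yes"] / sum(obs["female", ])
pr <- p_male / p_female

se_log_pr <- sqrt((obs["male", "no"] / obs["male", "yes"]) / sum(obs["male", ]) +
                  (obs["female", "no"] / obs["female", "yes"]) / sum(obs["female", ]))
ci <- exp(log(pr) + c(-1, 1) * qnorm(0.975) * se_log_pr)

round(c(PR = pr, lower = ci[1], upper = ci[2]), 3)
```

The results should agree with the hand values up to rounding, since the hand calculation carries only two or three decimals through each step and uses 1.96 in place of `qnorm(0.975)`.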
3. Is having a major problem left unresolved by the landlord related to at least one child in the family having lead poisoning?

A) Test the hypothesis using a level of significance of 0.05 BY HAND. Report the null and alternative hypotheses (in words), test statistic, degrees of freedom, and conclusion (compute the test statistic and degrees of freedom BY HAND).

Observed
           Yes_problem  No_problem  Total
Yes_lead   13           7           20
No_lead    101          84          185
Total      114          91          205

Expected
           Yes_problem              No_problem             Total
Yes_lead   (20*114)/205 = 11.12     (20*91)/205 = 8.88     20
No_lead    (185*114)/205 = 102.88   (185*91)/205 = 82.12   185
Total      114                      91                     205

$H_0$: Having a major problem in the current residence left unresolved by the landlord for several days and at least one child in the family having lead poisoning are independent.

$H_a$: Having a major problem in the current residence left unresolved by the landlord for several days and at least one child in the family having lead poisoning are not independent.

$$\chi^2 = \frac{(13 - 11.12)^2}{11.12} + \frac{(7 - 8.88)^2}{8.88} + \frac{(101 - 102.88)^2}{102.88} + \frac{(84 - 82.12)^2}{82.12} \approx 0.317 + 0.397 + 0.034 + 0.043 \approx 0.792$$
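The expected counts and the test statistic above follow a fixed recipe (row total times column total over the grand total, then the sum of squared deviations over the expected counts), which can be sketched in R as below. The counts are typed in from the observed table, and the names `obs`, `expected`, and `chi_sq` are illustrative only.

```r
# Observed counts from question 3:
# rows = child lead poisoning, columns = unresolved major problem
obs <- matrix(c(13,  7,
                101, 84),
              nrow = 2, byrow = TRUE,
              dimnames = list(child_lead = c("yes", "no"),
                              long_houseprob = c("yes", "no")))

# Expected count in each cell = (row total * column total) / grand total
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)
round(expected, 2)

# Pearson chi-square: sum over all cells of (observed - expected)^2 / expected
chi_sq <- sum((obs - expected)^2 / expected)
round(chi_sq, 3)

# Degrees of freedom: (rows - 1) * (columns - 1)
df <- (nrow(obs) - 1) * (ncol(obs) - 1)
df
```

Working from unrounded expected counts, as this sketch does, avoids the small discrepancies that appear when the rounded per-cell terms are added by hand.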
df = (2-1)*(2-1) = 1

$\chi^2_{critical,\, df=1} = 3.84$

Because the test statistic (0.792) is less than the critical value (3.84), we fail to reject the null hypothesis. There is not sufficient evidence to conclude that having a major problem left unresolved by the landlord for several days is associated with at least one child in the family having lead poisoning.

B) Compute the prevalence ratio (and 95% confidence interval) BY HAND comparing the proportion of respondents who had at least one child in their family with lead poisoning between those whose major problems were unresolved as compared to those whose major problems were resolved. (Lead poisoning is the outcome variable and major problem is the exposure variable.) Interpret the prevalence ratio. You may use RStudio to check your work. (25 pts)

Prevalence ratio = (13/114)/(7/91) = 1.48

The prevalence of at least one child in the family having lead poisoning among those whose major problems were unresolved is 1.48 times that among those whose major problems were resolved.

To get the 95% confidence interval from page 115 in the textbook:

$$\ln(1.48) \pm 1.96 \sqrt{\frac{101/13}{114} + \frac{84/7}{91}} = 0.392 \pm 0.877 = (-0.485,\ 1.269)$$

Now exponentiating this interval,
$$(e^{-0.485},\ e^{1.269}) = (0.616,\ 3.556)$$

Thus, we are 95% confident that the prevalence ratio of at least one child in the family having lead poisoning, comparing those whose major problems were unresolved to those whose major problems were resolved, is between 0.616 and 3.556. The null, or no difference, value for a prevalence ratio is 1. Because this CI includes the null value, we cannot conclude that there is a statistically significant difference in the prevalence of child lead poisoning between those whose major problems were unresolved and those whose major problems were resolved.

4. Is the adult respondent's race related to ever having been evicted from a residence?

A) Test the hypothesis using R. Report the null and alternative hypotheses (in words), test statistic, degrees of freedom, p-value, and conclusion (compute the test statistic and degrees of freedom using R).

$H_0$: Race and ever having been evicted from a residence are independent.

$H_a$: Race and ever having been evicted from a residence are not independent.

The R code:

chi4a <- table(Edi_3$Race, Edi_3$ever_evicted)
chi4a
chisq <- chisq.test(chi4a)
chisq

The output:

Pearson's Chi-squared test
data: chi4a
X-squared = 0.4611, df = 2, p-value = 0.7941
Since the p-value (0.7941) is greater than 0.05, we fail to reject the null hypothesis. There is not sufficient evidence to conclude that race is associated with ever having been evicted from a residence.

B) Calculate the prevalence ratios comparing the proportion of respondents who have been evicted from a residence between respondents of other and white race and respondents of black race, using the R code below. (You will be calculating 2 prevalence ratios.) Interpret the prevalence ratios. (25 pts)

R code:

install.packages("epitools")
library(epitools)
riskratio.wald(Edi_3$Race, Edi_3$ever_evicted)

The output:

$data
          Outcome
Predictor   no yes Total
    black   91  26   117
    other   24   6    30
    white   43  15    58
    Total  158  47   205

$measure
          risk ratio with 95% C.I.
Predictor  estimate     lower    upper
    black  1.000000        NA       NA
    other  0.900000 0.4076839 1.986834
    white  1.163793 0.6700621 2.021327
$p.value
          two-sided
Predictor  midp.exact fisher.exact chi.square
    black          NA           NA         NA
    other   0.8178849      1.00000  0.7924478
    white   0.5939999      0.70485  0.5925553

$correction
[1] FALSE

attr(,"method")
[1] "Unconditional MLE & normal approximation (Wald) CI"

Interpretation: The prevalence of ever having been evicted from a residence among respondents of other race is 0.9 times (10% lower than) that among Black respondents. The prevalence of ever having been evicted from a residence among White respondents is 1.16 times (16% higher than) that among Black respondents.

5. A) Is having a major problem in the current residence left unresolved by the landlord for several days related to the self-rated health of respondents? (5)

Self-rated health of adults   Major problems unresolved (Yes)
Excellent                     21 (53.8%)
Very good                     27 (51.9%)
Good                          37 (62.7%)
Fair                          22 (56.4%)
Poor                          7 (43.8%)

**Note: Percentages should be row percentages, i.e., the number of respondents in each self-rated health category (for example, excellent) who have major problems unresolved, and the percentage of that category they represent.

B) Test the hypothesis using R. Report the test statistic, degrees of freedom, p-value, and conclusion (compute the test statistic and degrees of freedom using R). (5)

$H_0$: Self-rated health and having a major problem in the current residence left unresolved by the landlord for several days are independent.

$H_a$: Self-rated health and having a major problem in the current residence left unresolved by the landlord for several days are not independent.

The R code:

chi5a <- table(Edi_3$self_health, Edi_3$long_houseprob)
chi5a
chisq <- chisq.test(chi5a)
chisq

The output:

Pearson's Chi-squared test
data: chi5a
X-squared = 2.4628, df = 4, p-value = 0.6513

Since the p-value (0.6513) is greater than 0.05, we fail to reject the null hypothesis. There is not sufficient evidence to conclude that self-rated health is associated with having a major problem in the current residence left unresolved by the landlord for several days.
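For readers without access to `MARS_revised.csv`, the test in 5B can be approximately reproduced from the summary table in 5A. In the sketch below the "yes" counts come straight from that table, but the group sizes in `n` are back-calculated from the reported row percentages (for example, 21 / 0.538 is about 39), so they are an assumption rather than values given in the problem set. The object names are illustrative.

```r
# "Yes" counts from the table in 5A
yes <- c(excellent = 21, `very good` = 27, good = 37, fair = 22, poor = 7)

# ASSUMPTION: group sizes inferred from the reported row percentages,
# not stated in the problem set
n <- c(excellent = 39, `very good` = 52, good = 59, fair = 39, poor = 16)

# 5 x 2 table: rows = self-rated health, columns = unresolved major problem
health_tab <- cbind(no = n - yes, yes = yes)
health_tab

# Row percentages: share of each health category with an unresolved problem
prop.table(health_tab, margin = 1)[, "yes"] * 100

# Pearson chi-square test; chisq.test() applies the continuity correction
# only to 2 x 2 tables, so no extra argument is needed here
test5 <- chisq.test(health_tab)
test5$statistic
test5$parameter  # degrees of freedom
test5$p.value
```

If the inferred group sizes are right, the statistic, degrees of freedom, and p-value should line up with the output reported above; a mismatch would point to the back-calculated totals rather than to the test itself.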