Bryson_Biostat_HW2


School: Abraham Baldwin Agricultural College
Course: 3000
Subject: Health Science
Date: Feb 20, 2024

Biostatistics: Homework 2
Hannah Bryson
Tue Feb 6 09:42:09 2024 EST
Due Date: 2/5/2023

Instructions: For all hypothesis test problems, write H0 and H1 in words starting with ##. After the appropriate hypothesis test code, type the p-value and write the decision and conclusion.

Question 1 (50 points)

The following data are the lengths of a tooth (in mm) for 60 different guinea pigs:

4.2, 11.5, 7.3, 5.8, 6.4, 10.0, 11.2, 11.2, 5.2, 7.0, 16.5, 16.5, 15.2, 17.3, 22.5, 17.3, 13.6, 14.5, 18.8, 15.5, 23.6, 18.5, 33.9, 35.5, 26.4, 32.5, 26.7, 21.5, 23.3, 29.5, 15.2, 21.5, 17.6, 9.7, 14.5, 10.0, 8.2, 9.4, 16.5, 9.7, 19.7, 23.3, 23.6, 26.4, 20.0, 25.2, 25.8, 21.2, 14.5, 27.3, 25.5, 26.4, 22.4, 24.5, 24.8, 30.9, 26.4, 27.3, 29.4, 23.0

a) In R, write the above data in vector form. Find the sample mean, quartiles and variance of the tooth length.

## If you need to use R, type your code here
Tooth = c(4.2, 11.5, 7.3, 5.8, 6.4, 10.0, 11.2, 11.2, 5.2, 7.0,
          16.5, 16.5, 15.2, 17.3, 22.5, 17.3, 13.6, 14.5, 18.8, 15.5,
          23.6, 18.5, 33.9, 35.5, 26.4, 32.5, 26.7, 21.5, 23.3, 29.5,
          15.2, 21.5, 17.6, 9.7, 14.5, 10.0, 8.2, 9.4, 16.5, 9.7,
          19.7, 23.3, 23.6, 26.4, 20.0, 25.2, 25.8, 21.2, 14.5, 27.3,
          25.5, 26.4, 22.4, 24.5, 24.8, 30.9, 26.4, 27.3, 29.4, 23.0)
mean(Tooth)
## [1] 18.98
quantile(Tooth)
##     0%    25%    50%    75%   100%
##  4.200 13.075 19.250 25.275 35.500
var(Tooth)
## [1] 62.44536

b) Write the data in column format using data.frame and name it Tooth.

## If you need to use R, type your code here
B = data.frame(Tooth)
head(B)
##   Tooth
## 1   4.2
## 2  11.5
## 3   7.3
## 4   5.8
## 5   6.4
## 6  10.0

c) In the data, attach a new column named Tlength as follows: if the length is less than 14, Tlength is short; if the length is greater than or equal to 14 and less than 20, Tlength is medium; and if the length is greater than 20, Tlength is long.

## If you need to use R, type your code here
# cut() intervals are right-closed here: [0,14], (14,20], (20,40].
# So a length of exactly 14 would be 'short' (the question puts 14 in 'medium'),
# and the single value 20.0 is labelled 'medium'.
B$Tlength = cut(B$Tooth, breaks = c(0, 14, 20, 40),
                labels = c('short', 'medium', 'long'), include.lowest = TRUE)
head(B)
##   Tooth Tlength
## 1   4.2   short
## 2  11.5   short
## 3   7.3   short
## 4   5.8   short
## 5   6.4   short
## 6  10.0   short

d) Use R to construct a table for the column Tooth$Tlength. How many of them have short, medium, and long tooth length?

## If you need to use R, type your code here
Tab1 = table(B$Tlength)
Tab1
##
##  short medium   long
##     16     16     28

e) Test the following hypotheses at α = 0.05. Note that for each problem, you need to write H0 and H1 as comments.

e1) Test if the average tooth length is different from 20 mm, given that the population standard deviation is σ = 5.36.

## If you need to use R, type your code here
## H0: mu = 20 (the mean tooth length is 20 mm)
## H1: mu != 20 (the mean tooth length is not 20 mm)
library(TeachingDemos)
# z.test() has no sig.level argument; compare the p-value with alpha = 0.05 directly
z.test(B$Tooth, mu = 20, stdev = 5.36, alternative = "two.sided")
##
##  One Sample z-test
##
## data: B$Tooth
## z = -1.474, n = 60.00000, Std. Dev. = 5.36000, Std. Dev. of the sample
## mean = 0.69197, p-value = 0.1405
## alternative hypothesis: true mean is not equal to 20
## 95 percent confidence interval:
##  17.62376 20.33624
## sample estimates:
## mean of B$Tooth
##           18.98

e2) Test if the average tooth length is less than 20 mm (σ is unknown).

## H0: mu = 20 (the mean tooth length is 20 mm)
## H1: mu < 20 (the mean tooth length is less than 20 mm)
## If you need to use R, type your code here
## Use a t-test when sigma is unknown
# t.test() has no stdev or sig.level arguments; the default conf.level = 0.95 matches alpha = 0.05
t.test(B$Tooth, mu = 20, alternative = "less")
##
##  One Sample t-test
##
## data: B$Tooth
## t = -0.99983, df = 59, p-value = 0.1607
## alternative hypothesis: true mean is less than 20
## 95 percent confidence interval:
##      -Inf 20.68481
## sample estimates:
## mean of x
##     18.98

e3) From the table, we see that there are only ……. guinea pigs with tooth length 'short'. Test the claim that exactly 40% of the guinea pigs have short tooth length.

## If you need to use R, type your code here
## H0: p = 0.40 (40% of the guinea pigs have short tooth length)
## H1: p != 0.40 (the proportion with short tooth length is not 40%)
prop.test(x = 16, n = 60, p = 0.40, alternative = "two.sided", conf.level = 0.95)
##
##  1-sample proportions test with continuity correction
##
## data: 16 out of 60, null probability 0.4
## X-squared = 3.9062, df = 1, p-value = 0.04811
## alternative hypothesis: true p is not equal to 0.4
## 95 percent confidence interval:
##  0.1645226 0.3989020
## sample estimates:
##         p
## 0.2666667
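As a cross-check on e1 to e3, the three test statistics can be recomputed by hand in base R from their textbook formulas. This is a sketch under stated assumptions, not the method the assignment asks for: it re-enters the Tooth vector from part a), takes the count of 16 'short' out of 60 from Tab1, and the names `alpha`, `t_stat`, `yates` and `results` are introduced here only for illustration. The proportion statistic uses the continuity correction min(0.5, |x - n·p0|) that `prop.test()` applies by default for one sample.

```r
# Hand computation of the e1-e3 test statistics in base R (a cross-check,
# not a replacement for z.test(), t.test() and prop.test()).
Tooth <- c(4.2, 11.5, 7.3, 5.8, 6.4, 10.0, 11.2, 11.2, 5.2, 7.0,
           16.5, 16.5, 15.2, 17.3, 22.5, 17.3, 13.6, 14.5, 18.8, 15.5,
           23.6, 18.5, 33.9, 35.5, 26.4, 32.5, 26.7, 21.5, 23.3, 29.5,
           15.2, 21.5, 17.6, 9.7, 14.5, 10.0, 8.2, 9.4, 16.5, 9.7,
           19.7, 23.3, 23.6, 26.4, 20.0, 25.2, 25.8, 21.2, 14.5, 27.3,
           25.5, 26.4, 22.4, 24.5, 24.8, 30.9, 26.4, 27.3, 29.4, 23.0)
alpha <- 0.05
n     <- length(Tooth)
xbar  <- mean(Tooth)

# e1: z = (xbar - mu0) / (sigma / sqrt(n)), two-sided p-value
z   <- (xbar - 20) / (5.36 / sqrt(n))
p_z <- 2 * pnorm(-abs(z))

# e2: t = (xbar - mu0) / (s / sqrt(n)) with df = n - 1, left-tailed p-value
t_stat <- (xbar - 20) / (sd(Tooth) / sqrt(n))
p_t    <- pt(t_stat, df = n - 1)

# e3: 16 'short' out of 60 against p0 = 0.40; chi-squared statistic with the
# continuity correction min(0.5, |x - n * p0|) that prop.test() applies
x      <- 16
p0     <- 0.40
yates  <- min(0.5, abs(x - n * p0))
chi2   <- (abs(x - n * p0) - yates)^2 / (n * p0 * (1 - p0))
p_prop <- pchisq(chi2, df = 1, lower.tail = FALSE)

results <- data.frame(
  test      = c("e1 z-test", "e2 t-test", "e3 proportion test"),
  statistic = c(z, t_stat, chi2),
  p_value   = c(p_z, p_t, p_prop),
  stringsAsFactors = FALSE
)
# Decision rule: reject H0 when the p-value is below alpha
results$decision <- ifelse(results$p_value < alpha, "reject H0", "fail to reject H0")
print(results)
```

At α = 0.05 only e3 (p ≈ 0.048) leads to rejecting H0; e1 and e2 fail to reject, in line with the p-values printed by the packaged tests.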
e4) Test if the variance of the tooth length of guinea pigs is different from 39.69 mm².

## If you need to use R, type your code here
## H0: sigma^2 = 39.69 (the variance of tooth length is 39.69 mm^2)
## H1: sigma^2 != 39.69 (the variance of tooth length is not 39.69 mm^2)
library(EnvStats)
##
## Attaching package: 'EnvStats'
## The following objects are masked from 'package:stats':
##
##     predict, predict.lm
varTest(B$Tooth, sigma.squared = 39.69, conf.level = 0.95, alternative = "two.sided")
##
## Results of Hypothesis Test
## --------------------------
##
## Null Hypothesis:                 variance = 39.69
##
## Alternative Hypothesis:          True variance is not equal to 39.69
##
## Test Name:                       Chi-Squared Test on Variance
##
## Estimated Parameter(s):          variance = 62.44536
##
## Data:                            B$Tooth
##
## Test Statistic:                  Chi-Squared = 92.8263
##
## Test Statistic Parameter:        df = 59
##
## P-value:                         0.006503881
##
## 95% Confidence Interval:         LCL = 44.86596
##                                  UCL = 92.89217

Question 2 (50 points)

The following data are taken from the 'Duke Cardiac Catheterization Coronary Artery Disease Diagnostic Dataset' at https://hbiostat.org/data/ . You can click the link to see the details. The data consist of 3504 patients and 6 variables. The patients were referred to Duke University Medical Center for chest pain. The following 6 variables are used:

Sex: 0 = male, 1 = female (categorical data)
age: years (numerical data)
cad.dur: Duration of Symptoms of Coronary Artery Disease in days (numerical)
choleste: Cholesterol in mg% (numerical)
sigdz: Significant Coronary Disease by Cardiac Cath (0 = no, 1 = yes, categorical)
tvdlm: Three Vessel or Left Main Disease by Cardiac Cath (0 = no, 1 = yes, categorical)

i) Download the data 'acath.csv' from Georgiaview and save it in the same folder as this homework.

ii) Using read.csv("…"), read the data and name it 'A'. Using dim(A)

## If you need to use R, type your code here
# C plays the role of A in the question
C = read.csv('acath.csv', header = TRUE)
head(C)
##   sex age cad.dur choleste sigdz tvdlm
## 1   0  73     132      268     1     1
## 2   0  68      85      120     1     1
## 3   0  54      45       NA     1     0
## 4   1  58      86      245     0     0
## 5   1  56       7      269     0     0
## 6   0  64       0       NA     1     0

ii) Find the number of rows and columns (size of data A). Also find the names of the columns with NA values using names(which(colSums(is.na(A)) > 0)).

## If you need to use R, type your code here
dim(C)
## [1] 3504    6
colnames(C)[apply(is.na(C), 2, any)]
## [1] "choleste" "tvdlm"

iii) Let B be the data A with every row that contains an NA removed, i.e., B = na.omit(A). Find the size of the data B.

## If you need to use R, type your code here
# D plays the role of B in the question
D = na.omit(C)
head(D)
##    sex age cad.dur choleste sigdz tvdlm
## 1    0  73     132      268     1     1
## 2    0  68      85      120     1     1
## 4    1  58      86      245     0     0
## 5    1  56       7      269     0     0
## 8    0  41      15      247     1     0
## 12   0  35      44      257     0     0
dim(D)
## [1] 2258    6

iv) Using the subset function, let E be the data for males only, i.e., E = subset(B, sex == 0). Similarly, G is the data for females only.

## If you need to use R, type your code here
# sex is stored as a number, so compare with 0/1 rather than the strings '0'/'1'
E = subset(D, sex == 0)
head(E)
##    sex age cad.dur choleste sigdz tvdlm
## 1    0  73     132      268     1     1
## 2    0  68      85      120     1     1
## 8    0  41      15      247     1     0
## 12   0  35      44      257     0     0
## 14   0  58       7      168     1     0
## 15   0  81       2      246     1     1
G = subset(D, sex == 1)
head(G)
##    sex age cad.dur choleste sigdz tvdlm
## 4    1  58      86      245     0     0
## 5    1  56       7      269     0     0
## 21   1  52      30      240     0     0
## 24   1  57      30      261     0     0
## 32   1  59       3      200     1     0
## 34   1  58       1      246     1     1

v) Find the average 'cad.dur' of male and female patients. Similarly, find the average cholesterol for male and female patients.

## If you need to use R, type your code here
mean(E$cad.dur)
## [1] 41.64245
mean(G$cad.dur)
## [1] 42.51669
mean(E$choleste)
## [1] 226.9242
mean(G$choleste)
## [1] 236.7692
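The four means in v) come from separate subset() and mean() calls; aggregate() with a formula can compute them grouped by sex in one step. The sketch below has to be self-contained, so it uses as stand-in data only the 14 distinct rows that happen to be printed in the head() outputs of this homework (the name `D_excerpt` is hypothetical). Its means therefore differ from the full-data values in v); on the real data you would pass D instead.

```r
# Hypothetical stand-in for D: the 14 distinct rows printed by head(D), head(E),
# head(G), head(H) and head(I) (original row numbers 1, 2, 4, 5, 8, 12, 14, 15,
# 16, 18, 21, 24, 32, 34). It is an excerpt, not the full 2258-row data.
D_excerpt <- data.frame(
  sex      = c(0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1),
  age      = c(73, 68, 58, 56, 41, 35, 58, 81, 58, 47, 52, 57, 59, 58),
  cad.dur  = c(132, 85, 86, 7, 15, 44, 7, 2, 79, 6, 30, 30, 3, 1),
  choleste = c(268, 120, 245, 269, 247, 257, 168, 246, 221, 272, 240, 261, 200, 246),
  sigdz    = c(1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1),
  tvdlm    = c(1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1)
)

# One call replaces the four subset()/mean() pairs in v):
# mean cad.dur and mean choleste for each value of sex (0 = male, 1 = female)
agg <- aggregate(cbind(cad.dur, choleste) ~ sex, data = D_excerpt, FUN = mean)
print(agg)

# tapply() gives the same kind of grouped mean for a single column
print(tapply(D_excerpt$cad.dur, D_excerpt$sex, mean))

# On the real data the same call would be:
# aggregate(cbind(cad.dur, choleste) ~ sex, data = D, FUN = mean)
```

The formula interface of aggregate() drops rows with NA by default (na.action = na.omit), which changes nothing here because D has already been passed through na.omit().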
vi) Find the average 'age' of patients who have Significant Coronary Disease by Cardiac Cath. Similarly, find the average 'age' for patients who do not have tvdlm.

## If you need to use R, type your code here
H = subset(D, sigdz == 1)
head(H)
##    sex age cad.dur choleste sigdz tvdlm
## 1    0  73     132      268     1     1
## 2    0  68      85      120     1     1
## 8    0  41      15      247     1     0
## 14   0  58       7      168     1     0
## 15   0  81       2      246     1     1
## 16   0  58      79      221     1     1
mean(H$age)
## [1] 52.29128
I = subset(D, tvdlm == 0)
head(I)
##    sex age cad.dur choleste sigdz tvdlm
## 4    1  58      86      245     0     0
## 5    1  56       7      269     0     0
## 8    0  41      15      247     1     0
## 12   0  35      44      257     0     0
## 14   0  58       7      168     1     0
## 18   0  47       6      272     1     0
mean(I$age)
## [1] 49.35896

vii) For α = 0.05, test if the average cholesterol of female patients is greater than 235.

## If you need to use R, type your code here
## H0: mu = 235 (the mean cholesterol of female patients is 235 mg%)
## H1: mu > 235 (right-tailed test)
# t.test() is in base R (stats), so TeachingDemos is not needed; it has no stdev
# or sig.level arguments, and the default conf.level = 0.95 matches alpha = 0.05
t.test(G$choleste, mu = 235, alternative = "greater")
##
##  One Sample t-test
##
## data: G$choleste
## t = 0.81596, df = 688, p-value = 0.2074
## alternative hypothesis: true mean is greater than 235
## 95 percent confidence interval:
##  233.1979      Inf
## sample estimates:
## mean of x
##  236.7692

viii) For α = 0.01, test if the average 'cad.dur' of male patients is less than 41.

## If you need to use R, type your code here
## H0: mu = 41 (the mean cad.dur of male patients is 41 days)
## H1: mu < 41 (left-tailed test)
# t.test() has no sig.level argument; compare the p-value with alpha = 0.01 directly.
# The bound below is the default 95% one; conf.level = 0.99 would print the 99% bound.
t.test(E$cad.dur, mu = 41, alternative = "less")
##
##  One Sample t-test
##
## data: E$cad.dur
## t = 0.4685, df = 1568, p-value = 0.6803
## alternative hypothesis: true mean is less than 41
## 95 percent confidence interval:
##      -Inf 43.89936
## sample estimates:
## mean of x
##  41.64245
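Because vii) and viii) use different significance levels, a small base-R sketch can make the decision rule explicit. It works only from numbers printed in the two t.test() outputs (sample means, t statistics, df); the data frame name `tests` and its columns are introduced here for illustration. It recomputes each one-sided p-value with pt(), compares it with that part's α, and back-solves the standard error from t = (x̄ - μ0)/SE to get the one-sided bound at level 1 - α. For viii) that is the 99% bound that conf.level = 0.99 would have printed, up to rounding in the printed t.

```r
# Cross-check of parts vii) and viii) using only values printed by t.test()
tests <- data.frame(
  part  = c("vii", "viii"),
  xbar  = c(236.7692, 41.64245),  # sample means from the outputs
  mu0   = c(235, 41),             # hypothesized means
  t     = c(0.81596, 0.4685),     # printed t statistics
  df    = c(688, 1568),           # printed degrees of freedom
  alpha = c(0.05, 0.01),          # significance level of each part
  tail  = c("greater", "less"),   # direction of H1
  stringsAsFactors = FALSE
)

# One-sided p-value: upper tail for "greater", lower tail for "less"
tests$p <- ifelse(tests$tail == "greater",
                  pt(tests$t, tests$df, lower.tail = FALSE),
                  pt(tests$t, tests$df))

# Decision rule: reject H0 when the p-value is below that part's alpha
tests$decision <- ifelse(tests$p < tests$alpha, "reject H0", "fail to reject H0")

# Standard error implied by the printed t statistic, t = (xbar - mu0) / SE
tests$se <- (tests$xbar - tests$mu0) / tests$t

# One-sided confidence bound at level 1 - alpha (lower bound for "greater",
# upper bound for "less"), i.e. what conf.level = 1 - alpha would print
q <- qt(1 - tests$alpha, tests$df)
tests$bound <- ifelse(tests$tail == "greater",
                      tests$xbar - q * tests$se,
                      tests$xbar + q * tests$se)

print(tests[, c("part", "alpha", "p", "decision", "bound")])
```

Both parts fail to reject H0 at their own α; the 99% bound for viii) sits above the printed 95% bound of 43.89936, as a stricter level should.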