Biostatistics Homework 1: Analyzing Tooth Length Data in R

Biostatistics: Homework 1
Hannah Bryson
Fri Jan 26 10:38:01 2024 EST
Due Date: January 23, 2024

Instructions: On line 3, please write your name. Also rename the file in the format lastname_Biostat_HW1.Rmd. Complete the homework and upload it in GeorgiaView > Assessment > Assignment > HW1.

Question 1: The following data are the lengths of a tooth in different guinea pigs:
4.2, 11.5, 7.3, 5.8, 6.4, 10.0, 11.2, 11.2, 15.2, 7.0, 16.5, 16.5, 15.2, 17.3, 22.5, 17.3, 13.6, 14.5, 18.8, 15.5, 23.6, 18.5, 33.9, 35.5, 26.4, 32.5, 26.7, 21.5, 23.3, 29.5, 15.2, 21.5, 17.6, 9.7, 14.5, 10.0, 8.2, 9.4, 16.5, 9.7, 29.7, 23.3, 23.6, 26.4, 20.0, 25.2, 25.8, 21.2, 14.5, 27.3, 25.5, 26.4, 22.4, 24.5, 24.8, 30.9, 26.4, 27.3, 29.4, 23.0

Part (a) In R, write the above data in vector form and name it Tooth_Length.

rm(list = ls())
Tooth_Length = c(4.2, 11.5, 7.3, 5.8, 6.4, 10.0, 11.2, 11.2, 15.2, 7.0, 16.5, 16.5, 15.2, 17.3, 22.5, 17.3, 13.6, 14.5, 18.8, 15.5, 23.6, 18.5, 33.9, 35.5, 26.4, 32.5, 26.7, 21.5, 23.3, 29.5, 15.2, 21.5, 17.6, 9.7, 14.5, 10.0, 8.2, 9.4, 16.5, 9.7, 29.7, 23.3, 23.6, 26.4, 20.0, 25.2, 25.8, 21.2, 14.5, 27.3, 25.5, 26.4, 22.4, 24.5, 24.8, 30.9, 26.4, 27.3, 29.4, 23.0)  # copy and paste the above data in ()

Part (b) Use R to sort the data Tooth_Length. What are the maximum and minimum tooth lengths? i.e., sort(Tooth_Length), max(Tooth_Length), min(Tooth_Length)

sort(Tooth_Length)
##  [1]  4.2  5.8  6.4  7.0  7.3  8.2  9.4  9.7  9.7 10.0 10.0 11.2 11.2 11.5 13.6
## [16] 14.5 14.5 14.5 15.2 15.2 15.2 15.5 16.5 16.5 16.5 17.3 17.3 17.6 18.5 18.8
## [31] 20.0 21.2 21.5 21.5 22.4 22.5 23.0 23.3 23.3 23.6 23.6 24.5 24.8 25.2 25.5
## [46] 25.8 26.4 26.4 26.4 26.4 26.7 27.3 27.3 29.4 29.5 29.7 30.9 32.5 33.9 35.5
max(Tooth_Length)
## [1] 35.5
min(Tooth_Length)
## [1] 4.2

Part (c) Write the data Tooth_Length in column format using data.frame and name it Tooth, i.e., Tooth = data.frame(Tooth_Length)

Tooth = data.frame(Tooth_Length)
head(Tooth)
##   Tooth_Length
## 1          4.2
## 2         11.5
## 3          7.3
## 4          5.8
## 5          6.4
## 6         10.0

Part (d) In the data Tooth, attach a new column named Tlength as follows: if the length is less than 14.1, Tlength is short; if the length is more than 14.1 and less than 24.3, Tlength is medium; and if the length is greater than 24.3, Tlength is long.

# Tooth$Tlength=cut(Tooth$Tooth_Length,breaks=c(0,14.1,24.3,40),labels=c("short","medium","long"))
Tooth$Tlength = cut(Tooth$Tooth_Length, breaks = c(0, 14.1, 24.3, 40), labels = c('short', 'medium', 'long'), include.lowest = TRUE)
head(Tooth)
##   Tooth_Length Tlength
## 1          4.2   short
## 2         11.5   short
## 3          7.3   short
## 4          5.8   short
## 5          6.4   short
## 6         10.0   short
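One way to double-check the categories created in Part (d) is to look at the smallest and largest tooth length inside each category: every short value should be at most 14.1 and every long value should be above 24.3. The sketch below does this with base R's split() and sapply(), rebuilding the data from Part (a) so it runs on its own; the name cat_ranges is only an illustrative choice.

```r
# Tooth lengths from Question 1, Part (a)
Tooth_Length <- c(4.2, 11.5, 7.3, 5.8, 6.4, 10.0, 11.2, 11.2, 15.2, 7.0,
                  16.5, 16.5, 15.2, 17.3, 22.5, 17.3, 13.6, 14.5, 18.8, 15.5,
                  23.6, 18.5, 33.9, 35.5, 26.4, 32.5, 26.7, 21.5, 23.3, 29.5,
                  15.2, 21.5, 17.6, 9.7, 14.5, 10.0, 8.2, 9.4, 16.5, 9.7,
                  29.7, 23.3, 23.6, 26.4, 20.0, 25.2, 25.8, 21.2, 14.5, 27.3,
                  25.5, 26.4, 22.4, 24.5, 24.8, 30.9, 26.4, 27.3, 29.4, 23.0)

# Same categorization as in Part (d)
Tooth <- data.frame(Tooth_Length)
Tooth$Tlength <- cut(Tooth$Tooth_Length, breaks = c(0, 14.1, 24.3, 40),
                     labels = c("short", "medium", "long"),
                     include.lowest = TRUE)

# Smallest and largest length in each category (one column per category)
cat_ranges <- sapply(split(Tooth$Tooth_Length, Tooth$Tlength),
                     function(v) c(min = min(v), max = max(v)))
cat_ranges
```

On these data the short column should top out at 13.6 and the long column should start at 24.5, so no observation sits on either cut point.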

Part (e) Use R to construct a table for the column Tooth$Tlength. How many of them have short, medium and long tooth lengths? i.e., Tab1 = table(Tooth$Tlength)

Tab1 = table(Tooth$Tlength)
head(Tab1)
##
##  short medium   long
##     15     26     19

Part (f) Draw a barplot for the table of the column named Tlength. Can you change the colors of the bars? e.g., barplot(Tab1, col = c("black", "blue", "green"), main = "Bar plot of Tooth length", xlab = "Tooth size", ylab = "number of animals")

barplot(Tab1, col = 9:11, main = 'Bar Plot of Tooth Length', xlab = 'Tooth Size', ylab = '# of animals')  # one palette color per bar

Part (g) Find the mean, median, quartiles, 80th percentile and variance of the data.

mean(Tooth_Length)
## [1] 19.31333
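Part (g) relies on R's built-in functions. As a check, here is a short sketch that reproduces the mean and the sample variance from their definitions, mean = sum(x) / n and s^2 = sum((x - mean)^2) / (n - 1). Note that var() uses the n - 1 denominator, not n. The names mean_by_hand and var_by_hand are illustrative.

```r
# Tooth lengths from Question 1, Part (a)
Tooth_Length <- c(4.2, 11.5, 7.3, 5.8, 6.4, 10.0, 11.2, 11.2, 15.2, 7.0,
                  16.5, 16.5, 15.2, 17.3, 22.5, 17.3, 13.6, 14.5, 18.8, 15.5,
                  23.6, 18.5, 33.9, 35.5, 26.4, 32.5, 26.7, 21.5, 23.3, 29.5,
                  15.2, 21.5, 17.6, 9.7, 14.5, 10.0, 8.2, 9.4, 16.5, 9.7,
                  29.7, 23.3, 23.6, 26.4, 20.0, 25.2, 25.8, 21.2, 14.5, 27.3,
                  25.5, 26.4, 22.4, 24.5, 24.8, 30.9, 26.4, 27.3, 29.4, 23.0)

n <- length(Tooth_Length)

# Arithmetic mean: sum of the observations divided by n
mean_by_hand <- sum(Tooth_Length) / n

# Sample variance: squared deviations from the mean, divided by n - 1
var_by_hand <- sum((Tooth_Length - mean_by_hand)^2) / (n - 1)

# Both should agree with the built-ins up to floating-point rounding
all.equal(mean_by_hand, mean(Tooth_Length))
all.equal(var_by_hand, var(Tooth_Length))
```

Dividing by n instead of n - 1 would give the population variance, which is slightly smaller (by a factor of 59/60 for these 60 observations).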

median(Tooth_Length)
## [1] 19.4
quantile(Tooth_Length)
##     0%    25%    50%    75%   100%
##  4.200 14.275 19.400 25.575 35.500
quantile(Tooth_Length, 0.80)
##  80%
## 26.4
var(Tooth_Length)
## [1] 61.29507

Question 2: Theophylline is a drug used to prevent and treat wheezing, shortness of breath, and chest tightness caused by asthma, chronic bronchitis, emphysema, and other lung diseases. It relaxes and opens air passages in the lungs, making it easier to breathe. The R built-in data set Theoph consists of 132 rows and 5 columns. Twelve individuals were administered the drug, and its concentration in the blood was measured at different time points within 25 hours.

Part (a) In R, write A = Theoph.

Part (b) Use R to sort the data in the fifth column and write the maximum and minimum concentration of the drug.

A = Theoph
head(A)
##   Subject   Wt Dose Time  conc
## 1       1 79.6 4.02 0.00  0.74
## 2       1 79.6 4.02 0.25  2.84
## 3       1 79.6 4.02 0.57  6.57
## 4       1 79.6 4.02 1.12 10.50
## 5       1 79.6 4.02 2.02  9.66
## 6       1 79.6 4.02 3.82  8.58
sort(A[, 5])
##   [1]  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.15  0.24  0.74
##  [13]  0.85  0.86  0.90  0.92  1.05  1.12  1.15  1.15  1.17  1.25  1.25  1.29
##  [25]  1.57  1.72  1.89  2.02  2.35  2.42  2.69  2.78  2.84  2.89  3.00  3.01
##  [37]  3.05  3.05  3.08  3.16  3.28  3.46  3.53  3.62  3.70  3.96  4.02  4.11
##  [49]  4.19  4.24  4.37  4.39  4.40  4.45  4.55  4.57  4.57  4.60  4.73  4.86
##  [61]  4.90  4.94  5.02  5.22  5.22  5.25  5.30  5.33  5.40  5.53  5.63  5.66
##  [73]  5.67  5.68  5.78  5.87  5.88  5.90  5.94  6.08  6.11  6.20  6.32  6.33
##  [85]  6.41  6.44  6.57  6.58  6.59  6.59  6.66  6.81  6.85  6.88  6.89  6.90
##  [97]  7.09  7.09  7.14  7.14  7.24  7.31  7.37  7.47  7.50  7.54  7.56  7.56
## [109]  7.80  7.82  7.83  7.91  8.00  8.02  8.20  8.31  8.33  8.36  8.38  8.57
## [121]  8.58  8.60  8.74  9.03  9.18  9.33  9.66  9.72  9.75 10.21 10.50 11.40
min(A[, 5])
## [1] 0
max(A[, 5])
## [1] 11.4

Part (c) In data set A, attach a new column named Th.level as follows: if the concentration is less than 3, Th.level is low; if the concentration is more than 3 and less than 7, Th.level is medium; and if the concentration is greater than 7, Th.level is high.

A$Th.level = cut(A$conc, breaks = c(0, 3, 7, 20), labels = c('low', 'medium', 'high'), include.lowest = TRUE)
head(A)
##   Subject   Wt Dose Time  conc Th.level
## 1       1 79.6 4.02 0.00  0.74      low
## 2       1 79.6 4.02 0.25  2.84      low
## 3       1 79.6 4.02 0.57  6.57   medium
## 4       1 79.6 4.02 1.12 10.50     high
## 5       1 79.6 4.02 2.02  9.66     high
## 6       1 79.6 4.02 3.82  8.58     high

Part (d) Use R to construct a table for the column A$Th.level. How many measurements fall in the low, medium and high Theophylline levels?

Tab2 = table(A$Th.level)
head(Tab2)
##
##    low medium   high
##     35     61     36

Part (e) Draw a barplot for the table of the column named Th.level. Can you write the title and x and y labels of the plot?

barplot(Tab2, col = 2:4, main = 'Theophylline level', xlab = 'Theophylline level', ylab = '# of measurements')

Question 3: The 'diabetes.csv' data consist of 19 variables on 403 of the 1046 subjects who were interviewed in a study to understand the prevalence of obesity, diabetes, and other cardiovascular risk factors among African Americans in central Virginia. According to Dr John Hong, Diabetes Mellitus Type II (adult-onset diabetes) is associated most strongly with obesity. The waist/hip ratio may be a predictor of diabetes and heart disease. DM II is also associated with hypertension; they may both be part of "Syndrome X". The 403 subjects were the ones who were actually screened for diabetes. Glycosylated hemoglobin > 7.0 is usually taken as a positive diagnosis of diabetes.
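The study description gives a working diagnostic rule: glycosylated hemoglobin (the glyhb column) above 7.0 is usually taken as a positive diagnosis. Below is a minimal sketch of applying that rule, using the glyhb values of the first five subjects as printed by head(A, n = 5) in part (a). The names positive and status are illustrative.

```r
# glyhb of the first five subjects, copied from head(A, n = 5) in Question 3, part (a)
glyhb <- c(4.31, 4.44, 4.64, 4.63, 7.72)

# Rule of thumb from the study description: glyhb > 7.0 is a positive diagnosis
positive <- glyhb > 7.0

# Readable labels; ifelse() keeps NA wherever glyhb is NA
status <- ifelse(positive, "positive", "negative")
status
sum(positive)   # number of positive screens among these five subjects
```

On the full data set, the same comparison could be applied to A$glyhb, for example with table(ifelse(A$glyhb > 7, "positive", "negative"), useNA = "ifany"), so that subjects with a missing glyhb are counted separately rather than silently dropped.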

Part (a) Download the diabetes.csv file from GeorgiaView. Then read the file in R and assign it the name A, i.e., A = read.csv('……..', header = TRUE)

A = read.csv('diabetes.csv', header = TRUE)
head(A, n = 5)
##     id chol stab.glu hdl ratio glyhb   location age gender height weight  frame
## 1 1000  203       82  56   3.6  4.31 Buckingham  46 female     62    121 medium
## 2 1001  165       97  24   6.9  4.44 Buckingham  29 female     64    218  large
## 3 1002  228       92  37   6.2  4.64 Buckingham  58 female     61    256  large
## 4 1003   78       93  12   6.5  4.63 Buckingham  67   male     67    119  large
## 5 1005  249       90  28   8.9  7.72 Buckingham  64   male     68    183 medium
##   bp.1s bp.1d bp.2s bp.2d waist hip time.ppn
## 1   118    59    NA    NA    29  38      720
## 2   112    68    NA    NA    46  48      360
## 3   190    92   185    92    49  57      180
## 4   110    50    NA    NA    33  38      480
## 5   138    80    NA    NA    44  41      300

Part (b) Replace any blank data with NA. Then list the columns that have NA values. You will notice that the columns chol and waist are two of the many columns with NA values. Find which entries in those two columns have NA values. You can use which(is.na(A$....))

# First, replace any blank by NA
is.na(A) <- A == ""
# List columns with at least one NA
colnames(A)[apply(is.na(A), 2, any)]
##  [1] "chol"     "hdl"      "ratio"    "glyhb"    "height"   "weight"
##  [7] "frame"    "bp.1s"    "bp.1d"    "bp.2s"    "bp.2d"    "waist"
## [13] "hip"      "time.ppn"
# Find which rows of chol and waist have NA values
which(is.na(A$chol))    # row 28
## [1] 28
which(is.na(A$waist))   # rows 337 and 394
## [1] 337 394

Part (c) Use data.frame to construct a data set C with the chol and waist columns only.

C = data.frame(A$chol, A$waist)
colnames(C) = c("chol", "waist")
head(C)
##   chol waist
## 1  203    29
## 2  165    46
## 3  228    49
## 4   78    33
## 5  249    44
## 6  248    36

Part (d) Add a column named chol_level to the data set C such that if chol < 150 it is marked great, if 150 ≤ chol < 210 it is marked OK, and if chol ≥ 210 it is marked At Risk.

C$chol_level = cut(C$chol, breaks = c(0, 150, 210, 500), labels = c('great', 'okay', 'at-risk'), include.lowest = TRUE)
head(C)
##   chol waist chol_level
## 1  203    29       okay
## 2  165    46       okay
## 3  228    49    at-risk
## 4   78    33      great
## 5  249    44    at-risk
## 6  248    36    at-risk

Part (e) Use the table function to find how many people have great, OK and At Risk chol levels.

Tab3 = table(C$chol_level)
head(Tab3)
##
##   great    okay at-risk
##      30     197     175

Part (f) Draw a barplot for the table of chol_level.

barplot(Tab3, col = 2:4, main = 'Chol Levels', xlab = 'Chol levels', ylab = '# of people')
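Two details of cut() affect parts (d) and (e). First, cut() closes intervals on the right by default, so the code in part (d) labels a reading of exactly 150 as great and exactly 210 as okay. The stated rule, however, puts 150 in OK and 210 in At Risk. Passing right = FALSE matches the rule; the counts in part (e) would only change if some chol value equals 150 or 210 exactly. Second, the missing chol in row 28 becomes NA, and table() leaves NA out by default, which is why 30 + 197 + 175 = 402 rather than 403. The sketch below uses made-up readings (hypothetical, not from diabetes.csv) to show both effects.

```r
# Hypothetical readings: both cut points, values just below them, and one missing value
chol <- c(149, 150, 209, 210, NA)
breaks <- c(0, 150, 210, 500)
labs <- c("great", "okay", "at-risk")

# As written in part (d): right-closed intervals [0,150], (150,210], (210,500]
as_written <- cut(chol, breaks = breaks, labels = labs, include.lowest = TRUE)

# right = FALSE gives [0,150), [150,210), [210,500], matching "150 <= chol < 210"
left_closed <- cut(chol, breaks = breaks, labels = labs,
                   include.lowest = TRUE, right = FALSE)

as.character(as_written)    # "great" "great" "okay" "okay" NA
as.character(left_closed)   # "great" "okay" "okay" "at-risk" NA

# table() drops NA by default; useNA = "ifany" counts it as its own category
counts_default <- table(left_closed)
counts_with_na <- table(left_closed, useNA = "ifany")
counts_with_na
```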

Part (g) Draw a pie chart for the table of chol_level. Also, draw the corresponding 3D plot.

pie(Tab3, labels = Tab3, col = 2:4, main = "Chol Levels")

library(plotrix)  # for the 3D pie chart; install.packages("plotrix") if it is not installed
pie3D(Tab3, labels = Tab3, col = 2:4, main = "Chol Levels", radius = 2, explode = 0.2, labelcex = 1.5)
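The pie charts in part (g) label each slice with its raw count (labels = Tab3). As an optional variation, here is a minimal sketch of percentage labels. It rebuilds Tab3 from the counts reported in part (e) rather than from the data file, so it runs without diabetes.csv. The name pct_labels is illustrative.

```r
# Category counts as reported by table(C$chol_level) in part (e)
Tab3 <- as.table(c(great = 30, okay = 197, "at-risk" = 175))

# Share of each category among the non-missing chol values, in percent
pct <- 100 * as.numeric(prop.table(Tab3))

# Labels such as "great: 7.5%", usable with pie() or plotrix::pie3D()
pct_labels <- sprintf("%s: %.1f%%", names(Tab3), pct)
pct_labels
```

The labels can then be passed as pie(Tab3, labels = pct_labels, col = 2:4, main = "Chol Levels"), and pie3D() accepts the same labels argument. The percentages are relative to the 402 subjects with a recorded chol, not all 403.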

Part (h) Find the mean, quartiles and variance of the column 'glyhb' in the diabetes data set.

mean(A$glyhb, na.rm = TRUE)
## [1] 5.589769
quantile(A$glyhb, na.rm = TRUE)
##    0%   25%   50%   75%  100%
##  2.68  4.38  4.84  5.60 16.11
var(A$glyhb, na.rm = TRUE)
## [1] 5.029232
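Part (b) showed that glyhb is one of the columns containing NA, which is why part (h) passes na.rm = TRUE. Below is a minimal sketch of what happens without it. It uses the glyhb values of the first five subjects from part (a) plus one NA, which is hypothetical and stands in for the missing readings.

```r
# glyhb of the first five subjects from part (a), plus one hypothetical missing reading
glyhb <- c(4.31, 4.44, 4.64, 4.63, 7.72, NA)

# Without na.rm, a single NA propagates to the result
mean_default <- mean(glyhb)   # NA
var_default  <- var(glyhb)    # NA

# quantile() stops with an error unless missing values are removed
q_error <- tryCatch(quantile(glyhb), error = function(e) conditionMessage(e))

# With na.rm = TRUE, only the five observed values are used
mean_obs <- mean(glyhb, na.rm = TRUE)   # 5.148
var_obs  <- var(glyhb, na.rm = TRUE)
q_obs    <- quantile(glyhb, na.rm = TRUE)
q_obs
```

Note that na.rm = TRUE silently shrinks the sample size, so it is worth reporting how many values were dropped, for example with sum(is.na(A$glyhb)).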

Bryson_Biostat_HW1
