Homework 02_ANOVA-1

School: University of California, Merced
Course: 180
Subject: Statistics
Date: Feb 20, 2024

Homework #2: Analysis of Variance

Goal: perform an Analysis of Variance on a variety of datasets and interpret the ANOVA statistics.

Instructions: Follow the instructions provided, filling in your answers as you go. Make sure the text of your answers is noticeably different from black Times New Roman 12 pt font (that's what this is), so that your answers stand out. Be sure to include copies of any graphs you are asked to make. Steps that have an R output that you'll need to copy/paste* are denoted with the symbol.

As you work, you are allowed to discuss the homework with your peers, but note that you must complete all of the steps independently and answer all of the questions in your own words. Submitting work that was completed by another person is a violation of the academic honesty policy of UC Merced.

* Note: after you copy/paste, change the font to "Courier New"; this is a fixed-width font and will keep the formatting the same as used in R.

Part A – Performing an ANOVA by hand – Ecology

A plant ecologist collected data regarding the height of plant species X from four different fields. Each field contained a unique type of soil, which the plants grew in. This ecologist measured the height (in inches) of three plants in each of the fields. All four fields were contained in an area of two square miles. The ecologist's results are tabulated below:

Fields
  A    B    C    D
 11   16   19   13
 13   22   13   14
  9   23   21   15

1. What is the null hypothesis that this ecologist can address?

The null hypothesis is that soil type has no effect on plant height: the mean height of species X is the same in all four fields.

2. To determine whether there is a significant difference in the height of plants grown in different soils, you will perform an ANOVA analysis. Fill in the following table to help you get started. Note that MY, MF, and FY are just the differences between the values; for example, MY is the difference between the plant height (Y) and the grand mean (M). You can see Chapter 1 in the Grafen & Hails textbook for an example of filling in a table like this. An Excel spreadsheet is a convenient format to do these calculations in (a template is provided on CatCourses), but you can also just use a calculator or R to complete this table.

Plant Height (Y)  Field  Grand Mean (M)  Treatment Mean (F)      MY      MF      FY
11                A      15.75           11.00                -4.75   -4.75    0.00
13                A      15.75           11.00                -2.75   -4.75    2.00
 9                A      15.75           11.00                -6.75   -4.75   -2.00
16                B      15.75           20.33                 0.25    4.58   -4.33
22                B      15.75           20.33                 6.25    4.58    1.67
23                B      15.75           20.33                 7.25    4.58    2.67
19                C      15.75           17.67                 3.25    1.92    1.33
13                C      15.75           17.67                -2.75    1.92   -4.67
21                C      15.75           17.67                 5.25    1.92    3.33
13                D      15.75           14.00                -2.75   -1.75   -1.00
14                D      15.75           14.00                -1.75   -1.75    0.00
15                D      15.75           14.00                -0.75   -1.75    1.00
Sum of Squares (SS)                                           224.25  150.92   73.33
Degrees of Freedom (DF)                                           11       3       8

3. Calculate the F-ratio. (Be sure to show your calculation setup)

Mean square = SS / DF
Treatment mean square = 150.92 / 3 = 50.31
Error mean square = 73.33 / 8 = 9.17
F-ratio = 50.31 / 9.17 = 5.49

4. By using the calculated F-ratio and provided F-table (95% confidence interval), interpret your analysis.

The calculated F-ratio of 5.49 is larger than the critical value for 3 and 8 degrees of freedom at the 95% confidence level (about 4.07), so we can reject the null hypothesis: at least one field's mean plant height differs from the others.

Part B – Performing an ANOVA in R – Effect of ventilation on blood folate levels

Several studies have been published to show that nitrous oxide used in anesthetics can reduce patient blood folate levels. This dataset gives measured levels of folate in red blood cells in patients who received three different methods of ventilation while under anesthesia. The first column in the dataframe, ventilation, is a categorical variable (or "factor") containing one of the three ventilation methods for each patient:

Factor levels    Treatment
'N2O+O2, 24h'    50% nitrous oxide and 50% oxygen, continuously for 24 hours
'N2O+O2, op'     50% nitrous oxide and 50% oxygen, only during operation
'O2, 24h'        No nitrous oxide, but 35-50% oxygen for 24 hours

The second column in the dataframe is a numerical variable, folate, which is the folate concentration (μg/l) for each patient. [This dataset is from the "ISwR" library of datasets available in the ISwR package at www.r-project.org.]

Procedure

1. First, change the directory in R (under the File menu) to wherever you saved the "red_cell" file.

2. Then read in this data set using the load() command in R. Type the following command:

    load("red_cell.RData")

Alternatively, you can have R open a file browser window as follows:

    load(file.choose())

3. Once you've read in the data set, you can access the two data vectors in this data frame using the "$" command (i.e. red_cell$ventilation and red_cell$folate) or you can make the individual data vectors available using the command:

    attach(red_cell)

After this command you can access the data vectors just as ventilation and folate.

4. Make boxplots and stripcharts of this dataset using the following commands:

    boxplot(folate~ventilation)
    stripchart(folate~ventilation)

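The column access and plotting in steps 3 and 4 can also be sketched without attach(), by handing the data frame to the plotting functions through their data argument. The block below is an illustration only. It rebuilds a data frame called red_cell inline instead of loading red_cell.RData, and the folate values are assumed to match the red.cell.folate data published in the ISwR package, so they should be checked against the course file before being relied on.

```r
# Stand-in for load("red_cell.RData"). ASSUMPTION: these values are taken
# to match ISwR's red.cell.folate; verify them against the course file.
red_cell <- data.frame(
  folate = c(243, 251, 275, 291, 347, 354, 380, 392,
             206, 210, 226, 249, 255, 273, 285, 295, 309,
             241, 258, 270, 293, 328),
  ventilation = factor(rep(c("N2O+O2, 24h", "N2O+O2, op", "O2, 24h"),
                           times = c(8, 9, 5)))
)

# "$" pulls a single column out of the data frame
head(red_cell$folate)
levels(red_cell$ventilation)

# When run as a script, draw to a null device so no plot file is written
if (!interactive()) pdf(NULL)

# Same plots as step 4, using data= instead of attach()
bp <- boxplot(folate ~ ventilation, data = red_cell,
              ylab = "Red cell folate (ug/l)")
stripchart(folate ~ ventilation, data = red_cell,
           vertical = TRUE, method = "jitter")

# Number of patients in each ventilation group, as counted by boxplot()
group_sizes <- bp$n
```

Passing data = red_cell keeps the column names local to each call, which avoids the name clashes that attach() can cause when several data frames share column names.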
5. Now you will perform an analysis of variance on this data set using the lm() command in R (see box at the end of this section for more information).

    folate.anova<-lm(folate~ventilation)

Note that "folate.anova" could be any name you want to give the result of this command, and that this command assumes that you used the attach() command in step 3.

6. Print out a summary of the analysis of variance results using the anova() command:

    anova(folate.anova)

Note: Whenever you copy and paste a text output from R into Word, you'll need to change the font of the output to a fixed-width font such as "Courier New" so that the table will be properly formatted in Word just as it was in R.

Analysis of Variance Table

Response: folate
            Df Sum Sq Mean Sq F value  Pr(>F)
ventilation  2  15516  7757.9  3.7113 0.04359 *
Residuals   19  39716  2090.3
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Analysis questions

Use the information in the Analysis of Variance (ANOVA) table from step 6 above to answer the following questions:

a. How many model degrees of freedom?

There are 2 model degrees of freedom (3 ventilation methods minus 1).

b. How many error (or residual) degrees of freedom?

There are 19 error (residual) degrees of freedom.

c. What is the error sum of squares (as reported in the anova table)? Calculate the mean square error (show your calculation setup) and compare with the value in the anova table.

The error sum of squares is 39716. The mean square error is 39716 / 19 = 2090.3, which matches the Mean Sq value for Residuals in the table.

d. Calculate the F-ratio from the two appropriate mean square values (show your calculation setup) and compare with the value in the table.

F-ratio = 7757.9 / 2090.3 = 3.7114. This agrees with the F value in the table (3.7113) to within rounding.

e. What is the Null Hypothesis for this model?

The null hypothesis is that ventilation method has no effect on blood folate: the mean folate level is the same for all three ventilation methods.

f. Look at the p-value in the table. Can you reject the null hypothesis at the 90% confidence level? At the 95% confidence level? At the 99% confidence level?

The p-value is 0.04359. It is below 0.10 and below 0.05 but above 0.01, so the null hypothesis can be rejected at the 90% and 95% confidence levels but not at the 99% confidence level.

R Backgrounder: You'll note that the lm() command gave you no specific output (unless it complained with an error message). Instead it stored the result of the lm() command in a variable (or "object"). You use various other commands in R to print out the results stored in this object. The most common commands for interrogating such an object are:

    anova(object)        Print out the analysis of variance table
    summary(object)      Print out a summary of the fit, including the fit parameters
    fitted(object)       Print out the predicted values for each explanatory variable
    residuals(object)    Print out the errors in each predicted response variable

Note that the final two are often useful when plotted, e.g.:

    plot(resid(folate.anova)~ventilation)

which shows the range of errors in predicted folate for each of the ventilation methods; we'll see this a lot in a later part of BIO 180.

Part C – ANOVA in R & converting to factors – Insulin-like growth factor vs. maturity

Tanner stages (1-5) are levels of physical maturity developed by the British pediatrician James Mourilyan Tanner that are based on various indicators (height, voice changes, body hair, etc.) and vary by sex. Stage 1 is least mature; stage 5 most mature. This dataset gives measured levels of Insulin-like Growth Factor (IGF) versus sex (1=male, 2=female) and Tanner stage. [This dataset is a subset of the "juul" dataset in the "faraway" library of datasets available in the ISwR package at www.r-project.org.]

Procedure

1. First set the working directory from the File menu.

2. Read in the data set igf_data.RData using the load() command and attach it (like you did in the previous problem). The igf_data dataframe contains three vectors: igf, tanner, and sex.

3. Use either a boxplot or stripchart (your choice) to look at the relationship between IGF and tanner stage.

Which tanner stage appears to have the lowest levels of IGF?

Stage 1 (male)

4. Next make a boxplot to look at the relationship between IGF and sex.

Which sex appears to have lower IGF levels?

Sex 1 (male).

5. In this data set numbers are used to encode the sex and tanner levels, even though these are discrete categorical levels, described in R as "factors" (i.e. a sex value of 1.2 means nothing in this data set). It is okay to use numbers to represent different categories, but it is essential that the software know that the numbers represent categories and not continuous numeric values. In R you can test this by using the command is.factor(), as follows:

    is.factor(tanner)

If the answer is "TRUE", then you're okay, but if the answer is "FALSE", the vector must be labeled as a "factor" using the command factor(). For example, if x is a numeric vector, it can be made into a factor as follows:

    x <- factor(x)

Check to ensure that both tanner levels and sex are being treated as factors by R.

6. Run an Analysis of Variance of IGF vs. Tanner and print out the results using the anova() command:

    igf.tanner.anova<-lm(igf~tanner)
    anova(igf.tanner.anova)

Analysis of Variance Table

Response: igf
           Df  Sum Sq Mean Sq F value    Pr(>F)
tanner      4 6699782 1674946  103.48 < 2.2e-16 ***
Residuals 390 6312438   16186
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Analysis questions

Look at your Analysis of Variance table for IGF vs. Tanner Level and answer the following questions.

a. What is the null hypothesis being tested?

The null hypothesis is that IGF level is not related to Tanner stage: the mean IGF level is the same at all five Tanner stages.

b. At the 99% confidence level, can you reject the null hypothesis?

Yes. The p-value (< 2.2e-16) is far below 0.01, so we can reject the null hypothesis at the 99% confidence level.
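
The numeric-versus-factor distinction from step 5 shows up directly in the degrees of freedom of the ANOVA table. The sketch below uses a made-up data frame (demo, with columns stage and y); none of its values come from igf_data, and the names are chosen here only for illustration. When stage is left as a number, lm() fits a single slope and the model has 1 degree of freedom. After factor(), lm() fits one mean per level and the model has one degree of freedom fewer than the number of levels.

```r
# Made-up illustration data, NOT the igf_data values:
# five stages coded 1-5, four observations per stage.
demo <- data.frame(
  stage = rep(1:5, each = 4),
  y = c(10, 12, 11, 13,   20, 19, 22, 21,   35, 33, 36, 34,
        30, 31, 29, 32,   25, 27, 26, 24)
)

is.factor(demo$stage)   # FALSE: stage is stored as plain integers

# Numeric predictor: lm() fits one straight-line slope
fit_numeric <- lm(y ~ stage, data = demo)
df_numeric <- anova(fit_numeric)["stage", "Df"]

# Convert to a factor so that each stage gets its own mean
demo$stage <- factor(demo$stage)
is.factor(demo$stage)   # TRUE

fit_factor <- lm(y ~ stage, data = demo)
df_factor <- anova(fit_factor)["stage", "Df"]

# Model degrees of freedom under each coding
c(numeric = df_numeric, factor = df_factor)
```

A model Df of 1 for a variable with several coded levels is therefore a quick warning sign that R treated the codes as a continuous number.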
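
The hand calculation in Part A can be cross-checked in R with the same lm() and anova() commands used in Parts B and C. This is a sketch rather than part of the assignment: the data frame name plants and its columns height and field are names chosen here, while the twelve heights are the ones tabulated in Part A. The block rebuilds the MY, MF, and FY columns, sums their squares, and compares the resulting F-ratio with the one reported by anova().

```r
# Heights from the Part A table; "plants", "height" and "field" are
# names chosen for this sketch.
plants <- data.frame(
  height = c(11, 13, 9,   16, 22, 23,   19, 13, 21,   13, 14, 15),
  field  = factor(rep(c("A", "B", "C", "D"), each = 3))
)

grand_mean <- mean(plants$height)                 # M
field_mean <- ave(plants$height, plants$field)    # F, repeated per row

MY <- plants$height - grand_mean   # plant height minus grand mean
MF <- field_mean - grand_mean      # treatment mean minus grand mean
FY <- plants$height - field_mean   # plant height minus treatment mean

ss  <- c(total = sum(MY^2), treatment = sum(MF^2), error = sum(FY^2))
dof <- c(total = 11, treatment = 3, error = 8)

# F-ratio = treatment mean square / error mean square
f_by_hand  <- (ss[["treatment"]] / dof[["treatment"]]) /
              (ss[["error"]] / dof[["error"]])
f_critical <- qf(0.95, df1 = 3, df2 = 8)   # critical value at the 95% level

# The same analysis done by R
tab <- anova(lm(height ~ field, data = plants))

print(round(ss, 2))
print(round(c(F = f_by_hand, critical = f_critical), 2))
print(tab)
```

Because the total sum of squares must equal the treatment and error sums of squares added together, comparing ss[["total"]] with the other two entries is a quick way to catch an arithmetic slip in a hand-filled table.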