Lab 9 Data Analysis and R, Dominique Pitman

School: Oregon State University, Corvallis
Course: 106
Subject: Geography
Date: Dec 6, 2023

Lab 9: Data Analysis and R

Part 1: Getting Started

1. Set and check your current working directory. Enter the console panel output below.

[1] "/cloud/project"

2. Save the Elwha estuary dataset in your R environment by following the example code you loaded into the source pane. Check the structure using str(), and then preview the first 5 rows and 6 columns. Copy and paste your output from the console when running the str() function.

  Field_ID  Dam_Condition   Date_Collected  Site_Name  Latitude
1        1  Before removal  6/22/2006       ES1        48.14700
2        2  Before removal  6/22/2006       ES2        48.14827
3        3  Before removal  7/20/2006       ES1        48.14700
4        4  Before removal  7/20/2006       ES2        48.14827
5        5  Before removal  8/31/2006       ES1        48.14700

3. Provide the longitudinal and latitudinal coordinates for one of the five Field_ID of your choice below. Make sure to include the Field_ID in your answer.

Field_ID  Latitude  Longitude
4         48.147    -123.564
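The Part 1 steps can be sketched in R as follows. This is a minimal sketch, not the lab's example code: the data frame name `elwha` and the variable `chosen_id` are hypothetical, and the data frame is rebuilt by hand from the five rows shown above because the lab's data file is not named here.

```r
# Check the current working directory (setwd() would change it).
getwd()

# Hypothetical stand-in for the Elwha estuary dataset: only the five rows
# and five columns shown in the console output above.
elwha <- data.frame(
  Field_ID       = 1:5,
  Dam_Condition  = rep("Before removal", 5),
  Date_Collected = c("6/22/2006", "6/22/2006", "7/20/2006",
                     "7/20/2006", "8/31/2006"),
  Site_Name      = c("ES1", "ES2", "ES1", "ES2", "ES1"),
  Latitude       = c(48.14700, 48.14827, 48.14700, 48.14827, 48.14700)
)

# Inspect the structure: one line per column with its type and first values.
str(elwha)

# Preview the first 5 rows. With the full dataset, which has more columns,
# the "5 rows and 6 columns" preview would be elwha[1:5, 1:6].
print(elwha[1:5, ])

# Look up the values recorded for a single Field_ID.
chosen_id <- 1
print(elwha[elwha$Field_ID == chosen_id, c("Field_ID", "Site_Name", "Latitude")])
```

Row and column indexing with `[rows, columns]` is what lets the same data frame answer both the preview question and the single-record lookup.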
Part 2: Descriptive Statistics

R provides a wide range of functions for obtaining descriptive statistics.

4. Walk through generating the descriptive statistics for pH values. Next, create similar R code to generate descriptive statistics (five-point summary) for both temperature and turbidity. Fill in the tables below and add descriptive table title descriptions to each.

Temperature (°C)
Mean                13.11597
Median              12.35
Minimum             5.79
Maximum             20.98
Standard Deviation  3.289619

Table 1: Mean, median, minimum, maximum, and standard deviation of temperature, calculated in RStudio Cloud.

Turbidity (Nephelometric Turbidity Unit; NTU)
Mean                63.23206
Median              13.9
Minimum             0.3
Maximum             305.6
Standard Deviation  88.97956

Table 2: Mean, median, minimum, maximum, and standard deviation of turbidity, calculated in RStudio Cloud.

5. Calculate the 95% confidence interval for temperature (°C) and write a statement below including these values to assess our confidence in the temperature mean.

Assuming the distribution is normal, we can be 95% confident that the true mean temperature lies between 12.584186 °C and 13.647553 °C.
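The descriptive statistics and the confidence interval above can be reproduced along these lines. This is a sketch under stated assumptions: the `temperature` vector holds made-up readings rather than the lab data, and the interval uses the t-distribution formula, which may or may not be the exact method in the lab's example code.

```r
# Hypothetical temperature readings in degrees Celsius (not the lab data).
temperature <- c(5.8, 9.4, 11.2, 12.3, 12.4, 13.9, 15.1, 16.7, 18.0, 21.0)

# The five statistics reported in the tables above.
temp_stats <- c(
  Mean    = mean(temperature),
  Median  = median(temperature),
  Minimum = min(temperature),
  Maximum = max(temperature),
  SD      = sd(temperature)
)
print(temp_stats)

# 95% confidence interval for the mean, assuming approximate normality:
# mean +/- t(0.975, n - 1) * sd / sqrt(n).
n         <- length(temperature)
std_error <- sd(temperature) / sqrt(n)
margin    <- qt(0.975, df = n - 1) * std_error
ci_manual <- c(lower = mean(temperature) - margin,
               upper = mean(temperature) + margin)
print(ci_manual)

# The one-sample t.test() reports the same interval.
ci_ttest <- t.test(temperature, conf.level = 0.95)$conf.int
print(ci_ttest)
```

If a column contains missing values, `mean()`, `median()`, `min()`, `max()`, and `sd()` all need `na.rm = TRUE`, and `n` should count only the non-missing readings.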
Part 3: Statistical Analysis in R

6. Walk through the t-test determining if the mean pH values are significantly different before and during dam removal. Next, determine if there is a significant difference in the mean turbidity values before and during dam removal by conducting your own t-test. Report your results in the context of the study by interpreting the p-value and variables used.

My own t-test shows that mean turbidity was significantly higher during dam removal than before dam removal (t = -5.7058, df = 113.89, p-value = 9.312e-08). Because the p-value is far below 0.05, the difference in mean turbidity between the two dam conditions is statistically significant.

7. Walk through the example ANOVA to determine if the mean pH values are significantly different between testing sites. Next, conduct your own ANOVA to determine if there are site differences in temperature during dam removal. Report your result in the context of the study by interpreting the p-value and variables used.

Using ANOVA, I found no significant difference in temperature among the sampling sites during dam removal (F = 1.862; df = 4, 113; p = 0.122).

8. Produce a boxplot to help illustrate your results from 7. Export the image (copy to clipboard), paste it below, and include a descriptive figure caption.

Figure 1: Boxplots of temperature at each sampling site during dam removal. The thick black line in the middle of each box is the median, the edges of each box are the quartiles, the lines at the ends of the dotted whiskers are the minimum and maximum values excluding outliers, and the two open circles at ES2 are outliers.
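The three analyses in Part 3 can be sketched as below. Everything here is illustrative: the data are simulated, the data frame names `water` and `during` are hypothetical, and the site labels ES3 to ES5 are assumed (only ES1 and ES2 appear in the output above). The non-integer degrees of freedom reported in question 6 are consistent with Welch's t-test, which is R's default for `t.test()`.

```r
set.seed(42)  # make the simulated values reproducible

# Hypothetical turbidity readings (NTU); not the Elwha measurements.
water <- data.frame(
  Dam_Condition = rep(c("Before removal", "During removal"), each = 30),
  Turbidity     = c(rnorm(30, mean = 10, sd = 3),
                    rnorm(30, mean = 90, sd = 40))
)

# Two-sample t-test. The default (var.equal = FALSE) is Welch's test, which
# does not assume equal variances and can give non-integer degrees of freedom.
# Groups are ordered alphabetically, so t is negative when the "Before"
# mean is lower than the "During" mean.
turb_test <- t.test(Turbidity ~ Dam_Condition, data = water)
print(turb_test)

# Hypothetical temperatures (degrees C) at five sites during dam removal.
during <- data.frame(
  Site_Name   = factor(rep(paste0("ES", 1:5), each = 12)),
  Temperature = rnorm(60, mean = 13, sd = 3)
)

# One-way ANOVA: does mean temperature differ among sites?
temp_aov <- aov(Temperature ~ Site_Name, data = during)
print(summary(temp_aov))

# Boxplot of the same comparison, one box per site.
boxplot(Temperature ~ Site_Name, data = during,
        xlab = "Site", ylab = "Temperature (degrees C)")
```

With five sites the ANOVA has 4 between-group degrees of freedom, matching the first df value reported in question 7; the second value depends on the number of observations.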
9. What is the null hypothesis for each of the analyses (t-test; ANOVA) that you conducted above?

The null hypothesis for the t-test is that the difference between the two group means is zero. The null hypothesis for ANOVA is that there is no difference among the group means.

Part 4: Water quality (for you to answer on your own in complete sentences)

10. Why do you think dam removal would affect water quality parameters in a river?

Dam removal means demolishing a dam so that water flows freely down the river again. Knowing this, I think dam removal would affect water quality parameters because you're basically taking water that has been held in one place and releasing it back into flowing water, so temperature and other parameters could change.

11. What is Turbidity? Why do you think it is an important parameter to consider when measuring water quality? Can you think of an example of very high turbidity?

Turbidity refers to a fluid's cloudiness or haziness, similar to smoke in the air. I think turbidity is an important metric to consider when monitoring water quality because it can indicate whether there are pollutants, bacteria, or viruses in the water that could harm fish and other aquatic life, as well as their habitat. In drinking water, measuring turbidity also prompts researchers and scientists to look for the metals, bacteria, and other contaminants that cause it, which can affect the taste and odor of the water. A possible example of very high turbidity is a river, stream, or creek after a rainstorm.

12. Why do you think water quality is important? Find an example from the primary literature to support your argument. What happens when there is poor water quality? What are some of the factors that contribute to poor water quality?

Water quality is essential because it allows for a healthy ecosystem and supports the diversity of plants and wildlife. An article from the CDC explains the importance of water quality and of testing water (see citation 1). Poor water quality can cause health issues and weakened immune systems in humans. For fish and other organisms, it can destroy habitats, threaten the survival of certain species, and possibly disturb mating rituals. Some factors that contribute to poor water quality are temperature, pH, run-off, sedimentation, erosion, and pesticides.
13. Think about a data-driven question that you would like to answer using RStudio. Find a potential dataset and describe it - what kind of variable it contains, what it measures, and what questions you would like to use the data to answer.

For my data-driven question, I wanted to focus on animal shelters and how they keep track of the animals that are brought in, find a home, die, are euthanized, and so on. The dataset I found (see citation 2) covers the following:

1. Intake: the number of animals going into a shelter
2. Outcome: how many animals were adopted, returned, euthanized, dead, or transferred
3. Save rate: the percentage of animals that left the shelter alive
4. Live Release Rate: the ratio of total live outcomes to total outcomes
5. Length of Stay: how long each animal stayed at a shelter

I would like to use this data to answer questions such as: What could shelters do to optimize running costs? What time of year or month is best for adoptions, euthanizations, intake, etc.? How could using data help shelters keep running so that they can support each animal, even if the animal is returned?

14. What statistical test from the lab today would you use to analyze your data and why? What would be your null hypothesis?

I think the best statistical test from this week's lab for analyzing the data would be ANOVA, because it compares two or more groups at the same time to determine whether their means differ. I also think visualization tools like bar plots, pie charts, and line graphs would show the relationship between two things, such as the month or year and the number of adoptions. My null hypothesis is that the month has no effect on the number of adoptions, euthanizations, returns, etc. a shelter has.
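The Live Release Rate defined in question 13 (total live outcomes divided by total outcomes) could be computed as in the sketch below. The counts and column names are hypothetical and not taken from the cited dataset, and treating adopted, returned, and transferred animals as live outcomes is an assumption made for illustration.

```r
# Hypothetical monthly outcome counts for one shelter (illustrative only).
outcomes <- data.frame(
  month       = c("Jan", "Feb", "Mar"),
  adopted     = c(40, 35, 50),
  returned    = c(5, 4, 6),
  transferred = c(10, 8, 12),
  euthanized  = c(6, 5, 4),
  died        = c(2, 1, 1)
)

# Assumption: adoptions, returns, and transfers count as live outcomes.
live_cols <- c("adopted", "returned", "transferred")
all_cols  <- c(live_cols, "euthanized", "died")

outcomes$live_outcomes  <- rowSums(outcomes[, live_cols])
outcomes$total_outcomes <- rowSums(outcomes[, all_cols])

# Live release rate = total live outcomes / total outcomes.
outcomes$live_release_rate <- outcomes$live_outcomes / outcomes$total_outcomes

print(outcomes[, c("month", "live_outcomes", "total_outcomes",
                   "live_release_rate")])
```

A per-month table like this is also the shape of data a month-by-month comparison would start from.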
Citations:

1. U.S. Department of Health & Human Services. (2020, October 28). Importance of Water Quality and Testing | Public Water Systems | Drinking Water | Healthy Water | CDC. CDC.gov. Retrieved May 27, 2022, from https://www.cdc.gov/healthywater/drinking/public/water_quality.html

2. Yadhunath, R. (2022, January 1). Saving Animal Lives with Data - Towards Data Science. Medium. Retrieved May 27, 2022, from https://towardsdatascience.com/saving-animal-lives-with-data-d815c6e854eb