Lab 9 Data Analysis and R, Dominique Pitman
School: Oregon State University, Corvallis
Course: 106
Subject: Geography
Date: Dec 6, 2023
Lab 9: Data Analysis and R
Part 1: Getting Started
1. Set and check your current working directory. Enter the console panel output below.
[1] "/cloud/project"
2. Save the Elwha estuary dataset in your R environment by following the example code
you loaded into the source pane. Check the structure using str(), and then preview the
first 5 rows and 6 columns. Copy and paste your output from the console when running
the str() function.
  Field_ID  Dam_Condition Date_Collected Site_Name Latitude
1        1 Before removal      6/22/2006       ES1 48.14700
2        2 Before removal      6/22/2006       ES2 48.14827
3        3 Before removal      7/20/2006       ES1 48.14700
4        4 Before removal      7/20/2006       ES2 48.14827
5        5 Before removal      8/31/2006       ES1 48.14700
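The steps in questions 1 and 2 can be sketched as follows. This is a minimal sketch, not the lab's example code: it rebuilds only the five previewed rows by hand instead of reading the lab's data file, and the object name `elwha` is an assumption.

```r
# Check the current working directory (setwd() would change it).
getwd()

# Hypothetical stand-in for the Elwha dataset: only the five rows and
# five columns previewed above, typed in by hand.
elwha <- data.frame(
  Field_ID       = 1:5,
  Dam_Condition  = rep("Before removal", 5),
  Date_Collected = c("6/22/2006", "6/22/2006", "7/20/2006",
                     "7/20/2006", "8/31/2006"),
  Site_Name      = c("ES1", "ES2", "ES1", "ES2", "ES1"),
  Latitude       = c(48.14700, 48.14827, 48.14700, 48.14827, 48.14700)
)

# Inspect column names and types.
str(elwha)

# Preview the first 5 rows and up to 6 columns; this stand-in has
# fewer than 6 columns, so cap the index at the columns it contains.
preview <- elwha[1:5, seq_len(min(6, ncol(elwha)))]
print(preview)
```

With the full dataset loaded, the same indexing returns the first six columns instead of five.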
3. Provide the longitudinal and latitudinal coordinates for one of the five Field_ID of your
choice below. Make sure to include the Field_ID in your answer.
Field_ID   Longitude   Latitude
4          -123.564    48.147
Part 2: Descriptive Statistics: R provides a wide range of functions for obtaining
descriptive statistics
4. Walk through generating the descriptive statistics for pH values. Next, write similar R
code to generate descriptive statistics (five-point summary) for both temperature and
turbidity. Fill in the tables below and add a descriptive table title to each.
Temperature (°C)
Mean                 13.11597
Median               12.35
Minimum              5.79
Maximum              20.98
Standard Deviation   3.289619
Table 1: Mean, median, minimum, maximum, and standard deviation of temperature (°C),
calculated in RStudio Cloud.
Turbidity (Nephelometric Turbidity Unit; NTU)
Mean                 63.23206
Median               13.9
Minimum              0.3
Maximum              305.6
Standard Deviation   88.97956
Table 2: Mean, median, minimum, maximum, and standard deviation of turbidity (NTU),
calculated in RStudio Cloud.
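The statistics in Tables 1 and 2 can be produced with base R functions. The sketch below is one possible approach, not the lab's example code: the `temperature` vector holds made-up values and the helper name `describe` is an assumption.

```r
# Hypothetical temperature readings in degrees C (not the lab data).
temperature <- c(12.1, 9.8, 15.3, 11.4, 18.2, 13.0, NA, 10.6)

# Mean, median, minimum, maximum, and standard deviation,
# dropping missing values before computing each statistic.
describe <- function(x) {
  c(
    mean   = mean(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE),
    min    = min(x, na.rm = TRUE),
    max    = max(x, na.rm = TRUE),
    sd     = sd(x, na.rm = TRUE)
  )
}

temp_stats <- describe(temperature)
print(round(temp_stats, 3))
```

The same helper can be applied to a turbidity column, which keeps the two tables consistent with each other.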
5. Calculate the 95% confidence interval for temperature (°C) and write a statement below
including these values to assess our confidence in the temperature mean.
Assuming the temperature values are approximately normally distributed, we can be 95%
confident that the true mean temperature lies between 12.584186 °C and 13.647553 °C.
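A confidence interval of this kind is commonly computed as the sample mean plus or minus a t critical value times the standard error. The sketch below assumes that t-based method, which may differ from the one used in the lab's example code, and uses made-up readings.

```r
# Hypothetical temperature readings in degrees C (not the lab data).
temperature <- c(12.1, 9.8, 15.3, 11.4, 18.2, 13.0, 10.6, 14.7)

n    <- length(temperature)
xbar <- mean(temperature)
se   <- sd(temperature) / sqrt(n)   # standard error of the mean

# Two-sided 95% interval: mean +/- t critical value * standard error.
t_crit <- qt(0.975, df = n - 1)
ci     <- c(lower = xbar - t_crit * se, upper = xbar + t_crit * se)
print(ci)

# t.test() reports the same interval.
print(t.test(temperature, conf.level = 0.95)$conf.int)
```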
Part 3: Statistical Analysis in R
6. Walk through the t-test determining if the mean pH values are significantly different
before and during dam removal. Next, determine if there is a significant difference in the
mean turbidity values before and during dam removal by conducting your own t-test.
Report your results in the context of the study by interpreting the p-value and variables
used.
My own t-test showed that mean turbidity was significantly higher during dam removal than
before dam removal (t=-5.7058, df=113.89, p-value=9.312e-08). Because the p-value is far
below 0.05, we reject the null hypothesis that mean turbidity was the same in both periods.
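A two-sample t-test like the one reported above can be run with `t.test()` and a formula. This is a sketch on made-up numbers: the column name `Turbidity` is an assumption, while `Dam_Condition` and its level "Before removal" come from the dataset preview and "During removal" is assumed by analogy.

```r
# Hypothetical turbidity readings (NTU), not the lab data.
water <- data.frame(
  Dam_Condition = rep(c("Before removal", "During removal"), each = 6),
  Turbidity     = c(2.1, 3.4, 1.8, 4.0, 2.9, 3.1,
                    45.0, 60.0, 80.0, 55.0, 70.0, 65.0)
)

# Welch two-sample t-test (R's default, var.equal = FALSE).
# Groups are ordered alphabetically, so a negative t means the
# "Before removal" mean is lower than the "During removal" mean.
result <- t.test(Turbidity ~ Dam_Condition, data = water)
print(result)

# TRUE means the null hypothesis of equal means is rejected at the 5% level.
print(result$p.value < 0.05)
```

The non-integer degrees of freedom in the reported result (df=113.89) are what the Welch correction typically produces.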
7. Walk through the example ANOVA to determine if the mean pH values are significantly
different between testing sites. Next, conduct your own ANOVA to determine if there are
site differences in temperature during dam removal. Report your result in the context of
the study by interpreting the p-value and variables used.
Using ANOVA to test for site differences in temperature during dam removal, I found no
significant difference in mean temperature among the sampling sites (F=1.862; df=4,113;
p=0.122). Because the p-value is greater than 0.05, we fail to reject the null hypothesis.
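A one-way ANOVA of this form can be fitted with `aov()`. The sketch below uses made-up readings; the column name `Temperature` and the site label "ES3" are assumptions, and only three sites are shown for brevity.

```r
# Hypothetical temperature readings (degrees C) at three sites,
# not the lab data.
during <- data.frame(
  Site_Name   = factor(rep(c("ES1", "ES2", "ES3"), each = 5)),
  Temperature = c(12.1, 13.4, 11.8, 14.0, 12.9,
                  12.6, 13.1, 12.2, 13.8, 12.4,
                  11.9, 13.6, 12.7, 13.3, 12.0)
)

# One-way ANOVA: does mean temperature differ among sites?
fit <- aov(Temperature ~ Site_Name, data = during)
tab <- summary(fit)[[1]]
print(tab)

# Pull out the F statistic, degrees of freedom, and p-value.
f_value <- tab[["F value"]][1]
dfs     <- tab[["Df"]]
p_value <- tab[["Pr(>F)"]][1]
```

The two degrees of freedom are the number of sites minus one and the number of observations minus the number of sites, which is how the reported df=4,113 corresponds to five sites and 118 observations.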
8. Produce a boxplot to help illustrate your results from 7. Export the image (copy to
clipboard), paste it below, and include a descriptive figure caption.
Figure 1: Boxplots of temperature by sampling site during dam removal. The thick black
line in the middle of each box is the median, the edges of each box are the quartiles, and
the lines at the ends of the dashed whiskers are the minimum and maximum values excluding
outliers. Outliers are drawn as open circles (two at ES2).
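The elements named in the caption map directly onto what `boxplot()` returns. The sketch below uses made-up readings for two sites (the column name `Temperature` is an assumption) and prints the numbers behind each box.

```r
# Hypothetical temperature readings (degrees C), not the lab data.
during <- data.frame(
  Site_Name   = rep(c("ES1", "ES2"), each = 7),
  Temperature = c(11.8, 12.1, 12.9, 13.4, 14.0, 12.5, 13.0,
                  12.2, 12.4, 12.6, 13.1, 13.8, 12.8, 19.5)
)

# Draw the boxplot; the returned list holds the numbers behind it.
bp <- boxplot(Temperature ~ Site_Name, data = during,
              xlab = "Site", ylab = "Temperature (degrees C)")

# Rows of bp$stats, one column per site: lower whisker, lower hinge
# (about the first quartile), median, upper hinge (about the third
# quartile), upper whisker.
print(bp$stats)

# Points more than 1.5 times the box length beyond the box are
# returned here and drawn as open circles.
print(bp$out)
```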
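9. What is the null hypothesis for each of the analyses (t-test; ANOVA) that you conducted
above?
The null hypothesis for the t-test is that the difference between the two group means is
zero: mean turbidity is the same before and during dam removal.
The null hypothesis for the ANOVA is that there is no difference among the group means:
mean temperature is the same at every sampling site.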
Part 4: Water quality (for you to answer on your own in complete sentences)
10. Why do you think dam removal would affect water quality parameters in a river?
Dam removal means demolishing a dam so that water flows freely through the river again.
I think dam removal would affect water quality parameters because water that was held in
one place and barely moved is released back into flowing water, so parameters such as
temperature could change, along with others.
11. What is Turbidity? Why do you think it is an important parameter to consider when
measuring water quality? Can you think of an example of very high turbidity?
Turbidity refers to a fluid's cloudiness or haziness, much like smoke in the air. I think
turbidity is an important metric to consider when monitoring water quality because it can
indicate whether pollutants, bacteria, or viruses are present in the water that could harm
fish and other aquatic life, as well as their habitat. Measuring turbidity in drinking water
also lets researchers and scientists look for metals, bacteria, and other contaminants that
cause turbidity and can affect the taste and odor of the water. An example of very high
turbidity is a river, stream, or creek after a rainstorm.
12. Why do you think water quality is important? Find an example from the primary literature
to support your argument. What happens when there is poor water quality? What are
some of the factors that contribute to poor water quality?
Water quality is essential because it sustains healthy ecosystems and supports the
diversity of plants and wildlife. An article from the CDC explains the importance of water
quality and water testing (see citation 1). Poor water quality can cause health problems
and weakened immune systems in humans. For fish and other organisms, it can destroy
habitats, threaten the survival of species, and possibly disrupt mating and other behaviors.
Factors that contribute to poor water quality include temperature, pH, runoff,
sedimentation, erosion, and pesticides.
13. Think about a data-driven question that you would like to answer using RStudio. Find a
potential dataset and describe it: what kind of variables it contains, what it measures, and
what questions you would like to use the data to answer.
For my data-driven question, I wanted to focus on animal shelters and how they keep track
of the animals that are brought in, adopted, transferred, euthanized, or that die in care.
The dataset I found contains the following variables:
1. Intake: the number of animals going into a shelter
2. Outcome: how many animals were adopted, returned, euthanized, dead, or transferred
3. Save rate: the percentage of animals who have left the shelter alive
4. Live Release Rate: the ratio of total live outcomes to total outcomes
5. Length of Stay: how long each particular animal stayed at a shelter
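As an illustration of the Live Release Rate defined in the list above, the ratio can be computed from outcome counts. The counts below are made up, and treating adopted, returned, and transferred animals as live outcomes is an assumption, not something the dataset description states.

```r
# Hypothetical outcome counts for one shelter (not real data).
outcomes <- c(adopted = 120, returned = 30, transferred = 25,
              euthanized = 15, died = 10)

# Assumption: these three outcome types count as live outcomes.
live <- c("adopted", "returned", "transferred")

# Live release rate = total live outcomes / total outcomes.
live_release_rate <- sum(outcomes[live]) / sum(outcomes)
print(live_release_rate)
```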
I would like to use this data to answer questions such as: What could shelters do to
optimize running costs? Which month or time of year is best for adoptions, and when do
intake and euthanizations peak? How could data help shelters keep running so that they can
support each animal, even one that is returned?
14. What statistical test from the lab today would you use to analyze your data and why?
What would be your null hypothesis?
I think the best statistical test from this week's lab for analyzing the data would be
ANOVA, because it compares the means of two or more groups at the same time, so we can
determine whether the groups differ. I also think visualization tools like bar plots, pie
charts, and line graphs would help show the relationship between two variables, such as the
month or year and the number of adoptions. My null hypothesis is that the mean number of
adoptions, euthanizations, returns, etc. at a shelter does not differ among months.
Citations:
1. U.S. Department of Health & Human Services. (2020, October 28). Importance of Water
Quality and Testing | Public Water Systems | Drinking Water | Healthy Water | CDC. CDC.gov.
Retrieved May 27, 2022, from
https://www.cdc.gov/healthywater/drinking/public/water_quality.html
2. Yadhunath, R. (2022, January 1). Saving Animal Lives with Data - Towards Data Science.
Medium. Retrieved May 27, 2022, from
https://towardsdatascience.com/saving-animal-lives-with-data-d815c6e854eb