Wk8_assignmentSP24 (docx)

School: Phoenix College
Course: 324
Subject: Statistics
Date: Apr 3, 2024
Uploaded by ProfessorFog12950

MAR2801 Week 8 In-Class Assignment

Please highlight or change the text color of your answers and code to make them distinct from the questions. Light blue highlighted code is necessary, but showing the output is not. Gray highlighted code is necessary AND you are required to show the output.

You only need the install.packages code if you have not already installed tidyverse on your computer. Otherwise, just use the library command.

install.packages("tidyverse")
library(tidyverse)

1. (3 Points) We want to investigate the association between the sepal width and the sepal length of irises (the flower, not the colored part of the eye).
a. What would be the most appropriate plot to use? Does this plot use central tendencies and spreads, or raw data?
b. What statistical test should be used here, given that we are only investigating an association?
c. Would you be able to swap the axes?

2. (4 Points) We are investigating the relationship between the girth and height of black cherry trees. We are using the trees dataset built into R to create a plot to investigate this relationship. Include the generated plot in your answer.
a. What can you deduce from this graph about the relationship between girth and height of black cherry trees?
b. Look at the trend line and its 95% confidence interval (the gray area surrounding both sides). Where is the highest degree of confidence in our trend line (think about what a confidence interval means)? Be as specific as you can.
c. What statistical test should be used here, given the context and trend line?
d. Would you be able to swap the axes?

ggplot(trees, aes(x = Girth, y = Height)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(x = "Girth (in.)", y = "Height (ft)",
       title = "Relationship Between Girth and Height of Black Cherry Trees") +
  theme_classic()
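The gray band in question 2 can also be reproduced outside the plot. The sketch below assumes that geom_smooth(method = "lm") draws a 95% confidence interval for the fitted mean height (0.95 is geom_smooth's default level). It refits the same straight line with lm() and uses predict() to compute that interval on a grid of Girth values. The object names (fit, grid, band, width, narrowest) and the 200-point grid are illustrative choices, not part of the assignment.

```r
# Sketch (assumption): geom_smooth(method = "lm") shows a 95% confidence
# interval for the mean Height at each Girth, so refit that line here.
fit <- lm(Height ~ Girth, data = trees)

# Hypothetical grid of 200 Girth values spanning the observed range
grid <- data.frame(Girth = seq(min(trees$Girth), max(trees$Girth), length.out = 200))

# predict() returns a matrix with columns fit, lwr and upr
band <- predict(fit, newdata = grid, interval = "confidence", level = 0.95)

# Width of the confidence band at each grid point
width <- band[, "upr"] - band[, "lwr"]

# Girth value where the band is narrowest, printed next to the mean Girth
narrowest <- grid$Girth[which.min(width)]
print(c(narrowest = narrowest, mean_girth = mean(trees$Girth)))
```

For a straight-line fit, the half-width of this interval grows with the squared distance of Girth from its mean, so comparing the two printed values is a quick check on where the band is tightest.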

3. (2.5 Points) We are investigating the relationship between diamond carat, cut, and price. We are using the diamonds dataset built into the tidyverse to create a plot to investigate this relationship. Include the generated plot in your answer.
a. Which cut quality has the highest price increase per carat?
b. Carats are the unit of weight for diamonds and other gems. How heavy is the heaviest diamond in the dataset? What is its cut quality? You will likely want to use a command we learned at the beginning of the course to look at the diamonds dataset. You do not need to show code for how you found this answer.
c. What is potentially misleading about the trend lines on this graph?

ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.8, color = "gray", fill = "gray") +
  geom_smooth(aes(color = factor(cut)), method = "lm", se = FALSE) +
  scale_y_continuous(labels = scales::dollar_format()) +
  coord_cartesian(ylim = c(0, 18823)) +
  labs(x = "Carat", y = "Price (USD)",
       title = "Relationship Between Carat and Price among Cuts",
       color = "Cut Quality") +
  theme_classic()

4. (2 Points) We want to see how the Biochemical Oxygen Demand (mg/L) changes over time (days). We are using the BOD dataset built into R to create a plot to display how the Biochemical Oxygen Demand changes over time. Include the generated plot in your answer.
a. What type of plot did we create?
b. Does this plot use central tendencies and spreads, or raw data? How do you know?

ggplot(BOD, aes(x = Time, y = demand)) +
  geom_line(linewidth = 2) +  # ggplot2 >= 3.4.0 uses linewidth (not size) for lines
  labs(x = "Time (days)", y = "Biochemical Oxygen Demand (mg/L)",
       title = "The Biochemical Oxygen Demand Through Time") +
  theme_bw()

5. (4 Points) We want to see how the CO2 uptake of plants changes with location (Type), temperature (Treatment), and ambient CO2 concentration (conc). We are using the CO2 dataset built into R to create a plot to display this. Include the generated plot in your answer.
a. What type of plot did we create?
b. Does this plot use central tendencies and spreads, or raw data? How do you know?
c. Interpret the plots.

# Mean (CT) and standard deviation (spread) of uptake for each Type x Treatment x conc group
co2data <- CO2 %>%
  group_by(Type, Treatment, conc) %>%
  summarise(CT = mean(uptake, na.rm = TRUE),
            spread = sd(uptake, na.rm = TRUE)) %>%
  ungroup()

ggplot(co2data, aes(x = conc, y = CT, color = Treatment)) +
  geom_line(linewidth = 2) +
  geom_errorbar(aes(ymin = CT - spread, ymax = CT + spread), width = 0.2, linewidth = 1) +
  facet_wrap(~Type, ncol = 2) +
  labs(x = "Ambient CO2 (mL/L)",
       y = paste0("Uptake Concentration Rate (mean ", "\U00B1", " sd umol/m^2 sec)"),
       title = "The effect of location, temperature, and ambient CO2 concentration on CO2 uptake",
       color = "Treatment") +
  theme(panel.background = element_rect(fill = "white", color = "gray"), # not really needed
        panel.grid.major = element_line(color = "gray"),               # not really needed
        panel.grid.minor = element_line(color = "gray"))               # not really needed

6. (4.5 Points) For each of the following plots, describe what is wrong and how to fix it. I have given an indication of how many issues there are, though that doesn't mean there can't be more. Bigger versions can be found at the end of this week's slides. Also think about last week's PowerPoint when answering these questions.
a. Two issues
