Assignment 2 W2023 NO Solutions (1)

School: University of Guelph | Course: 2230 | Subject: Statistics | Date: Feb 20, 2024 | Type: docx | Pages: 9

University of Guelph
STAT*2230 - Biostatistics for Integrative Biology
Assignment (2)

Please turn in a typed (not handwritten!) version of the assignment. Show/Provide (Copy/Paste in the word document) both your R code and the result/output. “Copy/Paste” graphs into the word document. Do not use screenshots. Scale your plots to fit them in the spaces provided.

Question (1) [4.5 Points]
Read the introduction, methods and figures in the Chen and Robinson (2014) paper that is available on Courselink, and answer the following questions:

a. Is this an experiment or an observational study? Why?
This is an observational study because the researchers did not change existing conditions. Instead, they let nature take its course and analyzed what they observed without manipulating any variables.

b. What method did the authors use to sample ant nests? How do we know this?
Stratified random sampling. The authors sampled from distinct subgroups defined by canopy cover: “nests with canopy cover lower than 51.2%”, “between 51.2% and 67.5%”, and “higher than 67.5%”. They also split colonies into 3 groups based on how many nests are in each colony.

c. Evaluate the sampling method used by Chen and Robinson (2014). Is it likely to result in an unbiased sample of the population? Why?
I think this is unbiased since the sampling is random and there was no change to the environment.

d. Review the figures in the paper and identify whether each of the following variables was nominal, ordinal, discrete numeric, or continuous numeric.
Variable                 Figure   Variable Type
Mean canopy cover (%)    Fig. 2   Continuous numeric
Foraging Trail           Fig. 4   Nominal
Nest Number per Colony   Fig. 5   Discrete numeric

Question (2) [10.5 Points]
Bythotrephes longimanus is a species of invasive zooplankton from Eurasia that invaded the Great Lakes in the 1980s and has since spread to many inland lakes. Bythotrephes are born with a long tail spine that makes it harder for some fish predators to eat them. Miehls et al. (2014) were interested in testing whether natural selection on the length of the spine was affected by the fish community in each lake. In 2008, Miehls collected 170 first instar Bythotrephes from Boshkung Lake in Huntsville, ON and measured the length of the distal segment of their tail spine (in mm). This part of the spine is labeled “Distal spine” in the figure above. These data are called “distal.spine” in the “Bythotrephes_Boshkung.csv” datafile that is located in the Data Sets folder on CourseLink.

a. What type of variable is the distal spine length?
Continuous numeric.

b. Read (import) the data into R.
Dataset <- read.csv(file.choose(), header=TRUE)

Note: You can read the data into R using the following command:
Bythotrephes <- read.csv(file.choose(), header=TRUE) # Read the CSV file

c. Construct a simple boxplot (vertical) of the data with a nice colour of your choice for the box, label the x-axis “Bythotrephes$distal.spine”, and main title “Boxplot of Bythotrephes$distal.spine”.
boxplot(Dataset$distal.spine, xlab="Bythotrephes$distal.spine", main="Boxplot of Bythotrephes$distal.spine", col="purple")

d. Construct a histogram of the data with a nice colour of your choice for the bins, label the x-axis “Bythotrephes$distal.spine”, and main title “Histogram of Bythotrephes$distal.spine”.
hist(Dataset$distal.spine, xlab="Bythotrephes$distal.spine", main="Histogram of Bythotrephes$distal.spine", col="blue")

e. Determine the sample mean, median, standard deviation, variance, and inter-quartile range.

Mean:
> mean(Dataset$distal.spine)
[1] 5.315929

Median:
> median(Dataset$distal.spine)
[1] 5.303

Standard Deviation:
> sd(Dataset$distal.spine)
[1] 0.257824

Variance:
> var(Dataset$distal.spine)
[1] 0.06647321

Inter-Quartile Range:
> IQR(Dataset$distal.spine)
[1] 0.303
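
The built-in summaries in part (e) can be cross-checked against their textbook definitions. The following is a minimal sketch, assuming a small made-up vector `x` rather than the Boshkung data; the names `x_bar`, `s2`, `s`, and `iqr` are illustrative only.

```r
# Hypothetical toy data, used only to illustrate the definitions
x <- c(4.9, 5.1, 5.3, 5.4, 5.8)

n     <- length(x)
x_bar <- sum(x) / n                        # sample mean
s2    <- sum((x - x_bar)^2) / (n - 1)      # sample variance (n - 1 denominator)
s     <- sqrt(s2)                          # sample standard deviation
q     <- quantile(x, probs = c(0.25, 0.75))  # quartiles (R's default method)
iqr   <- unname(q[2] - q[1])               # inter-quartile range

# Each hand-computed value should agree with the built-in function
all.equal(x_bar, mean(x))
all.equal(s2, var(x))
all.equal(s, sd(x))
all.equal(iqr, IQR(x))
```

Note that `var()` and `sd()` use the n - 1 denominator, so they return sample (not population) statistics.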
f. Calculate the Coefficient of Variation (CV) for distal spine length as percentage (show your work).
CV = (Standard Deviation / Mean) x 100 = (0.257824 / 5.315929) x 100 = 4.85%

g. Convert the distal spine length into units of cm using code like this:
Bythotrephes$distal.spine.cm <- Bythotrephes$distal.spine/10

Dataset$distal.spine.cm <- Dataset$distal.spine/10

      lake distal.spine distal.spine.cm
1 Boshkung        4.298          0.4298
2 Boshkung        4.834          0.4834
3 Boshkung        4.848          0.4848
4 Boshkung        4.899          0.4899
5 Boshkung        4.912          0.4912
6 Boshkung        4.985          0.4985
7 Boshkung        4.995          0.4995
8 Boshkung        5.000          0.5000
…etc

h. Recalculate the sample mean, median, standard deviation, variance, inter-quartile range and coefficient of variation.

Mean:
> mean(Dataset$distal.spine.cm)
[1] 0.5315929

Median:
> median(Dataset$distal.spine.cm)
[1] 0.5303

Standard Deviation:
> sd(Dataset$distal.spine.cm)
[1] 0.0257824

Variance:
> var(Dataset$distal.spine.cm)
[1] 0.0006647321

Inter-quartile Range:
> IQR(Dataset$distal.spine.cm)
[1] 0.0303

Coefficient of Variation:
CV = (Standard Deviation / Mean) x 100 = (0.0257824 / 0.5315929) x 100 = 4.85%
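
The CV comes out the same in mm and in cm because the factor of 10 divides both the standard deviation and the mean, so it cancels in the ratio. A minimal sketch of this, assuming a hypothetical helper `cv_percent()` and made-up spine lengths (not the Boshkung data):

```r
# Hypothetical helper: coefficient of variation as a percentage
cv_percent <- function(x) 100 * sd(x) / mean(x)

# Made-up spine lengths in mm, used only for illustration
spine_mm <- c(4.9, 5.1, 5.3, 5.4, 5.8)
spine_cm <- spine_mm / 10        # same lengths expressed in cm

cv_percent(spine_mm)
cv_percent(spine_cm)

# The two values agree up to floating-point rounding
all.equal(cv_percent(spine_mm), cv_percent(spine_cm))
```

The same cancellation holds for any positive rescaling of the data, which is why the CV is reported without length units.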
i. Enter your calculated values in the table below along with their associated units. Some units are provided for you already.

Measurement                Value (mm)    Units (mm)   Value (cm)      Units (cm)
Mean                       5.315929      mm           0.5315929       cm
Median                     5.303         mm           0.5303          cm
Standard deviation         0.257824      mm           0.0257824       cm
Variance                   0.06647321    mm^2         0.0006647321    cm^2
Interquartile range        0.303         mm           0.0303          cm
Coefficient of variation   4.85          %            4.85            %

j. Describe how each of these measures (sample mean, median, standard deviation, variance, inter-quartile range and coefficient of variation) changed when we went from measuring spine length in units of mm to units of cm.
Sample mean: divided by 10 (mm to cm)
Median: divided by 10 (mm to cm)
Standard deviation: divided by 10 (mm to cm)
Variance: divided by 100 (mm^2 to cm^2)
Interquartile range: divided by 10 (mm to cm)
Coefficient of variation: stayed the same

k. Is the sample median distal spine length greater than, less than, or similar to the sample mean distal spine length?
The median distal spine length is similar to the mean, though slightly less (5.303 mm vs. 5.316 mm).

l. If we were to add 2mm to each of the original 170 observations for distal spine length, what would happen to the sample mean?
The sample mean would increase by 2 mm.

m. What would happen to the sample variance?
The sample variance would stay the same.
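
The answers to parts (j), (l) and (m) follow from two general rules: adding a constant shifts the mean but leaves the spread unchanged, while multiplying by a constant scales the mean and standard deviation by that constant and the variance by its square. A quick sketch to check these rules, assuming a made-up vector `x` (not the Boshkung data):

```r
# Made-up values in mm, used only for illustration
x <- c(4.9, 5.1, 5.3, 5.4, 5.8)

shifted <- x + 2     # add 2 mm to every observation
scaled  <- x / 10    # convert mm to cm

all.equal(mean(shifted), mean(x) + 2)   # mean shifts by the constant
all.equal(var(shifted), var(x))         # variance is unchanged by a shift
all.equal(mean(scaled), mean(x) / 10)   # mean scales by 1/10
all.equal(sd(scaled), sd(x) / 10)       # sd scales by 1/10
all.equal(var(scaled), var(x) / 100)    # variance scales by (1/10)^2
```

Each comparison should return TRUE; `all.equal()` is used instead of `==` to tolerate floating-point rounding.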
Question (3) [10 Points]
The marine threespine stickleback is a small fish named for its defensive armour, which is made up of a series of bony plates along each side of its body. The armour reduces predation by other fish and diving birds. However, sticklebacks living in lakes and streams, where there are fewer predators, have reduced armour. To understand the genetic basis of variation of defensive armour, Colosimo et al. (2003) measured the number of bony plates on 174 grandchildren of a cross between a marine and a freshwater stickleback.

a. Identify the type of variable in the study.
Discrete numeric variable.

b. Read the data into R from the stickleback.csv file (available in the Data Sets folder on Courselink). The variable is named “no.plates”.
> sicklebacks <- read.csv(file.choose(), header=TRUE)

c. Construct a boxplot (vertical) of the data using the colour red, label the x-axis “stickleback$no.plates”, and main title “Boxplot of stickleback$no.plates”.
boxplot(sicklebacks$no.plates, xlab="stickleback$no.plates", main="Boxplot of stickleback$no.plates", col="red")
d. Construct a frequency histogram of the data with the colour green for the bins, label the x-axis “stickleback$no.plates”, and main title “Histogram of stickleback$no.plates”.
> hist(sicklebacks$no.plates, xlab="stickleback$no.plates", main="Histogram of stickleback$no.plates", col="green")

e. Is the distribution unimodal, bimodal or multimodal?
Unimodal.

f. Is the distribution symmetrical or skewed, and if skewed in which direction?
Skewed to the left (negatively skewed).

g. Based on the shape of this distribution, do you expect the mean to be greater than, less than, or approximately equal to the median for number of plates?
Based on the shape, I’d expect the mean to be less than the median for number of plates, because the long left tail pulls the mean down.

h. Test your prediction by calculating the sample mean number of plates and the sample median number of plates.
> mean(sicklebacks$no.plates)
[1] 50.37931
> median(sicklebacks$no.plates)
[1] 59

i. Suppose that we change the first observation in the dataset from 10 to 2. We can do this using this code.
stickleback.changed<-stickleback
stickleback.changed$no.plates[1]<- 2
You can confirm that this worked using…
stickleback.changed$no.plates

> sicklebacks.changed<-sicklebacks
> sicklebacks.changed$no.plates[1]<-2
> sicklebacks.changed$no.plates

j. What do you predict will be the impact of this change on both mean and median?
Since one observation is being replaced by a smaller value, the mean (average) will decrease slightly. The median will remain the same, since the changed value was already below the median and the middle of the data set is not affected.

k. Check your prediction by calculating the mean and the median for stickleback.changed$no.plates and compare that with the original data set stickleback$no.plates.

Original data set:
MEAN:
> mean(sicklebacks$no.plates)
[1] 50.37931
MEDIAN:
> median(sicklebacks$no.plates)
[1] 59

Changed data set:
MEAN:
> mean(sicklebacks.changed$no.plates)
[1] 50.33333
MEDIAN:
> median(sicklebacks.changed$no.plates)
[1] 59
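
The pattern in parts (j) and (k) can be reproduced on a small scale: lowering a value that already sits below the median drops the mean by (old - new) / n and leaves the median untouched. A minimal sketch, assuming a made-up vector `plates` (not the stickleback data):

```r
# Hypothetical plate counts, used only for illustration
plates <- c(10, 35, 58, 59, 60, 62, 63)

changed <- plates
changed[1] <- 2                  # replace the first observation, 10 -> 2

mean(plates) - mean(changed)     # drop in the mean
(10 - 2) / length(plates)        # the same drop, computed directly

median(plates)                   # middle value of the original data
median(changed)                  # unchanged after the replacement
```

With n = 174 the same reasoning gives a drop of 8 / 174 ≈ 0.046, which matches the difference between the reported means 50.37931 and 50.33333.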