Handout with Supplementary Material for Quiz 1 Problem 2

Problem 2 uses a fish data set that has multiple measurements for multiple species of fish. We will restrict our analysis to two species, Pike and Roach, and four continuous measurements, as described in Table 1 below. We are not completely sure of the origin of this data set, but it appears to be based on a 1917 study of fish caught in Lake Laengelmavesi near Tampere in Finland.

Table 1. Fish variables for Quiz 1 Problem 2.

Variable   Description
Weight     Weight of fish in grams (g)
Length3    Cross length of fish in cm
Height     Height of fish in cm
Width      Diagonal width of fish in cm
Species    Categorical variable, either Pike or Roach

For this analysis, we want to determine which of fish cross length, height, or diagonal width is the best single predictor of fish weight. We also want to explore differences between the two species. Figure 1 shows images of a typical Pike and Roach for reference.

Figure 1. Images of a typical Pike and a common Roach.
Exploratory Data Analysis

Pages 3 – 8 provide various EDA output for the fish data set. All EDA includes information on the four quantitative variables overall and split out by species.

Figure 2, shown on the next page, is a pairs plot for the four continuous variables along with density plots for each variable. All plots also include a third variable, Species, which is color coded (Pike pink and Roach blue). For our analysis, Weight is the response and Length3, Height, and Width are possible predictors. For each pair, three correlations are shown: one overall, one for Pike only, and one for Roach only.

The first plot in row 1 shows the density plots for Weight by species. The remaining entries in row 1 are correlations between Weight and each explanatory variable: overall (black), Pike (pink), and Roach (blue). Each following row corresponds to one of the explanatory variables and begins with a color-coded scatterplot of that variable against Weight. The correlations in row 2 are between Height and Length3 and between Width and Length3. Some annotations have been added to Figure 2 to help you understand how to read it.

Pages 9 – 11 show linear model output, including diagnostic plots, for single variable models of Weight on each of the predictors for all data.

Page 12 introduces an extension of the regression we did in HW2 when we added the Lunch variable. This regression includes a single quantitative variable (only presented for Width) with the addition of the categorical variable Species, and with an added interaction term that allows a different slope for each species.
Figure 2. Pairs plot showing scatterplots, density plots, and correlations for all pairwise combinations of Weight, Length3, Height, and Width. Pink indicates the species Pike and light blue indicates Roach. Annotations on the figure: Weight (x), Length3 (y); Length3 (x), Height (y).
Figures 3 – 6 show individual histograms and boxplots for each variable, overall and split out by species. Table 2 shows summary statistics for each of the variables as well.

Figure 3. Histograms and boxplots for Weight overall and Weight by species.
Figure 4. Histograms and boxplots for Length3 overall and Length3 by species.

Figure 5. Histograms and boxplots for Height overall and Height by species.
Figure 6. Histograms and boxplots for Width overall and Width by species.
Table 2. Seven-number summary statistics for Weight (y) and the possible predictors (x). The total sample size is 37 fish: 17 Pike and 20 Roach.

Variable      Group  Min    Q1      Median  Q3     Max     Mean   SD
Weight (g)    All    0      145     272     500    1650.0  412.4  441.19
              Pike   200.0  345.0   510.0   950.0  1650.0  718.7  494.1
              Roach  0      104.25  147.5   171.8  390.0   152.1  88.83
Length3 (cm)  All    16.2   24.7    30.6    45.5   68.0    35.9   14.1
              Pike   34.8   40.5    45.8    55.1   68.0    48.7   10.17
              Roach  16.2   22.7    24.9    26.9   35.0    25.0   4.03
Height (cm)   All    4.1    6.2     6.9     7.8    10.8    7.2    1.53
              Pike   5.6    6.4     7.3     8.9    10.8    7.7    1.66
              Roach  4.1    6.0     6.5     7.2    9.5     6.7    1.26
Width (cm)    All    2.3    3.5     4.0     4.9    7.5     4.3    1.16
              Pike   3.4    4.3     4.9     6.1    7.5     5.1    1.14
              Roach  2.3    3.3     3.6     3.9    5.4     3.7    0.69
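One way to sanity-check the summary table is to note that each overall mean should equal the average of the Pike and Roach means weighted by the group sizes (17 and 20). The following is a minimal Python sketch of that check, using only the rounded means printed in the table. The dictionary layout and the function name pooled_mean are illustrative assumptions, not part of the handout, and small differences from the printed overall means are expected because the inputs are rounded.

```python
# Group sizes and rounded means copied from the summary table.
N_PIKE, N_ROACH = 17, 20

# variable -> (Pike mean, Roach mean, overall mean as printed)
TABLE2_MEANS = {
    "Weight":  (718.7, 152.1, 412.4),
    "Length3": (48.7, 25.0, 35.9),
    "Height":  (7.7, 6.7, 7.2),
    "Width":   (5.1, 3.7, 4.3),
}


def pooled_mean(mean_pike, mean_roach, n_pike=N_PIKE, n_roach=N_ROACH):
    """Average of two group means, weighted by group size."""
    return (n_pike * mean_pike + n_roach * mean_roach) / (n_pike + n_roach)


# Print the recomputed overall mean next to the printed one.
for name, (pike, roach, printed) in TABLE2_MEANS.items():
    pooled = pooled_mean(pike, roach)
    print(f"{name:8s} pooled = {pooled:7.2f}   printed = {printed}")
```

The same weighting does not apply to medians, quartiles, or standard deviations, which cannot be recovered from the group summaries alone.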
Single Variable Linear Model Output

Weight on Cross Length3

Call:
lm(formula = Weight ~ Length3, data = FDatSub2)

Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  -635.139      73.694   -8.619  3.56e-10 ***
Length3        29.195       1.915   15.244   < 2e-16 ***

Residual standard error: 161.9 on 35 degrees of freedom
Multiple R-squared: 0.8691

Weight on Height

Call:
lm(formula = Weight ~ Height, data = FDatSub2)

Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  -1251.17      212.17   -5.897  1.06e-06 ***
Height         232.25       28.99    8.012  1.98e-09 ***

Residual standard error: 265.8 on 35 degrees of freedom
Multiple R-squared: 0.6472

Weight on Diagonal Width

Call:
lm(formula = Weight ~ Width, data = FDatSub2)

Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  -1079.20      118.37   -9.117  8.99e-11 ***
Width          345.74       26.52   13.038  5.38e-15 ***

Residual standard error: 184.9 on 35 degrees of freedom
Multiple R-squared: 0.8293
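Two relationships help when reading these outputs. In a regression with a single predictor, Multiple R-squared is the square of the correlation between the predictor and the response, so its square root (with the sign of the slope) can be compared with the overall correlations in the pairs plot. The t value is the estimate divided by its standard error. The Python sketch below recomputes both from the printed numbers. The names MODELS, correlation_from_r2, and t_value are illustrative assumptions, and the recomputed t values can differ from the printed ones in the last digit because the printed inputs are rounded.

```python
import math

# Slope estimate, standard error, and Multiple R-squared for each
# single variable model, copied from the lm() output.
MODELS = {
    "Length3": {"slope": 29.195, "se": 1.915, "r2": 0.8691},
    "Height":  {"slope": 232.25, "se": 28.99, "r2": 0.6472},
    "Width":   {"slope": 345.74, "se": 26.52, "r2": 0.8293},
}


def correlation_from_r2(r2, slope):
    """Simple linear regression only: r = sign(slope) * sqrt(R^2)."""
    return math.copysign(math.sqrt(r2), slope)


def t_value(estimate, std_error):
    """t statistic for the null hypothesis that the coefficient is zero."""
    return estimate / std_error


for name, m in MODELS.items():
    r = correlation_from_r2(m["r2"], m["slope"])
    t = t_value(m["slope"], m["se"])
    print(f"{name:8s} R^2 = {m['r2']:.4f}   r = {r:.3f}   t = {t:.2f}")
```

Because all three models use the same response and the same 37 fish, their R-squared values are directly comparable: the model with the largest R-squared leaves the smallest share of the variation in Weight unexplained.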
Scatterplots with Regression Lines

Figure 7 shows scatterplots of Weight on each of the three predictor variables with regression lines. All scatterplots have three lines. The salmon and cyan lines are the regression lines fit to the individual species (see page 12). The black lines are the regression lines for the single variable regressions on the previous page. How to set up a model that allows for different slopes is explained on page 12. We chose to include this as an extension to HW3, which allowed two parallel lines, to show you there is more you can do.

Figure 7. Scatterplots of Weight on each of the predictors with fitted regression lines.
Residual Diagnostic Plots

Figure 8 shows residual plots and histograms of the residuals for each of the univariate regressions (black lines in Figure 7).

Figure 8. Residual plots and histograms of residuals for each of the three univariate regressions: weight on cross length (top row), weight on height (middle row), and weight on diagonal width (bottom row).
Linear Model Output Accounting for Species

Weight on Width and Species allowing for different slopes

In this regression, we add a second variable, the categorical variable Species, to the model as we did in HW3, but this time we allow for different slopes for each species (as opposed to requiring parallel lines with different intercepts).

Remember, adding a categorical variable allows us to fit two lines, one for each species. The first level of the categorical variable (here Pike) is absorbed into the y-intercept. Coefficients for the other levels show up in the output. In the output below, we have an additional line for SpeciesRoach and a line for Width:SpeciesRoach. The SpeciesRoach coefficient is added to the y-intercept to get the intercept for Roach. The Width:SpeciesRoach coefficient is added to the slope to get the slope for Roach. The two regression lines that result from this model are shown underneath the output.

When we add a categorical variable to the model using an asterisk (*) instead of a plus sign (+), we allow for different slopes for each level of the categorical variable. The extra term that * adds to the model (Width:SpeciesRoach in the output) is called an interaction term.

Call:
lm(formula = Weight ~ Width * Species, data = FDatSub2)

Coefficients:
                    Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)         -1308.58      156.91   -8.340  1.24e-09 ***
Width                 398.57       30.14   13.222  9.74e-15 ***
SpeciesRoach         1025.61      231.30    4.434  9.67e-05 ***
Width:SpeciesRoach   -279.64       54.74   -5.109  1.34e-05 ***

Residual standard error: 137.5 on 33 degrees of freedom
Multiple R-squared: 0.911

Regression equation for Pike:
y = -1308.58 + 398.57 (Width)

Regression equation for Roach:
y = (-1308.58 + 1025.61) + (398.57 - 279.64) (Width)
y = -282.97 + 118.93 (Width)
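The way the interaction coefficients combine into one line per species can be sketched in a few lines of Python. This is an illustrative sketch only: it uses the rounded coefficients printed in the output, treats Pike as the baseline level, and the names COEF, species_line, and predict_weight are assumptions introduced here, not part of the handout or of R.

```python
# Coefficients copied from the lm(Weight ~ Width * Species) output.
# Pike is the baseline level, so it has no separate terms.
COEF = {
    "(Intercept)": -1308.58,
    "Width": 398.57,
    "SpeciesRoach": 1025.61,
    "Width:SpeciesRoach": -279.64,
}


def species_line(species, coef=COEF):
    """Return (intercept, slope) of the fitted line for 'Pike' or 'Roach'."""
    intercept = coef["(Intercept)"]
    slope = coef["Width"]
    if species == "Roach":
        # The Roach adjustments are added to the baseline (Pike) values.
        intercept += coef["SpeciesRoach"]
        slope += coef["Width:SpeciesRoach"]
    elif species != "Pike":
        raise ValueError(f"unknown species: {species!r}")
    return intercept, slope


def predict_weight(width_cm, species):
    """Fitted Weight in grams for a fish of the given diagonal width in cm."""
    intercept, slope = species_line(species)
    return intercept + slope * width_cm


for sp in ("Pike", "Roach"):
    b0, b1 = species_line(sp)
    print(f"{sp}: Weight = {b0:.2f} + {b1:.2f} * Width")
```

With the rounded coefficients, the Roach slope works out to 398.57 - 279.64 = 118.93; a value computed from unrounded coefficients may differ in the last digit. Predictions are only meaningful for widths within the range observed for each species.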