Handout3Q5_solution
School: University of Washington
Course: 311
Subject: Statistics
Date: Feb 20, 2024
Type: pdf, 8 pages
Uploaded by PrivateIron365

Handout 3 Question 5 Solutions
STAT 311 – Winter 2024
2024-02-08

Question 5

The data for this question are in the Love and Money file posted on Canvas.

#load the packages and the data
library(ggplot2)  # ggplot() and the geom_*() layers
library(broom)    # tidy() and augment()
Love_and_Money <- read.csv("Yan_LM.csv")

A local flower shop wanted to see if it could make better use of its advertising dollars, so it surveyed several customers who came to the shop before Valentine's Day. Customers were asked to rate how much they loved their partner, and the shop kept track of how much money each of them spent there.

Primary interest: Is there a relationship between love and money?

(a) Describe the relationship between the two variables (Love and MoneySpent) with respect to linearity, direction and strength of the relationship, and the presence of outliers.

ggplot(Love_and_Money, aes(x = Love, y = MoneySpent)) +
  geom_point() +
  labs(title = "MoneySpent versus Love", x = "Love", y = "MoneySpent")
[Figure: scatterplot "MoneySpent versus Love" (x = Love, y = MoneySpent)]

Relationship: Linear
Direction: Positive
Strength: Strong
Outliers: None

(b) Calculate r and R² and interpret the results.

#correlation r
cor(Love_and_Money$Love, Love_and_Money$MoneySpent)
## [1] 0.9993063
#R^2
cor(Love_and_Money$Love, Love_and_Money$MoneySpent)^2
## [1] 0.998613

The correlation is r = 0.9993, so there is a strong, positive, linear relationship between Love and MoneySpent. R² = 0.9986, so 99.86% of the variability in the amount of money spent is explained by the linear regression of MoneySpent on Love.

(c) Calculate the slope and intercept of the least-squares regression line for these data.

model <- lm(MoneySpent ~ Love, data = Love_and_Money)
tidy(model)
#summary(model)
## # A tibble: 2 x 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)    25.9     1.09        23.8 3.94e-20
## 2 Love            2.50    0.0176     142.  1.46e-41

$\widehat{\text{MoneySpent}} = 25.95 + 2.5 \times \text{Love}$

(d) Interpret the slope and intercept of the least-squares regression line in the context of this problem.

Intercept: When Love = 0, MoneySpent is expected to equal 25.95 on average.
Slope: For each one-unit increase in Love, MoneySpent is expected to increase by 2.5 on average.

(e) Check all conditions and comment on your findings.

model.aug <- augment(model)

1. Linearity: the relationship between the explanatory variable (Love) and the response variable (MoneySpent) should be linear.

ggplot(Love_and_Money, aes(x = Love, y = MoneySpent)) +
  geom_point() +
  labs(title = "MoneySpent versus Love", x = "Love", y = "MoneySpent")
[Figure: scatterplot "MoneySpent versus Love" (x = Love, y = MoneySpent)]

From the scatterplot above, it appears that the linearity condition is met.

2. Nearly normal residuals

ggplot(data = model.aug) +
  aes(x = .resid) +
  geom_histogram(binwidth = 0.4) +
  ggtitle("Frequency of Residuals") +
  xlab("Residuals")
[Figure: histogram "Frequency of Residuals" (x = Residuals, y = count)]

From the histogram of the residuals, it appears that the nearly normal residuals condition is violated.

3. Constant variability

ggplot(data = model.aug) +
  aes(x = .fitted, y = .resid) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  ggtitle("Residuals vs. Fitted") +
  xlab("Fitted Values") +
  ylab("Residuals") +
  ylim(-10, 300)  # y-axis stretched to roughly the scale of MoneySpent
[Figure: residuals vs. fitted plot "Residuals vs. Fitted" (x = Fitted Values, y = Residuals, y-axis from -10 to 300)]

From the residuals vs. fitted plot, it appears that the variability of the residuals is not constant. However, the residuals are very small compared with the values of MoneySpent, which is why they look nearly flat when plotted on the scale of MoneySpent. Therefore, if you look at the scatterplot of MoneySpent versus Love with the least-squares line added,

ggplot(Love_and_Money, aes(x = Love, y = MoneySpent)) +
  geom_point() +
  labs(title = "MoneySpent versus Love", x = "Love", y = "MoneySpent") +
  geom_smooth(method = "lm", se = FALSE)
## `geom_smooth()` using formula = 'y ~ x'
[Figure: scatterplot "MoneySpent versus Love" with the least-squares line in blue]

the variability of points around the least-squares (blue) line appears roughly constant.

(f) A statistical consultant thought there had to be more to this relationship and conducted a residual analysis to see whether there were any underlying patterns that she could incorporate into her model. Conduct a residual analysis and identify any patterns that you may find.

ggplot(data = model.aug) +
  aes(x = .fitted, y = .resid) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  ggtitle("Residuals vs. Fitted") +
  xlab("Fitted Values") +
  ylab("Residuals")
[Figure: residuals vs. fitted plot "Residuals vs. Fitted" (x = Fitted Values, y = Residuals)]

On their own scale, the residuals show a clear pattern: they trace out the shape of a heart.
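How can a fit with r = 0.9993 still hide a pattern? The sketch below illustrates this on simulated data, not on the handout's data. Everything in it is a hypothetical choice made for illustration: the data frame name sim, the heart-curve "noise", the 0.3 scale factor, and the range of Love. It builds a near-perfect linear relationship (coefficients chosen to echo the fitted line 25.95 + 2.5 × Love) and hides a heart curve in the errors. The result is that r and R² look excellent while the residuals vs. fitted plot reveals the heart. It uses base R graphics rather than ggplot2 so it runs without extra packages.

```r
# HYPOTHETICAL simulated data (NOT the Love_and_Money data from Canvas):
# a nearly perfect linear trend plus a small, deliberately heart-shaped
# error term, to show how a residual plot can expose structure that
# r and R^2 hide.

# 60 equally spaced angles on [0, 2*pi), dropping the duplicate endpoint
t <- seq(0, 2 * pi, length.out = 61)[-61]

# A standard parametric heart curve
heart_x <- 16 * sin(t)^3
heart_y <- 13 * cos(t) - 5 * cos(2 * t) - 2 * cos(3 * t) - cos(4 * t)

# Love is a linear rescaling of the heart's x-coordinate (about 2 to 98);
# the heart's y-coordinate, shrunk by the hypothetical factor 0.3, is the "noise"
noise <- 0.3 * heart_y
sim <- data.frame(Love = 50 + 3 * heart_x)
sim$MoneySpent <- 25.95 + 2.5 * sim$Love + noise

fit <- lm(MoneySpent ~ Love, data = sim)

# r, R^2 and the fitted coefficients all look excellent ...
r <- cor(sim$Love, sim$MoneySpent)
print(c(r = r, R2 = summary(fit)$r.squared))
print(coef(fit))

# ... but plotting residuals against fitted values traces out the heart
plot(fitted(fit), resid(fit),
     xlab = "Fitted Values", ylab = "Residuals", main = "Residuals vs. Fitted")
abline(h = 0, lty = "dashed", col = "red")
```

Because the simulated heart is symmetric about its vertical axis, the hidden term is uncorrelated with Love. Least squares therefore recovers the slope 2.5 and leaves the heart essentially untouched in the residuals. As in part (b), squaring r reproduces the R² reported by summary(fit).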