R-Project


School

University of Wisconsin, Milwaukee

Course

210

Subject

Statistics

Date

Feb 20, 2024


Uploaded by mcmahonlauren82

R Project
Lauren McMahon

This code loads the "baseball.xlsx" data. Try executing this chunk by clicking the Run button within the chunk or by placing your cursor inside it and pressing Cmd+Shift+Enter.

library(readxl)
baseball <- read_excel("Downloads/baseball.xlsx")
# View(baseball)

The following code examines the numeric data in the baseball data set using scatterplots.

pairs(Filter(is.numeric, baseball))

Which variables appear most strongly related to Wins? Are the relationships positive or negative? Describe these relationships below the code chunk.

The scatterplot of Wins against Runs Allowed shows a strong, negative relationship: as runs allowed increase, wins decrease.
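As a numeric complement to the scatterplot matrix, the correlation of each numeric column with Wins can be listed and sorted by strength. The sign gives the direction of each relationship. This is a minimal sketch and not part of the original report. The helper name `cor_with_target` is hypothetical. Because the baseball data are not reproduced here, the demonstration uses R's built-in `mtcars` data set as a stand-in. With the report's data, the call would be `cor_with_target(baseball, "Wins")`.

```r
# Hypothetical helper (not from the original report): correlation of every
# numeric column with a target column, sorted by absolute strength.
cor_with_target <- function(df, target) {
  num <- Filter(is.numeric, df)                           # numeric columns only, as pairs() used
  r <- cor(num, use = "pairwise.complete.obs")[, target]  # each column's correlation with the target
  r <- r[names(r) != target]                              # drop the target's correlation with itself
  r[order(abs(r), decreasing = TRUE)]                     # strongest first; the sign gives the direction
}

# Demonstration on the built-in mtcars data (a stand-in for the baseball data)
res <- cor_with_target(mtcars, "mpg")
print(round(res, 3))
```

A correlation only measures a linear association between two variables at a time. The regression below is still needed to see how each variable behaves when the others are held constant.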
regOutput <- lm(Wins ~ ., data = Filter(is.numeric, baseball))
summary(regOutput)

##
## Call:
## lm(formula = Wins ~ ., data = Filter(is.numeric, baseball))
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -5.3415 -2.7566 -0.3218  2.2175 10.2182
##
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)
## (Intercept)   84.294336  28.416158   2.966   0.0049 **
## Runs_Scored    0.072351   0.031029   2.332   0.0245 *
## Runs_Allowed  -0.115252   0.008437 -13.661   <2e-16 ***
## On_Base_Pct   13.043916 140.741613   0.093   0.9266
## Slug_Pct      84.185917  70.375010   1.196   0.2382
## Batt_Avg     -44.276845 116.159243  -0.381   0.7050
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.102 on 43 degrees of freedom
## Multiple R-squared:  0.8904, Adjusted R-squared:  0.8777
## F-statistic: 69.87 on 5 and 43 DF,  p-value: < 2.2e-16

Is the overall fit of the model high or low? What tells you this?

The overall fit is high. The multiple R-squared is 0.8904, so the model explains about 89% of the variation in Wins. The F-statistic (69.87 on 5 and 43 DF, p-value < 2.2e-16) also shows that the predictors are jointly significant.

Is the model fit due to the number of variables? How can you tell?

No, the fit is not simply a result of having many variables. The adjusted R-squared (0.8777) penalizes additional predictors, and it is only slightly lower than the multiple R-squared (0.8904). The fit therefore comes from variables that genuinely contribute, even though only some of them do.

What are the most significant variables in your model? What information tells you this?

Runs Allowed is the most significant variable, with a strong negative effect on the number of wins. Its estimate is -0.115252, meaning about 0.115 fewer wins per additional run allowed, holding the other variables constant. Its t value is -13.661 and its p-value is extremely small (p < 2e-16). Runs Scored is also significant at the 0.05 level (p = 0.0245), with a positive estimate of 0.072351.

Do you see anything surprising in the results? (Hint: Do the coefficient signs make sense?)

The surprising results involve On Base Percentage, Slugging Percentage, and Batting Average. All three have coefficients with very large standard errors (140.74, 70.38, and 116.16) and high p-values (0.9266, 0.2382, and 0.7050), so none of them is significant. In addition, Batting Average has a negative coefficient (-44.276845). This does not make sense, because a higher batting average should not lead to fewer wins.

library(car)
## Loading required package: carData
car::vif(regOutput)
##  Runs_Scored Runs_Allowed  On_Base_Pct     Slug_Pct     Batt_Avg
##    11.437587     1.179100     7.339887     7.908608     4.954558

Does there appear to be a problem with multicollinearity? How can you tell?
There does appear to be a problem with multicollinearity. A common rule of thumb treats a VIF above 5 (or, more leniently, above 10) as a sign of multicollinearity.

- Runs Scored has a VIF of 11.44.
- On Base Percentage and Slugging Percentage have VIFs of 7.34 and 7.91.
- Batting Average is close to the cutoff at 4.95.
- Only Runs Allowed (1.18) is clearly unaffected.

Do these results explain anything about the results we saw in our model summary?

These results explain some of the observations in the model summary. Runs Scored and Runs Allowed are the most important variables for predicting Wins. The other variables overlap heavily with each other and with Runs Scored, as their high VIFs show, and this inflates the standard errors of their coefficients. That explains their high p-values, their apparent lack of importance, and the unexpected negative sign on Batting Average.

library(olsrr)
##
## Attaching package: 'olsrr'
## The following object is masked from 'package:datasets':
##
##     rivers
# Plot residuals
ols_plot_resid_fit(regOutput)
# Create QQ-plot of model residuals
qqPlot(residuals(regOutput))
## [1] 3 8

Are ALL the assumptions of the model satisfied? Explain.

Yes. The points in the QQ-plot trend upward, which suggests that the residuals are approximately normally distributed. Strictly, normality is supported only if the points stay close to the QQ-plot's reference line, since any QQ-plot trends upward. The residuals-vs-fitted plot should also show no pattern and a roughly constant spread. The output [1] 3 8 lists observations 3 and 8, which qqPlot() flags as the most extreme points.

Using our regression model, how many wins would we predict a team would have in a given season if we knew the following statistics:

Runs Scored = 680
Runs Allowed = 600
On Base Pct = .301
Slug Pct = .399
Batt Avg = .255

Predicted Wins = 84.294336 + (0.072351 × 680) + (-0.115252 × 600) + (13.043916 × 0.301) + (84.185917 × 0.399) + (-44.276845 × 0.255)
= 84.294336 + 49.19868 - 69.1512 + 3.926219 + 33.590181 - 11.290595
= 90.5676

The model predicts about 90.6 wins.

Add a new chunk by clicking the Insert Chunk button on the toolbar or by pressing Cmd+Option+I.

When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the Preview button or press Cmd+Shift+K to preview the HTML file).

The preview shows you a rendered HTML copy of the contents of the editor. Consequently, unlike Knit, Preview does not run any R code chunks. Instead, the output of the chunk when it was last run in the editor is displayed.
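The hand calculation of the predicted wins (90.5676) can be checked in R. This sketch is an illustration and not part of the original report. It multiplies the coefficients printed by summary(regOutput), rounded to six decimals, by the team's statistics and adds up the terms. The names `b`, `x`, and `contrib` are hypothetical.

```r
# Coefficients copied from the summary(regOutput) output (rounded to six decimals)
b <- c(`(Intercept)` = 84.294336,
       Runs_Scored   = 0.072351,
       Runs_Allowed  = -0.115252,
       On_Base_Pct   = 13.043916,
       Slug_Pct      = 84.185917,
       Batt_Avg      = -44.276845)

# The team's statistics; the leading 1 multiplies the intercept
x <- c(`(Intercept)` = 1,
       Runs_Scored   = 680,
       Runs_Allowed  = 600,
       On_Base_Pct   = 0.301,
       Slug_Pct      = 0.399,
       Batt_Avg      = 0.255)

stopifnot(identical(names(b), names(x)))  # make sure each coefficient meets its own predictor

contrib <- b * x          # each term of the prediction equation
print(round(contrib, 6))  # the individual terms
pred <- sum(contrib)      # predicted wins
print(round(pred, 4))     # 90.5676
```

With the fitted model object available, `predict(regOutput, newdata = data.frame(Runs_Scored = 680, Runs_Allowed = 600, On_Base_Pct = 0.301, Slug_Pct = 0.399, Batt_Avg = 0.255))` should give nearly the same value, since it uses the unrounded coefficients. Adding `interval = "prediction"` would also return a prediction interval around that estimate.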