PS 15

School

University of California, Berkeley

Course

20

Subject

Industrial Engineering

Date

Jan 9, 2024

Uploaded by nathen1076225

11/17/23, 9:35 AM PS 15 https://stat20.datahub.berkeley.edu/user/nathen1076225/rstudio/p/f3e8bb00/

Author: Nathen Hadgu

    library(tidyverse)
    library(broom)

    # Load the training and testing sets (both URLs are truncated in the source).
    train <- read_csv('https://raw.githubusercontent.com/idc9/course-materials/main/3-prediction/14-ov
    test <- read_csv('https://raw.githubusercontent.com/idc9/course-materials/main/3-prediction/14-ove

    # Fit the simple linear model and report its training R-squared.
    lm_train <- lm(y ~ x, data = train)
    glance(lm_train) %>% select(r.squared)

Output:

    ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
    dplyr     1.1.3     readr     2.1.4
    forcats   1.0.0     stringr   1.5.0
    ggplot2   3.4.4     tibble    3.2.1
    lubridate 1.9.3     tidyr     1.3.0
    purrr     1.0.2
    ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
    dplyr::filter() masks stats::filter()
    dplyr::lag()    masks stats::lag()
    Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

    Rows: 195 Columns: 2
    ── Column specification ────────────────────────────────────────────────────────
    Delimiter: ","
    dbl (2): x, y

    Use `spec()` to retrieve the full column specification for this data.
    Specify the column types or set `show_col_types = FALSE` to quiet this message.

    Rows: 193 Columns: 2
    ── Column specification ────────────────────────────────────────────────────────
    Delimiter: ","
    dbl (2): x, y

    Use `spec()` to retrieve the full column specification for this data.
    Specify the column types or set `show_col_types = FALSE` to quiet this message.

    # A tibble: 1 × 1
      r.squared
          <dbl>
    1     0.735

    # Fit a degree-20 polynomial model and report its training R-squared.
    lm_training <- lm(y ~ poly(x, degree = 20, raw = TRUE), data = train)
    glance(lm_training) %>% select(r.squared)

    # Predict on the testing set with both models.
    y_pred_linear <- predict(lm_train, newdata = test)
    y_pred_poly <- predict(lm_training, newdata = test)

    # Testing R-squared for each model: 1 - RSS / TSS.
    test %>%
      mutate(y_pred_linear = y_pred_linear,
             y_pred_poly = y_pred_poly,
             resid_sq_linear = (y - y_pred_linear)^2,
             resid_sq_poly = (y - y_pred_poly)^2) %>%
      summarize(TSS = sum((y - mean(y))^2),
                RSS_linear = sum(resid_sq_linear),
                RSS_poly = sum(resid_sq_poly)) %>%
      mutate(Rsq_linear = 1 - RSS_linear / TSS,
             Rsq_poly = 1 - RSS_poly / TSS) %>%
      select(Rsq_linear, Rsq_poly)

Output:

    # A tibble: 1 × 1
      r.squared
          <dbl>
    1     0.992

    # A tibble: 1 × 2
      Rsq_linear Rsq_poly
           <dbl>    <dbl>
    1      0.726    0.889

Both the training and testing R-squared values are higher for the polynomial model than for the simple linear model (training: 0.992 vs. 0.735; testing: 0.889 vs. 0.726). What drives the difference between the two models is that a polynomial model can capture nonlinear relationships between the variables, which a simple linear model cannot. The polynomial model's R-squared also drops more from training to testing (0.992 to 0.889, compared with 0.735 to 0.726 for the linear model), which is consistent with some overfitting to the training data.
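The testing R-squared calculation above (1 - RSS / TSS on held-out data) can be wrapped in a small helper and tried on simulated data. The following is only a sketch under stated assumptions: `test_rsq()`, `sim()`, `train_sim`, and `test_sim` are hypothetical names that do not appear in the problem set, the data are simulated rather than the course CSVs, and a degree-5 orthogonal polynomial stands in for the degree-20 raw polynomial to keep the fit numerically stable.

```r
# Hypothetical helper (not part of the original problem set):
# R-squared of a fitted model on arbitrary data, computed as 1 - RSS / TSS.
test_rsq <- function(model, newdata, response = "y") {
  y <- newdata[[response]]
  y_pred <- predict(model, newdata = newdata)
  rss <- sum((y - y_pred)^2)   # residual sum of squares on newdata
  tss <- sum((y - mean(y))^2)  # total sum of squares on newdata
  1 - rss / tss
}

# Simulated stand-in data (assumption: a nonlinear signal plus noise).
set.seed(1)
sim <- function(n) {
  x <- runif(n, -2, 2)
  data.frame(x = x, y = sin(2 * x) + rnorm(n, sd = 0.3))
}
train_sim <- sim(200)
test_sim <- sim(200)

# Simple linear model vs. polynomial model, both fit on the training data only.
fit_linear <- lm(y ~ x, data = train_sim)
fit_poly <- lm(y ~ poly(x, degree = 5), data = train_sim)

# Training and testing R-squared side by side.
rsq <- data.frame(
  model = c("linear", "poly"),
  train = c(summary(fit_linear)$r.squared, summary(fit_poly)$r.squared),
  test = c(test_rsq(fit_linear, test_sim), test_rsq(fit_poly, test_sim))
)
print(rsq)
```

Comparing the `train` and `test` columns for each model shows how much of the training fit carries over to new data; a large gap for the more flexible model would suggest overfitting.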