PS 15

School

University of California, Berkeley

Subject

Industrial Engineering

Date

Jan 9, 2024

Uploaded by nathen1076225

11/17/23, 9:35 AM PS 15 https://stat20.datahub.berkeley.edu/user/nathen1076225/rstudio/p/f3e8bb00/

AUTHOR: Nathen Hadgu

    library(tidyverse)
    library(broom)

    ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
    ✔ dplyr     1.1.3     ✔ readr     2.1.4
    ✔ forcats   1.0.0     ✔ stringr   1.5.0
    ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
    ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
    ✔ purrr     1.0.2
    ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
    ✖ dplyr::filter() masks stats::filter()
    ✖ dplyr::lag()    masks stats::lag()
    ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

    # Read the training and testing sets
    train <- read_csv('https://raw.githubusercontent.com/idc9/course-materials/main/3-prediction/14-ov
    test <- read_csv('https://raw.githubusercontent.com/idc9/course-materials/main/3-prediction/14-ove

    Rows: 195 Columns: 2
    ── Column specification ────────────────────────────────────────────────────────
    Delimiter: ","
    dbl (2): x, y
    ℹ Use `spec()` to retrieve the full column specification for this data.
    ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

    Rows: 193 Columns: 2
    ── Column specification ────────────────────────────────────────────────────────
    Delimiter: ","
    dbl (2): x, y
    ℹ Use `spec()` to retrieve the full column specification for this data.
    ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

    # Simple linear model fit on the training set, with its training R-squared
    lm_train <- lm(y ~ x, data = train)
    glance(lm_train) %>%
      select(r.squared)

    # A tibble: 1 × 1
      r.squared
          <dbl>
    1     0.735

    # Degree-20 polynomial model fit on the training set, with its training R-squared
    lm_training <- lm(y ~ poly(x, degree = 20, raw = TRUE), data = train)
    glance(lm_training) %>%
      select(r.squared)

    # A tibble: 1 × 1
      r.squared
          <dbl>
    1     0.992

    # Predict on the test set with both models
    y_pred_linear <- predict(lm_train, newdata = test)
    y_pred_poly <- predict(lm_training, newdata = test)

    # Testing R-squared for each model, computed as 1 - RSS / TSS
    test %>%
      mutate(y_pred_linear = y_pred_linear,
             y_pred_poly = y_pred_poly,
             resid_sq_linear = (y - y_pred_linear)^2,
             resid_sq_poly = (y - y_pred_poly)^2) %>%
      summarize(TSS = sum((y - mean(y))^2),
                RSS_linear = sum(resid_sq_linear),
                RSS_poly = sum(resid_sq_poly)) %>%
      mutate(Rsq_linear = 1 - RSS_linear / TSS,
             Rsq_poly = 1 - RSS_poly / TSS) %>%
      select(Rsq_linear, Rsq_poly)

    # A tibble: 1 × 2
      Rsq_linear Rsq_poly
           <dbl>    <dbl>
    1      0.726    0.889

Both the training and testing R-squared values are higher for the degree-20 polynomial model than for the simple linear model: 0.992 versus 0.735 on the training set, and 0.889 versus 0.726 on the test set. What drives the difference is that a polynomial model can capture nonlinear relationships between x and y that a single straight line cannot. The polynomial model's R-squared also drops more between training and testing (0.992 to 0.889) than the linear model's (0.735 to 0.726), which suggests that part of its training fit does not carry over to new data.
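The test-set calculation above (R-squared as 1 - RSS / TSS on held-out data) can be wrapped in a small helper so the same comparison is easy to repeat. What follows is a minimal base R sketch, not the course's own code: the helper name `oos_rsq`, the simulated data, and the degree-5 polynomial are all assumptions made for illustration, since the course CSV files are not reproduced here.

```r
# Hypothetical helper: R-squared of `model` evaluated on `newdata`,
# computed as 1 - RSS / TSS with TSS taken around the mean of newdata's response.
oos_rsq <- function(model, newdata, response = "y") {
  y <- newdata[[response]]
  y_hat <- predict(model, newdata = newdata)
  rss <- sum((y - y_hat)^2)
  tss <- sum((y - mean(y))^2)
  1 - rss / tss
}

# Simulated data (an assumption; this stands in for the course train/test files).
set.seed(1)
make_data <- function(n) {
  x <- runif(n, -2, 2)
  data.frame(x = x, y = sin(2 * x) + rnorm(n, sd = 0.3))
}
train_sim <- make_data(200)
test_sim  <- make_data(200)

# A straight-line fit and a polynomial fit, both estimated on the training set only.
fit_linear <- lm(y ~ x, data = train_sim)
fit_poly   <- lm(y ~ poly(x, degree = 5), data = train_sim)

# Training and testing R-squared side by side for the two models.
results <- data.frame(
  model     = c("linear", "poly(5)"),
  train_rsq = c(oos_rsq(fit_linear, train_sim), oos_rsq(fit_poly, train_sim)),
  test_rsq  = c(oos_rsq(fit_linear, test_sim),  oos_rsq(fit_poly, test_sim))
)
print(results)
```

When the helper is applied to the training data of a model with an intercept, it reproduces the `r.squared` reported by `summary()` or `glance()`. Comparing the `train_rsq` and `test_rsq` columns for each model shows how much of the training fit carries over to new data.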
