PS 15
School: University of California, Berkeley
Course: 20
Subject: Industrial Engineering
Date: Jan 9, 2024
Pages: 2
Uploaded by nathen1076225
11/17/23, 9:35 AM
PS 15
https://stat20.datahub.berkeley.edu/user/nathen1076225/rstudio/p/f3e8bb00/
1/2
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.3     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.0
✔ purrr     1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Rows: 195 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (2): x, y
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 193 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (2): x, y
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
AUTHOR
Nathen Hadgu

library(tidyverse)
library(broom)

train <- read_csv('https://raw.githubusercontent.com/idc9/course-materials/main/3-prediction/14-ov
test <- read_csv('https://raw.githubusercontent.com/idc9/course-materials/main/3-prediction/14-ove

lm_train <- lm(y ~ x, data = train)
glance(lm_train) %>%
  select(r.squared)

# A tibble: 1 × 1
  r.squared
      <dbl>
1     0.735
# A tibble: 1 × 1
  r.squared
      <dbl>
1     0.992
# A tibble: 1 × 2
  Rsq_linear Rsq_poly
       <dbl>    <dbl>
1      0.726    0.889
Both the training and testing R-squared values are higher for the polynomial model than for the simple linear model (training: 0.992 vs. 0.735; testing: 0.889 vs. 0.726). What drives the difference is that a polynomial model can capture non-linear relationships between the variables that a simple linear model cannot.
lm_training <- lm(y ~ poly(x, degree = 20, raw = T), data = train)
glance(lm_training) %>%
  select(r.squared)
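For reference, the degree-20 polynomial fit above, which uses raw (non-orthogonal) powers of x, can be sketched as the regression model below. The notation is an assumption introduced here, not taken from the source: the coefficients \beta_j, the error term \varepsilon_i, and the row index i are illustrative labels.

```latex
\[
  y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^{2} + \cdots + \beta_{20} x_i^{20} + \varepsilon_i
      = \beta_0 + \sum_{j=1}^{20} \beta_j x_i^{j} + \varepsilon_i
\]
```

Setting \beta_2 = \cdots = \beta_{20} = 0 recovers the straight-line model y ~ x, so the linear model is nested inside the polynomial one. The polynomial fit's training R-squared therefore cannot be lower than the linear fit's on the same training data.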
y_pred_linear <- predict(lm_train, newdata = test)
y_pred_poly <- predict(lm_training, newdata = test)

test %>%
  mutate(
    y_pred_linear = y_pred_linear,
    y_pred_poly = y_pred_poly,
    resid_sq_linear = (y - y_pred_linear)^2,
    resid_sq_poly = (y - y_pred_poly)^2
  ) %>%
  summarize(
    TSS = sum((y - mean(y))^2),
    RSS_linear = sum(resid_sq_linear),
    RSS_poly = sum(resid_sq_poly)
  ) %>%
  mutate(
    Rsq_linear = 1 - RSS_linear / TSS,
    Rsq_poly = 1 - RSS_poly / TSS
  ) %>%
  select(Rsq_linear, Rsq_poly)
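The pipeline above computes a test-set R-squared for each model by hand. Written out as a formula, it amounts to the following sketch. The symbols are assumptions introduced here for illustration: n is the number of rows in test, \bar{y} is the mean of y in test, and \hat{y}_i is the prediction for test row i from a model fit on train.

```latex
\[
  \mathrm{TSS} = \sum_{i=1}^{n} \left( y_i - \bar{y} \right)^{2},
  \qquad
  \mathrm{RSS} = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^{2},
  \qquad
  R^{2}_{\text{test}} = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}}
\]
```

TSS is shared by both models because it depends only on the test responses, so the model with the smaller RSS on the test set has the larger test R-squared.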