---
title: "HW 7"
author: "Junyu Sui"
output:
  pdf_document:
    number_sections: true
    df_print: paged
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, message = FALSE,
                      warning = FALSE, fig.align = 'center',
                      eval = TRUE)
```
You can run the following code to prepare the analysis.
```{r}
library(r02pro)
#INSTALL IF NECESSARY
library(tidyverse)
#INSTALL IF NECESSARY
library(MASS)
library(tree)
my_ahp <- ahp %>%
  dplyr::select(gar_car, liv_area, lot_area, oa_qual, sale_price) %>%
  na.omit() %>%
  mutate(type = factor(ifelse(sale_price > median(sale_price),
                              "Expensive", "Cheap")))
tr_ind <- 1:(nrow(my_ahp)/20)
my_ahp_train <- my_ahp[tr_ind, ]
my_ahp_test <- my_ahp[-tr_ind, ]
```
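The split above puts roughly the first 5% of the rows (`nrow(my_ahp)/20`) in the training set and the rest in the test set. As a quick sanity check, a minimal sketch using only the objects created above to report the sizes and class balance of the two sets:
```{r}
# Sizes of the training and test sets
dim(my_ahp_train)
dim(my_ahp_test)
# Class balance of the binary response in each set
table(my_ahp_train$type)
table(my_ahp_test$type)
```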
Suppose we want to use tree, bagging, random forest, and boosting to predict
`sale_price` and `type` using variables `gar_car`, `liv_area`, `lot_area`, and
`oa_qual`. Please answer the following questions.
1. Predict `sale_price` (a continuous response) using the training data
`my_ahp_train` with tree (with CV pruning), bagging, random forest, and boosting
(with CV for selecting the number of trees to be used). For each method, compute
the training and test MSE.
(For boosting, please set `n.trees = 5000, interaction.depth = 1, cv.folds = 5`)
```{r}
set.seed(1)  # cv.tree() uses random CV folds, so fix the seed for reproducibility
## Fit a regression tree and prune it to the size with the smallest CV deviance
sale.tree <- tree(sale_price ~ gar_car + liv_area + lot_area + oa_qual,
                  data = my_ahp_train)
cv.sale <- cv.tree(sale.tree)
bestsize <- cv.sale$size[which.min(cv.sale$dev)]
sale.tree.prune <- prune.tree(sale.tree, best = bestsize)
plot(sale.tree.prune)
text(sale.tree.prune)
## Training and test MSE of the pruned tree
prediction_train.tree <- predict(sale.tree.prune, newdata = my_ahp_train)
mean((my_ahp_train$sale_price - prediction_train.tree)^2)
prediction_test.tree <- predict(sale.tree.prune, newdata = my_ahp_test)
mean((my_ahp_test$sale_price - prediction_test.tree)^2)
```
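To verify that `bestsize` indeed minimizes the cross-validated error, the CV deviance can be plotted against subtree size; this is only a diagnostic sketch built from the `cv.sale` object fitted above:
```{r}
# CV deviance as a function of subtree size, with the selected size marked
plot(cv.sale$size, cv.sale$dev, type = "b",
     xlab = "Tree size (terminal nodes)", ylab = "CV deviance")
abline(v = bestsize, lty = 2)
```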
```{r}
library(randomForest)
set.seed(1)
p <- ncol(my_ahp)-2
##Setting mtry = p for bagging
bag.sale <- randomForest(sale_price ~ gar_car + liv_area + lot_area + oa_qual,
                         data = my_ahp_train, mtry = p, importance = TRUE)
bag.sale
importance(bag.sale)
varImpPlot(bag.sale)
prediction_train.bag <- predict(bag.sale, newdata = my_ahp_train)
mean((my_ahp_train$sale_price - prediction_train.bag)^2)
prediction_test.bag <- predict(bag.sale,newdata = my_ahp_test)
mean((my_ahp_test$sale_price - prediction_test.bag)^2)
```
```{r}
set.seed(1)
rf.sale <- randomForest(sale_price ~ gar_car + liv_area + lot_area + oa_qual,
                        data = my_ahp_train, importance = TRUE)
prediction_train.rf <- predict(rf.sale, newdata = my_ahp_train)
mean((my_ahp_train$sale_price - prediction_train.rf)^2)
prediction_test.rf <- predict(rf.sale,newdata = my_ahp_test)
mean((my_ahp_test$sale_price - prediction_test.rf)^2)
```
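Note that `randomForest()` is called here without `mtry`, so it falls back to its default for regression, $\lfloor p/3 \rfloor$ (here 1 of the 4 predictors per split), which is what distinguishes this fit from the bagging fit above. The value actually used is stored in the fitted object:
```{r}
# Number of predictors tried at each split by the random forest above
rf.sale$mtry
```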
```{r}
library(gbm)
set.seed(1)
boost.sale <- gbm(sale_price ~ gar_car + liv_area + lot_area + oa_qual,
                  data = my_ahp_train, distribution = "gaussian",
                  n.trees = 5000, interaction.depth = 1, cv.folds = 5)
summary(boost.sale)
## Number of trees selected by 5-fold CV
best_n_trees <- gbm.perf(boost.sale, method = "cv")
prediction_train.boost <- predict(boost.sale, newdata = my_ahp_train,
                                  n.trees = best_n_trees)
mean((my_ahp_train$sale_price - prediction_train.boost)^2)
prediction_test.boost <- predict(boost.sale, newdata = my_ahp_test,
                                 n.trees = best_n_trees)
mean((my_ahp_test$sale_price - prediction_test.boost)^2)
```
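To compare the four regression methods side by side, the training and test MSEs computed above can be collected into a single table. This is only a convenience sketch; the helper `mse()` is introduced here for brevity and simply repeats the calculations shown in the chunks above:
```{r}
# Helper (introduced here): MSE of a prediction vector against observed sale_price
mse <- function(pred, data) mean((data$sale_price - pred)^2)
data.frame(
  method    = c("pruned tree", "bagging", "random forest", "boosting"),
  train_MSE = c(mse(prediction_train.tree,  my_ahp_train),
                mse(prediction_train.bag,   my_ahp_train),
                mse(prediction_train.rf,    my_ahp_train),
                mse(prediction_train.boost, my_ahp_train)),
  test_MSE  = c(mse(prediction_test.tree,  my_ahp_test),
                mse(prediction_test.bag,   my_ahp_test),
                mse(prediction_test.rf,    my_ahp_test),
                mse(prediction_test.boost, my_ahp_test))
)
```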
\newpage
2. Predict `type` (a binary response) using the training data `my_ahp_train` with
tree (with CV pruning), bagging, random forest, and boosting (with CV for selecting
the number of trees to be used). For each method, compute the training and test
classification error.
(For boosting, please set `n.trees = 5000, interaction.depth = 1, cv.folds = 5`)
```{r}
type.tree <- tree(type ~ gar_car + liv_area + lot_area + oa_qual,
                  data = my_ahp_train, split = "gini")
set.seed(0)
cv.type <- cv.tree(type.tree)
cv.type_df <- data.frame(size = cv.type$size, deviance = cv.type$dev)
best_size <- cv.type$size[which.min(cv.type$dev)]
type.tree.prune <- prune.tree(type.tree, best=best_size)
plot(type.tree.prune)
text(type.tree.prune)
## Use the pruned tree (not the unpruned one) for prediction
prediction_train.type.tree <- predict(type.tree.prune, my_ahp_train, type = "class")
mean(prediction_train.type.tree != my_ahp_train$type)
prediction_test.type.tree <- predict(type.tree.prune, my_ahp_test, type = "class")
mean(prediction_test.type.tree != my_ahp_test$type)
```
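Beyond the overall error rate, a confusion matrix shows how the pruned tree's errors are distributed across the two classes; a minimal sketch using the test predictions computed above:
```{r}
# Confusion matrix of the pruned classification tree on the test set
table(predicted = prediction_test.type.tree, actual = my_ahp_test$type)
```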
```{r}
library(randomForest)
set.seed(1)
p <- ncol(my_ahp)-2
##Setting mtry = p for bagging
bag.type <- randomForest(type ~ gar_car + liv_area + lot_area + oa_qual,
                         data = my_ahp_train, mtry = p, importance = TRUE)
bag.type
importance(bag.type)
varImpPlot(bag.type)
prediction_train.bag.type <- predict(bag.type, newdata = my_ahp_train)
mean(prediction_train.bag.type != my_ahp_train$type)
prediction_test.bag.type <- predict(bag.type,newdata = my_ahp_test)
mean(prediction_test.bag.type != my_ahp_test$type)
```
```{r}
set.seed(1)
rf.type <- randomForest(type ~ gar_car + liv_area + lot_area + oa_qual,
                        data = my_ahp_train, importance = TRUE)
prediction_train.rf.type <- predict(rf.type, newdata = my_ahp_train)
mean(prediction_train.rf.type != my_ahp_train$type)
prediction_test.rf.type <- predict(rf.type,newdata = my_ahp_test)
mean(prediction_test.rf.type != my_ahp_test$type)
```
```{r,warning=FALSE}
library(gbm)
set.seed(1)
boost.type <- gbm(type ~ gar_car + liv_area + lot_area + oa_qual,
                  data = my_ahp_train, distribution = "multinomial",
                  n.trees = 5000, interaction.depth = 1, cv.folds = 5)
summary(boost.type)
## Number of trees selected by 5-fold CV
best_n_trees_type <- gbm.perf(boost.type, method = "cv")
prediction_train.boost.type <- predict(boost.type, newdata = my_ahp_train,
                                       n.trees = best_n_trees_type, type = "response")
type.train.boost <- levels(my_ahp_train$type)[apply(prediction_train.boost.type,
                                                    1, which.max)]
mean(type.train.boost != my_ahp_train$type)
prediction_test.boost.type <- predict(boost.type, newdata = my_ahp_test,
                                      n.trees = best_n_trees_type, type = "response")
type.test.boost <- levels(my_ahp_test$type)[apply(prediction_test.boost.type,
                                                  1, which.max)]
mean(type.test.boost != my_ahp_test$type)
```
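As in Question 1, the four classification error rates can be gathered into one summary table. Again this is only a convenience sketch; the helper `err()` is introduced here and just repeats the error calculations already shown:
```{r}
# Helper (introduced here): misclassification rate of predicted labels vs. type
err <- function(pred, data) mean(pred != data$type)
data.frame(
  method      = c("pruned tree", "bagging", "random forest", "boosting"),
  train_error = c(err(prediction_train.type.tree, my_ahp_train),
                  err(prediction_train.bag.type,  my_ahp_train),
                  err(prediction_train.rf.type,   my_ahp_train),
                  err(type.train.boost,           my_ahp_train)),
  test_error  = c(err(prediction_test.type.tree, my_ahp_test),
                  err(prediction_test.bag.type,  my_ahp_test),
                  err(prediction_test.rf.type,   my_ahp_test),
                  err(type.test.boost,           my_ahp_test))
)
```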
\newpage
3. Question 8.4.2 on Page 362 in ISLRv2.
# In Algorithm 8.2, a depth-one tree (a stump) contains a single split, so each fitted tree $\hat{f}^b$ depends on only one predictor, i.e. $d = 1$ in every term of
$$
\hat{f}(x) = \sum_{b = 1}^{B}\lambda \hat{f}^b(x).
$$
# Therefore each term in the sum is a function of a single predictor, and the boosted fit is an additive model of the form $\hat{f}(x) = \sum_{j=1}^{p} f_j(x_j)$.
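As an illustration (using the `boost.sale` model from Question 1, which was fitted with `interaction.depth = 1`), the partial dependence of the fit on any single predictor traces out one of these additive component functions $f_j(x_j)$:
```{r}
# Partial dependence of the depth-one boosted fit on liv_area; with
# interaction.depth = 1 the model is a sum of such one-variable functions.
plot(boost.sale, i.var = "liv_area")
```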
\newpage
4. Question 8.4.5 on Page 362 in ISLRv2.
# Classify as Red when $P(\text{Red} \mid X) \ge 0.5$ and as Green otherwise. Under the majority-vote approach, six of the ten estimates are at least 0.5, so the prediction is Red:
```{r}
probs <- c(0.1,0.15,0.2,0.2,0.55,0.6,0.6,0.65,0.7,0.75)
ifelse(sum(probs >= 0.5) > sum(probs < 0.5), "Red", "Green")
```
# Under the average-probability approach, the mean of the ten estimates is 0.45 < 0.5, so the prediction is Green:
```{r}
mean(probs)
ifelse(mean(probs) >= 0.5, "Red", "Green")
```