---
title: "HW 7"
author: "Junyu Sui"
output:
  pdf_document:
    number_sections: true
    df_print: paged
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, message = FALSE,
                      warning = FALSE, fig.align = 'center',
                      eval = TRUE)
```
You can run the following code to prepare the analysis.
```{r}
library(r02pro)
#INSTALL IF NECESSARY
library(tidyverse)
#INSTALL IF NECESSARY
library(MASS)
library(tree)
my_ahp <- ahp %>%
  dplyr::select(gar_car, liv_area, lot_area, oa_qual, sale_price) %>%
  na.omit() %>%
  mutate(type = factor(ifelse(sale_price > median(sale_price),
                              "Expensive", "Cheap")))
tr_ind <- 1:(nrow(my_ahp)/20)
my_ahp_train <- my_ahp[tr_ind, ]
my_ahp_test <- my_ahp[-tr_ind, ]
```
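The split above puts roughly the first 5% of the rows (`nrow(my_ahp)/20`) in the training set and the rest in the test set. As a quick sanity check, a minimal sketch using only the objects created above to report the sizes and class balance of the two sets:
```{r}
# Sizes of the training and test sets
dim(my_ahp_train)
dim(my_ahp_test)
# Class balance of the binary response in each set
table(my_ahp_train$type)
table(my_ahp_test$type)
```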
Suppose we want to use tree, bagging, random forest, and boosting to predict
`sale_price` and `type` using variables `gar_car`, `liv_area`, `lot_area`, and
`oa_qual`. Please answer the following questions.
1. Predict `sale_price` (a continuous response) using the training data
`my_ahp_train` with tree (with CV pruning), bagging, random forest, and boosting
(with CV for selecting the number of trees to be used). For each method, compute
the training and test MSE.
(For boosting, please set `n.trees = 5000, interaction.depth = 1, cv.folds = 5`)
```{r}
set.seed(1)  # cv.tree() uses random CV folds, so fix the seed for reproducibility
## Fit a regression tree and prune it to the size with the smallest CV deviance
sale.tree <- tree(sale_price ~ gar_car + liv_area + lot_area + oa_qual,
                  data = my_ahp_train)
cv.sale <- cv.tree(sale.tree)
bestsize <- cv.sale$size[which.min(cv.sale$dev)]
sale.tree.prune <- prune.tree(sale.tree, best = bestsize)
plot(sale.tree.prune)
text(sale.tree.prune)
## Training and test MSE of the pruned tree
prediction_train.tree <- predict(sale.tree.prune, newdata = my_ahp_train)
mean((my_ahp_train$sale_price - prediction_train.tree)^2)
prediction_test.tree <- predict(sale.tree.prune, newdata = my_ahp_test)
mean((my_ahp_test$sale_price - prediction_test.tree)^2)
```
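To verify that `bestsize` indeed minimizes the cross-validated error, the CV deviance can be plotted against subtree size; this is only a diagnostic sketch built from the `cv.sale` object fitted above:
```{r}
# CV deviance as a function of subtree size, with the selected size marked
plot(cv.sale$size, cv.sale$dev, type = "b",
     xlab = "Tree size (terminal nodes)", ylab = "CV deviance")
abline(v = bestsize, lty = 2)
```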
```{r}
library(randomForest)
set.seed(1)
p <- ncol(my_ahp)-2
##Setting mtry = p for bagging
bag.sale <- randomForest(sale_price ~ gar_car + liv_area + lot_area + oa_qual,
                         data = my_ahp_train, mtry = p, importance = TRUE)
bag.sale
importance(bag.sale)
varImpPlot(bag.sale)
prediction_train.bag <- predict(bag.sale, newdata = my_ahp_train)
mean((my_ahp_train$sale_price - prediction_train.bag)^2)
prediction_test.bag <- predict(bag.sale,newdata = my_ahp_test)
mean((my_ahp_test$sale_price - prediction_test.bag)^2)
```
```{r}
set.seed(1)
rf.sale <- randomForest(sale_price ~ gar_car + liv_area + lot_area + oa_qual,
                        data = my_ahp_train, importance = TRUE)
prediction_train.rf <- predict(rf.sale, newdata = my_ahp_train)
mean((my_ahp_train$sale_price - prediction_train.rf)^2)
prediction_test.rf <- predict(rf.sale,newdata = my_ahp_test)
mean((my_ahp_test$sale_price - prediction_test.rf)^2)
```
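Note that `randomForest()` is called here without `mtry`, so it falls back to its default for regression, $\lfloor p/3 \rfloor$ (here 1 of the 4 predictors per split), which is what distinguishes this fit from the bagging fit above. The value actually used is stored in the fitted object:
```{r}
# Number of predictors tried at each split by the random forest above
rf.sale$mtry
```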
```{r}
library(gbm)
set.seed(1)
boost.sale <- gbm(sale_price ~ gar_car + liv_area + lot_area + oa_qual,
                  data = my_ahp_train, distribution = "gaussian",
                  n.trees = 5000, interaction.depth = 1, cv.folds = 5)
summary(boost.sale)
## Number of trees selected by 5-fold CV
best_n_trees <- gbm.perf(boost.sale, method = "cv")
prediction_train.boost <- predict(boost.sale, newdata = my_ahp_train,
                                  n.trees = best_n_trees)
mean((my_ahp_train$sale_price - prediction_train.boost)^2)
prediction_test.boost <- predict(boost.sale, newdata = my_ahp_test,
                                 n.trees = best_n_trees)
mean((my_ahp_test$sale_price - prediction_test.boost)^2)
```
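To compare the four regression methods side by side, the training and test MSEs computed above can be collected into a single table. This is only a convenience sketch; the helper `mse()` is introduced here for brevity and simply repeats the calculations shown in the chunks above:
```{r}
# Helper (introduced here): MSE of a prediction vector against observed sale_price
mse <- function(pred, data) mean((data$sale_price - pred)^2)
data.frame(
  method    = c("pruned tree", "bagging", "random forest", "boosting"),
  train_MSE = c(mse(prediction_train.tree,  my_ahp_train),
                mse(prediction_train.bag,   my_ahp_train),
                mse(prediction_train.rf,    my_ahp_train),
                mse(prediction_train.boost, my_ahp_train)),
  test_MSE  = c(mse(prediction_test.tree,  my_ahp_test),
                mse(prediction_test.bag,   my_ahp_test),
                mse(prediction_test.rf,    my_ahp_test),
                mse(prediction_test.boost, my_ahp_test))
)
```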
\newpage
2. Predict `type` (a binary response) using the training data `my_ahp_train` with
tree (with CV pruning), bagging, random forest, and boosting (with CV for selecting
the number of trees to be used). For each method, compute the training and test
classification error.
(For boosting, please set `n.trees = 5000, interaction.depth = 1, cv.folds = 5`)
```{r}
type.tree <- tree(type ~ gar_car + liv_area + lot_area + oa_qual,
                  data = my_ahp_train, split = "gini")
set.seed(0)
cv.type <- cv.tree(type.tree)
cv.type_df <- data.frame(size = cv.type$size, deviance = cv.type$dev)
best_size <- cv.type$size[which.min(cv.type$dev)]
type.tree.prune <- prune.tree(type.tree, best=best_size)
plot(type.tree.prune)
text(type.tree.prune)
## Use the pruned tree (not the unpruned one) for prediction
prediction_train.type.tree <- predict(type.tree.prune, my_ahp_train, type = "class")
mean(prediction_train.type.tree != my_ahp_train$type)
prediction_test.type.tree <- predict(type.tree.prune, my_ahp_test, type = "class")
mean(prediction_test.type.tree != my_ahp_test$type)
```
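Beyond the overall error rate, a confusion matrix shows how the pruned tree's errors are distributed across the two classes; a minimal sketch using the test predictions computed above:
```{r}
# Confusion matrix of the pruned classification tree on the test set
table(predicted = prediction_test.type.tree, actual = my_ahp_test$type)
```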
```{r}
library(randomForest)
set.seed(1)
p <- ncol(my_ahp)-2
##Setting mtry = p for bagging
bag.type <- randomForest(type ~ gar_car + liv_area + lot_area + oa_qual,
                         data = my_ahp_train, mtry = p, importance = TRUE)
bag.type
importance(bag.type)
varImpPlot(bag.type)
prediction_train.bag.type <- predict(bag.type, newdata = my_ahp_train)
mean(prediction_train.bag.type != my_ahp_train$type)
prediction_test.bag.type <- predict(bag.type,newdata = my_ahp_test)
mean(prediction_test.bag.type != my_ahp_test$type)
```
```{r}
set.seed(1)
rf.type <- randomForest(type ~ gar_car + liv_area + lot_area + oa_qual,
                        data = my_ahp_train, importance = TRUE)
prediction_train.rf.type <- predict(rf.type, newdata = my_ahp_train)
mean(prediction_train.rf.type != my_ahp_train$type)
prediction_test.rf.type <- predict(rf.type,newdata = my_ahp_test)
mean(prediction_test.rf.type != my_ahp_test$type)
```
```{r,warning=FALSE}
library(gbm)
set.seed(1)
boost.type <- gbm(type ~ gar_car + liv_area + lot_area + oa_qual,
                  data = my_ahp_train, distribution = "multinomial",
                  n.trees = 5000, interaction.depth = 1, cv.folds = 5)
summary(boost.type)
## Number of trees selected by 5-fold CV
best_n_trees_type <- gbm.perf(boost.type, method = "cv")
prediction_train.boost.type <- predict(boost.type, newdata = my_ahp_train,
                                       n.trees = best_n_trees_type, type = "response")
type.train.boost <- levels(my_ahp_train$type)[apply(prediction_train.boost.type,
                                                    1, which.max)]
mean(type.train.boost != my_ahp_train$type)
prediction_test.boost.type <- predict(boost.type, newdata = my_ahp_test,
                                      n.trees = best_n_trees_type, type = "response")
type.test.boost <- levels(my_ahp_test$type)[apply(prediction_test.boost.type,
                                                  1, which.max)]
mean(type.test.boost != my_ahp_test$type)
```
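As in Question 1, the four classification error rates can be gathered into one summary table. Again this is only a convenience sketch; the helper `err()` is introduced here and just repeats the error calculations already shown:
```{r}
# Helper (introduced here): misclassification rate of predicted labels vs. type
err <- function(pred, data) mean(pred != data$type)
data.frame(
  method      = c("pruned tree", "bagging", "random forest", "boosting"),
  train_error = c(err(prediction_train.type.tree, my_ahp_train),
                  err(prediction_train.bag.type,  my_ahp_train),
                  err(prediction_train.rf.type,   my_ahp_train),
                  err(type.train.boost,           my_ahp_train)),
  test_error  = c(err(prediction_test.type.tree, my_ahp_test),
                  err(prediction_test.bag.type,  my_ahp_test),
                  err(prediction_test.rf.type,   my_ahp_test),
                  err(type.test.boost,           my_ahp_test))
)
```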
\newpage
3. Question 8.4.2 on Page 362 in ISLRv2.
# In Algorithm 8.2, a depth-one tree (a stump) contains a single split, so each fitted tree $\hat{f}^b$ depends on only one predictor, i.e. $d = 1$ in every term of
$$
\hat{f}(x) = \sum_{b = 1}^{B}\lambda \hat{f}^b(x).
$$
# Therefore each term in the sum is a function of a single predictor, and the boosted fit is an additive model of the form $\hat{f}(x) = \sum_{j=1}^{p} f_j(x_j)$.
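As an illustration (using the `boost.sale` model from Question 1, which was fitted with `interaction.depth = 1`), the partial dependence of the fit on any single predictor traces out one of these additive component functions $f_j(x_j)$:
```{r}
# Partial dependence of the depth-one boosted fit on liv_area; with
# interaction.depth = 1 the model is a sum of such one-variable functions.
plot(boost.sale, i.var = "liv_area")
```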
\newpage
4. Question 8.4.5 on Page 362 in ISLRv2.
# Classify as Red when $P(\text{Red} \mid X) \ge 0.5$ and as Green otherwise. Under the majority-vote approach, six of the ten estimates are at least 0.5, so the prediction is Red:
```{r}
probs <- c(0.1,0.15,0.2,0.2,0.55,0.6,0.6,0.65,0.7,0.75)
ifelse(sum(probs >= 0.5) > sum(probs < 0.5), "Red", "Green")
```
# Under the average-probability approach, the mean of the ten estimates is 0.45 < 0.5, so the prediction is Green:
```{r}
mean(probs)
ifelse(mean(probs) >= 0.5, "Red", "Green")
```