ISYE 4803 Homework 5 (Last Homework), Fall 2023
Total 100 points + bonus 10 points

1. Conceptual questions. (20 points)

(a) (5 points) Give 5 types of neural networks, and explain what they are good for.

(b) (5 points) Explain how we control the data-fit complexity in a regression tree. Name at least one hyperparameter that we can tune to achieve this goal.

(c) (5 points) Explain how OOB errors are constructed and how to use them to find a good choice for the number of trees in a random forest. Is the OOB error a test error or a training error, and why?

(d) (5 points) What is the main difference between boosting and bagging? To which type does the random forest belong?

2. House price dataset. (20 points)

The HOUSES dataset contains a collection of recent real estate listings in and around San Luis Obispo County. The dataset is provided in RealEstate.csv. You may use one-hot encoding to expand the categorical variables, and we suggest normalizing the features as a pre-processing step. The dataset contains the following useful fields (you may exclude Location and MLS from your linear regression model). You can use any package for this question.

- Price: the most recent listing price of the house (in dollars).
- Bedrooms: number of bedrooms.
- Bathrooms: number of bathrooms.
- Size: size of the house in square feet.
- Price/SQ.ft: price of the house per square foot.
- Status: Short Sale, Foreclosure, or Regular.

(a) (10 points) Fit a ridge regression model to predict Price from all variables. You can use one-hot encoding to expand the categorical variable Status. Use 5-fold cross-validation to select the optimal regularization parameter, and show the CV curve. Report the fitted model (i.e., the parameters) and the sum of squared residuals. You can use any package. The suggested search range for the regularization parameter is from 80 to 150.

(b) (10 points) Use lasso to select variables. Use 5-fold cross-validation to select the optimal regularization parameter, and show the CV curve. Report the fitted model (i.e., the selected variables and their coefficients), and show the lasso solution path. You can use any package for this.
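Since any package is allowed, here is a minimal sketch of parts (a) and (b) using scikit-learn and pandas. The column names and the choice of RidgeCV/LassoCV are assumptions, not a prescribed solution, and the CV curve and lasso solution path would still need to be plotted separately (e.g., from lasso.alphas_ and lasso.mse_path_).

```python
# Minimal sketch for 2(a)-(b), assuming scikit-learn/pandas and that
# RealEstate.csv uses the column names listed above (an assumption).
import numpy as np
import pandas as pd
from sklearn.linear_model import RidgeCV, LassoCV
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("RealEstate.csv")
# One-hot encode the categorical Status column; drop Location and MLS.
X = pd.get_dummies(df.drop(columns=["Price", "Location", "MLS"]),
                   columns=["Status"])
y = df["Price"].values
X = StandardScaler().fit_transform(X)  # normalize features as suggested

# (a) Ridge with 5-fold CV over the suggested range 80..150.
ridge = RidgeCV(alphas=np.linspace(80, 150, 71), cv=5).fit(X, y)
print("ridge lambda:", ridge.alpha_, "coefficients:", ridge.coef_)
print("sum of squared residuals:", np.sum((y - ridge.predict(X)) ** 2))

# (b) Lasso with 5-fold CV; nonzero coefficients are the selected variables.
lasso = LassoCV(cv=5).fit(X, y)
print("lasso lambda:", lasso.alpha_)
print("selected variable indices:", np.nonzero(lasso.coef_)[0])
```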
3. AdaBoost. (20 points)

Consider the following dataset, plotted in the figure below. The first two coordinates represent the values of the two features, and the last coordinate is the binary label of the data point.

X_1 = (−1, 0, +1), X_2 = (−0.5, 0.5, +1), X_3 = (0, 1, −1), X_4 = (0.5, 1, −1),
X_5 = (1, 0, +1), X_6 = (1, −1, +1), X_7 = (0, −1, −1), X_8 = (0, 0, −1).

In this problem, you will run through T = 3 iterations of AdaBoost with decision stumps (as explained in the lecture) as weak learners.

(a) (10 points) For each iteration t = 1, 2, 3, compute ϵ_t, α_t, Z_t, and D_t by hand (i.e., show the calculation steps; you may record the values in Table 1) and draw the decision stumps on the figure (you can draw them by hand).

(b) (10 points) What is the training error of this AdaBoost? Give a short explanation of why AdaBoost outperforms a single decision stump.

Table 1: Values of the AdaBoost parameters at each timestep.

  t | ϵ_t | α_t | Z_t | D_t(1) | D_t(2) | D_t(3) | D_t(4) | D_t(5) | D_t(6) | D_t(7) | D_t(8)
  1 |     |     |     |        |        |        |        |        |        |        |
  2 |     |     |     |        |        |        |        |        |        |        |
  3 |     |     |     |        |        |        |        |        |        |        |

[Figure 1: A small dataset for binary classification with AdaBoost.]
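Although part (a) asks for a hand calculation, a short numpy sketch is a convenient way to check the numbers. The sketch below assumes the reconstructed signs of the data points above; best_stump and the choice of candidate thresholds (offsets between the grid values) are my own constructions, not the lecture's notation.

```python
# Brute-force the best weighted decision stump each round, then apply the
# standard AdaBoost updates for epsilon_t, alpha_t, Z_t, and D_{t+1}.
import numpy as np

X = np.array([[-1, 0], [-0.5, 0.5], [0, 1], [0.5, 1],
              [1, 0], [1, -1], [0, -1], [0, 0]], dtype=float)
y = np.array([+1, +1, -1, -1, +1, +1, -1, -1])

def best_stump(D):
    """Return ((feature j, threshold c, sign s), weighted error) minimizing error."""
    best = (None, np.inf)
    for j in range(X.shape[1]):
        for c in np.unique(X[:, j]) - 0.25:      # thresholds between data values
            for s in (+1, -1):
                pred = s * np.sign(X[:, j] - c)
                err = D[pred != y].sum()
                if err < best[1]:
                    best = ((j, c, s), err)
    return best

D = np.full(8, 1 / 8)                            # D_1: uniform weights
for t in range(1, 4):                            # T = 3 rounds
    (j, c, s), eps = best_stump(D)
    alpha = 0.5 * np.log((1 - eps) / eps)
    pred = s * np.sign(X[:, j] - c)
    w = D * np.exp(-alpha * y * pred)
    Z = w.sum()                                  # normalizer Z_t
    D = w / Z                                    # D_{t+1}
    print(f"t={t}: eps={eps:.3f}, alpha={alpha:.3f}, Z={Z:.3f}, D={np.round(D, 3)}")
```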
4. Random forest and one-class SVM for an email spam classifier. (30 points)

Your task for this question is to build a spam classifier using the UCI email spam dataset (https://archive.ics.uci.edu/ml/datasets/Spambase); please download the data from that website. The spam emails in this collection came from the postmaster and from individuals who had filed spam. The collection of non-spam emails came from filed work and personal emails, and hence the word 'george' and the area code '650' (Palo Alto, CA) are indicators of non-spam; these are useful when constructing a personalized spam filter. You are free to choose any package for this homework. Note: there may be some missing values; you can just fill in zeros.

(a) (5 points) Build a CART model and visualize the fitted classification tree.

(b) (5 points) Now also build a random forest model. Randomly shuffle the data and partition it, using 80% for training and the remaining 20% for testing. Compare and report the test errors of your classification tree and random forest models on the testing data. Plot the curve of test error (total misclassification error rate) versus the number of trees for the random forest, and plot the test error for the CART model (which should be constant with respect to the number of trees).

(c) (10 points) Fit a series of random forest classifiers to the data to explore the sensitivity to the parameter ν (the number of variables selected at random for each split). Plot both the OOB error and the test error against a suitably chosen range of values for ν.

(d) (10 points) Now we will use a one-class SVM approach for spam filtering. Randomly shuffle the data and partition it, using 80% for training and the remaining 20% for testing. Extract all non-spam emails from the training block (the 80% of the data you selected) to build a one-class kernel SVM with an RBF kernel (you can tune the kernel bandwidth to achieve good performance). Then apply it to the 20% of the data reserved for testing (thus, this is a novelty-detection situation), and report the total misclassification error rate on these testing data.
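A minimal scikit-learn sketch of parts (b)-(d) follows. The file name "spambase.data", the 57-feature layout with the label in the last column, the grid of tree counts and ν values, and the RBF bandwidth gamma=1e-3 are all assumptions to be adjusted against the downloaded data; the plots themselves are left to matplotlib.

```python
# Sketch for 4(b)-(d), assuming the UCI Spambase file layout:
# 57 feature columns followed by a 0/1 label column (1 = spam).
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import OneClassSVM

data = pd.read_csv("spambase.data", header=None).fillna(0).values
X, y = data[:, :57], data[:, 57].astype(int)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.2, shuffle=True)

# (b) CART test error (constant) vs. random forest error as trees grow.
cart_err = 1 - DecisionTreeClassifier().fit(Xtr, ytr).score(Xte, yte)
rf_err = [1 - RandomForestClassifier(n_estimators=n).fit(Xtr, ytr).score(Xte, yte)
          for n in range(10, 201, 10)]

# (c) Sensitivity to nu = max_features, with OOB error enabled.
for nu in (2, 5, 10, 20, 40, 57):
    rf = RandomForestClassifier(n_estimators=200, max_features=nu,
                                oob_score=True).fit(Xtr, ytr)
    print(nu, "OOB:", 1 - rf.oob_score_, "test:", 1 - rf.score(Xte, yte))

# (d) One-class SVM trained on non-spam training emails only; points the
# model flags as novelties (-1) are predicted spam on the held-out 20%.
oc = OneClassSVM(kernel="rbf", gamma=1e-3).fit(Xtr[ytr == 0])
pred_spam = (oc.predict(Xte) == -1).astype(int)
print("one-class SVM test error:", np.mean(pred_spam != yte))
```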
5. Locally weighted linear regression and bias-variance tradeoff. (10 points + 10 bonus points)

Consider a dataset with n data points (x_i, y_i), x_i ∈ R^p, following the linear model

    y_i = β^T x_i + ϵ_i,  i = 1, ..., n,

where ϵ_i ∼ N(0, σ^2) are i.i.d. Gaussian noise with zero mean and variance σ^2.

(a) (5 points) Show that ridge regression, which introduces a squared ℓ_2-norm penalty on the parameter in the maximum likelihood estimate of β, can be written as follows

    β̂(λ) = arg min_β ‖y − Xβ‖_2^2 + λ‖β‖_2^2

for properly defined matrix X and vector y.

(b) (5 points) Find the closed-form solution for β̂(λ) and its distribution conditional on {x_i}.

(c) (bonus 5 points) Derive the bias and variance of the prediction at some fixed test point x.

(d) (bonus 5 points) Now assume the data are one-dimensional. The training dataset consists of two samples, x_1 = 0.15 and x_2 = 1.1, and the test sample is x = 1. The true parameters are β_0 = 1 and β_1 = 1, and the noise variance is σ^2 = 0.5. Plot the MSE (squared bias plus variance) of the prediction at the test point as a function of the regularization parameter λ.
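For part (d), once the closed-form solution β̂(λ) = (X^T X + λI)^{-1} X^T y from (b) is in hand, the squared bias and variance at the test point follow by plugging in E[y] = Xβ and Var(y) = σ^2 I. The numpy sketch below assumes that the presence of β_0 and β_1 means an intercept model with design rows (1, x_i); the λ grid is an arbitrary choice.

```python
# Bias^2 + variance of the ridge prediction at the test point, for 5(d).
import numpy as np
import matplotlib.pyplot as plt

X = np.array([[1, 0.15], [1, 1.1]])   # two training samples, intercept column first
beta = np.array([1.0, 1.0])           # true (beta_0, beta_1)
x_star = np.array([1.0, 1.0])         # test point x = 1, with intercept
sigma2 = 0.5                          # noise variance

lams = np.linspace(0.001, 5, 500)
mse = []
for lam in lams:
    H = np.linalg.inv(X.T @ X + lam * np.eye(2)) @ X.T  # beta_hat = H @ y
    bias = x_star @ (H @ X - np.eye(2)) @ beta          # E[pred] - true value
    var = sigma2 * x_star @ H @ H.T @ x_star            # Var[pred]
    mse.append(bias ** 2 + var)

plt.plot(lams, mse)
plt.xlabel("lambda")
plt.ylabel("squared bias + variance")
plt.show()
```

Setting λ near 0 recovers the ordinary least-squares prediction (zero bias, maximal variance here), so the curve should dip at some intermediate λ, which is the tradeoff the problem asks you to exhibit.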