hw7-1
Georgia Institute of Technology, Course 6501
Question 10.1

Using the same crime data set uscrime.txt as in Questions 8.2 and 9.1, find the best model you can using (a) a regression tree model, and (b) a random forest model. In R, you can use the tree package or the rpart package, and the randomForest package. For each model, describe one or two qualitative takeaways you get from analyzing the results (i.e., don't just stop when you have a good model, but interpret it too).

Answer 10.1

The US Crime file contains information on 47 states collected during the 1960s. The dataset contains the following variables:

Variable  Description
M         percentage of males aged 14-24 in total state population
So        indicator variable for a southern state
Ed        mean years of schooling of the population aged 25 years or over
Po1       per capita expenditure on police protection in 1960
Po2       per capita expenditure on police protection in 1959
LF        labour force participation rate of civilian urban males aged 14-24
M.F       number of males per 100 females
Pop       state population in 1960 in hundred thousands
NW        percentage of nonwhites in the population
U1        unemployment rate of urban males aged 14-24
U2        unemployment rate of urban males aged 35-39
Wealth    median value of transferable assets or family income
Ineq      income inequality: percentage of families earning below half the median income
Prob      probability of imprisonment: ratio of number of commitments to number of offenses
Time      average time in months served by offenders in state prisons before their first release
Crime     crime rate: number of offenses per 100,000 population in 1960

Previously we built a linear regression model using selected features from this dataset to predict the crime rate of a new city, as well as a model based on Principal Component Analysis (PCA). Now we will use a regression tree and a random forest to see whether they improve on that quality.

We begin with some basic exploratory data analysis: checking for outliers with boxplots (Fig. 1), visualizing the distributions (Fig. 5), and visualizing how each feature interacts with the response (Fig. 7). For the most part each factor stays within its expected range, bar a few outliers; M.F (males per 100 females), Pop (state population), and NW (percentage of nonwhites) have the most outliers. The density plots and feature-response scatterplots show that most variables are close to normally distributed, except for So. Looking at the data, this makes sense: So is a binary indicator for whether the state is southern, so its distribution has two peaks at 0 and 1.

After visualizing the data we build the regression tree and random forest. Classification and Regression Tree (CART) models work differently from typical "math"-based models: instead of fitting a line or similar function, they make a sequence of decisions about how to split the data on the way to a prediction. Each split the model creates is called a branch, and each terminal node is referred to as a leaf; each leaf predicts a constant value, the mean response of the training points that fall into it.

Training a base model, only a few predictors are used, the tree has 5 branches and 6 terminal leaves, and it returns a Mean Absolute Error (MAE) of 171.93 on the test set, which is only slightly worse than our initial regression model.
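As a concrete illustration of this workflow, here is a minimal sketch of fitting and scoring such a tree. It mirrors the appendix code but trims it down; the seed, the 70/30 split, and MAE scoring are the assumptions used there rather than anything prescribed by the question.

# Minimal sketch: fit a regression tree on the US crime data and score it by MAE.
# Assumes the tree and caret packages are installed; seed and split ratio are illustrative.
library(tree)
library(caret)

data <- read.delim("http://www.statsci.org/data/general/uscrime.txt")

set.seed(1234)
idx   <- createDataPartition(data$Crime, p = 0.7, list = FALSE)
train <- data[idx, ]
test  <- data[-idx, ]

# Each leaf of the fitted tree predicts the mean Crime of the training rows it contains
fit <- tree(Crime ~ ., data = train)
summary(fit)            # variables used, number of terminal nodes, residual deviance
plot(fit); text(fit)    # visualize the branches and leaves

# Mean Absolute Error on the held-out rows
preds <- predict(fit, newdata = test)
mean(abs(preds - test$Crime))

The frame of the tree actually fitted in this report (its splits and leaf means) is reproduced below.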
var      n         dev       yval  splits.cutleft  splits.cutright
Po1     35  5710104.97   911.9714          <10.75           >10.75
Po1     25  1466071.04   752.7200           <7.05            >7.05
Pop     13   501357.23   612.5385           <22.5            >22.5
<leaf>   8    84351.88   503.3750
<leaf>   5   169138.80   787.2000
LF      12   432502.92   904.5833           <0.58            >0.58
<leaf>   7   133283.71  1017.4286
<leaf>   5    85287.20   746.6000
M.F     10  2024944.90  1310.1000          <96.75           >96.75
<leaf>   5   658284.80  1084.2000
<leaf>   5   856352.00  1536.0000
It is not surprising that Po1 is the primary split for the data, as it has a strong correlation with Crime and was found to be statistically significant in our linear regression model. To test the fit, and to see whether this is the right number of branches, we can run 10-fold cross-validation while pruning the tree and check whether the model improves:

Branches   MAE
6          188.3099
5          201.8580
4          214.4362
3          232.0362
2          273.5210

Unsurprisingly, with so few datapoints to fit the model and so few branches to begin with, pruning the tree does not improve it.

Next we can train a Random Forest. A Random Forest, as the name suggests, is a collection of trees grown at random rather than a single tree. We lose explainability, but gain a better overall estimate of the data. We can loop over a large range of forest sizes and see at which point quality starts to degrade; the number of trees with the lowest cross-validated MAE is used to train the final Random Forest model, which in this case is 12:

Trees   MAE
12      249.0137
23      260.8363
32      266.8360
89      267.6831
79      268.0012
107     268.5253

The MAE of this model against our testing set is 201.09, which is slightly worse than the base regression tree. Looking at the model's Increase in Node Purity, the following features have the most importance:

Feature   Node Purity
Po1       1277314.89
Prob       556052.98
Po2        531698.20
Ineq       444998.31
Pop        359573.44

Looking over the features and their importance in the two models, police spending clearly has the most impact, whether for the current year (Po1) or the previous one (Po2).
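The forest-size search described above can be sketched as follows. This is a simplified illustration rather than the full 10-fold cross-validation loop in the appendix: it assumes the train/test split from the earlier sketch and simply scores a handful of candidate ntree values on the held-out rows.

# Minimal sketch: try several forest sizes and keep the one with the lowest held-out MAE.
# Assumes `train` and `test` from the earlier sketch; the candidate sizes are illustrative.
library(randomForest)

sizes <- c(12, 25, 50, 100, 250, 500)
mae_by_size <- sapply(sizes, function(k) {
  set.seed(1234)                                        # keep runs comparable
  rf <- randomForest(Crime ~ ., data = train, ntree = k)
  mean(abs(predict(rf, newdata = test) - test$Crime))   # MAE on held-out rows
})
data.frame(ntree = sizes, MAE = mae_by_size)

# Refit at the best size and inspect variable importance (IncNodePurity)
best_k <- sizes[which.min(mae_by_size)]
rf_final <- randomForest(Crime ~ ., data = train, ntree = best_k)
rf_final$importance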
Question 10.2

Describe a situation or problem from your job, everyday life, current events, etc., for which a logistic regression model would be appropriate. List some (up to 5) predictors that you might use.

Answer 10.2

One area where logistic regression could be used is in beer brewing and distribution. A common concern among breweries is the shelf-life of their product. Logistic regression could be used to predict the likelihood of beer spoilage over time, considering factors like temperature, packaging, and storage conditions.

Question 10.3

1. Using the GermanCredit data set germancredit.txt from http://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/ (description at http://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29), use logistic regression to find a good predictive model for whether credit applicants are good credit risks or not. Show your model (factors used and their coefficients), the software output, and the quality of fit. You can use the glm function in R. To get a logistic regression (logit) model on data where the response is either zero or one, use family=binomial(link="logit") in your glm function call.

2. Because the model gives a result between 0 and 1, it requires setting a threshold probability to separate between "good" and "bad" answers. In this data set, they estimate that incorrectly identifying a bad customer as good is 5 times worse than incorrectly classifying a good customer as bad. Determine a good threshold probability based on your model.

Answer 10.3

Logistic regression uses an algorithm similar to linear regression, but passes the linear combination of predictors through a sigmoid (logistic) function so that it returns a probability between 0 and 1 instead of a continuous response. The sigmoid maps the linear combination z to the range [0, 1], representing the probability that the binary outcome is in the positive class (usually class 1); its S-shaped curve keeps the output between 0 and 1, which makes it suitable for binary classification:

    σ(z) = 1 / (1 + e^(−z))

where z is the linear combination of the predictor variables and their coefficients,

    z = β₀ + β₁x₁ + β₂x₂ + … + βₚxₚ,

β₀, β₁, …, βₚ are the coefficients and x₁, x₂, …, xₚ are the predictor variables. The logistic regression model can therefore be expressed as

    P(Y = 1 | X) = 1 / (1 + e^(−z)).
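In R this corresponds to fitting the model with glm and family = binomial(link = "logit"), then reading predictions back as probabilities with predict(..., type = "response"). The sketch below uses simulated data purely for illustration; the column names and coefficients are placeholders, not the German credit attributes modelled later.

# Minimal sketch: logistic regression with glm and probability predictions.
# The data frame here is simulated; the real model in the appendix is fit on germancredit.txt.
set.seed(1)
n <- 200
toy <- data.frame(duration = rpois(n, 24),
                  amount   = rexp(n, 1 / 3000))
z <- -3 + 0.05 * toy$duration + 0.0002 * toy$amount   # "true" linear combination
toy$bad <- rbinom(n, 1, 1 / (1 + exp(-z)))            # 0/1 response generated via the sigmoid

fit <- glm(bad ~ duration + amount,
           data = toy,
           family = binomial(link = "logit"))
summary(fit)                                # estimated coefficients and p-values

p_hat <- predict(fit, type = "response")    # P(Y = 1 | X) for each row
y_hat <- ifelse(p_hat >= 0.5, 1, 0)         # 0.5 is only a default cutoff here
table(predicted = y_hat, actual = toy$bad)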
The German Credit dataset contains 1000 credit applications and their outcome, either good (1) or bad (2). Its attributes are:

Variable      Role     Type         Description
Attribute1    Feature  Categorical  Status of existing checking account
Attribute2    Feature  Integer      Duration in months
Attribute3    Feature  Categorical  Credit history
Attribute4    Feature  Categorical  Purpose
Attribute5    Feature  Integer      Credit amount
Attribute6    Feature  Categorical  Savings account/bonds
Attribute7    Feature  Categorical  Present employment since
Attribute8    Feature  Integer      Installment rate as a percentage of disposable income
Attribute9    Feature  Categorical  Marital status
Attribute10   Feature  Categorical  Other debtors / guarantors
Attribute11   Feature  Integer      Present residence since
Attribute12   Feature  Categorical  Property
Attribute13   Feature  Integer      Age
Attribute14   Feature  Categorical  Other installment plans
Attribute15   Feature  Categorical  Housing
Attribute16   Feature  Integer      Number of existing credits at this bank
Attribute17   Feature  Categorical  Job
Attribute18   Feature  Integer      Number of people being liable to provide maintenance for
Attribute19   Feature  Binary       Telephone
Attribute20   Feature  Binary       Foreign worker
Class         Target   Binary       1 = Good, 2 = Bad

The dataset is accompanied by a cost matrix in which classifying a customer as good when they are actually bad costs 5 times as much as classifying a customer as bad when they are actually good. This has to be considered when setting a prediction threshold and evaluating the model.

The classes are heavily skewed towards the good outcome, at more than a 2:1 ratio. To counteract this we downsample so that both classes have the same number of observations, avoiding potential bias in the training set. We begin by performing exploratory data analysis, checking for outliers where the variable type is integer and looking at distributions where it is binary or categorical.

We then train a base model using all the features to identify those that are statistically significant for predicting the class. Scoring this model against the whole dataset gives an accuracy of 74%. Looking into the model, the following features have the most impact:

Variable   Pr(>|z|)
V5         0.01840 *
V8         0.01099 *
V1A11      5.46e-12 ***
V1A12      1.70e-07 ***
V3A30      0.01488 *
V3A31      0.01316 *
V4A41      0.00770 **
V6A61      0.02699 *
V7A72      0.03043 *
V7A73      0.03958 *
V14A141    0.04293 *
V14A142    0.04505 *
V17A171    0.03535 *
V20A201    0.00667 **

Since we now have a reduced set of parameters, we can use them to train a better model, this time with a 70/30% train/test split to avoid overfitting. This model returns a slightly higher accuracy of 76%. However, it incorrectly classifies 25 "bad" applications as "good", which comes at a high cost. To minimize that cost, we loop through a range of thresholds, applying the cost matrix so that each kind of misclassification is weighted correctly. Doing so, a threshold of 0.18 returns the lowest total cost. Accuracy decreases to 68%, but this threshold gives the best result where it matters: only 4 "bad" applications are classified as "good".

The final model classifies an application as bad (Y = 1) when

    1 / (1 + e^(−z)) > 0.18,

and as good (Y = 0) otherwise, where

    z = −5.4276285631
        + 0.0002035011·V5      + 0.2101376376·V8
        + 2.0302865296·V1A11   + 1.2778372848·V1A12
        + 1.5598222459·V3A30   + 1.2389246022·V3A31
        − 1.6316162897·V4A41   + 0.5649107523·V6A61
        + 1.0242411208·V7A72   + 0.4603303719·V7A73
        + 0.9845414604·V14A141 + 0.6207414093·V14A142
        − 0.0503063358·V17A171 + 2.2325916676·V20A201.
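The cost-weighted threshold search can be sketched with a small helper function. This is a simplified stand-in for the loop in the appendix: the probability vector and labels below are simulated placeholders, and only the 5:1 weighting comes from the dataset's cost matrix.

# Minimal sketch: pick the threshold minimizing 5*(bad predicted good) + 1*(good predicted bad).
# `p_hat` and `actual` are simulated; in the appendix they come from the fitted glm on held-out data.
choose_threshold <- function(p_hat, actual, fn_cost = 5, fp_cost = 1) {
  thresholds <- seq(0, 1, by = 0.01)
  costs <- sapply(thresholds, function(t) {
    pred <- ifelse(p_hat >= t, 1, 0)        # 1 = predicted bad, 0 = predicted good
    fn <- sum(pred == 0 & actual == 1)      # bad applicant let through (weight 5)
    fp <- sum(pred == 1 & actual == 0)      # good applicant turned away (weight 1)
    fn_cost * fn + fp_cost * fp
  })
  data.frame(threshold = thresholds, cost = costs)[which.min(costs), ]
}

# Illustrative usage with simulated probabilities and labels
set.seed(42)
actual <- rbinom(300, 1, 0.5)
p_hat  <- plogis(rnorm(300, mean = ifelse(actual == 1, 0.8, -0.8)))
choose_threshold(p_hat, actual)

Because false negatives are weighted five times as heavily as false positives, the minimizing threshold is pulled well below 0.5, consistent with the 0.18 found above; reading the final confusion matrix in the appendix, that threshold corresponds to a total cost of 5·4 + 1·54 = 74.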
Appendix

Graphs - Crime
  Figure 1  - Boxplots for all features
  Figure 2  - Boxplot for M.F to examine outliers
  Figure 3  - Boxplot for Pop to examine outliers
  Figure 4  - Boxplot for NW to examine outliers
  Figure 5  - Density plots for all features
  Figure 6  - Density plot for So to examine multiple peaks
  Figure 7  - Scatterplots to see interaction of features with response
  Figure 8  - Scatterplot for So
  Figure 9  - Regression tree
  Figure 10 - MAE vs Number of Branches
  Figure 11 - MAE vs Number of Trees
  Figure 12 - Error vs Number of Trees

Graphs - Credit Data
  Figure 13 - Exploratory analysis of features
  Figure 14 - Histogram of class distribution
  Figure 15 - Confusion matrix, base model
  Figure 16 - Confusion matrix, improved model
  Figure 17 - ROC curve
  Figure 18 - Confusion matrix with ROC threshold
  Figure 19 - Confusion matrix with cost threshold
  Figure 20 - Cost vs Threshold
  Figure 21 - Residuals vs Fitted
  Figure 22 - Q-Q Residuals
  Figure 23 - Scale-Location
  Figure 24 - Residuals vs Leverage

Code

install.packages("tree")
library(tree)
install.packages("randomForest")
library(randomForest)

Installing package into '/usr/local/lib/R/site-library' (as 'lib' is unspecified)
Installing package into '/usr/local/lib/R/site-library' (as 'lib' is unspecified)
randomForest 4.7-1.1
Type rfNews() to see new features/changes/bug fixes.

Question 10.1

data <- read.delim("http://www.statsci.org/data/general/uscrime.txt")
head(data)

A data.frame: 6 x 16 (first six rows of the crime data; columns M, So, Ed, Po1, Po2, LF, M.F, Pop, NW, U1, U2, Wealth, Ineq, Prob, Time, Crime)

# Visual check for outliers
par(mfrow = c(2, 2))
for (name in names(data)) {
  boxplot(data[[name]], main = name)
}
par(mfrow = c(1, 1))
boxplot(data[["M.F"]], main = "M.F.")
boxplot(data[["Pop"]], main = "Pop")
boxplot(data[["NW"]], main = "NW")

# Density plot for each feature
par(mfrow = c(2, 2))
for (name in names(data)) {
  density_data <- density(data[[name]])
  plot(density_data, main = paste(name, "Density Plot"), xlab = name, ylab = "Density")
}
par(mfrow = c(1, 1))
density_data <- density(data[["So"]])
plot(density_data, main = "So Density Plot", xlab = "So", ylab = "Density")

# Scatterplot of each feature against the response
par(mfrow = c(2, 2))
for (name in names(data)) {
  plot(data[[name]], data$Crime, xlab = name, ylab = "Crime")
}
par(mfrow = c(1, 1))
plot(data[["So"]], data$Crime, xlab = "So", ylab = "Crime")

install.packages("caret")
library(caret)
library(ggplot2)
Installing package into '/usr/local/lib/R/site-library' (as 'lib' is unspecified)
also installing the dependencies 'listenv', 'parallelly', 'future', 'globals', 'shape', 'future.apply', 'numDeriv', 'progressr', ...
Loading required package: ggplot2
Loading required package: lattice

# Create train/test split
set.seed(1234)  # for reproducibility
train_indices <- createDataPartition(data$Crime, times = 1, p = .7, list = FALSE)
training_set <- data[train_indices, ]
testing_set <- data[-train_indices, ]

tree_model <- tree(Crime ~ ., data = training_set)
summary(tree_model)

Regression tree:
tree(formula = Crime ~ ., data = training_set)
Variables actually used in tree construction:
[1] "Po1" "Pop" "LF"  "M.F"
Number of terminal nodes:  6
Residual mean deviance:  68510 = 1987000 / 29
Distribution of residuals:
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
 -607.00  -112.40    12.57     0.00   117.90   589.80

# Quality of fit
preds <- predict(tree_model, testing_set[, 1:15])
MAE <- mean(abs(preds - testing_set$Crime))
MAE

171.935714285714

plot(tree_model)
text(tree_model)
print(tree_model$frame)

       var  n        dev      yval splits.cutleft splits.cutright
1      Po1 35 5710104.97  911.9714         <10.75          >10.75
2      Po1 25 1466071.04  752.7200          <7.05           >7.05
4      Pop 13  501357.23  612.5385          <22.5           >22.5
8   <leaf>  8   84351.88  503.3750
9   <leaf>  5  169138.80  787.2000
5       LF 12  432502.92  904.5833          <0.58           >0.58
10  <leaf>  7  133283.71 1017.4286
11  <leaf>  5   85287.20  746.6000
3      M.F 10 2024944.90 1310.1000         <96.75          >96.75
6   <leaf>  5  658284.80 1084.2000
7   <leaf>  5  856352.00 1536.0000

# Create a function for cross-validation while pruning the tree
cross_val_prune <- function(data, folds, branches) {
  fold_size <- nrow(data) %/% folds
  avg_accuracy <- list()
  for (k in branches) {
    accuracy_list <- list()
    for (i in 1:folds) {
      start <- (i - 1) * fold_size + 1
      end <- ifelse(i == folds, nrow(data), i * fold_size)
      val_data <- data[start:end, ]
      train_data <- data[-c(start:end), ]
      # Train model: prune the tree fitted on the full training set down to k leaves
      prune.tree_model <- prune.tree(tree_model, best = k)
      # Make predictions
      preds <- predict(prune.tree_model, val_data[, 1:15])
      MAE <- mean(abs(preds - val_data$Crime))
      # Add to list
      accuracy_list[[as.character(i)]] <- MAE
    }
    avg_accuracy[[as.character(k)]] <- mean(unlist(accuracy_list))
  }
  return(avg_accuracy)
}

result <- cross_val_prune(training_set, folds = 10, branches = 2:6)

# Print the results
accuracy_df = data.frame(Branches = names(result), MAE = unlist(result))
accuracy_df <- accuracy_df[order(accuracy_df$MAE), ]

# Create a line plot
ggplot(accuracy_df, aes(x = Branches, y = MAE)) +
  geom_point() +
  labs(x = "Number of Branches", y = "Mean Absolute Error") +
  ggtitle("MAE vs. Number of Trees")

# Create a function for cross-validation over the number of trees in the forest
cross_val <- function(data, folds, numtrees) {
  fold_size <- nrow(data) %/% folds
  avg_accuracy <- list()
  for (k in numtrees) {
    accuracy_list <- list()
    for (i in 1:folds) {
      start <- (i - 1) * fold_size + 1
      end <- ifelse(i == folds, nrow(data), i * fold_size)
      val_data <- data[start:end, ]
      train_data <- data[-c(start:end), ]
      # Train model
      rf_classifier <- randomForest(Crime ~ ., data = train_data, ntree = k)
      # Make predictions
      preds <- predict(rf_classifier, val_data[, 1:15])
      MAE <- mean(abs(preds - val_data$Crime))
      # Add to list
      accuracy_list[[as.character(i)]] <- MAE
    }
    avg_accuracy[[as.character(k)]] <- mean(unlist(accuracy_list))
  }
  return(avg_accuracy)
}

result <- cross_val(training_set, folds = 10, numtrees = 10:500)

# Print the results
accuracy_df = data.frame(Trees = names(result), MAE = unlist(result))
accuracy_df <- head(accuracy_df[order(accuracy_df$MAE), ])
head(accuracy_df)
A data.frame: 6 x 2
  Trees       MAE
  <chr>     <dbl>
  12     249.0137
  23     260.8363
  32     266.8360
  89     267.6831
  79     268.0012
  107    268.5253

# Create a line plot
ggplot(accuracy_df, aes(x = Trees, y = MAE)) +
  geom_point() +
  labs(x = "Number of Trees", y = "Mean Absolute Error") +
  ggtitle("MAE vs. Number of Trees")

rf_classifier <- randomForest(Crime ~ ., data = training_set, ntree = 12)

# Quality of fit
preds <- predict(rf_classifier, testing_set[, 1:15])
MAE <- mean(abs(preds - testing_set$Crime))
MAE

201.094560185185

rf_classifier$importance

A matrix: 15 x 1 of type dbl
        IncNodePurity
M           202046.89
So               0.00
Ed          346167.78
Po1        1277314.89
Po2         531698.20
LF          216288.30
M.F          83758.37
Pop         359573.44
NW          168214.81
U1          263844.81
U2          238374.61
Wealth      367466.09
Ineq        444998.31
Prob        556052.98
Time        182935.93

plot(rf_classifier)

Question 10.3

credit.data <- read.delim("germancredit.txt", sep = " ", header = F)
head(credit.data)
A data.frame: 6 x 21 (first six rows of the raw credit data, columns V1 ... V21)

# Identify categorical columns (excluding the target variable 'V21')
categorical_cols <- names(credit.data)[sapply(credit.data, is.character) & names(credit.data) != "V21"]
categorical_cols

'V1' 'V3' 'V4' 'V6' 'V7' 'V9' 'V10' 'V12' 'V14' 'V15' 'V17' 'V19' 'V20'

# Create dummy variables for categorical columns
dummy_data <- dummyVars(V21 ~ ., data = credit.data[, c("V21", categorical_cols)])
dummy_data <- predict(dummy_data, newdata = credit.data)

# Combine the dummy variables with the original data
credit.data_encoded <- cbind(credit.data[, -which(names(credit.data) %in% categorical_cols)], dummy_data)
head(credit.data_encoded)

A data.frame: 6 x 62 (the integer columns V2, V5, V8, V11, V13, V16, V18, the target V21, and the 0/1 dummy columns V1A11 ... V20A202)

summary(credit.data_encoded)
(summary output for all 62 encoded columns omitted; the integer features sit on very different scales, e.g. V5 ranges from 250 to 18424, while the dummy columns are 0/1 indicators)

credit.data_encoded$V21[credit.data_encoded$V21 == 1] <- 0
credit.data_encoded$V21[credit.data_encoded$V21 == 2] <- 1

ggplot(credit.data_encoded, aes(x = V21)) +
  geom_histogram(binwidth = 0.5, fill = "blue", color = "black") +
  labs(title = "Histogram", x = "Values", y = "Frequency")
install.packages("rattle")
library(rattle)
library(dplyr)
library(pROC)
install.packages("tidymodels")
library(tidymodels)

# Downsample the majority class so both classes have the same number of rows
smaller_class_size <- min(table(credit.data_encoded$V21))
balanced_data <- credit.data_encoded %>%
  group_by(V21) %>%
  sample_n(size = smaller_class_size) %>%
  ungroup()

credit.model <- glm(V21 ~ ., data = balanced_data, family = binomial(link = "logit"))
summary(credit.model)

Call:
glm(formula = V21 ~ ., family = binomial(link = "logit"), data = balanced_data)

Coefficients: (13 not defined because of singularities)
(non-significant rows abridged)
              Estimate Pr(>|z|)
(Intercept) -6.948e+00 8.70e-06 ***
V5           1.344e-04  0.01840 *
V8           2.821e-01  0.01099 *
V1A11        2.041e+00 5.46e-12 ***
V1A12        1.493e+00 1.70e-07 ***
V3A30        1.529e+00  0.01488 *
V3A31        1.474e+00  0.01316 *
V4A41       -1.531e+00  0.00770 **
V6A61        6.952e-01  0.02699 *
V7A72        8.090e-01  0.03043 *
V7A73        6.628e-01  0.03958 *
V14A141      6.067e-01  0.04293 *
V14A142      1.074e+00  0.04505 *
V17A171     -1.622e+00  0.03535 *
V20A201      1.987e+00  0.00667 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 831.78  on 599  degrees of freedom
Residual deviance: 584.39  on 551  degrees of freedom
AIC: 682.39

Number of Fisher Scoring iterations: 5
preds <- predict(credit.model, newdata = credit.data_encoded, type = "response")
threshold <- 0.5
predicted_classes <- ifelse(preds >= threshold, 1, 0)
true_values <- as.factor(credit.data_encoded$V21)
confusion_matrix <- confusionMatrix(as.factor(predicted_classes), true_values)
cm_data <- as.data.frame(as.table(confusion_matrix))
confusion_matrix

ggplot(data = cm_data, aes(x = Reference, y = Prediction, fill = Freq)) +
  geom_tile() +
  geom_text(aes(label = sprintf("%d", Freq)), vjust = 1) +
  scale_fill_gradient(low = "lightblue", high = "darkred") +
  theme_minimal() +
  labs(title = "Confusion Matrix", x = "Actual", y = "Predicted", fill = "Frequency") +
  theme(legend.position = "right")

Confusion Matrix and Statistics

          Reference
Prediction   0   1
         0 505  65
         1 195 235

               Accuracy : 0.74
                 95% CI : (0.7116, 0.7669)
    No Information Rate : 0.7
    P-Value [Acc > NIR] : 0.002908

                  Kappa : 0.4492

 Mcnemar's Test P-Value : 1.242e-15

            Sensitivity : 0.7214
            Specificity : 0.7833
         Pos Pred Value : 0.8860
         Neg Pred Value : 0.5465
             Prevalence : 0.7000
         Detection Rate : 0.5050
   Detection Prevalence : 0.5700
      Balanced Accuracy : 0.7524

       'Positive' Class : 0

[Confusion-matrix heatmap for the base model]

# Create train/test split
set.seed(1234)  # for reproducibility
train_indices <- createDataPartition(credit.data_encoded$V21, times = 1, p = .7, list = FALSE)
training_set <- balanced_data[train_indices, ]
testing_set <- balanced_data[-train_indices, ]

credit.model <- glm(V21 ~ V5 + V8 + V1A11 + V1A12 + V3A30 + V3A31 + V4A41 + V6A61 +
                      V7A72 + V7A73 + V14A141 + V14A142 + V17A171 + V20A201,
                    data = training_set, family = binomial(link = "logit"))
summary(credit.model)

Call:
glm(formula = V21 ~ V5 + V8 + V1A11 + V1A12 + V3A30 + V3A31 +
    V4A41 + V6A61 + V7A72 + V7A73 + V14A141 + V14A142 + V17A171 +
    V20A201, family = binomial(link = "logit"), data = training_set)

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.428e+00  1.025e+00  -5.294 1.20e-07 ***
V5           2.035e-04  5.125e-05   3.971 7.16e-05 ***
V8           2.101e-01  1.122e-01   1.872  0.06114 .
V1A11        2.030e+00  2.996e-01   6.776 1.24e-11 ***
V1A12        1.278e+00  2.903e-01   4.402 1.07e-05 ***
V3A30        1.560e+00  7.658e-01   2.037  0.04168 *
V3A31        1.239e+00  5.862e-01   2.114  0.03455 *
V4A41       -1.632e+00  4.773e-01  -3.418  0.00063 ***
V6A61        5.649e-01  2.553e-01   2.213  0.02690 *
V7A72        1.024e+00  3.253e-01   3.149  0.00164 **
V7A73        4.603e-01  2.721e-01   1.691  0.09074 .
V14A141      9.845e-01  3.523e-01   2.794  0.00520 **
V14A142      6.207e-01  6.421e-01   0.967  0.33369
V17A171     -5.031e-02  7.116e-01  -0.071  0.94364
V20A201      2.233e+00  8.677e-01   2.573  0.01008 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 576.69  on 415  degrees of freedom
Residual deviance: 438.45  on 401  degrees of freedom
  (284 observations deleted due to missingness)
AIC: 468.45

Number of Fisher Scoring iterations: 5

preds <- predict(credit.model, newdata = testing_set, type = "response")
threshold <- 0.5
predicted_classes <- ifelse(preds >= threshold, 1, 0)
true_values <- as.factor(testing_set$V21)
confusion_matrix <- confusionMatrix(as.factor(predicted_classes), true_values)
cm_data <- as.data.frame(as.table(confusion_matrix))
confusion_matrix

ggplot(data = cm_data, aes(x = Reference, y = Prediction, fill = Freq)) +
  geom_tile() +
  geom_text(aes(label = sprintf("%d", Freq)), vjust = 1) +
  scale_fill_gradient(low = "lightblue", high = "darkred") +
  theme_minimal() +
  labs(title = "Confusion Matrix", x = "Actual", y = "Predicted", fill = "Frequency") +
  theme(legend.position = "right")

roc_obj <- roc(response = true_values, predictor = preds)

# Plot the ROC curve with AUC
plot.roc(roc_obj, print.auc = TRUE, auc.polygon = TRUE, grid = TRUE, legacy.axes = TRUE)

preds <- predict(credit.model, newdata = testing_set, type = "response")
threshold <- 0.806
predicted_classes <- ifelse(preds >= threshold, 1, 0)
true_values <- as.factor(testing_set$V21)
confusion_matrix <- confusionMatrix(as.factor(predicted_classes), true_values)
cm_data <- as.data.frame(as.table(confusion_matrix))
confusion_matrix

ggplot(data = cm_data, aes(x = Reference, y = Prediction, fill = Freq)) +
  geom_tile() +
  geom_text(aes(label = sprintf("%d", Freq)), vjust = 1) +
  scale_fill_gradient(low = "lightblue", high = "darkred") +
  theme_minimal() +
  labs(title = "Confusion Matrix", x = "Actual", y = "Predicted",
fill = "Frequency") + theme(legend.position = "right") Confusion Matrix and Statistics Reference Prediction © 1 0 84 68 1 7 25 Accuracy : 0.5924 95% CI : (@©.5177, 9.6641) No Information Rate : ©.5054 P-vValue [Acc > NIR] : ©.01098 Kappa : ©.1905 Mcnemar's Test P-Value : 4.262e-12 Sensitivity : 0.9231 Specificity : 0.2688 Pos Pred Value : 0.5526 Neg Pred Value : 0.7813 Prevalence : 0.4946 Detection Rate : 0.4565 Detection Prevalence : 0.8261 Balanced Accuracy : 0.5959 'Positive’ Class : © Confusion Matrix fifl B 7 25 Frequency 80 g % = 20 G 84 68 Actual # Define the cost matrix cost_matrix = matrix(c(®, 5, 1, @), nrow = 2) # Compute the total cost for different thresholds thresholds <- seq(@, 1, by = 0.01) total_costs <- numeric(length(thresholds)) for (i in 1:length(thresholds)) { # Apply the threshold to predicted probabilities thresholded_predictions <- ifelse(preds >= thresholds[i], 1,0) # Create a confusion matrix confusion_matrix <- confusionMatrix(as.factor(thresholded_predictions), true_values) fn <- confusion_matrix$table[1, 2] fp <- confusion_matrix$table[2, 1] # Calculate the total cost based on the cost matrix total_costs[i] <- fn*5+fp*1 # Find the threshold with the lowest total cost best_threshold <- thresholds[which.min(total_costs)] cat("Best Threshold:", best_threshold, "\n") # Use the best threshold to classify your predictions classified_predictions <- ifelse(preds >= best_threshold, 1, 0)
# Evaluate the model using the best threshold and cost-sensitive metrics
confusion_matrix <- confusionMatrix(as.factor(classified_predictions), true_values)
cm_data <- as.data.frame(as.table(confusion_matrix))

ggplot(data = cm_data, aes(x = Reference, y = Prediction, fill = Freq)) +
  geom_tile() +
  geom_text(aes(label = sprintf("%d", Freq)), vjust = 1) +
  scale_fill_gradient(low = "lightblue", high = "darkred") +
  theme_minimal() +
  labs(title = "Confusion Matrix", x = "Actual", y = "Predicted", fill = "Frequency") +
  theme(legend.position = "right")

confusion_matrix

Warning messages (repeated for several thresholds):
  in confusionMatrix.default(as.factor(thresholded_predictions), true_values):
  "Levels are not in the same order for reference and data. Refactoring data to match."

Best Threshold: 0.18

Confusion Matrix and Statistics

          Reference
Prediction   0   1
         0  37   4
         1  54  89

               Accuracy : 0.6848
                 95% CI : (0.6123, 0.7512)
    No Information Rate : 0.5054
    P-Value [Acc > NIR] : 6.222e-07

                  Kappa : 0.3657

 Mcnemar's Test P-Value : 1.243e-10

            Sensitivity : 0.4066
            Specificity : 0.9570
         Pos Pred Value : 0.9024
         Neg Pred Value : 0.6224
             Prevalence : 0.4946
         Detection Rate : 0.2011
   Detection Prevalence : 0.2228
      Balanced Accuracy : 0.6818

       'Positive' Class : 0

[Confusion-matrix heatmap at the cost-optimal threshold]

total_costs <- data.frame(Cost = total_costs, Threshold = thresholds)
plot <- ggplot(total_costs, aes(Threshold, Cost)) +
  geom_line() +
  geom_point(data = total_costs[total_costs$Cost == min(total_costs$Cost), ],
             aes(Threshold, Cost), color = "red", size = 3) +
  labs(
    title = "Total Costs vs Threshold",
    x = "Threshold",
    y = "Cost"
  )
plot

plot(credit.model)

data.frame(credit.model$coefficients)

A data.frame: 15 x 1
             credit.model.coefficients
                                 <dbl>
(Intercept)              -5.4276285631
V5                        0.0002035011
V8                        0.2101376376
V1A11                     2.0302865296
V1A12                     1.2778372848
V3A30                     1.5598222459
V3A31                     1.2389246022
V4A41                    -1.6316162897
V6A61                     0.5649107523
V7A72                     1.0242411208
V7A73                     0.4603303719
V14A141                   0.9845414604
V14A142                   0.6207414093
V17A171                  -0.0503063358
V20A201                   2.2325916676
[Figure 1: boxplots for all features — panels shown here include U1, NW, U2, Wealth, So, Po1, and Ed]
[Figure 1 (continued): boxplots of Prob, Ineq, Time, and Crime]
[Figure 2: boxplot of M.F]
[Figure 3: boxplot of Pop]
[Figure 4: boxplot of NW]
[Figure 5: density plots for all features — panels M, So, Ed, Po1]
[Figure 5 (continued): density plots of Po2, LF, M.F, Pop]
[Figure 5 (continued): density plots of NW, U1, U2, Wealth]
[Figure 6: density plot of So]
[Figure 7: scatterplots of features vs Crime — panels LF, Po2, M.F, Pop]
[Figure 7 (continued): more feature-vs-Crime scatterplot panels, including Ed]
[Figure 7 (continued): scatterplots of NW, U1, and other features vs Crime]
[Figure 8: scatterplot of So vs Crime]
[Figure 9: plot of the fitted regression tree, splitting on Po1, Pop, LF, and M.F]
[Figure 10: MAE vs number of branches]
[Figure 11: MAE vs number of trees]
[Figure 12: random forest error vs number of trees (plot(rf_classifier))]
[Figure 14: histogram of the class distribution]
[Figure 15: confusion-matrix heatmap for the base model]
[Figure 16: confusion-matrix heatmap for the improved model]
[Figure 17: ROC curve, AUC = 0.806]
[Figure 18: confusion-matrix heatmap with the ROC-based threshold]
[Figure 19: confusion-matrix heatmap with the cost-based threshold]
[Figure 20: total cost vs threshold]
[Figure 21: Residuals vs Fitted for the final glm]
[Figure 22: Q-Q plot of residuals for the final glm]
[Figure 23: Scale-Location plot for the final glm]
[Figure 24: Residuals vs Leverage (with Cook's distance) for the final glm]