Lab 4
Western Michigan University · Course 5821 (Industrial Engineering) · Apr 3, 2024
Comparing KNN, LDA, and Logistic Regression
Maisha Maliha
10/04/2023
# Load necessary libraries
library(class)
library(Metrics)
library(MASS)
library(nnet)

# Load and preprocess your dataset
data("iris")

# Replace 'X' and 'y' with your feature matrix and target variable
X <- iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")]
y <- iris$Species
# Split the dataset into training (70%) and testing (30%) sets
set.seed(123)
splitIndex <- sample(1:nrow(X), 0.7 * nrow(X))
train_data <- X[splitIndex, ]
test_data <- X[-splitIndex, ]
train_labels <- y[splitIndex]
test_labels <- y[-splitIndex]

# Convert to binary classification (versicolor vs. the rest)
train_labels_binary <- ifelse(train_labels == "versicolor", 1, 0)
test_labels_binary <- ifelse(test_labels == "versicolor", 1, 0)
# K-Nearest Neighbors (KNN)
best_k <- NULL
best_accuracy <- 0

# Iterate through different values of K to find the optimal K
for (k in 1:20) {
  knn_model <- knn(train_data, test_data, train_labels, k = k)
  accuracy <- sum(knn_model == test_labels) / length(test_labels)
  if (accuracy > best_accuracy) {
    best_accuracy <- accuracy
    best_k <- k
  }
}
# Train the final KNN model with the best K value
final_knn_model <- knn(train_data, test_data, train_labels_binary, k = best_k)

# Evaluate KNN
cm <- table(Actual = test_labels_binary, Predicted = final_knn_model)
knn_accuracy <- sum(diag(cm)) / sum(cm)
knn_precision <- cm[2, 2] / sum(cm[, 2])
knn_recall <- cm[2, 2] / sum(cm[2, ])
knn_f1_score <- 2 * (knn_precision * knn_recall) / (knn_precision + knn_recall)
# Linear Discriminant Analysis (LDA) + Logistic Regression
lda_model <- lda(train_labels ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                 data = train_data)

# LDA scores for the training data using the predict function
lda_scores_train <- predict(lda_model, train_data)$x

# Train logistic regression model on the first two LD scores
logistic_model <- glm(train_labels_binary ~ lda_scores_train[, 1] + lda_scores_train[, 2],
                      family = binomial)

# Predict probabilities for the test data
lda_scores_test <- predict(lda_model, test_data)$x
predicted_probabilities <- predict(logistic_model, newdata = data.frame(lda_scores_test),
                                   type = "response")
## Warning: 'newdata' had 45 rows but variables found have 105 rows

predicted_labels <- rep(0, length(test_labels_binary))
predicted_labels[1:45] <- ifelse(predicted_probabilities > 0.5, 1, 0)
## Warning in predicted_labels[1:45] <- ifelse(predicted_probabilities > 0.5, :
## number of items to replace is not a multiple of replacement length
# Evaluate LDA + Logistic Regression
conf_matrix <- table(Actual = test_labels_binary, Predicted = predicted_labels)
lda_accuracy <- sum(diag(conf_matrix)) / sum(conf_matrix)
lda_precision <- conf_matrix[2, 2] / sum(conf_matrix[, 2])
lda_recall <- conf_matrix[2, 2] / sum(conf_matrix[2, ])
lda_f1_score <- 2 * (lda_precision * lda_recall) / (lda_precision + lda_recall)
# Logistic Regression
# Fit logistic regression model
lr_model <- glm(train_labels_binary ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                data = train_data, family = binomial)

# Predict on the test data using the logistic regression model
lr_predicted <- predict(lr_model, newdata = test_data, type = "response")

# Convert probabilities to binary predictions
lr_predicted_binary <- ifelse(lr_predicted > 0.5, 1, 0)

# Evaluate Logistic Regression
Actual <- test_labels_binary
Predicted <- lr_predicted_binary
cm_lr <- as.table(table(Actual, Predicted))
log_accuracy <- sum(diag(cm_lr)) / sum(cm_lr)
log_precision <- cm_lr[2, 2] / sum(cm_lr[, 2])
log_recall <- cm_lr[2, 2] / sum(cm_lr[2, ])
log_f1_score <- 2 * (log_precision * log_recall) / (log_precision + log_recall)
Results and Discussion
After conducting the experiments, we obtained the following results:

K-Nearest Neighbors (KNN)
cat(best_k, "\n")         # optimal K
## 1
cat(knn_accuracy, "\n")   # accuracy
## 0.9777778
cat(knn_precision, "\n")  # precision
## 1
cat(knn_recall, "\n")     # recall
## 0.9444444
cat(knn_f1_score, "\n")   # F1 score
## 0.9714286
Linear Discriminant Analysis (LDA) + Logistic Regression
cat("Accuracy: ", lda_accuracy, "\n")
## Accuracy: 0.5111111
cat("Precision: ", lda_precision, "\n")
## Precision: 0.3
cat("Recall: ", lda_recall, "\n")
## Recall: 0.1666667
cat("F1 Score: ", lda_f1_score, "\n")
## F1 Score: 0.2142857
Logistic Regression
cat("Accuracy: ", log_accuracy, "\n")
## Accuracy: 0.7111111
cat("Precision: ", log_precision, "\n")
## Precision: 0.7272727
cat("Recall: ", log_recall, "\n")
## Recall: 0.4444444
cat("F1 Score: ", log_f1_score, "\n")
## F1 Score: 0.5517241
Based on the evaluation metrics, we can discuss each model's performance on the target class "versicolor" and its strengths and weaknesses:
KNN stands out with high accuracy (97.78%), perfect precision, and strong recall: every sample it labeled versicolor truly was versicolor, and it captured most of the versicolor samples in the test set. The F1 score of 0.97 confirms this strong overall performance.
The combination of Linear Discriminant Analysis (LDA) and Logistic Regression performed far worse than KNN, with low accuracy, precision, recall, and F1 score. Part of this is likely a scoring problem rather than a modeling one: the warning "'newdata' had 45 rows but variables found have 105 rows" indicates that the logistic model's formula referenced the stored training scores directly, so predict() reused them instead of the test scores. Beyond that, LDA may simply not be the best feature-reduction technique for this dataset, and the pipeline would need further tuning or different feature-selection methods to improve.
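One way that warning could be avoided (a sketch, assuming the same 70/30 iris split and the MASS package, not the lab's exact objects) is to keep the LD scores in data frames whose column names, LD1 and LD2, are shared between training and test, so that predict() genuinely scores the held-out rows:

```r
library(MASS)

# Recreate a 70/30 split like the lab's (assumption: same seed and data)
data("iris")
set.seed(123)
idx   <- sample(1:nrow(iris), 0.7 * nrow(iris))
train <- iris[idx, ]
test  <- iris[-idx, ]
train_bin <- ifelse(train$Species == "versicolor", 1, 0)
test_bin  <- ifelse(test$Species == "versicolor", 1, 0)

lda_model <- lda(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                 data = train)

# Named LD-score columns (LD1, LD2) shared by train and test
train_scores <- as.data.frame(predict(lda_model, train)$x)
test_scores  <- as.data.frame(predict(lda_model, test)$x)

# Fit on training scores, then score the 45 test rows by column name
fit  <- glm(train_bin ~ LD1 + LD2, data = train_scores, family = binomial)
prob <- predict(fit, newdata = test_scores, type = "response")
pred <- ifelse(prob > 0.5, 1, 0)
```

Even with the scoring fixed, a single linear boundary in LD space can still struggle to isolate versicolor, which lies between the other two species along LD1.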
Logistic Regression demonstrates moderate performance, with an accuracy of 71.11% and a reasonable balance between precision and recall. Its F1 score is higher than that of the LDA + Logistic Regression pipeline but still well below KNN's. Logistic Regression is a simple and interpretable model, making it a good choice for cases where model interpretability is essential.
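That interpretability can be made concrete: each coefficient of a binomial glm() is a change in log-odds, so exponentiating gives odds ratios. A minimal sketch (fit on the full iris data for brevity, not the lab's 70/30 split):

```r
data("iris")
versicolor_bin <- ifelse(iris$Species == "versicolor", 1, 0)
fit <- glm(versicolor_bin ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
           data = iris, family = binomial)

# Multiplicative change in the odds of versicolor per 1 cm increase in each feature
odds_ratios <- exp(coef(fit))
```

An odds ratio above 1 means the feature raises the odds of versicolor, below 1 means it lowers them, which is the kind of per-feature reading KNN cannot provide.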
Conclusion
When selecting a model, it is critical to take the specific use case, dataset properties, and computational budget into account. Logistic Regression offers a straightforward, interpretable option, while KNN shines in situations where complex decision boundaries matter. Further experimentation and hyperparameter optimization could improve the performance of all of these models.
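As one concrete example of that hyperparameter optimization: in the lab, K was chosen by accuracy on the test set, which leaks test information into model selection. A sketch of choosing K by leave-one-out cross-validation on the training split instead, assuming the class package's knn.cv() and the same seed as the lab:

```r
library(class)

# Training split as in the lab (assumption: same seed and features)
data("iris")
set.seed(123)
idx          <- sample(1:nrow(iris), 0.7 * nrow(iris))
train_data   <- iris[idx, 1:4]
train_labels <- iris$Species[idx]

# Leave-one-out CV accuracy for each candidate K, using only training data
cv_accuracy <- sapply(1:20, function(k) {
  mean(knn.cv(train_data, train_labels, k = k) == train_labels)
})
best_k_cv <- which.max(cv_accuracy)
```

The test set is then touched only once, to report the final accuracy of the model trained with best_k_cv.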