IE6400_Day23

School

Northeastern University

Course

6400

Subject

Industrial Engineering

Date

Feb 20, 2024

IE6400 Foundations of Data Analytics Engineering
Fall 2023

Module 4: Introduction to Machine Learning

Machine Learning Overview

Machine learning (ML) is a branch of artificial intelligence (AI) that focuses on building systems that can learn from data. Rather than being explicitly programmed to perform a task, a machine learning algorithm uses statistical techniques to learn patterns in data and make predictions or decisions based on them.

Key Aspects of Machine Learning

1. Supervised Learning
  • Description: An algorithm is trained on a labeled dataset, where the data comes with the correct answers. The algorithm makes predictions and is corrected when those predictions are wrong, so it learns over time.
  • Common Tasks: Classification (categorizing items) and regression (predicting numerical values).

2. Unsupervised Learning
  • Description: The algorithm is given data without any explicit instructions on what to do with it. It tries to learn patterns and structure from the data.
  • Common Tasks: Clustering (grouping similar items) and association (finding rules that describe data).

3. Reinforcement Learning
  • Description: An agent learns how to behave in an environment by performing actions and receiving rewards or penalties.
  • Analogy: Teaching a dog new tricks. The dog is the agent, the environment is where the dog can perform tricks, and rewards (or penalties) are treats (or lack of treats).

4. Semi-Supervised and Active Learning
  • Description: These methods use both labeled and unlabeled data for training. Typically, a small amount of labeled data and a large amount of unlabeled data are used.

5. Deep Learning
  • Description: A subset of ML, deep learning models data with deep neural networks, which are algorithms inspired by the structure of the brain. It is particularly powerful for tasks like image and speech recognition.

Applications of Machine Learning

Machine learning has a myriad of applications, including:
  • Web search engines
  • Recommendation systems (e.g., Netflix, Amazon)
  • Image and speech recognition
  • Medical diagnosis
  • Financial forecasting

The core idea behind machine learning is that machines take data and "learn" from it, thereby improving their performance over time without being explicitly programmed for the task at hand.

Popular Machine Learning Algorithms

Machine learning encompasses a wide range of algorithms used for various tasks. Here's an overview of some of the most popular ones:

1. Linear Regression
  • Type: Supervised
  • Use Case: Predicting a continuous target variable based on one or more input features.
  • Description: Assumes a linear relationship between the inputs and the target. It finds the best-fit straight line, which is then used to predict output values within the observed range.

2. Logistic Regression
  • Type: Supervised
  • Use Case: Binary classification problems.
  • Description: Estimates the probability that a given instance belongs to a particular category. Despite its name, it's used for classification, not regression.

3. Decision Trees
  • Type: Supervised
  • Use Case: Classification and regression tasks.
  • Description: Splits the data into subsets based on the value of input features. This process is repeated recursively, resulting in a tree-like model of decisions.

4. Random Forest
  • Type: Supervised
  • Use Case: Classification and regression.
  • Description: An ensemble method that creates a 'forest' of decision trees. Each tree is trained on a random subset of the data and makes its own predictions. The random forest algorithm then aggregates these predictions to produce a final result.

5. Support Vector Machines (SVM)
  • Type: Supervised
  • Use Case: Classification and regression.
  • Description: Tries to find a hyperplane that best separates the classes of data. It's particularly useful for classifying complex but small- or medium-sized datasets.

6. K-Means Clustering
  • Type: Unsupervised
  • Use Case: Clustering similar data points together.
  • Description: Partitions the data into 'K' clusters, where each data point belongs to the cluster with the nearest mean.

7. Neural Networks (Deep Learning)
  • Type: Supervised, Unsupervised
  • Use Case: Complex tasks like image and speech recognition.
  • Description: Composed of layers of nodes or 'neurons'. Can automatically learn and extract features from raw data.

8. Naive Bayes
  • Type: Supervised
  • Use Case: Classification tasks, often used for text data.
  • Description: Based on Bayes' theorem with the 'naive' assumption of conditional independence between every pair of features.

9. Principal Component Analysis (PCA)
  • Type: Unsupervised
  • Use Case: Dimensionality reduction.
  • Description: Transforms the original variables into a new set of variables (the principal components), which are orthogonal (and linearly independent) and which capture the maximum variance in the data.

10. Gradient Boosting Machines (GBM)
  • Type: Supervised
  • Use Case: Classification and regression.
  • Description: Builds an additive model in a forward stage-wise fashion. It allows for the optimization of arbitrary differentiable loss functions.

Each of these algorithms has its strengths and weaknesses and is suitable for different types of tasks. The choice of algorithm often depends on the size, quality, and nature of the data, the task to be performed, and the available computational resources.

Linear Regression Model

Linear Regression is one of the simplest and most commonly used statistical techniques for predictive modeling. It is used to model the relationship between a dependent variable and one or more independent variables by fitting a linear equation to observed data.

Key Concepts of Linear Regression

Dependent Variable
This is the target variable that we are trying to predict or explain. In the context of the Boston Housing dataset, it is the median value of owner-occupied homes (MEDV).

Independent Variables
These are the features or predictors that we use to predict the dependent variable. In the Boston Housing dataset, these could be features like CRIM (crime rate), RM (average number of rooms per dwelling), etc.

Linear Relationship
Linear Regression assumes that there is a linear relationship between the independent variables and the dependent variable. This means that if you plot an independent variable on the x-axis and the dependent variable on the y-axis, the data points should fall around a straight line.

Equation of a Line
The equation for a line in a simple linear regression (one independent variable) is:

$y = \beta_0 + \beta_1x + \epsilon$

where:
  • $y$ is the dependent variable,
  • $\beta_0$ is the y-intercept,
  • $\beta_1$ is the slope of the line,
  • $x$ is the independent variable,
  • $\epsilon$ is the error term.

Least Squares Method
The parameters $\beta_0$ and $\beta_1$ are chosen such that they minimize the sum of the squared differences between the observed values and the values predicted by the model. This method is known as the Least Squares Method.

Evaluation Metrics
To evaluate the performance of a linear regression model, we commonly use metrics such as Mean Squared Error (MSE) and R-squared ($R^2$).
  • MSE measures the average of the squares of the errors, i.e., the average squared difference between the estimated values and the actual values.
  • $R^2$ is a statistical measure of how close the data are to the fitted regression line. It is also known as the coefficient of determination.

Application in Python
In Python's sklearn library, the LinearRegression class is used to perform linear regression and make predictions. The model is trained using the .fit() method and predictions are made with the .predict() method.

Conclusion
Linear Regression is a good starting point for regression tasks. It works best when the
predictors have a linear relationship with the dependent variable, when the predictors are not highly correlated with each other, and when the data is homoscedastic, meaning the variance of the residuals is constant along the regression line.

Exercise 1
Understanding Linear Regression in Machine Learning

Problem Statement
In this exercise, we will build a Linear Regression model to predict the median house values in California districts. The California Housing dataset includes metrics like median income and housing median age, which we will use as predictors. Our goal is to understand the relationship between the features and the median house value and to evaluate the performance of our regression model.

Step-by-Step Guide

Step 1: Import Libraries

In [1]:
    # Step 1: Import necessary libraries
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.datasets import fetch_california_housing
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error, r2_score
    import warnings

    # Suppress warnings to keep the notebook output readable
    warnings.filterwarnings('ignore')

Step 2: Load the Dataset

In [2]:
    california = fetch_california_housing()
    california_df = pd.DataFrame(california.data, columns=california.feature_names)
    california_df['MedHouseVal'] = california.target

Step 3: Exploratory Data Analysis

In [3]:
    print(california_df.head())      # Display the first five rows of the dataset
    print(california_df.describe())  # Get the summary statistics

       MedInc  HouseAge  AveRooms  AveBedrms  Population  AveOccup  Latitude  \
    0  8.3252      41.0  6.984127   1.023810       322.0  2.555556     37.88
    1  8.3014      21.0  6.238137   0.971880      2401.0  2.109842     37.86
    2  7.2574      52.0  8.288136   1.073446       496.0  2.802260     37.85
    3  5.6431      52.0  5.817352   1.073059       558.0  2.547945     37.85
    4  3.8462      52.0  6.281853   1.081081       565.0  2.181467     37.85

       Longitude  MedHouseVal
    0    -122.23        4.526
    1    -122.22        3.585
    2    -122.24        3.521
    3    -122.25        3.413
    4    -122.25        3.422

                 MedInc      HouseAge      AveRooms     AveBedrms    Population  \
    count  20640.000000  20640.000000  20640.000000  20640.000000  20640.000000
    mean       3.870671     28.639486      5.429000      1.096675   1425.476744
    std        1.899822     12.585558      2.474173      0.473911   1132.462122
    min        0.499900      1.000000      0.846154      0.333333      3.000000
    25%        2.563400     18.000000      4.440716      1.006079    787.000000
    50%        3.534800     29.000000      5.229129      1.048780   1166.000000
    75%        4.743250     37.000000      6.052381      1.099526   1725.000000
    max       15.000100     52.000000    141.909091     34.066667  35682.000000

               AveOccup      Latitude     Longitude   MedHouseVal
    count  20640.000000  20640.000000  20640.000000  20640.000000
    mean       3.070655     35.631861   -119.569704      2.068558
    std       10.386050      2.135952      2.003532      1.153956
    min        0.692308     32.540000   -124.350000      0.149990
    25%        2.429741     33.930000   -121.800000      1.196000
    50%        2.818116     34.260000   -118.490000      1.797000
    75%        3.282261     37.710000   -118.010000      2.647250
    max     1243.333333     41.950000   -114.310000      5.000010

Step 4: Split the Data into Training and Testing Sets

In [4]:
    X = california_df.drop('MedHouseVal', axis=1)
    y = california_df['MedHouseVal']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Step 5: Build the Linear Regression Model

In [5]:
    lin_reg = LinearRegression()
    lin_reg.fit(X_train, y_train)

Out[5]:
    LinearRegression()

Step 6: Make Predictions

In [6]:
    # Make predictions on the testing set
    y_pred = lin_reg.predict(X_test)

Step 7: Evaluate the Model

In [7]:
    # Calculate the Mean Squared Error (MSE) and Coefficient of Determination (R^2)
    mse = mean_squared_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)
    print(f'Mean Squared Error: {mse:.2f}')
    print(f'R^2 Score: {r2:.2f}')

    Mean Squared Error: 0.56
    R^2 Score: 0.58

Step 8: Visualize the Results

In [8]:
    # Plot the true values against the predicted values
    plt.scatter(y_test, y_pred)
    # Dashed diagonal: where a perfect prediction would fall
    plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], '--k')
    plt.xlabel('Actual Values')
    plt.ylabel('Predicted Values')
    plt.title('Actual vs Predicted Housing Values')
    plt.show()
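
To make the Least Squares Method and the two evaluation metrics more concrete, the sketch below fits a simple (one-feature) regression in closed form and computes MSE and $R^2$ by hand. It is only an illustration: the helper names (fit_simple_ols, mse, r_squared) and the toy data are assumptions made for this example and are not part of the exercise, which relies on sklearn's LinearRegression instead.

```python
import numpy as np

def fit_simple_ols(x, y):
    """Closed-form least squares for y = beta0 + beta1 * x (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: covariance of x and y divided by the variance of x
    beta1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line passes through the point of means
    beta0 = y_mean - beta1 * x_mean
    return beta0, beta1

def mse(y_true, y_pred):
    """Average squared difference between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def r_squared(y_true, y_pred):
    """1 minus (residual sum of squares / total sum of squares)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Made-up toy data lying exactly on the line y = 1 + 2x
x_toy = np.array([0.0, 1.0, 2.0, 3.0])
y_toy = 1.0 + 2.0 * x_toy

beta0, beta1 = fit_simple_ols(x_toy, y_toy)
y_hat = beta0 + beta1 * x_toy
print(beta0, beta1, mse(y_toy, y_hat), r_squared(y_toy, y_hat))
```

Because the toy points lie exactly on a line, the fit recovers the intercept and slope, the MSE is zero, and $R^2$ is one. On real data such as the California Housing set, the residuals are not zero, which is why the exercise reports an MSE of 0.56 and an $R^2$ of 0.58.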

Interpretation
The Mean Squared Error (MSE) and R-Squared Score are metrics used to evaluate the performance of the regression model. A lower MSE indicates a model that predicts the target variable more accurately. The R-Squared Score indicates the proportion of variance in the dependent variable that is predictable from the independent variables.

By visualizing the actual vs predicted prices, we can see the accuracy of our model's predictions. Points closer to the diagonal line indicate more accurate predictions.

Logistic Regression Model

Logistic Regression is a statistical method for predicting binary outcomes from data. Examples of this are "yes" vs "no" or "young" vs "old". These categories are encoded as 0 and 1, and the model estimates the probability of an observation being a 1.

Understanding Logistic Regression

Binary Outcomes
Logistic regression deals with situations where the outcome for a dependent variable is binary or dichotomous. This means the outcome can be classified into one of two classes.

Odds and Probabilities
The odds of an event are the ratio of the probability of the event occurring to the probability of it not occurring. Logistic regression predicts the log-odds, that is, the logarithm of the odds of the dependent variable being true (1).

The Logistic Function
The logistic function, also called the sigmoid function, is an S-shaped curve that can take any real-valued number and map it into a value between 0 and 1, but never exactly at those limits.

$f(x) = \frac{1}{1 + e^{-x}}$

Logistic Regression Equation
The logistic regression equation is:

$\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1x_1 + \beta_2x_2 + \ldots + \beta_nx_n$

where:
  • $p$ is the probability of the dependent event occurring,
  • $\beta_0$ is the intercept,
  • $\beta_1$, $\beta_2$, $\ldots$, $\beta_n$ are the coefficients,
  • $x_1$, $x_2$, $\ldots$, $x_n$ are the independent variables.

Estimating Probabilities
The predicted probability $p$ is the sigmoid function applied to $\beta_0 + \beta_1x_1 + \beta_2x_2 + \ldots + \beta_nx_n$, which yields values between 0 and 1. These values represent the probability that the dependent variable equals 1.

Model Training
The parameters of the logistic regression model are estimated from the training data using the method of maximum likelihood estimation (MLE).

Evaluation Metrics
The performance of a logistic regression model is typically evaluated using metrics like accuracy, precision, recall, F1 score, and the area under the ROC curve (ROC-AUC).

Application in Python
In Python's sklearn library, the LogisticRegression class is used to perform logistic regression. The model is trained using the .fit() method, and predictions can be made with the .predict() method for classification, or .predict_proba() for obtaining the probability estimates.

Conclusion
Logistic Regression is a powerful statistical way of modeling a binomial outcome with one or more explanatory variables. It measures the relationship between the categorical dependent variable and one or more independent variables by estimating probabilities using a logistic function.

Exercise 2
Logistic Regression Model on the Iris Dataset

Problem Statement
The goal of this exercise is to build a logistic regression model to predict the species of an iris flower based on the sepal length, sepal width, petal length, and petal width. Because there are three species, this is a multiclass problem rather than a binary one; sklearn's LogisticRegression handles it by extending the binary model to several classes.

Dataset
We will use the Iris dataset, which is a classic in the field of machine learning. The dataset contains 150 observations of iris flowers, each with four features and a label indicating the species of the iris.
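
Before moving on to the exercise, the link between the log-odds equation and the sigmoid function described in the Logistic Regression Equation section can be checked numerically. The following is a minimal sketch for a single feature; the coefficient values and the input value are made up for illustration and do not come from any fitted model.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def log_odds(p):
    """Logarithm of the odds p / (1 - p); the inverse of the sigmoid."""
    return math.log(p / (1.0 - p))

# Hypothetical coefficients and input, chosen only for illustration
beta0, beta1 = -1.0, 0.5
x1 = 4.0

z = beta0 + beta1 * x1   # linear predictor, i.e. the log-odds
p = sigmoid(z)           # probability that the outcome equals 1

print(f'log-odds = {z:.3f}, probability = {p:.3f}')
print(f'recovered log-odds = {log_odds(p):.3f}')
```

Applying log_odds to the probability returns the original linear predictor, which is exactly what the logistic regression equation states: the model is linear in the log-odds, and the sigmoid converts that value into a probability.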

Objectives
  • Perform exploratory data analysis
  • Visualize the data
  • Prepare the data for modeling
  • Train a logistic regression model
  • Evaluate the model's performance
  • Interpret the results

In [9]:
    # Import necessary libraries
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import confusion_matrix, classification_report
    import seaborn as sns

Load and Explore the Iris Dataset

In [10]:
    # Load the dataset
    iris = load_iris()
    X = iris.data
    y = iris.target

    # Convert to DataFrame for easier analysis
    iris_df = pd.DataFrame(X, columns=iris.feature_names)
    iris_df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)

    # Display the first 5 rows of the DataFrame
    iris_df.head()

Out[10]:
       sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  species
    0                5.1               3.5                1.4               0.2   setosa
    1                4.9               3.0                1.4               0.2   setosa
    2                4.7               3.2                1.3               0.2   setosa
    3                4.6               3.1                1.5               0.2   setosa
    4                5.0               3.6                1.4               0.2   setosa

Data Visualization
Let's visualize the relationships between the different features and the species.

In [11]:
    # Pairplot to visualize the relationships between features
    sns.pairplot(iris_df, hue='species')
    plt.show()

Preparing Data for Logistic Regression
We will split the data into a training set and a testing set.

In [12]:
    # Split the data into training and test sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

Training the Logistic Regression Model

In [13]:
    # Initialize the Logistic Regression model
    logreg = LogisticRegression(max_iter=200)

    # Fit the model with the training data
    logreg.fit(X_train, y_train)

Out[13]:
    LogisticRegression(max_iter=200)

Model Evaluation
We will now evaluate the model's performance on the test set using a confusion matrix and classification report.

In [14]:
    # Predict the labels for the test set
    y_pred = logreg.predict(X_test)

    # Classification report
    print(classification_report(y_test, y_pred))

                  precision    recall  f1-score   support

               0       1.00      1.00      1.00        19
               1       1.00      1.00      1.00        13
               2       1.00      1.00      1.00        13

        accuracy                           1.00        45
       macro avg       1.00      1.00      1.00        45
    weighted avg       1.00      1.00      1.00        45

In [15]:
    from sklearn.metrics import confusion_matrix
    import matplotlib.pyplot as plt

    # y_test holds the true labels and y_pred the labels predicted by the classifier
    # Compute the confusion matrix
    conf_matrix = confusion_matrix(y_test, y_pred)

    # Set up the matplotlib figure
    fig, ax = plt.subplots(figsize=(8, 6))  # Adjust the size as needed

    # Use matshow to display the confusion matrix
    cax = ax.matshow(conf_matrix, cmap=plt.cm.Blues)

    # Add colorbar for reference
    fig.colorbar(cax)

    # Annotate each cell with its count, using white text on dark cells
    for (i, j), val in np.ndenumerate(conf_matrix):
        ax.text(j, i, f'{val:d}', ha='center', va='center',
                color='white' if val > conf_matrix.max()/2 else 'black')

    # Set labels for axes
    ax.set_xlabel('Predicted Label')
    ax.set_ylabel('True Label')

    # Adjust the tick labels on the x and y axes
    ax.set_xticks(np.arange(len(np.unique(y_test))))
    ax.set_yticks(np.arange(len(np.unique(y_test))))
    ax.set_xticklabels(np.unique(y_test))
    ax.set_yticklabels(np.unique(y_test))

    # Show the plot
    plt.show()

Interpretation
The confusion matrix shows the number of correct and incorrect predictions for each class. The classification report provides key metrics such as precision, recall, and F1-score, which help us understand the accuracy of our model.

Conclusion
This exercise demonstrated the process of implementing a logistic regression model
on the Iris dataset. Through exploratory data analysis, data visualization, model training, and evaluation, we have developed a basic understanding of logistic regression in machine learning.

Decision Trees

Decision Trees are a type of supervised learning algorithm used for both classification and regression tasks. The goal of a Decision Tree is to build a model that goes from observations about an item (represented by the branches) to a conclusion about its target value (represented by the leaves).

How Decision Trees Work
A Decision Tree repeatedly splits the data into separate branches. This process is conceptually similar to playing the game of "20 Questions," where you try to guess something by asking yes-or-no questions. In a Decision Tree, these questions are about the features of the data.

Components of a Decision Tree:
  • Root Node: The first node of the tree, where the data is first split.
  • Splitting: The process of dividing a node into two or more sub-nodes.
  • Decision Node: A sub-node that splits into further sub-nodes.
  • Leaf/Terminal Node: A node that does not split.
  • Pruning: Removing the sub-nodes of a decision node. This is the opposite of splitting.
  • Branch / Sub-Tree: A subsection of the entire tree.
  • Parent and Child Node: A node that is divided into sub-nodes is the parent of those sub-nodes, and the sub-nodes are its children.

Advantages of Decision Trees:
  • Simple to understand and interpret: Trees can be visualized, which makes them easy to interpret.
  • Requires little data preparation: Other techniques often require data normalization, creation of dummy variables, and removal of blank values. Decision Trees typically need much less of this.
  • Able to handle both numerical and categorical data: Other techniques are usually specialized in analyzing datasets that have only one type of variable.

Disadvantages of Decision Trees:
  • Overfitting: Decision-tree learners can create over-complex trees that do not generalize well from the training data.
  • Less suited to smooth continuous relationships: Trees predict piecewise-constant values, so they approximate smooth relationships only coarsely and cannot extrapolate beyond the range of the training data.
  • Can be unstable: Small variations in the data might result in a completely different tree being generated.

How to Build a Decision Tree:
1. Select the best attribute using Attribute Selection Measures (ASM) to split the records.
2. Make that attribute a decision node and break the dataset into smaller subsets.
3. Build the tree by repeating this process recursively for each child until one of the following conditions is met:
  • All the tuples belong to the same class (target value).
  • There are no more remaining attributes.
  • There are no more instances.
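
Step 1 of the procedure above depends on scoring candidate splits. As a minimal sketch, assuming the Gini index as the attribute selection measure and a single numeric feature, the code below evaluates every candidate threshold and keeps the one with the lowest weighted impurity. The function names (gini, best_threshold) and the toy measurements are hypothetical and chosen only for illustration; library implementations such as sklearn's are considerably more optimized.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_threshold(values, labels):
    """Return (threshold, weighted_gini) of the best split 'value <= threshold'."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_thr, best_score = None, float('inf')
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # identical values cannot be separated by a threshold
        thr = (lo + hi) / 2  # midpoint between two neighbouring values
        left = [lab for v, lab in pairs if v <= thr]
        right = [lab for v, lab in pairs if v > thr]
        # Impurity of the split: child impurities weighted by child size
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_thr, best_score = thr, score
    return best_thr, best_score

# Made-up toy data: one feature, two classes
feature = [1.4, 1.3, 1.5, 4.7, 4.5, 5.1]
classes = ['A', 'A', 'A', 'B', 'B', 'B']

thr, score = best_threshold(feature, classes)
print(f'best threshold: {thr}, weighted Gini: {score}')
```

A full tree builder would apply this search to every feature, split on the best one, and then recurse into each child until one of the stopping conditions listed above is reached.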

Attribute Selection Measures
An attribute selection measure (ASM) is a heuristic for selecting the splitting criterion that partitions the data in the best possible manner. ASMs are also known as splitting rules because they help determine the breakpoints for tuples at a given node. An ASM assigns a score to each feature (or attribute) according to how well it explains the given dataset, and the attribute with the best score is selected to split the node. Common measures are:
  • Information Gain
  • Gini Index
  • Chi-Square
  • Reduction in Variance

Conclusion
Decision Trees are a powerful tool for classification and regression. They are intuitive and easy to explain but can become complex and overfit the data. Therefore, understanding how to tune and prune a Decision Tree is essential for effective modeling.

Exercise 3
Understanding Decision Trees with the Iris Dataset

Introduction
In this exercise, we'll use the Iris dataset to build a decision tree classifier. We'll visualize the tree and interpret the results to gain insights into how decision trees make predictions.

Load the Dataset
The Iris dataset is available through the sklearn.datasets module. We'll start by loading the data and taking a look at its structure.

In [16]:
    from sklearn.datasets import load_iris
    import pandas as pd

    # Load the iris dataset
    iris = load_iris()
    df_iris = pd.DataFrame(iris.data, columns=iris.feature_names)
    df_iris['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)

    # Display the first 5 rows of the dataset
    df_iris.head()

Out[16]:
       sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  species
    0                5.1               3.5                1.4               0.2   setosa
    1                4.9               3.0                1.4               0.2   setosa
    2                4.7               3.2                1.3               0.2   setosa
    3                4.6               3.1                1.5               0.2   setosa
    4                5.0               3.6                1.4               0.2   setosa

Exploratory Data Analysis (EDA)
Before training our model, it's important to explore the dataset to understand the distribution of the different features and classes.

In [17]:
    import seaborn as sns
    import matplotlib.pyplot as plt

    # Pairplot to visualize the relationships between features
    sns.pairplot(df_iris, hue='species')
    plt.show()

Splitting the Data
We'll split our dataset into a training set and a test set to evaluate the performance of our model.

In [18]:
    from sklearn.model_selection import train_test_split

    # Split the data
    X = df_iris.iloc[:, :-1]  # Features
    y = df_iris['species']    # Target variable
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

Training the Decision Tree Classifier
With our data split, we can train a decision tree classifier.

In [19]:
    from sklearn.tree import DecisionTreeClassifier

    # Initialize the classifier
    dt_classifier = DecisionTreeClassifier(random_state=42)

    # Train the classifier
    dt_classifier.fit(X_train, y_train)

Out[19]:
    DecisionTreeClassifier(random_state=42)

Visualizing the Decision Tree
We can visualize the decision tree using the plot_tree function from sklearn.

In [20]:
    from sklearn.tree import plot_tree

    # Convert the class names from a NumPy array to a list
    class_names_list = iris.target_names.tolist()

    # Visualize the decision tree
    plt.figure(figsize=(20,10))
    plot_tree(dt_classifier, feature_names=iris.feature_names, class_names=class_names_list, filled=True)
    plt.show()
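
When a figure is hard to read, the same decision rules can also be printed as indented text. The sketch below is a self-contained variation, not part of the original exercise: it refits a tree on the raw NumPy arrays (so the classes appear as the integer codes 0, 1, and 2 instead of species names) and uses sklearn's export_text function. The variable names tree_sketch and rules are assumptions made for this example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Same dataset and split settings as in the exercise, using raw arrays
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42)

# Refit a tree so that this example stands on its own
tree_sketch = DecisionTreeClassifier(random_state=42)
tree_sketch.fit(X_train, y_train)

# Render the learned rules as indented text, one line per node
rules = export_text(tree_sketch, feature_names=list(iris.feature_names))
print(rules)
```

Each indented line is one test on a feature, and each "class:" line is a leaf. Reading from the top down to a leaf gives the sequence of yes-or-no questions the tree asks before making a prediction.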

Evaluating the Classifier
Finally, we evaluate the performance of the classifier using the test data.

In [21]:
    from sklearn.metrics import classification_report

    # Predictions
    y_pred = dt_classifier.predict(X_test)

    # Classification report
    print(classification_report(y_test, y_pred))

                  precision    recall  f1-score   support

          setosa       1.00      1.00      1.00        19
      versicolor       1.00      1.00      1.00        13
       virginica       1.00      1.00      1.00        13

        accuracy                           1.00        45
       macro avg       1.00      1.00      1.00        45
    weighted avg       1.00      1.00      1.00        45

Interpretation
The classification report provides the precision, recall, f1-score, and support for each class. The decision tree visualization shows the decision paths that the model uses to make predictions.
  • Precision: The fraction of predicted positives that are actually positive.
  • Recall: The fraction of actual positives that were correctly identified.
  • F1-Score: The harmonic mean of precision and recall.
  • Support: The number of actual occurrences of the class in the specified dataset.

By analyzing these metrics and the decision tree diagram, we can gain insights into the model's performance and how it makes decisions.

In conclusion, the model achieved 100% accuracy on the test set, which is unusual in real-world scenarios. It might indicate that the dataset is relatively simple or that the model has overfit to this particular dataset. The Iris dataset is known to be very clean and well-behaved, which makes it suitable for educational purposes but not representative of real-world tasks, where data can be noisy and complex.

Random Forest Model in Machine Learning

Overview
A Random Forest is an ensemble learning method used for classification and regression that operates by constructing a multitude of decision trees at training time. For classification tasks, the output of the Random Forest is the class selected by most trees. For regression tasks, it is the average prediction of the individual trees.

Key Features
  • Ensemble Method: Random Forest combines multiple decision trees to produce a more robust and accurate prediction.
  • Bagging: Each tree in a Random Forest is built from a sample drawn with replacement (bootstrap sample) from the training set.
  • Feature Randomness: When splitting a node during the construction of a tree, the chosen split is no longer the best split among all features. Instead, it is the best split among a random subset of the features. This makes the trees more diverse, which generally yields a better overall model.

Advantages
  • Accuracy: Random Forests achieve a high level of accuracy in many tasks right out of the box.
  • Reduces Overfitting: Due to the randomness introduced in the model and the averaging over many trees, it is less likely to overfit the training data than a single decision tree.
  • Handles Missing Values: Some implementations can handle missing values, for example by imputation.
  • Flexibility: Can be used for both classification and regression tasks.

Disadvantages
  • Complexity: A Random Forest model is inherently more complex than a Decision Tree.
  • Resource Intensive: Random Forests require more computational resources and are slower to train than a single decision tree.
  • Interpretability: They are not as easy to interpret as decision trees.
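
The bagging and feature-randomness ideas described above can be sketched by hand with a few lines around sklearn's DecisionTreeClassifier. This is an illustrative approximation under stated assumptions, not how RandomForestClassifier is implemented: the number of trees, the seeds, and the variable names are made up, and the trees are combined by a hard majority vote, whereas sklearn's own forest averages the trees' predicted class probabilities.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility
n_trees = 25                    # hypothetical forest size
trees = []

for i in range(n_trees):
    # Bagging: draw a bootstrap sample (same size, with replacement)
    idx = rng.integers(0, len(X_train), size=len(X_train))
    # Feature randomness: consider only sqrt(n_features) features at each split
    tree = DecisionTreeClassifier(max_features='sqrt', random_state=i)
    tree.fit(X_train[idx], y_train[idx])
    trees.append(tree)

# Collect every tree's prediction: shape (n_trees, n_test_samples)
all_preds = np.stack([t.predict(X_test) for t in trees])

# Aggregation: majority vote across trees for each test sample
n_classes = len(np.unique(y))
forest_pred = np.array(
    [np.bincount(col, minlength=n_classes).argmax() for col in all_preds.T])

accuracy = float((forest_pred == y_test).mean())
print(f'Hand-rolled forest accuracy: {accuracy:.2f}')
```

Each tree sees a slightly different dataset and a restricted set of features at every split, so the individual trees disagree on some samples. The vote smooths out those disagreements, which is the source of the robustness described under Key Features.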

How it Works
1. Bootstrap Sampling: Random Forest starts by selecting random samples from the dataset using bootstrap sampling.
2. Building Trees: It then builds a decision tree for each sample. The trees are grown to the largest extent possible and there is no pruning.
3. Random Feature Selection: During the construction of trees, only a random subset of features is considered for splitting at each node.
4. Aggregation: For classification, each tree votes and the most popular class is chosen as the final result. For regression, the average prediction of all trees is used.

Applications
  • Banking: For detecting customers likely to default on loans.
  • Medicine: To identify the correct combination of components in medicine.
  • Stock Market: To predict stock behavior.
  • E-commerce: For predicting whether a customer will like a product recommendation.

Random Forests are a powerful tool in the machine learning toolkit but should be used with consideration of their complexity and computational cost.

Exercise 4
Predicting Wine Quality with Random Forest

Introduction
In this exercise, we'll use the Wine Quality dataset to build a Random Forest classifier that predicts the quality of white wines. We'll explore the importance of different physicochemical properties in determining wine quality.

Load the Dataset
The Wine Quality dataset can be downloaded from the UCI Machine Learning Repository. We'll load the data and inspect the first few rows to understand its structure.

In [22]:
    import pandas as pd

    # Load the dataset (the file uses ';' as the column separator)
    url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv'
    df_wine = pd.read_csv(url, sep=';')

    # Display the first 5 rows of the dataset
    df_wine.head()

Out[22]:
(The rightmost columns of this output are cut off in the source and are not reproduced here.)

       fixed acidity  volatile acidity  citric acid  residual sugar  chlorides  free sulfur dioxide  total sulfur dioxide  density    pH  sulphates
    0            7.0              0.27         0.36            20.7      0.045                 45.0                 170.0   1.0010  3.00       0.45
    1            6.3              0.30         0.34             1.6      0.049                 14.0                 132.0   0.9940  3.30       0.49
    2            8.1              0.28         0.40             6.9      0.050                 30.0                  97.0   0.9951  3.26       0.44
    3            7.2              0.23         0.32             8.5      0.058                 47.0                 186.0   0.9956  3.19       0.40
    4            7.2              0.23         0.32             8.5      0.058                 47.0                 186.0   0.9956  3.19       0.40

Exploratory Data Analysis (EDA)
Let's perform EDA to visualize the distribution of wine qualities and the relationship between the physicochemical properties and quality.

In [23]:
    import seaborn as sns
    import matplotlib.pyplot as plt

    # Visualize the distribution of wine quality ratings
    sns.countplot(x='quality', data=df_wine)
    plt.title('Distribution of Wine Quality')
    plt.show()

    # Box plots for each feature
    df_wine.drop('quality', axis=1).plot(kind='box', subplots=True, layout=(3,4), figsize=(20,10),
                                         title='Boxplot of physicochemical properties')
    plt.show()
Data Preprocessing

Before we can train our model, we'll need to preprocess the data. This may include scaling the features and handling imbalanced data.

In [24]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Split the data into features and target variable
X = df_wine.drop('quality', axis=1)
y = df_wine['quality']

# Standardize the features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.3, random_state=42)

Training the Random Forest Classifier

With our data preprocessed, we can proceed to train a Random Forest classifier.

In [25]:
from sklearn.ensemble import RandomForestClassifier

# Initialize the Random Forest classifier
rf_classifier = RandomForestClassifier(random_state=42)

# Train the classifier
rf_classifier.fit(X_train, y_train)

Out[25]: RandomForestClassifier(random_state=42)

Evaluating the Classifier

After training the model, we'll evaluate its performance on the test set.

In [26]:
from sklearn.metrics import classification_report

# Predict the wine quality for the test set
y_pred = rf_classifier.predict(X_test)

# Generate a classification report
print(classification_report(y_test, y_pred, zero_division=0))

              precision    recall  f1-score   support

           3       0.00      0.00      0.00         7
           4       0.44      0.10      0.16        40
           5       0.68      0.71      0.69       426
           6       0.66      0.78      0.71       668
           7       0.77      0.54      0.64       280
           8       0.78      0.37      0.50        49

    accuracy                           0.68      1470
   macro avg       0.55      0.42      0.45      1470
weighted avg       0.68      0.68      0.67      1470

Summary and Conclusion

Model Performance

The Random Forest classifier achieved an overall accuracy of 68%. This indicates that the classifier correctly predicted the wine quality 68% of the time on the test dataset.
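The 68% figure comes from a workflow in which the scaler was fitted on the full dataset before the split, and the split itself was not stratified. The following is a minimal sketch of one alternative, not the notebook's method: it keeps the scaler inside a `Pipeline` so it only sees the training rows, stratifies the split so rare classes appear in both sets, and reports balanced accuracy next to plain accuracy. It assumes synthetic, imbalanced data from `make_classification` in place of the wine CSV so that it runs offline; the names `X_demo`, `y_demo`, and `model` are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, balanced_accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic, imbalanced 3-class data stands in for the wine features.
X_demo, y_demo = make_classification(
    n_samples=600, n_features=11, n_informative=6, n_classes=3,
    weights=[0.7, 0.2, 0.1], random_state=42,
)

# stratify=y_demo keeps the class proportions the same in both subsets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=42
)

# The scaler is a pipeline step, so it is fitted on the training rows only.
model = Pipeline([
    ("scale", StandardScaler()),
    ("rf", RandomForestClassifier(random_state=42)),
])
model.fit(X_tr, y_tr)
y_hat = model.predict(X_te)

# Balanced accuracy averages recall over classes, so rare classes count equally.
acc = accuracy_score(y_te, y_hat)
bal_acc = balanced_accuracy_score(y_te, y_hat)
print(f"accuracy: {acc:.2f}, balanced accuracy: {bal_acc:.2f}")
```

Tree-based models are largely insensitive to feature scaling, so the scaler matters little for Random Forest itself; the pipeline pattern is shown because it carries over unchanged to scale-sensitive models such as SVMs and neural networks.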
Class-wise Performance

The model performed poorly on the lowest quality classes. Class 3 was never predicted correctly (precision and recall of 0.00 on 7 samples), and class 4 reached a recall of only 0.10 on 40 samples, likely because both classes are very small.
For the middle quality classes (5 and 6), the model performed better, with precision and recall both around 0.7, suggesting that it predicts these classes more reliably.
The highest quality classes (7 and 8) had high precision but low recall (0.54 and 0.37), indicating that predictions of these classes were usually correct, but the model missed many wines that actually belong to them.

Overall, the Random Forest model shows promise for predicting wine quality based on physicochemical properties, but there is room for improvement, especially in predicting the quality of wines at the extremes of the scale.

Feature Importances

Random Forest provides an easy way to measure the importance of each feature in the prediction. Let's visualize and interpret these importances.

In [27]:
# Get feature importances
feature_importances = rf_classifier.feature_importances_

# Convert the feature importances to a pandas series
features = pd.Series(feature_importances, index=df_wine.columns[:-1])

# Plot the importances
features.sort_values(ascending=False).plot(kind='bar', title='Feature Importances in the Random Forest Model')
plt.show()
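The `feature_importances_` attribute used above is impurity-based: it is computed from the training data and can favor features with many unique values. A common cross-check is permutation importance on held-out data. The sketch below is an illustration under assumptions, not part of the original exercise: it uses synthetic data from `make_classification` rather than the wine features, and the names `rf_demo`, `impurity_imp`, and `perm` are made up for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Assumption: synthetic data (6 features, 3 informative) replaces the wine features.
X_demo, y_demo = make_classification(
    n_samples=400, n_features=6, n_informative=3, n_redundant=0, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0
)

rf_demo = RandomForestClassifier(n_estimators=100, random_state=0)
rf_demo.fit(X_tr, y_tr)

# Impurity-based importances: derived from the training data, normalized to sum to 1.
impurity_imp = rf_demo.feature_importances_

# Permutation importances: mean drop in test score when one column is shuffled.
perm = permutation_importance(rf_demo, X_te, y_te, n_repeats=5, random_state=0)

for i in range(X_demo.shape[1]):
    print(f"feature {i}: impurity={impurity_imp[i]:.3f}, "
          f"permutation={perm.importances_mean[i]:.3f}")
```

When the two rankings disagree strongly, the permutation result on the test set is usually the safer guide to what the model relies on for new data.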
Interpretation

The classification report shows how well the Random Forest model performs across the different quality ratings. The feature importances indicate which physicochemical properties contribute most to a wine's quality rating according to our model.

Support Vector Machine (SVM) Model in Machine Learning

Overview

Support Vector Machine (SVM) is a supervised machine learning algorithm that can be used for both classification and regression tasks, although it is mostly used for classification. SVM performs classification by finding the hyperplane that best divides a dataset into classes.

Key Concepts

Hyperplane: In SVM, a hyperplane is a decision boundary that separates different classes in the feature space. The goal of SVM is to find the optimal hyperplane that maximizes the margin between classes.
Support Vectors: Support vectors are the data points that are closest to the hyperplane and influence its position and orientation. SVM uses these points to maximize the margin of the classifier.
Margin: The margin is the gap between the decision boundary and the closest points of each class. It is measured as the perpendicular distance from the hyperplane to the support vectors.

Advantages

Effective in High Dimensional Spaces: SVM works well with high-dimensional data, such as text and genomic data.
Memory Efficient: The decision function uses only a subset of the training points (the support vectors), so the model is memory efficient.
Versatility: Different kernel functions can be specified for the decision function. Common kernels are linear, polynomial, RBF, and sigmoid.

Disadvantages

Not Suitable for Large Data Sets: SVM can be inefficient to train on very large datasets.
Sensitive to Noise: A relatively small number of mislabeled examples can dramatically decrease the performance of the algorithm.
No Probability Estimates: SVM does not directly provide probability estimates; these are calculated using an expensive five-fold cross-validation (in scikit-learn, by setting probability=True).

How it Works

1. Choosing a Kernel: The first step is to choose a kernel function. The kernel implicitly maps the input data into a higher-dimensional space where a hyperplane can be used to separate classes.
2. Maximizing the Margin: SVM then finds the hyperplane that maximizes the margin between the support vectors across classes.
3. Classification: Once the optimal hyperplane is found, new data points are classified according to the side of the hyperplane on which they fall.

Applications

Face Detection: SVMs classify parts of the image as face or non-face and create a square boundary around the face.
Text and Hypertext Categorization: SVMs allow text and hypertext categorization for both inductive and transductive models.
Classification of Images: Use of SVMs provides better search accuracy for image classification.

SVMs are a powerful tool, particularly in cases where the complexity of the data requires a high-dimensional feature space.

Exercise 5

Handwritten Digit Recognition with Support Vector Machine (SVM)

Problem Statement:

You are a machine learning engineer at a tech company that is developing optical character recognition (OCR) software to digitize handwritten documents. Your task is to build a machine learning model that can recognize handwritten digits (0 through 9) from images.
Given the importance of accuracy in this classification task, you decide to use a Support Vector Machine (SVM) for its effectiveness in high-dimensional spaces and its ability to use kernel tricks for non-linear classification.

Introduction

In this exercise, we will use the digits dataset from sklearn to build an SVM classifier that can recognize handwritten digits. SVMs are particularly well-suited for classification of complex but small- or medium-sized datasets.

Load and Visualize the Dataset

First, we'll load the dataset and visualize some digit images to get a better understanding of the data we're working with.

In [28]:
from sklearn import datasets
import matplotlib.pyplot as plt

# Load the dataset
digits = datasets.load_digits()

# Visualize the first 4 images
fig, axes = plt.subplots(1, 4, figsize=(10, 3))
for ax, image, label in zip(axes, digits.images, digits.target):
    ax.set_axis_off()
    ax.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
    ax.set_title('Training: %i' % label)
plt.show()

Preprocessing and Splitting the Data

Before training the SVM, we need to split the data into training and testing sets. We will use 75% of the data for training and the remaining 25% for testing our model.

In [29]:
from sklearn.model_selection import train_test_split

# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

Training the SVM Classifier

Now, we'll create an instance of an SVM and fit it to our training data.

In [30]:
from sklearn import svm

# Create an SVM classifier
svm_classifier = svm.SVC(gamma=0.001)

# Train the classifier
svm_classifier.fit(X_train, y_train)

Out[30]: SVC(gamma=0.001)

Optimizing SVM Parameters with Grid Search

We will use grid search to find the best parameters for our SVM classifier.

In [31]:
from sklearn.model_selection import GridSearchCV

# Set the parameters by cross-validation
param_grid = [
    {'C': [1, 10, 100, 1000], 'kernel': ['linear']},
    {'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001], 'kernel': ['rbf']},
]

# Create a classifier with the parameter candidates
clf = GridSearchCV(svm.SVC(), param_grid, cv=5)

# Train the classifier on training data
clf.fit(X_train, y_train)

Out[31]: GridSearchCV(cv=5, estimator=SVC(),
             param_grid=[{'C': [1, 10, 100, 1000], 'kernel': ['linear']},
                         {'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001],
                          'kernel': ['rbf']}])

Evaluating the Optimized Classifier

After finding the best parameters, we evaluate the performance of our optimized classifier on the test set.

In [32]:
from sklearn.metrics import classification_report

# Predict the labels for the test set
y_pred = clf.predict(X_test)

# Generate a classification report
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        37
           1       0.98      1.00      0.99        43
           2       1.00      1.00      1.00        44
           3       1.00      1.00      1.00        45
           4       1.00      1.00      1.00        38
           5       0.98      0.98      0.98        48
           6       1.00      1.00      1.00        52
           7       1.00      1.00      1.00        48
           8       1.00      0.98      0.99        48
           9       0.98      0.98      0.98        47

    accuracy                           0.99       450
   macro avg       0.99      0.99      0.99       450
weighted avg       0.99      0.99      0.99       450

Visualizing the Confusion Matrix

A confusion matrix shows, for each true class, how many samples were assigned to each predicted class, which makes it easy to see where the classifier goes wrong.

In [33]:
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt

# y_test holds the true labels and y_pred the labels predicted by the classifier
# Compute confusion matrix
cm = confusion_matrix(y_test, y_pred)

# Set up the matplotlib figure
plt.figure(figsize=(10, 10))  # Adjust the size as needed

# Use matshow to display the confusion matrix.
# fignum=0 draws into the figure created above; without it, matshow opens
# a second figure and leaves the first one empty.
plt.matshow(cm, cmap=plt.cm.Blues, fignum=0)
plt.title('Confusion matrix of the classifier')
plt.colorbar()
plt.xlabel('Predicted')
plt.ylabel('True')

# Adding the text labels
thresh = cm.max() / 2.
for i in range(cm.shape[0]):
    for j in range(cm.shape[1]):
        plt.text(j, i, format(cm[i, j], 'd'),
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

# Ensure the plot displays fully without cutting off edges
plt.tight_layout()

# Show the plot
plt.show()

Conclusion and Discussion

In this exercise, we trained an SVM classifier to recognize handwritten digits. We optimized the classifier using grid search and evaluated its performance using a classification report and confusion matrix. The SVM classifier proved effective for this task, reaching 99% accuracy on the test set, but its scalability and computational cost on larger datasets remain considerations.

Neural Network Model in Machine Learning

Overview

Neural Networks are a set of algorithms, modeled loosely after the human brain, that are designed to recognize patterns. They interpret sensory data through a kind of machine perception, labeling, or clustering raw input. The patterns they recognize are numerical, contained in vectors, into which all real-world data, be it images, sound, text, or time series, must be translated.
Key Components

Neurons: Basic units of neural networks, analogous to human brain neurons, which process input data.
Weights and Biases: Parameters of the neural network that are adjusted through learning to make accurate predictions.
Activation Function: Determines whether and how strongly a neuron is activated, adding non-linearity to the model.

Advantages

Ability to Model Non-linear Relationships: Neural networks can model complex non-linear relationships between inputs and outputs.
Adaptability: They can adjust to changing input; once trained, they can generalize to data they have not seen before.
Parallel Processing: Neural networks can perform parallel processing, which accelerates their computation.

Disadvantages

Require a Large Amount of Data: To perform well, neural networks require a large amount of training data.
Opaque Nature: Often referred to as "black boxes" because their decision-making process is not transparent.
Computationally Intensive: They require significant computational resources, especially for large and deep networks.

How it Works

1. Input Layer: Receives the initial data.
2. Hidden Layers: Perform computations and feature extraction. The complexity of the neural network is determined by the number of hidden layers and neurons within them.
3. Output Layer: Produces the final prediction or classification.

Training Process

Forward Propagation: Data is passed through the network to get an output.
Loss Calculation: The difference between the predicted output and the actual output (loss) is calculated.
Backpropagation: The loss is propagated back through the network to adjust the weights and biases, using algorithms like gradient descent.

Applications

Image and Speech Recognition: Neural networks excel in tasks like facial recognition and voice-to-text services.
Natural Language Processing: Used in machine translation, sentiment analysis, and other language-related tasks.
Predictive Analytics: Employed in forecasting demand, stock market trends, and consumer behavior.

Neural Networks have revolutionized various fields of artificial intelligence and continue to be at the forefront of many innovative applications in technology.

Exercise 6

Predicting Housing Prices with Neural Networks

Problem Statement:

You are a data scientist working for a real estate company that is interested in predicting the median house value in various districts of California. Your task is to build a neural network model that can accurately predict housing prices based on a set of features such as the number of rooms, population, and median income.

Introduction

In this exercise, we will use the California Housing dataset from sklearn to build a Neural Network model for predicting median housing prices. Neural Networks are powerful tools for modeling complex relationships in data.
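Before moving to the library implementation, the training process described earlier (forward propagation, loss calculation, backpropagation) can be sketched by hand in NumPy. This is a minimal illustration under assumptions, not how sklearn's `MLPRegressor` is implemented: it uses one hidden layer of 8 ReLU units, mean squared error, plain full-batch gradient descent, and a made-up synthetic regression target. All names (`W1`, `b1`, `lr`, and so on) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumption: a small synthetic regression problem with 3 input features.
X = rng.normal(size=(200, 3))
y = (X[:, 0] - 2.0 * X[:, 1] + 0.5 * X[:, 2]).reshape(-1, 1)

# Weights and biases for one hidden layer (8 neurons) and one output neuron.
W1 = rng.normal(scale=0.5, size=(3, 8))
b1 = np.zeros((1, 8))
W2 = rng.normal(scale=0.5, size=(8, 1))
b2 = np.zeros((1, 1))

lr = 0.02          # learning rate for gradient descent
losses = []

for epoch in range(500):
    # Forward propagation: input -> hidden (ReLU) -> output.
    z1 = X @ W1 + b1
    a1 = np.maximum(z1, 0.0)
    y_hat = a1 @ W2 + b2

    # Loss calculation: mean squared error.
    err = y_hat - y
    losses.append(float(np.mean(err ** 2)))

    # Backpropagation: apply the chain rule from the output back to the input.
    d_yhat = 2.0 * err / X.shape[0]
    dW2 = a1.T @ d_yhat
    db2 = d_yhat.sum(axis=0, keepdims=True)
    d_a1 = d_yhat @ W2.T
    d_z1 = d_a1 * (z1 > 0)          # derivative of ReLU
    dW1 = X.T @ d_z1
    db1 = d_z1.sum(axis=0, keepdims=True)

    # Gradient descent update of weights and biases.
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

print(f"loss at start: {losses[0]:.3f}, loss at end: {losses[-1]:.3f}")
```

Libraries automate exactly these steps and add refinements such as mini-batches and adaptive optimizers like Adam, which is the solver used in the exercise that follows.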
Load the Dataset

First, we'll load the dataset and perform any necessary preprocessing to prepare the data for our Neural Network.

In [34]:
from sklearn.datasets import fetch_california_housing
from sklearn.preprocessing import StandardScaler

# Load California Housing data
housing = fetch_california_housing()

# Perform preprocessing if necessary
# For instance, scale the features
scaler = StandardScaler()
housing.data = scaler.fit_transform(housing.data)

Preprocessing the Data and Splitting Into Train/Test Sets

Before building the neural network, we'll split our dataset into a training set and a test set.

In [35]:
from sklearn.model_selection import train_test_split

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(housing.data, housing.target, test_size=0.2, random_state=42)

Building the Neural Network

We will use the MLPRegressor class from sklearn.neural_network to create and train our Neural Network model.

In [36]:
from sklearn.neural_network import MLPRegressor

# Create an MLPRegressor model
nn_regressor = MLPRegressor(hidden_layer_sizes=(100,), activation='relu', solver='adam',
                            max_iter=500, random_state=42)

# Train the model
nn_regressor.fit(X_train, y_train)

Out[36]: MLPRegressor(max_iter=500, random_state=42)

Evaluating the Model

After training the model, we'll evaluate its performance on the test data.

In [37]:
from sklearn.metrics import mean_squared_error
import numpy as np

# Predict the values for the test set
y_pred = nn_regressor.predict(X_test)

# Calculate the mean squared error (MSE) of the predictions
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)

print(f'Mean Squared Error: {mse}')
print(f'Root Mean Squared Error: {rmse}')
Mean Squared Error: 0.29930135699320415
Root Mean Squared Error: 0.5470844148695923

Visualizing Predictions

A scatter plot can help visualize how well our predicted values match the true values.

In [38]:
plt.figure(figsize=(10, 6))
plt.scatter(y_test, y_pred, alpha=0.3)
plt.plot([min(y_test), max(y_test)], [min(y_test), max(y_test)], '--r', linewidth=2)
plt.xlabel('True Values')
plt.ylabel('Predictions')
plt.title('True Values vs Predictions')
plt.show()

Interpretation and Discussion

The performance of the Neural Network can be assessed using the Root Mean Squared Error (RMSE). A lower RMSE indicates better performance. The target in this dataset is expressed in units of $100,000, so an RMSE of about 0.55 corresponds to a typical error of roughly $55,000.
The scatter plot provides a visual cue to the accuracy of the predictions. Points that fall along the diagonal represent accurate predictions.

To improve the model:

We could consider tuning the hyperparameters of the Neural Network, such as the number of hidden layers, the number of neurons in each layer, and the activation function.
We might also explore more advanced techniques such as early stopping, regularization methods (like dropout), and different optimization algorithms to
prevent overfitting and to speed up training.
Additional feature engineering and inclusion of more relevant features could also potentially improve the model's performance.

Naive Bayes Model in Machine Learning

Overview

Naive Bayes is a probabilistic machine learning algorithm based on applying Bayes' theorem with the "naive" assumption of conditional independence between every pair of features given the value of the class variable. Despite its simplicity, Naive Bayes can sometimes outperform more sophisticated classification methods.

Key Concepts

Bayes' Theorem: Provides a way of calculating the posterior probability $P(c|x)$ from $P(c)$, $P(x)$, and $P(x|c)$.
Conditional Independence: Assumes that the effect of an attribute value on a given class is independent of the values of other attributes. This is a strong assumption but simplifies the computation.
Prior Probability: The probability of observing each class in the training dataset.

Advantages

Efficiency: Naive Bayes is highly scalable and can quickly make predictions.
Simplicity: It works well with high-dimensional data and is easy to implement.
Performance: Often performs well in cases where the independence assumption holds.

Disadvantages

Strong Feature Independence Assumption: In real-world scenarios, it is uncommon for features to be completely independent.
Data Scarcity: The estimated probability of a feature value can be zero if it never occurs with a class in the training data, which zeroes out the whole posterior product for that class. Smoothing (for example, Laplace smoothing) is the usual remedy.
Functionality: Primarily used for classification problems and not suited for regression.

How it Works

1. Model Construction: Calculate the prior probability for each class of the target variable, along with the probability of each attribute value given each class.
2. Prediction: Use the model to estimate the probability of a new instance belonging to each class. The class with the highest posterior probability is the prediction.
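The prediction rule in step 2 can be written out explicitly. Using the document's notation, with $c$ for a class and $x = (x_1, \dots, x_n)$ for the feature values of one instance (the index $n$ and the symbol $\hat{c}$ for the predicted class are introduced here for illustration), Bayes' theorem combined with the conditional independence assumption gives:

```latex
P(c \mid x) = \frac{P(c)\, P(x \mid c)}{P(x)},
\qquad
P(x \mid c) = \prod_{i=1}^{n} P(x_i \mid c)
\quad\Longrightarrow\quad
\hat{c} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)
```

The denominator $P(x)$ is the same for every class, so it can be dropped when only the most probable class is needed.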
Training Process

Frequency Tables: Calculate the frequencies of different features for each class.
Likelihood Tables: Convert these frequencies into likelihoods.
Posterior Calculation: For a new instance, calculate the posterior probability for each class and choose the class with the highest probability.

Applications

Spam Filtering: Classifying emails as spam or not spam based on the presence of certain words.
Sentiment Analysis: Determining whether a given text expresses positive, negative, or neutral sentiment.
Document Classification: Categorizing news articles into predefined topics.

Naive Bayes is a straightforward yet powerful algorithm for predictive modeling and classification tasks.
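The training process above (frequency tables, likelihood tables, posterior calculation) can be sketched in a few lines of plain Python for the spam filtering use case. This is an illustrative multinomial-style word-count version with Laplace smoothing, not the Gaussian variant used in the exercise that follows; the toy messages, labels, and function names are all made up for the example.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training data: (words in the message, label).
train = [
    (["win", "money", "now"], "spam"),
    (["win", "prize"], "spam"),
    (["meeting", "tomorrow"], "ham"),
    (["project", "meeting", "now"], "ham"),
]

# Frequency tables: messages per class and word counts per class.
class_counts = Counter(label for _, label in train)
word_counts = defaultdict(Counter)
for words, label in train:
    word_counts[label].update(words)
vocab = {w for words, _ in train for w in words}


def log_posterior(words, label, alpha=1.0):
    """Unnormalized log posterior: log prior plus summed log likelihoods."""
    logp = math.log(class_counts[label] / len(train))
    total = sum(word_counts[label].values())
    for w in words:
        # Likelihood with Laplace smoothing, so unseen words never give zero.
        likelihood = (word_counts[label][w] + alpha) / (total + alpha * len(vocab))
        logp += math.log(likelihood)
    return logp


def predict(words):
    """Return the class with the highest posterior."""
    return max(class_counts, key=lambda c: log_posterior(words, c))


print(predict(["win", "money"]))
print(predict(["meeting", "now"]))
```

Working with log probabilities avoids numerical underflow when many small likelihoods are multiplied, and the smoothing term `alpha` addresses the zero-frequency problem listed under Disadvantages.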
Exercise 7

Iris Species Classification with Naive Bayes

Problem Statement:

You are a botanist seeking an automated way to categorize each iris plant into one of three species based on the sizes of their petals and sepals. You decide to use the Naive Bayes classifier, a probabilistic machine learning model that is based on applying Bayes' theorem with the "naive" assumption of independence between every pair of features.

Introduction

In this exercise, we will use the iris dataset from sklearn to build a Naive Bayes classifier that can identify the species of an iris plant by its features. Naive Bayes is a simple yet powerful algorithm for predictive modeling, especially for text classification.

Load the Dataset

Let's begin by loading the dataset and visualizing the feature distributions.

In [39]:
from sklearn.datasets import load_iris
import seaborn as sns
import pandas as pd

# Load the iris dataset
iris = load_iris()
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)
iris_df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)

# Pairplot to visualize the relationships between features
sns.pairplot(iris_df, hue='species')

Out[39]: <seaborn.axisgrid.PairGrid at 0x7fae8fd5b050>
Splitting the Data

Before we can train our classifier, we need to split our data into a training set and a test set.

In [40]:
from sklearn.model_selection import train_test_split

# Split the dataset into a training set and a test set
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)

Training the Naive Bayes Classifier

We will use the Gaussian Naive Bayes implementation from sklearn for our classifier.

In [41]:
from sklearn.naive_bayes import GaussianNB

# Create a Gaussian Naive Bayes classifier
gnb = GaussianNB()

# Train the classifier
gnb.fit(X_train, y_train)

Out[41]: GaussianNB()

Evaluating the Classifier

Now that we have trained our classifier, we can make predictions on the test set and evaluate the results.

In [42]:
from sklearn.metrics import classification_report, confusion_matrix

# Predict the labels for the test set
y_pred = gnb.predict(X_test)

# Generate a classification report
print(classification_report(y_test, y_pred, target_names=iris.target_names))

              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        19
  versicolor       1.00      0.92      0.96        13
   virginica       0.93      1.00      0.96        13

    accuracy                           0.98        45
   macro avg       0.98      0.97      0.97        45
weighted avg       0.98      0.98      0.98        45

In [43]:
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt

# y_test holds the true labels and y_pred the labels predicted by the classifier
# Compute confusion matrix
cm = confusion_matrix(y_test, y_pred)
# Set up the matplotlib figure
plt.figure(figsize=(8, 6))  # Adjust the size as needed

# Use matshow to display the confusion matrix.
# fignum=0 draws into the figure created above; without it, matshow opens
# a second figure and leaves the first one empty.
plt.matshow(cm, cmap=plt.cm.Blues, fignum=0)
plt.title('Confusion matrix of the classifier')
plt.colorbar()
plt.xlabel('Predicted')
plt.ylabel('True')

# Adding the text labels
thresh = cm.max() / 2.
for i in range(cm.shape[0]):
    for j in range(cm.shape[1]):
        plt.text(j, i, format(cm[i, j], 'd'),
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

# Ensure the plot displays fully without cutting off edges
plt.tight_layout()

# Show the plot
plt.show()

Interpretation

The confusion matrix and classification report provide a detailed breakdown of the classifier's performance.
The diagonal elements of the confusion matrix represent the number of points for which the predicted label is equal to the true label, while off-diagonal elements are those that are mislabeled by the classifier. The higher the diagonal values of the confusion matrix, the better, indicating many correct predictions.
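The diagonal and off-diagonal counts are enough to derive per-class precision, recall, and F1 by hand. The sketch below assumes a confusion matrix reconstructed from the supports and scores in the classification report above (the matrix itself is not printed in the source, so `cm_demo` is an inference, not notebook output). Rows are true classes and columns are predicted classes, in the order setosa, versicolor, virginica.

```python
import numpy as np

# Assumption: reconstructed from the report (19/13/13 samples, one versicolor
# sample predicted as virginica). Rows = true class, columns = predicted class.
cm_demo = np.array([
    [19,  0,  0],
    [ 0, 12,  1],
    [ 0,  0, 13],
])

true_pos = np.diag(cm_demo)

# Precision: correct predictions of a class / all predictions of that class.
precision = true_pos / cm_demo.sum(axis=0)

# Recall: correct predictions of a class / all true members of that class.
recall = true_pos / cm_demo.sum(axis=1)

# F1: harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# Accuracy: all correct predictions / all samples.
accuracy = true_pos.sum() / cm_demo.sum()

for name, p, r, f in zip(["setosa", "versicolor", "virginica"], precision, recall, f1):
    print(f"{name:>10}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
print(f"accuracy={accuracy:.2f}")
```

Column sums give the denominator for precision and row sums the denominator for recall, which is why a single off-diagonal entry lowers the recall of one class and the precision of another.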
The classification report shows key metrics in classification, such as precision (the ratio of correctly predicted positive observations to the total predicted positives), recall (the ratio of correctly predicted positive observations to all observations in the actual class), and f1-score (the harmonic mean of precision and recall).

Overall, Naive Bayes usually performs well on multi-class classification problems, particularly if the assumption of independence holds. For the Iris dataset, the classifier reached 98% accuracy on the test set, with a versicolor recall of 0.92 and a virginica precision of 0.93 as the only imperfect scores.

Principal Component Analysis (PCA)

Overview

Principal Component Analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. It's a tool used for exploratory data analysis and for making predictive models. PCA is commonly used for dimensionality reduction by projecting each data point onto only the first few principal components to obtain lower-dimensional data while preserving as much of the data's variation as possible.

Key Concepts

Variance: PCA seeks to capture as much of the total variance in the data as possible with as few components as possible.
Covariance Matrix: Represents the covariance (a measure of how much two random variables vary together) between each pair of variables in the data.
Eigenvalues and Eigenvectors: PCA involves calculating the eigenvalues and eigenvectors of the covariance matrix to identify the principal components.

Advantages

Noise Reduction: By keeping only the significant principal components, noise in the data can be reduced.
Visualization: Reducing the number of dimensions makes the visualization of complex data possible.
Efficiency: Lower-dimensional data reduces computational costs and improves algorithmic efficiency.
Disadvantages

Data Loss: Some information is lost when dimensions are dropped, which can sometimes be important.
Interpretability: Principal components are harder to interpret than the original features because each one is a linear combination of them.

How it Works

1. Standardization: The data is standardized to have a mean of zero and a standard deviation of one.
2. Covariance Matrix Computation: The covariance matrix is computed to understand how the variables of the input data vary from the mean with respect to each other.
3. Compute Eigenvalues and Eigenvectors: The eigenvalues and eigenvectors of the covariance matrix are computed. The eigenvectors determine the directions of the new feature space, and the eigenvalues give the amount of variance captured along each direction.
4. Sort Eigenvalues: The eigenvalues are sorted in descending order to rank the corresponding eigenvectors.
5. Projection: The data points are projected onto the new feature space.

Applications

Face Recognition: PCA is used to reduce the dimensionality of facial images to perform recognition efficiently.
Genomics: PCA reduces the dimensionality of genetic data, allowing for the
identification of genetic markers.
Market Research: PCA can simplify complex customer satisfaction surveys into just a few principal components.

PCA is a powerful technique for feature extraction and dimensionality reduction, often used as a step in data preprocessing for machine learning models.

Exercise 8

Dimensionality Reduction with PCA on Wine Dataset

Problem Statement:

You are a data analyst at a wine distributor trying to simplify the complexities in your vast wine dataset. The goal is to reduce the number of features to make analysis easier, without losing much information. You decide to use Principal Component Analysis (PCA), a technique that reduces the dimensionality of the data by transforming it into a new set of variables, the principal components, which are orthogonal to each other and are linear combinations of the original variables.

Introduction

In this exercise, we will apply Principal Component Analysis (PCA) to the wine dataset from sklearn to reduce its dimensions. Our aim is to simplify the dataset without losing significant information and to visualize it in two dimensions.

Loading the Dataset

Let's start by loading the dataset and standardizing the features, which is an important preprocessing step before applying PCA.

In [44]:
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

# Load the wine dataset
wine = load_wine()
wine_df = pd.DataFrame(wine.data, columns=wine.feature_names)
wine_df['target'] = wine.target

# Standardize the features
scaler = StandardScaler()
wine_std = scaler.fit_transform(wine.data)

Applying PCA

We will now apply PCA from sklearn to reduce the dataset to two principal components and then visualize the results.
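Before handing the work to sklearn, the five steps listed under How it Works can be sketched directly in NumPy on the same wine data. This is an illustration of the textbook eigendecomposition route under assumptions, not a description of how sklearn's `PCA` class is implemented internally; the variable names (`X_std`, `eigvals`, `X_proj`, and so on) are chosen for the example. The projected coordinates may differ in sign from a library result, since the direction of an eigenvector is arbitrary.

```python
import numpy as np
from sklearn.datasets import load_wine

X_raw = load_wine().data

# 1. Standardization: zero mean and unit standard deviation per feature.
X_std = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

# 2. Covariance matrix of the standardized features (13 x 13).
cov = np.cov(X_std, rowvar=False)

# 3. Eigenvalues and eigenvectors (eigh is suited to symmetric matrices).
eigvals, eigvecs = np.linalg.eigh(cov)

# 4. Sort eigenvalues in descending order and reorder the eigenvectors to match.
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]

# 5. Projection onto the first two principal components.
X_proj = X_std @ eigvecs[:, :2]

# Share of the total variance captured by each component.
explained = eigvals / eigvals.sum()
print("explained variance ratio (first two):", explained[:2])
print("cumulative:", explained[:2].sum())
```

The `explained` array is also the quantity to inspect when choosing how many components to keep: its cumulative sum shows how much variance is retained for each possible cut-off.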
In [45]:
from sklearn.decomposition import PCA

# Apply PCA and reduce the dataset to 2 dimensions
pca = PCA(n_components=2)
wine_pca = pca.fit_transform(wine_std)

# Create a DataFrame with the PCA results
wine_pca_df = pd.DataFrame(data=wine_pca, columns=['Principal Component 1', 'Principal Component 2'])
wine_pca_df['target'] = wine.target

Visualizing the PCA Results

Now that we have reduced the dimensions of the dataset, we can visualize it in a 2D plot.

In [46]:
# Plot the PCA-transformed version of the wine dataset
sns.scatterplot(x='Principal Component 1', y='Principal Component 2', hue='target',
data=wine_pca_df, palette='Set1') plt.title('PCA on Wine Dataset') plt.show() Classifier Comparison We will train a simple classifier using both the original and the PCA-reduced datasets to compare the performance. In [47]: from sklearn.model_selection import train_test_split from sklearn.ensemble import RandomForestClassifier from sklearn.metrics import accuracy_score # Split the original and PCA-reduced datasets into training and test sets X_train_orig, X_test_orig, y_train, y_test = train_test_split(wine.data, wine.target, test_size=0.3, random_state=42) X_train_pca, X_test_pca, _, _ = train_test_split(wine_pca, wine.target, test_size=0.3, random_state=42) # Train a classifier on the original dataset rf_orig = RandomForestClassifier(random_state=42) rf_orig.fit(X_train_orig, y_train) y_pred_orig = rf_orig.predict(X_test_orig) # Train a classifier on the PCA-reduced dataset rf_pca = RandomForestClassifier(random_state=42) rf_pca.fit(X_train_pca, y_train) y_pred_pca = rf_pca.predict(X_test_pca) # Compare accuracy accuracy_orig = accuracy_score(y_test, y_pred_orig) accuracy_pca = accuracy_score(y_test, y_pred_pca)
print(f'Accuracy with original features: {accuracy_orig:.2f}') print(f'Accuracy with PCA features: {accuracy_pca:.2f}') Accuracy with original features: 1.00 Accuracy with PCA features: 0.98 Interpretation and Discussion The accuracy scores reveal the impact of dimensionality reduction on the classification performance. Although there might be a slight decrease in accuracy when using the PCA-reduced dataset, the simplification of the data can significantly speed up training time and help with visualizing high-dimensional data. PCA is most effective in cases where there are high correlations among the features or when the feature space is too large to work with. For the wine dataset, PCA allowed us to reduce the dimensionality from 13 features to just 2 principal components, which could be visualized and analyzed more easily. In practice, the choice of using PCA should consider the trade-off between simplicity and performance. Further investigation might include exploring the cumulative variance explained by the principal components to choose an appropriate number of components that balances complexity with sufficient information retention. Gradient Boosting Machines (GBM) Overview Gradient Boosting Machines (GBM) are a group of machine learning algorithms that combine multiple weak learning models together to create a strong predictive model. Decision trees are usually used when doing gradient boosting. GBM builds the model in a stage-wise fashion like other boosting methods do, but it generalizes them by allowing optimization of an arbitrary differentiable loss function. Key Concepts Boosting : An ensemble technique that attempts to create a strong classifier from a number of weak classifiers. Weak Learners : Base models that are often simple with poor performance, but contribute to the overall strong ensemble model. Gradient Descent : Used to minimize the loss when adding trees. 
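The gradient descent idea in the key concepts above can be made concrete for one common case. The following is a sketch for squared-error loss only; the symbols ($F_m$ for the ensemble after $m$ trees, $h_m$ for the $m$-th tree, $\nu$ for the learning rate, $r_i$ for the pseudo-residuals) are notation assumed here, not taken from the course notes.

```latex
% Squared-error loss for one observation
L\bigl(y_i, F(x_i)\bigr) = \tfrac{1}{2}\bigl(y_i - F(x_i)\bigr)^2

% The negative gradient with respect to the current prediction is the residual
r_i = -\left.\frac{\partial L\bigl(y_i, F(x_i)\bigr)}{\partial F(x_i)}\right|_{F = F_{m-1}}
    = y_i - F_{m-1}(x_i)

% A new tree h_m is fit to the r_i, then added with a small step size
F_m(x) = F_{m-1}(x) + \nu \, h_m(x)
```

For other differentiable losses the same update applies, but the pseudo-residuals $r_i$ are no longer simply the raw residuals.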
Advantages
Predictive Power : GBMs are often among the most accurate models on structured (tabular) data.
Flexibility : Can be used for both regression and classification problems.
Handling of Non-linear Relationships : Because the base learners are decision trees, the ensemble can capture non-linear relationships and feature interactions in the data.

Disadvantages
Overfitting : Without proper tuning, the model can overfit the training data, especially when the data is noisy.
Computationally Intensive : Trees are built sequentially, each depending on the previous ones, so training can be slow and is harder to parallelize than bagging methods such as random forests.
Difficult to Interpret : As an ensemble method, it can be hard to interpret compared to simpler models.

How it Works
1. Initialize with a Base Model : GBM starts with a simple model, such as a constant prediction or a shallow decision tree, and calculates the error.
2. Add Weak Models : Add weak models sequentially, each correcting the errors of the combined ensemble of all previous models.
3. Gradient Descent : Each new tree is fit to the negative gradient of the loss, so adding it moves the ensemble's predictions in the direction that reduces the loss.
4. Combine Models : The final model combines the predictions from all individual trees to produce the final prediction.
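The four steps above can be sketched in plain Python for a one-feature regression problem. This is a minimal illustration under stated assumptions, not how sklearn implements gradient boosting: it uses squared-error loss, decision stumps (single-split trees) as the weak learners, and a constant mean as the base model. The function names (fit_stump, fit_gbm, predict_gbm, mse) are hypothetical helpers written for this example.

```python
# Minimal gradient boosting for 1-D regression with squared-error loss.
# All function names are illustrative helpers, not a library API.

def fit_stump(xs, residuals):
    """Find the single threshold split on xs that best fits residuals (least squares)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # dropping the max keeps the right side non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_gbm(xs, ys, n_estimators=50, learning_rate=0.1):
    base = sum(ys) / len(ys)               # step 1: constant base model
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_estimators):
        # For squared error, the negative gradient is the residual y - F(x).
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)   # step 2: weak learner fit to the residuals
        stumps.append(stump)
        # step 3: move predictions a small step in the loss-reducing direction
        preds = [p + learning_rate * predict_stump(stump, x)
                 for x, p in zip(xs, preds)]
    return base, stumps, learning_rate


def predict_gbm(model, x):
    base, stumps, learning_rate = model
    # step 4: base value plus the scaled outputs of all stumps
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mse(model, xs, ys):
    return sum((predict_gbm(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)


xs = [float(i) for i in range(10)]
ys = [x * x for x in xs]                   # a non-linear target
small = fit_gbm(xs, ys, n_estimators=5)
large = fit_gbm(xs, ys, n_estimators=100)
print(mse(small, xs, ys), mse(large, xs, ys))
```

With more stumps the training error keeps shrinking, which is also why the number of trees and the learning rate need tuning: past some point the extra trees fit noise rather than signal.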
Training Process
Loss Function : Define a loss function (like mean squared error for regression) and compute its gradient with respect to the current predictions.
Fit a New Model : Fit a new model to the negative gradients (the pseudo-residuals).
Add to Ensemble : Add the new model to the ensemble with an optimal weight, usually scaled by a learning rate.
Iterate : Repeat the process until a stopping criterion is met, like a maximum number of trees or no further improvement.

Applications
Search Engines : Ranking pages based on a variety of features.
Ecology : Modeling species' habitats based on environmental conditions.
Finance : Credit scoring and risk management.

GBMs are a powerful ensemble learning method, particularly useful for complex datasets where the relationship between features and the target variable is non-linear.

Exercise 9
Predicting Diabetes Using a GBM Model

Problem Statement
The Pima Indians Diabetes Database includes data on female patients of Pima Indian heritage, 21 years and older. The objective is to diagnostically predict whether or not a patient has diabetes, based on certain diagnostic measurements included in the dataset. The dataset consists of several medical predictor variables and one target variable, Outcome.

Introduction
The task is to predict the onset of diabetes based on diagnostic measures from the Pima Indians Diabetes Database. We aim to build a GBM model that can accurately predict whether a patient has diabetes. First, we need to import the necessary libraries and load the data.

In [48]:
# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, classification_report
import matplotlib.pyplot as plt

Loading the Dataset
We will use the Pima Indians Diabetes Database for this exercise. Let's load the data and take a quick look at the first few rows.
In [49]:
# Load the dataset
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
columns = ['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin',
           'BMI', 'DiabetesPedigreeFunction', 'Age', 'Outcome']
data = pd.read_csv(url, names=columns)

# Display the first 5 rows of the dataset
data.head()

Out[49]:
   Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  DiabetesPedigreeFunction
0            6      148             72             35        0  33.6                     0.627
1            1       85             66             29        0  26.6                     0.351
2            8      183             64              0        0  23.3                     0.672
3            1       89             66             23       94  28.1                     0.167
4            0      137             40             35      168  43.1                     2.288

Data Preprocessing
Before we can feed the data into our GBM model, we need to split it into features (X) and the target variable (y), and then split these into training and testing sets.

In [50]:
# Split the data into features and target
X = data.iloc[:, :-1]
y = data['Outcome']

# Split the dataset into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Building the GBM Model
Now, we will create our GBM model using sklearn's GradientBoostingClassifier, train it on our training data, and then make predictions on our test set.

In [51]:
# Create Gradient Boosting Classifier
gbm = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0, max_depth=1, random_state=42)

# Train the model using the training sets
gbm.fit(X_train, y_train)

# Predict the response for the test dataset
y_pred = gbm.predict(X_test)

Model Evaluation
After making predictions, we will evaluate our model's performance using the accuracy score and a classification report.

In [52]:
# Evaluate the model's accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")

# Print classification report
print(classification_report(y_test, y_pred))

Accuracy: 0.73
              precision    recall  f1-score   support

           0       0.79      0.78      0.79        99
           1       0.61      0.64      0.62        55

    accuracy                           0.73       154
   macro avg       0.70      0.71      0.71       154
weighted avg       0.73      0.73      0.73       154

Feature Importance Visualization
Understanding which features are most important to our model can provide insights into the dataset. Let's visualize the feature importances as determined by our GBM model.

In [53]:
# Get feature importances
feature_importance = gbm.feature_importances_

# Plot feature importance
plt.barh(columns[:-1], feature_importance)
plt.xlabel('Feature Importance Score')
plt.ylabel('Features')
plt.title('Feature Importance')
plt.show()

Interpretation of Results
The accuracy score tells us the proportion of correct predictions. The classification report provides detailed metrics such as precision, recall, and f1-score for each class. Here the model is noticeably weaker on the positive class (recall 0.64) than on the negative class (recall 0.78), a gap that the overall accuracy of 0.73 hides. The feature importance plot shows which features the model found most useful in making predictions. These insights can help us understand the model's behavior and the underlying data.

Convolutional Neural Network (CNN)

Overview
Convolutional Neural Networks (CNNs) are a class of deep neural networks, most commonly applied to analyzing visual imagery. They are particularly powerful for tasks such as image recognition, classification, and detection. CNNs are designed to automatically and adaptively learn spatial hierarchies of features from input images.

Key Components
Convolutional Layers : Apply a convolution operation to the input, passing the
result to the next layer. This helps the network to focus on small regions of the input image.
Activation Function : Typically ReLU (Rectified Linear Unit) is used to introduce non-linear properties to the network.
Pooling Layers : Reduce the dimensions of the data by combining the outputs of neuron clusters at one layer into a single neuron in the next layer.
Fully Connected Layers : Neurons in a fully connected layer have full connections to all activations in the previous layer, as seen in regular Neural Networks.
Dropout : A regularization technique where randomly selected neurons are ignored during training to prevent overfitting.

Advantages
Feature Learning : CNNs learn useful features (such as edges, textures, and shapes) directly from the pixel data, without hand-engineered feature extraction.
Efficiency : Sharing the same filter weights across all positions in the image reduces the number of parameters compared with a fully connected network of similar size.
Translation Invariance : Because the same filters are applied at every position, a learned feature can be detected wherever it appears in the image.

Disadvantages
High Computational Cost : Training can be computationally expensive due to the complexity of the models.
Overfitting : Without proper regularization, CNNs can easily overfit, especially when the number of images is small.
Requirement for Large Datasets : CNNs need to learn from a large set of labeled images to perform well.

How it Works
1. Input Layer : Takes the raw pixel values of the image.
2. Convolutional Layer : Applies various filters to create a feature map.
3. Activation Layer : Applies the ReLU function to introduce non-linearity.
4. Pooling Layer : Performs down-sampling to reduce dimensionality.
5. Fully Connected Layer : Computes the class scores, resulting in the final classification.

Training Process
Forward Propagation : Pass the input through the layers to get the prediction.
Backpropagation : Calculate the error and propagate it back through the network to obtain the gradient of the loss with respect to each weight.
Optimization : Use optimization algorithms like SGD (Stochastic Gradient Descent) to minimize the loss function. Applications Image Classification : Assigning a label to an image from a fixed set of categories. Object Detection : Identifying objects within an image and drawing a bounding box around them. Face Recognition : Identifying and verifying a person from a digital image by comparing and analyzing patterns. CNNs have revolutionized the field of computer vision, achieving remarkable performance in many visual tasks. Exercise 10 Understanding CNNs with PyTorch on Fashion-MNIST In this exercise, we will explore Convolutional Neural Networks (CNNs) using the PyTorch library. CNNs have revolutionized the field of computer vision by providing a mechanism to effectively learn spatial hierarchies of features from image data.
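Before moving to PyTorch, the convolution, activation, and pooling steps from the overview can be sketched in plain Python on a tiny image. This is an illustration only: the helper names (conv2d, relu, max_pool, conv_output_size) are hypothetical and not part of PyTorch, and "convolution" here means cross-correlation without flipping the kernel, which is the usual convention in deep learning libraries. The last few lines apply the standard output-size formula to a 28x28 input, the Fashion-MNIST image size used in this exercise, assuming 3x3 convolutions with padding 1 and 2x2 pooling.

```python
def conv2d(image, kernel):
    """Cross-correlate a 2-D image with a 2-D kernel (stride 1, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def relu(feature_map):
    """Replace negative activations with zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; assumes both dimensions are divisible by size."""
    return [[max(feature_map[i + a][j + b]
                 for a in range(size) for b in range(size))
             for j in range(0, len(feature_map[0]), size)]
            for i in range(0, len(feature_map), size)]


def conv_output_size(n, kernel_size, padding=0, stride=1):
    """Output width/height of a convolution or pooling layer on an n-pixel side."""
    return (n + 2 * padding - kernel_size) // stride + 1


# A 5x5 image with a vertical edge: two dark columns (0), then bright columns (1).
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# A 2x2 filter that responds to dark-to-bright transitions from left to right.
kernel = [[-1, 1],
          [-1, 1]]

features = relu(conv2d(image, kernel))   # 4x4 feature map, non-zero only at the edge
pooled = max_pool(features)              # 2x2 map after down-sampling
print(features)
print(pooled)

# Spatial size of a 28x28 input after two (3x3 conv, padding 1) + (2x2 pool) stages.
size = 28
for _ in range(2):
    size = conv_output_size(size, kernel_size=3, padding=1)  # padding 1 keeps the size
    size = conv_output_size(size, kernel_size=2, stride=2)   # pooling halves it
print(size)
```

The feature map is non-zero only in the column where the edge sits, and pooling keeps that response while shrinking the map. The size calculation shows why two conv-plus-pool stages reduce a 28x28 image to 7x7 per channel.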
Problem Statement The Fashion-MNIST dataset is a collection of article images from Zalando meant for benchmarking machine learning models. It is more challenging compared to the regular MNIST digits dataset. Our goal is to build a CNN that can accurately classify these images into their respective fashion categories. Dataset The Fashion-MNIST dataset includes 10 categories: T-shirt/top, Trouser, Pullover, Dress, Coat, Sandal, Shirt, Sneaker, Bag, and Ankle boot. Each image is 28 pixels in height and 28 pixels in width, for a total of 784 pixels. The images are grayscale, with each pixel ranging between 0 and 255. Let's start by loading the dataset and visualizing some of the images. In [1]: #!conda install -c pytorch pytorch In [2]: #!conda install -c pytorch torchvision In [3]: # Import necessary libraries import torch import torchvision import torchvision.transforms as transforms import matplotlib.pyplot as plt import numpy as np import warnings # Settings the warnings to be ignored warnings.filterwarnings('ignore') # Set device to GPU if available device = torch.device('cuda' if torch.cuda.is_available() else 'cpu') # Load the Fashion-MNIST dataset transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))]) trainset = torchvision.datasets.FashionMNIST(root='./data', train=True, download=True, transform=transform) trainloader = torch.utils.data.DataLoader(trainset, batch_size=4, shuffle=True) testset = torchvision.datasets.FashionMNIST(root='./data', train=False, download=True, transform=transform) testloader = torch.utils.data.DataLoader(testset, batch_size=4, shuffle=False) # Define the classes classes = ('T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle Boot') # Function to show images def imshow(img): img = img / 2 + 0.5 # unnormalize npimg = img.numpy() plt.imshow(np.transpose(npimg, (1, 2, 0))) plt.show()
# Get some random training images dataiter = iter(trainloader) images, labels = next(dataiter) # Show images imshow(torchvision.utils.make_grid(images)) # Print labels print(' '.join(f'{classes[labels[j]]:11s}' for j in range(4))) Pullover Sandal Pullover Trouser Defining the CNN Model We will now define our CNN architecture. We'll use two convolutional layers followed by max pooling, and then fully connected layers. In [4]: import torch.nn as nn import torch.nn.functional as F # Define the CNN architecture class Net(nn.Module): def __init__(self): super(Net, self).__init__() self.conv1 = nn.Conv2d(1, 32, 3, padding=1) self.conv2 = nn.Conv2d(32, 64, 3, padding=1) self.pool = nn.MaxPool2d(2, 2) self.fc1 = nn.Linear(64 * 7 * 7, 600) self.fc2 = nn.Linear(600, 120) self.fc3 = nn.Linear(120, 10) self.dropout = nn.Dropout(0.25) def forward(self, x): x = self.pool(F.relu(self.conv1(x))) x = self.pool(F.relu(self.conv2(x))) x = x.view(-1, 64 * 7 * 7) # flatten image input x = F.relu(self.fc1(self.dropout(x))) x = F.relu(self.fc2(self.dropout(x))) x = self.fc3(x) return x net = Net().to(device) Training the CNN Model With our model defined, we will now train it using the training data. In [5]: import torch.optim as optim
import warnings

# Set warnings to be ignored
warnings.filterwarnings('ignore')

# Define a loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=0.001)

# Lists for saving training and validation history
train_losses = []
train_accuracies = []
val_losses = []
val_accuracies = []

# Number of epochs
num_epochs = 5  # Reduced for quicker training

# Reduce the dataset size for quicker training (use a fraction of the dataset).
# SubsetRandomSampler(range(n)) visits the first n indices in random order,
# so these samplers use the first 10,000 training images and the first 2,000 test images.
subset_indices_train = torch.utils.data.SubsetRandomSampler(range(10000))
subset_indices_test = torch.utils.data.SubsetRandomSampler(range(2000))

# shuffle must stay False when a sampler is supplied
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=False,
                                          num_workers=2, sampler=subset_indices_train)
testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=False,
                                         num_workers=2, sampler=subset_indices_test)

for epoch in range(num_epochs):
    # Training
    net.train()
    running_loss = 0.0
    correct = 0
    total = 0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data[0].to(device), data[1].to(device)
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
    train_losses.append(running_loss / len(trainloader))
    train_accuracies.append(100 * correct / total)

    # Validation (evaluated on the test subset)
    net.eval()
    val_loss = 0.0
    correct = 0
    total = 0
    with torch.no_grad():
        for data in testloader:
            images, labels = data[0].to(device), data[1].to(device)
            outputs = net(images)
            loss = criterion(outputs, labels)
            val_loss += loss.item()
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    val_losses.append(val_loss / len(testloader))
    val_accuracies.append(100 * correct / total)

    print(f'Epoch {epoch+1}, Train Loss: {train_losses[-1]:.3f}, Train Accuracy: {train_accuracies[-1]:.2f}, Val Loss: {val_losses[-1]:.3f}, Val Accuracy: {val_accuracies[-1]:.2f}')

Epoch 1, Train Loss: 0.761, Train Accuracy: 71.50, Val Loss: 0.491, Val Accuracy: 81.65
Epoch 2, Train Loss: 0.466, Train Accuracy: 83.05, Val Loss: 0.420, Val Accuracy: 84.55
Epoch 3, Train Loss: 0.395, Train Accuracy: 85.49, Val Loss: 0.401, Val Accuracy: 85.20
Epoch 4, Train Loss: 0.357, Train Accuracy: 87.22, Val Loss: 0.365, Val Accuracy: 87.00
Epoch 5, Train Loss: 0.316, Train Accuracy: 88.26, Val Loss: 0.332, Val Accuracy: 88.10

Visualizing the Training History
Let's plot the training and validation accuracy and loss to visualize the learning process over the epochs.
In [6]: import matplotlib.pyplot as plt import numpy as np plt.figure(figsize=(12, 4)) # Plotting training and validation accuracy plt.subplot(1, 2, 1) plt.plot(range(1, len(train_accuracies) + 1), train_accuracies, label='Train Accuracy') plt.plot(range(1, len(val_accuracies) + 1), val_accuracies, label='Validation Accuracy') plt.title('Accuracy over Epochs') plt.xlabel('Epoch') plt.ylabel('Accuracy') plt.legend() plt.xticks(np.arange(1, len(train_accuracies) + 1, step=1)) # Set x-axis ticks to integer values # Plotting training and validation loss plt.subplot(1, 2, 2) plt.plot(range(1, len(train_losses) + 1), train_losses, label='Train Loss') plt.plot(range(1, len(val_losses) + 1), val_losses, label='Validation Loss') plt.title('Loss over Epochs') plt.xlabel('Epoch') plt.ylabel('Loss') plt.legend() plt.xticks(np.arange(1, len(train_losses) + 1, step=1)) # Set x-axis ticks to integer
values plt.show() Visualizing the Results Let's visualize some test images along with their predicted and true labels to see how our model performs. In [7]: import warnings # Settings the warnings to be ignored warnings.filterwarnings('ignore') # Updated imshow function for displaying grayscale images def imshow(img): # img is a torch tensor, so we need to change the order of dimensions to (HxWxC) npimg = img.numpy() plt.imshow(np.transpose(npimg, (1, 2, 0)), cmap='gray') # Function to visualize images and compare true labels with model predictions def visualize_model_performance(net, testloader, classes, device, num_images=10): net.eval() # Set the model to evaluation mode images_so_far = 0 fig = plt.figure(figsize=(25, 10)) # Adjust the figure size as needed with torch.no_grad(): for i, data in enumerate(testloader): inputs, labels = data[0].to(device), data[1].to(device) outputs = net(inputs) _, predictions = torch.max(outputs, 1) for j in range(inputs.size()[0]):
images_so_far += 1 ax = plt.subplot(num_images//5, 5, images_so_far) ax.axis('off') # Add a border around the images for spine in ax.spines.values(): spine.set_edgecolor('black') spine.set_linewidth(2) imshow(inputs.cpu().data[j]) # Set the title for the true label true_label = f'{classes[labels[j]]}' pred_label = f'Predicted: {classes[predictions[j]]}' pred_color = 'red' if predictions[j] != labels[j] else 'green' # Position the text below the image and increase the font size plt.text(0.5, -0.05, true_label, ha='center', transform=ax.transAxes, fontsize=20, verticalalignment='top', weight='bold') plt.text(0.5, -0.15, pred_label, ha='center', color=pred_color, transform=ax.transAxes, fontsize=20, verticalalignment='top') if images_so_far == num_images: plt.tight_layout(pad=3.0) return # Call the function to visualize the model performance visualize_model_performance(net, testloader, classes, device, num_images=10)
Interpretation of Results The plots provide a visual representation of the learning process. Ideally, both training
and validation accuracy should increase over time, while the loss decreases. If the validation loss begins to increase while the training loss continues to decrease, it may indicate overfitting. In such cases, techniques like adding dropout layers, regularization, or more data for training could be considered to improve the model's generalization. By understanding the dynamics of training and validation metrics, we can make informed decisions about how to adjust our model for better performance. This exercise is a stepping stone towards mastering CNNs and applying them to solve complex real-world problems using PyTorch. Summary Machine learning (ML) and deep learning, especially CNNs, have revolutionized how we interact with data and technology, allowing us to make sense of vast amounts of information and create innovative solutions to complex problems across many industries. However, as the field continues to evolve, we must delve into advanced topics, such as transfer learning and ethical AI, to ensure that the benefits of these powerful technologies are accessible to all and used to promote a more sustainable, healthy, and equitable world. The real-world impact of ML is profound, from enhancing medical diagnostics to combating climate change, breaking down barriers for people with disabilities, and improving education and accessibility. Revised Date: November 5, 2023 In [ ]: