Train a linear SVM and a polynomial SVM or an RBF-kernel SVM on the Iris dataset (train at least 2 models). Use a balanced 80%/20% train-test split (include the train and test sets
you created with your submission), specify any parameter settings used, and explain your
choice and the rationale for it. Compare the performance of the models you trained and
discuss the reasons.
(I tried to solve this question, but I got the same accuracy (screenshots are below) for the linear, polynomial, and RBF kernels. Is that correct?)
## Solution

### Importing the required libraries

We will be using the following libraries for this problem:

- `numpy`: Used for numerical computations in Python.
- `pandas`: Used for data manipulation and analysis.
- `matplotlib`: Used for data visualization.
- `seaborn`: Used for statistical data visualization.
- `sklearn`: Used for machine learning algorithms.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
```

### Importing the dataset

We will be using the `iris` dataset for this problem. The dataset contains information about different species of iris flowers, with 4 features:

- `sepal_length`: Length of the sepal of the flower.
- `sepal_width`: Width of the sepal of the flower.
- `petal_length`: Length of the petal of the flower.
- `petal_width`: Width of the petal of the flower.

The dataset also contains the `target` variable, which tells us the species of the flower. There are 3 species of flowers in the dataset:

- `setosa`
- `versicolor`
- `virginica`

```python
# Importing the dataset
df = pd.read_csv('iris.csv')

# Viewing the first 5 rows of the dataset
df.head(5)
```

### Data visualization

We will now visualize the data to get a better understanding of the dataset. We will plot a pairplot to see the relationships between the features, colored by the target variable.

```python
# Plotting a pairplot
sns.pairplot(df, hue='target')
plt.show()
```

![Pairplot of the iris features, colored by species](output_7_0.png)

From the pairplot, we can see that the `setosa` species can be easily separated from the other two species using a linear boundary. The `versicolor` and `virginica` species, however, overlap and cannot be fully separated using a linear boundary.

We will now plot a correlation heatmap to see the correlation between all the features.

```python
# The species labels are strings, so encode them numerically on a copy;
# otherwise pandas cannot include the target column in the correlation matrix
df_num = df.copy()
df_num['target'] = df_num['target'].astype('category').cat.codes

# Plotting a correlation heatmap
sns.heatmap(df_num.corr(), annot=True)
plt.show()
```

From the correlation heatmap, we can see that the `sepal_length` and `sepal_width` features are only weakly correlated with the `target` variable, while the `petal_length` and `petal_width` features are strongly correlated with it.
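As an aside, if `iris.csv` is not available locally, an equivalent dataframe can be built from scikit-learn's bundled copy of the dataset. This is a minimal sketch; the column names are chosen to match the CSV layout assumed above:

```python
from sklearn.datasets import load_iris
import pandas as pd

# Build a dataframe with the same layout as iris.csv: four feature
# columns plus a string 'target' column holding the species names
iris = load_iris()
df = pd.DataFrame(iris.data,
                  columns=['sepal_length', 'sepal_width',
                           'petal_length', 'petal_width'])
df['target'] = pd.Categorical.from_codes(iris.target, iris.target_names)
```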
### Data preprocessing

We will now split the dataset into the feature set and the target set.

```python
# Splitting the dataset into the feature set and the target set
X = df.drop('target', axis=1)
y = df['target']
```

We will now split the dataset into the training set and the test set, using 80% of the dataset for training and 20% for testing. Passing `stratify=y` would keep the class proportions equal across the two splits, which is what the question's "balanced split" asks for; the outputs reported below were produced without stratification, which is why the test-set class supports come out as 11, 13 and 6 rather than 10 each.

```python
# Splitting the dataset into the training set and the test set
# (add stratify=y for a split that preserves the class proportions)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
```

### Training the models

We will now train the models on the training set: a linear SVM, a polynomial SVM, and an RBF-kernel SVM. All three use the default regularization strength `C=1.0`, a reasonable starting point on a small, clean dataset; the polynomial kernel uses the default `degree=3`, i.e. a cubic decision boundary.

```python
# Training the linear SVM model on the training set
svc_linear = SVC(kernel='linear', random_state=0)
svc_linear.fit(X_train, y_train)

# Training the polynomial SVM model on the training set
svc_poly = SVC(kernel='poly', degree=3, random_state=0)
svc_poly.fit(X_train, y_train)

# Training the RBF kernel SVM model on the training set
svc_rbf = SVC(kernel='rbf', random_state=0)
svc_rbf.fit(X_train, y_train)
```

```
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
    max_iter=-1, probability=False, random_state=0, shrinking=True,
    tol=0.001, verbose=False)
```

### Making predictions

We will now make predictions on the test set.

```python
# Making predictions on the test set for the linear SVM model
y_pred_linear = svc_linear.predict(X_test)

# Making predictions on the test set for the polynomial SVM model
y_pred_poly = svc_poly.predict(X_test)

# Making predictions on the test set for the RBF kernel SVM model
y_pred_rbf = svc_rbf.predict(X_test)
```
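Before evaluating, the question also asks for a balanced split and for the train and test sets to be included with the submission. A minimal sketch of both, assuming the variables defined above (the filenames are illustrative):

```python
# Check how balanced the split actually is: value_counts shows the
# per-class counts in the train and test portions
print(y_train.value_counts())
print(y_test.value_counts())

# Save the splits so they can be included with the submission
# (filenames are illustrative; assign aligns y with X on the index)
X_train.assign(target=y_train).to_csv('iris_train.csv', index=False)
X_test.assign(target=y_test).to_csv('iris_test.csv', index=False)
```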
### Evaluating the models

We will now evaluate the linear SVM model, the polynomial SVM model and the RBF-kernel SVM model.

```python
# Evaluating the linear SVM model
print('Linear SVM model')
print('Accuracy: {}'.format(accuracy_score(y_test, y_pred_linear)))
print('Confusion matrix:\n {}'.format(confusion_matrix(y_test, y_pred_linear)))
print('Classification report:\n {}'.format(classification_report(y_test, y_pred_linear)))
print('\n')

# Evaluating the polynomial SVM model
print('Polynomial SVM model')
print('Accuracy: {}'.format(accuracy_score(y_test, y_pred_poly)))
print('Confusion matrix:\n {}'.format(confusion_matrix(y_test, y_pred_poly)))
print('Classification report:\n {}'.format(classification_report(y_test, y_pred_poly)))
print('\n')

# Evaluating the RBF kernel SVM model
print('RBF kernel SVM model')
print('Accuracy: {}'.format(accuracy_score(y_test, y_pred_rbf)))
print('Confusion matrix:\n {}'.format(confusion_matrix(y_test, y_pred_rbf)))
print('Classification report:\n {}'.format(classification_report(y_test, y_pred_rbf)))
```

```
Linear SVM model
Accuracy: 0.9666666666666667
Confusion matrix:
 [[11  0  0]
  [ 0 12  1]
  [ 0  0  6]]
Classification report:
               precision    recall  f1-score   support

     setosa        1.00      1.00      1.00        11
 versicolor        1.00      0.92      0.96        13
  virginica        0.86      1.00      0.92         6

avg / total        0.97      0.97      0.97        30


Polynomial SVM model
Accuracy: 0.9333333333333333
Confusion matrix:
 [[11  0  0]
  [ 0 12  1]
  [ 0  1  5]]
Classification report:
               precision    recall  f1-score   support

     setosa        1.00      1.00      1.00        11
 versicolor        0.92      0.92      0.92        13
  virginica        0.83      0.83      0.83         6

avg / total        0.93      0.93      0.93        30


RBF kernel SVM model
Accuracy: 0.9333333333333333
Confusion matrix:
 [[11  0  0]
  [ 0 12  1]
  [ 0  1  5]]
Classification report:
               precision    recall  f1-score   support

     setosa        1.00      1.00      1.00        11
 versicolor        0.92      0.92      0.92        13
  virginica        0.83      0.83      0.83         6

avg / total        0.93      0.93      0.93        30
```

From the evaluation metrics, all three models perform well on the test set. The linear SVM misclassifies a single test sample (accuracy 96.67%), while the polynomial and RBF-kernel models each misclassify two (accuracy 93.33%), so the linear SVM performs slightly better here. This matches what the pairplot showed: `setosa` is linearly separable from the other two species, and the small overlap between `versicolor` and `virginica` accounts for the few errors. Because the test set contains only 30 samples, each misclassification moves accuracy by 3.33 percentage points, so very similar or even identical accuracies across the three kernels (as in the question above) are entirely plausible: the dataset is small and close to linearly separable, so the extra flexibility of the polynomial and RBF kernels brings no benefit.
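For a more stable comparison than a single 80/20 split, the three kernels can also be compared with k-fold cross-validation on the full dataset. A minimal sketch using scikit-learn's `cross_val_score`, which stratifies the folds by default for classifiers:

```python
from sklearn.model_selection import cross_val_score

# Compare the three kernels with 5-fold cross-validation instead of a
# single train-test split; each model sees 5 different 80/20 partitions
for kernel in ['linear', 'poly', 'rbf']:
    model = SVC(kernel=kernel, random_state=0)
    scores = cross_val_score(model, X, y, cv=5)
    print('{:>6} kernel: mean accuracy {:.3f} (+/- {:.3f})'
          .format(kernel, scores.mean(), scores.std()))
```

If the cross-validated means still come out nearly identical, that confirms the similarity is a property of the dataset rather than an error in the code.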