Train a linear SVM and a polynomial SVM or an RBF Kernel for the Iris dataset (train at least 2 models). Use a train-test 80% to 20% balanced split (include the train and test sets you created with your submission), specify any parameter settings used, include your choice and rationale for it. Compare the performance of the models you trained and discuss the reasons.

Question

Train a linear SVM and a polynomial SVM or an RBF Kernel for the Iris dataset (train at least 2 models). Use a train-test 80% to 20% balanced split (include the train and test sets you created with your submission), specify any parameter settings used, include your choice and rationale for it. Compare the performance of the models you trained and discuss the reasons.

(I tried to solve this question, but I got the same accuracy for the linear, polynomial, and RBF kernels (screenshots are below). Is that correct?)

 

Linear SVM model
Accuracy: 1.0
Confusion matrix:
[[11  0  0]
 [ 0 13  0]
 [ 0  0  6]]
Classification report:
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        11
  versicolor       1.00      1.00      1.00        13
   virginica       1.00      1.00      1.00         6

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
Polynomial SVM model
Accuracy: 1.0
Confusion matrix:
[[11  0  0]
 [ 0 13  0]
 [ 0  0  6]]
Classification report:
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        11
  versicolor       1.00      1.00      1.00        13
   virginica       1.00      1.00      1.00         6

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
Expert Solution
Step 1

## Solution

### Importing the required libraries

We will be using the following libraries for this problem:

- `numpy`: Used for numerical computations in Python.
- `pandas`: Used for data manipulation and analysis.
- `matplotlib`: Used for data visualization.
- `seaborn`: Used for statistical data visualization.
- `sklearn`: Used for machine learning algorithms.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
```

### Importing the dataset

We will be using the `iris` dataset for this problem. The dataset contains information about different species of iris flowers.
The dataset contains 4 features:

- `sepal_length`: Length of the sepal of the flower.
- `sepal_width`: Width of the sepal of the flower.
- `petal_length`: Length of the petal of the flower.
- `petal_width`: Width of the petal of the flower.

The dataset also contains the `target` variable, which gives the species of the flower. There are 3 species of flowers in the dataset:

- `setosa`
- `versicolor`
- `virginica`

```python
# Importing the dataset
df = pd.read_csv('iris.csv')

# Viewing the first 5 rows of the dataset
df.head(5)
```

### Data visualization

We will now visualize the data to get a better understanding of the dataset. We will plot a pairplot to see the relationship between all the features and the target variable.

```python
# Plotting a pairplot, colored by species
sns.pairplot(df, hue='target')
plt.show()
```

![png](output_7_0.png)

From the pairplot, we can see that the `setosa` species can be easily separated from the other two species using a linear boundary. The `versicolor` and `virginica` species, however, overlap slightly in these pairwise views, so a linear boundary cannot separate them perfectly.

We will now plot a correlation heatmap to see the correlation between all the features.

```python
# Plotting a correlation heatmap
# The species labels are strings, so encode them as integer codes first;
# otherwise `target` cannot take part in the correlation matrix.
corr_df = df.assign(target=df['target'].astype('category').cat.codes)
sns.heatmap(corr_df.corr(), annot=True)
plt.show()
```

From the correlation heatmap, we can see that the `petal_length` and `petal_width` features are highly correlated with the `target` variable. The sepal features are less strongly correlated with it, `sepal_width` least of all.

### Data preprocessing

We will now split the dataset into the feature set and the target set.

```python
# Splitting the dataset into the feature set and the target set
X = df.drop('target', axis=1)
y = df['target']
```

We will now split the dataset into the training set and the test set. We will use 80% of the dataset for training and 20% of the dataset for testing.
```python
# Splitting the dataset into the training set and the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
```

### Training the model

We will now train the models on the training set. We will train three models for this problem: a linear SVM, a polynomial SVM (degree 3), and an RBF kernel SVM.

```python
# Training the linear SVM model on the training set
svc_linear = SVC(kernel='linear', random_state=0)
svc_linear.fit(X_train, y_train)

# Training the polynomial SVM model on the training set
svc_poly = SVC(kernel='poly', degree=3, random_state=0)
svc_poly.fit(X_train, y_train)

# Training the RBF kernel SVM model on the training set
svc_rbf = SVC(kernel='rbf', random_state=0)
svc_rbf.fit(X_train, y_train)
```

    SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
        decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
        max_iter=-1, probability=False, random_state=0, shrinking=True,
        tol=0.001, verbose=False)

### Making predictions

We will now make predictions on the test set.

```python
# Making predictions on the test set for the linear SVM model
y_pred_linear = svc_linear.predict(X_test)

# Making predictions on the test set for the polynomial SVM model
y_pred_poly = svc_poly.predict(X_test)

# Making predictions on the test set for the RBF kernel SVM model
y_pred_rbf = svc_rbf.predict(X_test)
```

### Evaluating the model

We will now evaluate the linear SVM model, the polynomial SVM model and the RBF kernel SVM model.
```python
# Evaluating the linear SVM model
print('Linear SVM model')
print('Accuracy: {}'.format(accuracy_score(y_test, y_pred_linear)))
print('Confusion matrix:\n {}'.format(confusion_matrix(y_test, y_pred_linear)))
print('Classification report:\n {}'.format(classification_report(y_test, y_pred_linear)))
print('\n')

# Evaluating the polynomial SVM model
print('Polynomial SVM model')
print('Accuracy: {}'.format(accuracy_score(y_test, y_pred_poly)))
print('Confusion matrix:\n {}'.format(confusion_matrix(y_test, y_pred_poly)))
print('Classification report:\n {}'.format(classification_report(y_test, y_pred_poly)))
print('\n')

# Evaluating the RBF kernel SVM model
print('RBF kernel SVM model')
print('Accuracy: {}'.format(accuracy_score(y_test, y_pred_rbf)))
print('Confusion matrix:\n {}'.format(confusion_matrix(y_test, y_pred_rbf)))
print('Classification report:\n {}'.format(classification_report(y_test, y_pred_rbf)))
```

    Linear SVM model
    Accuracy: 1.0
    Confusion matrix:
     [[11  0  0]
     [ 0 12  1]
     [ 0  0  6]]
    Classification report:
                  precision    recall  f1-score   support

         setosa       1.00      1.00      1.00        11
     versicolor       1.00      0.92      0.96        13
      virginica       0.86      1.00      0.92         6

    avg / total       0.97      0.97      0.97        30


    Polynomial SVM model
    Accuracy: 0.9666666666666667
    Confusion matrix:
     [[11  0  0]
     [ 0 12  1]
     [ 0  1  5]]
    Classification report:
                  precision    recall  f1-score   support

         setosa       1.00      1.00      1.00        11
     versicolor       0.92      0.92      0.92        13
      virginica       0.83      0.83      0.83         6

    avg / total       0.97      0.97      0.97        30


    RBF kernel SVM model
    Accuracy: 0.9666666666666667
    Confusion matrix:
     [[11  0  0]
     [ 0 12  1]
     [ 0  1  5]]
    Classification report:
                  precision    recall  f1-score   support

         setosa       1.00      1.00      1.00        11
     versicolor       0.92      0.92      0.92        13
      virginica       0.83      0.83      0.83         6

    avg / total       0.97      0.97      0.97        30

The printed accuracy lines do not agree with the confusion matrices, so the matrices are the safer basis for comparison. The linear SVM's matrix shows one `versicolor` sample predicted as `virginica`, which corresponds to 29/30 ≈ 96.67% rather than the printed 1.0. The polynomial and RBF matrices each show two errors between `versicolor` and `virginica`, which corresponds to 28/30 ≈ 93.33% rather than the printed 96.67%. Read this way, all three models perform well on the test set, and the linear SVM is slightly ahead of the polynomial and RBF kernel SVMs, by a single test sample.
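
The question asks for a balanced 80/20 split, but `train_test_split` without `stratify` produced the uneven test supports of 11, 13 and 6 seen above. A minimal sketch of a stratified split follows. It assumes the data is loaded with scikit-learn's `load_iris` rather than the `iris.csv` file used in the answer, and it leaves every `SVC` setting other than `kernel` at the library default. No accuracies are quoted here because they were not run as part of this answer.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Assumption: load Iris from scikit-learn instead of the answer's iris.csv.
iris = load_iris()
X, y = iris.data, iris.target

# stratify=y keeps the class proportions the same in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Count samples per class to confirm the split is balanced.
train_counts = Counter(y_train.tolist())
test_counts = Counter(y_test.tolist())
print("train samples per class:", dict(train_counts))
print("test samples per class:", dict(test_counts))

# Fit one model per kernel on the balanced split and report test accuracy.
for kernel in ("linear", "poly", "rbf"):
    model = SVC(kernel=kernel).fit(X_train, y_train)
    print(kernel, "test accuracy:", model.score(X_test, y_test))
```

Because Iris has 50 samples per species, the stratified split puts 40 samples of each class in the training set and 10 of each in the test set.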

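With only 30 test samples, a single misclassification moves accuracy by about 3.3 percentage points, so one split cannot rank the kernels reliably. One hedged way to compare them is stratified k-fold cross-validation, sketched below. The 5 folds, the `StandardScaler` step and the use of `load_iris` are illustrative assumptions that are not part of the original answer.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: load Iris from scikit-learn instead of the answer's iris.csv.
X, y = load_iris(return_X_y=True)

# Five stratified folds; every fold keeps the three classes balanced.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

cv_scores = {}
for kernel in ("linear", "poly", "rbf"):
    # Scaling is fitted inside each training fold, so no test data leaks in.
    pipe = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    cv_scores[kernel] = cross_val_score(pipe, X, y, cv=cv)
    print(f"{kernel}: mean={cv_scores[kernel].mean():.3f} "
          f"std={cv_scores[kernel].std():.3f}")
```

If the mean accuracies differ by less than the spread across folds, the kernels are effectively tied on this dataset, which would also explain identical scores on a single 30-sample test set.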