intro_to_sklearn.ipynb
You Try ...
1. Comparing normalized vs raw data
Using kNN classifiers with Euclidean distance and 3 nearest neighbors, compare the performance of a classifier trained on the raw data with one trained on the normalized data.
First, we need to divide the DataFrames as we did before.
[]....#TODO
[]....#TODO
Now build, fit, and test the kNN classifiers:
[].....
[].....
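One possible shape for this comparison is sketched below. It is not the notebook's own solution: the DataFrame `df`, its column names, the 80/20 split, and the use of min-max scaling as the normalization are all assumptions made for illustration, since the original data and normalization method are not shown here. The synthetic data deliberately puts an informative feature on a small scale next to a noisy feature on a large scale.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in data (not the notebook's DataFrame).
rng = np.random.default_rng(0)
n = 200
labels = rng.integers(0, 2, size=n)
df = pd.DataFrame({
    "small_feature": labels + rng.normal(0, 0.3, size=n),  # informative, small scale
    "large_feature": rng.normal(0, 1000, size=n),          # noise, large scale
    "label": labels,
})

# Divide the DataFrame into features and labels, then into training and test sets.
features = df[["small_feature", "large_feature"]]
target = df["label"]
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.2, random_state=0
)

def knn_accuracy(train_X, test_X):
    """Fit a 3-nearest-neighbor Euclidean classifier and return test accuracy."""
    knn = KNeighborsClassifier(n_neighbors=3, metric="euclidean")
    knn.fit(train_X, y_train)
    return knn.score(test_X, y_test)

# Classifier trained on the raw feature values.
raw_accuracy = knn_accuracy(X_train, X_test)

# Classifier trained on normalized values. The scaler is fitted on the
# training set only, then applied to both sets (assumed: min-max scaling).
scaler = MinMaxScaler().fit(X_train)
normalized_accuracy = knn_accuracy(
    scaler.transform(X_train), scaler.transform(X_test)
)

print(f"raw accuracy:        {raw_accuracy:.3f}")
print(f"normalized accuracy: {normalized_accuracy:.3f}")
```

Fitting the scaler on the training set alone keeps information about the test set out of the model. With real data, replace the synthetic `df` with the DataFrames from the earlier part of the notebook.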
2. The Iris Dataset
What is the accuracy of your new model with one epoch of training?
We are going to use the Iris dataset, one of the standard data mining datasets, introduced by R. A. Fisher in 1936 and available in the UCI Machine Learning Repository since 1988. The dataset contains 3 classes of 50 instances each:
- Iris Setosa
- Iris Versicolor
- Iris Virginica
There are only 4 attributes or features:
- sepal length in cm
- sepal width in cm
- petal length in cm
- petal width in cm
Here is an example of the data:
Sepal Length | Sepal Width | Petal Length | Petal Width | Class |
---|---|---|---|---|
5.3 | 3.7 | 1.5 | 0.2 | Iris-setosa |
5.0 | 3.3 | 1.4 | 0.2 | Iris-setosa |
5.0 | 2.0 | 3.5 | 1.0 | Iris-versicolor |
5.9 | 3.0 | 4.2 | 1.5 | Iris-versicolor |
6.3 | 3.4 | 5.6 | 2.4 | Iris-virginica |
6.4 | 3.1 | 5.5 | 1.8 | Iris-virginica |
The job of the classifier is to determine the class of an instance (the type of Iris) based on the values of the attributes.
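To make the classifier's job concrete, here is a minimal pure-Python sketch of a 3-nearest-neighbor vote that uses only the six example rows from the table above as training data. The function name `predict` and the two query flowers are hypothetical, chosen for illustration; a real model would be trained on the full dataset.

```python
import math
from collections import Counter

# The six example rows from the table: (features, class).
examples = [
    ((5.3, 3.7, 1.5, 0.2), "Iris-setosa"),
    ((5.0, 3.3, 1.4, 0.2), "Iris-setosa"),
    ((5.0, 2.0, 3.5, 1.0), "Iris-versicolor"),
    ((5.9, 3.0, 4.2, 1.5), "Iris-versicolor"),
    ((6.3, 3.4, 5.6, 2.4), "Iris-virginica"),
    ((6.4, 3.1, 5.5, 1.8), "Iris-virginica"),
]

def predict(query, k=3):
    """Return the majority class among the k examples closest to query."""
    # Sort the examples by Euclidean distance to the query instance.
    by_distance = sorted(examples, key=lambda row: math.dist(query, row[0]))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical query flowers (not taken from the dataset description above).
print(predict((5.1, 3.5, 1.4, 0.2)))  # Iris-setosa
print(predict((6.3, 3.3, 5.5, 2.0)))  # Iris-virginica
```

For the first query, the three nearest rows are the two setosa examples and one versicolor example, so the vote is 2 to 1 for Iris-setosa.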
The dataset is available at
https://raw.githubusercontent.com/zacharski/ml-class/master/data/irisTrain.csv
When you divide the data into training and test sets, please use random_state=0 so we can compare results.
You should include a short paragraph describing your results.
[]....
[]....
[]....
[].....
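A possible starting point is sketched below, under several assumptions. It uses the copy of the Iris data bundled with scikit-learn (`load_iris`) instead of the course CSV, because the CSV's column layout is not shown here. The exercise does not state the number of neighbors, so `n_neighbors=3` is carried over from the previous exercise. The default 75/25 split of `train_test_split` is also assumed. Results on the course file may therefore differ.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: the bundled Iris data stands in for irisTrain.csv.
X, y = load_iris(return_X_y=True)

# random_state=0 makes the split reproducible so results can be compared.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Assumption: 3 neighbors with the default (Euclidean) distance.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Evaluate on the held-out test set.
predictions = knn.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"Test accuracy: {accuracy:.3f}")
```

To work from the course file instead, the same steps apply after loading it with `pandas.read_csv` and separating the class column from the four measurement columns.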