Using the above datasets, calculate and compare the KNN accuracy and the 10-fold cross-validation accuracy of Logistic Regression.
Complete the following steps as part of calculating the accuracy.
- Import the sklearn library for all the datasets.
- Read the data for all the datasets and normalize them.
- For the athletes dataset, replace the values in the Sport column with 0, 1 and 2 (for example, 0 for Gymnastics, 1 for Basketball and 2 for Track). For the Iris dataset, replace the values in the Species column with 0, 1 and 2 (for example, 0 for Iris-setosa, 1 for Iris-versicolor and 2 for Iris-virginica). For the mpg dataset, replace the values in the Model column with 0, 1, 2 and so on (for example, 0 for chevrolet chevelle malibu, 1 for buick skylark 320 and so on).
- Define the predictions and print them for all datasets.
- Create a KNN classifier and search for the value of K that gives the best cross-validation accuracy.
- Plot the cross-validation accuracy against the value of K for KNN.
- Print out the KNN accuracy and the 10-fold cross-validation accuracy of Logistic Regression, and compare them.
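The steps above could be sketched roughly as follows for one dataset. This is only an illustration under stated assumptions, not the code from Cross-validation.ipynb: it uses the Iris data bundled with scikit-learn in place of the course file, the tiny `models` series is made up to show the Model encoding, the range of K from 1 to 30 is an arbitrary choice, and the variable names are hypothetical.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Assumption: scikit-learn's bundled Iris data stands in for the course CSV.
iris = load_iris(as_frame=True)
X = iris.data

# Rebuild a text Species column so the replacement step can be shown.
species_names = {0: "Iris-setosa", 1: "Iris-versicolor", 2: "Iris-virginica"}
species = iris.target.map(species_names)

# Replace the Species labels with 0, 1 and 2.
species_codes = {"Iris-setosa": 0, "Iris-versicolor": 1, "Iris-virginica": 2}
y = species.map(species_codes)

# For a column with many labels (such as Model in the mpg data),
# pd.factorize numbers the labels in order of first appearance.
# The three rows below are made-up illustration data.
models = pd.Series([
    "chevrolet chevelle malibu",
    "buick skylark 320",
    "chevrolet chevelle malibu",
])
model_codes, model_labels = pd.factorize(models)

# Search for the K with the best 10-fold cross-validation accuracy.
# The scaler sits inside the pipeline so each fold is normalized
# using only its own training part.
k_values = list(range(1, 31))  # assumed search range
knn_scores = []
for k in k_values:
    knn = make_pipeline(MinMaxScaler(), KNeighborsClassifier(n_neighbors=k))
    scores = cross_val_score(knn, X, y, cv=10, scoring="accuracy")
    knn_scores.append(scores.mean())

best_index = max(range(len(k_values)), key=lambda i: knn_scores[i])
best_k = k_values[best_index]
best_knn_accuracy = knn_scores[best_index]

# 10-fold cross-validation accuracy of logistic regression.
logreg = make_pipeline(MinMaxScaler(), LogisticRegression(max_iter=1000))
logreg_accuracy = cross_val_score(
    logreg, X, y, cv=10, scoring="accuracy"
).mean()

print("Best K:", best_k)
print("KNN 10-fold CV accuracy:", round(best_knn_accuracy, 4))
print("Logistic Regression 10-fold CV accuracy:", round(logreg_accuracy, 4))
```

The lists `k_values` and `knn_scores` hold the data for the plot of K versus cross-validation accuracy, for example with matplotlib's `plt.plot(k_values, knn_scores)`.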
All of the functions that you need for this system are given inside the notebook Cross-validation.ipynb in Module 10 of your class sessions.
You will need to test how accurate your classification system is by running your model on the test sets (a detailed explanation is given in Zacharski, Chapter 4).
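One possible way to check accuracy on a test set is sketched below. It is an assumption-laden example: the bundled Iris data replaces the course files, the 70/30 split is created here with `train_test_split` rather than taken from a provided test file, and K = 5 is an arbitrary choice.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Assumption: bundled Iris data stands in for the course datasets.
X, y = load_iris(return_X_y=True)

# Hold out 30% of the rows as a test set (assumed split size).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Normalize using training data only, then fit KNN with an assumed K of 5.
model = make_pipeline(MinMaxScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)

# Define the predictions and print them, then score them on the test set.
predictions = model.predict(X_test)
test_accuracy = accuracy_score(y_test, predictions)

print("Predictions:", predictions)
print("Test accuracy:", round(test_accuracy, 4))
```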
You should submit (upload) one IPython Notebook file (.ipynb).