Machine Learning Classification

Warren Hansen · Machine Learning

This is a continuation from my last post about machine learning for social networks.

I was working on a test project for a luxury SUV client, helping analyze a large data set. I wrote a logistic regression model that predicts, given a user's age and salary, whether they would buy the SUV.
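The idea can be sketched in a few lines of NumPy. This is an illustrative version, not the client project: the synthetic age/salary data, the buying rule, and the learning rate are all assumptions.

```python
import numpy as np

# Illustrative stand-in for the client data: two features (age, salary),
# binary label "bought the SUV", fit with plain gradient descent.
rng = np.random.default_rng(0)
n = 200
age = rng.uniform(18, 60, n)
salary = rng.uniform(15_000, 150_000, n)
# Hypothetical ground truth: older, higher-salary users tend to buy.
y = ((age > 40) & (salary > 70_000)).astype(float)

X = np.column_stack([age, salary])
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize the two features
X = np.column_stack([np.ones(n), X])       # add a bias column

w = np.zeros(3)
for _ in range(2000):                      # batch gradient descent on log loss
    p = 1.0 / (1.0 + np.exp(-X @ w))       # sigmoid gives P(buy | age, salary)
    w -= 0.1 * X.T @ (p - y) / n           # gradient step

pred = (1.0 / (1.0 + np.exp(-X @ w)) > 0.5).astype(float)
print("training accuracy:", (pred == y).mean())
```

Standardizing matters here: salary in raw dollars would dwarf age and dominate the gradient.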

Today I’d like to use the same dataset and compare these 7 machine learning classification techniques:

Logistic Regression
K-Nearest Neighbor
Support Vector Machine
Support Vector Machine Kernel
Naive Bayes
Decision Tree
Random Forests
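All seven can be run side by side with scikit-learn in one loop. This is a sketch on synthetic data, since the original dataset isn't shown; the `make_classification` stand-in, the 75/25 split, and the hyperparameters (k=5, 10 trees, RBF kernel) are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# Synthetic two-feature stand-in for the age/salary dataset.
X, y = make_classification(n_samples=400, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

models = {
    "Logistic Regression": LogisticRegression(),
    "K-Nearest Neighbors": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(kernel="linear"),
    "SVM Kernel": SVC(kernel="rbf"),
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forests": RandomForestClassifier(n_estimators=10, random_state=0),
}

errors = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    errors[name] = 1.0 - model.score(X_test, y_test)  # test-set error rate
    print(f"{name}: {errors[name]:.0%} error")
```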

I’ll keep this simple by just showing you the plots and results. I found them interesting and I hope you do too.

Decision-boundary plots for each classifier: Logistic Regression, K-Nearest Neighbor, Support Vector Machine, Support Vector Machine Kernel, Naive Bayes, Decision Tree, and Random Forests.

Here are the results. KNN and the SVM Kernel did best, each with only a 7% error rate, while the Decision Tree looked like a clear case of overfitting.

Logistic Regression = 11/89 [[65 3] [8 24]] 11% error
KNN = 7/93 [[64 4] [3 29]] 7% error
SVM = 10/90 [[66 2] [8 24]] 10% error
SVM Kernel = 7/93 [[64 4] [3 29]] 7% error
Naive Bayes = 10/90 [[65 3] [7 25]] 10% error
Decision Tree = 9/91 [[62 6] [3 29]] 9% error
Random Forests = 8/92 [[63 5] [3 29]] 8% error
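Each line above is a 2×2 confusion matrix on a 100-sample test set: the diagonal entries are correct predictions and the off-diagonal entries are errors. A quick sketch of how the error rates fall out:

```python
# Error rate from a 2x2 confusion matrix [[TN, FP], [FN, TP]]:
# off-diagonal entries are the misclassifications.
def error_rate(cm):
    correct = cm[0][0] + cm[1][1]
    wrong = cm[0][1] + cm[1][0]
    return wrong / (correct + wrong)

knn_cm = [[64, 4], [3, 29]]   # KNN matrix from the results above
print(error_rate(knn_cm))     # 7 errors out of 100 -> 0.07
```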

Here is my code for K Nearest Neighbor:
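A from-scratch version of the same idea looks like this. Note this is an illustrative sketch, not the original post's code: the toy data, k=3, and salary-in-thousands scaling are all assumptions.

```python
from collections import Counter
import math

# K-Nearest Neighbors from scratch: classify a point by majority vote
# among the k training points closest to it in Euclidean distance.
def knn_predict(X_train, y_train, x, k=5):
    dists = sorted(
        (math.dist(x, xi), yi) for xi, yi in zip(X_train, y_train)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy data: (age, salary in thousands), label 1 = bought the SUV.
# Salary in thousands keeps the two features on comparable scales;
# in practice you would standardize both.
X_train = [(25, 30), (30, 45), (35, 60), (45, 90), (50, 120), (55, 140)]
y_train = [0, 0, 0, 1, 1, 1]
print(knn_predict(X_train, y_train, (48, 100), k=3))  # -> 1
```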

I learned this from a comprehensive machine learning course on Udemy.
