No commit activity in last 3 years
No release in over 3 years
Implement the Gaussian Naive Bayes algorithm for classification
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
 Dependencies
 Project Readme

Use case

Given that the Iris Data Set as the following

Sepal length Sepal width Petal length Petal width Species
6.8 3.2 5.9 2.3 virginica
5.1 3.5 1.4 0.3 setosa
4.4 3.2 1.3 0.2 setosa
5.8 4 1.2 0.2 setosa
6.3 3.3 6 2.5 virginica
5.7 4.4 1.5 0.4 setosa
6.2 2.2 4.5 1.5 versicolor
7.1 3 5.9 2.1 virginica
6.4 2.8 5.6 2.1 virginica
6 2.9 4.5 1.5 versicolor
5.4 3.9 1.3 0.4 setosa
6.1 2.8 4 1.3 versicolor
4.4 3 1.3 0.2 setosa
5.5 4.2 1.4 0.2 setosa
6.1 3 4.6 1.4 versicolor
5.9 3 5.1 1.8 virginica
4.6 3.2 1.4 0.2 setosa
4.7 3.2 1.6 0.2 setosa
4.8 3 1.4 0.1 setosa
5.6 2.5 3.9 1.1 versicolor
5.1 3.4 1.5 0.2 setosa
5.1 3.8 1.5 0.3 setosa
7.9 3.8 6.4 2 virginica
6.3 2.5 5 1.9 virginica
6.5 3 5.2 2 virginica
5.4 3.9 1.7 0.4 setosa
5.7 2.5 5 2 virginica
5.5 2.5 4 1.3 versicolor
5.8 2.8 5.1 2.4 virginica
5.4 3 4.5 1.5 versicolor
7.6 3 6.6 2.1 virginica
7.7 2.8 6.7 2 virginica
5.6 2.8 4.9 2 virginica
5.1 3.5 1.4 0.2 setosa
6.3 3.3 4.7 1.6 versicolor
6.2 2.8 4.8 1.8 virginica
4.9 3.6 1.4 0.1 setosa
6.3 2.5 4.9 1.5 versicolor
4.8 3.1 1.6 0.2 setosa
6.1 2.8 4.7 1.2 versicolor
5.1 3.3 1.7 0.5 setosa
6.9 3.1 5.4 2.1 virginica
6.1 3 4.9 1.8 virginica
5.1 3.7 1.5 0.4 setosa
6.8 2.8 4.8 1.4 versicolor
5.4 3.4 1.7 0.2 setosa
6.8 3 5.5 2.1 virginica
5.6 2.9 3.6 1.3 versicolor
5.1 2.5 3 1.1 versicolor
6.5 3 5.5 1.8 virginica
6.5 2.8 4.6 1.5 versicolor
5.8 2.7 5.1 1.9 virginica
6.5 3.2 5.1 2 virginica
6.9 3.1 4.9 1.5 versicolor
5 3 1.6 0.2 setosa
5.6 2.7 4.2 1.3 versicolor
6.3 2.8 5.1 1.5 virginica
5 3.4 1.5 0.2 setosa
5.7 3 4.2 1.2 versicolor
6.4 2.7 5.3 1.9 virginica
4.9 3.1 1.5 0.2 setosa
5.2 4.1 1.5 0.1 setosa
6.4 3.1 5.5 1.8 virginica
6.4 2.8 5.6 2.2 virginica
5.7 2.8 4.5 1.3 versicolor
6 3.4 4.5 1.6 versicolor
5.8 2.7 4.1 1 versicolor
6.9 3.2 5.7 2.3 virginica
6.4 3.2 5.3 2.3 virginica
5 3.5 1.6 0.6 setosa
7 3.2 4.7 1.4 versicolor
7.2 3 5.8 1.6 virginica
5.3 3.7 1.5 0.2 setosa
5.6 3 4.1 1.3 versicolor
4.8 3 1.4 0.3 setosa
6.7 3.3 5.7 2.1 virginica
5.2 3.4 1.4 0.2 setosa
6.9 3.1 5.1 2.3 virginica
5.1 3.8 1.9 0.4 setosa
6.2 2.9 4.3 1.3 versicolor
6.4 2.9 4.3 1.3 versicolor
7.7 3.8 6.7 2.2 virginica
6.6 2.9 4.6 1.3 versicolor
5 3.5 1.3 0.3 setosa
4.4 2.9 1.4 0.2 setosa
5.8 2.7 5.1 1.9 virginica
5.5 2.6 4.4 1.2 versicolor
6.3 3.4 5.6 2.4 virginica
6.7 3 5 1.7 versicolor
4.7 3.2 1.3 0.2 setosa
5 3.3 1.4 0.2 setosa
5.8 2.7 3.9 1.2 versicolor
4.3 3 1.1 0.1 setosa
6.6 3 4.4 1.4 versicolor
6.7 2.5 5.8 1.8 virginica
5.7 2.6 3.5 1 versicolor
4.9 3 1.4 0.2 setosa
5.7 3.8 1.7 0.3 setosa
4.9 2.5 4.5 1.7 virginica
5.1 3.8 1.6 0.2 setosa

The requirement is predicting the species of data in the table below

Sepal length Sepal width Petal length Petal width Species
6.3 2.3 4.4 1.3 ?
6 3 4.8 1.8 ?
5.9 3.2 4.8 1.8 ?
6.7 3.1 4.7 1.5 ?
4.6 3.1 1.5 0.2 ?
6.1 2.6 5.6 1.4 ?
4.6 3.4 1.4 0.3 ?
7.7 3 6.1 2.3 ?
5.7 2.9 4.2 1.3 ?
5 3.2 1.2 0.2 ?
5.4 3.4 1.5 0.4 ?
5 3.6 1.4 0.2 ?
7.7 2.6 6.9 2.3 ?
5.2 3.5 1.5 0.2 ?
5.5 2.3 4 1.3 ?
5.6 3 4.5 1.5 ?
6 2.2 4 1 ?
4.9 3.1 1.5 0.1 ?
4.8 3.4 1.9 0.2 ?
6.3 2.7 4.9 1.8 ?
6.7 3.1 4.4 1.4 ?
5.2 2.7 3.9 1.4 ?
5.9 3 4.2 1.5 ?
5.4 3.7 1.5 0.2 ?
7.2 3.2 6 1.8 ?
6 2.2 5 1.5 ?
6.7 3.1 5.6 2.4 ?
6.2 3.4 5.4 2.3 ?
5.5 2.4 3.7 1 ?
6.3 2.9 5.6 1.8 ?
6.4 3.2 4.5 1.5 ?
4.6 3.6 1 0.2 ?
6.7 3 5.2 2.3 ?
5.5 3.5 1.3 0.2 ?
5.8 2.6 4 1.2 ?
6.5 3 5.8 2.2 ?
4.8 3.4 1.6 0.2 ?
6.7 3.3 5.7 2.5 ?
5.5 2.4 3.8 1.1 ?
5.7 2.8 4.1 1.3 ?
5 3.4 1.6 0.4 ?
6.1 2.9 4.7 1.4 ?
4.9 2.4 3.3 1 ?
7.4 2.8 6.1 1.9 ?
6 2.7 5.1 1.6 ?
4.5 2.3 1.3 0.3 ?
5 2 3.5 1 ?
7.2 3.6 6.1 2.5 ?
5 2.3 3.3 1 ?
7.3 2.9 6.3 1.8 ?

How to resolve this issue using Gaussian Naive Bayes

Install gem

gem install gaussian_naive_bayes

Prepare the training set

training_set = [[6.8,3.2,5.9,2.3],[5.1,3.5,1.4,0.3],[4.4,3.2,1.3,0.2],[5.8,4,1.2,0.2],[6.3,3.3,6,2.5],[5.7,4.4,1.5,0.4],[6.2,2.2,4.5,1.5],[7.1,3,5.9,2.1],[6.4,2.8,5.6,2.1],[6,2.9,4.5,1.5],[5.4,3.9,1.3,0.4],[6.1,2.8,4,1.3],[4.4,3,1.3,0.2],[5.5,4.2,1.4,0.2],[6.1,3,4.6,1.4],[5.9,3,5.1,1.8],[4.6,3.2,1.4,0.2],[4.7,3.2,1.6,0.2],[4.8,3,1.4,0.1],[5.6,2.5,3.9,1.1],[5.1,3.4,1.5,0.2],[5.1,3.8,1.5,0.3],[7.9,3.8,6.4,2],[6.3,2.5,5,1.9],[6.5,3,5.2,2],[5.4,3.9,1.7,0.4],[5.7,2.5,5,2],[5.5,2.5,4,1.3],[5.8,2.8,5.1,2.4],[5.4,3,4.5,1.5],[7.6,3,6.6,2.1],[7.7,2.8,6.7,2],[5.6,2.8,4.9,2],[5.1,3.5,1.4,0.2],[6.3,3.3,4.7,1.6],[6.2,2.8,4.8,1.8],[4.9,3.6,1.4,0.1],[6.3,2.5,4.9,1.5],[4.8,3.1,1.6,0.2],[6.1,2.8,4.7,1.2],[5.1,3.3,1.7,0.5],[6.9,3.1,5.4,2.1],[6.1,3,4.9,1.8],[5.1,3.7,1.5,0.4],[6.8,2.8,4.8,1.4],[5.4,3.4,1.7,0.2],[6.8,3,5.5,2.1],[5.6,2.9,3.6,1.3],[5.1,2.5,3,1.1],[6.5,3,5.5,1.8],[6.5,2.8,4.6,1.5],[5.8,2.7,5.1,1.9],[6.5,3.2,5.1,2],[6.9,3.1,4.9,1.5],[5,3,1.6,0.2],[5.6,2.7,4.2,1.3],[6.3,2.8,5.1,1.5],[5,3.4,1.5,0.2],[5.7,3,4.2,1.2],[6.4,2.7,5.3,1.9],[4.9,3.1,1.5,0.2],[5.2,4.1,1.5,0.1],[6.4,3.1,5.5,1.8],[6.4,2.8,5.6,2.2],[5.7,2.8,4.5,1.3],[6,3.4,4.5,1.6],[5.8,2.7,4.1,1],[6.9,3.2,5.7,2.3],[6.4,3.2,5.3,2.3],[5,3.5,1.6,0.6],[7,3.2,4.7,1.4],[7.2,3,5.8,1.6],[5.3,3.7,1.5,0.2],[5.6,3,4.1,1.3],[4.8,3,1.4,0.3],[6.7,3.3,5.7,2.1],[5.2,3.4,1.4,0.2],[6.9,3.1,5.1,2.3],[5.1,3.8,1.9,0.4],[6.2,2.9,4.3,1.3],[6.4,2.9,4.3,1.3],[7.7,3.8,6.7,2.2],[6.6,2.9,4.6,1.3],[5,3.5,1.3,0.3],[4.4,2.9,1.4,0.2],[5.8,2.7,5.1,1.9],[5.5,2.6,4.4,1.2],[6.3,3.4,5.6,2.4],[6.7,3,5,1.7],[4.7,3.2,1.3,0.2],[5,3.3,1.4,0.2],[5.8,2.7,3.9,1.2],[4.3,3,1.1,0.1],[6.6,3,4.4,1.4],[6.7,2.5,5.8,1.8],[5.7,2.6,3.5,1],[4.9,3,1.4,0.2],[5.7,3.8,1.7,0.3],[4.9,2.5,4.5,1.7],[5.1,3.8,1.6,0.2]]
target_category = ["virginica","setosa","setosa","setosa","virginica","setosa","versicolor","virginica","virginica","versicolor","setosa","versicolor","setosa","setosa","versicolor","virginica","setosa","setosa","setosa","versicolor","setosa","setosa","virginica","virginica","virginica","setosa","virginica","versicolor","virginica","versicolor","virginica","virginica","virginica","setosa","versicolor","virginica","setosa","versicolor","setosa","versicolor","setosa","virginica","virginica","setosa","versicolor","setosa","virginica","versicolor","versicolor","virginica","versicolor","virginica","virginica","versicolor","setosa","versicolor","virginica","setosa","versicolor","virginica","setosa","setosa","virginica","virginica","versicolor","versicolor","versicolor","virginica","virginica","setosa","versicolor","virginica","setosa","versicolor","setosa","virginica","setosa","virginica","setosa","versicolor","versicolor","virginica","versicolor","setosa","setosa","virginica","versicolor","virginica","versicolor","setosa","setosa","versicolor","setosa","versicolor","virginica","versicolor","setosa","setosa","virginica","setosa"]

Train

require "gaussian_naive_bayes"

learner = GaussianNaiveBayes::Learner.new
training_set.each_with_index do |vector, index|
  learner.train(vector, target_category[index])
end

Prepare the test set

test_set = [[6.3,2.3,4.4,1.3],[6,3,4.8,1.8],[5.9,3.2,4.8,1.8],[6.7,3.1,4.7,1.5],[4.6,3.1,1.5,0.2],[6.1,2.6,5.6,1.4],[4.6,3.4,1.4,0.3],[7.7,3,6.1,2.3],[5.7,2.9,4.2,1.3],[5,3.2,1.2,0.2],[5.4,3.4,1.5,0.4],[5,3.6,1.4,0.2],[7.7,2.6,6.9,2.3],[5.2,3.5,1.5,0.2],[5.5,2.3,4,1.3],[5.6,3,4.5,1.5],[6,2.2,4,1],[4.9,3.1,1.5,0.1],[4.8,3.4,1.9,0.2],[6.3,2.7,4.9,1.8],[6.7,3.1,4.4,1.4],[5.2,2.7,3.9,1.4],[5.9,3,4.2,1.5],[5.4,3.7,1.5,0.2],[7.2,3.2,6,1.8],[6,2.2,5,1.5],[6.7,3.1,5.6,2.4],[6.2,3.4,5.4,2.3],[5.5,2.4,3.7,1],[6.3,2.9,5.6,1.8],[6.4,3.2,4.5,1.5],[4.6,3.6,1,0.2],[6.7,3,5.2,2.3],[5.5,3.5,1.3,0.2],[5.8,2.6,4,1.2],[6.5,3,5.8,2.2],[4.8,3.4,1.6,0.2],[6.7,3.3,5.7,2.5],[5.5,2.4,3.8,1.1],[5.7,2.8,4.1,1.3],[5,3.4,1.6,0.4],[6.1,2.9,4.7,1.4],[4.9,2.4,3.3,1],[7.4,2.8,6.1,1.9],[6,2.7,5.1,1.6],[4.5,2.3,1.3,0.3],[5,2,3.5,1],[7.2,3.6,6.1,2.5],[5,2.3,3.3,1],[7.3,2.9,6.3,1.8]]

Classify

classifier = learner.classifier
p classifier.classify([6.3,2.3,4.4,1.3]) # => "versicolor"
p classifier.classify([6,3,4.8,1.8]) # => "virginica"

References

Naive Bayes

Gaussian Naive Bayes