Classifier Reborn
Getting Started
Classifier Reborn is a general classifier module that supports Bayesian and other types of classification. It is a fork of cardmagic/classifier under more active development. It currently implements a Bayesian classifier and a Latent Semantic Indexer (LSI).
Here is a quick illustration of the Bayesian classifier.
$ gem install classifier-reborn
$ irb
irb(main):001:0> require 'classifier-reborn'
irb(main):002:0> classifier = ClassifierReborn::Bayes.new 'Ham', 'Spam'
irb(main):003:0> classifier.train "Ham", "Sunday is a holiday. Say no to work on Sunday!"
irb(main):004:0> classifier.train "Spam", "You are the lucky winner! Claim your holiday prize."
irb(main):005:0> classifier.classify "What's the plan for Sunday?"
#=> "Ham"
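To show roughly what `train` and `classify` are doing, here is a minimal multinomial naive Bayes sketch in plain Ruby. It is a hypothetical illustration, not Classifier Reborn's actual implementation: the `MiniBayes` class name is made up, and it skips the stemming and stopword handling the gem performs. It scores each category by its log prior plus the log likelihood of each word, with add-one (Laplace) smoothing so unseen words do not zero out a category.

```ruby
# Hypothetical, minimal multinomial naive Bayes classifier.
# NOT ClassifierReborn's implementation: no stemming, no stopwords.
class MiniBayes
  def initialize(*categories)
    # Word counts per category, e.g. { "Ham" => { "sunday" => 2, ... } }
    @word_counts = categories.to_h { |c| [c, Hash.new(0)] }
    # Number of training documents seen per category
    @doc_counts  = categories.to_h { |c| [c, 0] }
    # Every distinct word seen in training (used for smoothing)
    @vocabulary  = {}
  end

  def train(category, text)
    @doc_counts[category] += 1
    tokenize(text).each do |word|
      @word_counts[category][word] += 1
      @vocabulary[word] = true
    end
  end

  # Returns the category with the highest log-posterior score.
  def classify(text)
    total_docs = @doc_counts.values.sum.to_f
    words = tokenize(text)
    @word_counts.keys.max_by do |category|
      counts = @word_counts[category]
      total_words = counts.values.sum
      prior = Math.log(@doc_counts[category] / total_docs)
      words.sum(prior) do |word|
        # Add-one smoothing keeps unseen words from zeroing the score
        Math.log((counts[word] + 1.0) / (total_words + @vocabulary.size))
      end
    end
  end

  private

  # Lowercase and split into simple word tokens
  def tokenize(text)
    text.downcase.scan(/[a-z']+/)
  end
end

bayes = MiniBayes.new("Ham", "Spam")
bayes.train("Ham", "Sunday is a holiday. Say no to work on Sunday!")
bayes.train("Spam", "You are the lucky winner! Claim your holiday prize.")
puts bayes.classify("What's the plan for Sunday?") # prints Ham
```

Working in log space avoids floating-point underflow when many small word probabilities are multiplied together.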
Now, let's build an LSI, classify some text, and find a cluster of related documents.
irb(main):006:0> lsi = ClassifierReborn::LSI.new
irb(main):007:0> lsi.add_item "This text deals with dogs. Dogs.", :dog
irb(main):008:0> lsi.add_item "This text involves dogs too. Dogs!", :dog
irb(main):009:0> lsi.add_item "This text revolves around cats. Cats.", :cat
irb(main):010:0> lsi.add_item "This text also involves cats. Cats!", :cat
irb(main):011:0> lsi.add_item "This text involves birds. Birds.", :bird
irb(main):012:0> lsi.classify "This text is about dogs!"
#=> :dog
irb(main):013:0> lsi.find_related("This text is around cats!", 2)
#=> ["This text revolves around cats. Cats.", "This text also involves cats. Cats!"]
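The idea behind `find_related` is to compare documents as vectors of terms. The sketch below is a simplified, hypothetical stand-in (the `term_vector`, `cosine` and `find_related` helpers are made up for illustration): it ranks documents by cosine similarity of raw term-frequency vectors. Real LSI additionally projects those vectors into a lower-rank "semantic" space using a singular value decomposition; that step is deliberately left out here. It assumes Ruby 2.7+ for `Enumerable#tally`.

```ruby
# Hypothetical sketch: plain vector-space similarity, NOT full LSI
# (no singular value decomposition / rank reduction).

# Term-frequency vector, e.g. { "this" => 1, "dogs" => 2, ... }
def term_vector(text)
  text.downcase.scan(/[a-z]+/).tally
end

# Cosine of the angle between two sparse term vectors
def cosine(a, b)
  dot = a.sum { |word, count| count * b.fetch(word, 0) }
  norm = ->(v) { Math.sqrt(v.values.sum { |c| c * c }) }
  denom = norm.(a) * norm.(b)
  denom.zero? ? 0.0 : dot / denom
end

# Returns the max_nearest documents most similar to the query, best first
def find_related(documents, query, max_nearest = 3)
  q = term_vector(query)
  documents.max_by(max_nearest) { |doc| cosine(term_vector(doc), q) }
end

docs = [
  "This text deals with dogs. Dogs.",
  "This text involves dogs too. Dogs!",
  "This text revolves around cats. Cats.",
  "This text also involves cats. Cats!",
  "This text involves birds. Birds."
]
# Prints the two cat documents, highest score first
p find_related(docs, "This text is around cats!", 2)
```

On this tiny corpus, plain cosine similarity already surfaces the same two documents; the SVD step in LSI matters more on larger corpora, where it can relate documents that share few exact words.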
There is much more that can be done with Bayes and LSI beyond these quick examples. For more information, read the following documentation topics:
- Installation and Dependencies
- Bayesian Classifier
- Latent Semantic Indexer (LSI)
- Classifier Validation
- Development and Contributions (Optional Docker instructions included)
Notes on JRuby support
gem 'classifier-reborn-jruby', platforms: :java
While experimental, this gem should work on JRuby without any additional changes. Unfortunately, you will not be able to use C bindings to GNU GSL or similar performance-enhancing native code. Additionally, we do not use fast_stemmer, but rather an implementation of the Porter stemming algorithm. Stemming will therefore differ between MRI and JRuby; however, you may choose to disable stemming and do your own manual preprocessing (or use some other popular Java library).
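If you disable stemming, a small preprocessing step of your own can keep results consistent across MRI and JRuby. The sketch below is hypothetical: the `preprocess` and `strip_suffix` helpers, the tiny `STOPWORDS` list and the suffix rules are all made up for illustration, and the suffix stripping is a crude stand-in, not the Porter stemming algorithm. Because it is pure Ruby, it behaves the same on both interpreters.

```ruby
# Hypothetical preprocessing for use when stemming is disabled.
# STOPWORDS is a tiny illustrative list; the suffix rules are a crude
# stand-in and NOT the Porter stemming algorithm.
STOPWORDS = %w[a an and are is the this to too with].freeze
SUFFIXES = %w[ing es s].freeze

# Lowercase, tokenize, drop stopwords, strip simple suffixes
def preprocess(text)
  text.downcase
      .scan(/[a-z]+/)
      .reject { |word| STOPWORDS.include?(word) }
      .map { |word| strip_suffix(word) }
      .join(" ")
end

# Remove the first matching suffix, keeping at least 3 characters of stem
def strip_suffix(word)
  suffix = SUFFIXES.find { |s| word.end_with?(s) && word.length > s.length + 2 }
  suffix ? word.delete_suffix(suffix) : word
end

puts preprocess("This text deals with dogs. Dogs.") # prints "text deal dog dog"
```

The output is ordinary text, so it can be passed to `classifier.train` or `lsi.add_item` just like the strings in the examples above.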
If you encounter a problem, please submit your issue with [JRuby] in the title.
Code of Conduct
In order to have a more open and welcoming community, Classifier Reborn adheres to the Jekyll code of conduct, adapted from the Ruby on Rails code of conduct.
Please adhere to this code of conduct in any interactions you have in the Classifier community.
If you encounter someone violating these terms, please let Chase Gilliam know and we will address it as soon as possible.
Authors and Contributors
- Lucas Carlson
- David Fayram II
- Cameron McBride
- Ivan Acosta-Rubio
- Parker Moore
- Chase Gilliam
- and many more...
The Classifier Reborn library is released under the terms of the GNU LGPL-2.1.