No commit activity in last 3 years
No release in over 3 years

Judgee

A simple Bayesian classifier with additive smoothing built in. Judgee focuses on performance and a minimal but flexible feature set, so stemming, text analysis and similar preprocessing are up to you.
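
To illustrate what additive smoothing does, here is a minimal pure-Ruby sketch. It is not Judgee's actual implementation: the method names `smoothed_likelihood` and `naive_bayes_classify`, the in-memory `counts` hash, the default `alpha` of 1.0 and the assumption of equal category priors are all made up for this example. The point is that a token never seen in a category still gets a small non-zero probability instead of zeroing out the whole score.

```ruby
# Illustrative only: not Judgee's internals.
# Likelihood of a token in a category with additive (Laplace) smoothing.
#   count           - how often the token was seen in the category
#   total           - number of tokens seen in the category overall
#   vocabulary_size - number of distinct tokens across all categories
#   alpha           - smoothing strength (1.0 is classic Laplace smoothing)
def smoothed_likelihood(count, total, vocabulary_size, alpha = 1.0)
  (count + alpha).to_f / (total + alpha * vocabulary_size)
end

# Return the category with the highest summed log-likelihood.
# Assumes equal priors for all categories to keep the sketch short.
def naive_bayes_classify(tokens, counts, alpha = 1.0)
  vocabulary_size = counts.values.flat_map(&:keys).uniq.size
  best = counts.max_by do |_category, token_counts|
    total = token_counts.values.sum
    tokens.sum do |token|
      Math.log(smoothed_likelihood(token_counts.fetch(token, 0), total, vocabulary_size, alpha))
    end
  end
  best.first
end

counts = {
  spam: { "bad" => 1, "worse" => 1, "stupid" => 1, "idiotic" => 1 },
  ham:  { "good" => 1, "better" => 1, "best" => 1, "lovely" => 1 }
}
result = naive_bayes_classify(["good", "better", "best", "worse"], counts)
```

With these counts, three of the four tokens were seen in `:ham` and only one in `:spam`, so `:ham` gets the higher score.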

Backed by Redis

Redis is an open source, BSD licensed, advanced key-value store, often referred to as a data structure server. It supports strings, hashes, lists, sets and sorted sets, and offers very high performance.

Installation

gem install judgee

Getting started

# Require Judgee
require "judgee"

# Create an instance of Judgee.
# Judgee assumes that your Redis instance is running on localhost at port 6379.
judgee = Judgee::Classifier.new

# If your Redis instance is running on another host in your network, simply pass the connection options
judgee = Judgee::Classifier.new(:host => "10.0.1.1", :port => 6380)

# Judgee also supports Unix sockets
judgee = Judgee::Classifier.new(:path => "/tmp/redis.sock")


# Now you can train the classifier
judgee.train(:spam, ["bad", "worse", "stupid", "idiotic"])
judgee.train(:ham, ["good", "better", "best", "lovely"])

# After training, classify your text sample
judgee.classify(["good", "better", "best", "worse"]) # => :ham


# Want to untrain some words?
judgee.untrain(:spam, ["bad", "worse"])
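
Since Judgee works on arrays of tokens and leaves text analysis to you, you need some preprocessing step before training or classifying raw text. The following is a minimal sketch of one: the `tokenize` helper and the `STOPWORDS` list are hypothetical and not part of Judgee, and a real application would likely add stemming or a larger stop word list.

```ruby
# Hypothetical helper, not part of Judgee: turn raw text into a token array.
STOPWORDS = %w[a an and the is to of].freeze

def tokenize(text)
  text.downcase
      .scan(/[a-z0-9']+/)                          # keep runs of letters, digits and apostrophes
      .reject { |token| STOPWORDS.include?(token) } # drop very common words
end

tokens = tokenize("The service is GOOD, better than the best!")
# The resulting array can be passed to judgee.train or judgee.classify.
```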

Information on Performance

If you look at the source code, you will notice two naming schemes. There are two methods for training (train, train_fast), two for untraining (untrain, untrain_fast) and two for classification (classify, classify_fast). The difference is simple: as the suffix suggests, the '_fast' methods process data considerably faster (3x to 10x), but their source code is virtually unreadable.

So use the '_fast' methods when you need performance, e.g. in your production environment, and the 'slow' methods for learning purposes or small data sets.

Using the _fast methods

# Now you can train the classifier
judgee.train_fast(:spam, ["bad", "worse", "stupid", "idiotic"])
judgee.train_fast(:ham, ["good", "better", "best", "lovely"])

# After training, classify your text sample
judgee.classify_fast(["good", "better", "best", "worse"]) # => :ham


# Want to untrain some words?
judgee.untrain_fast(:spam, ["bad", "worse"])
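
If you want to check the speedup on your own data, a small timing helper built on Ruby's standard `Benchmark` module is enough. This is only a sketch: `compare_classify` is a hypothetical name, not part of Judgee, and it accepts any object that responds to both `classify` and `classify_fast`. Actual numbers will depend on your Redis setup and data size.

```ruby
require "benchmark"

# Hypothetical helper, not part of Judgee: time `classify` against
# `classify_fast` on the same samples and return both durations in seconds.
# `classifier` is any object responding to both methods,
# e.g. a trained Judgee::Classifier instance.
def compare_classify(classifier, samples)
  slow = Benchmark.realtime { samples.each { |tokens| classifier.classify(tokens) } }
  fast = Benchmark.realtime { samples.each { |tokens| classifier.classify_fast(tokens) } }
  { slow: slow, fast: fast }
end
```

Call it as `compare_classify(judgee, samples)`, where `samples` is an array of token arrays, and compare the two timings.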