Low commit activity in last 3 years
A long-lived project that still receives updates
Classify search query strings
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
 Dependencies

Development

Runtime

 Project Readme

known_item_search_classifier

A ruby gem that classifies search query strings as either known-item searches or unknown-item searches. It uses features identified by Min-Yen Kan and Danny C.C. Poo in their 2005 paper called Detecting and Supporting Known Item Queries
in Online Public Access Catalogs](http://www.comp.nus.edu.sg/~kanmy/papers/f57-kan.pdf). It uses a Guassian Naive Bayes algorithm and winds up being about 74% accurate.

Usage

# Using the default training set
require('known_item_search_classifier')
c = KnownItemSearchClassifier::Classifier.new
c.is_known_item_search? "Perl Best Practices by Damian Conway" # => :known

# Using your own training set
require('known_item_search_classifier')
c = KnownItemSearchClassifier::Classifier.new
c.train([
    ["The Illiad by Homer", :known],
    ["bugs", :unknown],
    ["Teaching Community: A Pedagogy of Hope", :known],
    ["pre-exam stress", :unknown]])
c.train_from_csv('training_set.csv') # With format "where the wild things are",known
c.is_known_item_search? "Denison Avenue" # => :known

To retrain

bin/console
c = KnownItemSearchClassifier::Classifier.new
c.train_from_csv('new_data.csv')
c.custom_training_set.classifier.categories_probabilities # copy this as the return value in lib/known_item_search_classifier/default_training_set.rb
c.custom_training_set.classifier.categories_summaries # copy this as the return value in lib/known_item_search_classifier/default_training_set.rb

Tests

bundle exec rake