SegmentRuby
SegmentRuby is a module for segmenting (English) text based on various language models.
Installation
Add this line to your application's Gemfile:
gem 'segment_ruby'
And then execute:
$ bundle
Or install it yourself as:
$ gem install segment_ruby
Usage
require 'segment_ruby'
t = SegmentRuby::Analyzer.new(:twitter)
t.segment("theboywholived")
=> ["the", "boy", "who", "lived"]
Models include:
- :norvig: based on Google web data
- :google_books: based on Google books data
- :anchor: based on Web anchor text
- :twitter: based on Twitter data
- :small: smaller version of the Google books data
- :us_names: US names, based on SSI data
The default model is :small. Use it if it seems to work for you.
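To give a feel for how a language model can drive segmentation, here is a minimal, self-contained sketch. It is not SegmentRuby's implementation and does not use the gem: TOY_COUNTS, toy_log_prob, and toy_segment are hypothetical names, and the counts are made up for illustration. It assumes a model can be reduced to per-word counts, scores each candidate split by the sum of word log-probabilities, and penalizes unseen words by their length.

```ruby
# Toy unigram counts. Hypothetical values, not taken from any SegmentRuby model.
TOY_COUNTS = {
  "the" => 500, "boy" => 60, "who" => 120,
  "lived" => 30, "he" => 200, "live" => 40
}.freeze
TOY_TOTAL = TOY_COUNTS.values.sum.to_f

# Log-probability of one word. Unseen words get a penalty that grows with length.
def toy_log_prob(word)
  count = TOY_COUNTS[word]
  return Math.log(count / TOY_TOTAL) if count
  Math.log(1.0 / (TOY_TOTAL * 10**word.length))
end

# Returns [score, words] for the highest-scoring split of text.
# memo caches results per suffix so each suffix is solved only once.
def toy_segment(text, memo = {})
  return [0.0, []] if text.empty?
  return memo[text] if memo.key?(text)

  candidates = (1..text.length).map do |i|
    head = text[0, i]
    tail_score, tail_words = toy_segment(text[i..-1], memo)
    [toy_log_prob(head) + tail_score, [head] + tail_words]
  end
  memo[text] = candidates.max_by(&:first)
end

_score, words = toy_segment("theboywholived")
p words # prints ["the", "boy", "who", "lived"]
```

With a different count table the same input can split differently, which is why trying more than one model on your own text is worthwhile.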
Development
After checking out the repo, run bin/setup to install dependencies. Then, run rake spec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.
To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/willf/segment_ruby.