Project

tactful_tokenizer

0.03

No commit activity in last 3 years

No release in over 3 years

zencephalon/tactful_tokenizer

TactfulTokenizer uses a naive Bayes model trained on the Brown and WSJ corpora to provide high-quality sentence tokenization.


Popularity

44,080

80

13

3

Releases

Current version

0.0.5

4

2010-03-23

2014-04-27

Issues

0

8

8

Issue Closure Rate

100%

Pull Requests

Open Pull Requests

0

Closed Pull Requests

1

Merged Pull Requests

8

Pull Request Acceptance Rate

88%

Development

Primary Language

Ruby

Licenses

GPL-3

Average date of last 50 commits

2012-08-15

Reverse Dependencies

2

Dependencies

Development

rake

~> 10.3.1

~> 2.14.1

Project Readme

TactfulTokenizer


TactfulTokenizer is a Ruby library for high-quality sentence tokenization. It uses a naive Bayes statistical model and is based on Splitta, but adds support for ‘?’ and ‘!’ as well as primitive handling of XHTML markup. Better support for XHTML parsing is coming shortly.

It also supports tokenization of Unicode text.

Usage

require "tactful_tokenizer"
m = TactfulTokenizer::Model.new
m.tokenize_text("Here in the U.S. Senate we prefer to eat our friends. Is it easier that way? <em>Yes.</em> <em>Maybe</em>!")
#=> ["Here in the U.S. Senate we prefer to eat our friends.", "Is it easier that way?", "<em>Yes.</em>", "<em>Maybe</em>!"]

The input text is expected to consist of paragraphs delimited by line breaks.
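
If your source text is hard-wrapped, with paragraphs separated by blank lines, it may help to unwrap it first. The sketch below assumes that "delimited by line breaks" means one paragraph per line; the helper name one_paragraph_per_line is hypothetical and is not part of tactful_tokenizer.

```ruby
# Hypothetical helper (not part of tactful_tokenizer): collapse hard-wrapped,
# blank-line-separated paragraphs into one paragraph per line.
# Assumption: the tokenizer wants each paragraph on its own line.
def one_paragraph_per_line(raw)
  raw.split(/(?:\r?\n[ \t]*){2,}/)        # split on one or more blank lines
     .map { |para| para.split.join(" ") } # unwrap lines, squeeze whitespace
     .reject(&:empty?)                    # drop empty leading/trailing chunks
     .join("\n")
end

wrapped = "Here in the U.S. Senate we prefer\nto eat our friends.\n\nIs it easier that way?\n"
text = one_paragraph_per_line(wrapped)
puts text
```

The resulting string can then be passed to tokenize_text on a TactfulTokenizer::Model instance, as in the usage example above.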

Installation

gem install tactful_tokenizer

Author

Copyright © 2010 Matthew Bunday. All rights reserved. Released under the GNU GPL v3.