NLP Pure
Natural language processing algorithms implemented in pure Ruby with minimal dependencies.
NOTE: this is not affiliated with, endorsed by, or in any way connected with Pure NLP, a trademark of John La Valle.
This project aims to provide functionality similar to Treat, open-nlp, and stanford-core-nlp but with fewer dependencies. The code is tested against English language but the algorithm implementations aim to be flexible for other languages.
Table of Contents
- Installation
- Usage
- Word Segmentation
- Sentence Segmentation
- Supported Ruby Versions
- Versioning
- Contributing
- License
- See Also
Installation
Add this line to your application’s Gemfile:
gem 'nlp-pure'
And then execute:
$ bundle
Or install it yourself as:
$ gem install nlp-pure
Usage
Simply require a library file and start using its interfaces! To preserve modularity and a small installation footprint, classes and modules are not recursively loaded up front.
Word Segmentation
$ bundle exec irb
irb(main):001:0> require 'nlp_pure/segmenting/default_word'
=> true
irb(main):002:0> NlpPure::Segmenting::DefaultWord.parse 'The quick brown fox jumps over the lazy dog.'
=> ["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog."]
irb(main):003:0> NlpPure::Segmenting::DefaultWord.parse 'The New York-based company hired new staff.'
=> ["The", "New", "York", "based", "company", "hired", "new", "staff."]
irb(main):004:0> NlpPure::Segmenting::DefaultWord.parse 'The U.S.A. is a member of NATO.'
=> ["The", "U.S.A.", "is", "a", "member", "of", "NATO."]
irb(main):005:0> NlpPure::Segmenting::DefaultWord.parse "Mary had a little lamb,\nHis fleece was white as snow,\nAnd everywhere that Mary went,\nThe lamb was sure to go."
=> ["Mary", "had", "a", "little", "lamb,", "His", "fleece", "was", "white", "as", "snow,", "And", "everywhere", "that", "Mary", "went,", "The", "lamb", "was", "sure", "to", "go."]
Sentence Segmentation
M017-PDX:nlp-pure rp0616$ bundle exec irb
irb(main):001:0> require 'nlp_pure/segmenting/default_sentence'
=> true
irb(main):002:0> NlpPure::Segmenting::DefaultSentence.parse 'The U.S.A. is a member of NATO.'
=> ["The U.S.A. is a member of NATO."]
irb(main):003:0> NlpPure::Segmenting::DefaultSentence.parse 'Mary had a little lamb. The lamb\U+FFE2s fleece was white as snow. Everywhere that Mary went, the lamb was sure to go.'
=> ["Mary had a little lamb.", "The lambs fleece was white as snow.", "Everywhere that Mary went, the lamb was sure to go."]
irb(main):004:0> NlpPure::Segmenting::DefaultSentence.parse 'I am excited! Today is Friday.'
=> ["I am excited!", "Today is Friday."]
Supported Ruby Versions
This library aims to support and is tested against the following Ruby implementations:
If something doesn't work on one of these interpreters, it's a bug.
This library may inadvertently work (or seem to work) on other Ruby implementations, however support will only be provided for the versions listed above.
Versioning
This library aims to adhere to Semantic Versioning 2.0.0. Violations of this scheme should be reported as bugs. Specifically, if a minor or patch version is released that breaks backward compatibility, that version should be immediately yanked and/or a new version should be immediately released that restores compatibility. Breaking changes to the public API will only be introduced with new major versions. As a result of this policy, you can (and should) specify a dependency on this gem using the Pessimistic Version Constraint with two digits of precision. For example:
spec.add_dependency 'nlp-pure', '~> 0.1'
See Also
Search “nlp” at ruby-toolbox.com
- APIs
- Bindings and Toolkits
- Classification
- N-Grams
- Specific Languages
- Polish
- Stopwords
- Tokenization
- Word Counters