Zenlish
What is Zenlish?
Zenlish = Zen + English
Zenlish will be a Controlled Natural Language based on English.
A Controlled Natural Language is a subset of a natural language (here, English) limited to specific problem domains.
What is the purpose of Zenlish?
The goal of this project is to implement a toolkit for a subset of the English language. With Zenlish, a Ruby application should be able to interact with its users in a language that is close enough to English.
Project status
The project is still in inception. Currently, zenlish is able to parse all
sentences from lessons 1-A up to 3-G from
Learn These Words First.
The parser copes with syntactic ambiguities by generating parse forests
instead of parse trees.
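To illustrate why an ambiguous sentence calls for a forest rather than a single tree, here is a minimal, self-contained sketch. It does not use the Zenlish grammar or parser: the toy grammar, the `count_parses` helper, and the CYK-style chart are all assumptions made for this example only.

```ruby
# Toy grammar in Chomsky normal form (hypothetical, not the Zenlish grammar).
BINARY_RULES = {
  'S'  => [%w[NP VP]],
  'VP' => [%w[V NP], %w[VP PP]],
  'NP' => [%w[Det N], %w[NP PP]],
  'PP' => [%w[P NP]]
}.freeze

LEXICAL_RULES = {
  'i'         => %w[NP],
  'saw'       => %w[V],
  'the'       => %w[Det],
  'man'       => %w[N],
  'telescope' => %w[N],
  'with'      => %w[P]
}.freeze

# Count the distinct parse trees of a token sequence with a CYK-style chart.
# chart[[start, length]] maps a grammar symbol to its number of derivations.
def count_parses(tokens, start_symbol = 'S')
  n = tokens.size
  chart = Hash.new { |hash, key| hash[key] = Hash.new(0) }

  # Spans of length 1 come straight from the lexical rules.
  tokens.each_with_index do |token, i|
    LEXICAL_RULES.fetch(token, []).each { |sym| chart[[i, 1]][sym] += 1 }
  end

  # Longer spans combine two adjacent shorter spans.
  (2..n).each do |len|
    (0..n - len).each do |start|
      (1...len).each do |split|
        left = chart[[start, split]]
        right = chart[[start + split, len - split]]
        BINARY_RULES.each do |lhs, bodies|
          bodies.each do |(first, second)|
            chart[[start, len]][lhs] += left[first] * right[second]
          end
        end
      end
    end
  end

  chart[[0, n]][start_symbol]
end

p count_parses(%w[i saw the man])                    # => 1
p count_parses(%w[i saw the man with the telescope]) # => 2
```

The second sentence has two readings (the telescope belongs either to the seeing or to the man). A parse forest stores both readings in one shared structure instead of forcing the parser to pick one.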
The intent is to deliver gem versions in small increments.
Zenlish as a library (gem)
Over time, the zenlish gem will contain:
- A tokenizer (tagging, lemmatizer) [TODO]
- A lexicon [STARTED]
- A context-free grammar [STARTED]
- A parser [STARTED]
- Feature unification (for number, gender agreement, ...) [STARTED]
- A simplified ontology [TODO]
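As a rough idea of what feature unification means for agreement checks, here is a minimal sketch in plain Ruby. It is not the Zenlish implementation: the `unify` helper and the flat hash representation of features are assumptions made for this example.

```ruby
# Unify two flat feature structures (hypothetical representation: plain hashes).
# Returns the merged features, or nil when a shared feature has clashing values.
def unify(feats_a, feats_b)
  feats_a.merge(feats_b) do |_name, value_a, value_b|
    return nil unless value_a == value_b

    value_a
  end
end

determiner = { number: :singular }
noun_ok    = { number: :singular, person: :third }
noun_clash = { number: :plural, person: :third }

p unify(determiner, noun_ok)    # => {:number=>:singular, :person=>:third} (Ruby 3.4+ prints {number: :singular, person: :third})
p unify(determiner, noun_clash) # => nil
```

A `nil` result signals an agreement error, for instance a singular determiner combined with a plural noun.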
Some project metrics (v. 0.2.00)
Metric | Value |
---|---|
Number of lemmas in dictionary | 141 |
Coverage 100 commonest English words | 61% |
Number of production rules in grammar | 185 |
Number of lessons covered | 23 |
Number of sentences in spec files | 352 |
Installation...
...with RubyGems
Install the gem yourself as:
$ gem install zenlish
...with Bundler
Add this line to your application's Gemfile:
gem 'zenlish'
And then execute:
$ bundle
Some code snippets
Interacting with the dictionary:
require 'zenlish'
# Retrieving a "word" (more precisely, a lexeme) from the dictionary.
lexeme = Zenlish::Lang::Dictionary.get_lexeme('move')
# What is the Ruby class of a lexeme?
p lexeme.class # => Zenlish::Lex::Lexeme
# What is the word class of verb 'move'?
p lexeme.wclass.class # => Zenlish::WClasses::RegularVerb
# Here is some Zenlish text to analyze:
some_text = 'one person can move to the same place.'
p some_text
some_text.scan(/(?:\w+)|[\.,:"]/).each do |entry|
  lexeme = Zenlish::Lang::Dictionary.get_lexeme(entry)
  p lexeme.wclass.class
end
# Loop result should be:
# Zenlish::WClasses::Cardinal
# Zenlish::WClasses::CommonNoun
# Zenlish::WClasses::ModalVerbCan
# Zenlish::WClasses::RegularVerb
# Zenlish::WClasses::Preposition
# Zenlish::WClasses::DefiniteArticle
# Zenlish::WClasses::Adjective
# Zenlish::WClasses::CommonNoun
# Rley::Syntax::Terminal
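The snippet above splits the text with `String#scan` before looking each entry up. The following sketch isolates that step so the resulting tokens are visible. It runs without the gem; the `tokenize` helper name is an assumption made for this example, while the regular expression is the one used above.

```ruby
# Same pattern as in the snippet above: a run of word characters,
# or one of the punctuation marks . , : "
TOKEN_PATTERN = /(?:\w+)|[\.,:"]/

# Hypothetical helper: split a text into word and punctuation tokens.
def tokenize(text)
  text.scan(TOKEN_PATTERN)
end

p tokenize('one person can move to the same place.')
# => ["one", "person", "can", "move", "to", "the", "same", "place", "."]
```

Note that characters outside the pattern (such as `?` or `!`) are silently skipped by this approach.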
Demo of lexeme inflections
# Demo inflection (aka declension, conjugation)
require 'zenlish'
# The Zenlish dictionary is more than a list of words...
dict = Zenlish::Lang::Dictionary
# What are the spellings of a given common noun?
noun_body = dict.get_lexeme('body')
p noun_body.all_inflections # => ["body", "bodies"]
# What are the word forms of a personal pronoun (3rd person)?
p_3rd_pn = dict.get_lexeme('it')
p p_3rd_pn.all_inflections # => ["she", "her", "he", "him", "it", "they", "them"]
# What are the distinct forms of a regular verb?
vb_touch = dict.get_lexeme('touch')
p vb_touch.all_inflections # => ["touch", "touching", "touched", "touches"]
# What are the forms of the (highly) irregular verb be?
vb_be = dict.get_lexeme('be', Zenlish::WClasses::IrregularVerbBe)
p vb_be.all_inflections # => ["am", "being", "was", "been", "are", "were", "is"]
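For regular words, forms like the ones listed above can be derived by simple spelling rules. The sketch below is a rough approximation in plain Ruby, not the Zenlish inflection engine: the helper names `add_s` and `regular_verb_forms` are assumptions made for this example, and the rules deliberately ignore consonant doubling and irregular words.

```ruby
# Append the -s ending (noun plural or 3rd person singular verb form),
# adjusting the spelling for a final consonant + y or a sibilant.
def add_s(word)
  case word
  when /[^aeiou]y\z/         then word.sub(/y\z/, 'ies')
  when /(?:s|x|z|ch|sh)\z/   then "#{word}es"
  else "#{word}s"
  end
end

# Forms of a regular verb: base, -ing, -ed, -s.
# A single final "e" is dropped before -ing and -ed.
def regular_verb_forms(base)
  stem = base.sub(/e\z/, '')
  [base, "#{stem}ing", "#{stem}ed", add_s(base)]
end

p add_s('body')                # => "bodies"
p regular_verb_forms('touch')  # => ["touch", "touching", "touched", "touches"]
p regular_verb_forms('move')   # => ["move", "moving", "moved", "moves"]
```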
More to come...
Principles behind the Zenlish language
Minimalism
The name of the language is a combination of 'Zen' and 'English'.
It reflects a desire to make Zenlish a simple language:
- A simplified syntax,
- A limited lexicon, with priority given to the most commonly used words.
Expressiveness
Zenlish should be rich enough to express ideas and facts in a fluid way (as opposed to a contrived, artificial one). Litmus test: a Zenlish text should be easy to read for any reader of English.
Roadmap
Here is a tentative roadmap:
A) Ability to parse sentences from Learn These Words First
STARTED. 24% complete
This website advocates the idea of a multi-layered dictionary.
At the core, there are about 300 essential words.
The choice of these words is inspired by the semantic primitives of NSM
(Natural Semantic Metalanguage).
The essential words are introduced in twelve lessons. Each lesson puts the words
in exemplar sentences and pictures.
The milestone sub-goals are:
- To inject the 300 core words into the Zenlish lexicon,
- To make Zenlish able to parse all the example sentences.
B) Associate lexical features to terms in lexicon
STARTED. The sub-goals are:
- To enrich the lexicon entries with lexical and syntactical features.
- Zenlish should be able to derive the declensions of nouns and the conjugations of verbs,
- Zenlish should also detect agreement errors,
- Ideally, Zenlish should have a lemmatizer
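One simple way to approach lemmatization is to invert a table of known inflections, so that each word form points back to its lemma. The sketch below shows the idea in plain Ruby. It is not part of the Zenlish API: the `INFLECTIONS` table (filled with forms quoted in the snippets above), the `FORM_TO_LEMMAS` index, and the `lemmas_for` helper are assumptions made for this example.

```ruby
# Hypothetical inflection table, using forms listed earlier in this README.
INFLECTIONS = {
  'body'  => %w[body bodies],
  'touch' => %w[touch touching touched touches],
  'be'    => %w[am being was been are were is]
}.freeze

# Reverse index: word form => lemmas that can produce it.
FORM_TO_LEMMAS = INFLECTIONS.each_with_object({}) do |(lemma, forms), index|
  forms.each { |form| (index[form] ||= []) << lemma }
end.freeze

# Return the candidate lemmas of a word form (empty array when unknown).
def lemmas_for(form)
  FORM_TO_LEMMAS.fetch(form.downcase, [])
end

p lemmas_for('bodies')  # => ["body"]
p lemmas_for('Were')    # => ["be"]
p lemmas_for('xyz')     # => []
```

The lookup returns an array because one spelling may belong to several lexemes, which is another source of ambiguity a parser has to handle.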
C) Enrich lexicon entries with semantic features and relationships
The sub-goals are:
- To enrich the lexicon entries with lexical and syntactical features.
- Zenlish should be able to derive the declensions of nouns, conjugation of verbs,
- Also Zenlish should detect agreement errors
D) Build a generic ontology and map Zenlish text to it.
The sub-goals are:
- To have a simplified ontology that covers the concepts covered in the lesson sentences.
- Ideally, Zenlish should be able to answer queries related to the lesson sentences.
E) Capability to parse a complete book
A good candidate book is "The Edge of the Sky" by Roberto Trotta (ISBN 978-0-465-04471-9 : hardcover, ISBN 978-0-465-04490-0 : ebook).
Professor Trotta challenged himself to write a book on cosmology using only the 1000 most used words. More details here.
In order to achieve this goal, Zenlish should:
- Incorporate the 1000 words in its lexicon
- Have a grammar that allows the parsing of the sentences in the book.
F) Capability to interpret the meaning of a complete book
This is probably far-fetched, but it would be nice to send queries to Zenlish to check whether it has some understanding of the text it reads (i.e. whether it has built a semantic representation).
Usage
TODO: Write usage instructions here
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/famished-tiger/Zenlish. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.
License
The gem is available as open source under the terms of the MIT License.
Code of Conduct
Everyone interacting in the Zenlish project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.