Summary
Babel Bridge let's you generate parsers 100% in Ruby code. It is a memoizing Parsing Expression Grammar (PEG) generator like Treetop, but it doesn't require special file-types or new syntax. Overall focus is on simplicity and usability over performance.
Goals
- Allow expression 100% in ruby
- Productivity through Simplicity and Understandability first
- Performance second
Example
require "babel_bridge"
class MyParser < BabelBridge::Parser
# foo rule: match "foo" optionally followed by the :bar rule
rule :foo, "foo", :bar?
# bar rule: match "bar"
rule :bar, "bar"
end
# create one more instances of your parser
parser = MyParser.new
parser.parse "foo" # matches "foo"
# => FooNode1 > "foo"
parser.parse "foobar" # matches "foobar"
# => FooNode1
# "foo"
# BarNode1 > "bar"
parser.parse "fribar" # fails to match
# => nil
parser.parse "foobarbar" # fails to match entire input
# => nil
More elaborate examples:
Features
# returns the BabelBridge::Rule instance for that rule
rule = MyParser[:foo]
# => rule :foo, "foo", :bar?
# nice human-readable view of the rule with extra info:
rule.to_s
# rule :foo, node_class: MyParser::FooNode
# variant_class: MyParser::FooNode1, pattern: "foo", :bar?
# returns the code necessary for generating the rule and all its variants
# (minus any class_eval code)
rule.inspect
# => rule :foo, "foo", :bar?
# returns the Node class for a rule
MyParser.node_class(:foo)
# => MyParser::FooNode
MyParser.node_class(:foo) do
# class_eval inside the rule's Node-class
end
# parses Text starting with the MyParser.root_rule
# The root_rule is defined automatically by the first rule defined, but can be set by:
# MyParser.root_rule=v
# where v is the symbol name of the rule or the actual rule object from MyParser[rule]
text = "foobar"
parser.parse(text)
# do a one-time parse with :bar set as the root-rule
text = "bar"
parser.parse(text, :rule => :bar)
# relax requirement to match entire input
parser.parse "foobar and then something", :partial_match => true
# parse failure
parser.parse "foo is not immediately followed by bar"
# human readable parser failure info
puts parser.parser_failure_info
Parser failure info output:
Parsing error at line 1 column 4 offset 3
Source:
...
foo<HERE> is not immediately followed by bar
...
Parser did not match entire input.
Parse path at failure:
FooNode1
Expecting:
"bar" BarNode1
NOTE: This is an evolving feature, this output is as-of 0.5.1 and may not match the current version.
Defining Rules
Inside the parser class, a rule is defined as follows:
class MyParser < BabelBridge::Parser
rule :rule_name, pattern
end
Where:
- :rule_name is a symbol
- pattern see Patterns below
You can also add new rules outside the class definition by:
MyParser.rule :rule_name, pattern
Patterns
Patterns are a list of pattern elements, matched in order:
Example:
rule :my_rule, "match", "this", "in", "order" # matches "matchthisinorder"
Pattern Elements
Pattern elements are basic-pattern-element or extended-pattern-element ( expressed as a hash). Internally, they are "compiled" into instances of PatternElement with optimized lambda functions for parsing.
Basic Pattern Elements (basic_element)
:my_rule # matches the Rule named :my_rule
:my_rule? # optional: optionally matches Rule :my_rule
:my_rule! # negative: success only if it DOESN'T match Rule :my_rule
"string" # matches the string exactly
/regex/ # matches the regex exactly
Advanced Pattern Elements
# success if basic_element could be matched, but the input is not consumed
could.match(pattern_element)
# negative (two equivelent methods)
dont.match(pattern_element)
match!(pattern_element)
# optional (two equivelent methods)
optionally.match(pattern_element)
match?(pattern_element)
# match 1 or more
many(pattern_element)
# match 1 or more of one basic_element delimited by another basic_element)
many(pattern_element, delimiter_pattern_element)
# match 0 or more
many?(pattern_element)
# An array of patterns tells BB to match those patterns in order ("and" matching)
[pattern_element_a, pattern_element_b, pattern_element_c, ...]
# match any one of the listed patterns ("or" matching)
any(pattern_element_a, pattern_element_b, pattern_element_c, ...)
# optionally match any of the patterns
any?(pattern_element_a, pattern_element_b, pattern_element_c, ...)
# don't match any of the patterns
any!(pattern_element_a, pattern_element_b, pattern_element_c, ...)
Custom Pattern Element Parser
Custom pattern elements are not generally needed, but for certain patterns, particularly context sensative ones, we provide a way to do it.
class MyParser < BabelBridge::Parser
# custom parser to match an all upper-case word followed by any number of characters before that word is repeated
rule :foo, (custom_parser do |parent_node|
offset = parent_node.next
src = parent_node.src
# Note, the \A anchors the search at the beginning of the string
if src[offset..-1].index(/\A[A-Z]+/) == 0
endpattern=$~.to_s
if i = src.index(endpattern, offset + endpattern.length)
range = offset..(i + endpattern.length)
BabelBridge::TerminalNode.new(parent_node, range, "endpattern")
end
end
end)
end
parser = parser
parser.parse "END this is in the middle END"
# => FooNode1 > "END this is in the middle END"
parser.parse "DRUID this is in the middle DRUID"
# => FooNode1 > "DRUID this is in the middle DRUID"
parser.parse "DRUID this is in the middle DRUI"
# => nil
Structure
- Each Rule defines a subclass of Node
- Each RuleVariant defines a subclass of the parent Rule's node-class
Therefor you can easily define code to be shared across all variants as well as define code specific to one variant.