antlr4-native
Create a Ruby native extension from (almost) any ANTLR4 grammar.
What is this thing?
This gem generates native Ruby extensions from ANTLR grammars, enabling Ruby developers to generate parsers for numerous programming languages, file formats, etc.
Who needs this?
If you're a Ruby programmer who wants to parse and traverse source code written in a plethora of programming languages, antlr4-native might be able to help you. A number of community-developed ANTLR grammars are available in ANTLR's grammars-v4 repo. Grab one, then use antlr4-native to generate a bunch of Ruby-compatible C++ code from it. The C++ code can be compiled and used as a native extension.
Rather than use antlr4-native directly, consider using its sister project, the antlr-gemerator, which can generate a complete rubygem from an ANTLR grammar.
Code Generation
Here's how to generate a native extension for a given lexer and parser (Python in this case), defined in two .g4 files:
require 'antlr4-native'

generator = Antlr4Native::Generator.new(
  grammar_files: ['Python3Lexer.g4', 'Python3Parser.g4'],
  output_dir: 'ext',
  parser_root_method: 'file_input'
)

generator.generate
In the example above, the output directory is set to the standard Ruby native extensions directory, 'ext'. antlr4-native will generate code into ext/<name>, where <name> is the name of the parser as defined in the grammar file(s). In this case, Python3Parser.g4 contains:
parser grammar Python3Parser;
so antlr4-native will generate code into the ext/python3-parser directory.
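As a rough illustration of the naming rule above, the sketch below pulls the grammar name out of a .g4 declaration and turns it into a dashed directory name. The helpers `grammar_name` and `ext_dir_for` are hypothetical, and the regular expressions are assumptions made for this example; antlr4-native's own implementation may differ.

```ruby
# Hypothetical helpers for illustration; not part of antlr4-native's API.

# Extract the grammar name from a declaration such as
# "parser grammar Python3Parser;" or "grammar Foo;".
def grammar_name(g4_source)
  match = g4_source.match(/^\s*(?:(?:parser|lexer)\s+)?grammar\s+(\w+)\s*;/)
  match && match[1]
end

# Convert a CamelCase grammar name into a dashed directory name
# under the given output directory (assumed naming convention).
def ext_dir_for(name, output_dir: 'ext')
  dashed = name.gsub(/([a-z\d])([A-Z])/, '\1-\2').downcase
  File.join(output_dir, dashed)
end

source = "parser grammar Python3Parser;\noptions { tokenVocab=Python3Lexer; }\n"
name = grammar_name(source)
puts ext_dir_for(name) # prints "ext/python3-parser"
```

A quick check like this can help predict where generated files will land before running the generator.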
Finally, the parser_root_method option tells antlr4-native which context represents the root of the parse tree. This context functions as the starting point for visitors.
Using extensions in Ruby
Parsers provide several methods for parsing source code. Use Parser.parse to parse a string and Parser.parse_file to parse the contents of a file:
parser = Python3Parser::Parser.parse(File.read('path/to/file.py'))
# equivalent to:
parser = Python3Parser::Parser.parse_file('path/to/file.py')
Use the #visit method on an instance of Parser to make use of a visitor:
visitor = MyVisitor.new
parser.visit(visitor)
See the next section for more info regarding creating and using visitors.
Visitors
A visitor class is automatically created during code generation. Visitors are just classes with a bunch of special methods, each corresponding to a specific part of the source language's syntax. The methods are essentially callbacks that are triggered in-order as the parser walks over the parse tree. For example, here's a visitor with a method that will be called whenever the parser walks over a Python function definition:
class FuncDefVisitor < Python3Parser::Visitor
  def visit_func_def(ctx)
    puts ctx.NAME.text  # print the name of the function
    visit_children(ctx) # continue into the function's subtree
  end
end
Make sure to always call #visit_children at some point in your visit_* methods. If you don't, the subtree under the current context won't get visited.
Finally, if you override #initialize in your visitor subclasses, don't forget to call super. If you don't, you'll get a nice big segfault.
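The two rules above can be seen in a small pure-Ruby sketch. `Node` and `BaseVisitor` are stand-ins invented for this example so that it runs without a compiled extension; the real base class and contexts are backed by C++ and behave differently (for instance, forgetting super there crashes the process instead of raising).

```ruby
# Stand-ins for illustration only; not the generated classes.
Node = Struct.new(:kind, :name, :children)

class BaseVisitor
  def initialize
    @ready = true # stands in for the native setup done by the real base class
  end

  # Dispatch to visit_<kind> if defined, otherwise just descend.
  def visit(node)
    raise 'visitor not initialized (super was not called)' unless @ready
    handler = "visit_#{node.kind}"
    respond_to?(handler) ? send(handler, node) : visit_children(node)
  end

  def visit_children(node)
    node.children.each { |child| visit(child) }
  end
end

class FuncDefCollector < BaseVisitor
  attr_reader :names

  def initialize
    super() # keep the base-class setup intact
    @names = []
  end

  def visit_func_def(node)
    @names << node.name
    visit_children(node) # without this, nested definitions are skipped
  end
end

inner = Node.new(:func_def, 'inner', [])
outer = Node.new(:func_def, 'outer', [Node.new(:block, nil, [inner])])
tree  = Node.new(:file_input, nil, [outer])

collector = FuncDefCollector.new
collector.visit(tree)
p collector.names # prints ["outer", "inner"]
```

Removing the visit_children call from visit_func_def in this sketch leaves only "outer" in the result, which mirrors what happens to the subtree in a real visitor.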
Caveats
- Avoid retaining references to contexts, tokens, etc. anywhere in your Ruby code. Contexts (i.e. the ctx variables in the examples above) and other objects that are created by ANTLR's C++ runtime are automatically cleaned up without the Ruby interpreter's knowledge. You'll almost surely see a segfault if you retain a reference to one of these objects and try to use it after the call to Parser#visit.
- Due to an ANTLR limitation, parsers cannot be used in a multi-threaded environment, even if each parser instance is used entirely in the context of a single thread (i.e. parsers are not shared between threads). According to the ANTLR C++ developers, parsers should be threadsafe. Unfortunately, firsthand experience has proven otherwise. Your mileage may vary.
- The description of this gem says "(almost) any ANTLR4 grammar" because many grammars contain target-specific code. For example, the Python3 grammar referenced in the examples above contains inline Java code that the C++ compiler won't understand. You'll need to port any such code to C++ before you'll be able to compile and use the native extension.
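To illustrate the first caveat above, one defensive pattern is to copy whatever you need out of a context into plain Ruby values while the walk is still in progress. The sketch below simulates this with a hypothetical `FakeContext` class whose `release!` method mimics cleanup by the C++ runtime; real contexts have no such method, and using a stale one would crash the process instead of raising an exception.

```ruby
# Stand-in for a native context: usable during the walk, invalid afterwards.
# Hypothetical class for illustration; not part of any generated extension.
class FakeContext
  def initialize(text)
    @text = text
    @alive = true
  end

  def text
    raise 'context already freed' unless @alive
    @text
  end

  # Simulates the C++ runtime cleaning up the object.
  def release!
    @alive = false
  end
end

contexts = [FakeContext.new('foo'), FakeContext.new('bar')]

retained = [] # risky: holds the context objects themselves
copied   = [] # safer: holds plain Ruby strings

contexts.each do |ctx|
  retained << ctx
  copied << ctx.text.dup # copy the data out while the context is valid
end

contexts.each(&:release!) # simulates cleanup after Parser#visit returns

p copied # prints ["foo", "bar"]
begin
  retained.first.text
rescue RuntimeError => e
  puts "retained context failed: #{e.message}"
end
```

The copied strings remain usable after the simulated cleanup, while the retained contexts do not.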
System Requirements
- A Java runtime (version 1.6 or higher) is required to generate parsers, since ANTLR is a Java tool. The ANTLR .jar file is distributed inside the antlr4-native gem, so there's no need to download it separately. You can download a Java runtime here.
- Ruby >= 2.3.
- A C++ compiler (like gcc or clang) that supports C++14. If Ruby is working on your machine then you likely already have this.
License
Licensed under the MIT license. See LICENSE.txt for details.
Authors
- Cameron C. Dutro: http://github.com/camertron