saxon-rb – An idiomatic Ruby wrapper for Saxon
saxon-rb aims to be a fully-featured idiomatic wrapper for the Saxon XML processing and transformation library. Built after several years' experience with writing, using, and maintaining the saxon-xslt XSLT-focussed wrapper, saxon-rb aims to keep that ease-of-use for the most common cases, while making the less-common cases easy as well.
Saxon provides a massive amount of valuable functionality beyond simple XSLT compilation and invocation, and a lot of that is very hard to use from Ruby. The facilities provided by it, and by the XSLT 2 and 3 specs, rely heavily on the XDM values and types system, which saxon-rb makes easy to work with.
Parameter creation and passing are richer and more expressive; results from XSLT or XPath that aren't just result trees can be worked with directly in Ruby.
You can find Saxon HE at http://saxon.sourceforge.net/ and Saxonica at http://www.saxonica.com/
Saxon HE is (c) Michael H. Kay and released under the Mozilla MPL 1.0 (http://www.mozilla.org/MPL/1.0/)
Installation
Add this line to your application's Gemfile:
gem 'saxon-rb'
And then execute:
$ bundle
Or install it yourself as:
$ gem install saxon-rb
Simple usage
Parse an XML document
Using a default document builder from the default processor:
document_node = Saxon::Processor.create.document_builder.build(Saxon::Source.from_path('/path/to/your.xml'))
Or
document_node = Saxon.XML('/path/to/your.xml')
Transform an XML document with XSLT
transformer = Saxon::Processor.create.xslt_compiler.compile(Saxon::Source.from_path('/path/to/your.xsl'))
# Or
transformer = Saxon.XSLT('/path/to/your.xsl')
# Apply templates against a document
result_1 = transformer.apply_templates(document_node)
# Call a template without a context item to process
result_2 = transformer.call_template('main-template')
Run XPath queries against an XML document
processor = Saxon::Processor.create
# A default compiler has no namespace or variable declarations, so use a plain comparison here
xpath = processor.xpath_compiler.compile('//element[@attr = "value"]')
matches = xpath.evaluate(document_node)
Migrating from saxon-xslt (or Nokogiri)
saxon-xslt wrapped Saxon and provided a Nokogiri-esque API. Nokogiri is built on XSLT 1 processors, and its APIs support XSLT 1 features, but won't allow XSLT 2/3 features (like setting initial tunnel parameters, or starting processing by calling a named template or a function). The main API for invoking XSLT in saxon-rb needs to be different from Nokogiri's so that full use of XSLT 2/3 features is possible.
By default, the original saxon-xslt API (on Saxon::XSLT::Stylesheet) is not available. If you need those methods, you can load the legacy API by requiring saxon/nokogiri.
That gives you back the #transform, #apply_to, and #serialize methods on the object you get back after compiling an XSLT: Saxon::XSLT::Executable in saxon-rb. They work the same way, and you should be able to drop in saxon-rb as a replacement for XSLT processing.
require 'saxon-rb'
require 'saxon/nokogiri'
xslt = Saxon.XSLT('/path/to/my.xsl')
xslt.apply_to(Saxon.XML('/path/to/my.xml')) #=> "<result-xml/>"
Usage
XSLT
Using XSLT involves creating a compiler, compiling an XSLT document, and then using that compiled document to transform something.
Constructing a Compiler
The simplest way is to call #xslt_compiler on a Saxon::Processor instance.
processor = Saxon::Processor.create
# Simplest, default options
compiler = processor.xslt_compiler
To set compile-time options or declare static compile-time parameters, pass a block to the method using the DSL syntax (see the DSL RDoc for complete details):
compiler = processor.xslt_compiler {
  static_parameters 'param' => 'value'
  default_collation 'https://www.w3.org/2005/xpath-functions/collation/html-ascii-case-insensitive/'
}
The static context for a Compiler cannot be changed; you must create a new one with the context you want. We make it very simple to create a new Compiler based on an existing one. Declaring a parameter again overwrites the value.
new_compiler = compiler.create {
  static_parameters 'param' => 'new value'
}
new_compiler.default_collation #=> "https://www.w3.org/2005/xpath-functions/collation/html-ascii-case-insensitive/"
If you want to remove a value, you need to start from scratch. You can, of course, extract any data you want from a compiler instance separately and use that to create a new one.
params = compiler.static_parameters
new_compiler = processor.xslt_compiler {
  static_parameters params
}
new_compiler.default_collation #=> nil
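The derive-rather-than-mutate pattern described above can be sketched in plain Ruby. This is only an illustration of the idea, not saxon-rb's implementation: SketchCompiler, its keyword arguments, and the example collation URI are all hypothetical names invented for this sketch.

```ruby
# Illustrative only: SketchCompiler is a hypothetical stand-in, not part of saxon-rb.
class SketchCompiler
  attr_reader :static_parameters, :default_collation

  def initialize(static_parameters: {}, default_collation: nil)
    # Freeze the state so an instance can never be changed after creation.
    @static_parameters = static_parameters.dup.freeze
    @default_collation = default_collation
    freeze
  end

  # Derive a new compiler: inherited values are kept, re-declared ones win.
  def create(static_parameters: {}, default_collation: @default_collation)
    self.class.new(
      static_parameters: @static_parameters.merge(static_parameters),
      default_collation: default_collation
    )
  end
end

base = SketchCompiler.new(
  static_parameters: { 'param' => 'value' },
  default_collation: 'http://example.org/collation' # made-up URI
)
derived = base.create(static_parameters: { 'param' => 'new value' })

puts derived.static_parameters['param'] # the re-declared value
puts base.static_parameters['param']    # the original is untouched
```

Because a derived object can only add to or overwrite what it inherits, dropping a value means building a fresh object, which matches the "start from scratch" advice above.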
Compiling an XSLT stylesheet
Once you have a compiler, call #compile and pass in a {Saxon::Source} or an existing {Saxon::XDM::Node}. Parameters and other run-time configuration options can be set using a block in the same way as creating a compiler. You'll be returned a {Saxon::XSLT::Executable}.
source = Saxon::Source.create('my.xsl')
xslt = compiler.compile(source) {
  initial_template_parameters 'param' => 'other value'
}
You can also pass in (or override) parameters at stylesheet execution time, but if you'll be executing the same stylesheet against many documents with the same initial parameters then setting them at compile time is simpler.
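As a rough mental model of how execution-time values relate to compile-time ones, the sketch below merges two plain hashes so that the later value wins for each parameter name. It assumes per-name overriding and uses no saxon-rb objects; the 'depth' parameter and the variable names are made up for illustration, and the library itself does the real resolution.

```ruby
# Hypothetical illustration of precedence only; saxon-rb performs the real resolution.
compile_time = {
  initial_template_parameters: { 'param' => 'other value', 'depth' => '1' }
}
execution_time = {
  initial_template_parameters: { 'param' => 'run-time value' }
}

# Merge each kind of parameter separately so execution-time values win per name.
effective = compile_time.merge(execution_time) do |_kind, defaults, overrides|
  defaults.merge(overrides)
end

p effective[:initial_template_parameters]
```

Under this model, parameters that are not mentioned at execution time keep their compile-time values.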
Executing an XSLT stylesheet
Once you have a compiled stylesheet, it can be executed against a source document in a variety of ways.
First, you can use the traditional apply-templates method, which was the only way in XSLT 1.
input = Saxon::Source.create('input.xml')
result = xslt.apply_templates(input)
Next, you can call a specific named template (new in XSLT 2).
result = xslt.call_template('template-name')
Note that there's no input document here. If your XSLT needs a global context item set when you invoke it via a named template, then you can do that, too:
input = processor.XML('input.xml')
result = xslt.call_template('template-name', {
  global_context_item: input
})
Global and initial template parameters can be set at compiler creation time, compile time, or execution time. See Setting parameters for details.
To serialize the document you can, of course, just call #to_s on the result:
result = xslt.apply_templates(input)
puts result.to_s #=> '<?xml version="1.0"...'
You can also serialize directly to a file path or to any IO instance.
result = xslt.apply_templates(input)
result.serialize('/path/to/output.xml')
result_2 = xslt.apply_templates(input)
result_2.serialize($stderr)
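If you want the same "path or IO" flexibility in your own helper code, one common Ruby approach is duck typing on #write. The sketch below is plain Ruby and says nothing about how saxon-rb implements serialization; write_output is a hypothetical helper name.

```ruby
require 'stringio'
require 'tempfile'

# Hypothetical helper, not saxon-rb code: write text to a path or an IO-like object.
def write_output(text, target)
  if target.respond_to?(:write)
    target.write(text)            # already IO-like (File, $stderr, StringIO, ...)
  else
    File.write(target.to_s, text) # treat anything else as a file path
  end
  nil
end

# Writing to an in-memory IO
buffer = StringIO.new
write_output('<result/>', buffer)

# Writing to a file path
file = Tempfile.new(['output', '.xml'])
file.close
write_output('<result/>', file.path)
```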
You can override serialization options that were set by <xsl:output/> in your XSLT:
result = xslt.apply_templates(input)
result.serialize('/path/to/output.xml') {
  output_property[:indent] = 'yes'
}
You can also obtain the result of the transform as an XDM Value:
result = xslt.apply_templates(input)
result.xdm_value #=> #<Saxon::XDM::Node...>
You can also send the result to an instance of any class implementing Saxon's net.sf.saxon.s9api.Destination interface:
dom_document = javax.xml.parsers.DocumentBuilderFactory.newInstance.newDocumentBuilder.newDocument
destination = Saxon::S9API::DOMDestination.new(dom_document)
result = xslt.apply_templates(input).to_destination(destination)
Setting parameters
There are four kinds of parameters you can set. Static parameters are set at stylesheet compile time and cannot be changed after compilation. Global parameters, defined by top-level <xsl:param/> elements, are available throughout an XSLT, and they can be set when the compiled XSLT is run. The other two kinds relate to parameters passed to the first template run (either the first template matched when called with #apply_templates, or the named template called with #call_template). Initial template parameters are essentially implied <xsl:with-param tunnel="no"> elements. Initial template tunnel parameters are implied <xsl:with-param tunnel="yes"> elements.
# At compile time
xslt = compiler.compile(source) {
  static_parameters 'static-param' => 'static value'
  global_parameters 'param' => 'global value'
  initial_template_parameters 'param' => 'other value'
  initial_template_tunnel_parameters 'param' => 'tunnel value'
}
# At execution time
xslt.apply_templates(input, {
  global_parameters: {'param' => 'global value'},
  initial_template_parameters: {'param' => 'other value'},
  initial_template_tunnel_parameters: {'param' => 'tunnel value'}
})
Multiple parameters can be set:
# At compile time
xslt = compiler.compile(source) {
  global_parameters 'param-1' => 'a', 'param-2' => 'b'
}
# At execution time
xslt.apply_templates(input, {
  global_parameters: {'param-1' => 'a', 'param-2' => 'b'}
})
Parameter names in XSLT are QNames, and values are XDM Values. saxon-rb will convert Ruby values (see {Saxon::QName.resolve} and {Saxon::XDM.Value}). You can also use explicit {Saxon::QName} or XDM values:
compiler.compile(source) {
  global_parameters Saxon::QName.clark('{http://example.org/#ns}name') => Saxon::XDM.Value(1)
}
If you need to use parameter names which use a namespace prefix, you must use an explicit {Saxon::QName} to refer to them.
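Clark notation, as used in the example above, writes a QName as the namespace URI in braces followed by the local name. To make that shape concrete, here is a small plain-Ruby sketch that splits such a string. The split_clark helper and CLARK_PATTERN constant are hypothetical names for illustration; in real code {Saxon::QName} does this work for you.

```ruby
# Hypothetical helper for illustration; saxon-rb's Saxon::QName does the real work.
CLARK_PATTERN = /\A\{([^}]*)\}(.+)\z/

# Split a Clark-notation name into [namespace_uri, local_name].
# Names without a {uri} part are returned with a nil namespace.
def split_clark(name)
  match = CLARK_PATTERN.match(name)
  match ? [match[1], match[2]] : [nil, name]
end

p split_clark('{http://example.org/#ns}name')
p split_clark('name')
```

Because the URI is spelled out in full, a Clark name needs no prefix bindings, which is why it is convenient for namespaced parameter names.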
XPath
Using an XPath involves creating a compiler, compiling an XPath into an executable, and then running that XPath executable against an XDM node.
To use prefixed QNames in your XPaths, like +/ns:name/+, you need to declare prefix/namespace URI bindings when you create a compiler.
It's also possible to make use of variables in your XPaths by declaring them at the compiler creation stage, and then passing in values for them at XPath run time.
processor = Saxon::Processor.create
xpath = processor.xpath_compiler {
  namespace a: 'http://example.org/a'
  variable 'a:var', 'xs:string'
}.compile('//a:element[@attr = $a:var]')
matches = xpath.evaluate(document_node, {
  'a:var' => 'the value'
}) #=> Saxon::XDM::Value
The {XPath::Executable#evaluate} method returns an XDM Value containing the result sequence. A result sequence with multiple items is returned as a {Saxon::XDM::Value}. A single-item sequence is returned as the appropriate item instance: a {Saxon::XDM::Node} or a {Saxon::XDM::AtomicValue}.
You can also use {XPath::Executable#as_enum} to get a lazy enumerator over the result.
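If lazy enumerators are unfamiliar, the plain-Ruby sketch below shows the general behaviour: items are only processed when something downstream asks for them. It uses an ordinary array of strings as a stand-in for a result sequence and involves no saxon-rb objects, so treat it as background on Ruby's Enumerator::Lazy rather than a description of the library's internals.

```ruby
# Plain Ruby stand-in for a result sequence; no saxon-rb objects are involved.
visited = []
items = %w[first second third fourth].lazy.map do |item|
  visited << item # record which items were actually pulled
  item.upcase
end

# Only as many items as requested are pulled through the pipeline.
first_two = items.first(2)
p first_two
p visited
```

This matters most for large result sequences, where stopping early avoids work on items you never look at.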
Using your Saxon PE license and .jars instead of the bundled Saxon HE
Saxon 9.9 HE is bundled with the gem. To use Saxon PE or EE (the commercial versions) you need to make the .jars available, and then create a licensed Saxon::Configuration object. Making the .jars available is simply a matter of adding them to the CLASS_PATH. The version of Saxon downloaded directly provides several .jar files. We provide a Saxon::Loader method for adding the .jars within the directory correctly. Saxon is distributed through Maven as a single .jar, which you can just add to the LOAD_PATH/CLASS_PATH. If you're adding to the CLASS_PATH directly, or calling Saxon::Loader.load!, then you need to do it before you try to use the library.
Loading a Saxon PE you downloaded directly from Saxonica
require 'saxon-rb'
Saxon::Loader.load!('/path/to/SaxonPE9-9-1-2J') # The folder that contains the .jars, like $SAXON_HOME
config = Saxon::Configuration.create_licensed('/path/to/saxon.lic')
processor = Saxon::Processor.create(config)
processor.xslt_compiler...
Loading a Saxon PE installed via Maven (e.g. with JBundler)
require 'jbundler'
require 'saxon-rb'
config = Saxon::Configuration.create_licensed('/path/to/saxon.lic')
processor = Saxon::Processor.create(config)
...
See https://github.com/mkristian/jbundler for more on loading Java deps from Maven.
Development
After checking out the repo, run bin/setup to install dependencies. Then, run rake spec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.
To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/fidothe/saxon-rb. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.
License
The gem is available as open source under the terms of the MIT License.
Code of Conduct
Everyone interacting in the Saxon-rb project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.