Wraps the saxon-xslt wrapper for Saxon 9 HE, providing a simple (one function) interface for running a schematron against an XML string or file
 Dependencies

Development

~> 10.4

Runtime

~> 1.6
 Project Readme

Schematronium

Schematronium is a gem providing:

  1. A single-object, single-function API for compiling a schematron script, and running it over an XML file, returning a Nokogiri NodeSet of the resulting failed-asserts and successful-reports.
  2. A script (schematronium) to run a schematron over one or many XML files and return aggregate data in a TBD format. Mostly meant as an example of something you could do with it. It also shows how to turn off some parser features to prevent XXE vulnerabilities, which is VERY IMPORTANT if you are parsing XML you do not personally 100% control. Schematronium does NOT do this by default.

The goals of Schematronium are very similar to schematron-wrapper. The primary difference is that, where schematron-wrapper runs the saxon jar via backticks per file, Schematronium uses the jRuby-only saxon-xslt library to compile and run the schematron. This has the upshot of not incurring the penalty of JDK initialization per file, which tends to be a substantial cost savings over even a small number of files.
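
As a rough illustration of that compile-once, run-many pattern, here is a minimal sketch. The helper name check_files is hypothetical and not part of the gem; it assumes only that the checker responds to check(filename), as described in the API section.

```ruby
# Hypothetical helper (not part of Schematronium): run one already-compiled
# checker over many files, so the schematron is compiled only once.
# `checker` is any object that responds to #check(filename).
def check_files(checker, paths)
  paths.each_with_object({}) do |path, results|
    # Each call reuses the same compiled checker.
    results[path] = checker.check(path)
  end
end

# Intended usage under jRuby with the gem installed (file names are placeholders):
#   checker = Schematronium.new("schematron_filename.sch")
#   results = check_files(checker, Dir.glob("xml/*.xml"))
#   results.each { |path, nodes| puts "#{path}: #{nodes.size} result(s)" }
```

The returned Hash maps each path to whatever check returned for it, in the order the paths were given.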

Requirements

  • jRuby - Schematronium is tested with jRuby 9000, but may be suitable for use with earlier jRubies.
  • JDK requirement is essentially whatever your chosen jRuby demands.
  • saxon-xslt
  • Nokogiri

API

API docs are hosted here.

The API for Schematronium is very, very minimal.

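# Compile the schematron once
checker = Schematronium.new("schematron_filename.sch")

# Returns a Nokogiri NodeSet of failed-asserts and successful-reports
failed_assert_nodeset = checker.check(filename_or_IO_object_supporting_read)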

Processing the NodeSet into the report or output you desire is left as an exercise to the consumer.
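
For a starting point, one possible way to flatten the results is sketched below. The helper name summarize is hypothetical, and the sketch assumes the nodes are standard SVRL failed-assert and successful-report elements carrying a location attribute. It uses only name, [] and text, which Nokogiri nodes provide.

```ruby
# Hypothetical helper (not part of Schematronium): turn each result node
# into a plain Hash. Assumes SVRL-style nodes with a `location` attribute.
def summarize(nodes)
  nodes.map do |node|
    {
      kind: node.name,                            # e.g. "failed-assert"
      location: node['location'],                 # XPath to the context node
      message: node.text.strip.gsub(/\s+/, ' ')   # collapse runs of whitespace
    }
  end
end

# Intended usage under jRuby with the gem installed:
#   checker = Schematronium.new("schematron_filename.sch")
#   summarize(checker.check("some_file.xml")).each do |row|
#     puts "#{row[:kind]} at #{row[:location]}: #{row[:message]}"
#   end
```

Returning plain Hashes keeps the reporting step independent of Nokogiri, so the rows can be written out as CSV, JSON or plain text as needed.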

Known issues

Redundant parsing

Right now, the Saxon::XML::Document object returned by saxon-xslt is pretty opaque. In order to get a reasonable API on the returned results, Schematronium is just rendering the returned doc to a string, then re-parsing with Nokogiri and using its API to pull out the failed-asserts and successful-reports.

It's possible that someone who knew more about XDMDocument (and Java XML-handling in general) than your humble author might be able to dispense with the use of Nokogiri, and thus reduce dependencies and (probably) memory/execution time.