srl_ruby
Welcome to the first Ruby implementation of a Simple Regex Language (SRL) parser and compiler.
It allows you to write highly-readable text patterns in SRL and then generate their Ruby Regexp
counterparts.
Ever wanted to write challenging regular expressions but were intimidated by their arcane, cryptic syntax?
With srl_ruby you can easily design your patterns in SRL and let srl_ruby transform them into their terse Regexp equivalents.
Features:
- Command-line SRL-to-Ruby compiler with customizable output.
- Ruby API for integrating an SRL parser or compiler with your code.
- 100% pure Ruby with a clean design (not a port from some other language).
- Minimal runtime dependency (the Rley gem). It won't drag in a bunch of gems.
- Compatibility: works with Ruby 2.3+ MRI and JRuby.
- Portability: tested on both Linux and Windows.
Installation
...with Bundler
Add this line to your application's Gemfile:
gem 'srl_ruby'
And then execute:
$ bundle
...with RubyGems
Or install it directly yourself with the command line:
$ gem install srl_ruby
Usage
Let's test the installation by launching the srl2ruby (SRL-to-Ruby) command-line compiler with the help option:
$ srl2ruby --help
It should output something similar to:
Usage: srl2ruby SRL_FILE [options]
Description:
Parses a SRL file and compiles it into a Ruby Regexp literal.
Simple Regex Language (SRL) website: https://simple-regex.com
Examples:
srl2ruby example.srl
srl2ruby -p 'begin with literally "Hello world!"'
srl2ruby example.srl -o example_re.rb -t srl_and_ruby.erb
Options:
-p, --pattern SRL_PATT One-liner SRL pattern.
-o, --output-file PATH Output to a file with specified name.
-t, --template-file TEMPLATE Use given ERB template for the Ruby code generation. srl2ruby looks for
the template file in current dir first then in its gem's /templates dir.
--version Display the program version then quit.
-?, -h, --help Display this help then quit.
A quick intro to SRL and srl2ruby
What is SRL?
SRL is a small language that lets you write pattern-matching expressions in a readable syntax that resembles English. For SRL documentation and examples, we recommend the official SRL website (https://simple-regex.com).
Why SRL?
It is a well-known fact: regexes can be really hard to write and even harder to read ('decipher' is closer to reality).
Alas, creating and maintaining regexes can be full of frustration.
Enter SRL. The intent is to let developers define self-documenting patterns with an easy syntax, and then let the computer translate the SRL expressions into terse regular expressions.
Our first SRL pattern
Let's succumb to the traditional 'hello world' example. True, it is a contrived example that doesn't do justice to SRL's expressiveness. On the other hand, it is a good enough starting point to learn the compile cycle.
As a first step, let's create a file named hello.srl with just the following line:
begin with literally "Hello world!"
It should read as 'Match any text that begins with the exact text "Hello world!"'
Now, if one invokes the srl2ruby compiler with the command line...
$ srl2ruby hello.srl
... one gets the following output:
Parsing file 'hello.srl'
/^Hello world!/
The last displayed line is the Ruby Regexp representation of the above SRL line. It can be pasted as is into your Ruby code, as in the following snippet:
subject = 'Hello world! Welcome to SRL...'
puts 'It matches!' if subject =~ /^Hello world!/
As expected the snippet results in the message:
It matches!
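One detail is worth keeping in mind when pasting the generated literal: in Ruby, ^ anchors at the start of any line, not only at the start of the whole string. The short sketch below illustrates this with the Regexp shown above. The \A variant is written by hand for comparison; it is an assumption for illustration and not something srl2ruby is claimed to produce.

```ruby
# Regexp copied from the srl2ruby output above
GENERATED = /^Hello world!/

# Hand-written variant for comparison (assumption, not srl2ruby output):
# \A anchors at the very start of the string only
STRICT = /\AHello world!/

single_line = 'Hello world! Welcome to SRL...'
multi_line  = "First line\nHello world! on the second line"

puts(GENERATED.match(single_line) ? 'match' : 'no match') # match
puts(GENERATED.match(multi_line) ? 'match' : 'no match')  # match: ^ also anchors after a newline
puts(STRICT.match(multi_line) ? 'match' : 'no match')     # no match
```

If the subject text may span several lines, this difference can matter when deciding whether the generated literal is strict enough.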
Quick recap:
- srl2ruby expects an SRL file (typically with a .srl extension).
- It parses the Simple Regex Language content...
- ... then generates the Regexp that is equivalent to the SRL input.
- Finally, it prints the results to the console.
Feature: with the command-line -o option, the compiler sends the output to a file with the specified name.
Gears up
Let's admit it, our first example wasn't really impressive.
So, let's try a more imposing example inspired by the official SRL website: an email validation pattern.
begin with any of (digit, letter, one of "._%+-") once or more,
literally "@",
any of (digit, letter, one of ".-") once or more,
( literally ".",
letter at least 2 times
) optional,
must end,
case insensitive
Assume that the previous SRL pattern was put in a file named email_validation.srl and that we invoked srl2ruby with the following command line:
$ srl2ruby email_validation.srl
Then the output should be:
Parsing file 'email_validation.srl'
/(?i-mx:^(?:\d|[a-z]|[._%+\-])+@(?:\d|[a-z]|[.\-])+(?:\.[a-z]{2,})?$)/
The resulting regexp isn't for the faint of heart: who's ready to maintain it? In addition, the above pattern covers only the most frequent cases.
If you were asked to cover more exotic cases, knowing that it means an expression at least twice as complex, which version would you rather update: the SRL one or the Regexp one?
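To get a feel for what the generated Regexp accepts, here is a small sketch that runs it against a few inputs. The sample addresses are made up for illustration; only the Regexp itself comes from the srl2ruby output above.

```ruby
# Regexp copied from the srl2ruby output above
EMAIL_RE = /(?i-mx:^(?:\d|[a-z]|[._%+\-])+@(?:\d|[a-z]|[.\-])+(?:\.[a-z]{2,})?$)/

# Sample inputs chosen for illustration only
samples = [
  'john.doe@example.com',   # ordinary address
  'JOHN+news@EXAMPLE.ORG',  # upper case is fine: the pattern is case insensitive
  'user@localhost',         # accepted too: the ".xx" suffix is optional in the pattern
  'no-at-sign.example.com', # rejected: no "@"
  'user@'                   # rejected: nothing after "@"
]

samples.each do |text|
  verdict = EMAIL_RE.match(text) ? 'matches' : 'does not match'
  puts "#{text.inspect} #{verdict}"
end
```

The first three samples match and the last two do not, which is consistent with the remark that the pattern covers only the most frequent cases.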
Good to know: customizable output
In fact, if one wants to update or maintain a pattern, it would be practical to have the SRL expression and its equivalent Regexp next to each other in the Ruby source code.
Can the srl2ruby compiler help there? The answer is... yes.
First, it is good to know that the output of the srl2ruby compiler can be tailored with an ERB template. For instance, the output of all the previous examples relies on a default template called base.erb. It is bundled in the srl_ruby gem, together with another template called srl_and_ruby.erb. This second template emits the SRL code (in Ruby comments) followed by the Regexp literal.
So let's use it with our email validation example:
$ srl2ruby email_validation.srl --template-file srl_and_ruby.erb
The shorter -t option syntax is also possible:
$ srl2ruby email_validation.srl -t srl_and_ruby.erb
The compiler's output now contains the original SRL expression in comments:
Parsing file 'email_validation.srl'
# SRL expression follows:
# begin with any of (digit, letter, one of "._%+-") once or more,
# literally "@",
# any of (digit, letter, one of ".-") once or more,
# ( literally ".",
# letter at least 2 times
# ) optional,
# must end,
# case insensitive
#
# ... and its Regexp equivalent:
/(?i-mx:^(?:\d|[a-z]|[._%+\-])+@(?:\d|[a-z]|[.\-])+(?:\.[a-z]{2,})?$)/
The above SRL code in comments can be safely inserted in a Ruby file.
Quick recap:
- SRL can be used to specify much more challenging patterns than our boring 'Hello world!'.
- The srl2ruby compiler uses an ERB template to format its output.
- It is possible to choose a specific template via the -t option.
Feature: when given the name of a template via the -t option, the compiler looks first for such a template in the current directory, then, if not found, in its templates directory. This makes it possible to use customized local template files.
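The lookup order described above can be pictured with the following sketch. It is not srl2ruby's actual code: the find_template helper and the throw-away directories are hypothetical and only illustrate "current directory first, then a fallback templates directory".

```ruby
require 'tmpdir'

# Hypothetical helper, NOT part of srl_ruby: it only illustrates the
# lookup order described above (current directory first, then a
# fallback templates directory).
def find_template(name, fallback_dir, current_dir = Dir.pwd)
  [current_dir, fallback_dir].each do |dir|
    candidate = File.join(dir, name)
    return candidate if File.exist?(candidate)
  end
  nil # template not found in either place
end

# Demonstration with throw-away directories
Dir.mktmpdir do |local_dir|
  Dir.mktmpdir do |bundled_dir|
    File.write(File.join(bundled_dir, 'base.erb'), 'bundled')
    File.write(File.join(bundled_dir, 'srl_and_ruby.erb'), 'bundled')
    File.write(File.join(local_dir, 'srl_and_ruby.erb'), 'customized')

    # Only the bundled copy exists: the fallback directory is used
    puts File.read(find_template('base.erb', bundled_dir, local_dir))
    # A local copy exists: it takes precedence over the bundled one
    puts File.read(find_template('srl_and_ruby.erb', bundled_dir, local_dir))
  end
end
```

In this sketch the first call prints the bundled content and the second prints the customized one, mirroring how a local template file would shadow a bundled template of the same name.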
Time for yet another example
As an example, let's assume that we are asked to create a regular expression that matches the time in 12-hour clock format (say, hh:mm AM/PM).
In addition, the hour and minute values must be put (= captured) in variables named hour and min respectively.
We will proceed in multiple iterations of increasing complexity.
However, for those who are always in a hurry and like movie spoilers, here is the requested Regexp:
/(?i-mx:^(?<hour>(?:(?:0?\d)|(?:1[01]))):(?<min>(?:0?|[1-5])\d)\s?[AP]M$)/
Want to jump directly to the last iteration? See Iteration 5 below.
Iteration 1
Here is a very naive SRL expression that matches the requested time format:
begin with digit twice,
literally ":",
digit twice
literally " ",
one of "AP", literally "M",
must end
If one compiles the above SRL expression with srl2ruby as explained earlier in the 'Our first SRL pattern' section, it will generate the following Regexp literal:
/^\d{2}:\d{2} [AP]M$/
When I want to test regular expressions, one of my favorite tools is the Rubular website. Michael Lovitt created a great Regexp editor and tester specifically for the Ruby community.
By the way, perhaps some lynx-eyed readers spotted a small "mistake" on the third line of the SRL snippet: it doesn't end with a comma.
My apologies... For style consistency this line should be written as:
digit twice,
In reality, SRL happily ignores commas. Well... most of the time. There is one exception: in the any of construct, commas are used to separate alternatives (see the example in Iteration 3).
Iteration 2
Tests won't take a long time to show that the previous pattern is much too 'lenient' and will accept grossly incorrect entries such as 45:67 PM.
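Such a test can be as simple as the sketch below, which feeds a few hand-picked inputs (chosen here for illustration) to the Regexp generated in Iteration 1.

```ruby
# Regexp copied from the srl2ruby output of Iteration 1
ITERATION_1 = /^\d{2}:\d{2} [AP]M$/

['09:30 AM', '45:67 PM', '9:30 AM'].each do |text|
  verdict = ITERATION_1.match(text) ? 'accepted' : 'rejected'
  puts "#{text}: #{verdict}"
end
```

The nonsensical 45:67 PM is accepted just like 09:30 AM, while 9:30 AM is rejected because the pattern insists on two hour digits.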
For our next iteration, we note that:
- The first digit (from the left) can take the values 0 or 1 only.
- The third digit may run from 0 to 5 since the highest value for the minutes is 59.
Here is the improved SRL version:
begin with digit from 0 to 1,
digit,
literally ":",
digit from 0 to 5,
digit,
literally " ",
one of "AP", literally "M"
must end
srl2ruby will swallow the SRL file and spit out the following Regexp:
/^[0-1]\d:[0-5]\d [AP]M$/
Iteration 3
Erroneous values like 45:67 PM are no longer accepted this time. That's definitely better... But other tests will reveal that our pattern is still too permissive since it accepts values like 17:23 PM. An hour value of 17 is OK in 24-hour format, but here we fail to meet our requirements...
So, for our third try, we note that:
- If the first hour digit is 1, then the second digit can take the values 0 or 1 only.
Let's refactor our pattern:
begin with any of (
  (literally "0", digit),
  (literally "1", one of "01")
)
literally ":",
digit from 0 to 5,
digit,
literally " ",
one of "AP", literally "M"
must end
Remarks:
- The indentation isn't required by SRL, but I find that it contributes to the readability...
srl2ruby will transform this into:
/^(?:(?:0\d)|(?:1[01])):[0-5]\d [AP]M$/
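To see what the refactoring changed, the following sketch compares the Regexps from Iteration 2 and Iteration 3 on the same inputs. The inputs are picked for illustration; both Regexps are copied from the srl2ruby outputs shown in this walkthrough.

```ruby
# Regexps copied from the srl2ruby outputs of Iterations 2 and 3
ITERATION_2 = /^[0-1]\d:[0-5]\d [AP]M$/
ITERATION_3 = /^(?:(?:0\d)|(?:1[01])):[0-5]\d [AP]M$/

['09:30 AM', '11:59 PM', '45:67 PM', '17:23 PM'].each do |text|
  second = ITERATION_2.match(text) ? 'accepts' : 'rejects'
  third  = ITERATION_3.match(text) ? 'accepts' : 'rejects'
  puts "#{text}: iteration 2 #{second}, iteration 3 #{third}"
end
```

Only the last input is treated differently: 17:23 PM passes the Iteration 2 pattern but is rejected by the Iteration 3 one.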
Iteration 4
This time the pattern works correctly. But in the meantime, our customer changed his requirements (of course, such things never happen in real life...). He asks for more flexibility in the pattern:
- If the most significant digit value is zero, it is optional (i.e. some clock models won't display it).
- The space between the minute value and the AM/PM indicator is now optional.
- The AM/PM indicator can sometimes be written in lowercase letters (am/pm).
Let's go for another round:
begin with any of (
  (literally "0" optional, digit),
  (literally "1", one of "01")
)
literally ":",
any of (
  literally "0" optional,
  digit from 1 to 5
),
digit,
whitespace optional,
one of "AP", literally "M"
must end,
case insensitive
Here is the Regexp counterpart generated by srl2ruby:
/(?i-mx:^(?:(?:0?\d)|(?:1[01])):(?:0?|[1-5])\d\s?[AP]M$)/
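A quick check of the three relaxed requirements could look like the sketch below. The inputs are illustrative; the Regexp is the one generated in this iteration.

```ruby
# Regexp copied from the srl2ruby output of Iteration 4
ITERATION_4 = /(?i-mx:^(?:(?:0?\d)|(?:1[01])):(?:0?|[1-5])\d\s?[AP]M$)/

[
  '09:30 AM', # the strict format still works
  '9:30 AM',  # leading zero of the hour left out
  '9:30AM',   # no space before the AM/PM indicator
  '1:43am',   # indicator in lowercase
  '11:05 pm', # two-digit hour, lowercase indicator
  '17:23 PM', # invalid hour
  '9:75 AM'   # invalid minutes
].each do |text|
  verdict = ITERATION_4.match(text) ? 'accepted' : 'rejected'
  puts "#{text}: #{verdict}"
end
```

The first five inputs are accepted and the last two are rejected.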
Iteration 5
Are we done? No: we were asked to capture the values of hours and minutes.
SRL allows for named captures, so here is the updated version:
begin with capture(
  any of (
    (literally "0" optional, digit),
    (literally "1", one of "01")
  )
) as "hour",
literally ":",
capture(
  any of (
    literally "0" optional,
    digit from 1 to 5
  ),
  digit
) as "min",
whitespace optional,
one of "AP", literally "M"
must end,
case insensitive
srl2ruby will swiftly swallow the above SRL pattern and generate the following Regexp:
/(?i-mx:^(?<hour>(?:(?:0?\d)|(?:1[01]))):(?<min>(?:0?|[1-5])\d)\s?[AP]M$)/
That Regexp is becoming insane...
Does this last Regexp really work?
Glad you asked... Here is a Ruby snippet that can be used to test the last generated Regexp:
# Next Regexp was copy-pasted from srl2ruby output
pattern = /(?i-mx:^(?<hour>(?:(?:0?\d)|(?:1[01]))):(?<min>(?:0?|[1-5])\d)\s?[AP]M$)/
text = '1:43am'
matching = pattern.match(text)
if matching
  print 'Capture names: '; p(matching.names) # => Capture names: ["hour", "min"]
  puts "Value of 'hour': #{matching[:hour]}" # => Value of 'hour': 1
  puts "Value of 'min': #{matching[:min]}" # => Value of 'min': 43
else
  puts "Text '#{text}' doesn't match."
end
Running this snippet gives the following output:
Capture names: ["hour", "min"]
Value of 'hour': 1
Value of 'min': 43
As one can see, from the input '1:43am', the Regexp captured the hour and minute values in the appropriate named captures. Mission accomplished...
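In application code, the named captures would typically be converted to numbers. Here is one possible way to wrap that up. The extract_time helper is hypothetical (it is not part of srl_ruby) and the inputs are illustrative.

```ruby
# Regexp copied from the srl2ruby output of Iteration 5
TIME_RE = /(?i-mx:^(?<hour>(?:(?:0?\d)|(?:1[01]))):(?<min>(?:0?|[1-5])\d)\s?[AP]M$)/

# Hypothetical helper (not part of srl_ruby): returns [hour, minute]
# as integers, or nil when the text doesn't match.
def extract_time(text)
  matching = TIME_RE.match(text)
  return nil unless matching

  [matching[:hour].to_i, matching[:min].to_i]
end

p extract_time('1:43am')   # => [1, 43]
p extract_time('09:05 PM') # => [9, 5]
p extract_time('17:23 PM') # => nil
```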
srl_ruby API
The method SrlRuby.parse accepts a Simple Regex Language string as input and returns the corresponding regular expression as a Regexp instance.
For instance, the following snippet...
require 'srl_ruby' # Load srl_ruby library
# Here is a multiline SRL expression that matches dates
# in yyyy-mm-dd format
some_srl = <<-END_SRL
any of (literally "19", literally "20"), digit twice,
literally "-",
any of (
(literally "0", digit),
(literally "1", one of "012")
),
literally "-",
any of (
(literally "0", digit),
(one of "12", digit),
(literally "3", one of "01")
)
END_SRL
# Next line launches the SRL parser; it returns the corresponding Regexp
result = SrlRuby.parse(some_srl)
# Regexp#source gives the pattern text without the surrounding slashes
puts 'Equivalent regexp: /' + result.source + '/'
... produces the following output:
Equivalent regexp: /(?:19|20)\d{2}-(?:(?:0\d)|(?:1[012]))-(?:(?:0\d)|(?:[12]\d)|(?:3[01]))/
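To try the resulting pattern without installing the gem, one can rebuild a Regexp from the printed source with Regexp.new, as sketched below. The sample texts are made up for illustration. Note that the SRL expression contains neither begin with nor must end, so the pattern is not anchored and may match in the middle of a longer text.

```ruby
# Pattern source copied from the output above
DATE_SOURCE = '(?:19|20)\d{2}-(?:(?:0\d)|(?:1[012]))-(?:(?:0\d)|(?:[12]\d)|(?:3[01]))'
DATE_RE = Regexp.new(DATE_SOURCE)

[
  '2018-04-27',              # plain date
  'Released on 2018-04-27.', # date embedded in a sentence
  '2018-13-01',              # month 13 is rejected
  '1899-01-01'               # only years starting with 19 or 20 are accepted
].each do |text|
  matching = DATE_RE.match(text)
  if matching
    puts "#{text.inspect}: found #{matching[0]}"
  else
    puts "#{text.inspect}: no date found"
  end
end
```

The first two texts yield 2018-04-27 and the last two yield no match.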
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/famished-tiger/SRL-Ruby. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.
License
The gem is available as open source under the terms of the MIT License.
Code of Conduct
Everyone interacting in the SrlRuby project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.