Lab42Match
Beyond Match Data: Modify Your Matches
... and avoid matching again
Here is your API
Wrap a Regex, Get a Match (object)
Match = Lab42::Match
let(:rgx) { %r{(\d+)\.(\d*)} }
subject { Match.new(rgx) }
Query it
Example: ... to get your wrapped Regex back
expect( subject.rgx ).to eq(rgx)
... or discover that it is not matched yet
expect( subject ).not_to be_matched
... but beware of accessing data that is not there yet!
expect{ subject[0] }.to raise_error(Match::NotMatchedYet)
expect{ subject.capts }.to raise_error(Match::NotMatchedYet)
Example: And even the wrapped MatchData object must not be accessed
expect{ subject.match }.to raise_error(Match::NotMatchedYet)
Context Attempt an Unsuccessful Match
However, after a first matching attempt the state changes: even if there was no match, your queries will now return nil instead of raising.
subject.match("")
expect( subject ).to be_matched
expect( subject.capts ).to be_nil
expect( subject[0] ).to be_nil
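The three states above (not attempted, attempted without success, matched) can be sketched in plain Ruby. This is only an illustration under assumptions: `LazyMatch` and `match!` are hypothetical names, and this is not how Lab42::Match is implemented.

```ruby
# Hypothetical stand-in for illustration only, not Lab42::Match itself.
class LazyMatch
  NotMatchedYet = Class.new(RuntimeError)

  def initialize(rgx)
    @rgx        = rgx
    @attempted  = false
    @match_data = nil
  end

  # Run the regex once and remember that an attempt was made.
  def match!(string)
    @match_data = @rgx.match(string)
    @attempted  = true
    self
  end

  # True as soon as a match was attempted, successful or not.
  def matched?
    @attempted
  end

  # Raises before any attempt, returns nil after an unsuccessful one.
  def [](index)
    raise NotMatchedYet, "no match attempted yet" unless @attempted
    @match_data && @match_data[index]
  end
end

lazy = LazyMatch.new(%r{(\d+)\.(\d*)})
lazy.match!("")
lazy[0] # => nil
```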
Context A Successful Match
Now things get more interesting
let(:string){ "> 42.43 <" }
let(:match_data){ subject.rgx.match(string) }
subject.match(string)
Example: Again we are matched now
expect( subject ).to be_matched
First, let us prove that all the information of a Regexp#match result is accessible
Example: All MatchData data is available
expect( subject.match ).to eq(match_data)
(0..2).each do |i|
expect( subject[i] ).to eq(match_data[i])
end
expect( subject.capts ).to eq(match_data.captures)
expect( subject.subject ).to eq(string)
So What?
Well, the fun starts now: we can change all parts of our matches! Did you just say all?
Indeed I did, but let us start with the obvious ones, the captures:
Example: Increase Integer Part
replacement = subject.replace(1, "43")
expect( replacement.string ).to eq("> 43.43 <")
# With a block, demonstrating also that the original object has not been altered
expect( replacement.replace(1){ |old| old.to_i.succ}.string ).to eq("> 44.43 <")
expect( subject.replace(1){ |old| old.to_i.succ}.string ).to eq("> 43.43 <")
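For comparison, here is roughly what replacing a capture without matching again looks like with nothing but MatchData offsets. `replace_capture` is a hypothetical helper written for this sketch; it does not claim to mirror the internals of `#replace`.

```ruby
# Hypothetical helper: splice a replacement into the span of capture `index`,
# using the offsets stored in the MatchData instead of matching again.
def replace_capture(match_data, index, replacement = nil)
  source   = match_data.string
  from     = match_data.begin(index)
  to       = match_data.end(index)
  new_text = block_given? ? yield(match_data[index]) : replacement
  source[0...from] + new_text.to_s + source[to..-1]
end

md = %r{(\d+)\.(\d*)}.match("> 42.43 <")

with_value = replace_capture(md, 1, "43")                  # => "> 43.43 <"
with_block = replace_capture(md, 1) { |old| old.to_i.succ } # => "> 43.43 <"
untouched  = md.string                                      # => "> 42.43 <"
```

The original string is never mutated, so the same MatchData can be reused for any number of replacements.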
Context All The Parts
To demonstrate that, we need a slightly more complex string and regex
let(:rgx){ %r{\w+\s+(\d+)\s+(\d+)\s+\w+} }
let(:string){ "> Hello 42 43 World <" }
let(:matched){ Match.new(rgx, string) }
Here is the layout of our string, and where the parts are after a successful match
|> |Hello |42| |43| World| <|
 ^  ^      ^  ^ ^  ^      ^
 |  |      |  | |  |      |
 |  |      |  | |  |      +---------- part[6] corresponds to MatchData#post_match
 |  |      |  | |  |                  symbolic: :last, :post or :suffix
 |  |      |  | |  +----------------- part[5] corresponds to the matched part after the last capture
 |  |      |  | |                     symbolic: :last_match
 |  |      |  | +-------------------- part[4] corresponds to the last capture
 |  |      |  |                       symbolic: :last_capture
 |  |      |  +---------------------- part[3] corresponds to the matched part between the two captures
 |  |      |
 |  |      +------------------------- part[2] corresponds to the first capture
 |  |                                 symbolic: :first_capture
 |  +-------------------------------- part[1] corresponds to the matched part before the first capture
 |                                    symbolic: :first_match
 +----------------------------------- part[0] corresponds to MatchData#pre_match
                                      symbolic: :first, :pre or :prefix
Example: Demonstrate parts
expect( matched.parts ).to eq([
"> ", "Hello ", "42", " ", "43", " World", " <"
])
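To make the layout concrete, the same seven parts can be derived from a plain MatchData by walking the capture offsets. `split_into_parts` is a hypothetical helper for this sketch; it assumes that every capture group took part in the match and that groups are not nested.

```ruby
# Hypothetical helper: cut the subject string into
# pre_match, text before capture 1, capture 1, ..., text after the last capture, post_match.
# Assumes all groups participated in the match and none are nested.
def split_into_parts(match_data)
  source = match_data.string
  parts  = [match_data.pre_match]
  cursor = match_data.begin(0)
  (1...match_data.size).each do |i|
    parts << source[cursor...match_data.begin(i)] # matched text before capture i
    parts << match_data[i]                        # capture i itself
    cursor = match_data.end(i)
  end
  parts << source[cursor...match_data.end(0)]     # matched text after the last capture
  parts << match_data.post_match
end

md    = %r{\w+\s+(\d+)\s+(\d+)\s+\w+}.match("> Hello 42 43 World <")
parts = split_into_parts(md)
# => ["> ", "Hello ", "42", " ", "43", " World", " <"]
```

Joining the parts always gives back the original string, which is what makes part-wise replacement safe.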
As can easily be seen, the indices used for #replace and for indexing the captures are 1-based, and doubling them gives the index of the corresponding part.
For the parts outside the captures convenient shortcuts are provided; only for the parts between captures do you need to do some calculation to access them.
Then again, you will often create a capture group precisely in order to change the matched text.
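The doubling rule can be checked by hand with plain Ruby, using the parts array from the example above as a literal (nothing here calls into Lab42::Match):

```ruby
md    = %r{\w+\s+(\d+)\s+(\d+)\s+\w+}.match("> Hello 42 43 World <")
parts = ["> ", "Hello ", "42", " ", "43", " World", " <"]

# Capture n lives at parts[2 * n]; the text between capture n and n + 1 at parts[2 * n + 1].
doubling_holds = (1...md.size).all? { |n| parts[2 * n] == md[n] }
between_first_and_second = parts[2 * 1 + 1] # => " "
```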
Let us change some parts now to see what that does
Example: Change parts by numeric index
incremented = matched.replace_part(2, 43).replace_part(-1){|s| s.reverse}
expected = [
"> ", "Hello ", "43", " ", "43", " World", "< "
]
expect( incremented.parts ).to eq(expected)
expect( incremented.string ).to eq(expected.join)
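The non-destructive chaining shown here can be imitated on a plain array of parts. The function below is a hypothetical sketch that merely shares the name `replace_part`; it works on an Array and is not the library method.

```ruby
# Hypothetical sketch: return a modified copy of the parts, leaving the original intact.
# The new value is converted with to_s, so an Integer like 43 becomes "43".
def replace_part(parts, index, replacement = nil)
  copy = parts.dup
  copy[index] = (block_given? ? yield(parts[index]) : replacement).to_s
  copy
end

parts       = ["> ", "Hello ", "42", " ", "43", " World", " <"]
incremented = replace_part(replace_part(parts, 2, 43), -1) { |s| s.reverse }
# => ["> ", "Hello ", "43", " ", "43", " World", "< "]
```

Because every call returns a fresh array, intermediate results can be kept around and reused.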
The same can be achieved by using symbolic indices, which are:
first: 0
first_capture: 2
first_match: 1
last: -1
last_capture: -3
last_match: -2
post: -1
pre: 0
prefix: 0
suffix: -1
Therefore the following will hold
Example: Change parts by symbolic name
modified = matched
.replace_part(:first, ">>>")
.replace_part(:first_match){ |x| x[2] }
.replace_part(:first_capture, "43")
.replace_part(:last_capture, "42")
.replace_part(:last_match){ |x| x[-1] }
.replace_part(:suffix, "<<<")
expected_parts = [
">>>", "l", "43", " ", "42", "d", "<<<"
]
expect( modified.parts ).to eq(expected_parts)
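The symbolic names listed above boil down to a lookup table in front of ordinary Array indexing. A minimal sketch, assuming the hypothetical names `SYMBOLIC_INDICES` and `resolve_part_index` (neither is taken from the library):

```ruby
# Hypothetical lookup table mirroring the list of symbolic indices above.
SYMBOLIC_INDICES = {
  first: 0, pre: 0, prefix: 0,
  first_match: 1,
  first_capture: 2,
  last_capture: -3,
  last_match: -2,
  last: -1, post: -1, suffix: -1
}.freeze

# Symbols are translated, integers pass through unchanged;
# an unknown symbol raises KeyError.
def resolve_part_index(index)
  index.is_a?(Symbol) ? SYMBOLIC_INDICES.fetch(index) : index
end

parts = ["> ", "Hello ", "42", " ", "43", " World", " <"]
last_capture = parts[resolve_part_index(:last_capture)] # => "43"
first_match  = parts[resolve_part_index(:first_match)]  # => "Hello "
```

Negative indices count from the end, so :last_capture and :last_match keep working no matter how many capture groups the regex has.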
From this it follows directly
Author
Copyright © 2020 Robert Dober <robert.dober@gmail.com>
LICENSE
Same as Elixir -- 😉 --, which is Apache License v2.0. Please refer to LICENSE for details.
SPDX-License-Identifier: Apache-2.0