Ruby json_stream_trigger
Gem
Instead of parsing a huge JSON file and loading all of it into memory, this library streams the bytes through json-stream and only creates a small buffer for objects whose JSONPath matches a pattern you specify. When a matching object is complete, the block you registered for that pattern is called.
Install by adding gem "json_stream_trigger" to your Gemfile.
Example:
f = File.open('really_big_file.json')
stream = JsonStreamTrigger.new()
# Match each array item. If you want the whole array, use $.data
stream.on('$.data[*]') do |json_string|
import JSON.parse(json_string, :quirks_mode => true)
end
# Will match for $.any.sub[*].item.meta
stream.on('$..meta') do |json_string|
save_meta JSON.parse(json_string, :quirks_mode => true)
end
# read in 1 KB chunks (1024 bytes per read)
while chunk = f.read(1024)
stream << chunk
end
The captured JSON string buffer is passed to the block. Note that Ruby's JSON library expects
a full JSON document, not a primitive, which is why :quirks_mode => true
has been added.
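If your version of the JSON library does not accept :quirks_mode, one possible workaround is to wrap the captured string in an array before parsing. This is a minimal sketch; parse_fragment is a hypothetical helper name and is not part of json_stream_trigger.

```ruby
require 'json'

# Hypothetical helper (not provided by json_stream_trigger): parse a captured
# JSON fragment that may be a bare primitive by wrapping it in an array,
# which is always a valid JSON document, and taking the single element back out.
def parse_fragment(json_string)
  JSON.parse("[#{json_string}]").first
end

parse_fragment('{"id": 5}')  # a Hash
parse_fragment('"Tyler"')    # a String
parse_fragment('0.1')        # a Float
```

The same helper could be called inside an on block in place of JSON.parse(json_string, :quirks_mode => true).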
Path Details
The JSONPaths are similar to XPath notation: $ is the root, a single-level wildcard key is written as $.*.version, and a multi-level wildcard is written as $.docs..name.
More info on JSONPath
A few more examples:
{
meta: {version: 0.1},
docs: [
{id: 1},
{id: 2},
{id: 3},
{id: 4},
{
id: 5,
user: {
name: "Tyler"
}
}
]
}
on('$.docs[*].id') # triggers for id property of every item in docs array
on('$.docs') # returns full array of items
on('$.docs[*]') # triggers for each item in the array
on('$.docs[1].id') # triggers for the id of the item at index 1
on('$.docs[*].*.name') # returns 'Tyler'
on('$..name') # matches any value whose key is 'name'
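To get a feel for which values a pattern selects, the sketch below matches patterns against every concrete path in the example document. It is not how json_stream_trigger works internally: it parses the whole document into memory, which is exactly what the gem avoids. The helpers jsonpath_to_regexp, each_path and values_matching are hypothetical names, and the sketch assumes zero-based array indices and keys that contain no dots or brackets.

```ruby
require 'json'

# Hypothetical helper: translate the wildcard forms described above into a
# Regexp that is tested against concrete paths such as "$.docs[4].user.name".
def jsonpath_to_regexp(pattern)
  source = pattern.gsub(/\.\.|\.\*|\[\*\]|[^.\[]+|./) do |token|
    case token
    when '..'  then '(?:\.[^.\[\]]+|\[\d+\])*\.' # any number of levels, then a key
    when '.*'  then '\.[^.\[\]]+'                # exactly one key, any name
    when '[*]' then '\[\d+\]'                    # any array index
    else Regexp.escape(token)                    # literal part of the path
    end
  end
  Regexp.new("\\A#{source}\\z")
end

# Hypothetical helper: yield every concrete path in a parsed document
# together with the value found at that path.
def each_path(node, path = '$', &block)
  yield path, node
  case node
  when Hash
    node.each { |key, value| each_path(value, "#{path}.#{key}", &block) }
  when Array
    node.each_with_index { |value, i| each_path(value, "#{path}[#{i}]", &block) }
  end
end

# Hypothetical helper: collect the values whose path matches the pattern.
def values_matching(doc, pattern)
  regexp = jsonpath_to_regexp(pattern)
  matches = []
  each_path(doc) { |path, value| matches << value if regexp.match?(path) }
  matches
end

doc = JSON.parse('{"meta":{"version":0.1},"docs":[{"id":1},{"id":2},' \
                 '{"id":3},{"id":4},{"id":5,"user":{"name":"Tyler"}}]}')

p values_matching(doc, '$.docs[*].id')     # [1, 2, 3, 4, 5]
p values_matching(doc, '$.docs[*].*.name') # ["Tyler"]
p values_matching(doc, '$..name')          # ["Tyler"]
p values_matching(doc, '$.*.version')      # [0.1]
```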
Tests
rake test