There's a lot of open issues
No release in over a year
Parses JSON files with JSONPath read by other file input plugins.
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
 Dependencies

Runtime

~> 2.0
~> 0.5
 Project Readme

Jsonpath parser plugin for Embulk

The JSON with JSONPath parser plugin for the Embulk.

Overview

  • Plugin type: parser
  • Guess supported: yes (A JSON data size supports up to 32KB. #476)

Configuration

  • type: Specify this parser as jsonpath
  • columns: Specify column name and type. See below (array, required)
  • root: Specify data path with JSONPath. It must be Array object (string, default:'$') (detail)
  • stop_on_invalid_record: Stop bulk load transaction if a file includes invalid record (such as invalid timestamp) (boolean, default: false)
  • default_timezone: Default timezone of the timestamp (string, default: UTC)
  • default_timestamp_format: Default timestamp format of the timestamp (string, default: %Y-%m-%d %H:%M:%S.%N %z)
  • default_typecast: Specify whether to cast values automatically to the specified types or not (boolean, default: true)

columns

  • name: Name of the column (string, required)
  • type: Type of the column (string, required)
  • timezone: Timezone of the timestamp if type is timestamp (string, default: default_timestamp)
  • format: Format of the timestamp if type is timestamp (string, default: default_format)
  • typecast: Whether cast values or not (boolean, default: default_typecast)
  • path: JSON ppath for specific column. (string, default: null)

Example

Basic Usage

{
  "count": 100,
  "page": 1,
  "results": [
    {
      "name": "Hugh Rutherford",
      "city": "Mitchellfurt",
      "street_name": "Ondricka Island",
      "zip_code": "75232",
      "registered_at": "2015-09-09 05:28:45",
      "vegetarian": true,
      "age": 44,
      "ratio": 79.092
    },
    {
      "name": "Miss Carmella Bashirian",
      "city": "Madilynchester",
      "street_name": "Rhea Walks",
      "zip_code": "44398",
      "registered_at": "2014-07-01 04:25:27",
      "vegetarian": true,
      "age": 73,
      "ratio": 50.608
    }]
}
in:
  type: any file input plugin type
  parser:
    type: jsonpath
    root: "$.results"
    default_timezone: "Asia/Tokyo"
    columns:
      - { name: "name",          type: string }
      - { name: "city",          type: string }
      - { name: "street_name",   type: string }
      - { name: "zip_code",      type: string }
      - { name: "registered_at", type: timestamp, format: "%Y-%m-%d %H:%M:%S" }
      - { name: "vegetarian",    type: boolean }
      - { name: "age",           type: long }
      - { name: "ratio",         type: double }

Preview results

*************************** 1 ***************************
         name (   string) : Hugh Rutherford
         city (   string) : Mitchellfurt
  street_name (   string) : Ondricka Island
     zip_code (   string) : 75232
registered_at (timestamp) : 2015-09-08 20:28:45 UTC
   vegetarian (  boolean) : true
          age (     long) : 44
        ratio (   double) : 79.092
*************************** 2 ***************************
         name (   string) : Miss Carmella Bashirian
         city (   string) : Madilynchester
  street_name (   string) : Rhea Walks
     zip_code (   string) : 44398
registered_at (timestamp) : 2014-06-30 19:25:27 UTC
   vegetarian (  boolean) : true
          age (     long) : 73
        ratio (   double) : 50.608

Handle more complicated json

If you want to handle more complicated json, you can specify jsonpath to also path in columns section like as follows:

{
    "result" : "success",
    "students" : [
      { "names" : ["John", "Lennon"], "age" : 10 },
      { "names" : ["Paul", "Maccartney"], "age" : 10 }
    ]
}
root: $.students
columns:
  - {name: firstName, type: string, path: "names[0]"}
  - {name: lastName, type: string, path: "names[1]"}

In this case, names[0] will be firstName of schema and names[1] will be lastName.

Guess

This plugin supports minimal guess feature. You don't have to write parser: section in the configuration file. After writing in: section, you can let embulk guess parser: section using this command:

$ embulk gem install embulk-parser-jsonpath
$ embulk guess -g jsonpath config.yml -o guessed.yml

Example

If you want to guess the following JSON file, (This JSON data start with array) You don't have to need parser section.

[
  {
    "name": "Hugh Rutherford",
    "city": "Mitchellfurt",
    "street_name": "Ondricka Island",
    "zip_code": "75232",
    "registered_at": "2015-09-09 05:28:45",
    "vegetarian": true,
    "age": 44,
    "ratio": 79.092
  }
]
in:
  type: file
  path_prefix: example/hoge
out:
  type: stdout

However, If a JSON data doesn't start with array, You have to specify root parameter explicitly.

{
  "count": 100,
  "page": 1,
  "results": [
    {
      "name": "Hugh Rutherford",
      "city": "Mitchellfurt",
      "street_name": "Ondricka Island",
      "zip_code": "75232",
      "registered_at": "2015-09-09 05:28:45",
      "vegetarian": true,
      "age": 44,
      "ratio": 79.092
    }
  ]
}
in:
  type: file
  path_prefix: example/input
  parser:
    type: jsonpath
    root: "$.results"
out:
  type: stdout

Build

$ ./gradlew gem  # -t to watch change of files and rebuild continuously

Acknowledgment

I would like to express my special thanks to the developers of embulk-parser-jsonl and embulk-filter-typecast projects.

Almost codes copied from this project.