Loads records from Elasticsearch. Supports parallel queries. Allows connecting to servers that use self-signed SSL certificates.

Elasticsearch input plugin for Embulk

Overview

  • Plugin type: input
  • Resume supported: yes
  • Cleanup supported: yes
  • Guess supported: no

Configuration

  • nodes: Elasticsearch nodes to connect to (array, required)
    • host: hostname of the node (string, required)
    • port: port of the node (integer, required)
  • queries: array of Lucene query strings (array, required)
  • index: index to search (string, required)
  • index_type: document type within the index (string)
  • request_timeout: request timeout (integer)
  • per_size: number of records to fetch per request (integer, required, default: 1000)
  • limit_size: maximum number of records to fetch per query (integer, default: unlimited)
  • num_threads: number of threads used to run the queries (integer, default: 1)
  • retry_on_failure: number of retries on failure; 0 retries forever (integer, default: 5)
  • sort: sort order, as a hash of field name to asc or desc (hash, default: nil)
  • scroll: how long to keep the search context alive (string, default: '1m')
  • fields: fields to load (array, required)
    • name: field name (string, required)
    • type: field type, one of the supported types (string, required)
    • metadata: whether the field is document metadata such as _id or _score (boolean, default: false)
    • time_format: time format for timestamp fields (string)

Example

in:
  type: elasticsearch
  nodes:
    - {host: localhost, port: 9200}
  queries:
    - 'page_type: HP'
    - 'page_type: GP'
  index: crawl
  index_type: m_corporation_page
  request_timeout: 60
  per_size: 1000
  limit_size: 200000
  num_threads: 2
  sort:
    m_corporation_id: desc
    employee_range: asc
  fields:
    - { name: _id, type: string, metadata: true }
    - { name: _type, type: string, metadata: true }
    - { name: _index, type: string, metadata: true }
    - { name: _score, type: double, metadata: true }
    - { name: page_type, type: string }
    - { name: corp_name, type: string }
    - { name: corp_key, type: string }
    - { name: title, type: string }
    - { name: body, type: string }
    - { name: url, type: string }
    - { name: employee_range, type: long }
    - { name: m_corporation_id, type: long }
    - { name: cg_lv1, type: json }
    - { name: cg_lv2, type: json }
    - { name: cg_lv3, type: json }
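
The example above leaves retry_on_failure and scroll at their defaults. As a minimal sketch of how those two options might be set, the fragment below reuses the index and query from the example; the values 500, 0 and '5m' are illustrative assumptions, not recommendations, and the stdout output is only there so that the result can be inspected quickly.

```yaml
# Sketch only: option names come from the Configuration list,
# the values are illustrative assumptions.
in:
  type: elasticsearch
  nodes:
    - {host: localhost, port: 9200}
  queries:
    - 'page_type: HP'
  index: crawl
  per_size: 500
  retry_on_failure: 0   # 0 means retry forever, per the option list
  scroll: '5m'          # keep the search context for five minutes instead of the default '1m'
  fields:
    - { name: _id, type: string, metadata: true }
    - { name: title, type: string }
out:
  type: stdout          # Embulk's built-in stdout output, for a quick check
```

Retrying forever can hide a permanently failing node, so a finite retry_on_failure is probably the safer choice for unattended runs.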

Supported Types

  • string
  • long
  • double
  • timestamp
  • json
  • boolean
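
The example configuration does not use the timestamp or boolean types. A fields fragment using them might look like the sketch below. The field names crawled_at and is_active are hypothetical, and the strptime-style time_format string is an assumption about the format the plugin accepts, so check it against your own data before relying on it.

```yaml
# Sketch only: crawled_at and is_active are hypothetical field names.
fields:
  - { name: _id, type: string, metadata: true }
  # Assumes time_format takes a strptime-style pattern.
  - { name: crawled_at, type: timestamp, time_format: '%Y-%m-%dT%H:%M:%S%z' }
  - { name: is_active, type: boolean }
```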

Test

Setup

curl -o embulk.jar --create-dirs -L "http://dl.embulk.org/embulk-latest.jar"
chmod +x embulk.jar
./embulk.jar gem install bundler
./embulk.jar bundle install --path vendor/bundle

Run tests

./embulk.jar bundle exec rake test

Build

rake