Vertica output plugin for Embulk
Overview
- Plugin type: output
- Resume supported: no
- Cleanup supported: yes
- Dynamic table creating: yes
Configuration
- host: hostname (string, default: localhost)
- port: port number (integer, default: 5433)
- user: user name (string, required)
- password: password (string, default: '')
- database: database name (string, default: vdb)
- schema: schema name (string, default: public)
- table: table name (string, required)
- mode: "insert", or "replace". See bellow. (string, default: insert)
-
copy_mode: specifies how data is loaded into the database. See vertica documents for details. (
AUTO
,DIRECT
, orTRICKLE
. default:AUTO
) - pool: number of output threads, this number controls number of concurrency to issue COPY statements (integer, default: processor_count, that is, number of threads in input plugin)
- abort_on_error: stops the COPY command if a row is rejected and rolls back the command. No data is loaded. (bool, default: false)
-
compress: compress input (
GZIP
, orUNCOMPRESSED
, default:UNCOMPRESSED
) -
reject_on_materialized_type_error: uses
reject_on_materialized_type_error
option for fjsonparser(). This rejects rows if any of column types and value types do not fit, ex) double value into INT column fails. See vertica documents for details. (bool, default: false) - default_timezone: the default timezone for column_options (string, default is "UTC")
-
column_options: advanced: a key-value pairs where key is a column name and value is options for the column.
-
type: type of a column when this plugin creates new tables such as
VARCHAR(255)
,INTEGER NOT NULL UNIQUE
. This is used on creating intermediate tables (insert and truncate_insert modes) and on creating a new target table. (string, default: depends on input column type, see below)- boolean:
BOOLEAN
- long:
INT
(same withBIGINT
in vertica) - double:
FLOAT
(same withDOUBLE PRECISION
in vertica) - string:
VARCHAR
- timestamp:
TIMESTAMP
- boolean:
-
value_type: The types (embulk types) of values to convert (string, default: no conversion. See below for available types)
- boolean:
boolean
,string
- long:
boolean
,long
,double
,string
,timestamp
- double:
boolean
,long
,double
,string
,timestamp
- string:
boolean
,long
,double
,string
,timestamp
- timestamp:
boolean
,long
,double
,string
,timestamp
- boolean:
-
timestamp_format: timestamp format to convert into/from
timestamp
(string, default is "%Y-%m-%d %H:%M:%S %z") -
timezone: timezone to convert into/from
timestamp
(string, default isdefault_timezone
).
-
type: type of a column when this plugin creates new tables such as
- json_payload: Assuming first columns of records are json string, COPY each json into vertica directly using fjsonparser. This would improve performance by avoiding construction of json in this jruby plugin. ToDo: auto table creation is not supported for this mode yet (bool, default is false)
- resource_pool: Configure resource pool (string, default is nil)
Modes
-
insert:
- Behavior: This mode copies rows to an intermediate table first. If all those tasks run correctly, runs
INSERT INTO <target_table> SELECT * FROM <intermediate_table>
- Behavior: This mode copies rows to an intermediate table first. If all those tasks run correctly, runs
-
replace:
- Behavior: This mode copies rows to an intermediate table first. If all those tasks run correctly, swaps the target table and the intermediate table, and drops the old target table
-
drop_insert:
- Behavior: Same with insert mode excepting that it drops the target table first. This is aimed to use for development.
Example
out:
type: vertica
host: 127.0.0.1
user: dbadmin
password: xxxxxxx
database: vdb
schema: sandbox
table: embulk_test
copy_mode: DIRECT
abort_on_error: true
column_options:
id: {type: INT}
name: {type: VARCHAR(255)}
date: {type: DATE, value_type: timestamp, timezone: "+09:00"}
Development
Run example:
$ embulk bundle install --path vendor/bundle
$ embulk -J-O -R--dev run -b . -l debug example/example.yml
Release gem:
$ bundle exec rake release