Naginegi
Generate Embulk config and BigQuery schema from MySQL and PostgreSQL schema and run Embulk.
Installation
Add this line to your application's Gemfile:
gem 'naginegi'
And then execute:
$ bundle
Or install it yourself as:
$ gem install naginegi
Embulk setup
Naginegi
is utility for Embulk
.
You need to install Embulk
and install some gems like below.
embulk gem install embulk-input-mysql --version 0.10.1
embulk gem install embulk-input-postgresql --version 0.10.1
embulk gem install embulk-output-bigquery --version 0.6.4
embulk gem install embulk-parser-jsonl --version 0.2.0
embulk gem install embulk-formatter-jsonl --version 0.1.4
Usage
Require database.yml
and table.yml
.
database.yml
requires db_type
(mysql or postgresql).
RDBMS | db_type |
---|---|
MySQL | mysql |
PostgreSQL | postgresql |
Below is a sample config file.
database.yml
db01:
db_type: mysql
host: localhost
username: root
password: pswd
database: production
bq_dataset: mysql
db02:
db_type: postgresql
host: localhost
username: root
password: pswd
database: production
bq_dataset: pg
Caution: Embulk doesn't allow no password for MySQL
table.yml
db01:
tables:
- name: users
- name: events
- name: hobbies
db02:
tables:
- name: administrators
- name: configs
sample
Naginegi requires BigQuery parameters like below.
using json key file path
[sample.rb]
require 'naginegi'
config = {
'project_id' => 'BIGQUERY_PROJECT_ID',
'service_email' => 'SERVICE_ACCOUNT_EMAIL',
'auth_method' => 'json_key',
'json_keyfile' => 'JSON_KEYFILE_PATH',
'schema_dir' => '/var/tmp/embulk/schema',
'config_dir' => '/var/tmp/embulk/config'
}
client = Naginegi::EmbulkRunner.new
client.generate_config(config)
client.run(config)
ruby sample.rb
using key values
[sample.rb]
require 'naginegi'
json_key = {
"type" => "...",
"project_id" => "...",
"private_key_id" => "...",
"private_key" => "...",
...
}
config = {
'project_id' => 'BIGQUERY_PROJECT_ID',
'service_email' => 'SERVICE_ACCOUNT_EMAIL',
'auth_method' => 'json_key',
'json_key' => json_key,
'schema_dir' => '/var/tmp/embulk/schema',
'config_dir' => '/var/tmp/embulk/config'
}
client = Naginegi::EmbulkRunner.new
client.generate_config(config)
client.run(config)
ruby sample.rb
Features
process status
Naginegi
returns process status as boolean.
If all tables are succeed, then returns true
, else false
.
It is useful to control system flow.
process_status = Naginegi::EmbulkClient.new.run(config)
exit 1 unless process_status
narrow tables
You can narrow actual target tables from table.yml
for test or to retry.
If no target tables is given, Naginegi
will execute all tables.
# in case, all tables are ['users', 'purchases', 'items']
target_tables = ['users', 'purchases']
Naginegi::EmbulkClient.new.run(config, target_tables)
retry
You can set retry count.
If any table failed, only failed table will be retried until retry count.
If no retry count is given, Naginegi
dosen't retry.
# 2 times retry will execute
Naginegi::EmbulkClient.new.run(config, [], 2)
SQL condition
If you set condition
to a table in table.yml
, SQL is generated like below.
It is useful for large size table.
[table.yml]
production:
tables:
- name: users
- name: events
conditon: created_at < CURRENT_DATE()
SELECT * FROM users
SELECT * FROM events WHERE created_at < CURRENT_DATE()
daily snapshot
BigQuery supports table wildcard expression of a specific set of daily tables, for example, sales20150701
.
If you need daily snapshot of a table for BigQuery, use daily_snapshot
option to database.yml
or table.yml
like below.
daily_snapshot
option effects all tables in case of database.yml
.
On the other hand, only target table in table.yml
.
Daily part is determined by execute date.
[database.yml]
production:
host: localhost
username: root
password: pswd
database: production
bq_dataset: mysql
daily_snapshot: true
[table.yml]
production:
tables:
- name: users
- name: events
daily_snapshot: true
- name: hobbies
Only `events` is renamed to `eventsYYYYMMDD` for BigQuery.
Contributing
- Fork it ( https://github.com/[my-github-username]/naginegi/fork )
- Create your feature branch (
git checkout -b my-new-feature
) - Commit your changes (
git commit -am 'Add some feature'
) - Push to the branch (
git push origin my-new-feature
) - Create a new Pull Request