DDL Generator for Hive from YAML

Trinamo

Trinamo generates HiveQL DDL from YAML definitions, letting you mount DynamoDB, S3, and local HDFS tables in Hive.

Installation

Add this line to your application's Gemfile:

gem 'trinamo'

And then execute:

$ bundle

Or install it yourself as:

$ gem install trinamo

Usage

Table Definition

Generate a template for DDL

  • RUN:
Trinamo::Converter.generate_ddl_template(out_file_path = 'ddl.yml')
  • OUTPUT:
tables:
  - name: comments
    s3_location: s3://path/to/s3/table/location
    s3_partition:
      - name: date
        type: string
    hash_key:
      - name: user_id
        type: bigint
    range_key:
      - name: comment_id
        type: bigint
    attributes:
      - name: title
        type: string
      - name: content
        type: string
      - name: rate
        type: double
  - name: authors
    hash_key:
      - name: author_id
        type: bigint
    attributes:
      - name: name
        type: string
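To show how a table definition maps onto Hive columns, here is a minimal sketch that parses the `comments` table from the template above with Ruby's standard `yaml` library. It assumes, based on the generated DDL later in this README, that columns are emitted as hash key, then range key, then attributes, with types upper-cased. `hive_columns` is a hypothetical helper, not part of trinamo's API.

```ruby
require 'yaml'

# The `comments` table from the template above, copied verbatim.
DDL_YML = <<~YML
  tables:
    - name: comments
      s3_location: s3://path/to/s3/table/location
      s3_partition:
        - name: date
          type: string
      hash_key:
        - name: user_id
          type: bigint
      range_key:
        - name: comment_id
          type: bigint
      attributes:
        - name: title
          type: string
        - name: content
          type: string
        - name: rate
          type: double
YML

# Hypothetical helper (not trinamo's implementation): list Hive columns in the
# order the generated DDL uses, i.e. hash key, then range key, then attributes.
# s3_partition entries are excluded because they become partition columns.
def hive_columns(table)
  %w[hash_key range_key attributes]
    .flat_map { |section| table.fetch(section, []) }
    .map { |col| "#{col['name']} #{col['type'].upcase}" }
end

YAML.safe_load(DDL_YML).fetch('tables').each do |table|
  puts "#{table['name']}: #{hive_columns(table).join(',')}"
end
# prints "comments: user_id BIGINT,comment_id BIGINT,title STRING,content STRING,rate DOUBLE"
```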

Generate a template for Hive options

  • RUN:
Trinamo::Converter.generate_options_template(out_file_path = 'ddl.yml')
  • OUTPUT:
options:
  dynamodb.throughput.read.percent: 0.5
  hive.exec.compress.output: true
  io.seqfile.compression.type: BLOCK
  mapred.output.compression.codec: com.hadoop.compression.lzo.LzoCodec

Then edit the table definitions and Hive settings to suit your environment.

Create DDLs in HiveQL

For Options

  • RUN:
Trinamo::Converter.load('ddl.yml').convert(:option)
  • OUTPUT:
SET dynamodb.throughput.read.percent = 0.5;
SET hive.exec.compress.output=true;
SET io.seqfile.compression.type=BLOCK;
SET mapred.output.compression.codec = com.hadoop.compression.lzo.LzoCodec;
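The option conversion amounts to emitting one `SET` statement per key/value pair. The sketch below illustrates that mapping under the assumption that option values are written out as-is; `set_statements` is a hypothetical helper, not trinamo's code. Note that the output above mixes `key = value` and `key=value`; this sketch uses the compact `key=value` form throughout.

```ruby
require 'yaml'

# The options template from above.
OPTIONS_YML = <<~YML
  options:
    dynamodb.throughput.read.percent: 0.5
    hive.exec.compress.output: true
    io.seqfile.compression.type: BLOCK
    mapred.output.compression.codec: com.hadoop.compression.lzo.LzoCodec
YML

# Hypothetical helper: one Hive SET statement per option, in file order.
# YAML parses 0.5 as a Float and true as a boolean; to_s turns both back
# into the literal text Hive expects.
def set_statements(options)
  options.map { |key, value| "SET #{key}=#{value};" }
end

puts set_statements(YAML.safe_load(OPTIONS_YML).fetch('options'))
```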

For DynamoDB

  • RUN:
Trinamo::Converter.load('ddl.yml').convert(:dynamodb)
  • OUTPUT:
-- comments_ddb
CREATE EXTERNAL TABLE comments_ddb (
  user_id BIGINT,comment_id BIGINT,title STRING,content STRING,rate DOUBLE
)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES (
  'dynamodb.table.name' = 'comments',
  'dynamodb.column.mapping' = 'user_id:user_id,comment_id:comment_id,title:title,content:content,rate:rate'
);

-- authors_ddb
CREATE EXTERNAL TABLE authors_ddb (
  author_id BIGINT,name STRING
)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES (
  'dynamodb.table.name' = 'authors',
  'dynamodb.column.mapping' = 'author_id:author_id,name:name'
);
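Each DynamoDB table becomes an external Hive table backed by the DynamoDB storage handler, where `dynamodb.column.mapping` pairs each Hive column with the DynamoDB attribute of the same name (`hive_column:dynamodb_attribute`). The following is a minimal sketch that reproduces the `authors_ddb` output above from a parsed table hash; `dynamodb_ddl` is hypothetical and not trinamo's actual implementation.

```ruby
# Hypothetical sketch of the DynamoDB DDL shape shown above.
# `table` is one entry from the parsed `tables:` list (string keys).
def dynamodb_ddl(table)
  cols = %w[hash_key range_key attributes].flat_map { |s| table.fetch(s, []) }
  columns = cols.map { |c| "#{c['name']} #{c['type'].upcase}" }.join(',')
  # Hive column name and DynamoDB attribute name are assumed to be identical.
  mapping = cols.map { |c| "#{c['name']}:#{c['name']}" }.join(',')
  <<~SQL
    -- #{table['name']}_ddb
    CREATE EXTERNAL TABLE #{table['name']}_ddb (
      #{columns}
    )
    STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
    TBLPROPERTIES (
      'dynamodb.table.name' = '#{table['name']}',
      'dynamodb.column.mapping' = '#{mapping}'
    );
  SQL
end

authors = {
  'name' => 'authors',
  'hash_key' => [{ 'name' => 'author_id', 'type' => 'bigint' }],
  'attributes' => [{ 'name' => 'name', 'type' => 'string' }]
}
puts dynamodb_ddl(authors)
```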

For S3

  • RUN:
Trinamo::Converter.load('ddl.yml').convert(:s3)
  • OUTPUT:
-- comments_s3
CREATE EXTERNAL TABLE comments_s3 (
  user_id BIGINT,comment_id BIGINT,title STRING,content STRING,rate DOUBLE
) PARTITIONED BY (date STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'
LOCATION 's3://path/to/s3/table/location';
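Only `comments` appears in the S3 output above, which suggests that tables without an `s3_location` are skipped and that `s3_partition` entries become the `PARTITIONED BY` clause. The sketch below encodes those two assumptions; `s3_ddl` is a hypothetical helper, not trinamo's code. The `\\t` and `\\n` escapes are needed because the heredoc interpolates, and they produce the literal `'\t'` and `'\n'` that Hive expects.

```ruby
# Hypothetical sketch of the S3 DDL shape shown above.
# Returns nil for tables that have no s3_location (assumed to be skipped).
def s3_ddl(table)
  return nil unless table['s3_location']

  cols = %w[hash_key range_key attributes].flat_map { |s| table.fetch(s, []) }
  columns = cols.map { |c| "#{c['name']} #{c['type'].upcase}" }.join(',')
  parts = table.fetch('s3_partition', []).map { |c| "#{c['name']} #{c['type'].upcase}" }
  partition_clause = parts.empty? ? '' : " PARTITIONED BY (#{parts.join(',')})"
  <<~SQL
    -- #{table['name']}_s3
    CREATE EXTERNAL TABLE #{table['name']}_s3 (
      #{columns}
    )#{partition_clause}
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n'
    LOCATION '#{table['s3_location']}';
  SQL
end

comments = {
  'name' => 'comments',
  's3_location' => 's3://path/to/s3/table/location',
  's3_partition' => [{ 'name' => 'date', 'type' => 'string' }],
  'hash_key' => [{ 'name' => 'user_id', 'type' => 'bigint' }],
  'range_key' => [{ 'name' => 'comment_id', 'type' => 'bigint' }],
  'attributes' => [{ 'name' => 'title', 'type' => 'string' }]
}
authors = { 'name' => 'authors', 'hash_key' => [{ 'name' => 'author_id', 'type' => 'bigint' }] }

# authors has no s3_location, so only the comments DDL is printed.
puts [comments, authors].map { |t| s3_ddl(t) }.compact
```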

For HDFS

  • RUN:
Trinamo::Converter.load('ddl.yml').convert(:hdfs)
  • OUTPUT:
-- comments_hdfs
CREATE TABLE comments_hdfs (
  user_id BIGINT,comment_id BIGINT,title STRING,content STRING,rate DOUBLE
);

-- authors_hdfs
CREATE TABLE authors_hdfs (
  author_id BIGINT,name STRING
);
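The HDFS output is the simplest case: a plain `CREATE TABLE` with no `EXTERNAL`, `ROW FORMAT`, or `LOCATION`, so Hive treats it as a managed table stored under its warehouse directory. A minimal sketch of that shape follows; `hdfs_ddl` is a hypothetical helper, not trinamo's implementation.

```ruby
# Hypothetical sketch of the HDFS DDL shape shown above: every table in the
# definition becomes a managed Hive table named <name>_hdfs.
def hdfs_ddl(table)
  columns = %w[hash_key range_key attributes]
              .flat_map { |s| table.fetch(s, []) }
              .map { |c| "#{c['name']} #{c['type'].upcase}" }
              .join(',')
  <<~SQL
    -- #{table['name']}_hdfs
    CREATE TABLE #{table['name']}_hdfs (
      #{columns}
    );
  SQL
end

tables = [
  { 'name' => 'authors',
    'hash_key' => [{ 'name' => 'author_id', 'type' => 'bigint' }],
    'attributes' => [{ 'name' => 'name', 'type' => 'string' }] }
]
# Separate consecutive DDLs with a blank line, as in the output above.
puts tables.map { |t| hdfs_ddl(t) }.join("\n")
```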

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/cignoir/trinamo.

License

The gem is available as open source under the terms of the MIT License.