ORCFILE¶ ↑
Ruby Gem for reading and writing Apache Optimized Row Columnar (ORC) files. This gem can also be paired using the factory_girl gem.
Installation¶ ↑
Must use jruby.
Add this line to your application’s Gemfile:
gem 'orc_file'
And then execute:
$ bundle install
Or install it yourself as:
$ gem install orc_file
Usage¶ ↑
OrcFileWriter
¶ ↑
To write a file, you will need to initialize the OrcFileWriter class. This object needs a table schema, your dataset, the path to store the file, and an optional configuration hash.
OrcFileWriter.new(table_schema, data_set, path, *options={})
table_schema¶ ↑
The table_schema must be a hash containing the column name and datatype as the key-value pair.
Valid datatypes are:
-
integer
-
decimal
-
float
-
date
-
datetime
-
time
-
string
table_schema = {:id => :integer, :amount => :decimal, :rate => :float}
data_set¶ ↑
The data_set must contain a hash with the column name and data value as the key-value pair.
For one row in the dataset:
data_set = {:id => 1, :amount => 1000.01, :rate => 0.0005}
For multiple rows in the dataset:
dataset = [{:id => 1, :amount => 1000.01, :rate => 0.0005}, {:id => 2, :amount => 2500.5, :rate => 0.1}, {:id => 3, :amount => 10.12, :rate => 10.0134}]
path¶ ↑
The path should be the full file path or relative to your working directory. You must also specify the file name.
path = '/temp/orc_file.orc'
options¶ ↑
Options is an optional hash parameter containing 5 configurable settings for writing an ORC file.
`:stripe_size` defines the size of the stripe, defaulted as 67,108,864 bytes <br> `:row_index_stride` defines the number of rows between row index entries, defaulted as 10,000 <br> `:buffer_size` defines the orc buffer size, defaulted as 262,144 bytes <br> `:compression` defines the compression codec (NONE,ZLIB,SNAPPY,LZO), defaulted as ZLIB. <br>
Define the options parameter has a hash
options = {:stripe_size => 70000000, :compression => 'SNAPPY'}
write_to_orc
¶ ↑
Once you have the OrcFileWriter object initialized you must call write_to_orc to write out the file
OrcFileWriter.new(table_schema, data_set, path, options).write_to_orc
OrcFileReader
¶ ↑
To read a file, you will need to initialize the OrcFileReader class. This object needs a table schema, and the path of the file to be read.
OrcFileReader.new(table_schema, path)
table_schema¶ ↑
The table_schema must be a hash containing the column name and datatype as the key-value pair.
Valid datatypes are:
-
integer
-
decimal
-
float
-
date
-
datetime
-
time
-
string
table_schema = {:id => :integer, :amount => :decimal, :rate => :float}
path¶ ↑
The path should be the full file path or relative to your working directory. You must also specify the file name.
path = '/temp/orc_file.orc'