metacrunch-file
This is the official file package for the metacrunch ETL toolkit.
Note: For working examples of how to use this package, check out our demo repository.
Installation
Include the gem in your Gemfile
gem "metacrunch-file", "~> 1.5.0"
and run $ bundle install to install it.
Or install it manually
$ gem install metacrunch-file
Usage
Metacrunch::File::FileSource
This class provides a metacrunch source implementation that reads data from files in the file system into a metacrunch job. It can read regular files, gzip-compressed files, tar archives and gzip-compressed tar archives.
# my_job.metacrunch
# If you call this example like so
# $ metacrunch my_job.metacrunch *.xml
# ARGV will contain all the XML files in the current directory.
source Metacrunch::File::FileSource.new(ARGV)
# ... or you can set the filenames directly
source Metacrunch::File::FileSource.new(["my-data.xml", "my-other-data.xml", "..."])
Options
NONE.
The source yields objects of type Metacrunch::File::Entry for every file it reads.
# my_job.metacrunch
transformation ->(file_entry) do
  puts "** Got file entry (Metacrunch::File::Entry)"
  puts " Filename: #{file_entry.filename}"
  puts " From archive?: #{file_entry.from_archive?}"
  puts " Name in archive: #{file_entry.archive_filename || '-'}"
  puts " Contents: #{file_entry.contents}"
end
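To make the difference between regular and gzip-compressed input concrete, here is a minimal sketch that uses only Ruby's standard library. It is not the gem's implementation: the helper read_contents and its check on the file extension are assumptions made for illustration, and tar archives are left out.

```ruby
require "zlib"
require "tmpdir"

# Hypothetical helper (not part of metacrunch-file): returns the contents of
# a file and decompresses it first when the name ends in ".gz".
def read_contents(filename)
  if filename.end_with?(".gz")
    Zlib::GzipReader.open(filename) { |gz| gz.read }
  else
    File.read(filename)
  end
end

# Write one plain and one gzip-compressed file, then read both back.
Dir.mktmpdir do |dir|
  plain = File.join(dir, "my-data.xml")
  gzipped = File.join(dir, "my-data.xml.gz")

  File.write(plain, "<record/>")
  Zlib::GzipWriter.open(gzipped) { |gz| gz.write("<record/>") }

  puts read_contents(plain)
  puts read_contents(gzipped)
end
```

Both calls return the same string, which is the property a job relies on when it receives file_entry.contents regardless of how the file was stored.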
Metacrunch::File::FileDestination
This class provides a metacrunch destination that writes data to a file. Every piece of data passed to the destination is appended to the given file. If the data is an Array, every element of that array is appended to the file. Files that do not exist are created automatically.
# my_job.metacrunch
destination Metacrunch::File::FileDestination.new("/tmp/my-data.txt" [, OPTIONS])
Options
- override_existing_file: Overwrites an existing file if set to true. If set to false, an error is raised if the file already exists. Defaults to false.
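The append and override_existing_file behaviour described above can be sketched in plain Ruby. SketchFileDestination is a hypothetical stand-in, not the gem's class; the keyword-argument form, the ArgumentError and the choice to delete the existing file are assumptions made for illustration.

```ruby
require "tmpdir"

# Hypothetical stand-in that mimics the documented behaviour.
class SketchFileDestination
  def initialize(filename, override_existing_file: false)
    if File.exist?(filename)
      # Assumption: the real error class may differ.
      raise ArgumentError, "File exists: #{filename}" unless override_existing_file
      File.delete(filename)
    end
    # Append mode creates the file if it does not exist yet.
    @file = File.open(filename, "a")
  end

  # Appends the data; an Array is appended element by element.
  def write(data)
    if data.is_a?(Array)
      data.each { |element| @file.write(element) }
    else
      @file.write(data)
    end
  end

  def close
    @file.close
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "my-data.txt")
  destination = SketchFileDestination.new(path)
  destination.write("a\n")
  destination.write(["b\n", "c\n"])
  destination.close
  puts File.read(path)
end
```

Creating a second SketchFileDestination for the same path raises unless override_existing_file: true is passed, which mirrors the default of false.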
Metacrunch::File::CSVSource
This class provides a metacrunch source for reading CSV files. It is a simple wrapper around the smarter_csv gem.
# my_job.metacrunch
source Metacrunch::File::CSVSource.new("my.csv" [, OPTIONS])
Options
- headers: Whether the file contains headers as the first line. Set this to false if the file has no header line, otherwise the first line of data is lost. Defaults to true.
- col_sep: Column separator. Defaults to "," (comma).
- row_sep: Row separator or record separator. Defaults to "\n".
- quote_char: Quotation character. Defaults to '"' (double quote).
- file_encoding: Set the file encoding. Defaults to utf-8.
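The effect of headers, col_sep and quote_char can be seen in a small sketch. It uses Ruby's standard CSV library rather than smarter_csv, so the return values differ from what CSVSource yields; only the meaning of the options is illustrated. The helper parse_csv and the sample data are hypothetical.

```ruby
require "csv"

# Sample data with ";" as column separator and "'" as quotation character.
SAMPLE_CSV = "id;title\n1;'Ruby; a primer'\n2;ETL\n"

# Hypothetical helper that parses SAMPLE_CSV with Ruby's CSV library.
def parse_csv(data, headers:)
  parsed = CSV.parse(data, headers: headers, col_sep: ";", quote_char: "'")
  # With headers, each row becomes a Hash keyed by column name.
  headers ? parsed.map(&:to_h) : parsed
end

# The first line names the columns and is not returned as data.
p parse_csv(SAMPLE_CSV, headers: true)

# The first line is returned as an ordinary row.
p parse_csv(SAMPLE_CSV, headers: false).first
```

If a file without a header line were parsed with headers enabled, its first record would be consumed as column names, which is the data loss the headers option guards against.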
Metacrunch::File::CSVDestination
This class provides a metacrunch destination for writing CSV files. Because smarter_csv can only read CSV, this class uses Ruby's built-in CSV library under the hood.
# my_job.metacrunch
destination Metacrunch::File::CSVDestination.new(
  "result.csv", # filename
  ["Header 1", "Header 2", ...], # headers
  [, OPTIONS]
)
Options
- override_existing_file: Overwrites an existing file if set to true. If set to false, an error is raised if the file already exists. Defaults to false.
- csv_options: Set options for CSV generation, such as col_sep. The full list of options is in the Ruby CSV documentation.
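Since csv_options are options of Ruby's CSV library, their effect can be tried out without metacrunch. The sketch below calls CSV.generate directly; the helper generate_csv and the sample rows are hypothetical, and how the destination passes the options on internally is not shown here.

```ruby
require "csv"

# Hypothetical helper: writes a header row followed by the data rows,
# forwarding csv_options to Ruby's CSV library.
def generate_csv(headers, rows, csv_options = {})
  CSV.generate(**csv_options) do |csv|
    csv << headers
    rows.each { |row| csv << row }
  end
end

rows = [["a", "b;c"], ["d", "e"]]
puts generate_csv(["Header 1", "Header 2"], rows, col_sep: ";")
```

The field "b;c" contains the separator, so the CSV library quotes it in the output.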
Metacrunch::File::XLSXDestination
This class provides a metacrunch destination implementation to create simple Excel (xlsx) files.
To use this destination, a transformation is required that formats the data as an array that can be passed to the destination. When defining the destination you must provide an array of column names. Each data row passed to the destination must be an array of the same size as the column array.
# my_job.metacrunch
transformation ->(data) do
  [data["foo"], data["bar"], ...]
end
destination Metacrunch::File::XLSXDestination.new(
  "/tmp/my-data.xlsx", # filename
  ["Column 1", "Column 2", ...], # header columns
  OPTIONS
)
Options
- worksheet_title: The name of the worksheet. Defaults to "My data".
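The requirement that every row has the same size as the column array can be checked in plain Ruby before the data reaches the destination. This is a sketch only: COLUMNS, to_row and valid_row? are hypothetical names, and whether the destination performs such a check itself is not stated here.

```ruby
# Column names as they would be passed to the destination.
COLUMNS = ["Column 1", "Column 2"].freeze

# Hypothetical transformation body: picks the fields in column order.
def to_row(data)
  [data["foo"], data["bar"]]
end

# Hypothetical check: a row must be an Array with one value per column.
def valid_row?(row)
  row.is_a?(Array) && row.size == COLUMNS.size
end

row = to_row({ "foo" => 1, "bar" => "x", "baz" => true })
p row
p valid_row?(row)
```

Fields that have no column, such as "baz" above, are dropped by the transformation, so the row stays aligned with the header.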
License
metacrunch-file is available on GitHub under the MIT license.