Commons Compress decoder plugin is an Embulk plugin that decodes files using the Apache Commons Compress library. It can be used with any file input plugin. Search for file input plugins with the keywords 'embulk-input file'.

Commons Compress decoder plugin for Embulk

This decoder plugin for Embulk supports various archive formats using the Apache Commons Compress library.

Overview

  • Plugin type: decoder
  • Load all or nothing: yes
  • Resume supported: no

Configuration

  • format: An archive format such as tar or zip. (string, optional, default: "")
    • The format must be one of the formats supported by Apache Commons Compress.
    • If format is not set, the format is detected automatically. Auto detection works only for a single format. For a solid compression format such as tar.gz, set format explicitly.
    • Some formats listed by Apache Commons Compress may not work in your environment. The formats listed under Formats below are confirmed to work. Other formats listed on the Apache Commons Compress site may also work in your environment.
  • decompress_concatenated: The gzip, bzip2, and xz formats support multiple concatenated streams. The default value of this parameter is true, which decodes all concatenated streams. Set it to false to disable this. See CompressorStreamFactory.setDecompressConcatenated() in Commons Compress 1.9 for more details.
  • match_name: Only the files in an archive whose names match match_name are processed. match_name is a regular expression.
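
To illustrate what decompress_concatenated controls, the following Python sketch builds a gzip file from two concatenated streams and reads it both ways. It uses only the Python standard library and does not call the plugin or Apache Commons Compress. The sample data and variable names are made up for illustration.

```python
import gzip
import zlib

# Two independent gzip streams written back to back (hypothetical sample data).
first = gzip.compress(b"id,name\n1,alice\n")
second = gzip.compress(b"2,bob\n")
concatenated = first + second

# Comparable to decompress_concatenated: true -> every stream is decoded.
all_streams = gzip.decompress(concatenated)

# Comparable to decompress_concatenated: false -> decoding stops after the
# first stream. wbits = MAX_WBITS | 16 tells zlib to expect a gzip header.
decoder = zlib.decompressobj(zlib.MAX_WBITS | 16)
first_only = decoder.decompress(concatenated)
leftover = decoder.unused_data  # bytes of the second stream, left undecoded

print(all_streams)
print(first_only)
```

With the default setting the rows of both streams reach the parser. With false, only the rows of the first stream do.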

Formats

  • archive format: ar, cpio, jar, tar, zip
    • These formats are archive formats. All files in an archive are processed by Embulk.
  • compression format: bzip2, deflate, gzip
    • These formats are compression formats. The uncompressed file is processed by Embulk.
  • solid compression format: The format config parameter must be set explicitly.
    • tgz, tar.gz
    • tbz, tbz2, tb2, tar.bz2
    • taz, tz, tar.Z
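
A possible reason why solid compression formats need an explicit format: a tar.gz file is a tar archive wrapped in a single gzip stream, so the first bytes of the file only identify gzip. The Python sketch below shows the two layers using the standard library. It is an illustration of the file layout, not the plugin's detection code, and the helper build_tgz and the file names are hypothetical.

```python
import gzip
import io
import tarfile


def build_tgz(files):
    """Return the bytes of a tar.gz archive for a name -> bytes mapping."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


tgz = build_tgz({"a.csv": b"id\n1\n", "b.csv": b"id\n2\n"})

# Outer layer: the file starts with the gzip magic bytes, nothing tar-specific.
is_gzip = tgz[:2] == b"\x1f\x8b"

# Inner layer: only after gunzip does the tar magic appear, at offset 257.
inner = gzip.decompress(tgz)
is_tar = inner[257:262] == b"ustar"

# Reading the inner bytes as an uncompressed tar lists the archived files.
with tarfile.open(fileobj=io.BytesIO(inner), mode="r:") as tar:
    names = tar.getnames()

print(is_gzip, is_tar, names)
```

A decoder that inspects only the outer layer sees a gzip file and yields one uncompressed tar stream, whereas format: tgz tells it to unpack both layers.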

Example

  • Use auto detection. This works for a single format such as tar or zip. To use a solid compression format such as tar.gz, set format in your configuration file.
in:
  type: any input plugin type
  decoders:
    - type: commons-compress
  • Set a file format such as tar explicitly.
in:
  type: any input plugin type
  decoders:
    - type: commons-compress
      format: tar
  • Set a solid compression format.
in:
  type: any input plugin type
  decoders:
    - type: commons-compress
      format: tgz
  • Set decompress_concatenated to false to read only the first of the concatenated gzip/bzip2 streams.
in:
  type: any input plugin type
  decoders:
    - type: commons-compress
      decompress_concatenated: false
  • Set match_name to extract only the files whose suffix is '.csv' from an archive.
in:
  type: any input plugin type
  decoders:
    - type: commons-compress
      match_name: ".*\\.csv"
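
The match_name value above is written as ".*\\.csv" because the YAML double-quoted string unescapes \\ to a single backslash, so the regular expression the plugin receives is .*\.csv. The Python sketch below shows which entry names such a pattern selects. It assumes the pattern must match the whole entry name, which this README does not state, and it uses Python's re module in place of the Java regular expressions the plugin runs on. The entry names are hypothetical.

```python
import re

# The regular expression after YAML unescaping of ".*\\.csv".
pattern = re.compile(r".*\.csv")

# Hypothetical entry names inside an archive.
entries = ["data/a.csv", "data/b.tsv", "readme.txt", "c.csv.bak"]

# Assumption: the whole entry name has to match the pattern.
selected = [name for name in entries if pattern.fullmatch(name)]

# For comparison: a substring search would also accept "c.csv.bak".
searched = [name for name in entries if pattern.search(name)]

print(selected)
print(searched)
```

If the plugin turns out to accept partial matches, anchoring the pattern explicitly, for example ".*\\.csv$", removes the ambiguity.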

Build

$ ./gradlew gem

To build with integrationTest (works on OSX or Linux):

$ ./gradlew -DenableIntegrationTest=true clean all

Versions

Version 0.6.0 or later of this plugin can be used with Embulk 0.10.

Reference