file_pipeline¶ ↑

The file_pipeline gem provides a framework for nondestructive application of file operation batches to files.

Installation¶ ↑

gem install file_pipeline

Dependencies¶ ↑

The file operations included in the gem require ruby-vips for image manipulation and multi_exiftool for image file metadata extraction and manipulation.

While these dependencies should be installed automatically with the gem, ruby-vips depends on libvips, and multi_exiftool depends on Exiftool, which will not be installed automatically.
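Because libvips and Exiftool have to be installed separately, it can help to check that they are reachable before running a pipeline. The following is a minimal sketch and not part of the gem: executable_on_path? is a hypothetical helper, and the command names vips and exiftool are assumptions about the local installation (libvips in particular may be installed as a library without its command-line tool).

```ruby
# Hypothetical helper (not part of file_pipeline): checks whether an
# executable with the given name exists in one of the directories listed
# in a PATH-style string.
def executable_on_path?(name, path = ENV.fetch('PATH', ''))
  path.split(File::PATH_SEPARATOR).any? do |dir|
    candidate = File.join(dir, name)
    File.file?(candidate) && File.executable?(candidate)
  end
end

# Assumed command names; adjust them to the local installation.
%w[vips exiftool].each do |tool|
  warn "#{tool} was not found on PATH" unless executable_on_path?(tool)
end
```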

Usage¶ ↑

The basic usage is to create a new FilePipeline::Pipeline object and define any file operations that are to be performed, apply it to a FilePipeline::VersionedFile object initialized with the image to be processed, then finalize the versioned file.

require 'file_pipeline'

# create a new instance of Pipeline
my_pipeline = FilePipeline::Pipeline.new

# configure an operation to scale an image to 1280 x 960 pixels
my_pipeline.define_operation('scale', :width => 1280, :height => 960)

# create an instance of VersionedFile for the file '~/image.jpg'
image = FilePipeline::VersionedFile.new('~/image.jpg')

# apply the pipeline to the versioned file
my_pipeline.apply_to(image)

# finalize the versioned file, replacing the original
image.finalize(:overwrite => true)

Setting up a Pipeline¶ ↑

Pipeline objects can be set up to contain default file operations included in the gem or with custom file operations (see Custom file operations for instructions on how to create custom operations).

Basic set up with default operations¶ ↑

To define an operation, pass the class name of the operation in underscore notation without the containing module name, and any options to #define_operation.

The example below adds an instance of PtiffConversion with the :tile_width and :tile_height options each set to 64 pixels.

my_pipeline = FilePipeline::Pipeline.new
my_pipeline.define_operation('ptiff_conversion',
                             :tile_width => 64, :tile_height => 64)
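The naming convention can be illustrated with a small sketch. The helpers underscore_name and class_name_for are hypothetical and only mirror the convention described above; the gem's own lookup code may work differently.

```ruby
# Hypothetical helpers illustrating the naming convention; not the gem's API.

# Converts a (possibly namespaced) class name to underscore notation
# without the containing module names.
def underscore_name(class_name)
  class_name.split('::').last
            .gsub(/([a-z\d])([A-Z])/, '\1_\2')
            .downcase
end

# Converts underscore notation back to a bare class name.
def class_name_for(underscored)
  underscored.split('_').map(&:capitalize).join
end

underscore_name('FilePipeline::FileOperations::PtiffConversion')
# => "ptiff_conversion"
class_name_for('exif_restoration')
# => "ExifRestoration"
```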

Calls to #define_operation can be chained:

my_pipeline = FilePipeline::Pipeline.new
my_pipeline.define_operation('scale', :width => 1280, :height => 1024)
           .define_operation('exif_restoration')

Alternatively, operations can be defined during initialization by passing a block to #new.

my_pipeline = FilePipeline::Pipeline.new do |pipeline|
  pipeline.define_operation('scale', :width => 1280, :height => 1024)
  pipeline.define_operation('exif_restoration')
end

When using the default operations included in the gem, it is sufficient to call #define_operation with the desired operations and options.

Using custom file operations¶ ↑

When file operations are to be used that are not included in the gem, place the source files for the class definitions in one or more directories and initialize the Pipeline object with the paths to those directories. The directories will be added to the source directories.

Directories are added to the source directories in reverse order, so that directories added later will have precedence when searching source files. The default operations directory included in the gem will be searched last. This allows overriding of operations without changing the code in existing classes.

If, for example, there are two directories with custom file operation classes, '~/custom_operations' and '~/other_operations', the new instance of Pipeline can be set up to look for source files first in '~/other_operations', then in '~/custom_operations', and finally in the included default operations.

The basename for source files must be the class name in underscore notation without the containing module name. If, for example, the operation is FileOperations::MyOperation, the source file basename has to be 'my_operation.rb'

my_pipeline = FilePipeline::Pipeline.new('~/custom_operations',
                                         '~/other_operations')
my_pipeline.define_operation('my_operation')
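The search order described above can be sketched as follows. locate_source is a hypothetical helper written for illustration; it assumes only that directories added later take precedence and does not reproduce the gem's internal loading code.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: returns the path of the first source file found,
# searching the directories in reverse order so that directories added
# later take precedence. Returns nil if no directory contains the file.
def locate_source(directories, operation_name)
  directories.reverse_each do |dir|
    candidate = File.join(dir, "#{operation_name}.rb")
    return candidate if File.exist?(candidate)
  end
  nil
end

Dir.mktmpdir do |root|
  custom = File.join(root, 'custom_operations')
  other = File.join(root, 'other_operations')
  [custom, other].each do |dir|
    FileUtils.mkdir_p(dir)
    File.write(File.join(dir, 'my_operation.rb'), "# placeholder\n")
  end

  # the directory added last is searched first
  puts locate_source([custom, other], 'my_operation')
end
```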

See Custom file operations for instructions on how to write file operations.

Nondestructive application to files¶ ↑

Pipelines work on versioned files, which allow for non-destructive application of all file operations.

To create a versioned file, initialize an instance with the path to the original file:

# create an instance of VersionedFile for the file '~/image.jpg'
image = FilePipeline::VersionedFile.new('~/image.jpg')

As long as no operations have been applied, this will have no effect in the file system. Only when the first operation is applied will VersionedFile create a working directory in the same directory as the original file. The working directory will have the name of the file basename without extension and the suffix '_versions' added.
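As an illustration of the naming rule, the sketch below derives the working directory path from the path of the original file. working_directory_for is a hypothetical helper, not a method of VersionedFile.

```ruby
# Hypothetical helper: builds the working directory path from the path of
# the original file (basename without extension plus '_versions', in the
# same directory as the original).
def working_directory_for(original)
  basename = File.basename(original, '.*')
  File.join(File.dirname(original), "#{basename}_versions")
end

working_directory_for('/home/user/image.jpg')
# => "/home/user/image_versions"
```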

Pipelines can be applied to a single versioned file with the #apply_to method of the pipeline instance, or to an array of versioned files with the #batch_apply method of the pipeline instance.

Accessing file metadata and captured data¶ ↑

Limitations: this currently only works for Exif metadata of image files.

VersionedFile provides access to a file's metadata via the #metadata method of the versioned file instance.

Metadata for the original file, the current (latest) or an arbitrary version can be accessed:

image = FilePipeline::VersionedFile.new('~/image.jpg')

# access the metadata for the current version
image.metadata

Note that if no file operations have been applied by a pipeline object, this will return the metadata for the original, which in that case is the current (latest) version.

To explicitly get the metadata for the original file even if there are newer versions available, pass the :for_version option with the symbol :original:

# access the metadata for the original file
image.metadata(:for_version => :original)

Some file operations can compromise metadata; many image processing libraries will not preserve all Exif tags and their values when converting images to a different format, but only write a subset of tags to the file they create. In these cases, the ExifRestoration operation can be used to try to restore the tags that have been discarded. The operation uses Exiftool to write tags, but Exiftool will not write all tags. The operation will store any tags and their values that could not be written back to the file and return them as captured data.

Likewise, if the ExifRedaction operation is applied to delete sensitive tags (e.g. GPS location data), it will return all deleted Exif tags and their values as captured data.

The #recovered_metadata of the versioned file instance will return a hash with all metadata that could not be restored:

my_pipeline = FilePipeline::Pipeline.new do |pipeline|
  pipeline.define_operation('scale', :width => 1280, :height => 1024)
  pipeline.define_operation('exif_restoration')
end

image = FilePipeline::VersionedFile.new('~/image.jpg')
my_pipeline.apply_to(image)

# return metadata that could not be restored
image.recovered_metadata

This method will not return data that was intentionally deleted with e.g. the ExifRedaction file operation. For information on retrieving that, or other kinds of captured data, refer to the versioned file instance methods #captured_data, #captured_data_for, and #captured_data_with.

Finalizing files¶ ↑

Once all file operations of a pipeline object have been applied to a versioned file object, it can be finalized by calling the #finalize method of the instance.

Finalization will write the current version to the same directory that contains the original. It will by default preserve the original by adding a suffix to the basename of the final version. If the :overwrite option for the method is passed with true, it will delete the original and write the final version to the same basename as the original.

image = FilePipeline::VersionedFile.new('~/image.jpg')

# finalize the versioned file, preserving the original
image.finalize

# finalize the versioned file, replacing the original
image.finalize(:overwrite => true)

The working directory with all other versions will be deleted after the final version has been written.

Custom file operations¶ ↑

Module nesting¶ ↑

File operation classes must be defined in the FilePipeline::FileOperations module for automatic requiring of source files to work.

Implementing from scratch¶ ↑

Initializer¶ ↑

The #initialize method must take an options argument (a hash with a default value, or a double splat), and the options must be exposed through an #options getter method.

The options can be anything needed to properly configure an instance of the class.

This requirement is imposed by the #define_operation instance method of Pipeline, which will automatically load and initialize an instance of the file operation with any options provided as a hash.

Examples¶ ↑
class MyOperation
  attr_reader :options

  # initializer with a default
  def initialize(options = {})
    @options = options
  end
end

class MyOperation
  attr_reader :options

  # initializer with a double splat
  def initialize(**options)
    @options = options
  end
end

Consider a file operation CopyrightNotice that will add copyright information to an image file’s Exif metadata. The value for the copyright tag could be passed as an option.

copyright_notice = CopyrightNotice.new(:copyright => 'The Photographer')

The run method¶ ↑

File operations must implement a #run method that takes three arguments (or a splat) in order to be used in a Pipeline.

Arguments¶ ↑

The three arguments required for implementations of #run are:

  • the path to the file to be modified

  • the path to the directory to which new files will be saved.

  • the path to the original file, from which the first version in a succession of modified versions has been created.

The original file will only be used by file operations that require it for reference, e.g. to restore or recover file metadata that was compromised by other file operations.

Return value¶ ↑

If the operation modifies the file (i.e. creates a new version), the #run method must return the path to the file that was created (preferably in the directory passed as the second argument). If it does not modify the file and no results are returned, it must return nil.

The method may return a Results object along with the path or nil. The results object should contain the operation itself, a success flag (true or false), and any logs or data returned by the operation.

If results are returned with the path to the created file, both values must be wrapped in an array, with the path as the first element and the results as the second. If the operation does not modify the file and therefore does not return a path, the first element of the array must be nil.

Example¶ ↑
def run(src_file, directory, original)
  # make a path to which the created file will be written
  out_file = File.join(directory, 'new_file_name.extension')

  # create a Results object reporting success with no logs or data
  results = Results.new(self, true, nil)

  # create a new out_file based on src_file in directory
  # ...

  # return the path to the new file and the results object
  [out_file, results]
end
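Since #run may return a path, nil, or an array holding a path (or nil) and a results object, code that consumes the return value has to handle all of these shapes. The sketch below shows one way to do that. normalize_run_value and the RunResults struct are hypothetical stand-ins for illustration, not part of the gem.

```ruby
# Stand-in for the gem's Results class, for illustration only.
RunResults = Struct.new(:operation, :success, :log_data)

# Hypothetical helper: turns any of the permitted return values of #run
# into a [path, results] pair, using nil for a missing element.
def normalize_run_value(value)
  path, results = value.is_a?(Array) ? value : [value, nil]
  [path, results]
end

results = RunResults.new(:some_operation, true, nil)

normalize_run_value('/tmp/new_file.jpg')            # path only
normalize_run_value(nil)                            # non-modifying, no results
normalize_run_value(['/tmp/new_file.jpg', results]) # path and results
normalize_run_value([nil, results])                 # results only
```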

Captured data tags¶ ↑

Captured data tags can be used to filter captured data accumulated during successive file operations.

Operations that return data as part of the results should respond to :captured_data_tag and return one of the tag constants.

Example¶ ↑
# returns NO_DATA
def captured_data_tag
  CapturedDataTags::NO_DATA
end
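To illustrate how tags allow filtering, the sketch below collects tagged data from several operations and selects the entries recorded under one tag. The tag symbols, the sample data, and the select_captured helper are all hypothetical; the actual tag constants are the ones defined under CapturedDataTags in the gem.

```ruby
# Hypothetical helper: returns only the data captured under the given tag.
# Each entry is a pair of [tag, data].
def select_captured(captured, tag)
  captured.select { |entry_tag, _data| entry_tag == tag }.map(&:last)
end

# Made-up tags and data, as they might accumulate over several operations.
captured = [
  [:dropped_tags, { 'Software' => 'Example Editor' }],
  [:redacted_tags, { 'GPSLatitude' => '52.5' }],
  [:dropped_tags, { 'CreatorTool' => 'Example Tool' }]
]

select_captured(captured, :dropped_tags).size # => 2
```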

Subclassing FileOperation¶ ↑

The FileOperation class is an abstract superclass that provides a scaffold to facilitate the creation of file operations that conform to the requirements.

It implements a #run method that takes the required three arguments and returns the path to the newly created file and a Results object.

When the operation was successful, success will be true. When an exception was raised, that exception will be rescued and returned as the log, and success will be false.

The standard #run method of the FileOperation class does not contain logic to perform the actual file operation, but will call an #operation method that must be defined in the subclass unless the subclass overrides the #run method.

If the operation is modifying (creates a new version), the #run method will generate the new path that is passed to the #operation method, and to which the latter will write the new version of the file. The new file path will need an appropriate file type extension. The default behavior is to assume that the extension will be the same as for the file that was passed in as the basis from which the new version will be created. If the operation will result in a different file type, the subclass should define a #target_extension method that returns the appropriate file extension (see Target file extensions).

Subclasses of FileOperation are by default modifying. If the operation is not modifying (does not create a new version of the file), the subclass must override the #modifies? method or override the #run method to ensure it does not return a file path (see Non-modifying operations).
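The division of labor between #run and #operation can be pictured with a simplified scaffold. This is a sketch written for illustration, not the gem's code: ScaffoldOperation, ScaffoldResults, CopyOperation, and the private new_path method are hypothetical, and the random file names and the value returned on failure are assumptions.

```ruby
require 'securerandom'

# Stand-in for the gem's Results class, for illustration only.
ScaffoldResults = Struct.new(:operation, :success, :log_data)

# Simplified scaffold in the spirit of FileOperation.
class ScaffoldOperation
  def run(src_file, directory, original = nil)
    out_file = new_path(src_file, directory) if modifies?
    log_data = operation(src_file, out_file, original)
    [out_file, ScaffoldResults.new(self, true, log_data)]
  rescue StandardError => e
    # the rescued exception becomes the log, and success is false
    [out_file, ScaffoldResults.new(self, false, e)]
  end

  # Subclasses are modifying unless they override this method.
  def modifies?
    true
  end

  # nil means: keep the extension of the source file.
  def target_extension
    nil
  end

  private

  # Assumption: new versions get a random, unique basename.
  def new_path(src_file, directory)
    extension = target_extension || File.extname(src_file)
    File.join(directory, SecureRandom.uuid + extension)
  end
end

# A subclass only has to provide #operation.
class CopyOperation < ScaffoldOperation
  def operation(src_file, out_file, _original)
    File.write(out_file, File.read(src_file))
    "copied #{File.basename(src_file)}"
  end
end
```

Calling CopyOperation.new.run(src_file, directory) then returns the generated path together with a results object whose success flag is true, while any exception raised inside #operation ends up in the results instead of propagating to the caller.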

Initializer¶ ↑

The initialize method must take an options argument (a hash with a default value or a double splat).

Options and defaults¶ ↑

The initializer can call super and pass the options hash and any defaults (a hash with default options). This will update the defaults with the actual options passed to initialize and assign them to the #options attribute. It will also transform any keys passed as strings into symbols.

If the initializer does not call super, it must assign the options to the @options instance variable or expose them through an #options getter method. It should transform keys into symbols.

If it calls super but must ensure some options are always set to a specific value, those should be set after the call to super.

Examples¶ ↑
# initializer without defaults calling super
def initialize(**options)
  super(options)
end

# initializer with defaults calling super
def initialize(**options)
  defaults = { :option_a => true, :option_b => false }
  super(options, defaults)
end

# initializer with defaults calling super, ensures :option_c => true
def initialize(**options)
  defaults = { :option_a => true, :option_b => false }
  super(options, defaults)
  @options[:option_c] = true
end

# initializer that does not call super
def initialize(**options)
  @options = options
end
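The way defaults and passed options are combined, as described above, can be sketched in a few lines. resolve_options is a hypothetical helper that mirrors the described behavior; it is not the gem's implementation.

```ruby
# Hypothetical helper mirroring the described behavior: keys passed as
# strings are converted to symbols, then the defaults are updated with
# the options that were actually passed.
def resolve_options(options, defaults = {})
  defaults.merge(options.transform_keys(&:to_sym))
end

defaults = { :option_a => true, :option_b => false }

# :option_a is taken from the passed options (false),
# :option_b keeps its default (false)
resolve_options({ 'option_a' => false }, defaults)
```

Because Hash#merge returns a new hash, the defaults themselves are left untouched and can be reused for other instances.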

The operation method¶ ↑

The #operation method contains the logic specific to a given subclass of FileOperation and must be defined in that subclass unless the #run method is overridden.

Arguments¶ ↑

The #operation method must accept three arguments:

  • the path to the file to be modified

  • the path for the file to be created by the operation.

  • the path to the original file, from which the first version in a succession of modified versions has been created.

The original file will only be used by file operations that require it for reference, e.g. to restore file metadata that was compromised by other file operations.

Return Value¶ ↑

The method can return anything that can be interpreted by LogDataParser, including nothing.

It will usually return any log output that the logic of #operation has generated, and/or data captured. If data is captured that is to be used later, the subclass should override the #captured_data_tag method to return the appropriate tag constant.

Examples¶ ↑
# creates out_file based on src_file, captures metadata differences
# between out_file and original, returns log messages and captured data
def operation(src_file, out_file, original)
  captured_data = {}
  log_messages = []

  # write the new version based on src_file to out_file
  # compare metadata of out_file with original, store any differences
  # in captured_data and append any log messages to log_messages

  [log_messages, captured_data]
end

# takes the third argument for the original file but does not use it
# creates out_file based on src_file, returns log messages
def operation(src_file, out_file, _)
  log_messages = []

  # write the new version based on src_file to out_file

  log_messages
end

# takes arguments as a splat and destructures them to avoid having the
# unused third argument
# creates out_file based on src_file, returns nothing
def operation(*args)
  src_file, out_file = args

  # write the new version based on src_file to out_file

  return
end

Non-modifying operations¶ ↑

If the operation will not create a new version, the class must redefine the #modifies? method to return false:

# non-modifying operation
def modifies?
  false
end

Target file extensions¶ ↑

If the file that the operation creates is of a different type than the file the version is based upon, the class must define the #target_extension method that returns the appropriate file type extension.

In most cases, the resulting file type will be predictable (static), and in such cases, the method can just return a string with the extension.

An alternative would be to provide the expected extension as an option to the initializer.

Examples¶ ↑
# always returns '.tiff'
def target_extension
  '.tiff'
end

# returns the extension specified in #options +:extension+
# my_operation = MyOperation.new(:extension => '.dng')
def target_extension
  options[:extension]
end