file_pipeline
The file_pipeline gem provides a framework for nondestructive application of file operation batches to files.
Installation

    gem install file_pipeline
Dependencies
The file operations included in the gem require ruby-vips for image manipulation and multi_exiftool for image file metadata extraction and manipulation.
While these dependencies should be installed automatically with the gem, ruby-vips depends on libvips, and multi_exiftool depends on Exiftool, which will not be installed automatically.
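Because the external tools are not installed with the gem, it can help to check for them before running a pipeline. The following is a minimal sketch that is not part of the gem. The helper executable_on_path? is hypothetical, and the command names 'exiftool' and 'vips' are assumptions about how the tools are installed on a given system (libvips itself may be present as a library without the 'vips' command).

```ruby
# Returns true if an executable file named +command+ exists in one of the
# directories listed in +path_env+ (defaults to the PATH environment variable).
# Hypothetical helper, not part of file_pipeline.
def executable_on_path?(command, path_env = ENV.fetch('PATH', ''))
  path_env.split(File::PATH_SEPARATOR).any? do |dir|
    candidate = File.join(dir, command)
    File.file?(candidate) && File.executable?(candidate)
  end
end

# Assumed command names; adjust them if your system uses different ones.
%w[exiftool vips].each do |tool|
  status = executable_on_path?(tool) ? 'found' : 'missing'
  puts "#{tool}: #{status}"
end
```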
Usage
The basic usage is to create a new FilePipeline::Pipeline object and define any file operations that are to be performed, apply it to a FilePipeline::VersionedFile object initialized with the image to be processed, then finalize the versioned file.

    require 'file_pipeline'

    # create a new instance of Pipeline
    my_pipeline = FilePipeline::Pipeline.new

    # configure an operation to scale an image to 1280 x 960 pixels
    my_pipeline.define_operation('scale', :width => 1280, :height => 960)

    # create an instance of VersionedFile for the file '~/image.jpg'
    image = FilePipeline::VersionedFile.new('~/image.jpg')

    # apply the pipeline to the versioned file
    my_pipeline.apply_to(image)

    # finalize the versioned file, replacing the original
    image.finalize(:overwrite => true)
Setting up a Pipeline
Pipeline objects can be set up to contain default file operations included in the gem or with custom file operations (see Custom file operations for instructions on how to create custom operations).
Basic set up with default operations
To define an operation, pass the class name of the operation in underscore notation without the containing module name, and any options to #define_operation.
The example below adds an instance of PtiffConversion with the :tile_width and :tile_height options each set to 64 pixels.

    my_pipeline = FilePipeline::Pipeline.new
    my_pipeline.define_operation('ptiff_conversion', :tile_width => 64, :tile_height => 64)
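The relationship between a class name and its underscore notation can be sketched as follows. Both helpers are hypothetical illustrations, not the gem's own implementation, which may treat edge cases such as acronyms differently.

```ruby
# Converts an underscore name to a class name,
# e.g. 'ptiff_conversion' becomes 'PtiffConversion'.
def class_name_for(underscore_name)
  underscore_name.split('_').map(&:capitalize).join
end

# Converts a class name to underscore notation, dropping any containing
# module names, e.g. 'FileOperations::PtiffConversion' becomes
# 'ptiff_conversion'.
def underscore_name_for(class_name)
  class_name.split('::').last
            .gsub(/([a-z\d])([A-Z])/, '\1_\2')
            .downcase
end

puts class_name_for('ptiff_conversion')
puts underscore_name_for('FilePipeline::FileOperations::ExifRestoration')
```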
Calls to #define_operation can be chained:

    my_pipeline = FilePipeline::Pipeline.new
    my_pipeline.define_operation('scale', :width => 1280, :height => 1024)
               .define_operation('exif_restoration')
Alternatively, operations can be defined during initialization by passing a block to #new.

    my_pipeline = FilePipeline::Pipeline.new do |pipeline|
      pipeline.define_operation('scale', :width => 1280, :height => 1024)
      pipeline.define_operation('exif_restoration')
    end
When using the default operations included in the gem, it is sufficient to call #define_operation with the desired operations and options.
Using custom file operations
When file operations are to be used that are not included in the gem, place the source files for the class definitions in one or more directories and initialize the Pipeline object with the paths to those directories. The directories will be added to the source directories.
Directories are added to the source directories in reverse order, so that directories added later will have precedence when searching source files. The default operations directory included in the gem will be searched last. This allows overriding of operations without changing the code in existing classes.
If, for example, there are two directories with custom file operation classes, '~/custom_operations' and '~/other_operations', the new instance of Pipeline can be set up to look for source files first in '~/other_operations', then in '~/custom_operations', and finally in the included default operations.
The basename for source files must be the class name in underscore notation without the containing module name. If, for example, the operation is FileOperations::MyOperation, the source file basename has to be 'my_operation.rb'.

    my_pipeline = FilePipeline::Pipeline.new('~/custom_operations', '~/other_operations')
    my_pipeline.define_operation('my_operation')

See Custom file operations for instructions on how to write file operations.
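The search precedence described in this section can be illustrated with a small sketch. The helpers search_order and locate_source_file and the placeholder default directory are hypothetical; they only model the documented behavior (later directories first, gem defaults last) and are not the gem's internal code.

```ruby
# Placeholder for the gem's default operations directory (assumed path).
DEFAULT_OPERATIONS_DIR = '/path/to/gem/default_operations'

# Returns the directories in the order they would be searched: directories
# passed later come first, the default directory comes last.
def search_order(custom_directories, default_directory = DEFAULT_OPERATIONS_DIR)
  custom_directories.reverse + [default_directory]
end

# Returns the first existing source file for +operation_name+ (underscore
# notation) in +directories+, or nil if none of them contains it.
def locate_source_file(operation_name, directories)
  directories.map { |dir| File.join(dir, "#{operation_name}.rb") }
             .find { |path| File.exist?(path) }
end

puts search_order(['~/custom_operations', '~/other_operations'])
```

With this ordering, a file 'scale.rb' placed in one of the custom directories would be found before the default operation of the same name, which is how operations can be overridden without editing existing classes.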
Nondestructive application to files
Pipelines work on versioned files, which allow for non-destructive application of all file operations.
To create a versioned file, initialize an instance with the path to the original file:

    # create an instance of VersionedFile for the file '~/image.jpg'
    image = FilePipeline::VersionedFile.new('~/image.jpg')
As long as no operations have been applied, this will have no effect on the file system. Only when the first operation is applied will VersionedFile create a working directory in the same directory as the original file. The working directory is named after the basename of the file without its extension, with the suffix '_versions' added.
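The naming rule for the working directory can be sketched with Ruby's File methods. The helper versions_directory_for is hypothetical and only illustrates the rule stated above; it does not create anything in the file system.

```ruby
# Returns the path of the working directory for +original_path+: the file
# basename without extension plus '_versions', next to the original file.
def versions_directory_for(original_path)
  extension = File.extname(original_path)
  basename = File.basename(original_path, extension)
  File.join(File.dirname(original_path), "#{basename}_versions")
end

puts versions_directory_for('/home/user/image.jpg')
# prints "/home/user/image_versions"
```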
Pipelines can be applied to a single versioned file with the #apply_to method of the pipeline instance, or to an array of versioned files with the #batch_apply method of the pipeline instance.
Accessing file metadata and captured data
Limitations: this currently only works for Exif metadata of image files.
VersionedFile provides access to a file's metadata via the #metadata method of the versioned file instance.
Metadata for the original file, the current (latest) or an arbitrary version can be accessed:

    image = FilePipeline::VersionedFile.new('~/image.jpg')

    # access the metadata for the current version
    image.metadata
Note that if no file operations have been applied by a pipeline object, this will return the metadata for the original, which in that case is the current (latest) version.
To explicitly get the metadata for the original file even if there are newer versions available, pass the :for_version option with the symbol :original:

    # access the metadata for the original file
    image.metadata(:for_version => :original)
Some file operations can compromise metadata: many image processing libraries will not preserve all Exif tags and their values when converting images to a different format, but only write a subset of tags to the file they create. In these cases, the ExifRestoration operation can be used to try to restore the tags that have been discarded. The operation uses Exiftool to write tags, and Exiftool will not write all tags. The operation will store any tags and their values that it could not write back to the file and return them as captured data.
Likewise, if the ExifRedaction operation is applied to delete sensitive tags (e.g. GPS location data), it will return all deleted Exif tags and their values as captured data.
The #recovered_metadata method of the versioned file instance will return a hash with all metadata that could not be restored:

    my_pipeline = FilePipeline::Pipeline.new do |pipeline|
      pipeline.define_operation('scale', width: 1280, height: 1024)
      pipeline.define_operation('exif_restoration')
    end

    image = FilePipeline::VersionedFile.new('~/image.jpg')
    my_pipeline.apply_to(image)

    # return metadata that could not be restored
    image.recovered_metadata
This method will not return data that was intentionally deleted with e.g. the ExifRedaction file operation. For information on retrieving that, or other kinds of captured data, refer to the versioned file instance methods #captured_data, #captured_data_for, and #captured_data_with.
Finalizing files
Once all file operations of a pipeline object have been applied to a versioned file object, it can be finalized by calling the #finalize method of the instance.
Finalization will write the current version to the same directory that contains the original. It will by default preserve the original by adding a suffix to the basename of the final version. If the :overwrite option for the method is passed with true, it will delete the original and write the final version to the same basename as the original.

    image = FilePipeline::VersionedFile.new('~/image.jpg')

    # finalize the versioned file, preserving the original
    image.finalize

    # finalize the versioned file, replacing the original
    image.finalize(:overwrite => true)
The work directory with all other versions will be deleted after the final version has been written.
Custom file operations
Module nesting
File operation classes must be defined in the FilePipeline::FileOperations module for automatic requiring of source files to work.
Implementing from scratch
Initializer
The #initialize method must take an options argument (a hash with a default value, or a double splat), and the options must be exposed through an #options getter method.
The options passed can be anything needed to properly configure an instance of the class.
This requirement is imposed by the #define_operation instance method of Pipeline, which will automatically load and initialize an instance of the file operation with any options provided as a hash.
Examples

    class MyOperation
      attr_reader :options

      # initializer with a default
      def initialize(options = {})
        @options = options
      end
    end

    class MyOperation
      attr_reader :options

      # initializer with a double splat
      def initialize(**options)
        @options = options
      end
    end
Consider a file operation CopyrightNotice that will add copyright information to an image file’s Exif metadata. The value for the copyright tag could be passed as an option.

    copyright_notice = CopyrightNotice.new(:copyright => 'The Photographer')
The run method
File operations must implement a #run method that takes three arguments (or a splat) in order to be used in a Pipeline.
Arguments
The three arguments required for implementations of #run are:
- the path to the file to be modified
- the path to the directory to which new files will be saved
- the path to the original file, from which the first version in a succession of modified versions has been created
The original file will only be used by file operations that require it for reference, e.g. to restore or recover file metadata that was compromised by other file operations.
Return value
If the operation modifies the file (i.e. creates a new version), the run method must return the path to the file that was created (preferably in the directory passed to the method). If it does not modify the file and no results are returned, it must return nil.
The method may return a Results object along with the path or nil. The results object should contain the operation itself, a success flag (true or false), and any logs or data returned by the operation.
If results are returned with the path to the created file, both values must be wrapped in an array, with the path as the first element and the results as the second. If the operation does not modify the file and therefore does not return a path, the first element of the array must be nil.
Example

    def run(src_file, directory, original)
      # make a path to which the created file will be written
      out_file = File.join(directory, 'new_file_name.extension')

      # create a Results object reporting success with no logs or data
      results = Results.new(self, true, nil)

      # create a new out_file based on src_file in directory
      # ...

      # return the path to the new file and the results object
      [out_file, results]
    end
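To show how the initializer and #run requirements fit together, here is a self-contained sketch of two operations written from scratch. Everything in it is an assumption for illustration: Results is a Struct stand-in for the gem's Results class, and CopyOperation and SizeReport are hypothetical operations. The sketch runs without the gem and only demonstrates the expected return shapes for a modifying and a non-modifying operation.

```ruby
require 'fileutils'
require 'tmpdir'

# Stand-in for the gem's Results class: operation, success flag, log or data.
Results = Struct.new(:operation, :success, :log_data)

# Hypothetical modifying operation: copies the file into the target directory.
class CopyOperation
  attr_reader :options

  def initialize(**options)
    @options = options
  end

  # Returns the path of the created file and a results object.
  def run(src_file, directory, _original)
    out_file = File.join(directory, File.basename(src_file))
    FileUtils.cp(src_file, out_file)
    [out_file, Results.new(self, true, nil)]
  end
end

# Hypothetical non-modifying operation: only reports the file size.
class SizeReport
  attr_reader :options

  def initialize(**options)
    @options = options
  end

  # Returns nil instead of a path because no new version is created.
  def run(src_file, _directory, _original)
    [nil, Results.new(self, true, { :size => File.size(src_file) })]
  end
end

Dir.mktmpdir do |dir|
  src_file = File.join(dir, 'source.txt')
  File.write(src_file, 'example')
  versions = File.join(dir, 'source_versions')
  Dir.mkdir(versions)

  [CopyOperation.new, SizeReport.new].each do |operation|
    path, results = operation.run(src_file, versions, src_file)
    puts "#{operation.class}: path=#{path.inspect} success=#{results.success}"
  end
end
```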
Captured data tags
Captured data tags can be used to filter captured data accumulated during successive file operations.
Operations that return data as part of the results should respond to :captured_data_tag and return one of the tag constants.
Example

    # returns NO_DATA
    def captured_data_tag
      CapturedDataTags::NO_DATA
    end
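The idea of filtering accumulated data by tag can be sketched as follows. The module SketchTags, the CapturedEntry struct, and the helper data_with_tag are all hypothetical stand-ins. Only NO_DATA is named in this document, so the other two tags are invented for the illustration and should not be read as the gem's actual constants.

```ruby
# Hypothetical tag constants; only NO_DATA mirrors a name used by the gem.
module SketchTags
  NO_DATA = :no_data
  DROPPED_METADATA = :dropped_metadata
  REDACTED_METADATA = :redacted_metadata
end

# Hypothetical record pairing a tag with the data an operation captured.
CapturedEntry = Struct.new(:tag, :data)

# Returns the data of all entries that carry the given tag.
def data_with_tag(entries, tag)
  entries.select { |entry| entry.tag == tag }.map(&:data)
end

entries = [
  CapturedEntry.new(SketchTags::DROPPED_METADATA, { 'Software' => 'Editor 1.0' }),
  CapturedEntry.new(SketchTags::NO_DATA, nil),
  CapturedEntry.new(SketchTags::REDACTED_METADATA, { 'GPSLatitude' => '0.0' })
]

p data_with_tag(entries, SketchTags::REDACTED_METADATA)
```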
Subclassing FileOperation
The FileOperation class is an abstract superclass that provides a scaffold to facilitate the creation of file operations that conform to the requirements.
It implements a #run method that takes the required three arguments and returns the path to the newly created file and a Results object.
When the operation was successful, success will be true. When an exception was raised, that exception will be rescued and returned as the log, and success will be false.
The standard #run method of the FileOperation class does not contain logic to perform the actual file operation, but will call an #operation method that must be defined in the subclass unless the subclass overrides the #run method.
If the operation is modifying (creates a new version), the #run method will generate the new path that is passed to the #operation method, and to which the latter will write the new version of the file. The new file path will need an appropriate file type extension. The default behavior is to assume that the extension will be the same as for the file that was passed in as the basis from which the new version will be created. If the operation will result in a different file type, the subclass should define a #target_extension method that returns the appropriate file extension (see Target file extensions).
Subclasses of FileOperation are by default modifying. If the operation is not modifying (does not create a new version of the file), the subclass must override the #modifies? method or override the #run method to ensure it does not return a file path (see Non-modifying operations).
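The division of labor between #run and #operation is a template method pattern, which the following self-contained sketch imitates. It is not the gem's FileOperation class: SketchFileOperation, SketchResults, and the three subclasses are hypothetical, the output file simply reuses the source basename, and what the real class returns as the path after a failure is not covered here.

```ruby
require 'tmpdir'

# Stand-in for the gem's Results class.
SketchResults = Struct.new(:operation, :success, :log_data)

# Hypothetical scaffold imitating the behavior described above.
class SketchFileOperation
  # Generates the target path (if modifying), delegates to #operation, and
  # wraps the outcome in a results object; exceptions become the log.
  def run(src_file, directory, original)
    out_file = modifies? ? File.join(directory, File.basename(src_file)) : nil
    log_data = operation(src_file, out_file, original)
    [out_file, SketchResults.new(self, true, log_data)]
  rescue StandardError => e
    [out_file, SketchResults.new(self, false, e)]
  end

  # Operations are modifying unless a subclass says otherwise.
  def modifies?
    true
  end
end

# Modifying operation: writes an upper-case copy of the source file.
class UpcaseOperation < SketchFileOperation
  def operation(src_file, out_file, _original)
    File.write(out_file, File.read(src_file).upcase)
    'converted to upper case'
  end
end

# Non-modifying operation: only captures the number of lines.
class LineCount < SketchFileOperation
  def modifies?
    false
  end

  def operation(src_file, _out_file, _original)
    { :lines => File.readlines(src_file).size }
  end
end

# Operation that always fails, to show how errors are reported.
class FailingOperation < SketchFileOperation
  def operation(*_args)
    raise ArgumentError, 'something went wrong'
  end
end

Dir.mktmpdir do |dir|
  src_file = File.join(dir, 'notes.txt')
  File.write(src_file, "hello\n")
  versions = File.join(dir, 'notes_versions')
  Dir.mkdir(versions)

  [UpcaseOperation, LineCount, FailingOperation].each do |klass|
    _path, results = klass.new.run(src_file, versions, src_file)
    puts "#{klass}: success=#{results.success} log=#{results.log_data.inspect}"
  end
end
```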
Initializer
The initialize method must take an options argument (a hash with a default value or a double splat).
Options and defaults
The initializer can call super and pass the options hash and any defaults (a hash with default options). This will update the defaults with the actual options passed to initialize and assign them to the #options attribute. It will also transform any keys passed as strings into symbols.
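The effect of updating defaults with the passed options and symbolizing keys can be approximated in plain Ruby. The helper merge_options is a hypothetical sketch of the described behavior, not the code that the superclass actually runs.

```ruby
# Returns the defaults updated with the given options; string keys in
# +options+ are converted to symbols before merging.
def merge_options(options, defaults = {})
  defaults.merge(options.transform_keys(&:to_sym))
end

defaults = { :option_a => true, :option_b => false }
p merge_options({ 'option_a' => false, :option_c => 1 }, defaults)
```

Options that are passed explicitly win over the defaults, and defaults that are not overridden are kept unchanged.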
If the initializer does not call super, it must assign the options to the @options instance variable or expose them through an #options getter method. It should transform keys into symbols.
If it calls super but must ensure some options are always set to a specific value, those should be set after the call to super.
Examples

    # initializer without defaults calling super
    def initialize(**options)
      super(options)
    end

    # initializer with defaults calling super
    def initialize(**options)
      defaults = { :option_a => true, :option_b => false }
      super(options, defaults)
    end

    # initializer with defaults calling super, ensures :option_c => true
    def initialize(**options)
      defaults = { :option_a => true, :option_b => false }
      super(options, defaults)
      @options[:option_c] = true
    end

    # initializer that does not call super
    def initialize(**options)
      @options = options
    end
The operation method
The #operation method contains the logic specific to a given subclass of FileOperation and must be defined in that subclass unless the #run method is overridden.
Arguments
The #operation method must accept three arguments:
- the path to the file to be modified
- the path for the file to be created by the operation
- the path to the original file, from which the first version in a succession of modified versions has been created
The original file will only be used by file operations that require it for reference, e.g. to restore file metadata that was compromised by other file operations.
Return value
The method can return anything that can be interpreted by LogDataParser, including nothing.
It will usually return any log output that the logic of #operation has generated, and/or data captured. If data is captured that is to be used later, the subclass should override the #captured_data_tag method to return the appropriate tag constant.
Examples

    # creates out_file based on src_file, captures metadata differences
    # between out_file and original, returns log messages and captured data
    def operation(src_file, out_file, original)
      captured_data = {}
      log_messages = []
      # write the new version based on src_file to out_file
      # compare metadata of out_file with original, store any differences
      # in captured_data and append any log messages to log_messages
      [log_messages, captured_data]
    end

    # takes the third argument for the original file but does not use it
    # creates out_file based on src_file, returns log messages
    def operation(src_file, out_file, _)
      log_messages = []
      # write the new version based on src_file to out_file
      log_messages
    end

    # takes arguments as a splat and destructures them to avoid having the
    # unused third argument
    # creates out_file based on src_file, returns nothing
    def operation(*args)
      src_file, out_file = args
      # write the new version based on src_file to out_file
      return
    end
Non-modifying operations
If the operation will not create a new version, the class must redefine the #modifies? method to return false:

    # non-modifying operation
    def modifies?
      false
    end
Target file extensions
If the file that the operation creates is of a different type than the file the version is based upon, the class must define the #target_extension method that returns the appropriate file type extension.
In most cases, the resulting file type will be predictable (static), and in such cases, the method can just return a string with the extension.
An alternative would be to provide the expected extension as an option to the initializer.
Examples

    # always returns '.tiff'
    def target_extension
      '.tiff'
    end

    # returns the extension specified in #options +:extension+
    # my_operation = MyOperation.new(:extension => '.dng')
    def target_extension
      options[:extension]
    end
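How a target extension might feed into the path of a new version can be sketched as follows. The helper target_path is hypothetical, and reusing the source basename is an assumption made for the example; the gem may name version files differently.

```ruby
# Builds the path for a new version in +directory+. The extension of
# +src_file+ is kept unless +target_extension+ is given.
def target_path(src_file, directory, target_extension = nil)
  extension = target_extension || File.extname(src_file)
  File.join(directory, File.basename(src_file, '.*') + extension)
end

puts target_path('/photos/image.jpg', '/photos/image_versions')
puts target_path('/photos/image.jpg', '/photos/image_versions', '.tiff')
```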