Proselytism
Document converter and text and image extractor using an OpenOffice headless server (JOD or PYOD converter), pdf_tools and net_pbm.
Supported formats for document conversion: odt, doc, rtf, sxw, docx, txt, html, htm, wps, pdf
Note
This gem was originally written as a Rails 3.2 engine running on Ruby 1.8.7.
It is framework agnostic and has been tested on Ubuntu and Mac OS X.
Installation
Install the required external libraries:
# aptitude install netpbm
# aptitude install xpdf
# aptitude install libreoffice
Add this line to your application's Gemfile:
gem 'proselytism'
Note: for Ruby 1.9, use the 1.9 branch:
gem 'proselytism', :git => "git://github.com/itkin/proselytism.git", :branch => "1.9"
And then execute:
$ bundle
Configuration
- With a YAML config file:
rails g proselytism:config
As a Rails engine, Proselytism automatically loads /config/proselytism.yml (if the file exists) and sets its config params according to the current Rails environment.
- With an initializer (optional for Rails App) :
You can override the configuration file params by adding a custom initializer to /config/initializers. By default Proselytism logs to a separate log file; to use the Rails logger instead:
#/config/initializers/proselytism.rb
Proselytism.config do |config|
  config.logger = Rails.logger
end
To generate a full config initializer:
rails g proselytism:initializer
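The environment-dependent YAML loading described in this section can be pictured roughly as follows. This is only a sketch under stated assumptions, not Proselytism's actual loader: the helper load_env_config and the log_path key are hypothetical, and the real keys are the ones found in the file produced by the generator.

```ruby
require 'yaml'

# Hypothetical YAML content; the real keys come from the generated file.
SAMPLE_YAML = <<-YAML
development:
  log_path: log/proselytism_dev.log
production:
  log_path: log/proselytism.log
YAML

# Hypothetical helper: return the settings for the given environment,
# or an empty hash when the environment has no section in the file.
def load_env_config(yaml_text, env)
  all = YAML.load(yaml_text) || {}
  all.fetch(env.to_s, {})
end

settings = load_env_config(SAMPLE_YAML, :production)
puts settings['log_path']
```

Falling back to an empty hash keeps an environment without its own section from raising, which matches the "if the file exists" leniency described above.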
Usage
Proselytism.convert source_file_path, :to => :pdf do |converted_file_path|
end
Proselytism.extract_text source_file_path do |extracted_text|
end
Proselytism.extract_images source_file_path do |image_files_paths|
end
Proselytism creates its converted files in temporary folders.
- If you pass a block to any of the methods above, the folder is automatically deleted once the block returns, so use or copy the file content within the block
- If you don't pass a block, the folder and its content remain on disk, so don't forget to safely remove them yourself
pdf_file_path = Proselytism.convert source_file_path, :to => :pdf
#my code
FileUtils.remove_entry_secure File.dirname(pdf_file_path)
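The block / no-block cleanup contract can be tried out without an OpenOffice server. The sketch below uses only Ruby's standard library; with_converted_file is a hypothetical stand-in for a Proselytism method and is not part of the gem.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical stand-in for a conversion method: writes a fake result into a
# fresh temporary folder and follows the block / no-block contract.
def with_converted_file(content)
  dir  = Dir.mktmpdir('proselytism_sketch')
  path = File.join(dir, 'result.txt')
  File.open(path, 'w') { |f| f.write(content) }
  return path unless block_given? # caller now owns the folder
  begin
    yield path
  ensure
    FileUtils.remove_entry_secure(dir) # folder is removed once the block returns
  end
end

# Block form: read or copy what you need before the block ends.
copied = with_converted_file('hello') { |path| File.read(path) }

# Blockless form: clean up explicitly when done.
leftover = with_converted_file('hello')
FileUtils.remove_entry_secure(File.dirname(leftover))
```

The ensure clause means the folder is removed even when the block raises, which is the safer reading of "automatically deleted"; whether the gem does the same on errors is an assumption.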
Add your own converters
Add your own converter by extending Proselytism::Converters::Base
- Your converter is selected automatically according to the formats declared with the from and to class methods
- Add a perform method which
  - calls the execute method with your custom command
  - returns the path(s) of the converted file(s)
Proselytism::Converters::Base takes care of
- raising an error (if the command execution fails)
- logging the command output
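To make these two responsibilities concrete, here is a minimal sketch of a command runner that logs the output and raises on failure. It relies on Ruby's standard Open3 and Logger; run_and_log and CommandError are hypothetical names, and this is not the gem's actual execute implementation.

```ruby
require 'open3'
require 'logger'

# Hypothetical error class, for illustration only.
class CommandError < StandardError; end

# Hypothetical helper: run a shell command, log its combined stdout and
# stderr, and raise when the exit status is non-zero.
def run_and_log(command, logger)
  output, status = Open3.capture2e(command)
  logger.info("#{command}\n#{output}")
  raise CommandError, output unless status.success?
  output
end

logger = Logger.new($stdout)
run_and_log('echo converted', logger)
```

Logging before raising keeps the failing command's output in the log even when the caller rescues the error.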
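class MyConverter < Proselytism::Converters::Base
  class Error < parent::Base::Error; end

  # Source and destination formats this converter handles
  from :ext1, :ext2
  to :ext3, :ext4

  def perform(origin, options={})
    destination = destination_file_path(origin, options)
    # Redirect stderr to stdout so errors appear in the command output
    command = "mycommand #{origin} #{destination} 2>&1"
    execute command
    # Return the path of the converted file
    destination
  end
end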
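The command string in the example above interpolates the file paths directly, so a path containing spaces or shell metacharacters would break it. One possible way to guard against that is to escape both paths with Ruby's standard Shellwords module. In this sketch, build_command is a hypothetical helper and mycommand is the same placeholder as above.

```ruby
require 'shellwords'

# Hypothetical helper: builds the shell command for a converter, escaping
# both paths so spaces and metacharacters are passed through literally.
def build_command(origin, destination)
  "mycommand #{Shellwords.escape(origin)} #{Shellwords.escape(destination)} 2>&1"
end

puts build_command('/tmp/my report.doc', '/tmp/out/my report.pdf')
```

Inside perform, the result of such a helper would be handed to execute in place of the hand-built string.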
Contributing
- Fork it
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create new Pull Request