docsplit_images
Docsplit images is used to convert a document file (pdf, xls, xlsx, ppt, pptx, doc, docx, etc...) to a list of images combining with famous paperclip gem at https://github.com/thoughtbot/paperclip
Installation
Install Docsplit gem dependency (Referring from http://documentcloud.github.com/docsplit/)
1. Install GraphicsMagick. Its ‘gm’ command is used to generate images. Either compile it from source, or use a package manager:
[aptitude | port | brew] install graphicsmagick
2. Install Poppler. On Linux, use aptitude, apt-get or yum:
aptitude install poppler-utils poppler-data
On Mac, you can install from source or use MacPorts:
sudo port install poppler | brew install poppler
3. (Optional) Install Ghostscript:
[aptitude | port | brew] install ghostscript
Ghostscript is required to convert PDF and Postscript files.
4. (Optional) Install Tesseract:
[aptitude | port | brew] install [tesseract | tesseract-ocr]
Without Tesseract installed, you'll still be able to extract text from documents, but you won't be able to automatically OCR them.
5. (Optional) Install pdftk. On Linux, use aptitude, apt-get or yum:
aptitude install pdftk
On the Mac, you can download a [http://www.pdflabs.com/docs/install-pdftk/](recent installer for the binary). Without pdftk installed, you can use Docsplit, but won't be able to split apart a multi-page PDF into single-page PDFs.
6. (Optional) Install OpenOffice. On Linux, use aptitude, apt-get or yum:
aptitude install openoffice.org openoffice.org-java-common
On Mac, download and install http://www.openoffice.org/download/index.html.
Install Gem
gem 'docsplit_images', :git => 'https://github.com/jameshuynh/docsplit_images.git', tag: "v0.2.4"
Setting Up
From terminal, type the command to install
bundle
rails g docsplit_images <table_name> <attachment_field_name>
# e.g. rails generate docsplit_images asset document
rake db:migrate
In your model:
class Asset < ActiveRecord::Base
...
attr_accessible :mydocument
has_attached_file :mydocument
docsplit_images_conversion_for :mydocument, {size: "800x"}
...
end
Processing Images
docsplit_images requires sidekiq to be turned on the process.
[bundle exec] sidekiq
While it is processing using https://github.com/collectiveidea/delayed_job, you can check if it is processing by accessing attribute is_processing_image
asset.is_processing_image?
Total number of pages
- If your document file is not PDF, this will be non-zero after the internal conversion to PDF has been completed.
asset.number_of_images_entry
Checking the number of images which has been completed
asset.number_of_completed_images
Checking the overall conversion progress
asset.images_conversion_progress
# => 0.45 (which is 45%)
Accessing list of images using document_images_list
document_images_list
will return a list of URL of images converting from the document
asset.document_images_list
# => ["/system/myfile_revisions/files/000/000/019/images/SBA_Admin_workflow_1.png", "/system/myfile_revisions/files/000/000/019/images/SBA_Admin_workflow_2.png", ...]
Contributing to docsplit_images
- Check out the latest master to make sure the feature hasn't been implemented or the bug hasn't been fixed yet.
- Check out the issue tracker to make sure someone already hasn't requested it and/or contributed it.
- Fork the project.
- Start a feature/bugfix branch.
- Commit and push until you are happy with your contribution.
- Make sure to add tests for it. This is important so I don't break it in a future version unintentionally.
- Please try not to mess with the Rakefile, version, or history. If you want to have your own version, or is otherwise necessary, that is fine, but please isolate to its own commit so I can cherry-pick around it.
Copyright
Copyright (c) 2012 jameshuynh. See LICENSE.txt for further details.