docsplit_images

Docsplit images is used to convert a document file (pdf, xls, xlsx, ppt, pptx, doc, docx, etc...) to a list of images combining with famous paperclip gem at https://github.com/thoughtbot/paperclip

Installation

Install Docsplit gem dependency (Referring from http://documentcloud.github.com/docsplit/)

1. Install GraphicsMagick. Its ‘gm’ command is used to generate images. Either compile it from source, or use a package manager:

[aptitude | port | brew] install graphicsmagick

2. Install Poppler. On Linux, use aptitude, apt-get or yum:

aptitude install poppler-utils poppler-data

On Mac, you can install from source or use MacPorts:

sudo port install poppler | brew install poppler

3. (Optional) Install Ghostscript:

[aptitude | port | brew] install ghostscript

Ghostscript is required to convert PDF and Postscript files.

4. (Optional) Install Tesseract:

[aptitude | port | brew] install [tesseract | tesseract-ocr]

Without Tesseract installed, you'll still be able to extract text from documents, but you won't be able to automatically OCR them.

5. (Optional) Install pdftk. On Linux, use aptitude, apt-get or yum:

aptitude install pdftk

On the Mac, you can download a [http://www.pdflabs.com/docs/install-pdftk/](recent installer for the binary). Without pdftk installed, you can use Docsplit, but won't be able to split apart a multi-page PDF into single-page PDFs.

6. (Optional) Install OpenOffice. On Linux, use aptitude, apt-get or yum:

aptitude install openoffice.org openoffice.org-java-common

On Mac, download and install http://www.openoffice.org/download/index.html.

Install Gem

gem 'docsplit_images', :git => 'https://github.com/jameshuynh/docsplit_images.git', tag: "v0.2.4"

Setting Up

From terminal, type the command to install

bundle
rails g docsplit_images <table_name> <attachment_field_name>
# e.g. rails generate docsplit_images asset document
rake db:migrate

In your model:

class Asset < ActiveRecord::Base
  ...
  attr_accessible :mydocument
  has_attached_file :mydocument
  docsplit_images_conversion_for :mydocument, {size: "800x"}
  ...
end

Processing Images

docsplit_images requires sidekiq to be turned on the process.

[bundle exec] sidekiq

While it is processing using https://github.com/collectiveidea/delayed_job, you can check if it is processing by accessing attribute is_processing_image

asset.is_processing_image?

Total number of pages

If your document file is not PDF, this will be non-zero after the internal conversion to PDF has been completed.

asset.number_of_images_entry

Checking the number of images which has been completed

asset.number_of_completed_images

Checking the overall conversion progress

asset.images_conversion_progress
# => 0.45 (which is 45%)

Accessing list of images using `document_images_list`

document_images_list will return a list of URL of images converting from the document

asset.document_images_list
# => ["/system/myfile_revisions/files/000/000/019/images/SBA_Admin_workflow_1.png", "/system/myfile_revisions/files/000/000/019/images/SBA_Admin_workflow_2.png", ...]

Contributing to docsplit_images

Check out the latest master to make sure the feature hasn't been implemented or the bug hasn't been fixed yet.
Check out the issue tracker to make sure someone already hasn't requested it and/or contributed it.
Fork the project.
Start a feature/bugfix branch.
Commit and push until you are happy with your contribution.
Make sure to add tests for it. This is important so I don't break it in a future version unintentionally.
Please try not to mess with the Rakefile, version, or history. If you want to have your own version, or is otherwise necessary, that is fine, but please isolate to its own commit so I can cherry-pick around it.

docsplit_images

Development

Runtime

docsplit_images

Installation

Install Docsplit gem dependency (Referring from http://documentcloud.github.com/docsplit/)

1. Install GraphicsMagick. Its ‘gm’ command is used to generate images. Either compile it from source, or use a package manager:

2. Install Poppler. On Linux, use aptitude, apt-get or yum:

3. (Optional) Install Ghostscript:

4. (Optional) Install Tesseract:

5. (Optional) Install pdftk. On Linux, use aptitude, apt-get or yum:

6. (Optional) Install OpenOffice. On Linux, use aptitude, apt-get or yum:

Install Gem

Setting Up

Processing Images

Total number of pages

Checking the number of images which has been completed

Checking the overall conversion progress

Accessing list of images using `document_images_list`

Contributing to docsplit_images

Copyright

docsplit_images

Development

Runtime

docsplit_images

Installation

Install Docsplit gem dependency (Referring from http://documentcloud.github.com/docsplit/)

1. Install GraphicsMagick. Its ‘gm’ command is used to generate images. Either compile it from source, or use a package manager:

2. Install Poppler. On Linux, use aptitude, apt-get or yum:

3. (Optional) Install Ghostscript:

4. (Optional) Install Tesseract:

5. (Optional) Install pdftk. On Linux, use aptitude, apt-get or yum:

6. (Optional) Install OpenOffice. On Linux, use aptitude, apt-get or yum:

Install Gem

Setting Up

Processing Images

Total number of pages

Checking the number of images which has been completed

Checking the overall conversion progress

Accessing list of images using document_images_list

Contributing to docsplit_images

Copyright

Accessing list of images using `document_images_list`