Pdf::Reader::Markup
A markup extension for the PDF::Reader library.
As well as continuing to support fetching a collection of lines for an individual page in a PDF file, this adds the method formatted_lines which uses HTML-style tags to mark up bold and italic text.
Installation
Add this line to your application's Gemfile:
gem 'pdf-reader-markup'
And then execute:
$ bundle
Or install it yourself as:
$ gem install pdf-reader-markup
Usage
Require the gem in the source file that contains the PDF-handling code:
require 'pdf/reader/markup'
You should now be able to use the custom MarkupPage handler to get back matching plaintext and formatted lines for each page:
pdf = PDF::Reader.new("./spec/sample docs/Dorian_Gray_excerpt.pdf")
page = PDF::Reader::MarkupPage.new(pdf.pages[1])
# slightly modified version of the lines() method
lines_of_plaintext = page.lines()
#the new formatted_line() method
lines_with_markup = page.formatted_lines()
# and not forgetting content() which will return the all the lines as
# a solid block of text
entire_page_text = page.content()
# and its formatted equivalent markup
entired_page_markup = page.markup()
Note that you can still access the original PDF::Reader methods within the
same project by using PDF::Reader::PageTextReceiver
and walking the page,
giving access to the standard content and lines as functionality.
You can also, if you prefer, use the
Reader::MarkupPage::PageBoldItalicReceiver
receiver directly rather than
using the PDF::Reader::MarkupPage wrapper.