Shrimple
Launches PhantomJS to render web sites or local files (or have Phantom do pretty much anything). Shrimple started as a set of patches for Shrimp.
Development Ceased
While this gem still excels at parallel bulk rendering of PDFs, especially in the background, I don't have this need anymore. My applications have switched to on-demand rendering using ActiveJob and calling PhantomJS directly.
Feel free to use this gem if you want, and I'm happy to support it. I just won't be developing it anymore.
Installation
Install PhantomJS, add this line to your application's Gemfile:
gem 'shrimple'
then execute bundle
.
Right now we only work with the Phantom 1.9 series. 1.8 is too old and 2.0 is too new.
Usage
Render to a file:
require 'shrimple'
s = Shrimple.new( page: { paperSize: { format: 'A4' }} )
s.render_pdf('http://bl.ocks.org/mbostock', to: '/tmp/output.pdf')
Render to a variable by omitting the destination:
result = Shrimple.new.render_text('http://thingsididlastnight.com')
result.stdout # <== TODO: is the stdout name too arcane?
=> "Your Mom\n"
Render in the background (demonstrates both callbacks and waiting):
s = Shrimple.new(background: true)
s.onSuccess = ->(result) { File.write('/tmp/thumbs.png', result.stdout) }
s.onError = ->(result) { File.write('/tmp/thumbs.err', result.stderr) }
result = s.render_png('https://www.google.com/search?tbm=isch&q=rameses%20b%20wallpaper')
puts "waiting..." # printed immediately
result.wait # blocks until the render process exits
puts "That took #{result.stop_time - result.start_time} seconds."
Configuration
Shrimple supports all configuration options provided by PhantomJS, including unanticipated ones added in the future.
Options specified later override those specified earlier. Options passed directly to render only affect that particular call -- they are not remembered.
Here are some examples of passing options to Shrimple calls:
s = Shrimple.new( page: { zoomFactor: 0.5 }, timeout: 10 )
s.page.paperSize = { border: '3cm', format: 'A4', orientation: 'landscape' }
s.render_pdf('http://joeyh.name/blog/', to: '/tmp/joey.pdf', background: true)
PhantomJS Options
See default_config.rb for the known options all listed in one place.
-
Options passed to PhantomJS's command line are set with
config
:
s.config.loadImages = false
Phantom requires these to be in JSON notation:proxyType
instead of--proxy-type
. -
Options for PhantomJS's web page module are set with
page
:
s.page.paperSize.orientation = 'landscape'
-
Options for PhantomJS's render call are set, of course, with
render
:
s.render = { format: 'jpeg', quality: 85 }
Shrimple Options
-
background If true, the PhantomJS process will be spawned in the background and the render call returns immediately
background: false
-
timeout The time in seconds after which the PhantomJS executable is killed.
timeout: 0.5
-
output / to Specifies the destination file. If you don't specify a destination then the output is buffered into memory and can be retrieved with
result.stdout
.to
is just a more readable synonym foroutput
.
to: '/tmp/tt.gif'
-
stderr The path to save phantom's stderr. Normally it's buffered into memory and can be retrieved at any time with
result.stderr
. There's no harm in calling it multiple times to monitor the process's output. -
onSuccess A Ruby proc to be called when the render succeeds.
onSuccess = ->(result) { ftp.put(result.stdout) }
-
onError A Ruby proc called when the render fails or is killed.
onError = ->(result) { page_admin(result.stderr, result.options.to_hash) }
-
input specifies the source file to render. Normally you'd pass this as the first argument to render. Use this option if you want to specify the input file once and render it multiple times. You must specify a valid URL. Use
file://test_file.html
to specify a file on the local filesystem. -
execuatable a path to the phantomjs exectuable to use. Shrimple searches pretty hard for installed phantomjs executables so there's usually no need to specify this.
-
renderer the render.js script to pass to Phantom. Probably only useful for testing.
Examples
Here's a render pipeline that retrieves assets from a database, renders them, and uploads them to an FTP site. It keeps MAX_PROCESSES simultaneous Phantom processes running, and ensures no more than MAX_FTP_BACKLOG PDF files are waiting to be uploaded.
The pipeline stays as full as possible without violating its constraints.
# TODO: this could use some code review
MAX_PROCESSES = 4
MAX_FTP_BACKLOG = 8
# FTP runs in a separate thread, plucking files and uploading them
ftp_queue = SizedQueue.new(MAX_FTP_BACKLOG)
# TODO: either make this clearer or use pseudocode?
ftp_thread = Thread.new do
open_ftp_connection do |ftp|
done = false
while !done || (done && !ftp_queue.empty?)
name,data = ftp_queue.pop
if value == :done
done = true
else
send_file(ftp, name, data)
end
end
end
end
renderer = Shrimple.new
renderer.onSuccess = Proc.new do |result|
# If there's no room in the queue, this call blocks until there is.
ftp_queue.push([result.options.asset_file, result.stdout])
end
# finally, send each asset down the pipeline
Asset.find_each do |asset|
if Shrimple.processes.count >= MAX_PROCESSES
Shrimple.processes.wait_next # block until a slot opens up
end
# render the pdf into memory
renderer.render_pdf(asset.url, asset_file: asset.file)
end
ftp_thread.join # ensure all files are uploaded before returning
Changes to Shrimp
- Added background mode (even works in JRuby >1.7.4).
- Allows configuring pretty much anything: proxies, userName/password, scrollPosition, jpeg quality, etc.
- Prevents potential shell attacks by ensuring options aren't passed on the command line.
- Better error handling.
Copyright
Shrimp, the original project, is Copyright © 2012 adeven (Manuel Kniep). It is free software, and may be redistributed under the MIT License (see LICENSE.txt).
Shrimple is also Copyright © 2013 Scott Bronson and may be redistributed under the same terms.