Dhalang is a Ruby wrapper for Google's Puppeteer.
Features
- Generate PDFs from webpages
- Generate PDFs from HTML (external images and stylesheets supported)
- Capture screenshots from webpages
- Scrape HTML from webpages
Prerequisites
- Node ≥ 18
- Puppeteer ≥ 22
- Unix shell (Dhalang does not work with Windows shells)
Installation
Add this line to your application's Gemfile:
gem 'Dhalang'
And then execute:
$ bundle install
Install puppeteer (which downloads a compatible browser) or puppeteer-core (which does not, so you must supply your own Chromium) in your application's root directory:
$ npm install puppeteer
or
$ npm install puppeteer-core
Usage
PDF of a website url
Dhalang::PDF.get_from_url("https://www.google.com")
Pass the complete URL, including https:// or http://. Leaving out the scheme (or a www. prefix the site needs) will result in an error.
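If URLs come from user input, it can help to check them before handing them to Dhalang. The sketch below uses Ruby's standard URI library; the helper name absolute_http_url? is hypothetical and not part of Dhalang.

```ruby
require 'uri'

# Hypothetical helper (not part of Dhalang): returns true only for absolute
# http:// or https:// URLs with a host, the form get_from_url expects.
def absolute_http_url?(url)
  uri = URI.parse(url)
  # URI::HTTPS is a subclass of URI::HTTP, so both schemes pass this check.
  uri.is_a?(URI::HTTP) && !uri.host.nil? && !uri.host.empty?
rescue URI::InvalidURIError
  false
end

absolute_http_url?("https://www.google.com") # => true
absolute_http_url?("www.google.com")         # => false (no scheme)
```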
PDF of an HTML string
Dhalang::PDF.get_from_html("<html><head></head><body><h1>examplestring</h1></body></html>")
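In practice the HTML string is often rendered from a template. The sketch below uses Ruby's standard ERB library; the invoice template, its fields and the stylesheet URL are hypothetical. It uses an absolute URL for the stylesheet on the assumption that relative paths have no obvious base to resolve against when the page comes from a raw string.

```ruby
require 'erb'

# Hypothetical invoice template; any HTML string works with get_from_html.
TEMPLATE = ERB.new(<<~HTML)
  <html>
    <head><link rel="stylesheet" href="https://example.com/invoice.css"></head>
    <body>
      <h1>Invoice <%= ERB::Util.html_escape(number) %></h1>
      <p>Customer: <%= ERB::Util.html_escape(customer) %></p>
    </body>
  </html>
HTML

# Renders the template with the given values, HTML-escaping user data.
def invoice_html(number:, customer:)
  TEMPLATE.result_with_hash(number: number, customer: customer)
end

html = invoice_html(number: "2024-001", customer: "Ada & Co")
# pdf = Dhalang::PDF.get_from_html(html)
```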
PNG screenshot of a website
Dhalang::Screenshot.get_from_url("https://www.google.com", :png)
JPEG screenshot of a website
Dhalang::Screenshot.get_from_url("https://www.google.com", :jpeg)
WEBP screenshot of a website
Dhalang::Screenshot.get_from_url("https://www.google.com", :webp)
HTML of a website
Dhalang::Scraper.html("https://www.google.com")
The methods above return either a binary string containing the PDF/PNG/JPEG/WEBP data or, for Dhalang::Scraper.html, the scraped HTML.
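Because the result is a plain binary string, it can be written to disk with File.binwrite or inspected directly. The hypothetical helper below (not part of Dhalang) guesses which format a returned string holds from its leading "magic" bytes, which can be handy in tests.

```ruby
# Hypothetical helper (not part of Dhalang): guesses the format of a binary
# string from its leading magic bytes.
def detect_format(data)
  bytes = data.b # compare raw bytes regardless of the string's encoding
  if bytes.start_with?("%PDF-")
    :pdf
  elsif bytes.start_with?("\x89PNG\r\n\x1A\n".b)
    :png
  elsif bytes.start_with?("\xFF\xD8\xFF".b)
    :jpeg
  elsif bytes.start_with?("RIFF") && bytes[8, 4] == "WEBP"
    :webp # RIFF container: 4-byte size at offset 4, then "WEBP"
  else
    :unknown
  end
end

# Binary data should be written in binary mode, e.g.:
# File.binwrite("google.pdf", Dhalang::PDF.get_from_url("https://www.google.com"))
```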
Custom options
To override Dhalang's default options, pass a hash of custom options as the last argument.
For example, to set custom margins for PDFs:
Dhalang::PDF.get_from_url("https://www.google.com", {margin: { top: 100, right: 100, bottom: 100, left: 100}})
For example, to capture only the visible part of the page:
Dhalang::Screenshot.get_from_url("https://www.google.com", :webp, {fullPage: false})
All available PDF options are listed at: https://github.com/puppeteer/puppeteer/blob/main/docs/api/puppeteer.pdfoptions.md
All available screenshot options are listed at: https://github.com/puppeteer/puppeteer/blob/main/docs/api/puppeteer.screenshotoptions.md
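If an application uses the same PDF settings everywhere, it can keep its own defaults and merge per-call overrides into them. Whether Dhalang merges nested hashes such as margin key by key is not stated here, so the sketch below builds the complete hash itself before the call. APP_PDF_DEFAULTS and deep_merge are hypothetical names; format, printBackground and margin (numbers as pixels, or strings with units such as '1cm') are Puppeteer PDF options.

```ruby
# Hypothetical application-wide defaults (not Dhalang's own defaults).
APP_PDF_DEFAULTS = {
  format: 'A4',
  printBackground: true,
  margin: { top: '1cm', right: '1cm', bottom: '1cm', left: '1cm' }
}.freeze

# Merges overrides into the defaults, combining nested hashes such as
# :margin key by key instead of replacing them wholesale.
def deep_merge(base, overrides)
  base.merge(overrides) do |_key, old_value, new_value|
    if old_value.is_a?(Hash) && new_value.is_a?(Hash)
      deep_merge(old_value, new_value)
    else
      new_value
    end
  end
end

options = deep_merge(APP_PDF_DEFAULTS, { margin: { top: '2cm' } })
# options[:margin] => { top: '2cm', right: '1cm', bottom: '1cm', left: '1cm' }
# Dhalang::PDF.get_from_url("https://www.google.com", options)
```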
Puppeteer's PDF options include headerTemplate and footerTemplate, both of which Puppeteer expects to be HTML strings. By default, Dhalang passes all options as arguments to a node ... shell command. If these HTML strings are very long, they may exceed the host's maximum argument length; on Linux, for example, a single argument is limited to 128 KiB (MAX_ARG_STRLEN). You can therefore also pass the header and footer as file paths using the options headerTemplateFile and footerTemplateFile. These are not Puppeteer options; Dhalang uses them to populate the Puppeteer options headerTemplate and footerTemplate. For example:
Dhalang::PDF.get_from_url("https://www.google.com", {headerTemplateFile: '/tmp/header.html', footerTemplateFile: '/tmp/footer.html'})
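One way to decide between the two forms at runtime is to check the template's byte size and fall back to a temporary file when it is large. The sketch below is hypothetical (header_template_options and the 64 KiB threshold are not part of Dhalang); it uses Ruby's standard Tempfile. It also sets displayHeaderFooter, a Puppeteer PDF option without which headers and footers are not rendered; whether Dhalang enables it by default is not stated here.

```ruby
require 'tempfile'

# Linux caps a single argument at 128 KiB (MAX_ARG_STRLEN). Staying well
# below that is an assumption, since other options share the command line.
INLINE_TEMPLATE_LIMIT = 64 * 1024

# Hypothetical helper (not part of Dhalang): passes a small header template
# inline, otherwise writes it to a temp file and passes headerTemplateFile.
# Returns [options, tempfile_or_nil] so the caller can keep the file alive
# until the PDF has been generated.
def header_template_options(html)
  # Puppeteer only renders headers and footers when displayHeaderFooter is true.
  options = { displayHeaderFooter: true }
  return [options.merge(headerTemplate: html), nil] if html.bytesize <= INLINE_TEMPLATE_LIMIT

  file = Tempfile.new(['header', '.html'])
  file.write(html)
  file.flush # make sure the contents are on disk before Node reads the file
  [options.merge(headerTemplateFile: file.path), file]
end

# Usage sketch; Puppeteer fills elements with classes such as pageNumber and
# totalPages when rendering the template:
# header = '<div style="font-size:10px"><span class="pageNumber"></span>/<span class="totalPages"></span></div>'
# options, file = header_template_options(header)
# begin
#   pdf = Dhalang::PDF.get_from_url("https://www.google.com", options)
# ensure
#   file&.close! # closes and deletes the temp file, if one was created
# end
```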
The table below lists further configuration parameters:
Key | Description | Default |
---|---|---|
isHeadless | Indicates whether Chromium is launched headless; set to false to watch the browser while debugging | true |
slowMo | Number of milliseconds by which to slow down Puppeteer operations (useful for debugging) | 0 |
browserWebsocketUrl | WebSocket URL of a remote Chromium browser to use | None |
navigationTimeout | Number of milliseconds after which Puppeteer times out when navigating to the given page | 10000 |
printToPDFTimeout | Number of milliseconds after which Puppeteer times out when calling Page.printToPDF | 0 (unlimited) |
navigationWaitForSelector | If set, Dhalang will wait for the specified selector to appear before creating the screenshot or PDF | None |
navigationWaitForXPath | If set, Dhalang will wait for the specified XPath to appear before creating the screenshot or PDF | None |
userAgent | User agent to send with the request | Default Puppeteer one |
isAutoHeight | When set to true the height of generated PDFs will be based on the scrollHeight property of the document body | false |
viewPort | Custom viewport to use for the request | Default Puppeteer one |
httpAuthenticationCredentials | Custom HTTP authentication credentials to use for the request | None |
chromeOptions | An array of options passed to Puppeteer in addition to the mandatory ['--no-sandbox', '--disable-setuid-sandbox'] | [] |
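Several of these parameters can be combined in one options hash. The sketch below is illustrative only: the selector, user agent, credentials and URL are hypothetical, and the shapes of viewPort and httpAuthenticationCredentials are assumptions based on Puppeteer's page.setViewport ({ width, height }) and page.authenticate ({ username, password }), so check them against Dhalang's source.

```ruby
# Illustrative option hash combining parameters from the table above.
report_options = {
  navigationTimeout: 30_000,                    # milliseconds
  navigationWaitForSelector: '#content',        # hypothetical selector on the target page
  userAgent: 'MyApp PDF renderer/1.0',          # hypothetical user agent string
  viewPort: { width: 1280, height: 800 },       # assumed shape, as for page.setViewport
  httpAuthenticationCredentials: {              # assumed shape, as for page.authenticate
    username: ENV.fetch('SITE_USER', 'demo'),
    password: ENV.fetch('SITE_PASSWORD', 'demo')
  },
  chromeOptions: ['--disable-gpu']              # extra Chromium launch argument
}
# pdf = Dhalang::PDF.get_from_url("https://example.com/report", report_options)
```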
Examples of using Dhalang
To return a PDF from a Rails controller you can do the following:
def example_controller_method
  binary_pdf = Dhalang::PDF.get_from_url("https://www.google.com")
  send_data(binary_pdf, filename: 'pdfofgoogle.pdf', type: 'application/pdf')
end
To return a screenshot from a Rails controller you can do the following:
def example_controller_method
  binary_png = Dhalang::Screenshot.get_from_url("https://www.google.com", :png)
  send_data(binary_png, filename: 'screenshotofgoogle.png', type: 'image/png')
end
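To serve any of the three screenshot formats from a single action, the file extension and MIME type can be looked up from the type symbol. SCREENSHOT_FORMATS, screenshot_send_data_args and the :type request parameter below are hypothetical names; the lookup rejects unsupported types before Dhalang is called.

```ruby
# Hypothetical mapping (not part of Dhalang) from the screenshot type symbols
# accepted by Dhalang::Screenshot.get_from_url to file extensions and MIME types.
SCREENSHOT_FORMATS = {
  png:  { extension: 'png',  mime: 'image/png' },
  jpeg: { extension: 'jpg',  mime: 'image/jpeg' },
  webp: { extension: 'webp', mime: 'image/webp' }
}.freeze

# Builds the options hash for send_data; raises KeyError for unsupported types.
def screenshot_send_data_args(type, basename)
  format = SCREENSHOT_FORMATS.fetch(type)
  { filename: "#{basename}.#{format[:extension]}", type: format[:mime] }
end

# In a Rails controller (sketch):
# def screenshot
#   type = params.fetch(:type, 'png').to_sym
#   data_args = screenshot_send_data_args(type, 'screenshotofgoogle')
#   binary = Dhalang::Screenshot.get_from_url("https://www.google.com", type)
#   send_data(binary, data_args)
# end
```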