Mercury Web Parser
A simple Ruby wrapper for the Mercury Web Parser API
Installation
Add this line to your application's Gemfile:
gem 'mercury_web_parser'
And then execute:
$ bundle
Or install it yourself as:
$ gem install mercury_web_parser
Configuration
You must first obtain an API token from the fine folks at Mercury in order to make requests to their Web Parser API.
Single token usage:
MercuryWebParser.api_token = API_TOKEN
Or set multiple options with a block:
MercuryWebParser.configure do |parser|
parser.api_token = API_TOKEN
end
Multiple tokens or multithreaded usage:
client = MercuryWebParser::Client.new(api_token: API_TOKEN)
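The snippets above leave open where API_TOKEN comes from. One common approach is to read it from the environment rather than hard-coding it. The following is a minimal sketch of that idea; the helper name `fetch_api_token` and the variable name `MERCURY_API_TOKEN` are assumptions made for illustration and are not part of the gem.

```ruby
# Hypothetical helper (not part of mercury_web_parser): reads the token from
# an environment-like hash and fails early if it is missing or blank.
# The variable name MERCURY_API_TOKEN is an assumption for this sketch.
def fetch_api_token(env = ENV)
  token = env['MERCURY_API_TOKEN'].to_s.strip
  raise ArgumentError, 'MERCURY_API_TOKEN is not set' if token.empty?
  token
end

# With the gem loaded, the result could be passed to the configuration
# calls shown above, for example:
#   MercuryWebParser.api_token = fetch_api_token
#   client = MercuryWebParser::Client.new(api_token: fetch_api_token)
```

Failing early with a clear message tends to be easier to debug than an authentication error returned later by the API.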
Usage
Parse
Parse a webpage and return its main content:
article = MercuryWebParser.parse("http://sethgodin.typepad.com/seths_blog/2016/11/all-we-have-is-each-other.html")
=> #<MercuryWebParser::Article title="Seth's Blog", author=nil, date_published=nil, dek=nil, lead_image_url="http://www.sethgodin.com/sg/images/og.jpg", content="<div id=\"alpha-inner\" class=\"pkg\"> <div class=\"module-typelist module\">...", next_page_url="http://sethgodin.typepad.com/seths_blog/2016/11/choose-better.html", url="http://sethgodin.typepad.com/seths_blog/2016/11/all-we-have-is-each-other.html", domain="sethgodin.typepad.com", excerpt="", word_count=462, direction="ltr", total_pages=4, pages_rendered=4>
article.title
article.content
article.author
article.date_published
article.lead_image_url
article.dek
article.next_page_url
article.url
article.domain
article.excerpt
article.word_count
article.direction
article.total_pages
article.pages_rendered
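As a rough illustration of working with these fields, the sketch below builds a one-line summary of an article. It uses a Struct stand-in (`ArticleStub`, a hypothetical name) carrying a few of the attributes listed above so that it runs without the gem or network access; in real use the object would come from MercuryWebParser.parse. The helper `article_summary` is likewise an assumption, not something the gem provides.

```ruby
# Stand-in for MercuryWebParser::Article with a subset of its fields.
# ArticleStub is hypothetical and exists only so this sketch is runnable.
ArticleStub = Struct.new(:title, :author, :domain, :word_count, :next_page_url,
                         keyword_init: true)

# Hypothetical helper: builds a short description and handles nil fields,
# since author (and others) may be nil as in the example output above.
def article_summary(article)
  byline = article.author || 'unknown author'
  more = article.next_page_url ? ' (more pages available)' : ''
  "#{article.title} by #{byline}, #{article.word_count} words, " \
    "from #{article.domain}#{more}"
end

stub = ArticleStub.new(
  title: "Seth's Blog",
  author: nil,
  domain: 'sethgodin.typepad.com',
  word_count: 462,
  next_page_url: 'http://sethgodin.typepad.com/seths_blog/2016/11/choose-better.html'
)

puts article_summary(stub)
```

Because the helper only calls reader methods, any object that responds to the same attribute names should work in place of the stub.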
Inspiration
A clone of the readability_parser gem, adapted for the Mercury Web Parser API.