Metapage
A tiny class for extracting title, description and some further information from given urls using open graph and regular meta tags.
Why? For example this can be used for enriching urls submitted in a chat application.
Features
- Fetch open graph info for a page, with fallback to regular meta tags to give something for most HTML urls
- Bulk-fetch info for any urls contained in a given text snippet
- Checks if the given URL's host resolves via Google DNS and
is not on a private netowrk subnet to slow down clever people
from entering private urls like
http://localhost:3000/secret
to explore your network.
Installing
Add gem 'metapage'
to your Gemfile
or gem install metapage
on your command line.
Usage
Fetch a specific URL. Returns nil
if the content is not html or an image or loading fails due to invalid url, http response or timeout.
pp Metapage.fetch("https://github.com/protonet/metapage").to_h
{:id=>"58329eab7cf73bb0b29123a63ae150cc59dcf2e3",
:title=>"protonet/metapage",
:description=>
"metapage - A tiny class for extracting title, description and some further information from given urls",
:image_url=>"https://avatars1.githubusercontent.com/u/375656?v=3&s=400",
:type=>"object",
:canonical_url=>"https://github.com/protonet/metapage",
:site_name=>"GitHub",
:media_type=>"text",
:content_type=>"text/html"}
For images, the resulting content is much more minimal:
pp Metapage.fetch("https://s-media-cache-ak0.pinimg.com/736x/e3/ce/b3/e3ceb3fe3224e104ad0f019117b8e1f0.jpg").to_h
{:id=>"7bd710044d61b55c63d1d0089632a6417f370f53",
:title=>nil,
:description=>nil,
:image_url=>
"https://s-media-cache-ak0.pinimg.com/736x/e3/ce/b3/e3ceb3fe3224e104ad0f019117b8e1f0.jpg",
:type=>"image",
:canonical_url=>
"https://s-media-cache-ak0.pinimg.com/736x/e3/ce/b3/e3ceb3fe3224e104ad0f019117b8e1f0.jpg",
:site_name=>nil,
:media_type=>"image",
:content_type=>"image/jpeg"}
Extract urls from a given string and fetch the metadata for them. Only returns successfully retrieved results.
msg = "The text is http://github.com/colszowka/simplecov and links to http://hamburg.onruby.de?foo=bar but also to invalid http://fooooooooonoexist.com"
Metapage.extract(msg).map(&:title)
#=> ['colszowka/simplecov', 'Hamburg on Ruby - Heimathafen der Hamburger Ruby Community']
Both Metapage.fetch
and Metapage.extract
have equivalent bang methods fetch!
and extract!
that will bubble HTTP or parsing exceptions instead of returning
nil or silently ignoring invalid urls.
Development
After checking out the repo, run bin/setup
to install dependencies. Then, run rake spec
to run the tests. You can also run bin/console
for an interactive prompt that will allow you to experiment.
To install this gem onto your local machine, run bundle exec rake install
. To release a new version, update the version number in version.rb
, and then run bundle exec rake release
, which will create a git tag for the version, push git commits and tags, and push the .gem
file to rubygems.org.
Contributing
- Fork it ( https://github.com/protonet/metapage/fork )
- Create your feature branch (
git checkout -b my-new-feature
) - Commit your changes (
git commit -am 'Add some feature'
) - Push to the branch (
git push origin my-new-feature
) - Create a new Pull Request