Nitlink is a nice, nitpicky gem for parsing Link headers, which sticks as closely as possible to Mark Nottingham's parsing algorithm (from his most recent redraft of RFC 5988). That means it's particularly good at handling weird edge cases, UTF-8 encoded parameters, URI resolution, boolean parameters and more. It also plays nicely with a bunch of popular HTTP client libraries, has an extensive test suite, and zero external dependencies.
Tested with Ruby versions from 1.9.3 up to 2.4.1. Ruby 2.0+ is fully supported. On 1.9.3 parsing is fully functional, but support for third-party clients is somewhat limited (because, for example, Net::HTTPResponse doesn't expose the request URI in 1.9.3).
Contents
- Installation
- Usage
- Feature comparison
- API
  - Nitlink::Parser
  - Nitlink::LinkCollection
  - Nitlink::Link
  - Nitlink::HashWithIndifferentAccess
- Changelog
- Developing Nitlink
- Contributing
- Future features
- License
- Author
Installation
Install the gem from RubyGems:
gem install nitlink
Or add it to your Gemfile and run bundle install:
gem 'nitlink', '~> 1.1'
And you're ready to go!
require 'httparty'
require 'nitlink/response'
HTTParty.get('https://www.w3.org/wiki/Main_Page').links.by_rel('last').target
=> #<URI::HTTPS https://www.w3.org/wiki/index.php?title=Main_Page&oldid=100698>
Usage
The most basic way to use Nitlink is to directly pass in an HTTP response from Net::HTTP:
require 'nitlink'
require 'net/http'
require 'awesome_print' # <- not required, just for this demo
link_parser = Nitlink::Parser.new
response = Net::HTTP.get_response(URI.parse 'https://api.github.com/search/code?q=addClass+user:mozilla')
links = link_parser.parse(response)
ap links
# =>
[
[0] #<Nitlink::Link:0x7fcd09019158
context = #<URI::HTTPS https://api.github.com/search/code?q=addClass+user:mozilla>,
relation_type = "next",
target = #<URI::HTTPS https://api.github.com/search/code?q=addClass+user%3Amozilla&page=2>,
target_attributes = {}
>,
[1] #<Nitlink::Link:0x7fcd09011fe8
context = #<URI::HTTPS https://api.github.com/search/code?q=addClass+user:mozilla>,
relation_type = "last",
target = #<URI::HTTPS https://api.github.com/search/code?q=addClass+user%3Amozilla&page=34>,
target_attributes = {}
>
]
links is actually a Nitlink::LinkCollection - an enhanced array which makes it convenient to grab a link based on its relation_type:
links.by_rel('next').target.to_s
#=> 'https://api.github.com/search/code?q=addClass+user%3Amozilla&page=2'
Third-party clients
Nitlink also supports a large number of third-party HTTP clients:
- Curb
- Excon
- Faraday
- http.rb
- httpclient
- HTTParty
- OpenURI (part of the standard lib)
- Patron
- REST Client
- Typhoeus
- Unirest - ⚠️ Deprecated, will be removed in Nitlink 2.0, see here.
You can pass an HTTP response from one of these libraries straight into the #parse method:
response = HTTParty.get('https://api.github.com/search/code?q=addClass+user:mozilla')
links = link_parser.parse(response)
For the extra lazy, you can instead require nitlink/response, which decorates the various response objects from third-party clients with a new #links method that returns the parsed Link headers from that response. nitlink/response must be required after the third-party client. (Note: Net::HTTPResponse also gets decorated, even though it's not technically third-party.)
require 'httparty'
require 'nitlink/response'
ap HTTParty.get('https://api.github.com/search/code?q=addClass+user:mozilla').links
# =>
[
[0] #<Nitlink::Link:0x7fcd09019158
context = #<URI::HTTPS https://api.github.com/search/code?q=addClass+user:mozilla>,
# ....
response.links is just syntactic sugar for calling Nitlink::Parser.new.parse(response).
Response as a hash
You can also pass the relevant response data as a hash (with keys as strings or symbols):
links = link_parser.parse({
request_uri: 'https://api.github.com/search/code?q=addClass+user:mozilla',
status: 200,
headers: { 'Link' => '<https://api.github.com/search/code?q=addClass+user%3Amozilla&page=2>; rel="next", <https://api.github.com/search/code?q=addClass+user%3Amozilla&page=34>; rel="last"' }
})
Non-GET requests
For fully correct behavior when making a request using an HTTP method other than GET, specify the method as the second argument of #parse:
response = HTTParty.post('https://api.github.com/search/code?q=addClass+user:mozilla')
links = link_parser.parse(response, 'POST')
This allows Nitlink to correctly set the context of links (resources fetched by a method other than GET or HEAD generally have an anonymous context). If you don't specify the method, everything else still works as normal.
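To make the context rule concrete, here is a deliberately simplified sketch in plain Ruby. The link_context helper and the SAFE_CONTEXT_METHODS constant are hypothetical names, not part of Nitlink; the sketch only looks at the request method, whereas Nitlink also takes the status code, the Content-Location header and the anchor parameter into account.

```ruby
require 'uri'

# Hypothetical names for illustration only; this is not Nitlink's code.
SAFE_CONTEXT_METHODS = %w[GET HEAD].freeze

# Returns the request URI as the link context for GET/HEAD requests,
# and nil (an "anonymous" context) for any other method.
def link_context(request_uri, http_method = 'GET')
  return nil unless SAFE_CONTEXT_METHODS.include?(http_method.to_s.upcase)
  URI.parse(request_uri.to_s)
end

link_context('https://api.github.com/search/code', 'GET')  # a URI::HTTPS object
link_context('https://api.github.com/search/code', 'POST') # nil
```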
Example: paginating GitHub search
Here we make an initial call to the GitHub API's search endpoint, then iterate through the pages of results using Link headers:
require 'nitlink'
require 'httparty'
link_parser = Nitlink::Parser.new
first_page = HTTParty.get('https://api.github.com/search/code?q=onwheel+user:mozilla')
links = link_parser.parse(first_page)
results = first_page.parsed_response['items']
# Follow rel="next" links until the last page has been fetched
while links.by_rel('next')
  response = HTTParty.get(links.by_rel('next').target.to_s)
  # Append the items from the page we just fetched (not the first page)
  results += response.parsed_response['items']
  links = link_parser.parse(response)
end
Feature comparison
A few different Link header parsers (in various languages) already exist. Some of them are quite lovely ☺️ ! Nitlink does its best to be as feature complete as possible; as far as I know it's the first library to cover all the areas the spec (RFC 5988) sets out:
Feature | Nitlink | parse-link-header | link_header | li | weblinking | link-headers | backbone-paginator | http-link | node-http-link-header |
---|---|---|---|---|---|---|---|---|---|
Encoded params (per RFC 5987) | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ |
URI resolution | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ |
Establish link context | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
Ignore quoted separators | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ✅ | ✅ |
Parse "weird" headers† | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ |
Proper escaping | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ✅ | ❌ |
Boolean attributes | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
Ignore duplicate params | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
Multiple relation types | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ |
† i.e. can it parse weird looking, but technically valid headers like <http://example.com/;;;,,,>; rel="next;;;,,, next"; a-zA-Z0-9!#$&+-.^_|~=!#$%&'()*+-./0-9:<=>?@a-zA-Z[]^_{|}~; title*=UTF-8'de'N%c3%a4chstes%20Kapitel?
API
Nitlink::Parser
#parse(response, method = 'GET') => Nitlink::LinkCollection
Accepts the following arguments:
- response (required) - The HTTP response whose Link header you wish to parse. Can be any one of:
  - An instance of Net::HTTPResponse or one of its subclasses
  - An instance of Curl::Easy, Excon::Response, Faraday::Response, HTTP::Message, HTTP::Response, HTTParty::Response, Patron::Response, RestClient::Response or Typhoeus::Response (or Unirest::HttpResponse, although this is deprecated and will be removed in Nitlink 2.0)
  - An instance of StringIO or Tempfile created by OpenURI's Kernel#open method
  - A Hash containing:
    - request_uri (String or URI) - the URI of the requested resource
    - status - the numerical status code of the response (e.g. 200)
    - headers (Hash or String) - headers can either be provided as a Hash of HTTP header fields (with keys being the field names and the values being the field values) or a raw HTTP header string (each field separated by CR-LF pairs). Only the Link and Content-Location headers are used by Nitlink. Nitlink treats field names case-insensitively.
      { headers: { 'Content-Location' => 'http://example.com', 'Link' => '</page/2>; rel=next' } }
      # Or
      { headers: "Content-Location: http://example.com\r\nLink: </page/2>; rel=next" }
- method (optional, String) - The HTTP method used to make the request. Defaults to 'GET'. This is used to establish the correct identity (per RFC 7231, Section 3.1.4.1)
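As an illustration of the raw header string form, the sketch below turns a CR-LF separated string into a hash keyed by lowercased field names, which is one simple way to get case-insensitive lookups. The headers_from_raw helper is hypothetical and is not how Nitlink is implemented; it also ignores repeated and folded header fields.

```ruby
# Hypothetical helper for illustration; not part of Nitlink.
def headers_from_raw(raw)
  raw.split(/\r\n/).each_with_object({}) do |line, fields|
    name, value = line.split(':', 2)
    next if value.nil? # skip lines that are not in "Name: value" form
    # Lowercase the field name so lookups can ignore case
    fields[name.strip.downcase] = value.strip
  end
end

raw = "Content-Location: http://example.com\r\nLINK: </page/2>; rel=next"
headers = headers_from_raw(raw)
headers['link']             # => "</page/2>; rel=next"
headers['content-location'] # => "http://example.com"
```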
Returns a Nitlink::LinkCollection containing Nitlink::Link objects:
- When the response contains no Link header an empty collection is returned
- Links without a relation type (rel) specified are omitted
- The links' parameters are serialized into the Nitlink::Link's target_attributes. For more details of the serialization see Nitlink::Link#target_attributes.
- Where a link has more than one relation type, one entry per relation type is appended:
ap parser.parse({
request_uri: 'http://example.com',
status: 200,
headers: { 'Link' => '</readme>; rel="about version-history"' }
})
# =>
[
[0] #<Nitlink::Link:0x7fcda9330be8
context = #<URI::HTTP http://example.com>,
relation_type = "about",
target = #<URI::HTTP http://example.com/readme>,
target_attributes = {}
>,
[1] #<Nitlink::Link:0x7fcda9330bc0
context = #<URI::HTTP http://example.com>,
relation_type = "version-history",
target = #<URI::HTTP http://example.com/readme>,
target_attributes = {}
>
]
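The "one entry per relation type" behavior can be pictured with a few lines of plain Ruby. This is only a sketch: SketchLink and expand_relation_types are hypothetical stand-ins, not Nitlink's classes or methods, and it assumes the rel value has already been extracted and unquoted.

```ruby
# Hypothetical stand-in for Nitlink::Link, for illustration only.
SketchLink = Struct.new(:context, :target, :relation_type, :target_attributes)

# Builds one link per whitespace-separated relation type in a rel value.
def expand_relation_types(context, target, rel_value)
  rel_value.split(/\s+/).reject(&:empty?).map do |relation_type|
    # Relation types are case-normalized to lowercase
    SketchLink.new(context, target, relation_type.downcase, {})
  end
end

links = expand_relation_types('http://example.com',
                              'http://example.com/readme',
                              'about version-history')
links.map(&:relation_type) # => ["about", "version-history"]
```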
If the Link header does not begin with "<", or "<" isn't followed by ">", it's considered malformed and unparseable, in which case a Nitlink::MalformedLinkHeaderError is raised. If response is an instance of a class which Nitlink doesn't know how to handle (e.g. from an unsupported third-party client), a Nitlink::UnknownResponseTypeError is raised.
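A rough sketch of the malformed-header rule described above, in plain Ruby. SketchMalformedLinkHeaderError and check_link_header! are hypothetical names used only for illustration; whether Nitlink strips surrounding whitespace before checking is an assumption made here.

```ruby
# Hypothetical error class standing in for Nitlink::MalformedLinkHeaderError.
class SketchMalformedLinkHeaderError < StandardError; end

# Raises unless the header value starts with "<" and has a later ">".
def check_link_header!(value)
  stripped = value.to_s.strip # assumption: surrounding whitespace is ignored
  unless stripped.start_with?('<') && stripped.include?('>')
    raise SketchMalformedLinkHeaderError, "malformed Link header: #{value.inspect}"
  end
  stripped
end

check_link_header!('</page/2>; rel=next') # => "</page/2>; rel=next"

begin
  check_link_header!('page/2; rel=next')
rescue SketchMalformedLinkHeaderError => e
  e.message # => "malformed Link header: \"page/2; rel=next\""
end
```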
Nitlink::LinkCollection
An extension of Array
with additional convenience methods for handling links based on their relation type.
#by_rel(relation_type) => Nitlink::Link or nil
Accepts the following argument:
- relation_type (required, String or Symbol) - a single relation type which the returned link should represent (e.g. by_rel('terms-of-service') would find a link pointing to legal terms).
Returns a single Nitlink::Link object whose relation_type attribute matches the relation type provided, or nil if the collection doesn't contain a matching link. If two links exist which match the provided relation type (this should never happen in practice), the first matching link in the collection is returned.
Raises an ArgumentError if the relation_type is blank.
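The lookup semantics above (first match wins, nil when nothing matches, ArgumentError on a blank argument) can be sketched with a small Array subclass. SketchLinkCollection and RelLink are hypothetical names, and downcasing the argument before comparing is an assumption rather than documented behavior.

```ruby
# Hypothetical stand-ins for Nitlink::Link and Nitlink::LinkCollection.
RelLink = Struct.new(:relation_type, :target)

class SketchLinkCollection < Array
  # Returns the first link with the given relation type, or nil.
  def by_rel(relation_type)
    wanted = relation_type.to_s.strip.downcase # assumption: lookup ignores case
    raise ArgumentError, 'relation_type cannot be blank' if wanted.empty?
    find { |link| link.relation_type == wanted }
  end
end

links = SketchLinkCollection.new([
  RelLink.new('next', 'https://example.com/?page=2'),
  RelLink.new('last', 'https://example.com/?page=34'),
  RelLink.new('next', 'https://example.com/?page=99')
])
links.by_rel(:next).target # => "https://example.com/?page=2"
links.by_rel('prev')       # => nil
```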
#to_h(options = { with_indifferent_access: true }) => Nitlink::HashWithIndifferentAccess or Hash
Also aliased as to_hash.
Accepts the following arguments:
- options (optional, Hash) - When options[:with_indifferent_access] is truthy (as it is by default) the method returns a Nitlink::HashWithIndifferentAccess where each key is a relation type and each value is a Nitlink::Link. When options[:with_indifferent_access] is falsy it returns the equivalent Hash with string keys.
An empty collection will return an empty Nitlink::HashWithIndifferentAccess/Hash. If two links exist which match a given relation type, the value will be the first link in the collection.
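The "first link wins" rule for the hash conversion can be shown with a short plain-Ruby sketch. PairLink and links_to_hash are hypothetical names, and the sketch only covers the plain Hash case with string keys.

```ruby
# Hypothetical stand-in for Nitlink::Link, for illustration only.
PairLink = Struct.new(:relation_type, :target)

# Builds a Hash of relation type => link, keeping the first link per type.
def links_to_hash(links)
  links.each_with_object({}) do |link, hash|
    hash[link.relation_type] ||= link # later duplicates are ignored
  end
end

hash = links_to_hash([
  PairLink.new('next', 'https://example.com/?page=2'),
  PairLink.new('next', 'https://example.com/?page=99'),
  PairLink.new('last', 'https://example.com/?page=34')
])
hash.keys           # => ["next", "last"]
hash['next'].target # => "https://example.com/?page=2"
links_to_hash([])   # => {}
```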
Nitlink::Link
A Struct representing a single link with a specific relation type. It has four attributes:
- context - the context of the link
- target - where the linked resource is located
- relation_type - the relation type, which identifies the semantics of the link
- target_attributes - a set of key/value pairs that give additional information about the link
#<Nitlink::Link:0x7fcda89489a0
context = #<URI::HTTP http://example.com>,
target = #<URI::HTTP http://example.com/readme>,
relation_type = "about",
target_attributes = {
"title" => "About us"
}
>
#context => URI or nil
Returns the context of the link as a URI object. Usually this will be the same as the request URI, but may be modified by the anchor parameter or Content-Location header. Additionally some HTTP request methods or status codes result in an "anonymous" link context being assigned (represented by nil).
#target => URI
Returns the target of the link as a URI object. If the URI given in the Link header is relative, Nitlink resolves it (based on the request URI).
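Relative reference resolution of this kind is available in Ruby's standard library through URI.join. The sketch below shows what resolving against a request URI looks like; the resolve_target helper and the example URLs are assumptions for illustration, not Nitlink's internals.

```ruby
require 'uri'

# Hypothetical helper: resolves a (possibly relative) link target
# against the request URI using the standard library.
def resolve_target(request_uri, href)
  URI.join(request_uri.to_s, href)
end

request_uri = 'http://example.com/articles/index?page=1'

resolve_target(request_uri, '/page/2').to_s
# => "http://example.com/page/2"
resolve_target(request_uri, 'page/2').to_s
# => "http://example.com/articles/page/2"
resolve_target(request_uri, '?page=2').to_s
# => "http://example.com/articles/index?page=2"
resolve_target(request_uri, 'https://other.example/x').to_s
# => "https://other.example/x" (absolute targets are left as they are)
```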
#relation_type => String
A single relation type, describing the kind of relationship this link represents. For example, "prev" would indicate that the target resource immediately precedes the context. It could also be an extension relation type (an absolute URI serialized as a string).
Relation types are always case-normalized to lowercase.
#target_attributes => Hash
Captures the values of the parameters that aren't used to construct the context or target (i.e. those other than rel and anchor), such as title.
Parameters ending in * are decoded per RFC 5987, bis-03. Where decoding fails, the parameter is omitted.
Boolean parameters (e.g. crossorigin) have their values set to nil. Any backslash escaped characters within quoted parameter values are unescaped. The names of attributes are case-normalized to lowercase. Only the first occurrences of media, title, title* or type parameters are parsed; subsequent occurrences are ignored.
If no additional parameters exist, target_attributes is an empty hash.
ap parser.parse({
request_uri: 'http://example.com',
status: 200,
headers: { 'Link' => %q{</about>; rel=about; title="About us"; title*=utf-8'en'About%20%C3%BCs; crossorigin} }
})
#=>
[
[0] #<Nitlink::Link:0x7fcda9274bc8
context = #<URI::HTTP http://example.com>,
relation_type = "about",
target = #<URI::HTTP http://example.com/about>,
target_attributes = {
"title" => "About us",
"title*" => "About üs",
"crossorigin" => nil
}
>
]
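To show roughly what decoding a title* value involves, here is a minimal plain-Ruby sketch of RFC 5987 style decoding (charset'language'percent-encoded-value). The decode_ext_value helper is hypothetical and much less strict than a spec-compliant parser; returning nil on failure mirrors the "parameter is omitted" behavior only loosely.

```ruby
# Hypothetical helper for illustration; not Nitlink's implementation.
# Decodes values of the form charset'language'percent-encoded-text.
def decode_ext_value(ext_value)
  charset, _language, encoded = ext_value.split("'", 3)
  return nil if encoded.nil? # not in charset'lang'value form

  # Turn each %XX escape into the raw byte it represents
  bytes = encoded.b.gsub(/%(\h\h)/) { [$1].pack('H2') }
  decoded = bytes.force_encoding(Encoding.find(charset))
  decoded.valid_encoding? ? decoded.encode('UTF-8') : nil
rescue ArgumentError, EncodingError
  nil # unknown charset or bytes that cannot be converted
end

decode_ext_value("utf-8'en'About%20%C3%BCs") # => "About üs"
decode_ext_value("utf-8'en'%FF")             # => nil (invalid UTF-8)
```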
Nitlink::HashWithIndifferentAccess
Implements a hash where keys :foo and "foo" are considered to be the same. It's closely modeled on Thor's implementation (Thor::CoreExt::HashWithIndifferentAccess), except without the magic predicates.
Instances are largely interchangeable with ActiveSupport::HashWithIndifferentAccess, but they don't have the additional methods not present on Hash, like stringify_keys, symbolize_keys, regular_writer etc.
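The core idea of indifferent access fits in a few lines. The class below is a stripped-down sketch that only covers [], []= and key?; SketchIndifferentHash is a hypothetical name and this is not the code Nitlink or Thor ship.

```ruby
# Minimal sketch of a hash that treats :foo and "foo" as the same key.
class SketchIndifferentHash < Hash
  def [](key)
    super(convert_key(key))
  end

  def []=(key, value)
    super(convert_key(key), value)
  end

  def key?(key)
    super(convert_key(key))
  end

  private

  # Symbols are stored and looked up as strings; other keys are untouched
  def convert_key(key)
    key.is_a?(Symbol) ? key.to_s : key
  end
end

links = SketchIndifferentHash.new
links[:next] = 'https://example.com/?page=2'
links['next']     # => "https://example.com/?page=2"
links.key?(:next) # => true
links.keys        # => ["next"]
```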
Changelog
Nitlink follows semantic versioning.
1.1.0 (1 September 2017)
- Remove dependency on hashwithindifferentaccess gem, implement native HashWithIndifferentAccess class modeled on Thor's implementation (see #1)
- Add with_indifferent_access option to #to_h, and alias #to_h as #to_hash
- Deprecate Unirest support (see #3)
- Improve compatibility and build process for Ruby 1.9.3 and 2.0
- Fix minor Curb regression
1.0.0 (7 November 2016)
- Initial release
Developing Nitlink
- Clone the git repo
git clone git://github.com/alexpeattie/nitlink.git
- Install dependencies
cd nitlink
bundle install
You can skip installing the various third-party HTTP clients Nitlink supports to get up and running faster (some specs will fail):
bundle install --without clients
- Run the test suite
bundle exec rspec
You can also generate a Simplecov coverage report by setting the COVERAGE environment variable:
COVERAGE=true bundle exec rspec
Contributing
Pull requests are very welcome! Please try to follow these simple rules if applicable:
- Fork it (https://github.com/alexpeattie/nitlink/fork)
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create a new Pull Request
Future features
- Validate (non-extension) relation types against those listed in the official Link Relation Type Registry
- Check the format of known parameters (e.g. type should be in the format foo/bar)
- Detect the language of the linked resource (from hreflang or language information in an encoded title*)
- Convert extension relation types to URIs
- Support for parameter fallback
- Add option to ignore links with anchor parameters
- Better handling of duplicate parameters
License
Nitlink is released under the MIT license. (See License.md)
Author
Alex Peattie / alexpeattie.com / @alexpeattie