Syndication 0.6
This module provides classes for parsing web syndication feeds in RSS and Atom formats.
To parse RSS, use Syndication::RSS::Parser.
To parse Atom, use Syndication::Atom::Parser.
If you want my advice on which to generate, my order of preference would be:
- Atom 1.0
- RSS 1.0
- RSS 2.0
My reasoning is simply that I hate having to sniff for HTML (see Syndication::RSS).
License
Syndication is Copyright 2005-2011 mathew <meta@pobox.com>, and is licensed under the same terms as Ruby.
Requirements
Built and tested using Ruby 1.9.2. Needs only the standard library.
Rationale
Ruby already has an RSS library as part of the standard library, so you might be wondering why I decided to write another one.
I started out trying to document the standard rss module, but found the code rather impenetrable. It was also difficult to see how it could be made documentable via RDoc.
Then I tried writing code to use the standard RSS library, and discovered that it had a number of (what I consider to be) defects:
- It didn’t support RSS 2.0 with extensions (such as iTunes podcast feeds), and it wasn’t clear to me how to extend it to do so.
- It didn’t support RSS 0.9.
- It didn’t support Atom.
- The API is different depending on what kind of RSS feed you are parsing.
I asked around, and discovered that I wasn’t the only person dissatisfied with the RSS library. Since fixing the problems would have resulted in breaking existing code that used the RSS module, I opted for an all-new implementation.
This is the result. The first release was version 0.4, which was actually my fourth attempt at putting together a clean, simple, universal API for RSS and Atom parsing. (The first three never saw public release.)
Features
Here are what I see as the key improvements over the rss module in the Ruby standard library:
- Supports all RSS versions, including RSS 0.9, as well as Atom.
- Provides a unified API/object model for accessing the decoded data, with no need to know what format the feed is in.
- Allows use of extended RSS 2.0 feeds.
- Simple API, fully documented.
- Test suite with over 220 test assertions.
- Commented source code.
- Less source code than the standard library rss module.
- Faster than the standard library (at least, in my tests).
Other features:
- Optional support for RSS 1.0 Dublin Core, Syndication and Content modules, Apple iTunes Podcast elements, and Google Calendar.
- Content module decodes CDATA-escaped or encoded HTML content for you.
- Supports namespaces, and encoded XHTML/HTML in Atom feeds.
- Dates decoded to Ruby DateTime objects. Note, however, that this is slow, so parsing is only performed if you ask for the value.
- Simple to extend to support your own RSS extensions, uses reflection.
- Uses REXML fast stream parsing API for speed, or built-in TagSoup parser for invalid feeds.
- Non-validating, tries to be as forgiving as possible of structural errors.
- Remaps namespace prefixes to standard values if it recognizes the module’s URL.
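To illustrate the lazy date decoding mentioned in the list above, here is a minimal sketch. LazyItem and its method names are hypothetical and are not part of Syndication; the point is only that the raw string is kept at parse time and the relatively slow DateTime.parse call is deferred until the value is requested.

```ruby
require 'date'

# Hypothetical item class (an assumption for illustration, not Syndication's
# own code): it stores the raw date string and parses it only on demand.
class LazyItem
  attr_reader :raw_pubdate

  def initialize(raw_pubdate)
    @raw_pubdate = raw_pubdate
    @pubdate = nil
  end

  # Parse on first access, then reuse the memoized DateTime.
  def pubdate
    @pubdate ||= DateTime.parse(@raw_pubdate)
  end

  # True once the date string has actually been parsed.
  def date_parsed?
    !@pubdate.nil?
  end
end

item = LazyItem.new('Sat, 07 Sep 2002 09:42:31 GMT')
item.date_parsed?   # false: no parsing cost has been paid yet
item.pubdate.year   # 2002: this call triggers the parse
item.date_parsed?   # true: later calls reuse the stored object
```

A feed reader that only lists titles and links would never pay for date parsing under this scheme.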
In the interests of balance, here are some key disadvantages compared with the standard library RSS support:
- No support for generating RSS feeds, only for parsing them. If you’re using Rails, you can use RXML; if not, you can use rss/maker. My feeling is that XML generation isn’t a wheel that needs reinventing.
- Different API, not a drop-in replacement.
- Incomplete support for Atom 0.3 draft. (Anyone still using it?)
- No support for base64 data in Atom feeds (yet).
- No Japanese documentation.
- No XSL output options.
- Slower if there are dates in the feed and you ask for their values.
Other options
There are, of course, other Ruby RSS/Atom libraries out there. The ones I know about:
simple-rss
rubyforge.org/projects/simple-rss
Pros:
- Much smaller than syndication or rss.
- Completely non-validating.
- Backwards compatible with rss in standard library.
Cons:
- Doesn’t use a real XML parser.
- No support for namespaces.
- Incomplete Atom support (e.g. can’t get name and e-mail of <atom:person> elements as separate fields, you still have to decode XHTML data yourself).
- No documentation.
For the record, I started work on my library long before simple-rss was announced.
feedtools
rubyforge.org/projects/feedtools/
This one solves most of the same problems as Syndication; however the two were developed in parallel, in ignorance of each other.
Feedtools builds in database caching and persistence, and HTTP fetching. Personally, I don’t think those belong in a feed parsing library; they are easily implemented using other standard libraries if you want them.
Pros:
- Lots of test cases.
- Used by lots of Rails people.
- Knows about many more namespaces.
- Can generate feeds.
Cons:
- Skimpy documentation.
- Uses HTree then XPath parsing, rather than a single stream parse.
- Tries to unify RSS and Atom APIs, at the expense of Atom functionality. (Which could also be a pro, depending on your viewpoint.)
Design philosophy
Here’s my design philosophy for this module:
- The interface should be via standard Ruby objects and methods; e.g. feed.channel.item.title, rather than (say) a dictionary hash.
- It should be easier to parse RSS via the module than to hack something together using REXML, even if all you want is a list of titles and URLs.
- It should be easy to add support for new RSS extensions without needing to know anything about reflection or other advanced topics. Just define a mixin with a bunch of appropriately-named methods, and you’re done.
- The code should be simple to understand.
- Even so, good complete documentation is extremely important.
- Be lenient in what you accept.
- Be conservative in what you generate.
- Get well-formed feeds parsing reliably, then worry about broken feeds.
- Atom will hopefully be the future. Provide full support for RSS, but don’t hold Atom back by trying to force it into an RSS data model.
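The mixin idea from the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not the library’s actual implementation: PodcastExtras, SketchItem and the store method are hypothetical names, and the prefix_element naming rule is assumed for the example.

```ruby
# Hypothetical extension mixin: one accessor per element, named
# prefix_element (naming rule assumed for this sketch).
module PodcastExtras
  attr_accessor :itunes_author, :itunes_duration
end

# Hypothetical item class; the real Syndication classes differ.
class SketchItem
  include PodcastExtras
  attr_accessor :title

  # Map a tag such as "itunes:author" to the setter "itunes_author=" and
  # call it only if the class or one of its mixins defines that method.
  def store(tag, text)
    setter = "#{tag.tr(':', '_')}="
    if respond_to?(setter)
      public_send(setter, text)
      true
    else
      false # unknown elements are skipped rather than raising
    end
  end
end

item = SketchItem.new
item.store('title', 'Episode 1')
item.store('itunes:author', 'mathew')
item.store('foo:bar', 'ignored')   # no matching method, so nothing happens
item.itunes_author                 # "mathew"
```

The reflection lives in one place (store), so someone adding an extension only writes the module with its accessors and includes it.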
Future plans
Here are some possible improvements:
- RSS and Atom generation.
  Create objects, then call Syndication::FeedMaker to generate XML in various flavors. This probably won’t happen until an XML generator is picked for the Ruby standard library.
- Faster date parsing.
  It turns out that when I asked for parsed dates in my test code, the profiler showed Date.parse chewing up 25% of the total CPU time used. A dedicated ISO 8601 date parser could cut that down drastically.
- Additional Google Data support.
  I just wanted to be able to display my upcoming calendar dates, but clearly there is a lot more that could be implemented. Unfortunately, recurring events don’t seem to have a clean XML representation in Google’s data feeds yet.
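As a rough sketch of the faster date parsing idea above: a parser that only recognizes one fixed ISO 8601 layout can skip the format guessing that Date.parse does. The names ISO8601_RE and parse_iso8601 are hypothetical, and I have not measured this code; whether it actually helps would need profiling on real feeds.

```ruby
require 'date'

# Matches only the common "YYYY-MM-DDThh:mm:ss" form with "Z" or a
# "+hh:mm" / "-hh:mm" offset. Anything else falls through to DateTime.parse.
ISO8601_RE = /\A(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})(Z|[+-]\d{2}:\d{2})\z/

# Hypothetical helper, not part of Syndication.
def parse_iso8601(str)
  m = ISO8601_RE.match(str)
  return DateTime.parse(str) unless m # lenient fallback for other formats

  offset = m[7] == 'Z' ? '+00:00' : m[7]
  DateTime.new(m[1].to_i, m[2].to_i, m[3].to_i,
               m[4].to_i, m[5].to_i, m[6].to_i, offset)
end

parse_iso8601('2005-07-31T12:29:29Z')            # fast path
parse_iso8601('Sat, 07 Sep 2002 09:42:31 GMT')   # falls back to DateTime.parse
```

The fallback keeps the parser forgiving of RSS-style dates while giving Atom-style timestamps the cheaper path.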
Feedback
There are doubtless things I could have done better. Comments, suggestions, etc. are welcome; e-mail <meta@pobox.com>.