Syndication 0.6

This module provides classes for parsing web syndication feeds in RSS and Atom formats.

To parse RSS, use Syndication::RSS::Parser.

To parse Atom, use Syndication::Atom::Parser.
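If you don't know in advance which format a document is in, one minimal approach is to look at its root element before choosing a parser. The helper below, `feed_format`, is a hypothetical sketch and not part of Syndication's API. It assumes the document is well-formed enough for the root tag to be found with a regular expression.

```ruby
# Hypothetical helper (not part of Syndication): guess which parser a feed
# needs by looking at the local name of its root element.
def feed_format(xml)
  # Drop comments first so a commented-out tag can't be mistaken for the root.
  body = xml.gsub(/<!--.*?-->/m, '')
  # The first tag whose name starts with a letter or underscore is the root;
  # this skips <?xml ...?> declarations and <!DOCTYPE ...> lines.
  name = body[/<([A-Za-z_][\w.:-]*)/, 1]
  return nil if name.nil?

  case name.split(':').last   # ignore any namespace prefix, e.g. rdf:RDF
  when 'rss', 'RDF' then :rss # RSS 0.91/0.92/2.0 use <rss>; RSS 0.9/1.0 use <rdf:RDF>
  when 'feed'       then :atom
  end                         # anything else falls through to nil
end
```

A caller would then hand `:rss` documents to Syndication::RSS::Parser and `:atom` documents to Syndication::Atom::Parser.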

If you want my advice on which format to generate, my order of preference would be:

Atom 1.0
RSS 1.0
RSS 2.0

My reasoning is simply that I hate having to sniff for HTML (see Syndication::RSS).
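The difficulty is that an RSS description can hold either plain text or HTML, and RSS 2.0 has no attribute saying which. A rough heuristic might look like the sketch below. `looks_like_html?` and `description_as_html` are hypothetical names, and the tag list is an assumption chosen for illustration, not the rule Syndication::RSS applies.

```ruby
# Hypothetical heuristic (not Syndication's actual code). After XML parsing,
# both CDATA-wrapped and entity-encoded HTML arrive as literal tags such as
# "<p>", so the check looks for a handful of common tag names.
COMMON_HTML_TAG = %r{</?(?:a|b|i|em|strong|p|br|img|div|span|ul|ol|li|blockquote)\b[^>]*>}i

HTML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }

def looks_like_html?(text)
  !!(text =~ COMMON_HTML_TAG)
end

# Return text that is safe to drop into an HTML page: apparent markup is
# passed through unchanged, anything else is escaped as plain text.
def description_as_html(text)
  looks_like_html?(text) ? text : text.gsub(/[&<>"]/, HTML_ESCAPES)
end
```

The weakness is obvious: plain text that happens to mention `<b>` is misclassified, which is exactly why guessing is unpleasant.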

License

Requirements

Built and tested using Ruby 1.9.2. Needs only the standard library.

Rationale

Ruby already has an RSS library as part of the standard library, so you might be wondering why I decided to write another one.

I started out trying to document the standard rss module, but found the code rather impenetrable. It was also hard to see how it could be made documentable with RDoc.

Then I tried writing code to use the standard RSS library, and discovered that it had a number of (what I consider to be) defects:

It didn’t support RSS 2.0 with extensions (such as iTunes podcast feeds), and it wasn’t clear to me how to extend it to do so.
It didn’t support RSS 0.9.
It didn’t support Atom.
The API differed depending on what kind of RSS feed you were parsing.

I asked around, and discovered that I wasn’t the only person dissatisfied with the RSS library. Since fixing the problems would have resulted in breaking existing code that used the RSS module, I opted for an all-new implementation.

This is the result. The first release was version 0.4, which was actually my fourth attempt at putting together a clean, simple, universal API for RSS and Atom parsing. (The first three never saw public release.)

Features

Here are the key improvements, as I see them, over the rss module in the Ruby standard library:

Supports all RSS versions, including RSS 0.9, as well as Atom.
Provides a unified API/object model for accessing the decoded data, with no need to know what format the feed is in.
Allows use of extended RSS 2.0 feeds.
Simple API, fully documented.
Test suite with over 220 test assertions.
Commented source code.
Less source code than the standard library rss module.
Faster than the standard library (at least, in my tests).

Other features:

Optional support for RSS 1.0 Dublin Core, Syndication and Content modules, Apple iTunes Podcast elements, and Google Calendar.
Content module decodes CDATA-escaped or encoded HTML content for you.
Supports namespaces, and encoded XHTML/HTML in Atom feeds.
Dates are decoded to Ruby DateTime objects. Because this is slow, a date is only parsed when you ask for its value.
Simple to extend with support for your own RSS extensions, using reflection.
Uses the REXML stream parsing API for speed, or the built-in TagSoup parser for invalid feeds.
Non-validating, tries to be as forgiving as possible of structural errors.
Remaps namespace prefixes to standard values if it recognizes the module’s URL.
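The lazy date decoding mentioned above can be sketched as a memoized reader: the raw string from the feed is stored as-is, and `DateTime.parse` only runs the first time the value is requested. `LazyDateItem` and `pubdate` are hypothetical names used for illustration, not Syndication's classes.

```ruby
require 'date'

# Hypothetical item class illustrating lazy date decoding.
class LazyDateItem
  attr_reader :pubdate_raw

  def initialize(pubdate_raw)
    @pubdate_raw = pubdate_raw # keep the feed's string; no parsing yet
  end

  # Parse on first access, then reuse the cached DateTime.
  def pubdate
    return nil if @pubdate_raw.nil?
    @pubdate ||= DateTime.parse(@pubdate_raw)
  end
end
```

Feeds whose dates are never read pay nothing for them; the cost only appears when a value is requested.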

In the interests of balance, here are some key disadvantages over the standard library RSS support:

No support for generating RSS feeds, only for parsing them. If you’re using Rails, you can use RXML; if not, you can use rss/maker. My feeling is that XML generation isn’t a wheel that needs reinventing.
Different API, not a drop-in replacement.
Incomplete support for Atom 0.3 draft. (Anyone still using it?)
No support for base64 data in Atom feeds (yet).
No Japanese documentation.
No XSL output options.
Slower if there are dates in the feed and you ask for their values.
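For generation, rss/maker works along the lines below. In Ruby 3.0 and later the `rss` library ships as a bundled gem rather than as part of the default standard library. This is a minimal RSS 2.0 sketch; the titles and URLs are placeholders.

```ruby
require 'rss'

# Build a one-item RSS 2.0 feed; title, link and description are the
# channel elements RSS 2.0 requires.
rss = RSS::Maker.make("2.0") do |maker|
  maker.channel.title = "Example Feed"
  maker.channel.link = "http://example.com/"
  maker.channel.description = "An example feed built with rss/maker"

  maker.items.new_item do |item|
    item.title = "First post"
    item.link = "http://example.com/first"
    item.description = "Placeholder item text"
  end
end

puts rss # serializes the feed as XML
```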

Other options

There are, of course, other Ruby RSS/Atom libraries out there. The ones I know about:

simple-rss

rubyforge.org/projects/simple-rss

Pros:

Much smaller than Syndication or the standard library rss module.
Completely non-validating.
Backwards compatible with the rss module in the standard library.

Cons:

Doesn’t use a real XML parser.
No support for namespaces.
Incomplete Atom support (e.g. you can’t get the name and e-mail of <atom:person> elements as separate fields, and you still have to decode XHTML data yourself).

No documentation.

For the record, I started work on my library long before simple-rss was announced.

feedtools

rubyforge.org/projects/feedtools/

This one solves most of the same problems as Syndication; however, the two were developed in parallel, in ignorance of each other.

Feedtools builds in database caching and persistence, and HTTP fetching. Personally, I don’t think those belong in a feed parsing library; they are easily implemented using other standard libraries if you want them.
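As an illustration of that point, a minimal in-memory cache around Net::HTTP fits in a few lines. `CachedFeedFetcher` is a hypothetical class, not part of Syndication or feedtools; it keeps bodies in a Hash rather than a database, and the injectable `fetcher` and `clock` are assumptions added so the logic can be exercised without a network.

```ruby
require 'net/http'
require 'uri'

# Hypothetical wrapper: fetch feed XML over HTTP and reuse it for `ttl` seconds.
class CachedFeedFetcher
  def initialize(ttl: 900,
                 fetcher: ->(url) { Net::HTTP.get(URI(url)) },
                 clock: -> { Time.now })
    @ttl = ttl
    @fetcher = fetcher
    @clock = clock
    @cache = {} # url => [fetched_at, body]
  end

  def get(url)
    now = @clock.call
    fetched_at, body = @cache[url]
    # Fetch if we have nothing cached, or the cached copy is older than ttl.
    if body.nil? || now - fetched_at > @ttl
      body = @fetcher.call(url)
      @cache[url] = [now, body]
    end
    body
  end
end
```

The returned string can then be passed to whichever parser you use.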

Pros:

Lots of test cases.
Used by lots of Rails people.
Knows about many more namespaces.
Can generate feeds.

Cons:

Skimpy documentation.
Uses HTree then XPath parsing, rather than a single stream parse.
Tries to unify RSS and Atom APIs, at the expense of Atom functionality. (This could also be a pro, depending on your viewpoint.)
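By contrast, a single stream parse handles each element as it arrives, without building a tree first. The listener below uses REXML's stream API (REXML, like rss, became a bundled gem in Ruby 3.0) to collect every title in one pass. `TitleCollector` is an illustrative sketch, not code taken from Syndication.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Collects the text of every <title> element (any namespace prefix).
class TitleCollector
  include REXML::StreamListener

  attr_reader :titles

  def initialize
    @titles = []
    @in_title = false
    @buffer = String.new
  end

  def tag_start(name, _attrs)
    return unless name.split(':').last == 'title'
    @in_title = true
    @buffer = String.new
  end

  def text(data)
    @buffer << data if @in_title
  end

  def cdata(data)
    @buffer << data if @in_title
  end

  def tag_end(name)
    return unless @in_title && name.split(':').last == 'title'
    @titles << @buffer.strip
    @in_title = false
  end
end

xml = <<XML
<rss version="2.0"><channel><title>Example</title>
<item><title>First</title></item>
<item><title>Second</title></item></channel></rss>
XML

listener = TitleCollector.new
REXML::Document.parse_stream(xml, listener) # one pass, events fed to listener
p listener.titles
```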

Design philosophy

Here’s my design philosophy for this module:

The interface should be via standard Ruby objects and methods; e.g. feed.channel.item.title, rather than (say) a dictionary hash.
It should be easier to parse RSS via the module than to hack something together using REXML, even if all you want is a list of titles and URLs.
It should be easy to add support for new RSS extensions without needing to know anything about reflection or other advanced topics. Just define a mixin with a bunch of appropriately named methods, and you’re done.
The code should be simple to understand.
Even so, good, complete documentation is extremely important.
Be lenient in what you accept.
Be conservative in what you generate.
Get well-formed feeds parsing reliably, then worry about broken feeds.
Atom will hopefully be the future. Provide full support for RSS, but don’t hold Atom back by trying to force it into an RSS data model.
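The mixin approach to extensions can be sketched with reflection: an element name is turned into a setter name, and the value is stored only if some included module defines that setter. The naming scheme (`itunes:author` becomes `itunes_author=`), `ExampleITunesMixin`, `ExampleItem` and `store_element` are all hypothetical, chosen to show the idea rather than to mirror Syndication's internals.

```ruby
# Hypothetical extension mixin: defining accessors is all it takes.
module ExampleITunesMixin
  attr_accessor :itunes_author, :itunes_duration
end

class ExampleItem
  attr_accessor :title, :link
  include ExampleITunesMixin # adding an extension is just an include
end

# Map "prefix:local-name" to "prefix_local_name=" and call it if it exists.
def store_element(obj, element_name, value)
  setter = element_name.tr(':-', '__') + '='
  # Unknown elements are skipped rather than rejected, in the spirit of a
  # non-validating, forgiving parser.
  return false unless obj.respond_to?(setter)
  obj.public_send(setter, value)
  true
end
```

Because the prefix is part of the method name, a scheme like this depends on prefixes being remapped to standard values first, so that a feed using a different prefix for the same namespace still reaches the same methods.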

Future plans

Here are some possible improvements:

RSS and Atom generation.

Create objects, then call Syndication::FeedMaker to generate XML in various flavors. This probably won’t happen until an XML generator is picked for the Ruby standard library.

Faster date parsing.

It turns out that when I asked for parsed dates in my test code, the profiler showed Date.parse chewing up 25% of the total CPU time used. A dedicated ISO 8601 date parser could cut that down drastically.
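A minimal sketch of such a fast path, assuming timestamps of the RFC 3339 form used by Atom (`YYYY-MM-DDThh:mm:ss`, optional fractional seconds, then `Z` or a `±hh:mm` offset). `iso8601_datetime` is a hypothetical helper; it returns nil for anything else so the caller can fall back to `DateTime.parse`, and it discards fractional seconds for simplicity. No timing claims are made for it here.

```ruby
require 'date'

ISO8601_RE = /\A(\d{4})-(\d\d)-(\d\d)T(\d\d):(\d\d):(\d\d)(?:\.\d+)?(Z|[+-]\d\d:\d\d)\z/

# Parse a strict RFC 3339 timestamp into a DateTime, or return nil.
def iso8601_datetime(str)
  m = ISO8601_RE.match(str)
  return nil unless m

  year, mon, day, hour, min, sec = m.captures[0, 6].map(&:to_i)
  zone = m[7]
  offset =
    if zone == 'Z'
      0
    else
      sign = zone.start_with?('-') ? -1 : 1
      # DateTime offsets are fractions of a day: minutes / 1440.
      sign * Rational(zone[1, 2].to_i * 60 + zone[4, 2].to_i, 1440)
    end
  DateTime.new(year, mon, day, hour, min, sec, offset)
end

# Usage: fast path first, general parser as a fallback.
stamp = "2006-01-02T08:34:05+05:30"
date = iso8601_datetime(stamp) || DateTime.parse(stamp)
```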

Additional Google Data support.

I just wanted to be able to display my upcoming calendar dates, but clearly there is a lot more that could be implemented. Unfortunately, recurring events don’t seem to have a clean XML representation in Google’s data feeds yet.

Feedback

There are doubtless things I could have done better. Comments, suggestions, etc. are welcome; e-mail <meta@pobox.com>.