html2rss

Supports JSON content, custom HTTP headers, and post-processing of extracted content.


html2rss is a Ruby gem that generates RSS 2.0 feeds from websites by scraping HTML or JSON content with CSS selectors or auto-detection.

This gem is the core of the html2rss-web application.

🌐 Community & Resources

| Resource | Description | Link |
| --- | --- | --- |
| 📚 Documentation & Feed Directory | Complete guides, tutorials, and browse 100+ pre-built feeds | html2rss.github.io |
| 💬 Community Discussions | Get help, share ideas, and connect with other users | GitHub Discussions |
| 📋 Project Board | Track development progress and upcoming features | View Project Board |
| 💖 Support Development | Help fund ongoing development and maintenance | Sponsor on GitHub |

Quick Start Options:

✨ Features

  • 🎯 CSS Selector Support - Extract content using familiar CSS selectors
  • 🤖 Auto-Detection - Automatically detect content using Schema.org, JSON state, and semantic HTML
  • 🔄 Multiple Request Strategies - Faraday for static sites, Browserless for JS-heavy sites
  • 🛠️ Post-Processing - Template rendering, HTML sanitization, time parsing, and more
  • 🧪 Comprehensive Testing - 95%+ test coverage with RSpec
  • 📚 Full Documentation - YARD documentation and comprehensive guides
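
To make the post-processing idea concrete, here is a minimal sketch of chaining post-processors over an extracted value. It uses only the Ruby standard library. The names `POST_PROCESSORS` and `post_process` and the step options are hypothetical, chosen for illustration, and are not html2rss's actual API.

```ruby
require 'time'

# Hypothetical post-processors for illustration only (not html2rss's API).
POST_PROCESSORS = {
  # Naive tag stripping; a real sanitizer should use an HTML parser.
  strip_tags: ->(value, _opts) { value.gsub(/<[^>]*>/, '').strip },
  # Parse a free-form date string and render it in the RFC 822 style RSS 2.0 uses.
  parse_time: ->(value, _opts) { Time.parse(value).utc.rfc2822 },
  # Render a format-string template with the current value substituted in.
  template: ->(value, opts) { format(opts.fetch(:string), value: value) }
}.freeze

# Apply each step in order, feeding the result of one into the next.
def post_process(value, steps)
  steps.reduce(value) do |current, step|
    POST_PROCESSORS.fetch(step.fetch(:name)).call(current, step)
  end
end

title = post_process('<b>Release 1.0</b> ', [
  { name: :strip_tags },
  { name: :template, string: 'News: %<value>s' }
])
published = post_process('2024-05-01 12:30:00 UTC', [{ name: :parse_time }])

puts title
puts published
```

Keeping each processor as a small callable keyed by name means a configuration file can list the steps as plain data, which is the design this sketch assumes.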

🚀 Quick Start

For installation and usage instructions, please visit the project website.

💻 Try in Browser

You can develop html2rss directly in your browser using GitHub Codespaces:

Open in GitHub Codespaces

The Codespace comes pre-configured with Ruby 3.4, all dependencies, and VS Code extensions ready to go!

📚 Documentation

The full documentation for the html2rss gem is available on the project website.

🤝 Contributing

Please see the contributing guide for details on how to contribute.

🏗️ Architecture

Core Components

  1. Config - Loads and validates configuration (YAML/hash)
  2. RequestService - Fetches pages using Faraday or Browserless
  3. Selectors - Extracts content via CSS selectors with extractors/post-processors
  4. AutoSource - Auto-detects content using Schema.org, JSON state blobs, semantic HTML, and structural patterns
  5. RssBuilder - Assembles Article objects and renders RSS 2.0
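
As an illustration of the first component, the sketch below loads a configuration from YAML (or accepts a hash with symbol keys) and checks that the required sections are present. The function `load_config` and the required keys are assumptions made for this example; the gem's real schema and validation rules may differ.

```ruby
require 'yaml'

# Hypothetical loader: the required keys are assumptions for this sketch,
# not html2rss's real schema.
def load_config(source, required: { channel: %i[url], selectors: %i[items] })
  # Accept either a YAML string or an already-built hash with symbol keys.
  config = source.is_a?(String) ? YAML.safe_load(source, symbolize_names: true) : source
  raise ArgumentError, 'config must be a hash' unless config.is_a?(Hash)

  required.each do |section, keys|
    values = config[section]
    raise ArgumentError, "missing section: #{section}" unless values.is_a?(Hash)

    missing = keys - values.keys
    raise ArgumentError, "#{section} is missing: #{missing.join(', ')}" unless missing.empty?
  end
  config
end

yaml = <<~YAML
  channel:
    url: https://example.com/blog
  selectors:
    items:
      selector: article
    title:
      selector: h2
YAML

config = load_config(yaml)
puts config.dig(:selectors, :items, :selector)
```

Failing early with a descriptive error keeps a broken configuration from reaching the request and extraction stages.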

Data Flow

Config -> Request -> Extraction -> Processing -> Building -> Output
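
The flow above can be sketched as a chain of small functions, one per stage. This is a toy model under stated assumptions: every method name is hypothetical, the request stage returns canned HTML instead of calling Faraday or Browserless, and a regular expression stands in for CSS selectors so the example needs no gems.

```ruby
# Request: return canned HTML (a real implementation would fetch config's URL).
def request(_config)
  <<~HTML
    <article><h2>Ruby & RSS</h2><a href="/posts/1">Read</a></article>
    <article><h2>Second post</h2><a href="/posts/2">Read</a></article>
  HTML
end

# Extraction: a regex stands in for CSS selectors in this sketch.
def extract(html, config)
  html.scan(config[:item_pattern]).map { |title, link| { title: title, link: link } }
end

# Processing: trim titles and turn relative links into absolute ones.
def process(items, config)
  base = config.dig(:channel, :url)
  items.map { |item| item.merge(title: item[:title].strip, link: "#{base}#{item[:link]}") }
end

# Building: render a small RSS 2.0 document, escaping text for XML.
def build(items, config)
  channel = config[:channel]
  entries = items.map do |item|
    "<item><title>#{item[:title].encode(xml: :text)}</title>" \
      "<link>#{item[:link].encode(xml: :text)}</link></item>"
  end
  <<~XML
    <?xml version="1.0" encoding="UTF-8"?>
    <rss version="2.0"><channel>
    <title>#{channel[:title].encode(xml: :text)}</title>
    <link>#{channel[:url].encode(xml: :text)}</link>
    <description>#{channel[:title].encode(xml: :text)}</description>
    #{entries.join("\n")}
    </channel></rss>
  XML
end

# Config: plain data that every later stage reads from.
config = {
  channel: { url: 'https://example.com', title: 'Example feed' },
  item_pattern: %r{<article><h2>(.*?)</h2><a href="(.*?)">}m
}

# Output: run the stages in order and print the feed.
feed = build(process(extract(request(config), config), config), config)
puts feed
```

Because each stage takes plain data in and returns plain data, a stage can be swapped (for example, a different request strategy) without touching the others.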

🧪 Testing

  • RSpec for comprehensive testing
  • 95%+ code coverage with SimpleCov
  • VCR for HTTP interaction testing
  • RuboCop for code style enforcement
  • Reek for code smell detection

🔧 Development Tools

  • Ruby LSP for IntelliSense and language features
  • Debug for modern debugging and exploration
  • YARD for documentation generation
  • GitHub Actions for CI/CD

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

💖 Sponsoring

If you find html2rss useful, please consider sponsoring the project.