0.0
No commit activity in last 3 years
No release in over 3 years
There's a lot of open issues
couchdb_to_sql provides a DSL that allows complex CouchDB documents to be converted into rows in a RDBMS' table. The stream of events received from the CouchDB changes feed will trigger documents to be fed into a matching filter block and saved in the database.
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
2025
 Dependencies

Development

>= 0
~> 12.0
~> 0.15
~> 3.2

Runtime

~> 2.0
~> 1.0, >= 1.0.5
>= 4.36.0
 Project Readme

Build Status Gem Version

couchdb_to_sql

Utility to listen to a CouchDB changes feed and automatically insert, update, or delete rows into an SQL database from matching key-value conditions of incoming documents.

couchdb_to_sql is heavily indebted to samlown's original couch_tap gem. We have added functionality needed for our particular use case, while still trying to keep it reasonably flexible and not too hardwired to the ember-pouch use case.

While CouchDB is awesome, business people probably won't be quite as impressed when they want to play around with the data. Regular SQL is generally accepted as being easy to use and much more widely supported by a larger range of commercial tools.

couchdb_to_sql will listen to incoming documents on a CouchDB server's _changes feed in continuous mode, and automatically update rows of the SQL database tables defined in the conversion schema. The changes feed uses a sequence number allowing synchronization to be started and stopped at will.

Sequel is used to provide the connection to the database. This library can also be used for migrations, which is important for frequently changing schemas.

couchdb_to_sql takes a simple two-step approach converting documents to rows. When a change event is received for a matching document definition, each associated row is completely deleted. If the change is anything other than a delete event, the rows will be re-created with the new data. This makes things much easier when trying to deal with multi-level documents (i.e. documents of documents) and one-to-many table relationships.

A couchdb_to_sql Project

couchdb_to_sql requires a configuration or filter definition that will allow incoming document changes to be identified and dealt with. The configuration file can either be hand-written or generated dynamically. (For our particular use case with ember-pouch, we have chosen to generate it based on the Ember model metadata. The script for this is unfortunately not open source at this time.)

The following example attempts to outline most of the key features of the DSL.

# The couchdb database from which to request the changes feed
changes "http://user:pass@host:port/invoicing" do
  # # Optional flag which can be enabled to take advantage of Postgres 9.5's support for INSERT CONFLICT, e.g. upserts.
  # # Note: this only deals with the _couchdb_to_sql_sequences metadata table, not the actual CouchDB documents themselves.
  # upsert_mode

  # # Optional flag which can be enabled if ember-pouch is being used to populate the CouchDB database. ember-pouch uses a
  # # specially crafted format of the CouchDB documents, where all the data is placed in 'data' node and the 'id' follows a
  # # particular format. This flag makes couchdb_to_sql presume that all CouchDB documents for the given stream follow this format.
  # ember_pouch_mode

  # # Optional flag which can be enabled to enable a stricter mode, where processing will abort if an unhandled document is
  # # encountered.
  # fail_on_unhandled_document

  # # Optional path to a file containing a JSON array of sequences to skip. The 'seq' value of incoming documents will be compared
  # # to the values in this array.
  # skip_seqs_file 'skiplist.json'

  # The target database to which changes will be streamed.
  database "postgres://user:pass@localhost:5432/invoicing"

  # Simple automated copy, each property's value in the matching CouchDB document will be copied to the table field with the
  # same name.
  document 'type' => 'User' do
    table :users
  end

  document 'type' => 'Invoice' do

    table :invoices, :key => :invoice_id do

      # Copy columns from fields with different name
      column :updated_at, :updated_on
      column :created_at, :created_on

      # Manually set a value from document or fixed variable
      column :date, doc['date'].to_json
      column :added_at, Time.now

      # Set column values from a block.
      column :total do
        doc['items'].inject(0){ |sum,item| sum + item['total'] }
      end
    end
  end
end

DSL Summary

changes

Defines which CouchDB database should be used to request the changes feed.

After loading the rest of the configuration, the service will connect to the database using Event Machine. As new changes come into the system, they will be managed in the background.

connection

The Sequel URL used to connect to the destination database. Behind the scenes, couchdb_to_sql will check for a table named couchdb_sequence that contains a single row for the current changes sequence id, much like a migration id typically seen in a Rails database.

As changes are received from CouchDB, the current sequence will be updated to match.

document

When a document is received from the changes feed, it will be passed through each document stanza looking for a match. Take the following example:

document :type => 'Invoice' do |doc|
  # ...
end

This will match all documents whose type property is equal to "Invoice". The document itself will be made available as a hash through the doc block variable.

document stanzas may be nested if required to provide further levels of filtering.

table

Each table stanza lets couchdb_to_sql know that all or part of the current document should be inserted into it. By default, the matching table's schema will be read and any field names that match a property in the top-level of the document will be inserted automatically.

One of the limitations of couchdb_to_sql is that all tables must have an id field as their primary key. In each row, the id's value will be copied from the _id of the document being imported. This is the only way that deleted documents can be reliably found and removed from the relational database.

column

foreign_key

Notes on deleted documents

CouchDB documents being deleted are not deleted in the SQL database, because this is typically not what you want to do from a data integrity/etc. point of view. Instead, it is marked as deleted.

For this to work, the following two columns must exist in the table (example given is from PostgreSQL):

fieldops_reports=# \d spare_parts
                       Table "public.spare_parts"
       Column       |           Type           |       Modifiers
--------------------+--------------------------+------------------------
 spare_part_id      | text                     | not null
 id                 | text                     |
 _deleted           | boolean                  | not null default false
 _deleted_timestamp | timestamp with time zone |

(spare_part_id is the primary key which will hold the CouchDB id. id holds the "Ember ID" in case you are using ember-pouch mode. _deleted* are the fields which indicate if the record is deleted, and if so, when it was marked as deleted.)

Testing

Run tests using rake, or individual tests as follows:

$ rake test TEST=test/unit/changes_test.rb

If you have disabled the "admin party" in CouchDB, you might have to manually specify the CouchDB URL. Like this:

$ COUCHDB_URL='http://admin:admin@127.0.0.1:5984/' bundle exec rake test

If you want to run tests towards a PostgreSQL database instead of CouchDB:

$ TEST_SQL_URL='postgres://localhost/couchdb_to_sql_test' bundle exec rake test

Useful environment variables

  • SEQUEL_LOG_LEVEL=debug - set to enable logging of all SQL queries executed.

Releasing a new version

  • Merge all relevant pull requests
  • Bump the version in the VERSION file. Follow Semantic Versioning principles. Do not prepend the version with a v. You don't need to commit or push after this step, it gets done automatically by the next step.
  • git release v1.0.x (brew install git-extras if you are missing the git release command.)
  • bundle exec rake build release (builds the .gem file and pushes it to Rubygems.org)
  • changelog-rs --latest to regenerate the changelog which can then be copy-pasted to the releases page. curl https://sh.rustup.rs -sSf | sh && cargo install changelog-rs if you don't have it installed. More info on its web page.