0.0
No commit activity in last 3 years
No release in over 3 years
Take snapshots of database tables and compute the differences between two snapshots.
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
 Dependencies

Development

Runtime

 Project Readme

Table Differ

Take snapshots of database tables, restore them, and compute the differences between two snapshots.

Build Status Gem Version

Installation

The usual, add this line to your application's Gemfile:

gem 'table_differ'

Synopsis

You can run this scenario by replace Attachment with any model from your own application.

snapshot = Attachment.create_snapshot

# Make some changes. Run Attachment.delete_all, pretty much anything.
Attachment.first.update_attributes!(name: 'newname')

# compute the changes
added,removed,changed = Attachment.diff_snapshot(snapshot)
  => [[], [], [<Attachment 1>]]

# examine the changes
changed.first.attributes[:name]
  => 'newname'
changed.first.original_attributes[:name]
  => 'oldname'

# restore everything to the way it was
Attachment.restore_snapshot(snapshot)
Attachment.delete_snapshot(snapshot)

Usage

First, include TableDiffer functionality in your models:

ActiveRecord::Base.send(:include, TableDiffer)

or just the models that need snapshotting:

class Attachment  < ActiveRecord::Base
  include TableDiffer
  ...
end

Now your models will respond to the following methods.

Create Snapshot

Property.create_snapshot
Property.create_snapshot 'import_0012'

If you don't specify a name then one will be specified for you. Whatever naming scheme you use, the names should sort alphabetically. Otherwise Table Differ won't be able to figure out which is the most recent snapshot.

List Snapshots

Property.snapshots
=> ['property_import_0011', 'property_import_0012']

Restore Snapshot

Property.restore_snapshot 'import_0012'

Delete Snapshots

Property.delete_snapshot  'import_0012'

Or multiple snapshots:

Property.delete_snapshots  ['import_01', 'import_02']
Property.delete_snapshots  # deletes all Property snapshots

# more complex: delete all snapshots more than one week old
week_old_name = Property.snapshot_name(1.week.ago)
Property.delete_snapshots { |name| name < week_old_name }

Compute Differences

Use diff_snapshot:

added,removed,changed = Attachment.diff_snapshot # changes between table and latest snapshot
  => [[], [], [<Attachment 1>]]      # no records were added or removed, but one was changed
changed.first.original_attributes    # returns the original value for each changed field
  => {"name" => 'oldname'}

You can name the snapshots you want to diff:

add,del,ch = Property.diff_snapshot(old: 'import_0012')    # between snapshot and table
add,del,ch = Property.diff_snapshot(old: 'cc', new: 'cd')  # between two snapshots

With no arguments, diff_snapshot returns the differences between the current table and the most recent snapshot (determined alphabetically).

Return Value

diff_snapshot returns three collections of ActiveRecord objects:

  • added contains the records that have been added since the snapshot was taken
  • removed contains the records that were removed
  • changed contains any record where one or more columns have changed.

Table Differ doesn't follow foreign keys for that would be madness. If you want to discover changes in related tables, you should snapshot and diff them individually.

Records in added and changed are regular ActiveRecord objects -- you can modify their attributes and save them. Records in removed aren't database-backed records (obviously) and should be treated as read-only.

Changed records include an original_attributes hash. For example, if you changed a record's name from 'Nexus' to 'Nexii':

record.attributes
=> { 'id' => 1, 'name' => 'Nexii' }
record.original_attributes
=> { 'id' => 1, 'name' => 'Nexus' }

Single Table Inheritance works correctly.

Ignoring Noisy Columns

By default, every column will be considered in the diff. You can pass columns to ignore like this:

Property.diff_snapshot ignore: %w[ id created_at updated_at ]

Ignoring the Primary Key column

If you ignore the primary key, Table Differ can no longer compute which columns have changed. This is not a problem, but changed records will appear as a remove followed by an add. The changed array will always be empty so there's no need to specify it:

added,removed = Attachment.diff_snapshot(ignore: 'id')

Also, if you ignore the primary key, the returned records can't be used directly (normally Table Differ returns full ActiveRecord objects in added and changed). If you want to make changes to the database, you'll need to copy the attributes to another record One that was loaded from the database normally and still knows its ID.

Using unique_by to fake a Primary Key

All is not lost! If there are other fields that uniquely identify the records, you can specify them in the unique_by option. This will cause changes to be computed properly, and real ActiveRecord objects to be returned. This does require one database lookup per changed object, however, so it may be a bottleneck if there are a huge number of changes.

# Normally ingoring the ID prevents diff from being able to compute the changed records.
# If we can use one or more fields to uniquely identify the object,
# then changesets can be computed and full ActiveRecord objects will be returned.
added,removed,changed = Contact.diff_snapshot(ignore: 'id', unique_by: [:property_id, :contact_id])

Internals

Table Differ creates a full copy of the table whenever it creates a snapshot. Make sure you have enough disk space!

Table Differ creates and restores snapshots with a single SELECT statement, and it diffs the tables 100% server-side using two SELECT statements (plus one select per changed record if you're using unique_by). It should be fast enough.

We don't touch indexes. Snapshotting and restoring will not affect the indexes that you've created. If your table has a lot of indexes, restoring a snapshot might take a long time.

Alternatives

  • Stellar appears to do the same thing. Written in Python.

Contributing

Table Differ is licensed under pain-free MIT. Issues and pull requests on Github.