Historiographer
Losing data sucks. Every time you update or destroy a record in Rails, you lose the old data.
Historiographer fixes this problem in a better way than existing auditing gems.
Existing auditing gems for Rails suck
The Audited gem has some serious flaws:
- The `versions` table quickly grows too large to query
- It doesn't provide the indexes you need from your primary tables
- It doesn't provide out-of-the-box snapshots
How does Historiographer solve these problems?
Historiographer introduces the concept of history tables: append-only tables that have the same structure and indexes as your primary table.
If you have a `posts` table:
id | title |
---|---|
1 | My Great Post |
2 | My Second Post |
You'll also have a `post_histories` table:
id | post_id | title | history_started_at | history_ended_at | history_user_id |
---|---|---|---|---|---|
1 | 1 | My Great Post | '2019-11-08' | NULL | 1 |
2 | 2 | My Second Post | '2019-11-08' | NULL | 1 |
If you change the title of the 1st post:
Post.find(1).update(title: "Title With Better SEO", history_user_id: current_user.id)
You'll expect your `posts` table to be updated directly:
id | title |
---|---|
1 | Title With Better SEO |
2 | My Second Post |
But your `post_histories` table will also be updated:
id | post_id | title | history_started_at | history_ended_at | history_user_id |
---|---|---|---|---|---|
1 | 1 | My Great Post | '2019-11-08' | '2019-11-09' | 1 |
2 | 2 | My Second Post | '2019-11-08' | NULL | 1 |
3 | 1 | Title With Better SEO | '2019-11-09' | NULL | 1 |
A few things have happened here:
- The primary table (`posts`) is updated directly
- The existing history for `post_id=1` has its `history_ended_at` timestamp set, so we can see when the post had the title "My Great Post"
- A new history record is appended to the table, containing a complete snapshot of the record and a `NULL` `history_ended_at`, because this is now the current history
- A record of who made the change is saved (`history_user_id`). You can join to your `users` table to see more data.
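To make the bookkeeping above concrete, here is a minimal in-memory sketch in plain Ruby. It is not Historiographer's implementation (the gem writes to your database); `HistoryRow` and `InMemoryHistoryTable` are hypothetical names, and dates are plain strings to keep the example self-contained.

```ruby
# Hypothetical in-memory model of a history table; not the gem's code.
HistoryRow = Struct.new(:id, :post_id, :title, :history_started_at, :history_ended_at, :history_user_id)

class InMemoryHistoryTable
  attr_reader :rows

  def initialize
    @rows = []
  end

  # End the post's current history (if any), then append a full copy of the new state.
  def record(post_id:, title:, at:, user_id:)
    current = @rows.find { |row| row.post_id == post_id && row.history_ended_at.nil? }
    current.history_ended_at = at if current
    row = HistoryRow.new(@rows.size + 1, post_id, title, at, nil, user_id)
    @rows << row
    row
  end

  # Current histories are the rows whose history_ended_at is still nil.
  def current
    @rows.select { |row| row.history_ended_at.nil? }
  end
end

table = InMemoryHistoryTable.new
table.record(post_id: 1, title: "My Great Post", at: "2019-11-08", user_id: 1)
table.record(post_id: 2, title: "My Second Post", at: "2019-11-08", user_id: 1)
table.record(post_id: 1, title: "Title With Better SEO", at: "2019-11-09", user_id: 1)

# Print each row in the same column order as the post_histories table.
table.rows.each { |row| puts row.to_a.map { |v| v.nil? ? "NULL" : v }.join(" | ") }
```

Running it prints the same three rows as the `post_histories` table above, with row 1 closed and rows 2 and 3 current.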
Snapshots
Snapshots are particularly useful for two key use cases:
1. Time Travel & Auditing
When you need to see exactly what your data looked like at a specific point in time - not just individual records, but entire object graphs with all their associations. This is invaluable for:
- Debugging production issues ("What did the entire order look like when this happened?")
- Compliance requirements ("Show me the exact state of this patient's record on January 1st")
- Auditing complex workflows ("What was the state of this loan application when it was approved?")
2. Machine Learning & Analytics
When you need immutable snapshots of data for:
- Training data versioning
- Feature engineering
- Model validation
- A/B test analysis
- Ensuring reproducibility of results
Taking Snapshots
You can take a snapshot of a record and all its associated records:
post = Post.find(1)
post.snapshot(history_user_id: current_user.id)
This will:
- Create a history record for the post
- Create history records for all associated records (comments, author, etc.)
- Link these history records together with a shared `snapshot_id`
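A minimal plain-Ruby sketch of the linking step, assuming records are plain hashes: every history row produced by one snapshot carries the same `snapshot_id`. `take_snapshot` is a hypothetical helper, and using a UUID for `snapshot_id` is an assumption; the gem's actual format may differ.

```ruby
require "securerandom"

# Hypothetical sketch: copy a post and its comments into history rows that
# all share one snapshot_id. Historiographer's real implementation may differ.
def take_snapshot(post, user_id:)
  snapshot_id = SecureRandom.uuid

  # The history row gets its own identity, so the original id moves to post_id.
  post_attrs = post.reject { |key, _| %i[id comments].include?(key) }
  post_history = post_attrs.merge(post_id: post[:id], snapshot_id: snapshot_id, history_user_id: user_id)

  comment_histories = post[:comments].map do |comment|
    comment_attrs = comment.reject { |key, _| key == :id }
    comment_attrs.merge(comment_id: comment[:id], snapshot_id: snapshot_id, history_user_id: user_id)
  end

  { post: post_history, comments: comment_histories }
end

post = { id: 1, title: "My Great Post", comments: [{ id: 10, body: "Nice" }, { id: 11, body: "Agreed" }] }
snapshot = take_snapshot(post, user_id: 1)

# Every row in the snapshot points at the same snapshot_id.
puts snapshot[:comments].all? { |c| c[:snapshot_id] == snapshot[:post][:snapshot_id] }
```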
You can retrieve the latest snapshot using:
post = Post.find(1)
snapshot = post.latest_snapshot
# Access associated records from the snapshot
snapshot.comments # Returns CommentHistory records
snapshot.author # Returns AuthorHistory record
Snapshots are immutable - you cannot modify history records that are part of a snapshot. This guarantees that your historical data remains unchanged, which is crucial for both auditing and machine learning applications.
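The guarantee can be illustrated with a frozen Ruby object, which raises `FrozenError` on any write. This is only an analogy: how Historiographer enforces immutability is not shown here, and in ActiveRecord a common approach is overriding `readonly?`, which makes `save` raise `ActiveRecord::ReadOnlyRecord`. `SnapshotRecord` and `try_edit` are hypothetical names.

```ruby
# A frozen Struct stands in for a history record that belongs to a snapshot.
SnapshotRecord = Struct.new(:snapshot_id, :title)

# Returns :edited if the write went through, :refused if the record is frozen.
def try_edit(record, new_title)
  record.title = new_title
  :edited
rescue FrozenError
  :refused
end

record = SnapshotRecord.new("snap-1", "My Great Post").freeze
puts try_edit(record, "Edited")
puts record.title
```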
Snapshot-Only Mode
If you want to only track snapshots and not record every individual change, you can configure Historiographer to operate in snapshot-only mode:
Historiographer::Configuration.mode = :snapshot_only
In this mode:
- Regular updates/changes will not create history records
- Only explicit calls to `snapshot` will create history records
- Each snapshot still captures the complete state of the record and its associations
This can be useful when:
- You only care about specific points in time rather than every change
- You want to reduce the number of history records created
- You need to capture the state of complex object graphs at specific moments
- You're versioning training data for machine learning models
- You need to maintain immutable audit trails at specific checkpoints
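The difference between the two modes comes down to which events write a history row. A hypothetical sketch (`ModeAwareRecorder` is not part of the gem): `after_change` stands in for ordinary create/update/destroy callbacks, and `snapshot` for an explicit snapshot call.

```ruby
# Hypothetical sketch of :histories vs :snapshot_only; names are illustrative.
class ModeAwareRecorder
  attr_reader :histories

  def initialize(mode)
    @mode = mode
    @histories = []
  end

  # Called on every create/update/destroy; only records in :histories mode.
  def after_change(attrs)
    @histories << attrs.merge(snapshot_id: nil) if @mode == :histories
  end

  # Called only when the caller explicitly asks for a snapshot; records in either mode.
  def snapshot(attrs, snapshot_id)
    @histories << attrs.merge(snapshot_id: snapshot_id)
  end
end

recorder = ModeAwareRecorder.new(:snapshot_only)
recorder.after_change({ title: "Draft" })
recorder.after_change({ title: "Final" })
recorder.snapshot({ title: "Final" }, "snap-1")
puts recorder.histories.size
```

In `:snapshot_only` mode the two ordinary changes leave no trace, so only the explicit snapshot is stored.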
Single Table Inheritance (STI)
Historiographer fully supports Single Table Inheritance, both with the default `type` column and with custom inheritance columns.
Default STI with `type` column
class Post < ActiveRecord::Base
  include Historiographer
end

class PrivatePost < Post
end

# The history classes follow the same inheritance pattern:
class PostHistory < ActiveRecord::Base
  include Historiographer::History
end

class PrivatePostHistory < PostHistory
end
History records automatically maintain the correct STI type:
private_post = PrivatePost.create(title: "Secret", history_user_id: current_user.id)
private_post.snapshot
# History records are the correct subclass
history = PostHistory.last
history.is_a?(PrivatePostHistory) #=> true
history.type #=> "PrivatePostHistory"
Custom Inheritance Columns
You can also use a custom column for STI instead of the default `type`:
class MLModel < ActiveRecord::Base
  self.inheritance_column = :model_type
  include Historiographer
end

class XGBoost < MLModel
  self.table_name = "ml_models"
end

# History classes use the same custom column
class MLModelHistory < ActiveRecord::Base
  self.inheritance_column = :model_type
  self.table_name = "ml_model_histories"
  include Historiographer::History
end

class XGBoostHistory < MLModelHistory
end
Migration for custom inheritance column:
create_table :ml_models do |t|
  t.string :name
  t.string :model_type # Custom inheritance column
  t.jsonb :parameters
  t.timestamps
  t.index :model_type
end

create_table :ml_model_histories do |t|
  t.histories # Includes all columns from parent table
end
The custom inheritance column works just like the default `type`:
model = XGBoost.create(name: "My Model", history_user_id: current_user.id)
model.snapshot
# History records maintain the correct subclass
history = MLModelHistory.last
history.is_a?(XGBoostHistory) #=> true
history.model_type #=> "XGBoostHistory"
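Both the default `type` column and a custom column like `model_type` rely on the same STI mechanism: the value stored in the inheritance column names the class to instantiate when a row is loaded. A plain-Ruby sketch, assuming the `<Name>History` naming convention shown above; `HistoryStub`, `history_type_for`, and `instantiate_history` are hypothetical, and ActiveRecord's real lookup does more (for example, rejecting classes that are not subclasses of the base model).

```ruby
# Plain-Ruby stand-ins for the history classes above; no ActiveRecord involved.
class HistoryStub
  attr_reader :attributes

  def initialize(attributes)
    @attributes = attributes
  end
end

class PostHistory < HistoryStub; end
class PrivatePostHistory < PostHistory; end
class MLModelHistory < HistoryStub; end
class XGBoostHistory < MLModelHistory; end

# Naming convention from the examples above: "PrivatePost" -> "PrivatePostHistory".
def history_type_for(type_name)
  "#{type_name}History"
end

# Roughly what STI does on load: read the inheritance column, then
# instantiate the class it names.
def instantiate_history(row, inheritance_column: "type")
  Object.const_get(row.fetch(inheritance_column)).new(row)
end

post_row = { "title" => "Secret", "type" => history_type_for("PrivatePost") }
puts instantiate_history(post_row).class

model_row = { "name" => "My Model", "model_type" => history_type_for("XGBoost") }
puts instantiate_history(model_row, inheritance_column: "model_type").class
```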
STI and Snapshots: Perfect for Model Versioning
Single Table Inheritance combined with Historiographer's snapshot feature is particularly powerful for versioning machine learning models and other complex systems that need immutable historical records. Here's why:
- Type-Safe History: When you snapshot an ML model, both the model and its parameters are preserved with their exact implementation type. This ensures that when you retrieve historical versions, you get back exactly the right subclass with its specific behavior:
# Create and configure an XGBoost model
model = XGBoost.create(
  name: "Customer Churn Predictor v1",
  parameters: { max_depth: 3, eta: 0.1 },
  history_user_id: current_user.id
)

# Take a snapshot before training
model.snapshot

# Update the model after training
model.update(
  name: "Customer Churn Predictor v2",
  parameters: { max_depth: 5, eta: 0.2 },
  history_user_id: current_user.id
)

# Later, retrieve the exact pre-training version
historical_model = model.latest_snapshot
historical_model.is_a?(XGBoostHistory) #=> true
historical_model.parameters #=> { "max_depth" => 3, "eta" => 0.1 } (jsonb keys are read back as strings)
- Implementation Versioning: Different model types often have different parameters, preprocessing steps, or scoring methods. STI ensures these differences are preserved in history:
class XGBoost < MLModel
  def predict(data)
    # XGBoost-specific prediction logic
  end
end

class RandomForest < MLModel
  def predict(data)
    # RandomForest-specific prediction logic
  end
end

# Your historical records maintain these implementation differences
old_model = model.latest_snapshot
old_model.predict(data) # Uses the subclass-specific logic with the parameters captured in the snapshot
- Reproducibility: Essential for ML workflows where you need to reproduce results or audit model behavior:
# Create model and snapshot at each significant stage
model = XGBoost.create(name: "Risk Scorer v1", history_user_id: current_user.id)

# Snapshot after initial configuration
model.snapshot(metadata: { stage: "configuration" })

# Snapshot after training
model.update(parameters: trained_parameters, history_user_id: current_user.id)
model.snapshot(metadata: { stage: "post_training" })

# Snapshot after validation
model.update(parameters: validated_parameters, history_user_id: current_user.id)
model.snapshot(metadata: { stage: "validated" })

# Later, you can retrieve any version to reproduce results
# (assumes metadata is stored in a PostgreSQL jsonb column; a nested hash
# passed to find_by would be read as a condition on a "metadata" table)
initial_version = model.histories.find_by("metadata->>'stage' = ?", "configuration")
trained_version = model.histories.find_by("metadata->>'stage' = ?", "post_training")
This combination of STI and snapshots is particularly valuable for:
- Model governance and compliance
- A/B testing different model types
- Debugging model behavior
- Reproducing historical predictions
- Maintaining audit trails for regulatory requirements
Namespaced Models
When using namespaced models, Rails handles foreign key naming differently than with non-namespaced models. For example, if you have a model namespaced like this:
module EasyML
  class Dataset < ActiveRecord::Base
    self.table_name = "easy_ml_datasets"
    include Historiographer
  end
end
Rails will expect foreign keys to be formatted using just the model name (without the namespace), like this: `:dataset_id`
Therefore, when creating history migrations for namespaced models, you need to specify the foreign key name explicitly:
require "historiographer/postgres_migration"

class CreateEasyMLDatasetHistories < ActiveRecord::Migration[7.1] # use your app's Rails version
  def change
    create_table :easy_ml_dataset_histories do |t|
      t.histories(foreign_key: :dataset_id) # instead of the table-name default, easy_ml_dataset_id
    end
  end
end
This ensures that the foreign key relationships are properly established between your namespaced models and their history tables.
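The two candidate key names can be derived as follows. This sketch mirrors ActiveSupport's `String#foreign_key`, which drops the namespace (`"EasyML::Dataset".foreign_key` returns `"dataset_id"`). Here `underscore` is a simplified stand-in for ActiveSupport's version, and `table_based_foreign_key` is a hypothetical helper that singularizes naively by stripping a trailing "s".

```ruby
# Simplified stand-in for ActiveSupport's String#underscore (handles acronyms like "EasyML").
def underscore(name)
  name.gsub("::", "/")
      .gsub(/([A-Z]+)([A-Z][a-z])/, '\1_\2')
      .gsub(/([a-z\d])([A-Z])/, '\1_\2')
      .downcase
end

# Rails-style foreign key: demodulized class name + "_id".
def association_foreign_key(class_name)
  "#{underscore(class_name.split("::").last)}_id"
end

# The key you would get by deriving it from the table name instead.
def table_based_foreign_key(table_name)
  "#{table_name.sub(/s\z/, "")}_id"
end

puts association_foreign_key("EasyML::Dataset")
puts table_based_foreign_key("easy_ml_datasets")
```

The mismatch between the two results is why the migration passes `foreign_key: :dataset_id` explicitly.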
Getting Started
Once you include `Historiographer` in an ActiveRecord model, you can insert, update, or delete data as you normally would.
class Post < ActiveRecord::Base
  include Historiographer
end

class PostHistory < ActiveRecord::Base
  self.table_name = "post_histories"
  include Historiographer::History
end
History Modes
Historiographer supports two modes of operation:
- :histories mode (default) - Records history for every change to a record
- :snapshot_only mode - Only records history when explicitly taking snapshots
You can configure the mode globally:
# In an initializer
Historiographer::Configuration.mode = :histories # Default mode
# or
Historiographer::Configuration.mode = :snapshot_only
Or per model, using `historiographer_mode`:
class Post < ActiveRecord::Base
  include Historiographer
  historiographer_mode :snapshot_only # Only record history when .snapshot is called
end

class Comment < ActiveRecord::Base
  include Historiographer
  historiographer_mode :histories # Record history for every change (default)
end
The class-level mode setting takes precedence over the global configuration. This allows you to:
- Have different history tracking strategies for different models
- Set most models to use snapshots while keeping detailed history for critical models
- Optimize storage by only tracking detailed history where needed
For example:
# Global setting for most models
Historiographer::Configuration.mode = :snapshot_only

class Order < ActiveRecord::Base
  include Historiographer
  # Uses global :snapshot_only mode
end

class Payment < ActiveRecord::Base
  include Historiographer
  historiographer_mode :histories # Override to record histories of every change
end
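The precedence rule amounts to "use the class setting if present, otherwise the global one". A hypothetical sketch of that resolution, with `GlobalConfig` and `resolve_mode` standing in for `Historiographer::Configuration` and `historiographer_mode`; the gem's own handling of invalid mode names may differ.

```ruby
# Hypothetical sketch of mode precedence; names are illustrative.
VALID_MODES = %i[histories snapshot_only].freeze

class GlobalConfig
  class << self
    attr_writer :mode

    # :histories is the documented default when nothing is configured.
    def mode
      @mode || :histories
    end
  end
end

# A class-level mode wins; otherwise fall back to the global setting.
def resolve_mode(class_mode)
  mode = class_mode || GlobalConfig.mode
  raise ArgumentError, "unknown mode: #{mode.inspect}" unless VALID_MODES.include?(mode)

  mode
end

GlobalConfig.mode = :snapshot_only
puts resolve_mode(nil)        # like Order: no class setting
puts resolve_mode(:histories) # like Payment: class override
```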
Create A Migration
You need a separate table to store histories for each model.
So if you have a `Post` model:
class CreatePosts < ActiveRecord::Migration[7.1] # use your app's Rails version
  def change
    create_table :posts do |t|
      t.string :title, null: false
      t.boolean :enabled
    end
    add_index :posts, :enabled
  end
end
You should create a migration for a table named `post_histories`:
require "historiographer/postgres_migration"

class CreatePostHistories < ActiveRecord::Migration[7.1] # use your app's Rails version
  def change
    create_table :post_histories do |t|
      t.histories
    end
  end
end
The `t.histories` method will automatically create a table with the following columns:
- `id` (because every model has a primary key)
- `post_id` (because this is the foreign key)
- `title` (because it was on the original model)
- `enabled` (because it was on the original model)
- `history_started_at` (to denote when this history became the canonical version)
- `history_ended_at` (to denote when this history stopped being the canonical version, if it has)
- `history_user_id` (to denote the user that made this change, if one is known)
Additionally, it will add indices on:
- The same columns that had indices on the original model (e.g. `enabled`)
- `history_started_at`, `history_ended_at`, and `history_user_id`
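The relationship between the two tables can be sketched as a pure function over column names. This illustrates the rules listed above and is not the gem's migration code; `history_columns_for` and `history_indexes_for` are hypothetical, and the default foreign key uses a naive singularization. The `foreign_key:` override plays the same role as `t.histories(foreign_key: :dataset_id)` for namespaced models.

```ruby
# Hypothetical sketch of how history columns derive from the primary table's columns.
HISTORY_COLUMNS = %w[history_started_at history_ended_at history_user_id].freeze

def history_columns_for(table_name, columns, foreign_key: nil)
  foreign_key ||= "#{table_name.sub(/s\z/, "")}_id" # naive singularization
  copied = columns.reject { |c| c == "id" }         # the history table has its own id
  ["id", foreign_key] + copied + HISTORY_COLUMNS
end

# Indexes: whatever was indexed on the primary table, plus the history columns.
def history_indexes_for(indexed_columns)
  indexed_columns + HISTORY_COLUMNS
end

p history_columns_for("posts", %w[id title enabled])
p history_indexes_for(%w[enabled])
```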
Models
The primary model should include `Historiographer`. Define a matching `PostHistory` class that includes `Historiographer::History`; you'll also use it to query histories from Rails:
class Post < ActiveRecord::Base
  include Historiographer
end

class PostHistory < ActiveRecord::Base
  self.table_name = "post_histories"
  include Historiographer::History
end
The `Post` class gains a `histories` method, and the `PostHistory` model gains a `post` method:
p = Post.first
p.histories.first.class
# => PostHistory
p.histories.first.post == p
# => true
Creating, Updating, and Destroying Data
You can use normal ActiveRecord methods, and each of them records histories:
Post.create(title: "My Great Title", history_user_id: current_user.id)
Post.find_by(title: "My Great Title").update(title: "A New Title", history_user_id: current_user.id)
Post.update_all(title: "They're all the same!", history_user_id: current_user.id)
Post.last.destroy!(history_user_id: current_user.id)
Post.destroy_all(history_user_id: current_user.id)
History classes have a `current` method that finds only current history records (those whose `history_ended_at` is `NULL`). These records match the data in the primary table. Each primary record also has a `current_history` method:
p = Post.first
p.current_history
PostHistory.current
What to do when generated index names are too long
Sometimes the generated index names are too long. Just like with standard Rails migrations, you can override the name of the index to fix this problem. To do so, use the `index_names` argument to override individual index names:
require "historiographer/postgres_migration"

class CreatePostHistories < ActiveRecord::Migration[7.1] # use your app's Rails version
  def change
    create_table :post_histories do |t|
      t.histories index_names: {
        title: "my_index_name",
        [:compound, :index] => "my_compound_index_name"
      }
    end
  end
end
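Before reaching for `index_names`, you can check which default names would be too long. This is a sketch assuming PostgreSQL's default identifier limit of 63 bytes and Rails' classic `index_<table>_on_<columns>` naming scheme; the names `t.histories` actually generates may differ, and `too_long_index_names` is a hypothetical helper.

```ruby
# PostgreSQL's default limit on identifier length (NAMEDATALEN - 1).
POSTGRES_MAX_IDENTIFIER = 63

# Classic Rails naming scheme for an index: index_<table>_on_<col1>_and_<col2>.
def default_index_name(table, columns)
  "index_#{table}_on_#{Array(columns).join("_and_")}"
end

# Return the default names that exceed the limit and therefore need an override.
def too_long_index_names(table, index_columns)
  index_columns
    .map { |cols| default_index_name(table, cols) }
    .select { |name| name.bytesize > POSTGRES_MAX_IDENTIFIER }
end

puts too_long_index_names(
  "post_histories",
  [:title, %i[history_started_at history_ended_at history_user_id]]
)
```

Any name this reports is a candidate for an entry in `index_names`.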
== MySQL Install
Contributors on macOS may have difficulty installing the `mysql2` gem. Pointing it at OpenSSL explicitly can help:
gem install mysql2 -v '0.4.10' --source 'https://rubygems.org/' -- --with-ldflags=-L/usr/local/opt/openssl/lib --with-cppflags=-I/usr/local/opt/openssl/include
== Copyright
Copyright (c) 2016-2020 brettshollenberger. See LICENSE.txt for further details.