Syncify
Syncify is a gem used to sync ActiveRecord records and their associations from one remote environment to your local environment.
Consider this hypothetical problem: You have a gigantic production database with complex associations between your models, including polymorphic associations. The database includes sensitive data that shouldn't really be in your development or staging environments. But, there's something wrong in production and you need production data to be able to debug it. It's not practical, efficient, safe, or generally advisable to restore a backup of the production database locally.
How do you get that data safely from production to an environment where you can make use of it? This is the problem that Syncify aims to address.
Installation
Add this line to your application's Gemfile:
gem 'syncify'
And then execute:
$ bundle
Or install it yourself as:
$ gem install syncify
Usage
Syncify doesn't require Rails, just ActiveRecord, but Rails is a reasonable foundation for the following examples.
Also, you can sync from any environment to whatever your current environment is. So, you could sync from your staging environment to your client test environment or from staging to development. Heck, you could go from staging to prod if you'd like.
For the purposes of this documentation we'll assume that you're syncing data in a Rails app from production to your local development environment.
Syncify has a pretty simple API. There's just one method, `run!`. Here's a really basic example where we're syncing a single `Widget` from production to the current environment. The current environment could be any environment, but we'll assume it's development for this documentation.
Syncify::Sync.run!(klass: Widget, id: 123, remote_database: :production)
Boom! You've copied `Widget` 123 to local dev. The widget will have all the same values, including its `id`, `created_at`, `updated_at`, etc. The above example assumes that the `Widget` doesn't have any foreign keys pointing at records that don't exist locally.
Syncify also accepts a `where` argument in the form of any valid ActiveRecord `where` expression (a hash or a string). Here's the same as above, but with a `where`:
Syncify::Sync.run!(klass: Widget, where: { id: 123 }, remote_database: :production)
Or...
Syncify::Sync.run!(klass: Widget, where: 'widgets.id = 123', remote_database: :production)
Now, let's say a `Widget` `belongs_to` a `Manufacturer`. Furthermore, let's say a `Manufacturer` `has_many` `Widget`s. We'll pretend we have this data in the prod database:
`widgets`:
id | name | manufacturer_id |
---|---|---|
123 | Lubricated Stainless Steel Helical Insert | 78 |
124 | Magnetic Contact Alarm Switches | 79 |
125 | Idler Sprocket for Double-Strand ANSI Roller Chain | 78 |
126 | Rod End Bolt Blank | 79 |
127 | Press-Fit Drill Bushing | 78 |
`manufacturers`:
id | name |
---|---|
78 | South Seas Trading Company |
79 | Blandco Manufacturing |
If your database uses foreign keys and the production `Widget`'s `Manufacturer` doesn't exist locally then the example above would fail. To get around this we can specify an association to also sync when syncing the `Widget`.
Syncify::Sync.run!(klass: Widget,
                   id: 123,
                   association: :manufacturer,
                   remote_database: :production)
Running the above example will copy two records into your local database:
- The `Widget` with id 123 (Lubricated Stainless Steel Helical Insert)
- The `Manufacturer` with id 78 (South Seas Trading Company)
It's important to note that Syncify does not recursively follow associations (though see below for how to discover associations programmatically). You'll note that not all of the manufacturer's widgets were synced, only the one we specified.
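To make the "one widget, one manufacturer" result concrete, here is a minimal plain-Ruby sketch that uses the example rows above. It does not call Syncify or touch a database; `records_for_widget` is a hypothetical helper that only illustrates which rows a single, non-recursive `belongs_to` hop would collect.

```ruby
# Plain-Ruby sketch: no Syncify or database involved.
WIDGETS = [
  { id: 123, name: 'Lubricated Stainless Steel Helical Insert', manufacturer_id: 78 },
  { id: 124, name: 'Magnetic Contact Alarm Switches', manufacturer_id: 79 },
  { id: 125, name: 'Idler Sprocket for Double-Strand ANSI Roller Chain', manufacturer_id: 78 },
  { id: 126, name: 'Rod End Bolt Blank', manufacturer_id: 79 },
  { id: 127, name: 'Press-Fit Drill Bushing', manufacturer_id: 78 }
].freeze

MANUFACTURERS = [
  { id: 78, name: 'South Seas Trading Company' },
  { id: 79, name: 'Blandco Manufacturing' }
].freeze

# Hypothetical helper: collect one widget plus the manufacturer it belongs to.
# It deliberately does not walk back from the manufacturer to its other widgets.
def records_for_widget(widget_id)
  widget = WIDGETS.find { |w| w[:id] == widget_id }
  manufacturer = MANUFACTURERS.find { |m| m[:id] == widget[:manufacturer_id] }
  [[:widgets, widget[:id]], [:manufacturers, manufacturer[:id]]]
end

synced = records_for_widget(123)
# synced => [[:widgets, 123], [:manufacturers, 78]]
```

Widgets 125 and 127 share manufacturer 78 but are not collected, which matches the behaviour described above.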
The `association` argument passed into the `run!` method can be any valid value that you might use when joining records with ActiveRecord. The above effectively becomes:
Widget.eager_load(:manufacturer).find(123)
Because of this, you can pass any sort of ActiveRecord association into Syncify's `run!` method.
Now let's imagine that a `Manufacturer` `has_many` `Plant`s, each of which `belongs_to` a `State`. Here are some example rows in these tables:
`plants`:
id | name | manufacturer_id | city | state_id |
---|---|---|---|---|
64 | Rapid Run | 78 | Lansing | 13 |
65 | Ye Olde Factory | 78 | Naples | 15 |
66 | Catco | 79 | Balston | 43 |
`states`:
id | name |
---|---|
13 | Michigan |
15 | Florida |
43 | Virginia |
We could sync a `Manufacturer` along with its widgets, plants, and the state each plant is in with this example:
Syncify::Sync.run!(klass: Manufacturer,
                   id: 78,
                   association: [
                     :widgets,
                     { plants: :state }
                   ],
                   remote_database: :production)
You can really go wild with the associations; well beyond what you could normally run with an ActiveRecord query!
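One way to read a nested association value is as a set of paths starting from the root model. The sketch below is plain Ruby and not part of Syncify: `association_paths` is a hypothetical helper, and it assumes the association from `Manufacturer` to `Plant` is named `plants`. It only handles symbols, arrays, and hashes keyed by association name (not the polymorphic form described later).

```ruby
# Hypothetical helper, not part of Syncify: expands an association value
# (symbol, array, or hash, nested to any depth) into a list of paths.
def association_paths(spec, prefix = [])
  case spec
  when Symbol
    [prefix + [spec]]
  when Array
    spec.flat_map { |item| association_paths(item, prefix) }
  when Hash
    spec.flat_map do |name, nested|
      path = prefix + [name]
      [path] + association_paths(nested, path)
    end
  else
    [] # nil or anything unexpected adds no paths
  end
end

spec = [:widgets, { plants: :state }]
paths = association_paths(spec)
# paths => [[:widgets], [:plants], [:plants, :state]]
```

Each path corresponds to one chain of associations that would be followed from the manufacturer.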
When Syncify was first released, I had an app with a hash defining a ton of associations across dozens of models that was more than 150 lines long. When I ran this sync it would identify more than 500 records and sync them all to local dev in about 30 seconds. I've since updated to use association discovery (documented below) and sync much more data. It takes longer, but it's still very fast.
Polymorphic Associations
Syncify also works with (and across) polymorphic associations! To sync across polymorphic associations you need to specify an association using the `Syncify::PolymorphicAssociation` class. This is put in place in your otherwise-normal associations.
Let's imagine that we run an online store that sells both physical and digital goods. A given invoice then might have line items that refer to either type of good.
Here's our model:
- `Customer`
  - `has_many :invoices`
- `Invoice`
  - `belongs_to :customer`
  - `has_many :line_items`
- `LineItem`
  - `belongs_to :invoice`
  - `belongs_to :product, polymorphic: true`
- `DigitalProduct`
  - `has_many :line_items, as: :product`
  - `belongs_to :category`
- `PhysicalProduct`
  - `has_many :line_items, as: :product`
  - `belongs_to :distributor`
- `Category`
  - `has_many :digital_products`
- `Distributor`
  - `has_many :physical_products`
There's a lot going on above, and I'll spare you the example database tables. You can use your imagination! 😉
Let's say we want to sync a particular `LineItem`. With ActiveRecord queries, you can't simply `eager_load` across a polymorphic association, much less to any sub-associations (e.g. `:category` or `:distributor`). With Syncify you can.
Here's an example. For simplicity's sake it assumes that the database doesn't use foreign keys. (Don't worry, we'll do a more complex example next!):
Syncify::Sync.run!(klass: LineItem,
                   id: 42,
                   association: {
                     product: {
                       DigitalProduct => {},
                       PhysicalProduct => {}
                     }
                   },
                   remote_database: :production)
Assuming that line item 42's product is a `DigitalProduct`, this example would have synced the `LineItem` and its `DigitalProduct` and nothing else.
Let's focus in on the association:
{
  product: {
    DigitalProduct => {},
    PhysicalProduct => {}
  }
}
We know the `LineItem` has a polymorphic association named `:product` (this is documented above). This association is saying that, for the `LineItem`'s `product` polymorphic association, when the product is a `DigitalProduct`, sync it with the specified associations (in this case none). When the product is a `PhysicalProduct`, sync it with the specified associations (again, none in this case).
Now let's say that we want to sync a specific `Customer` and all of their invoices and the related products, i.e. the whole kit and caboodle. Here's how you can do it:
Syncify::Sync.run!(klass: Customer,
                   id: 999,
                   association: {
                     invoices: {
                       line_items: {
                         product: {
                           DigitalProduct => :category,
                           PhysicalProduct => :distributor
                         }
                       }
                     }
                   },
                   remote_database: :production)
This will sync a customer, all of their invoices, and all of those invoices' line items. It goes on to sync all of the line items' products, whether digital or physical, as well as each digital product's category and each physical product's distributor.
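The class-keyed hash can be thought of as a lookup from the product's class to the associations to follow next. Here is a minimal plain-Ruby sketch of that lookup. It is an illustration only, not Syncify's implementation: the `Struct` classes are stand-ins for the ActiveRecord models, and `nested_association_for` is a hypothetical helper.

```ruby
# Stand-in classes for this sketch; real models would be ActiveRecord classes.
DigitalProduct  = Struct.new(:id, :category_id)
PhysicalProduct = Struct.new(:id, :distributor_id)
LineItem        = Struct.new(:id, :product)

ASSOCIATION = {
  product: {
    DigitalProduct  => :category,
    PhysicalProduct => :distributor
  }
}.freeze

# Hypothetical helper: given a line item, return which nested association
# applies to its product, chosen by the product's class.
def nested_association_for(line_item, association)
  by_class = association.fetch(:product)
  by_class[line_item.product.class]
end

digital_item  = LineItem.new(42, DigitalProduct.new(7, 3))
physical_item = LineItem.new(43, PhysicalProduct.new(9, 5))

digital_next  = nested_association_for(digital_item, ASSOCIATION)  # => :category
physical_next = nested_association_for(physical_item, ASSOCIATION) # => :distributor
```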
Discovering Associations Programmatically
The process of specifying associations, as outlined above, might be fairly tedious, especially if you have a hierarchy of dozens of interrelated models. For this reason, Syncify also includes a class that can discover associations, `Syncify::IdentifyAssociations`. Like `Syncify::Sync`, this class has one method, `run!`. You can use it like this:
associations = Syncify::IdentifyAssociations.run!(klass: Customer)
This will inspect the local `Customer` class, discover its associations, and then drill down through those associations to discover nested associations. It proactively cuts out associations that are inverses of other associations, and endeavors to eradicate association loops. So, looking at the customer/invoices/products example above, it will recognize the association from `Customer` to `Invoice`, but not from `Invoice` to `Customer`, since it's the inverse of the first association. It also skips over `has_many through:` associations, since those must be covered by another association.
Using the example above, the associations identified would look like this:
{
  invoices: {
    line_items: {
      product: {
        DigitalProduct => :category,
        PhysicalProduct => :distributor
      }
    }
  }
}
Important Note: Polymorphic associations are discovered by querying the database for associated types. So, in the example above, the `IdentifyAssociations` class sees the `LineItem#product` association and queries the `line_items` table for the set of distinct values in the `product_type` column. It uses that to continue discovery. So, if you're trying to discover associations, but your database is empty, you won't be able to traverse these polymorphic associations.
The example above can be seen in the specs at spec/lib/syncify/identify_associations_spec.rb.
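The discovery idea can be sketched in plain Ruby under some loud assumptions. The hand-written `ASSOCIATIONS` map stands in for ActiveRecord reflection, `LINE_ITEM_ROWS` stands in for the `line_items` table, and `discover` is a hypothetical helper. Its inverse handling is deliberately crude (it only skips an association that points straight back at the model it arrived from), so it is not a substitute for Syncify's own inverse and loop detection.

```ruby
# Hand-written association map standing in for ActiveRecord reflection.
# Each entry: model => { association_name => target model (or :polymorphic) }.
ASSOCIATIONS = {
  'Customer'        => { invoices: 'Invoice' },
  'Invoice'         => { customer: 'Customer', line_items: 'LineItem' },
  'LineItem'        => { invoice: 'Invoice', product: :polymorphic },
  'DigitalProduct'  => { line_items: 'LineItem', category: 'Category' },
  'PhysicalProduct' => { line_items: 'LineItem', distributor: 'Distributor' },
  'Category'        => { digital_products: 'DigitalProduct' },
  'Distributor'     => { physical_products: 'PhysicalProduct' }
}.freeze

# Rows standing in for the line_items table; only product_type matters here.
LINE_ITEM_ROWS = [
  { id: 1, product_type: 'DigitalProduct' },
  { id: 2, product_type: 'PhysicalProduct' },
  { id: 3, product_type: 'DigitalProduct' }
].freeze

# Hypothetical helper: walk the map from `model`, skipping the association
# that leads straight back to where we came from.
def discover(model, rows, came_from = nil)
  ASSOCIATIONS.fetch(model).each_with_object({}) do |(name, target), found|
    if target == :polymorphic
      # Polymorphic targets are only known from the data itself.
      types = rows.map { |row| row[:product_type] }.uniq
      next if types.empty?

      found[name] = types.to_h { |type| [type, discover(type, rows, model)] }
    elsif target != came_from
      found[name] = discover(target, rows, model)
    end
  end
end

with_data    = discover('Customer', LINE_ITEM_ROWS)
without_data = discover('Customer', [])
# without_data => { invoices: { line_items: {} } }
```

With rows present, the result has the same shape as the hash shown earlier (using empty hashes at the leaves and class names as strings). With no rows, the `product` branch disappears entirely, which is the empty-database limitation described in the note.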
So, you can combine the `Sync` class and the `IdentifyAssociations` class to make your life even easier:
Syncify::Sync.run!(
  klass: Customer,
  id: 999,
  association: Syncify::IdentifyAssociations.run!(klass: Customer),
  remote_database: :production
)
Using `IdentifyAssociations` Remotely
Under some circumstances you may want to discover the associations from the remote database. For example, maybe you don't have data in your local database to be able to discover polymorphic associations. For situations like this, `IdentifyAssociations` accepts a `remote_database` argument, just like `Sync`.
Syncify::IdentifyAssociations.run!(klass: Customer, remote_database: :production)
And here it is with `Sync` in all its glory:
Syncify::Sync.run!(
  klass: Customer,
  id: 999,
  association: Syncify::IdentifyAssociations.run!(klass: Customer, remote_database: :production),
  remote_database: :production
)
Association Hints
Sometimes you might want to automatically discover associations, but not all of them. In these situations you can use hints. A hint is a class that can be used to filter out associations conditionally.
The `Hint` class defines the interface for hints. It's a no-op hint that doesn't filter anything. If you need to create your own hints you can extend `Hint`.
All hints have two methods:
Method | Description |
---|---|
`applicable?(candidate_association)` | This method takes a Rails association (not a Syncify association, which isn't documented here) and returns a boolean value indicating whether or not the hint is applicable for the specified association. Basically, this is what Syncify uses to determine whether to check if an association is allowed. |
`allowed?` | This method returns true or false, indicating if a particular association is allowed to be traversed or not. |
You are most likely to use the `BasicHint` class. This class has a constructor that accepts the following arguments:
Argument | Type | Default | Description |
---|---|---|---|
`from_class` | Class or array of classes | nil | If provided, the `from_class` argument declares that the hint applies to associations from the specified class or classes. |
`association` | Symbol or array of symbols or regex | nil | If provided, the `association` argument declares that the hint applies to associations with the specified name or names or names matching the specified regular expression. |
`to_class` | Class or array of classes | nil | If provided, the `to_class` argument declares that the hint applies to associations to the specified class or classes. |
`allowed` | Boolean | (required) | This argument indicates whether an association the hint applies to is allowed. When it is true, the `IdentifyAssociations` class will not ignore the association; when it is false, the association is ignored. |
Hints can be specified for use by `IdentifyAssociations` like so:
Syncify::IdentifyAssociations.run!(
  klass: Customer,
  hints: [
    Syncify::Hint::BasicHint.new(....),
    Syncify::Hint::BasicHint.new(....)
  ]
)
Hints are applied in the order specified and the first one that matches "wins". So, if you have an association that is explicitly disallowed by a hint before another hint allows it, the first hint wins and the association is ignored.
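The first-match-wins rule can be sketched in plain Ruby. This is an illustration under stated assumptions, not Syncify's code: `OrderedHint` is a stand-in struct rather than `BasicHint`, `traversal_allowed?` is a hypothetical helper, the `Store`/`products`/`employees` names are made up, and allowing an association when no hint applies is an assumption of this sketch.

```ruby
# Stand-in for a hint; Syncify's real BasicHint is not used here.
OrderedHint = Struct.new(:from_class, :association, :allowed, keyword_init: true) do
  # A nil criterion matches anything, mirroring the nil defaults described above.
  def applicable?(from, name)
    (from_class.nil? || Array(from_class).include?(from)) &&
      (association.nil? || Array(association).include?(name))
  end
end

# Hypothetical helper: the first applicable hint decides. With no applicable
# hint this sketch allows the association (an assumption).
def traversal_allowed?(hints, from, name)
  hint = hints.find { |h| h.applicable?(from, name) }
  hint.nil? ? true : hint.allowed
end

hints = [
  OrderedHint.new(association: :products, allowed: false), # checked first
  OrderedHint.new(from_class: 'Store', allowed: true)      # checked second
]

products_allowed  = traversal_allowed?(hints, 'Store', :products)  # => false
employees_allowed = traversal_allowed?(hints, 'Store', :employees) # => true
reversed_allowed  = traversal_allowed?(hints.reverse, 'Store', :products) # => true
```

Reversing the list flips the outcome for `products`, which is why the order in which hints are listed matters.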
With that out of the way, let's assume you have an association where there are lots of associated records. For example, maybe you're Amazon and you have a `Store` class which has many `Product`s. Obviously, Amazon has a bazillion products. We might not want to sync all of these products when syncing a `Store`. You could filter that out with hints in a couple of ways.
Don't sync by association name:
Syncify::Hint::BasicHint.new(association: :products, allowed: false)
Don't sync by the target class name:
Syncify::Hint::BasicHint.new(to_class: Product, allowed: false)
Perhaps some models always exist locally and remotely. In that case, you could create a hint to never sync them:
Syncify::Hint::BasicHint.new(
  to_class: [
    Config,
    Account,
    Country,
    SiteDomain,
    Offer
  ],
  allowed: false
)
Perhaps you have some classes where none of their associations ever need to be synced. For example, maybe you collect stats on some objects, but the stats aren't needed locally, or there's so many records that it's not practical to sync them all:
Syncify::Hint::BasicHint.new(
  from_class: [
    Account,
    DailyStat,
    LifetimeDailyStat,
    DomainDailyStat,
    PaymentAccount,
    User
  ],
  allowed: false
)
Note that in the above example we're disallowing all associations from `Account`. But, let's imagine that `Account` has 50 associations and we do want to sync two of them. Since hints are applied in the order specified, and the first hint that matches is the hint that is applied, you could specifically allow two of the associations from `Account`, but disallow all others like this:
Syncify::Hint::BasicHint.new(from_class: Account, association: [:example1, :example2], allowed: true)
Syncify::Hint::BasicHint.new(from_class: Account, allowed: false)
If the `BasicHint` class isn't sufficient for your needs, you can always create your own hints by extending `Hint` and implementing the `applicable?` and `allowed?` methods.
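As a rough idea of what a custom hint might look like, here is a plain-Ruby sketch. Everything in it is an assumption for illustration: `BaseHintSketch` stands in for Syncify's `Hint` class (which is not loaded here), `NoSelfReferenceHint` is a hypothetical hint, the `parent` association is made up, and `FakeReflection` stands in for a Rails association reflection, using its `active_record` (owning class) and `klass` (target class) readers.

```ruby
# Stand-in for Syncify's Hint base class, which this sketch does not load.
class BaseHintSketch
  def applicable?(_candidate_association)
    false
  end

  def allowed?
    true
  end
end

# Hypothetical custom hint: applies to self-referential associations
# (owning class and target class are the same) and disallows them.
class NoSelfReferenceHint < BaseHintSketch
  def applicable?(candidate_association)
    candidate_association.active_record == candidate_association.klass
  end

  def allowed?
    false
  end
end

# Stand-ins for model classes and for a Rails association reflection.
SketchCategory = Class.new
SketchProduct  = Class.new
FakeReflection = Struct.new(:name, :active_record, :klass)

hint = NoSelfReferenceHint.new
self_reference = FakeReflection.new(:parent, SketchCategory, SketchCategory)
ordinary       = FakeReflection.new(:digital_products, SketchCategory, SketchProduct)

applies_to_self     = hint.applicable?(self_reference) # => true
applies_to_ordinary = hint.applicable?(ordinary)       # => false
```

A real version would also need to guard against polymorphic `belongs_to` reflections, since Rails cannot resolve a single target class for those.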
Callbacks
Sometimes production databases contain sensitive data that you really don't want to have end up in other environments. Or, maybe you want to disassociate production data from third party production APIs. Or maybe you want to download images before you actually create image records locally. Syncify handles this by providing a callback mechanism.
Syncify's workflow is basically this:
- Using the specified class and its associations, Syncify identifies all of the records we need to sync to the local environment. Effectively, all of the records are loaded from the remote environment into a set in memory.
- Syncify calls an optional `callback` proc you can pass into the `run!` method.
- Syncify actually bulk inserts all of the identified records into the local database.
By providing a `callback` proc, you can take some sort of action after all of the remote data has been identified, but before you write it locally. This includes modifying the remote data (in memory, not actually in the remote database).
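The three steps can be sketched in plain Ruby with in-memory arrays standing in for the two databases. This is a simplified illustration, not Syncify's implementation: `sketch_sync`, the `REMOTE_USERS` rows, and the `api_token` column are all hypothetical.

```ruby
# Stand-in for rows in the remote database.
REMOTE_USERS = [
  { id: 40, first_name: 'Ada', api_token: 'live-token' },
  { id: 41, first_name: 'Grace', api_token: 'other-token' }
].freeze

# Hypothetical helper mirroring the identify / callback / insert order.
def sketch_sync(remote_rows, id:, local_rows:, callback: nil)
  # 1. Identify the records and hold copies of them in memory.
  identified = remote_rows.select { |row| row[:id] == id }.map(&:dup)
  # 2. Give the optional callback a chance to inspect or change the copies.
  callback&.call(identified)
  # 3. Write everything that was identified to the local store.
  local_rows.concat(identified)
end

local_users = []
scrub_tokens = proc { |records| records.each { |row| row[:api_token] = nil } }
sketch_sync(REMOTE_USERS, id: 40, local_rows: local_users, callback: scrub_tokens)
# local_users => [{ id: 40, first_name: 'Ada', api_token: nil }]
```

Because the callback works on in-memory copies, the remote row keeps its original token while the local copy is written without it.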
Here's an example that masks personally identifiable information for users:
Syncify::Sync.run!(klass: User,
                   id: 40,
                   remote_database: :production,
                   callback:
                     proc do |identified_records|
                       # Runs after records are identified and before they are written locally.
                       user = identified_records.find { |record| record.class == User }
                       # Keep the first letter and replace the rest with asterisks.
                       user.first_name = "#{user.first_name.first}#{'*' * (user.first_name.size - 1)}"
                       user.last_name = "#{user.last_name.first}#{'*' * (user.last_name.size - 1)}"
                     end)
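If a sync pulls in more than one user through associations, a callback may want to mask every user it finds rather than only the first. The plain-Ruby sketch below shows one way that could look. `SketchUser` and `SketchPost` are stand-in structs, `mask` is a hypothetical helper, and `name[0]` is used instead of ActiveSupport's `String#first` so the sketch runs without Rails.

```ruby
# Stand-ins for model instances; real records would be ActiveRecord objects.
SketchUser = Struct.new(:first_name, :last_name)
SketchPost = Struct.new(:title)

# Hypothetical helper: keep the first character, replace the rest with '*'.
# Nil and empty names are returned unchanged.
def mask(name)
  return name if name.nil? || name.empty?

  "#{name[0]}#{'*' * (name.size - 1)}"
end

# A callback that masks every user among the identified records.
mask_all_users = proc do |identified_records|
  identified_records.grep(SketchUser).each do |user|
    user.first_name = mask(user.first_name)
    user.last_name = mask(user.last_name)
  end
end

records = [
  SketchUser.new('Ada', 'Lovelace'),
  SketchPost.new('Hello'),
  SketchUser.new('Grace', nil)
]
mask_all_users.call(records)
# records[0].first_name => "A**"
```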
Development
After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to rubygems.org.
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/dhughes/syncify.
License
The gem is available as open source under the terms of the MIT License.