ActiveID: binary UUIDs for ActiveRecord
Rationale
If you search for “uuid” in RubyGems, you’ll get 142 results (as of January 2019)… Most are outdated, all are far from our needs. Yes, we think we need another one.
ℹ️ ActiveID has evolved from the popular but no longer maintained ActiveUUID gem and its forks. Starting in 2018, the gem was entirely rewritten to support newer Rails releases, and in 2020 it was finally detached as a fork to avoid confusing users. We thank Nate for bringing ActiveID to life!
Storing UUIDs as binaries in MySQL and SQLite3
UUIDs are 16 bytes long, but their human-readable string representation takes 36 characters. Storing UUIDs in human-readable form is therefore space-inefficient. Worse, while table row size is seldom a big concern, the size of table indices matters: the larger the part of an index that fits in RAM, the faster it works. And UUID columns are commonly indexed…
ℹ️ Version 1 UUIDs can get a further performance boost from rearranging their bits. This is not implemented yet, see issue #43.
This gem makes it easy to store UUIDs efficiently in databases which do not provide a dedicated UUID data type (e.g. MySQL, MariaDB, SQLite3).
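The size difference described above can be seen with nothing but Ruby's standard library. The snippet below is only an illustrative sketch and does not use ActiveID; the variable names are made up for the example.

```ruby
require "securerandom"

# A UUID in its usual human-readable form: 32 hex digits plus 4 dashes.
uuid_string = SecureRandom.uuid

# The same value packed into raw bytes (two hex digits per byte).
uuid_binary = [uuid_string.delete("-")].pack("H*")

puts uuid_string.bytesize # 36
puts uuid_binary.bytesize # 16
```

The binary form saves 20 bytes per value, and the saving applies again in every index that contains the column.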
Database-agnosticism
This gem provides a uniform API for storing UUIDs in a database, be it MariaDB, MySQL, SQLite3 (in binary or string format), or PostgreSQL (native data type). This is especially important when using it as a dependency of another gem.
Monkey patching is optional
No core feature relies on monkey patching Rails. Monkey patches can interfere with other gems and lead to issues. Nevertheless, some convenience features (currently, migration methods only) are provided via monkey patching. Enabling them is entirely optional, and their absence can be worked around easily.
Strings are not perfect for UUIDs
Although UUIDs are commonly represented as strings, it is beneficial to introduce a dedicated class for the following reasons:
- not every sequence of 16 bytes (or 32 hexadecimal digits) makes a valid UUID
- UUIDs are not opaque; they have an inner structure which can be accessed (reading the timestamp from time-based UUIDs is especially useful)
- using the string equality operator for UUID comparison may give wrong results (upper-case or lower-case strings, with dashes or without)
The case is somewhat similar to URIs, which can also be represented as plain strings, but for which a dedicated URI class is quite convenient.
ActiveID uses the UUID class from the UUIDTools gem to represent UUIDs.
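The comparison pitfall listed above is easy to reproduce with plain strings. The following is a minimal sketch using only core Ruby; `normalize_uuid` is a hypothetical helper written for this illustration, not part of ActiveID or UUIDTools, which solve the problem with a dedicated class instead.

```ruby
# Hypothetical helper: reduce a textual UUID to 32 lower-case hex digits,
# rejecting anything that does not have the right shape.
def normalize_uuid(text)
  hex = text.delete("-").downcase
  raise ArgumentError, "not a UUID: #{text.inspect}" unless hex.match?(/\A\h{32}\z/)
  hex
end

a = "A6908E1E-5493-4C55-A11D-CD8445654DE6"
b = "a6908e1e54934c55a11dcd8445654de6"

puts a == b                                 # false, although both denote the same UUID
puts normalize_uuid(a) == normalize_uuid(b) # true
```

A dedicated UUID class keeps this normalization in one place instead of at every comparison.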
Usage
💡 You may want to explore the examples directory, in which typical use cases are covered in a bit more detail.
Installation
Depending on whether you want to apply monkey patches or not, require either activeid/all (with monkey patches) or activeid (without them).
For example, if you are using a Gemfile:
gem "activeid" # without monkey patches
# or
gem "activeid", require: "activeid/all" # with monkey patches
Depending on your needs, you can also pick monkey patches selectively; just take a look at the contents of lib/activeid/all.rb. However, this is currently not very useful, as there is very little to choose from.
Adding UUIDs to models
ActiveID relies on ActiveRecord’s attributes API. Two attribute types are defined: StringUUID and BinaryUUID.
StringUUID serializes UUIDs as 36-character strings. It is compatible with textual SQL types, e.g. VARCHAR(36), and more importantly, with the PostgreSQL-specific UUID type.
BinaryUUID serializes UUIDs as 16-byte binaries, which can be stored in binary columns, e.g. BLOB(16) in SQLite3 or VARBINARY(16) in MySQL. However, it is not compatible with PostgreSQL at all due to syntax differences.
See the "Choosing between string and binary serialization" section for a brief explanation of the pros and cons of both approaches.
Whichever attribute type you prefer, the ActiveID::Model module must be included in the model.
For example, the following model stores two UUID attributes, id and author_id, as binaries.
class Work < ActiveRecord::Base
  include ActiveID::Model
  attribute :id, ActiveID::Type::BinaryUUID.new
  attribute :author_id, ActiveID::Type::BinaryUUID.new
  belongs_to :author
end
Database migrations
A convenience #uuid method is added via monkey patching to Active Record’s Table and TableDefinition classes.
- In the MySQL adapter, it stands for a VARBINARY(16) column.
- In the SQLite3 adapter, it stands for a BLOB(16) column.
- In the PostgreSQL adapter, it is shadowed by a stock Rails method, ::ActiveRecord::ConnectionAdapters::PostgreSQL::ColumnMethods#uuid, which stands for a UUID column.
If you want to use a UUID column as your primary key, pass the :id => false option to the create_table method and :primary_key => true to the column definition.
For example:
class CreateWorks < ActiveRecord::Migration
  def change
    create_table :works, id: false, force: true do |t|
      t.uuid :id, primary_key: true
      t.uuid :author_id, index: true
      t.string :title
      t.timestamps
    end
  end
end
Alternatively, if monkey patches are disabled, the #uuid method can be substituted with #binary in the MySQL and SQLite3 adapters. The following snippet is equivalent to the one above in these two adapters. Please note the :limit => 16 option.
class CreateWorks < ActiveRecord::Migration
  def change
    create_table :works, id: false, force: true do |t|
      t.binary :id, limit: 16, primary_key: true
      t.binary :author_id, limit: 16, index: true
      t.string :title
      t.timestamps
    end
  end
end
Registering UUID types in Active Record’s type registry
For convenience, ActiveID types can be added to Active Record’s type registry. Then you can reference them in your models with a symbol. See Rails API docs for detailed information.
For example, the following registers ActiveID::Type::BinaryUUID under the :uuid symbol for all adapters except PostgreSQL, in which this symbol is already taken:
ActiveRecord::Type.register(
  :uuid,
  ActiveID::Type::BinaryUUID,
)
With the above in place, only the symbol needs to be specified in the attribute declaration, as in the following example:
class Author < ActiveRecord::Base
  include ActiveID::Model
  attribute :id, :uuid
end
It is also possible to override :uuid in the PostgreSQL adapter:
ActiveRecord::Type.register(
  :uuid,
  ActiveID::Type::StringUUID,
  adapter: :postgresql,
  override: true,
)
🔥 Overriding standard attribute types may cause other gems to behave abnormally.
Using UUIDs as primary keys
When a model’s primary key is a UUID, ActiveID automatically generates its value as a version 1, 4, or 5 UUID:
- Version 1 UUIDs store the timestamp of their creation, and are monotonically increasing in time. This is very advantageous in some use cases.
- Version 4 UUIDs are pseudo-randomly generated.
- Version 5 UUIDs are generated deterministically via SHA-1 hashing from the values of specified attributes and a UUID namespace. They are well-suited for natural keys.
UUIDs of all versions can be explicitly assigned to attributes.
Random primary keys (version 4 UUIDs)
If model’s primary key is a UUID, a version 4 UUID is generated by default. For example:
class Author < ActiveRecord::Base
  include ActiveID::Model
  attribute :id, ActiveID::Type::StringUUID.new
end
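To see what a version 4 UUID looks like, the sketch below generates one with Ruby's standard library rather than with ActiveID, and reads the two fixed fields that mark it as a random UUID. Everything else in the value is random.

```ruby
require "securerandom"

uuid = SecureRandom.uuid # a version 4 (random) UUID in dashed form

version = uuid[14] # first digit of the third group holds the version
variant = uuid[19] # first digit of the fourth group holds the variant bits

puts uuid
puts version                          # "4"
puts %w[8 9 a b].include?(variant)    # true
```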
Time-based primary keys (version 1 UUIDs)
They are enabled for the model’s primary key with the #uuid_generator method.
For example:
class Author < ActiveRecord::Base
  include ActiveID::Model
  attribute :id, ActiveID::Type::StringUUID.new
  uuid_generator :time
end
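The creation timestamp stored in a version 1 UUID can be recovered from its first three groups. The following is a hand-written sketch in plain Ruby, based on the RFC 4122 field layout; `uuid_v1_time` is a hypothetical helper and is not how ActiveID or UUIDTools expose this. The sample value is the well-known DNS namespace UUID, which happens to be a version 1 UUID.

```ruby
# Number of 100-nanosecond intervals between the UUID epoch (1582-10-15)
# and the Unix epoch (1970-01-01).
UUID_EPOCH_OFFSET = 0x01B21DD213814000

# Hypothetical helper: extract the creation time from a version 1 UUID string.
def uuid_v1_time(uuid_string)
  time_low, time_mid, time_hi_and_version =
    uuid_string.split("-").first(3).map { |part| part.to_i(16) }

  raise ArgumentError, "not a version 1 UUID" unless (time_hi_and_version >> 12) == 1

  # Reassemble the 60-bit timestamp: high 12 bits, middle 16 bits, low 32 bits.
  timestamp = ((time_hi_and_version & 0x0fff) << 48) | (time_mid << 32) | time_low

  # Convert 100 ns ticks since 1582 into whole seconds since 1970.
  Time.at((timestamp - UUID_EPOCH_OFFSET) / 10_000_000).utc
end

puts uuid_v1_time("6ba7b810-9dad-11d1-80b4-00c04fd430c8")
```

Because the timestamp occupies the most significant part of the value only after this reassembly, the raw byte order of a version 1 UUID does not sort by time on its own.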
Name-based primary keys a.k.a. natural keys (version 5 UUIDs)
They are enabled for the model’s primary key by passing attribute names to the #natural_key method, and a namespace to the #uuid_namespace method. The latter method accepts only UUIDs, either in string format or as a UUIDTools::UUID object. If the #uuid_namespace call is omitted, the ISO OID namespace is used.
In the following example, a natural key in the a6908e1e-5493-4c55-a11d-cd8445654de6 namespace will be built from the values of the author_id and title attributes.
class Work < ActiveRecord::Base
  include ActiveID::Model
  attribute :id, ActiveID::Type::BinaryUUID.new
  attribute :author_id, ActiveID::Type::BinaryUUID.new
  belongs_to :author
  natural_key :author_id, :title
  uuid_namespace "a6908e1e-5493-4c55-a11d-cd8445654de6"
end
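To illustrate why such keys are deterministic, here is a minimal sketch of version 5 generation (SHA-1 over namespace bytes plus a name, as in RFC 4122) using only Ruby's standard library. The `uuid_v5` helper is hypothetical, and joining the attribute values with a dash is an assumption made for this example; the way ActiveID actually combines attribute values may differ.

```ruby
require "digest"

# Hypothetical helper: build a version 5 UUID from a namespace UUID and a name.
def uuid_v5(namespace, name)
  namespace_bytes = [namespace.delete("-")].pack("H*")
  bytes = Digest::SHA1.digest(namespace_bytes + name.b).bytes.first(16)

  bytes[6] = (bytes[6] & 0x0f) | 0x50 # set the version field to 5
  bytes[8] = (bytes[8] & 0x3f) | 0x80 # set the RFC 4122 variant bits

  bytes.pack("C*").unpack("H8H4H4H4H12").join("-")
end

namespace = "a6908e1e-5493-4c55-a11d-cd8445654de6"

# Assumption: attribute values are joined with "-" before hashing.
name = ["0b5f7bd2-5d5c-4f0e-9d1c-3a4b5c6d7e8f", "Moby Dick"].join("-")

first  = uuid_v5(namespace, name)
second = uuid_v5(namespace, name)

puts first
puts first == second # true: the same inputs always give the same key
```

Since the key is derived from the attribute values, changing any of them would yield a different UUID.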
Choosing between string and binary serialization
ActiveID allows you to choose between two ways of UUID serialization: as a 36-character string, or as a 16-byte binary.
In PostgreSQL, the answer is easy: you should always choose string serialization. It works perfectly with the native UUID data type, which is a non-standard feature of PostgreSQL. It also works with textual data types (e.g. VARCHAR, TEXT), but the UUID type seems to be a better choice for performance reasons. Because of special syntax requirements in PostgreSQL, ActiveID does not work with binary types (i.e. BYTEA); however, this seems to be a negligible issue, as the UUID type is more suitable. Please open an issue if you disagree.
In other RDBMSs, either human-readability or performance must be sacrificed.
With binary serialization, UUIDs are stored in a space-efficient way as 16 bytes long binaries. This is especially beneficial when column is indexed, which is a very common case. Smaller value size means that a bigger piece of index can be kept in RAM, which often leads to a significant performance boost. The downside is that this representation is difficult to read for humans, who access serialized values outside Rails (e.g. in a database console, or in database logs). See also an excellent article "Store UUID in an optimized way" in Percona blog for more information about storing UUIDs as binaries.
With string serialization, UUIDs are stored as 36-character strings, which consist only of lowercase hexadecimal digits and dashes (xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx). They are easy for humans to read, but may hamper the performance of indices, especially in large tables.
Reading binary UUIDs in a database console
MySQL (8.0 and later) features a BIN_TO_UUID() function, which converts binary UUIDs to their human-readable string representation. There is a feature request to add a similar function to MariaDB.
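Where BIN_TO_UUID() is not available, one workaround is to select the column through HEX() (hex() in SQLite3) and format the digits afterwards. The helper below is a hypothetical sketch in plain Ruby for that second step; it is not part of ActiveID.

```ruby
# Hypothetical helper: turn 32 hex digits, e.g. the result of `SELECT HEX(id)`,
# into the canonical dashed, lower-case UUID form.
def format_hex_uuid(hex)
  digits = hex.delete("-").downcase
  raise ArgumentError, "expected 32 hex digits" unless digits.match?(/\A\h{32}\z/)

  [digits[0, 8], digits[8, 4], digits[12, 4], digits[16, 4], digits[20, 12]].join("-")
end

puts format_hex_uuid("A6908E1E54934C55A11DCD8445654DE6")
# prints "a6908e1e-5493-4c55-a11d-cd8445654de6"
```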
Contributing
First, thank you for contributing! We love pull requests from everyone. By participating in this project, you hereby grant Ribose Inc. the right to grant or transfer an unlimited number of non exclusive licenses or sub-licenses to third parties, under the copyright covering the contribution to use the contribution by all means.
Here are a few technical guidelines to follow:
- Open an issue to discuss a new feature.
- Write tests to support your new feature.
- Make sure the entire test suite passes locally and on CI.
- Open a Pull Request.
- After receiving feedback, perform an interactive rebase on your branch, in order to create a series of cohesive commits with descriptive messages.
- Party!
Credits
This gem is developed, maintained and funded by Ribose Inc.
The ActiveUUID gem, on which ActiveID was based, was developed by Nate Murray with notable help from:
- pyromaniac
- Andrew Kane
- Devin Foley
- Arkadiy Zabazhanov
- Jean-Denis Koeck
- Florian Staudacher
- Schuyler Erle
- Florian Schwab
- Thomas Guillory
- Daniel Blanco Rojas
- Olivier Amblet
License
The gem is available as open source under the terms of the MIT License.