CassandraStore
CassandraStore is a fun-to-use ORM for Cassandra with a chainable, ActiveRecord-like DSL for querying, inserting, updating and deleting records, plus built-in migration and keyspace management support. It is built on top of the cassandra-driver gem and relies on the driver's built-in automated paging, which drastically reduces the complexity of the code base.
Install
Add this line to your application's Gemfile:
gem 'cassandra_store'
And then execute:
$ bundle
Or install it yourself as:
$ gem install cassandra_store
Usage
Connecting
First, connect to your Cassandra cluster:
CassandraStore::Base.configure(
  hosts: ["127.0.0.1"],
  keyspace: "my_keyspace",
  cluster_settings: { consistency: :quorum }
)
When using Rails, you want to do that in an initializer. If you do not yet have a keyspace, you additionally want to pass replication settings:
CassandraStore::Base.configure(
  hosts: ["127.0.0.1"],
  keyspace: "my_keyspace",
  cluster_settings: { consistency: :quorum },
  replication: { class: 'SimpleStrategy', replication_factor: 1 }
)
Afterwards, you can create or drop the specified keyspace:
rake cassandra:keyspace:create
rake cassandra:keyspace:drop
Migrations
If you are on Rails and you don't have any tables yet, you can add migrations now. There is no generator yet, so you have to create them manually:
# cassandra/migrate/1589896040_create_posts.rb
class CreatePosts < CassandraStore::Migration
  def up
    execute <<-CQL
      CREATE TABLE posts (
        user TEXT,
        domain TEXT,
        id TIMEUUID,
        message TEXT,
        PRIMARY KEY ((user, domain), id)
      )
    CQL
  end

  def down
    execute "DROP TABLE posts"
  end
end
Afterwards, simply run rake cassandra:migrate.
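Since migrations have to be created by hand, a small helper can save some typing. The following is only a sketch: underscore, migration_path and migration_skeleton are hypothetical names that are not part of CassandraStore, and it assumes that the numeric prefix in the file name above is a Unix timestamp.

```ruby
# Hypothetical helpers, not part of CassandraStore. They follow the
# file and class naming shown in the migration example above.

# Converts a CamelCase class name to snake_case ("CreatePosts" -> "create_posts").
def underscore(class_name)
  class_name.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
end

# Builds the migration path, assuming the prefix is a Unix timestamp.
def migration_path(class_name, time = Time.now)
  "cassandra/migrate/#{time.to_i}_#{underscore(class_name)}.rb"
end

# Returns an empty migration skeleton for the given class name.
def migration_skeleton(class_name)
  <<~RUBY
    class #{class_name} < CassandraStore::Migration
      def up
      end

      def down
      end
    end
  RUBY
end

puts migration_path("CreatePosts", Time.at(1589896040))
puts migration_skeleton("CreatePosts")
```

Writing the skeleton to the printed path (for example with File.write) gives you a file that you can then fill with your CQL statements.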
Models
Creating models couldn't be easier:
class Post < CassandraStore::Base
  column :user, :text, partition_key: true
  column :domain, :text, partition_key: true
  column :id, :timeuuid, clustering_key: true
  column :message, :text

  validates_presence_of :user, :domain, :message

  before_create do
    self.id ||= generate_timeuuid
  end
end
Let's check this out in detail:
column :user, :text, partition_key: true
column :domain, :text, partition_key: true
tells CassandraStore that your partition key consists of the user column as well as the domain column. For more information regarding partition keys and the data model of Cassandra, please check out the Cassandra docs. Afterwards, the clustering/sorting key is specified via:
column :id, :timeuuid, clustering_key: true
The id is assigned here:
self.id ||= generate_timeuuid
Please note that CassandraStore never auto-assigns any values for you; you have to assign them yourself. You can pass a timestamp to generate_timeuuid as well:
generate_timeuuid(Time.now)
This is desirable when you also have timestamp columns and want them to match your timeuuid key.
Similarly, when using UUID instead of TIMEUUID, you have to use generate_uuid instead.
In addition, you can of course use all kinds of validations, hooks, etc.
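To see why passing a timestamp to generate_timeuuid keeps the key in sync with a timestamp column, it helps to look at how a TIMEUUID (a version 1 UUID) embeds its time. The following plain-Ruby sketch is not how CassandraStore generates ids; sketch_timeuuid and timeuuid_to_time are hypothetical names, and the random clock sequence and node are assumptions made for illustration.

```ruby
require "securerandom"

# Number of 100-nanosecond intervals between the UUID epoch
# (1582-10-15) and the Unix epoch (1970-01-01).
UUID_EPOCH_OFFSET = 0x01B21DD213814000

# Hypothetical helper: builds a version 1 UUID string for the given time.
# Clock sequence and node are random here (an assumption of this sketch).
def sketch_timeuuid(time = Time.now)
  ticks = time.to_i * 10_000_000 + time.nsec / 100 + UUID_EPOCH_OFFSET
  time_low = ticks & 0xFFFFFFFF
  time_mid = (ticks >> 32) & 0xFFFF
  time_hi_and_version = ((ticks >> 48) & 0x0FFF) | 0x1000 # version 1
  clock_seq = SecureRandom.random_number(1 << 14) | 0x8000 # RFC 4122 variant
  node = SecureRandom.random_number(1 << 48) | 0x010000000000 # multicast bit marks a random node
  format("%08x-%04x-%04x-%04x-%012x", time_low, time_mid, time_hi_and_version, clock_seq, node)
end

# Hypothetical helper: recovers the embedded time from a version 1 UUID string.
def timeuuid_to_time(uuid)
  time_low, time_mid, time_hi = uuid.split("-").first(3).map { |part| part.to_i(16) }
  ticks = ((time_hi & 0x0FFF) << 48) | (time_mid << 32) | time_low
  Time.at(Rational(ticks - UUID_EPOCH_OFFSET, 10_000_000))
end

created_at = Time.at(1589896040)
id = sketch_timeuuid(created_at)
puts id
puts timeuuid_to_time(id) == created_at
```

Because the time is part of the id, generating the id from the same Time object that you store in a timestamp column means both values describe the same instant.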
Querying
The interface for dealing with records and querying them is very similar
to the interface of ActiveRecord:
Post.create!(user: "mrkamel", ...)
Post.create(...)
Post.new(...).save
Post.new(...).save!
Post.first.delete
Post.first.destroy
CassandraStore supports comprehensive query methods in a chainable way:
all
Post.all
where
Post.where(user: "mrkamel", domain: "example.com")
where_cql
Post.where_cql("user = :user", user: "mrkamel")
limit
Post.where(...).limit(10)
order
Post.where(...).order(id: "asc")
distinct
Post.select(:user, :domain).distinct
select
Post.select(:user, :domain)
Please note that when using select, an array of hashes is returned instead of an array of Post objects.
count
Post.where(...).count
first
Post.where(...).first
find_each
Post.where(...).find_each(batch_size: 100) do |post|
  # ...
end
find_in_batches
Post.where(...).find_in_batches(batch_size: 100) do |batch|
  # ...
end
update_all
Post.where(...).update_all("message = 'test'")
Post.where(...).update_all(message: "test")
delete_all
Post.where(...).delete_all
Please note that delete_all can time out when there are many records to delete. In that case, you usually want to use delete_in_batches instead, which runs find_in_batches iteratively and deletes each batch.
delete_in_batches
Post.where(...).delete_in_batches
Again, please note that delete_in_batches runs find_in_batches iteratively and then deletes each batch. When deleting large numbers of records, you usually want to use delete_in_batches instead of delete_all, as delete_all can time out.
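The find-then-delete pattern described above can be pictured with a small in-memory stand-in. This is only an illustration under stated assumptions: SketchTable and its methods are hypothetical, no Cassandra is involved, and CassandraStore's actual implementation may differ.

```ruby
# Hypothetical in-memory stand-in for a table, purely illustrative.
class SketchTable
  attr_reader :rows, :delete_calls

  def initialize(rows)
    @rows = rows.dup
    @delete_calls = 0
  end

  # Yields the rows matching the filter in slices of at most batch_size.
  def find_in_batches(filter, batch_size: 100, &block)
    rows.select(&filter).each_slice(batch_size, &block)
  end

  # Deletes one batch of rows and counts the call.
  def delete_batch(batch)
    @delete_calls += 1
    @rows -= batch
  end

  # Finds matching rows batch by batch and deletes each batch separately,
  # so no single delete has to cover all matching rows at once.
  def delete_in_batches(filter, batch_size: 100)
    find_in_batches(filter, batch_size: batch_size) { |batch| delete_batch(batch) }
  end
end

rows = (1..250).map { |id| { id: id, user: id.even? ? "mrkamel" : "other" } }
table = SketchTable.new(rows)
table.delete_in_batches(->(row) { row[:user] == "mrkamel" }, batch_size: 50)

puts table.rows.size    # 125 rows remain
puts table.delete_calls # 3 deletes: 50 + 50 + 25 rows
```

Each delete only touches a bounded number of rows, which is the reason the batched variant is less likely to time out than one large delete.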
truncate_table
Post.truncate_table
Deletes all records from the table. This is much faster than delete_all or delete_in_batches. However, it is not chainable, so your only option is to remove all records from the table.
Semantic Versioning
CassandraStore uses Semantic Versioning: SemVer
Contributing
- Fork it
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create new Pull Request