WeaviateRecord
An ORM for the Weaviate vector database that follows the same conventions as ActiveRecord and brings the power of vector databases and retrieval-augmented generation (RAG) to your Ruby application.
This gem uses weaviate-ruby internally to connect with the Weaviate database.
Installation
gem install weaviate_record
Or you can add it to your Gemfile with:
bundle add weaviate_record
Prerequisites
WeaviateRecord needs a Weaviate database running on your local machine or in the cloud. To create a Weaviate instance on your local machine, use Weaviate's official configurator.
After creating an instance, set the environment variable WEAVIATE_DATABASE_URL to the database URL. If you have authentication enabled on Weaviate, set WEAVIATE_API_KEY to the API key.
If you want to use a different vectorizer module instead of transformers, set the environment variable WEAVIATE_VECTORIZER_MODULE to that module and WEAVIATE_VECTORIZER_API_KEY to your module's API key.
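As a quick sanity check before booting the application, the variables named above can be inspected with a few lines of plain Ruby. This is only a sketch: `missing_weaviate_env` and `WEAVIATE_ENV_KEYS` are hypothetical names that are not part of WeaviateRecord, and which of the variables your setup actually requires depends on your Weaviate instance.

```ruby
# Hypothetical helper, not part of WeaviateRecord: reports which of the
# environment variables named in this section are unset or blank.
WEAVIATE_ENV_KEYS = %w[
  WEAVIATE_DATABASE_URL
  WEAVIATE_API_KEY
  WEAVIATE_VECTORIZER_MODULE
  WEAVIATE_VECTORIZER_API_KEY
].freeze

# env is any Hash-like object (ENV by default); keys is the list to check.
def missing_weaviate_env(env = ENV, keys = WEAVIATE_ENV_KEYS)
  keys.select { |key| env[key].to_s.strip.empty? }
end

missing = missing_weaviate_env
warn "Unset Weaviate variables: #{missing.join(', ')}" unless missing.empty?
```

Pass a shorter key list if you do not use authentication or a custom vectorizer module.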
Configuration
You can configure the WeaviateRecord gem by creating an initializer or setup file with the following code:
WeaviateRecord.configure do |config|
  # Sync the local schema with the actual schema whenever this file is loaded, if this value is set to true
  # Default value: false
  config.sync_schema_on_load = true

  # Threshold for similarity searches
  # Default value: 0.55
  config.similarity_search_threshold = 1.0

  # The file path where WeaviateRecord stores the local copy of your Weaviate database schema.
  # If Rails is installed in your project, the default value is "#{Rails.root}/db/weaviate/schema.rb"
  # Otherwise, the default value is "#{Dir.pwd}/db/weaviate/schema.rb"
  config.schema_file_path = "#{Rails.root}/db/weaviate/schema.rb"
end
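The default described in the comments for schema_file_path can be sketched in plain Ruby. `default_schema_file_path` below is a hypothetical helper written only to illustrate the documented rule (Rails root when Rails is present, current directory otherwise); the gem's own implementation may differ.

```ruby
# Hypothetical helper mirroring the documented default for schema_file_path.
# It is an illustration, not the gem's code.
def default_schema_file_path
  root = if defined?(Rails) && Rails.respond_to?(:root)
           Rails.root.to_s # application root when Rails is loaded
         else
           Dir.pwd         # fall back to the current working directory
         end
  File.join(root, 'db', 'weaviate', 'schema.rb')
end

puts default_schema_file_path
```

Outside of Rails, this means the resolved path depends on the directory the process was started from, so setting config.schema_file_path explicitly may be the safer choice for scripts and rake tasks.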
Creating Collection in Weaviate
WeaviateRecord does not have a separate DSL for creating collections the way ActiveRecord does. However, there are two things you have to keep in mind while creating a collection.
- You should add indexTimestamps and indexNullState to your collection schema. Otherwise, timestamp-based and null-based conditions won't work.
WeaviateRecord::Connection.new.client.create(
  class_name: 'Article',
  properties: [...],
  inverted_index_config: {
    "indexNullState": true,
    "indexTimestamps": true
  }
)
Note: You can create a new Weaviate::Client instance by calling the #client method on any WeaviateRecord::Connection instance. These objects automatically use the values you assigned to the environment variables.
- Wherever you modify the Weaviate schema, be it in a rake task, a migration, or any other file, be sure to call the method WeaviateRecord::Schema.update! afterwards. It will automatically update your local copy of the database schema.
Usage
To use WeaviateRecord for your model, simply inherit from the base class. WeaviateRecord mixes in ActiveModel::Validations, so you can also add validations as you do for ActiveRecord models.
class Article < WeaviateRecord::Base
  # validates (plural) is the ActiveModel macro that accepts options such as presence
  validates :title, presence: true
end
And that's all. Now you can create and modify Weaviate records as you do in ActiveRecord. The syntax is exactly the same, with a few nuances.
Below are all the basic methods defined for CRUD operations. Their syntax and behaviour are the same as those of their ActiveRecord equivalents.
For batch operations,
- destroy_all (needs at least one where condition)
For the query interface, we have
For debugging purposes, there is one method called #to_query, which behaves like #to_sql in ActiveRecord.
All the above methods work exactly the same way as their ActiveRecord counterparts. Apart from these, all the methods that come from the ActiveModel::Validations and Enumerable modules are also available, and then there are a few other methods where Weaviate truly shines.
Keyword Search
To use Weaviate's keyword-based search on your model, there is one method called #bm25. It comes with some limitations. The notable one is that you cannot chain the #count or #order method with #bm25.
Article.bm25('keyword').count # bm25 will be ignored here
Article.bm25('keyword').order # order will be ignored here
In some scenarios, bm25 search matches too broadly and returns weakly related records. To mitigate that, you can query the meta attribute score along with the search and filter the results once again for relevance.
Article.select(_additional: :score).bm25('Your Keyword').take_while do |article|
  article.score >= KEYWORD_SEARCH_THRESHOLD
end
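The filtering step above relies on how Enumerable#take_while behaves: it stops at the first record that fails the condition, so it only works as a relevance cut-off when results arrive ranked by score, highest first. The sketch below shows this in plain Ruby without a database. `Hit`, `relevant_hits`, and the threshold value are assumptions made for illustration, and the `to_f` call is a defensive choice in case the score arrives as a string.

```ruby
# Stand-in for records returned by a bm25 query. Hit is hypothetical and only
# mimics objects that respond to #score.
Hit = Struct.new(:title, :score)

KEYWORD_SEARCH_THRESHOLD = 0.5 # assumed value; tune it for your data

# Assumes hits are ranked by score, highest first, so take_while can stop at
# the first hit below the threshold. to_f covers scores delivered as strings.
def relevant_hits(hits, threshold = KEYWORD_SEARCH_THRESHOLD)
  hits.take_while { |hit| hit.score.to_f >= threshold }
end

hits = [Hit.new('Ruby basics', '1.8'), Hit.new('Rails guide', 0.9), Hit.new('Cooking', 0.2)]
puts relevant_hits(hits).map(&:title).inspect
# prints ["Ruby basics", "Rails guide"]
```

If the results might not be ordered by score, select is the safer choice because it checks every record instead of stopping early.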
Similarity Search
Weaviate offers similarity search, also called vector-based search, in three ways: with text, with a vector, or with an object. Similarly, WeaviateRecord comes with three methods.
It is important to specify the threshold distance whenever you use similarity search. Otherwise, your search results will not be very relevant. You can do so either by passing the distance parameter to the search or by setting the default value for all three searches in the config.
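To make the role of a distance threshold concrete, here is a small plain-Ruby sketch. It assumes cosine distance (1 minus cosine similarity) as the metric; your collection may be configured with a different one. `cosine_distance` and `within_distance` are hypothetical helpers for illustration and are not part of WeaviateRecord or Weaviate.

```ruby
# Cosine distance between two equal-length numeric vectors: 1 - cosine similarity.
# 0.0 means the vectors point the same way; larger values mean less similar.
def cosine_distance(a, b)
  raise ArgumentError, 'vectors must have the same length' unless a.length == b.length

  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  raise ArgumentError, 'a zero vector has no direction' if norm_a.zero? || norm_b.zero?

  1.0 - dot / (norm_a * norm_b)
end

# Keeps only the candidates whose distance to the query is within the threshold.
def within_distance(query, candidates, threshold)
  candidates.select { |vector| cosine_distance(query, vector) <= threshold }
end

query = [1.0, 0.0]
candidates = [[1.0, 0.1], [0.0, 1.0]]
puts within_distance(query, candidates, 0.55).inspect
# prints [[1.0, 0.1]]
```

A smaller threshold keeps only near matches, while a larger one lets loosely related records through, which is why leaving it unset tends to hurt relevance.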
QnA Transformers - #ask and #answer
If you have enabled QnA Transformers in your Weaviate database, you can use the #ask method and get an answer attribute like this:
Article.create(content: "I'm Barney Stinson. You can call me Legendary")
Article.ask('who is he').select(_additional: { answer: :result }).first.answer
# => {"result"=>"barney stinson"}
And just like that, you can easily bring RAG to your Ruby application.
Summarizer - #summary
If you have enabled Sum Transformers in your Weaviate database, you can summarize an attribute that holds a large text, such as a movie review or an article summary. The summarizer doesn't have its own method for now. However, you can call it with a small workaround on the #select method.
content = <<~TEXT
Ruby on Rails (simplified as Rails) is a server-side web application framework written in Ruby under the MIT License.
Rails is a model–view–controller (MVC) framework, providing default structures for a database, a web service, and web pages.
It encourages and facilitates the use of web standards such as JSON or XML for data transfer and HTML, CSS and JavaScript for user interfacing.
In addition to MVC, Rails emphasizes the use of other well-known software engineering patterns and paradigms, including convention over configuration (CoC), don't repeat yourself (DRY), and the active record pattern.
TEXT
article = Article.create(content: content)
results = Article.where(id: article.id)
                 .select(_additional: 'summary(properties: ["content"]) { result }')
                 .first.summary
puts results
Output:
[{"result"=>
"Rails is a server-side web application framework written in Ruby under the MIT License. It is a model–view–controller (MVC) framework, providing default structures for a database, a web service, and web pages. It encourages and facilitates the use of web standards such as HTML, CSS and JavaScript."}]
Limitations
WeaviateRecord is not yet a fully featured ORM like ActiveRecord. It doesn't yet support associations, a DSL, or a way to write and handle migrations.
Support
Feel free to open an issue or PR if you notice that a feature is missing or behaves incorrectly. Happy coding 🎉