LIBMF Ruby
LIBMF - large-scale sparse matrix factorization - for Ruby
Check out Disco for higher-level collaborative filtering
Installation
Add this line to your application’s Gemfile:
gem "libmf"
Getting Started
Prep your data in the format row_index, column_index, value
data = Libmf::Matrix.new
data.push(0, 0, 5.0)
data.push(0, 2, 3.5)
data.push(1, 1, 4.0)
Create a model
model = Libmf::Model.new
model.fit(data)
Make predictions
model.predict(row_index, column_index)
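For example, with the matrix from Getting Started, the following returns a Float approximating the value at row 0, column 2; the exact number depends on training
model.predict(0, 2)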
Get the latent factors (these approximate the training matrix)
model.p_factors
model.q_factors
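As a rough sketch of how the factors relate to predictions (assuming the default format, an array of factor arrays per row and per column), the dot product of a row of P and a row of Q approximates the corresponding cell
row_factors = model.p_factors[0]                 # factors for row 0
col_factors = model.q_factors[2]                 # factors for column 2
row_factors.zip(col_factors).sum { |a, b| a * b } # roughly model.predict(0, 2)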
Get the bias (average of all elements in the training matrix)
model.bias
Save the model to a file
model.save("model.txt")
Load the model from a file
model = Libmf::Model.load("model.txt")
Pass a validation set
model.fit(data, eval_set: eval_set)
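The validation set is a matrix built the same way as the training data, for example
eval_set = Libmf::Matrix.new
eval_set.push(0, 1, 4.5)
eval_set.push(1, 0, 3.0)
model.fit(data, eval_set: eval_set)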
Cross-Validation
Perform cross-validation
model.cv(data)
Specify the number of folds
model.cv(data, folds: 5)
Parameters
Pass parameters - default values below
Libmf::Model.new(
loss: :real_l2, # loss function
factors: 8, # number of latent factors
threads: 12, # number of threads used
bins: 25, # number of bins
iterations: 20, # number of iterations
lambda_p1: 0, # coefficient of L1-norm regularization on P
lambda_p2: 0.1, # coefficient of L2-norm regularization on P
lambda_q1: 0, # coefficient of L1-norm regularization on Q
lambda_q2: 0.1, # coefficient of L2-norm regularization on Q
learning_rate: 0.1, # learning rate
alpha: 1, # importance of negative entries
c: 0.0001, # desired value of negative entries
nmf: false, # perform non-negative MF (NMF)
quiet: false # no outputs to stdout
)
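For example, to train with more factors and iterations and without progress output (illustrative values, not tuned recommendations)
model = Libmf::Model.new(factors: 20, iterations: 50, quiet: true)
model.fit(data)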
Loss Functions
For real-valued matrix factorization
- :real_l2 - squared error (L2-norm)
- :real_l1 - absolute error (L1-norm)
- :real_kl - generalized KL-divergence
For binary matrix factorization
- :binary_log - logarithmic error
- :binary_l2 - squared hinge loss
- :binary_l1 - hinge loss
For one-class matrix factorization
- :one_class_row - row-oriented pair-wise logarithmic loss
- :one_class_col - column-oriented pair-wise logarithmic loss
- :one_class_l2 - squared error (L2-norm)
Metrics
Calculate RMSE (for real-valued MF)
model.rmse(data)
Calculate MAE (for real-valued MF)
model.mae(data)
Calculate generalized KL-divergence (for non-negative real-valued MF)
model.gkl(data)
Calculate logarithmic loss (for binary MF)
model.logloss(data)
Calculate accuracy (for binary MF)
model.accuracy(data)
Calculate MPR (for one-class MF)
model.mpr(data, transpose)
Calculate AUC (for one-class MF)
model.auc(data, transpose)
Example
Download the MovieLens 100K dataset and use:
require "csv"
train_set = Libmf::Matrix.new
valid_set = Libmf::Matrix.new
CSV.foreach("u.data", col_sep: "\t").with_index do |row, i|
data = i < 80000 ? train_set : valid_set
data.push(row[0].to_i, row[1].to_i, row[2].to_f)
end
model = Libmf::Model.new(factors: 20)
model.fit(train_set, eval_set: valid_set)
puts model.rmse(valid_set)
Performance
For performance, read data directly from files
model.fit("train.txt", eval_set: "validate.txt")
model.cv("train.txt")
Data should be in the format row_index column_index value:
0 0 5.0
0 2 3.5
1 1 4.0
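A small sketch for generating a file in that format from Ruby (the file name and rows here are illustrative)
rows = [[0, 0, 5.0], [0, 2, 3.5], [1, 1, 4.0]]
File.write("train.txt", rows.map { |r| r.join(" ") }.join("\n"))

model = Libmf::Model.new
model.fit("train.txt")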
Numo
Get latent factors as Numo arrays
model.p_factors(format: :numo)
model.q_factors(format: :numo)
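One way to use them, sketched under the assumption that the Numo arrays have shape rows x factors and columns x factors, is to reconstruct the full approximate matrix with a matrix product
require "numo/narray"

p_mat = model.p_factors(format: :numo) # rows x factors
q_mat = model.q_factors(format: :numo) # columns x factors
approx = p_mat.dot(q_mat.transpose)    # approximate training matrix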
Resources
- LIBMF: A Library for Parallel Matrix Factorization in Shared-memory Systems (paper by the LIBMF authors)
History
View the changelog
Contributing
Everyone is encouraged to help improve this project. Here are a few ways you can help:
- Report bugs
- Fix bugs and submit pull requests
- Write, clarify, or fix documentation
- Suggest or add new features
To get started with development:
git clone https://github.com/ankane/libmf-ruby.git
cd libmf-ruby
bundle install
bundle exec rake vendor:all
bundle exec rake test