Project

xlearn

0.02
Repository is archived
No release in over 3 years
Low commit activity in last 3 years
High performance factorization machines for Ruby
 Project Readme

xLearn Ruby

xLearn - the high performance machine learning library - for Ruby

Supports:

  • Linear models
  • Factorization machines
  • Field-aware factorization machines


Installation

Add this line to your application’s Gemfile:

gem "xlearn"

Getting Started

Prep your data

x = [[1, 2], [3, 4], [5, 6], [7, 8]]
y = [1, 2, 3, 4]

Train a model

model = XLearn::Linear.new(task: "reg")
model.fit(x, y)

Use XLearn::FM for factorization machines and XLearn::FFM for field-aware factorization machines

Make predictions

model.predict(x)

Save the model to a file

model.save_model("model.bin")

Load the model from a file

model.load_model("model.bin")

Save a text version of the model

model.save_txt("model.txt")

Pass a validation set

model.fit(x_train, y_train, eval_set: [x_val, y_val])
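The validation set passed through eval_set has to be held out from the training data first. A minimal sketch of one way to do that in plain Ruby, assuming x and y are parallel arrays; the helper name train_val_split, the 25% ratio, and the seed are illustrative and not part of the gem.

```ruby
# Hypothetical helper (not part of xlearn): split parallel arrays into
# training and validation portions after a seeded shuffle.
def train_val_split(x, y, val_ratio: 0.25, seed: 1)
  raise ArgumentError, "x and y must have the same length" unless x.size == y.size

  indices = (0...x.size).to_a.shuffle(random: Random.new(seed))
  val_size = (x.size * val_ratio).round
  val_idx = indices.first(val_size)
  train_idx = indices.drop(val_size)

  [x.values_at(*train_idx), y.values_at(*train_idx),
   x.values_at(*val_idx), y.values_at(*val_idx)]
end

x = [[1, 2], [3, 4], [5, 6], [7, 8]]
y = [1, 2, 3, 4]
x_train, y_train, x_val, y_val = train_val_split(x, y)
```

The four resulting arrays can then be handed to model.fit as shown above.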

Train online

model.partial_fit(x_train, y_train)

Get the bias term, linear term, and latent factors

model.bias_term
model.linear_term
model.latent_factors # fm and ffm only
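To show how these three pieces relate, here is a sketch of the standard second-order factorization machine score for a single dense row. The data layout is an assumption (one linear weight per feature, one k-length array of latent factors per feature); the README does not document the shapes the gem returns, and xLearn may apply extra steps such as instance normalization, so this is not guaranteed to reproduce model.predict.

```ruby
# Standard second-order factorization machine score for one dense row:
#   bias + sum_i w[i] * x[i] + sum_{i<j} dot(v[i], v[j]) * x[i] * x[j]
# Assumed layout (not confirmed by the README): `linear` holds one weight per
# feature and `factors` holds one k-length array per feature.
def fm_score(row, bias, linear, factors)
  score = bias
  row.each_with_index { |value, i| score += linear[i] * value }

  row.each_index do |i|
    ((i + 1)...row.size).each do |j|
      dot = factors[i].zip(factors[j]).sum { |a, b| a * b }
      score += dot * row[i] * row[j]
    end
  end
  score
end

# Made-up numbers purely for illustration
bias = 0.5
linear = [0.1, 0.2]
factors = [[1.0, 0.0], [0.5, 2.0]]
score = fm_score([1.0, 2.0], bias, linear, factors)
```

A linear model corresponds to dropping the pairwise loop and keeping only the bias and linear terms.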

Parameters

Pass parameters - default values below

XLearn::FM.new(
  task: "binary",      # binary (classification), reg (regression)
  metric: nil,         # acc, prec, recall, f1, auc, mae, mape, rmse, rmsd
  lr: 0.2,             # learning rate
  lambda: 0.00002,     # lambda for l2 regularization
  k: 4,                # latent factors for fm and ffm
  alpha: 0.3,          # hyper parameter for ftrl
  beta: 1.0,           # hyper parameter for ftrl
  lambda_1: 0.00001,   # hyper parameter for ftrl
  lambda_2: 0.00002,   # hyper parameter for ftrl
  epoch: 10,           # number of epochs
  fold: 3,             # number of folds
  opt: "adagrad",      # sgd, adagrad, ftrl
  block_size: 500,     # block size for on-disk training in MB
  early_stop: true,    # use early stopping
  stop_window: 2,      # size of stop window for early stopping
  sign: false,         # convert prediction output to 0 and 1
  sigmoid: false,      # convert prediction output using sigmoid
  seed: 1              # random seed to shuffle data set
)
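The sign and sigmoid options both post-process the raw prediction score. A sketch of what such conversions typically compute, in plain Ruby; the function names are illustrative, and the threshold at zero is an assumption about how scores are mapped to 0 and 1 rather than something this README states.

```ruby
# Logistic function: maps any real-valued score into the open interval (0, 1).
def sigmoid(score)
  1.0 / (1.0 + Math.exp(-score))
end

# Assumed thresholding rule for illustration: positive scores map to 1,
# everything else to 0.
def to_label(score)
  score > 0 ? 1 : 0
end

raw_scores = [-2.0, 0.0, 3.0]
probabilities = raw_scores.map { |s| sigmoid(s) }
labels = raw_scores.map { |s| to_label(s) }
```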

Cross-Validation

Run cross-validation

model.cv(x, y)

Specify the number of folds

model.cv(x, y, folds: 5)
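For readers unfamiliar with what the number of folds controls, the following sketch shows a generic k-fold partition: every sample lands in exactly one test fold, and the remaining folds form the training set for that round. The helper k_fold_indices is hypothetical, and how the library partitions data internally is not described here.

```ruby
# Hypothetical helper (not part of xlearn): build [train_indices, test_indices]
# pairs for k-fold cross-validation over n samples.
def k_fold_indices(n, folds: 3, seed: 1)
  shuffled = (0...n).to_a.shuffle(random: Random.new(seed))

  # Deal the shuffled indices round-robin into `folds` buckets
  buckets = Array.new(folds) { [] }
  shuffled.each_with_index { |idx, pos| buckets[pos % folds] << idx }

  buckets.each_index.map do |f|
    test = buckets[f]
    train = buckets.each_with_index.reject { |_, i| i == f }.flat_map(&:first)
    [train, test]
  end
end

splits = k_fold_indices(10, folds: 5)
```

With 10 samples and 5 folds, each round trains on 8 samples and evaluates on the remaining 2.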

Data

Data can be an array of arrays

[[1, 2, 3], [4, 5, 6]]

Or a Numo array

Numo::NArray.cast([[1, 2, 3], [4, 5, 6]])

Or a Rover data frame

Rover.read_csv("houses.csv")

Or a Daru data frame

Daru::DataFrame.from_csv("houses.csv")

Performance

For large datasets, read data directly from files

model.fit("train.txt", eval_set: "validate.txt")
model.predict("test.txt")
model.cv("train.txt")

For linear models and factorization machines, use CSV:

label,value_1,value_2,...,value_n

Or the libsvm format (better for sparse data):

label index_1:value_1 index_2:value_2 ... index_n:value_n

You can also use commas instead of spaces for separators
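A minimal sketch of producing libsvm lines from dense rows, skipping zero values so that sparse data stays compact. The helper to_libsvm_line is hypothetical and not part of the gem, and the one-based feature indices follow the usual libsvm convention, which is an assumption here.

```ruby
# Hypothetical helper (not part of xlearn): format one dense row as a
# libsvm line, omitting zero-valued features.
# Feature indices are written one-based (assumed convention).
def to_libsvm_line(label, row)
  pairs = row.each_with_index
             .reject { |value, _| value.zero? }
             .map { |value, index| "#{index + 1}:#{value}" }
  [label, *pairs].join(" ")
end

rows = [[0, 2.5, 0, 4], [1, 0, 0, 0]]
labels = [1, 0]
lines = rows.zip(labels).map { |row, label| to_libsvm_line(label, row) }
```

Writing the lines out with something like File.write("train.txt", lines.join("\n") + "\n") gives a file that can be passed to model.fit by path.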

For field-aware factorization machines, use the libffm format:

label field_1:index_1:value_1 field_2:index_2:value_2 ...

You can also use commas instead of spaces for separators
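The libffm layout adds a field id in front of each feature. A sketch of building such lines, assuming the caller supplies a field id for every column; the helper to_libffm_line, the fields array, and the one-based feature indices are illustrative assumptions.

```ruby
# Hypothetical helper (not part of xlearn): format one dense row as a
# libffm line. `fields[i]` is the field id of column i, and feature
# indices are written one-based (assumed convention).
def to_libffm_line(label, row, fields)
  triples = row.each_with_index
               .reject { |value, _| value.zero? }
               .map { |value, index| "#{fields[index]}:#{index + 1}:#{value}" }
  [label, *triples].join(" ")
end

# Columns 0 and 1 belong to field 1, column 2 belongs to field 2
fields = [1, 1, 2]
line = to_libffm_line(1, [0.5, 0, 2], fields)
```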

You can also write predictions directly to a file

model.predict("test.txt", out_path: "predictions.txt")
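Once predictions are on disk they can be read back for further processing. The sketch below assumes the output file holds one numeric prediction per line, which is not stated in this README; read_predictions is a hypothetical helper, and a temporary file stands in for predictions.txt so the example runs without a trained model.

```ruby
require "tempfile"

# Hypothetical helper: parse a predictions file into floats.
# Assumes one numeric prediction per line; blank lines are ignored.
def read_predictions(path)
  File.readlines(path, chomp: true).reject(&:empty?).map { |line| Float(line) }
end

# Stand-in file so the sketch runs without a trained model
file = Tempfile.new("predictions")
file.write("0.25\n0.75\n")
file.close
predictions = read_predictions(file.path)
```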

Credits

This library is modeled after xLearn’s Scikit-learn API.

History

View the changelog

Contributing

Everyone is encouraged to help improve this project.

To get started with development:

git clone https://github.com/ankane/xlearn-ruby.git
cd xlearn-ruby
bundle install
bundle exec rake vendor:all
bundle exec rake test