Cumo
Cumo (pronounced "koomo") is a CUDA-aware, GPU-optimized numerical library that offers a significant performance boost over Ruby Numo, while (mostly) maintaining drop-in compatibility.
Requirements
- Ruby 2.5 or later
- NVIDIA GPU Compute Capability 3.5 (Kepler) or later
- CUDA 9.0 or later
Preparation
Install CUDA and set your environment variables as follows:
export CUDA_PATH="/usr/local/cuda"
export CPATH="$CUDA_PATH/include:$CPATH"
export LD_LIBRARY_PATH="$CUDA_PATH/lib64:$CUDA_PATH/lib:$LD_LIBRARY_PATH"
export PATH="$CUDA_PATH/bin:$PATH"
export LIBRARY_PATH="$CUDA_PATH/lib64:$CUDA_PATH/lib:$LIBRARY_PATH"
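Before building, it can help to confirm that the variables above are actually visible to the Ruby process. The following is a minimal sketch; `missing_cuda_env` and `REQUIRED_CUDA_VARS` are hypothetical names used for illustration and are not part of Cumo.

```ruby
# Hypothetical helper (not part of Cumo): list the variables from the
# setup above that are unset or empty in the given environment.
REQUIRED_CUDA_VARS = %w[CUDA_PATH CPATH LD_LIBRARY_PATH PATH LIBRARY_PATH].freeze

def missing_cuda_env(env = ENV)
  REQUIRED_CUDA_VARS.select { |name| env[name].to_s.empty? }
end

missing = missing_cuda_env
if missing.empty?
  puts "CUDA environment variables are set"
else
  puts "Unset or empty: #{missing.join(', ')}"
end
```

This only checks that the variables are set; it does not verify that the paths they point to exist.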
To use cuDNN features, install cuDNN and set your environment variables as follows:
export CUDNN_ROOT_DIR=/path/to/cudnn
export CPATH=$CUDNN_ROOT_DIR/include:$CPATH
export LD_LIBRARY_PATH=$CUDNN_ROOT_DIR/lib64:$LD_LIBRARY_PATH
export LIBRARY_PATH=$CUDNN_ROOT_DIR/lib64:$LIBRARY_PATH
FYI: I use cudnnenv to install cuDNN under my home directory, like export CUDNN_ROOT_DIR=/home/sonots/.cudnn/active/cuda.
Installation
Add the following line to your Gemfile:
gem 'cumo'
And then execute:
$ bundle
Or install it yourself as:
$ gem install cumo
How To Use
Quick start
An example:
[1] pry(main)> require "cumo/narray"
=> true
[2] pry(main)> a = Cumo::DFloat.new(3,5).seq
=> Cumo::DFloat#shape=[3,5]
[[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]]
[3] pry(main)> a.shape
=> [3, 5]
[4] pry(main)> a.ndim
=> 2
[5] pry(main)> a.class
=> Cumo::DFloat
[6] pry(main)> a.size
=> 15
Switching from Numo to Cumo
The following find-and-replace should just work:
find . -type f | xargs sed -i -e 's/Numo/Cumo/g' -e 's/numo/cumo/g'
If you want to dynamically switch between Numo and Cumo, something like the following will work:
if gpu
require 'cumo/narray'
xm = Cumo
else
require 'numo/narray'
xm = Numo
end
a = xm::DFloat.new(3,5).seq
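The same switch can be wrapped in a small function so that the rest of a program only ever sees `xm`. This is a sketch under assumptions: `backend_spec`, `load_backend`, and the `USE_GPU` variable are hypothetical names, not part of Cumo or Numo.

```ruby
# Hypothetical helpers (not part of Cumo or Numo): choose the library
# from a boolean flag and return the matching top-level module.
def backend_spec(gpu)
  if gpu
    { lib: "cumo/narray", mod: :Cumo }
  else
    { lib: "numo/narray", mod: :Numo }
  end
end

def load_backend(gpu)
  spec = backend_spec(gpu)
  require spec[:lib]            # raises LoadError if the gem is not installed
  Object.const_get(spec[:mod])  # the Cumo or Numo module
end

# Usage (requires the corresponding gem to be installed):
#   xm = load_backend(ENV["USE_GPU"] == "1")  # USE_GPU is an assumed variable name
#   a  = xm::DFloat.new(3, 5).seq
```

Keeping the library name and module name in one place means the `require` and the constant lookup cannot drift apart.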
Incompatibility With Numo
The following methods behave incompatibly with Numo by default for performance reasons:
extract
[]
count_true
count_false
For a 0-dimensional NArray, Numo returns a Ruby numeric object, while Cumo returns the 0-dimensional NArray itself. Cumo differs in this way to avoid synchronization and minimize CPU ⇄ GPU data transfer.
Set the CUMO_COMPATIBLE_MODE environment variable to ON to force Numo NArray compatibility (for worse performance).
You may enable or disable compatible_mode as:
require 'cumo'
Cumo.enable_compatible_mode # enable
Cumo.compatible_mode_enabled? #=> true
Cumo.disable_compatible_mode # disable
Cumo.compatible_mode_enabled? #=> false
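If compatible mode is only needed around a small piece of code, the three calls above can be wrapped in a block helper that restores the previous setting afterwards. This is a sketch: `with_compatible_mode` is a hypothetical helper, not part of Cumo, and it only assumes that the object passed in responds to the three methods shown above.

```ruby
# Hypothetical helper (not part of Cumo): run a block with compatible
# mode enabled, then restore the previous setting. `backend` must respond
# to enable_compatible_mode, disable_compatible_mode and
# compatible_mode_enabled? (as the Cumo module does above).
def with_compatible_mode(backend)
  was_enabled = backend.compatible_mode_enabled?
  backend.enable_compatible_mode
  yield
ensure
  # Only turn the mode off again if it was off before the block ran.
  backend.disable_compatible_mode unless was_enabled
end

# Usage (requires cumo):
#   require 'cumo'
#   value = with_compatible_mode(Cumo) { a[0, 0] }
```

The `ensure` clause restores the setting even if the block raises, so the slower mode does not leak into the rest of the program.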
You can also use the following methods, which behave like Numo's NArray methods. The behavior of these methods does not depend on compatible_mode.
extract_cpu
aref_cpu(*idx)
count_true_cpu
count_false_cpu
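Code that must run against both Numo and Cumo can isolate the difference in one place. The sketch below is an illustration only: `to_ruby_value` is a hypothetical helper, and it assumes that `extract_cpu` (Cumo) and `extract` (Numo) return a plain Ruby value for a 0-dimensional array, as described above.

```ruby
# Hypothetical helper (not part of Cumo or Numo): return a plain Ruby
# value regardless of which backend produced `value`.
def to_ruby_value(value)
  return value if value.is_a?(Numeric)                        # already a Ruby number
  return value.extract_cpu if value.respond_to?(:extract_cpu) # Cumo-style array
  value.respond_to?(:extract) ? value.extract : value         # Numo-style array
end

# Usage (with either backend loaded as xm):
#   n = to_ruby_value(xm::DFloat.new(3, 5).seq[0, 0])
```

Calling this only at the point where a Ruby number is really needed keeps intermediate results on the GPU when running under Cumo.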
Select a GPU device ID
Set the CUDA_VISIBLE_DEVICES=id environment variable, or
require 'cumo'
Cumo::CUDA::Runtime.cudaSetDevice(id)
where id is an integer.
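When the device ID comes from user input or an environment variable, it arrives as a string and should be validated before it is passed on. A minimal sketch follows; `parse_device_id` and the `GPU_ID` variable are hypothetical names, not part of Cumo.

```ruby
# Hypothetical helper (not part of Cumo): convert a string to a
# non-negative integer device ID, raising ArgumentError otherwise.
def parse_device_id(text)
  id = Integer(text, 10) # raises ArgumentError for non-numeric strings
  raise ArgumentError, "device ID must be non-negative, got #{id}" if id.negative?
  id
end

# Usage (requires cumo; GPU_ID is an assumed variable name):
#   require 'cumo'
#   Cumo::CUDA::Runtime.cudaSetDevice(parse_device_id(ENV.fetch("GPU_ID", "0")))
```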
Disable GPU Memory Pool
The GPU memory pool is enabled by default. To disable it, set CUMO_MEMORY_POOL=OFF, or:
require 'cumo'
Cumo::CUDA::MemoryPool.disable
Documentation
See https://github.com/ruby-numo/numo-narray#documentation, replacing Numo with Cumo.
Contributions
This project is under active development. See issues for future work.
Development
Install ruby dependencies:
bundle install --path vendor/bundle
Compile:
bundle exec rake compile
Run tests:
bundle exec rake test
Generate docs:
bundle exec rake docs
Advanced Development Tips
ccache
ccache is useful for speeding up compilation. Install ccache and configure it with:
export PATH="$HOME/opt/ccache/bin:$PATH"
ln -sf "$HOME/opt/ccache/bin/ccache" "$HOME/opt/ccache/bin/gcc"
ln -sf "$HOME/opt/ccache/bin/ccache" "$HOME/opt/ccache/bin/g++"
ln -sf "$HOME/opt/ccache/bin/ccache" "$HOME/opt/ccache/bin/nvcc"
Build in parallel
Set the MAKEFLAGS environment variable to specify make command options. You can build in parallel as:
bundle exec env MAKEFLAGS=-j8 rake compile
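Rather than hard-coding the job count, it can be derived from the number of logical CPUs on the build machine. This is a sketch; `parallel_compile_command` is a hypothetical helper, and `Etc.nprocessors` comes from Ruby's standard library.

```ruby
require "etc"

# Hypothetical helper (not part of Cumo): build the compile command with
# one make job per logical CPU, or with an explicit job count.
def parallel_compile_command(jobs = Etc.nprocessors)
  "bundle exec env MAKEFLAGS=-j#{jobs} rake compile"
end

puts parallel_compile_command
```

The printed command can be copied into a shell or passed to `system` from a build script.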
Specify nvcc --generate-code options
bundle exec env CUMO_NVCC_GENERATE_CODE=arch=compute_60,code=sm_60 rake compile
This is useful even during development because it skips JIT compilation of PTX to cubin at runtime.
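The option value follows a fixed pattern: a compute capability of MAJOR.MINOR maps to compute_MAJORMINOR and sm_MAJORMINOR, as in the 6.0 example above. The sketch below builds the value from a capability string; `generate_code_option` is a hypothetical helper, not part of Cumo.

```ruby
# Hypothetical helper (not part of Cumo): turn a compute capability such
# as "6.0" into a value for CUMO_NVCC_GENERATE_CODE.
def generate_code_option(compute_capability)
  match = /\A(\d+)\.(\d+)\z/.match(compute_capability)
  raise ArgumentError, "expected MAJOR.MINOR, got #{compute_capability.inspect}" unless match

  arch = "#{match[1]}#{match[2]}" # "6.0" becomes "60"
  "arch=compute_#{arch},code=sm_#{arch}"
end

puts generate_code_option("6.0")
```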
Run tests with gdb
Compile with debugging enabled:
bundle exec env DEBUG=1 rake compile
Run tests with gdb:
bundle exec gdb -x run.gdb --args ruby test/narray_test.rb
You may set a breakpoint by calling cumo_debug_breakpoint() in the C source code.
Run only the test at a specific line
The --location option is available as:
bundle exec ruby test/narray_test.rb --location 121
Compile and run tests for a specific type only
The DTYPE environment variable is available as:
bundle exec env DTYPE=dfloat rake compile
bundle exec env DTYPE=dfloat ruby test/narray_test.rb
Run program always synchronizing CPU and GPU
bundle exec env CUDA_LAUNCH_BLOCKING=1 <command>
Show GPU synchronization warnings
Cumo shows a warning whenever CPU and GPU synchronization occurs if you set:
export CUMO_SHOW_WARNING=ON
By default, Cumo shows a warning only once for each place it occurs. To show every occurrence, set:
export CUMO_SHOW_WARNING=ON
export CUMO_SHOW_WARNING_ONCE=OFF
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/sonots/cumo.
License
Related Materials
- Fast Numerical Computing and Deep Learning in Ruby with Cumo - Presentation Slide at RubyKaigi 2018