Project

daru_lite

A long-lived project that still receives updates
Daru (Data Analysis in RUby) is a library for analysis, manipulation and visualization of data. Daru works seamlessly across interpreters and leverages interpreter-specific optimizations whenever they are available. It is the default data storage gem for all the statsample gems (glm, timeseries, etc.) and can be used with many others like mixed_models, gnuplotrb and iruby. Daru Lite is a fork of Daru that aims to focus on data manipulation and stability.
 Project Readme

daru Lite - Data Analysis in RUby Lite

Simple, straightforward DataFrames for Ruby


Introduction

daru Lite is a library for data analysis and manipulation in Ruby.

This project started as a fork of Daru with the objective of providing:

  • a simple yet powerful interface for manipulating data with DataFrames
  • an API consistent with the one historically provided by daru
  • a focus on the core data manipulation features: several cumbersome daru dependencies and their associated features were dropped, notably N-Matrix, GSL, R, imagemagick and all plotting libraries, so the current project has no major dependencies
  • a future-proof library that can safely be used in production

Installation

$ gem install daru_lite

or add daru Lite to your Gemfile:

$ bundle add daru_lite

Basic Usage

daru Lite exposes two major data structures: DataFrame and Vector. The Vector is a basic 1-D structure corresponding to a labelled Array, while the DataFrame - daru's primary data structure - is a 2-D spreadsheet-like structure for manipulating and storing data sets.

Basic DataFrame initialization.

data_frame = DaruLite::DataFrame.new(
  {
    'Beer' => ['Kingfisher', 'Snow', 'Bud Light', 'Tiger Beer', 'Budweiser'],
    'Gallons sold' => [500, 400, 450, 200, 250]
  },
  index: ['India', 'China', 'USA', 'Malaysia', 'Canada']
)
data_frame


Load data from CSV files.

df = DaruLite::DataFrame.from_csv('TradeoffData.csv')

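As a rough mental model of what loading a CSV produces, the sketch below uses only Ruby's standard csv library to turn CSV text into a column-oriented Hash, the same shape passed to DataFrame.new earlier. It is not DaruLite's implementation. The file contents are hypothetical (the real TradeoffData.csv is not shown here), and treating the first row as column names and converting numeric fields are assumptions.

```ruby
require 'csv'

# Hypothetical contents standing in for a CSV file on disk.
csv_text = <<~DATA
  Beer,Gallons sold
  Kingfisher,500
  Snow,400
DATA

# Assumption: the first row holds the column names and numeric
# fields are converted from strings to numbers.
table = CSV.parse(csv_text, headers: true, converters: :numeric)

# Build a column-oriented Hash: header => array of values.
columns = table.headers.to_h { |header| [header, table[header]] }
```

A Hash of this shape is what the DataFrame constructor in the initialization example takes as its first argument.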

Basic Data Manipulation

Selecting rows.

data_frame.row['USA']


Selecting columns.

data_frame['Beer']


A range of rows.

data_frame.row['India'..'USA']

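A range of labels selects by position in the index rather than by alphabetical order. The plain-Ruby sketch below models this on the example data, assuming both ends of the label range are included. It illustrates the idea only and is not how DaruLite implements row slicing.

```ruby
# Plain-Ruby model of label-based row slicing on the example data.
index = ['India', 'China', 'USA', 'Malaysia', 'Canada']
beers = ['Kingfisher', 'Snow', 'Bud Light', 'Tiger Beer', 'Budweiser']

# Translate the label range 'India'..'USA' into index positions.
first_pos = index.index('India')
last_pos  = index.index('USA')

# Assumption: the slice keeps both end labels.
selected_labels = index[first_pos..last_pos]
selected_beers  = beers[first_pos..last_pos]
```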

The first 2 rows.

data_frame.first(2)


The last 2 rows.

data_frame.last(2)


Adding a new column.

data_frame['Gallons produced'] = [550, 500, 600, 210, 240]


Creating a new column based on data in other columns.

data_frame['Demand supply gap'] = data_frame['Gallons produced'] - data_frame['Gallons sold']

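Subtracting one column from another works element by element, pairing values that share a row. To check the arithmetic for the example data, the same computation can be sketched in plain Ruby. This mirrors the result only and is not DaruLite's implementation.

```ruby
# Values from the example DataFrame, in index order:
# India, China, USA, Malaysia, Canada.
gallons_produced = [550, 500, 600, 210, 240]
gallons_sold     = [500, 400, 450, 200, 250]

# Pair the values row by row and subtract.
demand_supply_gap = gallons_produced.zip(gallons_sold).map do |produced, sold|
  produced - sold
end
# => [50, 100, 150, 10, -10]
```

The negative value for Canada reflects that more was sold (250) than produced (240).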

Condition based selection

The example below selects countries based on the number of gallons sold in each. The syntax is similar to the one defined by Arel: conditions are passed to a where clause.

data_frame.where(data_frame['Gallons sold'].lt(300))

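One way to read the condition above is as a row-by-row true/false mask, with where keeping the rows whose entry is true. The plain-Ruby sketch below models that on the example data. It is an illustration of the idea, not DaruLite's implementation.

```ruby
# Example data, in index order.
index        = ['India', 'China', 'USA', 'Malaysia', 'Canada']
gallons_sold = [500, 400, 450, 200, 250]

# One boolean per row: true where fewer than 300 gallons were sold.
mask = gallons_sold.map { |gallons| gallons < 300 }

# Keep only the labels whose mask entry is true.
selected = index.select.with_index { |_label, position| mask[position] }
# => ["Malaysia", "Canada"]
```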

Conditions can be combined with #and and #or before being passed to the #where method:

data_frame.where(
  data_frame['Beer']
  .in(['Snow', 'Kingfisher','Tiger Beer'])
  .and(
    data_frame['Gallons produced'].gt(520).or(data_frame['Gallons produced'].lt(250))
  )
)

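To trace which rows the combined condition keeps, the plain-Ruby sketch below evaluates each part per row on the example data, including the 'Gallons produced' column added earlier. It models the logic with ordinary boolean operators and is not DaruLite's implementation.

```ruby
# Example data, in index order.
index            = ['India', 'China', 'USA', 'Malaysia', 'Canada']
beers            = ['Kingfisher', 'Snow', 'Bud Light', 'Tiger Beer', 'Budweiser']
gallons_produced = [550, 500, 600, 210, 240]

wanted_beers = ['Snow', 'Kingfisher', 'Tiger Beer']

selected = index.each_index.select do |position|
  beer_matches   = wanted_beers.include?(beers[position])
  produced       = gallons_produced[position]
  volume_matches = produced > 520 || produced < 250
  # Both parts must hold for the row to be kept.
  beer_matches && volume_matches
end.map { |position| index[position] }
# => ["India", "Malaysia"]
```

China is excluded because its production (500) falls between the two thresholds. USA and Canada are excluded because their beers are not in the list.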

Documentation

Docs can be found here.