Uncsv

A parser for unruly CSVs

Parse CSVs with hierarchical headers and duplicated headers, skip rows by index, and more.

Documentation

Read below to get started, or see the API Documentation for more details.

Installation

Add this line to your application's Gemfile:

gem 'uncsv'

And then execute:

bundle

Or install it yourself as:

gem install uncsv

Usage

Reading a CSV with Uncsv is similar to using Ruby's built-in CSV class. Create a new instance of Uncsv and pass it a String or IO. The optional second argument is an options hash (see Options below).

require 'uncsv'

data = "A,B,C\n1,2,3"
csv = Uncsv.new(data, header_rows: 0)
csv.map { |row| row['B'] } # collect column B from every row

Opening a File

Uncsv can read directly from the filesystem with the open method.

Uncsv.open('my_data.csv')

Enumerable Methods

Uncsv is Enumerable, so methods such as each, map, and reduce are all supported.

data = "A,B,C\n1,2,3\n4,5,6"
csv = Uncsv.new(data, header_rows: 0)
# Start the sum at 0; fields are read as strings, so convert before adding.
c_total = csv.reduce(0) { |sum, row| sum + row['C'].to_i }
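Ruby's Enumerable module only needs a class to define each; map, reduce, select and the rest are built on top of it. The minimal sketch below shows that general mechanism. TinyTable is a hypothetical class, not part of Uncsv, and its rows are plain hashes rather than Uncsv row objects.

```ruby
# Hypothetical stand-in for a CSV reader; TinyTable is not part of Uncsv.
class TinyTable
  include Enumerable

  def initialize(rows)
    @rows = rows
  end

  # Enumerable only requires #each; map, reduce, select, etc. are built on it.
  def each(&block)
    # Without a block, return an Enumerator like the core collections do.
    return enum_for(:each) unless block

    @rows.each(&block)
    self
  end
end

table = TinyTable.new([{ 'C' => '3' }, { 'C' => '6' }])
c_total = table.reduce(0) { |sum, row| sum + row['C'].to_i }
puts c_total # prints 9
```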

Options

The following options can be passed as a hash in the second argument of the Uncsv constructor, or set inside the constructor block.

Uncsv.new(data, skip_blanks: true)

# Is equivalent to
Uncsv.new(data) do |config|
  config.skip_blanks = true
end
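One common way to make the hash form and the block form equivalent is for the constructor to apply the hash to a configuration object and then yield that same object to the block. The sketch below illustrates that pattern only. SketchReader and SketchConfig are hypothetical names, and the ordering (hash first, block second) is an assumption, not taken from Uncsv's source.

```ruby
# Hypothetical sketch: these classes are illustrative, not Uncsv's own.
class SketchConfig
  attr_accessor :skip_blanks, :header_rows

  def initialize
    # Defaults mirror two of the ones listed under Uncsv Options.
    @skip_blanks = false
    @header_rows = []
  end
end

class SketchReader
  attr_reader :config

  def initialize(data, opts = {})
    @data = data
    @config = SketchConfig.new
    # Apply hash options by calling the matching setter, e.g. skip_blanks=.
    opts.each { |key, value| @config.public_send("#{key}=", value) }
    # Then let the block adjust the same configuration object.
    yield @config if block_given?
  end
end

a = SketchReader.new('', skip_blanks: true)
b = SketchReader.new('') { |config| config.skip_blanks = true }
puts a.config.skip_blanks == b.config.skip_blanks # prints true
```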

Uncsv Options

:expand_headers: Default false. If set to true, blank header cells will take the value of the header cell to their left. This is useful for hierarchical headers where not all the header cells are filled in. If set to an array of header indexes, only the specified headers will be expanded.
:header_rows: Default []. Can be set to either a single row index or an array of row indexes. For example, set it to 0 to indicate a header in the first row. If set to an array of indexes, such as [0, 1], the text of those header rows will be joined by the :header_separator. For example, if cell (0,0) (row 0, column 0) has the value "Personal" and cell (1,0) has the value "Name", the header for that column becomes "Personal.Name". Any data above the last header row will be ignored.
:header_separator: Default ".". When using multiple header rows, this is a string used to separate the individual header fields.
:nil_empty: Default true. If true, empty cells will be set to nil; otherwise, they are set to an empty string.
:normalize_headers: Default false. If set to true, header text will be normalized: it is lowercased, and non-alphanumeric characters are replaced with underscores (_). If set to a string, that string is used as the replacement instead of the underscore. If set to a hash, the hash is treated as options to KeyNormalizer, which accepts the :separator and :downcase options. If set to any other object, that object must respond to normalize(key) by returning a normalized string.
:skip_blanks: Default false. If true, rows whose fields are all empty will be skipped.
:skip_rows: Default []. If set to an array of row indexes, those rows will be skipped. This option does not apply to header rows.
:unique_headers: Default false. If set to true, headers will be made unique by appending numbers to duplicates. For example, if two header cells have the text "Name", the headers will become "Name.0" and "Name.1". The separator between the text and the number is set by the :header_separator option.
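The header and row options interact, so it can help to see them applied step by step. Below is a minimal sketch, assuming the rows have already been split into arrays of strings (with nil for empty cells, as Ruby's CSV returns for unquoted empty fields). It is not Uncsv's implementation: every function and keyword name is hypothetical, header_rows must be a non-empty array of equal-length rows here, and details such as dropping blank parts before joining are assumptions.

```ruby
# Hypothetical sketch of Uncsv-style options; names are illustrative only.
def blank?(cell)
  cell.nil? || cell.empty?
end

# Like :expand_headers: a blank header cell takes the value to its left.
def expand_row(row)
  last = nil
  row.map { |cell| blank?(cell) ? last : (last = cell) }
end

# Like :unique_headers: append an index to every header that is repeated.
def make_unique(headers, separator)
  counts = headers.tally
  seen = Hash.new(0)
  headers.map do |header|
    next header if counts[header] == 1

    suffix = seen[header]
    seen[header] += 1
    "#{header}#{separator}#{suffix}"
  end
end

def sketch_parse(rows, header_rows:, separator: '.', expand: false,
                 unique: false, skip_rows: [], skip_blanks: false, nil_empty: true)
  header_cells = header_rows.map { |i| expand ? expand_row(rows[i]) : rows[i] }
  # Like :header_rows + :header_separator: join each column's header cells.
  # Assumption: blank parts are dropped before joining.
  headers = header_cells.transpose.map do |column|
    column.reject { |cell| blank?(cell) }.join(separator)
  end
  headers = make_unique(headers, separator) if unique

  first_data = header_rows.max + 1 # rows above the last header row are ignored
  rows.each_with_index.filter_map do |row, index|
    next if index < first_data || skip_rows.include?(index)
    next if skip_blanks && row.all? { |cell| blank?(cell) }

    # Like :nil_empty: empty cells become nil, or '' when disabled.
    cells = row.map { |cell| blank?(cell) ? (nil_empty ? nil : '') : cell }
    headers.zip(cells).to_h
  end
end

rows = [
  ['Personal', nil, 'Contact'],
  %w[First Last Email],
  ['Ada', 'Lovelace', ''],
  [nil, nil, nil]
]
p sketch_parse(rows, header_rows: [0, 1], expand: true, skip_blanks: true)
```

With these rows, the two header rows produce the keys "Personal.First", "Personal.Last" and "Contact.Email", the empty email becomes nil, and the all-blank last row is skipped.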

Options from Std-lib CSV

See the documentation for Ruby's built-in CSV class for the following options.

:col_sep
:field_size_limit
:quote_char
:row_sep
:skip_blanks
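For reference, this is how some of these options behave when given directly to Ruby's CSV.parse. Presumably Uncsv forwards them with the same meaning, but that is an assumption based on this list rather than on Uncsv's source. (On Ruby 3.4 and later, csv ships as a bundled gem rather than a default gem.)

```ruby
require 'csv'

# Semicolon-separated data whose fields are quoted with single quotes.
data = "name;note\n'Ada';'likes ; semicolons'\n"
rows = CSV.parse(data, col_sep: ';', quote_char: "'")
p rows # prints [["name", "note"], ["Ada", "likes ; semicolons"]]

# skip_blanks drops empty lines instead of returning them as [].
p CSV.parse("a,b\n\n1,2\n", skip_blanks: true) # prints [["a", "b"], ["1", "2"]]
```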

Development

After checking out the repo, run bundle to install dependencies. You can also run bin/console for an interactive prompt that will allow you to experiment.

To check your work, run bin/rspec to run the tests and bin/rubocop to check style. To generate a code coverage report, set the COVERAGE environment variable when running the tests.

COVERAGE=1 bin/rspec
bin/rubocop

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/nullscreen/uncsv.