No commit activity in last 3 years
No release in over 3 years
High-level CSV parser
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
 Dependencies

Development

 Project Readme

CSVParser

Parse CSV rows by defining parsing blocks for individual columns.

Each row is parsed one-by-one. First a new Hash is initialized to store data for the row. Then, each individual column is parsed by calling matching parsing blocks. Parsing blocks are passed the column's value and header key and can set arbitrary state on the Hash for the current row.

Installation

$ gem install parshap-csv_parser

Usage

Given a data.csv file:

Name,Phone
john doe,555-481-2345
jane c doe,555-123-4567

You can parse it like so:

require "csv_parser"

class MyParser < CSVParser
  parse "Name" do |val|
    self[:first_name], self[:last_name] = val.split(nil, 2)
  end
  
  parse /Phone( Number)?/ do |val|
    self[:phone] = val
  end
end

MyParser.new(CSV.open "data.csv").each do |row|
  puts "#{row[:first_name]}, #{row[:last_name]}: #{row[:phone]}"
end

See example.rb and test.rb for more examples.

Defining Parsers

Parsing blocks are added using the CSVParser.parse class method. The first and only parameter, case, determines if the block should be executed for a particular column (by using the === operator with the column's header value). The block is passed the column value and its associated header value. The block can update the values for the current row by using self as a Hash.

Column and header values are always converted to strings and stripped of whitespace first.

class MyParser < CSVParser
  parse /^(first|last)?\W*name$/i do |val|
    self[:name] = val.capitalize
  end
end

Once Parsers

Using CSVParser.parse_once, you can define parsers that will only be called once per row, for the first matching column. In the above example, if parse_once was used, the block would only be called once even with the occurrence of multiple name columns.

Default Row Values

The CSVParser#defaults method is used to generate a hash to use for each row. You can use this to set default values.

class MyParser < CSVParser
  def defaults
    { name: "User", emails: [] }
  end
end

CSV Instance

The CSVParser constructor takes an instance of the Ruby Standard Library CSV class. This object can be created in any way, but the :headers option must be false.

Tests

$ ruby test.rb