Process Structured CSV files (`structured_csv`)

Purpose

The structured_csv_to_yaml script converts a “Structured CSV” file into a YAML file.

When your data does not yet have a defined structure, it is useful to manage it in a CSV file, which can be viewed and edited with a CSV editor such as Excel.

This helps when developing a normalized structure for such data, because you can check that the existing data actually fits the structure you define.

Ultimately, the data is meant to be exported to a YAML file.

This script supports UTF-8 CSV files.

Note: This was originally developed to create over 50 normalized data models for ITU Operational Bulletin data. See https://github.com/ituob/ for more details.

Installation

Add this line to your application’s Gemfile:

gem 'structured_csv'

and then run:

bundle install

Or install it without a Gemfile:

gem install structured_csv

Usage

$ structured_csv_to_yaml [input-file.csv]

Where:

input-file.csv: the input CSV file. The output file is named input-file.yaml.

Details

A Structured CSV file has these properties:

Two structured sections. A section starts with a row that has the section name in its first column and is otherwise empty. That row must be either the first row of the file or be preceded by an empty row. Two section types are allowed: METADATA and DATA.
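The section rule can be sketched in a few lines of Ruby. This is only an illustration built on the standard `csv` library, not the gem's actual implementation: `split_sections` and `SECTION_NAMES` are hypothetical names, and the sketch checks only the first cell of a heading row rather than requiring the rest of the row to be empty.

```ruby
require "csv"

SECTION_NAMES = %w[METADATA DATA].freeze

# Hypothetical helper (not the gem's API): group CSV rows by section.
# A row opens a section when its first cell is a section name and it is
# either the first row or follows an empty row.
def split_sections(rows)
  sections = {}
  current = nil
  previous_blank = true # treat the start of the file like an empty row

  rows.each do |row|
    cells = row.map { |cell| cell.to_s.strip }
    blank = cells.all?(&:empty?)

    if previous_blank && SECTION_NAMES.include?(cells.first)
      current = sections[cells.first] = []
    elsif current && !blank
      current << cells
    end
    previous_blank = blank
  end
  sections
end

csv_text = <<~TEXT
  METADATA,
  locale.bar.en,beef

  DATA,
  name,description
  beef,Yummy!
TEXT

sections = split_sections(CSV.parse(csv_text))
p sections.keys
```

Rows that appear before any section heading are ignored by this sketch; whether the real script does the same is not stated here.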

The METADATA section has values organized like key-value pairs:

Column 1 is the name of the key
Column 2 is the value

The key can be a normal string or namespaced:

foobar, this maps to the YAML key foobar:
foo.bar.boo, this maps to the YAML structure:
```
foo:
  bar:
    boo:
```
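As a rough illustration of this mapping, the following Ruby sketch expands a dot-separated key into nested hashes and dumps the result as YAML. The helper name `set_nested` is hypothetical and is not taken from the gem.

```ruby
require "yaml"

# Hypothetical helper (not the gem's API): store `value` in `hash` under a
# dot-separated key path such as "foo.bar.boo".
def set_nested(hash, dotted_key, value)
  *parents, leaf = dotted_key.split(".")
  # Walk down the parent segments, creating intermediate hashes as needed.
  target = parents.reduce(hash) { |node, segment| node[segment] ||= {} }
  target[leaf] = value
  hash
end

result = {}
set_nested(result, "foobar", "plain")
set_nested(result, "foo.bar.boo", "nested")

puts result.to_yaml
```

A key without dots has no parent segments, so it is stored directly at the top level.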

A typical YAML output looks like this:

---
metadata:
  locale:
    bar:
      en: beef
      fr: boeuf
      jp: 牛肉
data:
  foo:
    bar:
    ...

A sample METADATA section looks like this table:

METADATA
locale.bar.en	beef
locale.bar.fr	boeuf
locale.bar.jp	牛肉

And generates this YAML:

---
metadata:
  locale:
    bar:
      en: beef
      fr: boeuf
      jp: 牛肉

The DATA section holds its values in table form. The first row is the header row, and the first column is assumed to hold the key of each record.

A sample DATA section looks like this table:

DATA
foo.bar.en	foo.bar.fr	foo.bar.jp	description
beef	boeuf	牛肉	Yummy!
pork	porc	豚肉	Delicious!

By default, this table generates this YAML format:

---
data:
  beef:
    foo:
      bar:
        en: beef
        fr: boeuf
        jp: 牛肉
    description: Yummy!
  pork:
    foo:
      bar:
        en: pork
        fr: porc
        jp: 豚肉
    description: Delicious!
  ...
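The default behaviour can be approximated with the sketch below, which assumes the DATA rows have already been read into arrays. The names `set_nested`, `header` and `rows` are illustrative only; the real script may be organized differently.

```ruby
require "yaml"

# Hypothetical helper: expand a dotted column name into nested hashes.
def set_nested(hash, dotted_key, value)
  *parents, leaf = dotted_key.split(".")
  parents.reduce(hash) { |node, part| node[part] ||= {} }[leaf] = value
end

header = ["foo.bar.en", "foo.bar.fr", "foo.bar.jp", "description"]
rows = [
  ["beef", "boeuf", "牛肉", "Yummy!"],
  ["pork", "porc", "豚肉", "Delicious!"],
]

data = {}
rows.each do |row|
  record = {}
  # Pair every header cell with the cell in the same column of this row.
  header.zip(row) { |column, cell| set_nested(record, column, cell) }
  data[row.first] = record # the first column doubles as the record key
end

puts({ "data" => data }.to_yaml)
```

Note that the key column is not removed from the record: its value appears both as the record key and under `foo.bar.en`, as in the YAML above.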

When the DATA rows have no key column, specify type=array on the DATA row to generate an array instead:

DATA	type=array
foo.bar.en	foo.bar.fr	foo.bar.jp	description
beef	boeuf	牛肉	Yummy!
pork	porc	豚肉	Delicious!

---
data:
  - foo:
      bar:
        en: beef
        fr: boeuf
        jp: 牛肉
    description: Yummy!
  - foo:
      bar:
        en: pork
        fr: porc
        jp: 豚肉
    description: Delicious!
  ...
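One way to picture the difference between the two modes is the sketch below. It reads `name=value` options from the heading row and then returns either an array or a hash keyed by the first column. Both `section_options` and `build_data` are hypothetical names, and column names are kept flat (dotted names are not expanded) to keep the example short.

```ruby
# Hypothetical helper: read "name=value" options from the cells that follow
# the section name on a heading row such as ["DATA", "type=array"].
def section_options(heading_row)
  heading_row.drop(1).each_with_object({}) do |cell, options|
    name, value = cell.to_s.split("=", 2)
    next if name.nil? || name.strip.empty?
    options[name.strip] = value.to_s.strip
  end
end

# Hypothetical helper: build records as an array, or as a hash keyed by the
# first column when no array type is requested.
def build_data(header, rows, type: nil)
  records = rows.map { |row| header.zip(row).to_h }
  return records if type == "array"

  records.to_h { |record| [record[header.first], record] }
end

options = section_options(["DATA", "type=array"])
header = ["name", "description"]
rows = [["beef", "Yummy!"], ["pork", "Delicious!"]]

as_array = build_data(header, rows, type: options["type"])
as_hash = build_data(header, rows)
p as_array
p as_hash
```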

You can also specify the data type of a column by appending it to the column name in square brackets. The supported types are string, boolean and integer.

DATA
foo.bar.en[string]	foo.bar.fr[string]	yummy[boolean]	availability[integer]
beef	boeuf	TRUE	3
pork	porc	FALSE	10

---
data:
  beef:
    foo:
      bar:
        en: beef
        fr: boeuf
    yummy: true
    availability: 3
  pork:
    foo:
      bar:
        en: pork
        fr: porc
    yummy: false
    availability: 10
  ...
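A minimal sketch of how such type annotations could be parsed and applied is shown below. The names `COLUMN_PATTERN`, `parse_column` and `cast` are hypothetical. The sketch assumes that untyped columns are treated as strings and that any boolean cell other than "true" (case-insensitive) is false; the actual script may behave differently.

```ruby
# Hypothetical pattern: a column name with an optional [type] suffix.
COLUMN_PATTERN = /\A(?<name>.+?)(?:\[(?<type>string|boolean|integer)\])?\z/

# Split a non-empty header cell such as "availability[integer]" into its
# name and type. Assumption: a missing type means "string".
def parse_column(header_cell)
  match = COLUMN_PATTERN.match(header_cell.strip)
  [match[:name], match[:type] || "string"]
end

# Convert a raw CSV cell to the declared type.
def cast(value, type)
  case type
  when "integer" then Integer(value, 10)
  when "boolean" then value.to_s.strip.casecmp?("true") # anything else is false
  else value.to_s
  end
end

header = ["foo.bar.en[string]", "foo.bar.fr[string]", "yummy[boolean]", "availability[integer]"]
row = ["beef", "boeuf", "TRUE", "3"]

columns = header.map { |cell| parse_column(cell) }
typed = columns.zip(row).to_h { |(name, type), cell| [name, cast(cell, type)] }
p typed
```

`Integer(value, 10)` raises an error on a non-numeric cell instead of silently returning 0, which makes bad data visible early.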

Examples

The samples/ folder contains a number of complex examples.
