Simple Spreadsheet Extractor
- Authors: Stuart Owen, Finn Bacall
- Version: 0.17.0
- Contact
- Licence: BSD (See LICENCE or www.opensource.org/licenses/bsd-license.php)
- Copyright: © 2010-2015 The University of Manchester, UK
Synopsis
This gem reads an XLS or XLSX Excel spreadsheet document and produces an XML representation of its content.
CSV output can also be generated for a single sheet.
Internally it uses Apache POI, through the sister tool at github.com/myGrid/simple-spreadsheet-extractor.
This is a simple tool developed for use within SysMO-DB.
Installation
Java 8 or above (JRE) is required.
gem install simple-spreadsheet-extractor
*Note that Windows is no longer supported* (since version 0.7.2.1)
Usage
- require 'simple-spreadsheet-extractor'
- include the module SysMODB::SpreadsheetExtractor
- pass an IO object to the method spreadsheet_to_xml, which returns the XML for the contents of the spreadsheet. Alternatively, use spreadsheet_to_csv for CSV.
- you can also pass the file path of the Excel file instead of an IO object
- if something goes wrong with the extraction, a SysMODB::SpreadsheetExtractionException is raised
- by default the JVM is allocated 512M of memory; you can override this by passing a string (for example "1024M") as the last argument. This is passed to -Xmx in the java command.
e.g.
  # examples/example.rb - takes a path, e.g. ruby example.rb /tmp/spreadsheet.xls
  require 'rubygems'
  require 'simple-spreadsheet-extractor'

  include SysMODB::SpreadsheetExtractor

  path = ARGV.first
  f = open(path)
  begin
    puts spreadsheet_to_xml(f)
  rescue SysMODB::SpreadsheetExtractionException => e
    puts "Something went wrong #{e.message}"
  end
Formulas are evaluated and the result is placed in the XML produced for that cell; the original formula is included as the formula attribute.
Row and column indexes start at 1, rather than 0, for consistency with the way cells are named in Excel.
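To illustrate the 1-based indexing, here is a minimal sketch of how a column index relates to the column_alpha value that appears next to it in the XML. The helper names column_alpha and column_index are hypothetical and are not part of the gem; they only show the arithmetic, assuming the usual Excel lettering (A..Z, AA..AZ, BA..).

```ruby
# Hypothetical helpers, not part of the gem's API.

# Convert a 1-based column index to Excel column letters (1 -> "A", 27 -> "AA").
def column_alpha(index)
  raise ArgumentError, 'column index starts at 1' if index < 1

  letters = +''
  while index > 0
    # Shift to 0-based before dividing, because there is no "zero" letter.
    index, remainder = (index - 1).divmod(26)
    letters.prepend(('A'.ord + remainder).chr)
  end
  letters
end

# Convert Excel column letters back to the 1-based index ("AA" -> 27).
def column_index(alpha)
  alpha.upcase.each_char.reduce(0) do |sum, char|
    sum * 26 + (char.ord - 'A'.ord + 1)
  end
end

puts column_alpha(27)
puts column_index('BC')
```

The pairs used here (27 and AA, 55 and BC) are the same ones that appear on cells in the example XML.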
An XSD schema for the XML is available in doc/schema-v1.xsd.
The desired spreadsheet extractor jar can be specified by defining SPREADSHEET_EXTRACTOR_JAR_PATH in a config file (e.g. environment.rb).
CSV can be generated in a similar way, and also takes an optional sheet number. If the sheet number is omitted, the first sheet is used.
Note that sheet numbers start at 1, and the number can be given as either a string or an integer.
e.g.
puts spreadsheet_to_csv(f,"1")
Example XML
  <?xml version="1.0" encoding="UTF-8"?>
  <workbook xmlns="http://www.sysmo-db.org/2010/xml/spreadsheet">
    <named_ranges/>
    <styles>
      <style id="style21">
        <border-top>dotted 1pt</border-top>
        <border-bottom>dotted 1pt</border-bottom>
        <border-left>dotted 1pt</border-left>
        <border-right>dotted 1pt</border-right>
        <background-color>#ffcc99</background-color>
      </style>
      <style id="style22">
        <border-top>dotted 1pt</border-top>
        <border-bottom>dotted 1pt</border-bottom>
        <border-left>dotted 1pt</border-left>
        <border-right>dotted 1pt</border-right>
        <font-weight>bold</font-weight>
        <font-style>italics</font-style>
        <text-decoration>underline</text-decoration>
      </style>
      <style id="style23">
        <color>#2323dc</color>
      </style>
    </styles>
    <sheet name="Sheet1" index="1" hidden="false" very_hidden="false">
      <data_validations/>
      <columns first_column="1" last_column="3">
        <column index="1" column_alpha="A" width="2048"/>
        <column index="2" column_alpha="B" width="2048"/>
        <column index="3" column_alpha="C" width="2048"/>
      </columns>
      <rows first_row="1" last_row="6">
        <row index="1">
          <cell column="1" column_alpha="A" row="1" type="numeric">13.0</cell>
          <cell column="2" column_alpha="B" row="1" type="numeric">654153.0</cell>
          <cell column="27" column_alpha="AA" row="1" type="string">AA</cell>
          <cell column="28" column_alpha="AB" row="1" type="string">AB</cell>
          <cell column="53" column_alpha="BA" row="1" type="string">BA</cell>
          <cell column="54" column_alpha="BB" row="1" type="string">BB</cell>
          <cell column="55" column_alpha="BC" row="1" type="string">BC</cell>
        </row>
        <row index="2">
          <cell column="1" column_alpha="A" row="2" type="numeric" style="style21">547654.0</cell>
        </row>
        <row index="3">
          <cell column="1" column_alpha="A" row="3" type="numeric">45465.0</cell>
        </row>
        <row index="4" height="14.05pt">
          <cell column="1" column_alpha="A" row="4" type="numeric" style="style22" formula="A1+1">14.0</cell>
        </row>
        <row index="6">
          <cell column="1" column_alpha="A" row="6" type="datetime" style="style23" formula="DATE(2009,6,15)">2009-06-15T0:0:0+0100</cell>
        </row>
      </rows>
    </sheet>
    <sheet name="Sheet2" index="2" hidden="false" very_hidden="false">
      <data_validations/>
      <columns first_column="1" last_column="1">
        <column index="1" column_alpha="A" width="2048"/>
      </columns>
      <rows first_row="1" last_row="1"/>
    </sheet>
    <sheet name="Sheet3" index="3" hidden="false" very_hidden="false">
      <columns first_column="1" last_column="1">
        <column index="1" column_alpha="A" width="2048"/>
      </columns>
      <rows first_row="1" last_row="1"/>
    </sheet>
  </workbook>
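Output of this shape can be consumed with any XML parser. As a rough sketch, the following reads cells and their formula attributes using REXML, which ships with most Ruby installations. The helper extract_cells is hypothetical and not part of the gem, and the XML string is a trimmed copy of two rows from the example, so the gem and Java are not needed to run it.

```ruby
require 'rexml/document'

# A trimmed fragment of the example XML, used only for illustration.
xml = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <workbook xmlns="http://www.sysmo-db.org/2010/xml/spreadsheet">
    <sheet name="Sheet1" index="1" hidden="false" very_hidden="false">
      <rows first_row="1" last_row="4">
        <row index="1">
          <cell column="1" column_alpha="A" row="1" type="numeric">13.0</cell>
        </row>
        <row index="4" height="14.05pt">
          <cell column="1" column_alpha="A" row="4" type="numeric" style="style22" formula="A1+1">14.0</cell>
        </row>
      </rows>
    </sheet>
  </workbook>
XML

# Hypothetical helper: collect every cell as a hash. The formula entry is nil
# for cells that were not produced by a formula.
def extract_cells(xml)
  doc = REXML::Document.new(xml)
  cells = []
  # Walk all descendant elements and match on the local element name, which
  # avoids having to deal with the default namespace in XPath expressions.
  doc.root.each_recursive do |element|
    next unless element.name == 'cell'

    cells << {
      ref: "#{element.attributes['column_alpha']}#{element.attributes['row']}",
      type: element.attributes['type'],
      value: element.text,
      formula: element.attributes['formula']
    }
  end
  cells
end

extract_cells(xml).each do |cell|
  suffix = cell[:formula] ? " (formula: #{cell[:formula]})" : ''
  puts "#{cell[:ref]} = #{cell[:value]}#{suffix}"
end
```

In real use the string would come from spreadsheet_to_xml rather than a literal. Values arrive as text, so numeric cells such as 14.0 still need converting according to the type attribute.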