Repository is archived
No commit activity in last 3 years
No release in over 3 years
Converts Southeaster Daily Performance reports from HTML to CSV
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
 Dependencies

Development

>= 0

Runtime

 Project Readme

Intro

Southeastern publish Daily Performance reports. They're not very usable as they are. This project contains a library, and some tools, to convert that data to splendid CSV.

If you're interested in analysing the data (rather than using this library to convert it) then checkout the southeastern daily performance website.

Installation

$ gem install southeastern-daily-performance

Usage

# To convert a single html report to csv
$ sedpr-to-csv <location-of-html>

# To convert a directory containing multiple html reports to multiple csv reports
$ convert-all-html-data <location-of-html-reports> <location-of-csv-output>

Examples

Explicitly download html and convert local file

$ curl "http://www.southeasternrailway.co.uk/index.php/cms/pages/view/132" > sedpr.html
$ sedpr-to-csv sedpr.html

Implicitly download html and convert

$ sedpr-to-csv http://www.southeasternrailway.co.uk/index.php/cms/pages/view/132

Notes

Combining all csv files into one big file

$ echo "Date,Problem,Scheduled departure time,Scheduled departure station,Scheduled arrival station,Affect on service" > combined.csv
$ cat /path/to/csv/files/*.csv >> combined.csv

Combining all csv overview files into one file

$ echo "Date,Services scheduled,Services run,Services within 5 minutes of schedule" > combined.overview.csv
$ cat /path/to/csv/files/*.overview.csv >> combined.overview.csv

TODO

  • 2010-04-21 breaks the parser...

  • I should now be in a position to use origin and destination stations rather than using the affect on service to parse the routes. This should remove these warnings: "Warning. Unknown, or missing, affect on service: '10:24 Dover Priory - Charing Cross'"

  • Don't generate empty csv files when the data doesn't exist (e.g 1st-5th dec 2010)

  • Don't generate csv overview files containing 0s when the data doesn't exist (e.g. 1st-5th dec 2010) - i.e. emit a warning if the necessary data can't be found.