Project

unbork

Based on a blog post by Rowan Thorpe (http://rowanthorpe.wordpress.com/2012/10/15/unmangle-utf-8-from-double-encoded-utf-8-my-shell-script-and-batch-script-tweaks/), forcefully repairs UTF-8 text that has been mangled into pairs of Latin-1 characters. Includes a verbatim copy of Rowan's sed script.

unbörk

Unmangles your UTF-8 encoded strings stored in MySQL's default Latin-1 Swedish encoding.

This is just a small wrapper around the concepts in Rowan Thorpe's blog post, and includes a verbatim copy of his sed rules file.

Use it when working with legacy databases created with the wrong internal encodings. Fixing the database itself is always the best option, but it may be infeasible for many reasons, such as downtime, legacy applications, or lack of administrative access.
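For illustration, the kind of mangling described above can be reproduced and reversed in plain Ruby. This is only a sketch of the underlying problem, not this gem's implementation: the method names `double_encode` and `undo_double_encode` are hypothetical, and it assumes the stray re-encoding went through Windows-1252, which is what MySQL's latin1 mostly corresponds to.

```ruby
# Hypothetical helpers, not part of unbork's API.

# Reproduce the damage: the UTF-8 bytes are misread as Windows-1252
# characters and then encoded to UTF-8 a second time.
def double_encode(str)
  str.dup.force_encoding(Encoding::Windows_1252).encode(Encoding::UTF_8)
end

# Reverse it: turn each character back into its single Windows-1252 byte,
# then reinterpret the resulting bytes as UTF-8.
def undo_double_encode(str)
  str.encode(Encoding::Windows_1252).force_encoding(Encoding::UTF_8)
end

mangled = double_encode("Bartek Urbański")
puts mangled
puts undo_double_encode(mangled)
```

The round trip only works when every character in the string is representable in Windows-1252, so it raises on text that mixes mangled and already-correct non-Latin characters. Replacing only known mangled pairs, as a rules file does, avoids that limitation.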

The rules cover most Latin scripts, Greek, and some miscellaneous characters. Adding Cyrillic and other scripts should be fairly trivial.
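As a rough sketch of how rules for another script might be generated, the following builds a replacement table for the basic Cyrillic letters (U+0410 to U+044F) and applies it with `gsub`. Everything here is an assumption for illustration: the names `mangle`, `CYRILLIC_RULES` and `unmangle_cyrillic` are hypothetical, the table is a Ruby hash rather than the sed rules this gem ships, and bytes that Windows-1252 leaves undefined are assumed to pass through as the matching Latin-1 control characters.

```ruby
# Hypothetical sketch, not part of unbork.

# Produce the mangled form of a string: each UTF-8 byte is read as one
# Windows-1252 character (falling back to ISO-8859-1 for bytes that
# Windows-1252 does not define) and re-encoded as UTF-8.
def mangle(str)
  str.bytes.map { |b|
    byte = b.chr
    begin
      byte.dup.force_encoding(Encoding::Windows_1252).encode(Encoding::UTF_8)
    rescue Encoding::UndefinedConversionError
      byte.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
    end
  }.join
end

# Map each mangled two-character pair back to the letter it came from.
CYRILLIC_RULES = (0x0410..0x044F).to_h do |codepoint|
  letter = [codepoint].pack("U")
  [mangle(letter), letter]
end

# Replace only the known mangled pairs; all other text is left untouched.
def unmangle_cyrillic(text)
  text.gsub(Regexp.union(CYRILLIC_RULES.keys), CYRILLIC_RULES)
end

puts unmangle_cyrillic(mangle("Жук"))
```

Because only known pairs are replaced, strings that are already correct pass through unchanged unless they happen to contain one of the mangled pairs literally.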

Usage

irb> t = User.find(...).full_name
"Bartek UrbaÅ„ski"
irb> require 'unbork'
irb> unbork t
"Bartek Urbański"