Project

unbork

0.0

No commit activity in last 3 years

No release in over 3 years

unbork (k3rni/unbork)

Based on a blog post by Rowan Thorpe (http://rowanthorpe.wordpress.com/2012/10/15/unmangle-utf-8-from-double-encoded-utf-8-my-shell-script-and-batch-script-tweaks/), forcefully repairs broken UTF-8 text whose multi-byte characters appear as pairs of Latin-1 characters. Includes a verbatim copy of Rowan's sed script.

Popularity

8,126

Releases

1.0.1

2013-03-27

2013-03-27

Development

Primary Language

Ruby

Licenses

MIT

Average date of last 50 commits

2013-03-27

Reverse Dependencies

Dependencies

Development

bundler >= 0
gemcutter >= 0
jeweler >= 0
simplecov >= 0

Project Readme

unbörk

Unmangles your UTF-8 encoded strings stored in MySQL's default Latin-1 Swedish encoding.

This is just a small wrapper around the concepts in Rowan Thorpe's blog post, and includes a verbatim copy of his sed rules file.
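The kind of damage being repaired can be reproduced, and in simple cases reversed, with Ruby's built-in transcoding. The sketch below is only an illustration of the concept, not how the gem works (the gem applies the sed rules). It assumes the text was mangled through Windows-1252, which matches the sample shown under Usage, and the helper name `repair_double_encoded` is hypothetical.

```ruby
# Hypothetical helper for illustration; not part of the unbork gem.
def repair_double_encoded(text)
  # Map each character back to the single Windows-1252 byte it came from,
  # then reinterpret those bytes as the UTF-8 they originally were.
  repaired = text.encode('Windows-1252').force_encoding('UTF-8')
  # If the result is not valid UTF-8, the input was not mangled this way.
  repaired.valid_encoding? ? repaired : text
rescue EncodingError
  # The text contains characters outside Windows-1252; leave it untouched.
  text
end

original = "Bartek Urba\u0144ski"
# Simulate the damage: read the UTF-8 bytes as Windows-1252, re-encode as UTF-8.
mangled = original.dup.force_encoding('Windows-1252').encode('UTF-8')

puts mangled
puts repair_double_encoded(mangled)
```

A round trip like this fails on bytes that Windows-1252 leaves undefined, which is one reason an explicit rule table can be the more forgiving approach.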

Use it when working with legacy databases created with the wrong internal encodings. Fixing the database itself is always the best option, but it may be infeasible for many reasons: downtime, legacy applications, or lack of administrative access.

Includes most Latin scripts, Greek and some miscellaneous characters. Adding Cyrillic and other scripts should be fairly trivial.
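A rule-based replacement of this kind can be sketched in Ruby as a lookup table from mangled character pairs to the intended characters. The table below is a hypothetical, heavily abbreviated stand-in for the sed rules file, and the names `RULES`, `PATTERN` and `apply_rules` are assumptions made for this example. The last entry shows what adding a Cyrillic letter might look like.

```ruby
# Hypothetical, abbreviated rule table; the real sed rules file is far longer.
RULES = {
  "Å„" => "ń", # Latin small letter n with acute
  "Ã³" => "ó", # Latin small letter o with acute
  "Ð±" => "б", # Cyrillic small letter be, added as an extension example
}.freeze

# One alternation that matches any mangled pair listed in the table.
PATTERN = Regexp.union(RULES.keys)

def apply_rules(text)
  # gsub with a Hash replaces each match with the value stored under it.
  text.gsub(PATTERN, RULES)
end

puts apply_rules("Bartek UrbaÅ„ski")
```

Because only listed pairs are touched, text that is already correct passes through unchanged, at the cost of having to enumerate every character you want repaired.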

Usage

irb> t = User.find(...).full_name
"Bartek UrbaÅ„ski"
irb> require 'unbork'
irb> unbork t
"Bartek Urbański"