A UTF-8 Validator State Machine
Provides an implementation of a state machine for validating UTF-8 encoded strings. Clients may request that encoding errors be reported in one of two ways:
- a simple true/false indicator
- a raised exception
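To illustrate the idea, here is a minimal Ruby sketch of a byte-level UTF-8 validating state machine that supports both reporting styles. It is not this gem's code: the module `Utf8Sketch`, the method `valid?`, its `raise_on_error` parameter, and `ValidationError` are hypothetical names. The transitions follow the well-formed byte-sequence ranges of RFC 3629, which may not match the exact table of the algorithm credited under Thanks To.

```ruby
# Hypothetical sketch, not this gem's API: a byte-level UTF-8 validator
# driven by a small state machine (byte ranges per RFC 3629).
module Utf8Sketch
  class ValidationError < StandardError; end

  # True for a continuation byte (10xxxxxx).
  def self.cont?(b)
    b.between?(0x80, 0xBF)
  end

  # Returns the next state, or nil if byte b is illegal in this state.
  def self.next_state(state, b)
    case state
    when :start
      if b <= 0x7F then :start                   # ASCII
      elsif b.between?(0xC2, 0xDF) then :tail1   # 2-byte lead (C0, C1 would be overlong)
      elsif b == 0xE0 then :e0                   # next byte A0..BF (no overlongs)
      elsif b == 0xED then :ed                   # next byte 80..9F (no surrogates)
      elsif b.between?(0xE1, 0xEF) then :tail2   # other 3-byte leads
      elsif b == 0xF0 then :f0                   # next byte 90..BF (no overlongs)
      elsif b.between?(0xF1, 0xF3) then :tail3   # other 4-byte leads
      elsif b == 0xF4 then :f4                   # next byte 80..8F (max U+10FFFF)
      end                                        # 80..C1, F5..FF fall through to nil
    when :tail1 then cont?(b) ? :start : nil
    when :tail2 then cont?(b) ? :tail1 : nil
    when :tail3 then cont?(b) ? :tail2 : nil
    when :e0 then b.between?(0xA0, 0xBF) ? :tail1 : nil
    when :ed then b.between?(0x80, 0x9F) ? :tail1 : nil
    when :f0 then b.between?(0x90, 0xBF) ? :tail2 : nil
    when :f4 then b.between?(0x80, 0x8F) ? :tail2 : nil
    end
  end

  # Returns true/false, or raises ValidationError when raise_on_error is set.
  def self.valid?(str, raise_on_error = false)
    state = :start
    offset = 0
    str.each_byte do |b|
      state = next_state(state, b)
      if state.nil?
        raise ValidationError, format("invalid byte 0x%02X at offset %d", b, offset) if raise_on_error
        return false
      end
      offset += 1
    end
    return true if state == :start
    raise ValidationError, "truncated sequence at end of input" if raise_on_error
    false
  end
end
```

For example, `Utf8Sketch.valid?("caf\u00e9")` returns true, while `Utf8Sketch.valid?("\xC0\xAF")` (an overlong encoding of "/") returns false; passing true as the second argument raises `Utf8Sketch::ValidationError` instead. Restricting the second byte after E0, ED, F0 and F4 is what lets the machine reject overlongs, surrogates and code points above U+10FFFF without ever decoding a character.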
What This Gem Does Not Provide
- UTF-8 encoding
- UTF-8 decoding
That functionality is left as an exercise for the reader.
Thanks To
- The Unicode Consortium
For all the information published at unicode.org/.
- Frank Yung-Fong Tang
For the state machine algorithm. See: unicode.org/mail-arch/unicode-ml/y2003-m02/att-0467/01-The_Algorithm_to_Valide_an_UTF-8_String
- Markus Kuhn
For invalid test data. www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt
Useful Information
Other interesting and/or useful information can be found:
A Word On Ruby Versions
This validator is intended primarily for Ruby versions before 1.9, where String objects carry no encoding information. Nothing prevents its use with Ruby 1.9 or 2.0, however; the tests detect these environments and adjust their behavior accordingly.
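One way to detect such environments, shown here as a hedged sketch, is feature detection rather than comparing version strings. The helper name `core_utf8_check` is hypothetical and is not taken from this gem's tests; it relies only on the core `String#force_encoding` and `String#valid_encoding?` methods, which exist from Ruby 1.9 onward.

```ruby
# Hypothetical helper (not from this gem's tests): feature-detect
# encoding-aware Strings instead of comparing RUBY_VERSION.
def core_utf8_check(str)
  # Ruby 1.8: String has no encodings, so signal "no core check available".
  return nil unless str.respond_to?(:force_encoding)
  # dup so the caller's string keeps its original encoding tag
  str.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end
```

On Ruby 1.9 and later the result can serve as a cross-check against a state machine's answer; on Ruby 1.8 the helper returns nil, leaving the state machine as the only validator.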
Copyright
Copyright © 2011-2014 Guy Allard. See LICENSE.txt for further details.