Project

uniscribe

Explains Unicode characters/code points: Displays their name, category, and shows compositions
Dependencies

Runtime

>= 0.9, < 3.0
~> 2.0, >= 2.0.1
~> 1.4
>= 3.5, < 5.0
~> 1.13, >= 1.13.1
Project Readme

uniscribe | Describe the Unicode

Describes Unicode characters with their name and shows compositions. UNICODE 16.0*

  • Helps you understand how glyphs and codepoints are structured within the data
  • Gives you the names of glyphs and codepoints, which can be used for further research
  • Highlights invalid/special/blank codepoints

Uses color coding similar to that of its lower-level companion tool, unibits.

Setup

Make sure Ruby is installed and that installing gems works properly. Then run:

$ gem install uniscribe

Usage

Pass the string you want to inspect to uniscribe:

From CLI

$ uniscribe "test strı̈ng"

From Ruby

require "uniscribe/kernel_method"
uniscribe "test strı̈ng"

Output


0074 ├─ t		├─ LATIN SMALL LETTER T
0065 ├─ e		├─ LATIN SMALL LETTER E
0073 ├─ s		├─ LATIN SMALL LETTER S
0074 ├─ t		├─ LATIN SMALL LETTER T
0020 ├─ ] [		├─ SPACE
0073 ├─ s		├─ LATIN SMALL LETTER S
0074 ├─ t		├─ LATIN SMALL LETTER T
0072 ├─ r		├─ LATIN SMALL LETTER R
---- ├┬ ı̈		├┬ Composition
0131 │├─ ı		│├─ LATIN SMALL LETTER DOTLESS I
0308 │└─ ◌̈		│└─ COMBINING DIAERESIS
006E ├─ n		├─ LATIN SMALL LETTER N
0067 ├─ g		├─ LATIN SMALL LETTER G
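
The grouping shown in the output above can be approximated with plain Ruby. The following is only a minimal sketch, not uniscribe's implementation: `clusters_with_codepoints` is a hypothetical helper name, it relies on Ruby's built-in `String#grapheme_clusters`, and it does not look up character names (Ruby's standard library has no name database).

```ruby
# Hypothetical helper (not part of uniscribe): split a string into grapheme
# clusters and list the code points of each cluster as four-digit hex.
def clusters_with_codepoints(string)
  string.grapheme_clusters.map do |cluster|
    [cluster, cluster.codepoints.map { |cp| format("%04X", cp) }]
  end
end

clusters_with_codepoints("test str\u0131\u0308ng").each do |cluster, hexes|
  if hexes.size > 1
    # More than one code point in a cluster corresponds to a "Composition"
    puts "---- #{cluster} (composition)"
    hexes.each { |hex| puts "  #{hex}" }
  else
    puts "#{hexes.first} #{cluster}"
  end
end
```

Because the clustering comes from Ruby itself, the result of a sketch like this depends on the Unicode level of the Ruby version in use.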

Examples

Tamil

>> uniscribe "நகரத்தில்"

Screenshot Tamil

Thai

>> uniscribe "ม้าลายหกตัว"

Screenshot Thai

Ideographic Variations

>> uniscribe "辻󠄀㚑󠄁"

Screenshot Ideographic Variations

(The variation is not visible in the screenshot because my system does not render it correctly.)

Emoji Sequences

>> uniscribe "3️⃣🤸‍♀"

Screenshot Emoji

Lots of Combining Marks

>> uniscribe "̶̧̨̱̹̭̯ͧ̾ͬC̷̙̲̝͖ͭ̏ͥͮ͟Oͮ͏̮̪̝͍"

Screenshot Marks

Random Sequences of some Special Unicode Codepoints

>> uniscribe "\0A\u{E01D7}\x7F\r\n\u{D0000}\u{81}\u{FFF9}B\u{FFFB}🏴\u{E0061}\u{E007F}\u{10FFFF}"

Screenshot Strange
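
To give an idea of why some of these codepoints count as special, here is a rough sketch that classifies a few well-known ranges by arithmetic alone. It is an illustration under stated assumptions, not how uniscribe decides what to highlight: `special_kind` and its category symbols are hypothetical, and only a handful of ranges are covered.

```ruby
# Hypothetical classifier (not uniscribe's logic): label a code point that
# falls into one of a few well-known "special" ranges, or return nil.
def special_kind(codepoint)
  case codepoint
  when 0x00..0x1F, 0x7F..0x9F
    :control              # C0 controls, DELETE, and C1 controls
  when 0xFDD0..0xFDEF
    :noncharacter         # noncharacter range inside the BMP
  when 0xFE00..0xFE0F, 0xE0100..0xE01EF
    :variation_selector   # variation selectors and their supplement
  when 0xE0000..0xE007F
    :tag_block            # the Tags block (used in emoji flag sequences)
  else
    # The last two code points of every plane (U+xFFFE, U+xFFFF)
    # are noncharacters as well.
    (codepoint & 0xFFFE) == 0xFFFE ? :noncharacter : nil
  end
end

"\u0000A\u{E01D7}\u007F\u{81}\u{E0061}\u{10FFFF}".each_codepoint do |cp|
  puts format("%04X %s", cp, special_kind(cp) || "-")
end
```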

Some Blanks

>> uniscribe "­ᅠ 𝅸"

Screenshot Blanks

*Notes

Although the gem is generally up to date with Unicode 16.0, the proper detection of compositions / graphemes / combined characters depends on your Ruby version.

You can run uniscribe -v to check for the Unicode level of your uniscribe version.
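
Since composition detection depends on Ruby, it can also help to check which Unicode version your Ruby interpreter itself was built with. A small sketch, assuming your Ruby exposes the `UNICODE_VERSION` key in `RbConfig::CONFIG` (reasonably recent versions do; the helper name `ruby_unicode_version` is made up for this example):

```ruby
require "rbconfig"

# Hypothetical helper: returns the Unicode version Ruby was built with,
# or nil if this Ruby does not expose the key.
def ruby_unicode_version
  RbConfig::CONFIG["UNICODE_VERSION"]
end

puts "Ruby #{RUBY_VERSION} uses Unicode #{ruby_unicode_version || 'unknown'}"
```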


Copyright (C) 2017-2024 Jan Lelis https://janlelis.com. Released under the MIT license.