Returns a list of the General Categories that the characters of a Unicode string belong to.
Unicode version: 16.0.0 (September 2024)
Supported Rubies: 3.3, 3.2, 3.1, 3.0
Old Rubies that might still work: 2.7, 2.6, 2.5, 2.4, 2.3, 2.X
Gemfile
gem "unicode-categories"
Usage
require "unicode/categories"
# All general categories of a string
Unicode::Categories.categories("A 2") # => ["Lu", "Nd", "Zs"]
Unicode::Categories.categories("A 2", format: :long)
# => ["Decimal_Number", "Space_Separator", "Uppercase_Letter"]
# Also aliased as .of
Unicode::Categories.of("\u{10c50}") # => ["Cn"]
# Single character
Unicode::Categories.category("☼", format: :long) # => "Other_Symbol"
The list of categories is always sorted alphabetically.
Hints
Regex Matching
If you have a string and want to match a substring or character from a specific General Category, you don't need this gem. Instead, you can use Ruby's Regexp Unicode property syntax \p{}:
"Find decimal numbers (like 2 or 3) within a string".scan(/\p{Nd}+/) # => ["2", "3"]
See Idiosyncratic Ruby: Proper Unicoding for more info.
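The \p{} syntax can also be used to classify individual characters without the gem. The following is a minimal sketch using only core Ruby; `CATEGORY_PATTERNS` and `tally_categories` are hypothetical names chosen for illustration and are not part of unicode-categories. It only covers the four categories listed in the hash, so anything else is counted as "other".

```ruby
# Hypothetical helper, NOT part of the unicode-categories gem.
# Maps a few General Category short names to Regexp property matchers.
CATEGORY_PATTERNS = {
  "Lu" => /\p{Lu}/, # Uppercase_Letter
  "Ll" => /\p{Ll}/, # Lowercase_Letter
  "Nd" => /\p{Nd}/, # Decimal_Number
  "Zs" => /\p{Zs}/, # Space_Separator
}.freeze

# Counts how many characters of the string fall into each listed category.
# Characters that match none of the patterns are counted under "other".
def tally_categories(string)
  string.each_char.with_object(Hash.new(0)) do |char, counts|
    short, _pattern = CATEGORY_PATTERNS.find { |_, pattern| pattern.match?(char) }
    counts[short || "other"] += 1
  end
end

tally_categories("A 2b") # => {"Lu"=>1, "Zs"=>1, "Nd"=>1, "Ll"=>1}
```

For the full set of categories, or for code points Ruby's regex engine does not yet know about, the gem's own lookup is likely the better choice.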
List of General Categories
You can retrieve a list of all General Categories like this:
require "unicode/categories"
puts \
"Short | Long\n" +
"------|-----\n" +
Unicode::Categories.names(format: :table).to_a.map{ |r| " %s | %s" % r }.join("\n")
Short | Long |
---|---|
Cc | Control |
Cf | Format |
Cn | Unassigned |
Co | Private_Use |
Cs | Surrogate |
LC | Cased_Letter |
Ll | Lowercase_Letter |
Lm | Modifier_Letter |
Lo | Other_Letter |
Lt | Titlecase_Letter |
Lu | Uppercase_Letter |
Mc | Spacing_Mark |
Me | Enclosing_Mark |
Mn | Nonspacing_Mark |
Nd | Decimal_Number |
Nl | Letter_Number |
No | Other_Number |
Pc | Connector_Punctuation |
Pd | Dash_Punctuation |
Pe | Close_Punctuation |
Pf | Final_Punctuation |
Pi | Initial_Punctuation |
Po | Other_Punctuation |
Ps | Open_Punctuation |
Sc | Currency_Symbol |
Sk | Modifier_Symbol |
Sm | Math_Symbol |
So | Other_Symbol |
Zl | Line_Separator |
Zp | Paragraph_Separator |
Zs | Space_Separator |
See unicode-x for more Unicode-related micro libraries.
MIT License
- Copyright (C) 2016-2024 Jan Lelis https://janlelis.com. Released under the MIT license.
- Unicode data: https://www.unicode.org/copyright.html#Exhibit1