IndoorVoice: Lowercase all-caps strings excluding acronyms
DOES YOUR DATA CONTAIN ALL-CAPS TEXT THAT YOU WISH WAS PROPERLY CASED?
Have your data use its indoor voice.
require 'open-uri'
require 'indoor_voice'
# You can use any word list. Here we use Scrabble words.
url = 'https://scrabblehelper.googlecode.com/svn/trunk/ScrabbleHelper/src/dictionaries/TWL06.txt'
words = URI.open(url).readlines.map(&:chomp)
# You can use any language. :en is the BCP 47 code for English.
model = IndoorVoice.new(words, :en)
model.setup # wait a moment
model.downcase('HP, IBM AND MICROSOFT ARE TECHNOLOGY CORPORATIONS.')
# => "HP, IBM and microsoft are technology corporations."
model.titlecase('HP, IBM AND MICROSOFT ARE TECHNOLOGY CORPORATIONS.')
# => "HP, IBM And Microsoft Are Technology Corporations."
model.titlecase('HP, IBM AND MICROSOFT ARE TECHNOLOGY CORPORATIONS.', except: %w(a an and as at but by en for if in of on or the to via))
# => "HP, IBM and Microsoft Are Technology Corporations."
model.titlecase('HP, IBM AND MICROSOFT ARE TECHNOLOGY CORPORATIONS.', except: words)
# => "HP, IBM and Microsoft are technology corporations."
IndoorVoice assumes that most acronyms contain character sequences that never occur in words. For example, no English word ends in the character sequence "bm", so "IBM" must be an acronym.
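The heuristic can be sketched in a few lines of Ruby. This is a minimal, hypothetical illustration, not IndoorVoice's actual implementation. The AcronymDetector class is invented for this example, and it assumes that two-letter sequences (bigrams), with "^" and "$" marking the start and end of a word, are enough to recognize non-word sequences. The gem may use longer sequences or a different data structure.

```ruby
require 'set'

# Hypothetical sketch of the acronym heuristic; not IndoorVoice's code.
# Every bigram of every listed word is recorded, with "^" and "$" padding
# so that word-initial and word-final positions are distinguished.
class AcronymDetector
  def initialize(words)
    @bigrams = Set.new
    words.each { |word| each_bigram(word.downcase) { |bigram| @bigrams << bigram } }
  end

  # A token is treated as an acronym if any of its bigrams never
  # appears in the word list.
  def acronym?(token)
    each_bigram(token.downcase) { |bigram| return true unless @bigrams.include?(bigram) }
    false
  end

  private

  # Yields each pair of adjacent characters in "^word$".
  def each_bigram(word)
    padded = "^#{word}$"
    (0...padded.length - 1).each { |i| yield padded[i, 2] }
  end
end

# A tiny stand-in word list; a real list would contain many thousands of words.
words = %w(and are climb corporations from in microsoft technology tribe)
detector = AcronymDetector.new(words)

detector.acronym?('IBM')       # => true ("bm" occurs in no listed word)
detector.acronym?('MICROSOFT') # => false
detector.acronym?('AND')       # => false
```

With this approach, any legitimate word that is missing from the list may be misclassified as an acronym, which is one reason to prefer a large word list.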
Once a string has only its acronyms in uppercase, you can (in your own code) selectively uppercase letters, such as the first letter of each sentence or of each word. Since most titlecasing gems change the case of acronyms, IndoorVoice provides its own titlecase method.
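A minimal sketch of such selective uppercasing might look like the following. The helpers capitalize_sentences and capitalize_words are hypothetical, invented for this example, and not part of IndoorVoice. Both assume the input already has only its acronyms in uppercase (as returned by IndoorVoice#downcase) and only ever uppercase lowercase letters, so acronyms are preserved.

```ruby
# Hypothetical helper: uppercases the first letter of the string and the
# first letter after a ".", "!" or "?" followed by whitespace.
def capitalize_sentences(text)
  text.gsub(/(\A|[.!?]\s+)([a-z])/) { "#{$1}#{$2.upcase}" }
end

# Hypothetical helper: uppercases the first letter of each word, except
# words listed in `except` (the first word is always capitalized).
# Words that start with an uppercase letter, like acronyms, are untouched.
def capitalize_words(text, except: [])
  first = true
  text.gsub(/[[:alpha:]]+/) do |word|
    result = (first || !except.include?(word)) ? word.sub(/\A[a-z]/, &:upcase) : word
    first = false
    result
  end
end

capitalize_sentences('the CEO of IBM spoke. reporters listened.')
# => "The CEO of IBM spoke. Reporters listened."
capitalize_words('HP, IBM and microsoft are technology corporations.', except: %w(and are))
# => "HP, IBM and Microsoft are Technology Corporations."
```

With the stop-word list from the titlecase example above, capitalize_words returns the same string as the gem's documented output. However, it splits words on any non-letter, so a contraction like "don't" would become "Don'T".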
Why?
No existing titlecasing gem handled acronyms well. If this gem doesn't suit your needs, see:
- titleize, titlecase, title_case and gruber-case, based on TitleCase.pl by John Gruber
- namecase, based on Lingua::EN::NameCase by Mark Summerfield
- clever_title
Copyright (c) 2015 James McKinney, released under the MIT license