Profanity Filter
Strategies for slipping offensive text past online filters can be roughly grouped into five categories:
- Similarities, e.g. b ⇔ 6
- Diacritics (sound alteration), e.g. u ⇔ ü, ù, ú
- Constructions (multi-part), e.g. W ⇔ VV, V ⇔ \/
- Injections, e.g. s-h-i-t, shhhhhhhhhhhit
- Unicode (same shape but different code point), e.g. ⒜, ⍺, a, 𝐚, 𝑎, 𝒂, 𝒶, 𝓪, 𝔞, 𝕒, 𝖆, 𝖺, 𝗮, 𝘢, 𝙖
This profanity filter implements:
- [Full Support] diacritics, injections, unicode
- [Partial Support] similarities, constructions
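To make the fully supported categories concrete, here is a minimal sketch of how diacritics, injections and Unicode lookalikes could be folded away with Ruby's standard library alone. It is not the gem's implementation: `fold_text` is a hypothetical helper, and the chosen steps (NFKD normalization, stripping combining marks and symbols, collapsing repeats) are an assumption about one reasonable approach.

```ruby
# Hypothetical helper, not part of the gem's API.
def fold_text(text)
  text
    .unicode_normalize(:nfkd)  # "ü" -> "u" + combining mark; "𝐚" -> "a"
    .gsub(/\p{Mn}/, '')        # drop the combining marks (diacritics)
    .downcase
    .gsub(/[^a-z\s]/, '')      # drop injected symbols such as "-" or "_"
    .squeeze('a-z')            # collapse repeated letters: "shhhit" -> "shit"
end

fold_text('s-h-i-t')         # => "shit"
fold_text('shhhhhhhhhhhit')  # => "shit"
```

Note that collapsing repeats also shortens legitimate double letters ("hello" becomes "helo"), so a real filter would need to treat its dictionary the same way before comparing.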
This gem also integrates with WebPurify; a usage example is given below.
Installation
Add this line to your application's Gemfile:
gem 'profanity-filter', '~> 1.0'
And then execute:
$ bundle install
Or install it yourself as:
$ gem install profanity-filter
Versioning
Version 1.0 onward is not compatible with previous versions. See the [changelog](https://github.com/cardinalblue/profanity-filter/blob/master/CHANGELOG.md) for details.
Usage
In your Ruby code,
# basic usage
pf = ProfanityFilter.new
pf.profane?('ssssshit')
# => true
pf.profanity_count('fjsdio fdsk fU_cK_THIS_shI_T')
# => 2
To integrate WebPurify, pass an API key at object creation:
# with WebPurify
pf = ProfanityFilter.new(web_purifier_api_key: 'YOUR-API-KEY')
With WebPurify enabled, texts sent to profane? and profanity_count are first checked against the mechanism this gem provides, then against WebPurify if no positive results are returned.
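The check order described above can be sketched as follows. This is an illustration only, not the gem's internals: `TwoStageChecker` and its `local:`/`remote:` callables are hypothetical names, and the remote callable merely stands in for a WebPurify request so the example runs without network access.

```ruby
# Hypothetical sketch of the two-stage check; not the gem's internals.
class TwoStageChecker
  # local:  callable returning true when the built-in check flags the text
  # remote: callable standing in for a WebPurify request, or nil when disabled
  def initialize(local:, remote: nil)
    @local = local
    @remote = remote
  end

  def profane?(text)
    return true if @local.call(text)  # flagged locally: no remote call needed
    return false if @remote.nil?      # remote service not configured

    @remote.call(text)                # fall back to the remote service
  end
end
```

Running the local check first means the remote service is only contacted for texts the gem itself considers clean, which keeps the number of API calls down.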
Strategies
There are four strategies, which can be combined freely:
- :partial_match flags a text as profane if any substring of it is in the dictionary.
- :allow_symbol flags a text as profane if any word in it matches the dictionary after symbols are removed.
- :duplicate_characters flags a text as profane if any word in it matches the dictionary after duplicated characters are removed.
- :leet flags a text as profane if any word in it matches the dictionary after similar-looking Unicode characters are replaced with the letters they resemble.
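The four strategies can be sketched as small, independent checks over a toy dictionary. This is an illustrative approximation, not the gem's code: `DICTIONARY`, `LEET_MAP`, `STRATEGIES` and `profane_sketch?` are hypothetical names, the word list is deliberately mild, and the leet table covers only a few ASCII lookalikes rather than the Unicode characters the gem handles.

```ruby
# Toy word list for illustration; the gem ships its own dictionary.
DICTIONARY = %w[darn heck].freeze

# Assumed substitution table for the :leet sketch (ASCII lookalikes only).
LEET_MAP = { '0' => 'o', '1' => 'i', '3' => 'e', '4' => 'a', '@' => 'a', '$' => 's' }.freeze

STRATEGIES = {
  # Any dictionary word appearing anywhere inside the text.
  partial_match: lambda do |text|
    DICTIONARY.any? { |bad| text.downcase.include?(bad) }
  end,
  # Each word with its non-letters stripped: "d-a-r-n" -> "darn".
  allow_symbol: lambda do |text|
    text.downcase.split.any? { |w| DICTIONARY.include?(w.gsub(/[^a-z]/, '')) }
  end,
  # Runs of the same character collapsed on both sides: "daaarn" -> "darn".
  duplicate_characters: lambda do |text|
    text.downcase.split.any? { |w| DICTIONARY.any? { |bad| bad.squeeze == w.squeeze } }
  end,
  # Lookalike characters mapped back to letters: "h3ck" -> "heck".
  leet: lambda do |text|
    text.downcase.split.any? { |w| DICTIONARY.include?(w.gsub(/[0134@\$]/, LEET_MAP)) }
  end
}.freeze

# True if any of the selected strategies flags the text.
def profane_sketch?(text, strategies)
  strategies.any? { |name| STRATEGIES.fetch(name).call(text) }
end

profane_sketch?('what the h3ck', [:partial_match])         # => false
profane_sketch?('what the h3ck', [:partial_match, :leet])  # => true
```

Keeping each strategy as a separate callable is what makes them composable: a caller simply picks the subset to run.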
Config
By default, the profanity filter applies the :partial_match and :allow_symbol strategies, but we can specify which strategies to use:
pf = ProfanityFilter.new
# type :basic is the default
pf.profane?('test_string', strategies: :basic)
pf.profanity_count('test_string', strategies: :basic)
# type :all includes all four strategies
pf.profane?('test_string', strategies: :all)
pf.profanity_count('test_string', strategies: :all)
# compose our own
pf.profane?('test_string', strategies: [:partial_match, :leet])
pf.profanity_count('test_string', strategies: [:partial_match, :leet])
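One way to picture how the `strategies:` option might be interpreted is a small resolver that expands the `:basic` and `:all` shortcuts and validates a custom list. This is a sketch under assumptions: `resolve_strategies` and both constants are hypothetical names, and raising `ArgumentError` on unknown input is a design choice of the sketch, not documented gem behavior.

```ruby
ALL_STRATEGIES = %i[partial_match allow_symbol duplicate_characters leet].freeze
BASIC_STRATEGIES = %i[partial_match allow_symbol].freeze

# Hypothetical resolver: turns the strategies option into a list of symbols.
def resolve_strategies(option = :basic)
  case option
  when :basic then BASIC_STRATEGIES
  when :all   then ALL_STRATEGIES
  when Array
    unknown = option - ALL_STRATEGIES
    raise ArgumentError, "unknown strategies: #{unknown.inspect}" unless unknown.empty?

    option
  else
    raise ArgumentError, "unsupported strategies option: #{option.inspect}"
  end
end

resolve_strategies                          # => [:partial_match, :allow_symbol]
resolve_strategies([:partial_match, :leet]) # => [:partial_match, :leet]
```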
A whitelist is also supported; specify it at object creation:
pf_with_whitelist = ProfanityFilter.new(whitelist: ['asshole'])
pf_with_whitelist.profane?('asshole')
# => false
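Conceptually, a whitelist can be thought of as a set of words subtracted from the dictionary before any lookup. The sketch below shows that idea only; `TinyFilter` is a hypothetical class with a toy dictionary, and the gem's actual whitelist handling may differ.

```ruby
# Hypothetical sketch; the gem's real whitelist handling may differ.
class TinyFilter
  def initialize(dictionary:, whitelist: [])
    # Whitelisted words are removed from the lookup list up front.
    @blocked = dictionary.map(&:downcase) - whitelist.map(&:downcase)
  end

  # True if any whitespace-separated word is still in the blocked list.
  def profane?(text)
    text.downcase.split.any? { |word| @blocked.include?(word) }
  end
end

TinyFilter.new(dictionary: %w[darn heck]).profane?('heck')                      # => true
TinyFilter.new(dictionary: %w[darn heck], whitelist: ['heck']).profane?('heck') # => false
```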
Development
After checking out the repo, run bin/setup to install dependencies. Then, run rake test to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.
To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/cardinalblue/profanity-filter. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.
License
The gem is available as open source under the terms of the MIT License.
Code of Conduct
Everyone interacting in the ProfanityFilter project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.