HTMLEntities
The canonical source for this project can be found at GitHub: threedaymonk/htmlentities.
HTML entity encoding and decoding for Ruby
HTMLEntities is a simple library to facilitate encoding and decoding of named
(&yacute; and so on) or numerical (&#123; or &#x12a;) entities in HTML
and XHTML documents.
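To make the two numerical forms concrete, here is a minimal sketch in plain Ruby of how a decimal or hexadecimal entity relates to a character's Unicode codepoint. It does not use HTMLEntities and is not the library's implementation; the helper names are hypothetical and exist only for illustration.

```ruby
# Illustration only: plain Ruby, independent of the HTMLEntities library.
# All helper names below are hypothetical.

# Build the decimal numeric entity for a single character.
def decimal_entity(char)
  format("&#%d;", char.ord)
end

# Build the hexadecimal numeric entity for a single character.
def hexadecimal_entity(char)
  format("&#x%x;", char.ord)
end

# Turn a numeric entity (decimal or hexadecimal) back into its character.
def numeric_entity_to_char(entity)
  match = entity.match(/\A&#(?:x(\h+)|(\d+));\z/i)
  raise ArgumentError, "not a numeric entity: #{entity}" unless match

  codepoint = match[1] ? match[1].to_i(16) : match[2].to_i
  [codepoint].pack("U")
end

decimal_entity("{")               # => "&#123;"
hexadecimal_entity("\u012A")      # => "&#x12a;"
numeric_entity_to_char("&#123;")  # => "{"
```

Named entities such as &yacute; cannot be derived this way; they need a lookup table, which is what the library's entity sets provide.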
Usage
HTMLEntities works with UTF-8 (or ASCII) strings only.
Please ensure that your system is set to display UTF-8 before running these
examples. In Ruby 1.8, you'll need to set $KCODE = "u".
Decoding
require 'htmlentities'
coder = HTMLEntities.new
string = "&eacute;lan"
coder.decode(string) # => "élan"
Encoding
Encoding is slightly more complicated because of the various options. The encode method takes a variable number of parameters, each an instruction that selects a kind of escaping to apply.
require 'htmlentities'
coder = HTMLEntities.new
string = "<élan>"
Escape unsafe codepoints only:
coder.encode(string) # => "&lt;élan&gt;"
Or:
coder.encode(string, :basic) # => "&lt;élan&gt;"
Escape all entities that have names:
coder.encode(string, :named) # => "&lt;&eacute;lan&gt;"
Escape all non-ASCII/non-safe codepoints using decimal entities:
coder.encode(string, :decimal) # => "&#60;&#233;lan&#62;"
As above, using hexadecimal entities:
coder.encode(string, :hexadecimal) # => "&#x3c;&#xe9;lan&#x3e;"
You can also use several options, e.g. use named entities for unsafe codepoints, then decimal for all other non-ASCII:
coder.encode(string, :basic, :decimal) # => "&lt;&#233;lan&gt;"
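The layering of two instructions can be sketched in plain Ruby as follows. This is only an approximation of the behaviour described above, not the library's code: BASIC_NAMES and sketch_encode_basic_decimal are hypothetical names, and the sketch leaves out the apostrophe, whose handling in the library depends on the entity set in use.

```ruby
# Illustration only: plain Ruby, independent of the HTMLEntities library.
# BASIC_NAMES and sketch_encode_basic_decimal are hypothetical names.
BASIC_NAMES = {
  "<" => "&lt;",
  ">" => "&gt;",
  "&" => "&amp;",
  '"' => "&quot;"
}.freeze

def sketch_encode_basic_decimal(string)
  # Step 1 (like :basic): named entities for the unsafe characters.
  escaped = string.gsub(/[<>&"]/) { |char| BASIC_NAMES[char] }
  # Step 2 (like :decimal): decimal entities for everything outside ASCII.
  escaped.gsub(/[^\x00-\x7F]/) { |char| format("&#%d;", char.ord) }
end

sketch_encode_basic_decimal("<élan>") # => "&lt;&#233;lan&gt;"
```

The unsafe characters are handled first because the second step introduces ampersands of its own; reversing the order would escape those again.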
Flavours
HTMLEntities knows about three different sets of entities:
- :xhtml1 – Entities from the XHTML1 doctype
- :html4 – Entities from the HTML4 doctype. Differs from :xhtml1 only by the absence of &apos;
- :expanded – Entities from a variety of SGML sets
The default is :xhtml1, but you can override this:
coder = HTMLEntities.new(:expanded)
Licence
This code is free to use under the terms of the MIT licence. See the file COPYING.txt for more details.
Contact
Send email to pbattley@gmail.com.