Multibases
Multibase is a protocol for disambiguating the encoding of base-encoded (e.g., base32, base64, base58, etc.) binary appearing in text.
Multibases is the Ruby implementation of multiformats/multibase.
This gem can be used to encode data into, and decode data from, multibase packed
strings. It also serves as a general-purpose library for BaseX encoding and
decoding without adding the prefix.
🙌🏽 This is called multibases instead of the singular form, to stay consistent
with the multihashes gem, which was forced to take a different name because
multihash was already taken. The same is true for multibase and others. In the
future, this might be renamed to multiformats-base, with a backwards-compatible
interface.
Installation
Add this line to your application's Gemfile:
gem 'multibases'
Or, if you would like to bring your own engines and not load any of the built-in ones:
gem 'multibases', require: 'multibases/bare'
And then execute:
$ bundle
Or install it yourself as:
$ gem install multibases
Usage
This is a low-level library, but high-level implementations are provided. You can also bring your own encoder/decoder. The most important methods are:
- Multibases.encode(encoding, data, engine?): encodes the given data with the built-in engine for encoding, or with engine if it's given. Returns an Encoded PORO that has pack.
- Multibases.unpack(packed): unpacks a multibase packed string into an Encoded PORO that has decode.
- Multibases::Encoded.pack: packs the multibase code and the encoded data into a single string.
- Multibases::Encoded.decode(engine?): decodes the PORO's data using a built-in engine, or engine if it's given. Returns a decoded ByteArray.
encoded = Multibases.encode('base2', 'mb')
# => #<struct Multibases::Encoded
# code="0", encoding="base2", length=16,
# data=[Multibases::EncodedByteArray "0110110101100010"]>
encoded.pack
# => [Multibases::EncodedByteArray "00110110101100010"]
encoded = Multibases.unpack('766542')
# => #<struct Multibases::Encoded
# code="7", encoding="base8", length=5,
# data=[Multibases::EncodedByteArray "66542"]>
encoded.decode
# => [Multibases::DecodedByteArray "mb"]
This means that the flow of calls is as follows:
data ➡️ (encode) ➡️ encoded data
encoded data ➡️ (pack) ➡️ multibasestr
multibasestr ➡️ (unpack) ➡️ encoded data
encoded data ➡️ (decode) ➡️ data
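The four steps can be sketched without the gem. This is a minimal illustration that assumes base16 only (code "f") and uses Ruby's standard pack/unpack. The sketch_* helper names are hypothetical and are not part of the Multibases API.

```ruby
# Hypothetical stand-ins for encode, pack, unpack and decode,
# hard-wired to base16 (multibase code "f"). Not the gem's implementation.

def sketch_encode(data)
  data.unpack1('H*')            # raw bytes -> lowercase hex digits
end

def sketch_pack(encoded)
  'f' + encoded                 # prepend the multibase code
end

def sketch_unpack(packed)
  code = packed[0]
  raise ArgumentError, "unknown code #{code.inspect}" unless code == 'f'

  packed[1..-1]                 # strip the code, keep the encoded digits
end

def sketch_decode(encoded)
  [encoded].pack('H*')          # hex digits -> raw bytes
end

packed = sketch_pack(sketch_encode('hello'))
puts packed                                # f68656c6c6f
puts sketch_decode(sketch_unpack(packed))  # hello
```

The prefix is only ever added by the pack step and removed by the unpack step, which is why the encode and decode steps can be reused for plain BaseX work.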
Convenience methods are provided:
- Multibases.pack(encoding, data, engine?): calls encode and then pack
- Multibases.decode(packed, engine?): calls unpack and then decode
Multibases.pack('base2', 'mb')
# => [Multibases::EncodedByteArray "00110110101100010"]
ByteArrays and encoding
As you can see, the "final" methods output a ByteArray. These are simple
DelegateClass wrappers around the array of bytes, which means that the hex
encoding of hello is not actually stored as the string "f68656c6c6f", but as
the byte values of those characters:
packed = Multibases.pack('base16', 'hello')
# => [Multibases::EncodedByteArray "f68656c6c6f"]
packed.to_a # .__getobj__.dup
# => [102, 54, 56, 54, 53, 54, 99, 54, 99, 54, 102]
ByteArrays override inspect and force the encoding to UTF-8 (in inspect only).
Note: If you're using pry and have not changed the printer, you won't see the output as described above, but will always see the inner Array of bytes instead.
To get a string in the correct encoding, use the convenience methods:
data = 'hello'.encode('UTF-16LE')
data.encoding
# => #<Encoding:UTF-16LE>
data.bytes
# => [104, 0, 101, 0, 108, 0, 108, 0, 111, 0]
packed = Multibases.pack('base16', data)
# => [Multibases::EncodedByteArray "f680065006c006c006f00"]
decoded = Multibases.decode(packed)
# => [Multibases::DecodedByteArray "h e l l o "]
decoded.to_s('UTF-16LE')
# => "hello"
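To make the wrapper idea concrete, here is a rough sketch of a DelegateClass around an Array of bytes. SketchByteArray is a hypothetical class written for illustration; the gem's own ByteArray classes may differ in detail.

```ruby
require 'delegate'

# Hypothetical wrapper, not the gem's ByteArray: it delegates everything to
# an Array of bytes and only changes how the bytes are shown or stringified.
class SketchByteArray < DelegateClass(Array)
  def initialize(bytes)
    super(bytes.dup)
  end

  # Reinterpret the raw bytes in the requested encoding.
  def to_s(encoding = 'UTF-8')
    pack('C*').force_encoding(encoding)
  end

  # Display the bytes as a UTF-8 string instead of a list of integers.
  def inspect
    "[SketchByteArray #{to_s.inspect}]"
  end
end

bytes = SketchByteArray.new('hello'.encode('UTF-16LE').bytes)
bytes.length                            # 10, delegated to the inner Array
bytes.to_s('UTF-16LE').encode('UTF-8')  # "hello"
```

Because the stored value is always the byte array, the text encoding only matters at the moment the bytes are turned back into a String.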
Implementations
You can find the current multibase table here. At this moment, built-in engines are provided as follows:
encoding | code | description | implementation |
---|---|---|---|
identity | 0x00 | 8-bit binary | bare |
base1 | 1 | unary (1111) | ❌ |
base2 | 0 | binary (0101) | base2 💨 |
base8 | 7 | octal | base_x |
base10 | 9 | decimal | base_x |
base16 | f | hexadecimal | base16 💨 |
base16upper | F | hexadecimal | base16 💨 |
base32hex | v | rfc4648 no padding - highest char | base32 ✨ |
base32hexupper | V | rfc4648 no padding - highest char | base32 ✨ |
base32hexpad | t | rfc4648 with padding | base32 ✨ |
base32hexpadupper | T | rfc4648 with padding | base32 ✨ |
base32 | b | rfc4648 no padding | base32 ✨ |
base32upper | B | rfc4648 no padding | base32 ✨ |
base32pad | c | rfc4648 with padding | base32 ✨ |
base32padupper | C | rfc4648 with padding | base32 ✨ |
base32z | h | z-base-32 (used by Tahoe-LAFS) | base32 ✨ |
base58flickr | Z | base58 flickr | base_x |
base58btc | z | base58 bitcoin | base_x |
base64 | m | rfc4648 no padding | base64 💨 |
base64pad | M | rfc4648 with padding - MIME enc | base64 💨 |
base64url | u | rfc4648 no padding | base64 💨 |
base64urlpad | U | rfc4648 with padding | base64 💨 |
Those marked with 💨 are backed by a C implementation (using pack and unpack)
and are therefore supposed to be blazingly fast. Those marked with ✨ have a
custom implementation instead of the generic base_x implementation, which
should make them faster than base_x.
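For reference, these are the standard Ruby pack/unpack directives that cover base2, base16 and base64. This only illustrates the core calls and assumes nothing about how the gem wires them together.

```ruby
# Ruby's C-backed pack/unpack directives for three of the bases above.
data = 'mb'

base2  = data.unpack1('B*')   # "0110110101100010" (bits, MSB first)
base16 = data.unpack1('H*')   # "6d62"             (hex, high nibble first)
base64 = [data].pack('m0')    # "bWI="             (RFC 4648, no line feeds)

# The reverse direction uses Array#pack and String#unpack1.
from_base2  = [base2].pack('B*')
from_base16 = [base16].pack('H*')
from_base64 = base64.unpack1('m0')

puts base2, base16, base64
puts from_base2, from_base16, from_base64   # "mb" three times
```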
The version of the spec that this repository was last updated for is available
via Multibases.multibase_version:
Multibases.multibase_version
# => "1.0.0"
Bring your own engine
The methods of multibases allow you to bring your own engine, and you can save
additional memory by only loading multibases/bare.
# Note: This is not how multibase was meant to work. It's supposed to only
# convert the input from one base to another, and denote what that base
# is, stored in the output. However, the system is _so_ flexible that this
# works perfectly for any reversible transformation!
class EngineKlazz
  def initialize(*_)
  end

  def encode(plain)
    plain = plain.bytes unless plain.is_a?(Array)
    Multibases::EncodedByteArray.new(plain.reverse)
  end

  def decode(encoded)
    encoded = encoded.bytes unless encoded.is_a?(Array)
    Multibases::DecodedByteArray.new(encoded.reverse)
  end
end
Multibases.implement 'reverse', 'r', EngineKlazz, 'alphabet'
# => Initializes EngineKlazz with 'alphabet'
Multibases.pack('reverse', 'md')
# => [Multibases::EncodedByteArray "rdm"]
Multibases.decode('rdm')
# => [Multibases::DecodedByteArray "md"]
# Alternatively, you can pass the instantiated engine to the appropriate
# function.
engine = EngineKlazz.new
# Mark the encoding as "existing" and attach a code
Multibases.implement 'reverse', 'r'
# Pack, using a custom engine
Multibases.pack('reverse', 'md', engine)
# => [Multibases::EncodedByteArray "rdm"]
Multibases.decode('rdm', engine)
# => [Multibases::DecodedByteArray "md"]
Using the built-in encoders/decoders
You can use the built-in encoders and decoders.
require 'multibases/base16'
Multibases::Base16.encode('foobar')
# => [Multibases::EncodedByteArray "666f6f626172"]
Multibases::Base16.decode('666f6f626172')
# => [Multibases::DecodedByteArray "foobar"]
These don't add the multibase prefix to the output and they use the canonical
encode and decode nomenclature.
The base_x / BaseX encoder does not have a module function. You must
instantiate it first. The result is an encoder that derives its base from the
alphabet it is given. Currently padding is ❌ not supported for BaseX, but
might be in a future update using a second argument or key.
require 'multibases/base_x'
Base3 = Multibases::BaseX.new('012')
# => [Multibases::Base3 alphabet="012" strict]
Base3.encode('foobar')
# => [Multibases::EncodedByteArray "112202210012121110020020001100"]
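The idea behind an alphabet-driven encoder can be sketched with plain integer arithmetic. This is a minimal illustration, not the gem's BaseX class: the helper names are hypothetical, and leading zero bytes are not preserved, which a complete implementation has to handle separately.

```ruby
# Hypothetical helpers: the base is taken from the alphabet length.
def sketch_base_x_encode(data, alphabet)
  base = alphabet.length
  # Read the bytes as one big-endian integer.
  number = data.bytes.inject(0) { |acc, byte| (acc << 8) | byte }
  return alphabet[0] if number.zero?

  digits = []
  while number > 0
    number, remainder = number.divmod(base)
    digits.unshift(alphabet[remainder])
  end
  digits.join
end

def sketch_base_x_decode(encoded, alphabet)
  base = alphabet.length
  number = encoded.each_char.inject(0) do |acc, char|
    index = alphabet.index(char)
    raise ArgumentError, "invalid character #{char.inspect}" if index.nil?

    acc * base + index
  end

  bytes = []
  while number > 0
    number, byte = number.divmod(256)
    bytes.unshift(byte)
  end
  bytes.pack('C*')
end

puts sketch_base_x_encode('mb', '01234567')     # 66542
puts sketch_base_x_decode('66542', '01234567')  # mb
```

The octal result matches the base8 data shown in the unpack example earlier ("66542" for "mb").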
You can use the same technique to inject a custom alphabet. This can be used on
the built-in encoders, even the ones that are not BaseX:
base = Multibases::Base2.new('.!')
# => [Multibases::Base2 alphabet=".!"]
base.encode('foo')
# => [Multibases::EncodedByteArray ".!!..!!..!!.!!!!.!!.!!!!"]
base.decode('.!!...!..!!....!.!!!..!.')
# => [Multibases::DecodedByteArray "bar"]
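A custom base2 alphabet amounts to a character substitution on top of the usual bit string. The sketch below assumes a two-character alphabet without characters that are special to String#tr (such as ^, - or \). The helper names are hypothetical and not part of the gem.

```ruby
# Hypothetical helpers: base2 with a caller-supplied two-character alphabet.
def sketch_base2_encode(data, alphabet)
  data.unpack1('B*').tr('01', alphabet)    # bits, then swap in the alphabet
end

def sketch_base2_decode(encoded, alphabet)
  [encoded.tr(alphabet, '01')].pack('B*')  # swap back to bits, then to bytes
end

puts sketch_base2_encode('foo', '.!')                       # .!!..!!..!!.!!!!.!!.!!!!
puts sketch_base2_decode('.!!...!..!!....!.!!!..!.', '.!')  # bar
```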
All the built-in encoder/decoders take strings, arrays or byte-arrays as input.
expected = Multibases::Base16.encode("abc")
# => [Multibases::EncodedByteArray "616263"]
expected == Multibases::Base16.encode([97, 98, 99])
# => true
expected == Multibases::Base16.encode(Multibases::ByteArray.new("abc".bytes))
# => true
Related
- multiformats/multibase: the spec repository
- multiformats/ruby-multicodec: the ruby implementation of multiformats/multicodec
- multiformats/ruby-multihash: the ruby implementation of multiformats/multihash
Development
After checking out the repo, run bin/setup to install dependencies. Then, run
rake test to run the tests. You can also run bin/console for an interactive
prompt that will allow you to experiment.
To install this gem onto your local machine, run bundle exec rake install.
To release a new version, update the version number in version.rb, and then
run bundle exec rake release, which will create a git tag for the version,
push git commits and tags, and push the .gem file to rubygems.org.
Contributing
Bug reports and pull requests are welcome on GitHub at SleeplessByte/ruby-multibase. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.
License
The gem is available as open source under the terms of the MIT License.
Code of Conduct
Everyone interacting in the Multibases project's codebases, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.