Sycamore
"The Egyptians' Holy Sycamore also stood on the threshold of life and death, connecting the two worlds."
-- Wikipedia: Tree of Life
Sycamore is an implementation of an unordered tree data structure.
Features:
- easy, hassle-free access to arbitrarily deep nested elements
- grows automatically when needed
- familiar Hash interface
- no more nil-induced errors
Imagine a Sycamore tree as a recursively nested set. The elements of this set, called nodes, are each associated with a child tree of additional nodes, and so on. This might differ from your usual understanding of a tree, which has to have one single root node, but this notion is much more general. The usual tree is just a special case with exactly one node at the first level. But I prefer to think of the root as implicit. Effectively every object is a tree in this sense. You can assume self to be the implicit root.
Restrictions:
- Only values you would use as keys of a hash should be used as nodes of a Sycamore tree. Although Ruby's official Hash documentation says a Hash allows you to use any object type, one is well advised to use immutable objects only. Enumerables as nodes are explicitly excluded by Sycamore.
- The nodes are unordered and can't contain duplicates.
- A Sycamore tree is uni-directional, i.e. has no relationship to its parent.
Why
Trees in the sense of recursively nested sets are omnipresent today. But why then are there so few implementations of tree data structures in Ruby? The answer is simple: because of Ruby's powerful built-in hashes. The problem is that while Ruby's Hash, as an implementation of the hash map data structure, might be perfectly fine for flat, dictionary-like structures, it is not very well suited for storing tree structures. Ruby's hash literals, which make it easy to nest multiple hashes, belie this fact. But it catches up with you when you want to build up a tree with hashes dynamically and have to manage the hash nesting manually.
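To make the manual bookkeeping concrete, here is a small sketch that uses only plain Ruby hashes (Sycamore is not involved). The keys and variable names are made up for this illustration.

```ruby
# Building a nested structure dynamically with plain hashes:
# every intermediate level has to be created by hand.
tree = {}
tree[:fruits] ||= {}
tree[:fruits][:apple] ||= {}
tree[:fruits][:apple][:color] = :red

# Skipping the ||= steps fails, because the missing level is nil.
error = nil
begin
  tree[:vegetables][:carrot] = :orange
rescue NoMethodError => e
  error = e
end

p tree
p error.class # => NoMethodError
```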
In contrast to the few existing implementations of tree data structures in Ruby, Sycamore is based on Ruby's very efficient hashes and contains the values directly without any additional overhead. It only wraps the hashes themselves. This wrapper object is very thin, containing nothing more than the hash itself. This comes at the price of the aforementioned restrictions, which prevent it from being a generally applicable tree implementation.
Another compelling reason for the use of Sycamore is its handling of nil. Much has been said about the problem of nil (or equivalent null values in other languages), including: "It was my Billion-dollar mistake" from its inventor, Tony Hoare. Every developer has experienced it in the form of errors such as
NoMethodError: undefined method '[]' for nil:NilClass
With Sycamore this is a thing of the past.
Supported Ruby versions
- MRI >= 2.1
- JRuby
- Rubinius
Dependencies
- none
Installation
The recommended installation method is via RubyGems.
$ gem install sycamore
Usage
I will introduce Sycamore's Tree API by comparing it with Ruby's Hash API.
In the following I'll always write Tree for the Sycamore tree class, instead of the fully qualified Sycamore::Tree. By default, this global Tree constant is not available. If you want this, you'll have to
require 'sycamore/extension'
When you can't or don't want to have the Tree alias constant in the global namespace, but still want a short alternative name, you can alternatively
require 'sycamore/stree'
to get an alias constant STree with less potential for conflicts.
I recommend trying the following code yourself in a Ruby REPL like Pry.
Creating trees
A Sycamore::Tree can be created similarly to Hashes, with the standard constructor or the class-level [] operator.
Tree.new creates an empty Sycamore::Tree.
tree = Tree.new
tree.empty? # => true
No additional arguments are supported at this time. As you'll see, for a Sycamore::Tree the functionality of the Hash constructor to specify the default value behaviour is of too little value to justify its use in the default constructor, so I'd like to reserve the arguments for something more useful.
The [] operator creates a new Tree and adds the arguments as its initial input. It can handle a single node value, a collection of nodes or a complete tree.
Tree[1] # => #<Sycamore::Tree:0x3fcfe51a5a3c {1=>n/a}>
Tree[1, 2, 3] # => #<Sycamore::Tree:0x3fcfe51a56f4 {1=>n/a, 2=>n/a, 3=>n/a}>
Tree[1, 2, 2, 3] # => #<Sycamore::Tree:0x3fcfe51a52d0 {1=>n/a, 2=>n/a, 3=>n/a}>
Tree[x: 1, y: 2] # => #<Sycamore::Tree:0x3fcfe51a4e34 {:x=>1, :y=>2}>
As you can see in the third line, nodes are stored as a set, i.e. with duplicates removed.
Note that multiple arguments are not interpreted as an associative array as Hash[] does, but rather as a set of leaves, i.e. nodes without children.
Hash[1, 2, 3, 4] # => {1=>2, 3=>4}
Hash[1, 2, 3] # => ArgumentError: odd number of arguments for Hash
You can also see that children of leaves, i.e. nodes without children, are signified with n/a. When providing input data with Hashes, you can use nil as the child value of a leaf.
Tree[x: 1, y: 2, z: nil]
# => #<Sycamore::Tree:0x3fcfe51a4e34 {:x=>1, :y=>2, :z=>n/a}>
In general the nil child value for leaves in Hash literals is mandatory, but on the first level it can be omitted by providing the leaves as arguments before the non-leaf nodes.
Tree[:a, :b, c: {d: 1, e: nil}]
# => #<Sycamore::Tree:0x3fd3f9c6bb0c {:a=>n/a, :b=>n/a, :c=>{:d=>1, :e=>n/a}}>
If you really want to have a node with nil as a child, you'll have to put the nil in an array.
Tree[x: 1, y: 2, z: [nil]]
# => #<Sycamore::Tree:0x3fd641858264 {:x=>1, :y=>2, :z=>nil}>
Accessing trees
Access to elements of a Sycamore::Tree is mostly API-compatible with that of Ruby's Hash class. But there is one major difference in the return type of most of the access methods: since we are dealing with a recursively defined tree structure, the returned children are always trees as well.
The main method for accessing a tree is the [] operator.
tree = Tree[x: 1, y: {2 => "a"}]
tree[:x] # => #<Sycamore::Tree:0x3fea48d24d40 {1=>n/a}>
tree[:y] # => #<Sycamore::Tree:0x3fea48d24b74 {2=>"a"}>
tree[:y][2] # => #<Sycamore::Tree:0x3fea48d248f4 {"a"=>n/a}>
The actual nodes of a tree can be retrieved with the method nodes.
tree.nodes # => [:x, :y]
tree[:x].nodes # => [1]
tree[:y].nodes # => [2]
tree[:y][2].nodes # => ["a"]
If it's certain that a tree has at most one element, you can also use node to get that node directly.
tree[:y].node # => 2
tree[:y][2].node # => "a"
tree[:x][1].node # => nil
tree.node # Sycamore::NonUniqueNodeSet: multiple nodes present: [:x, :y]
The bang variant node! raises an error when the node set is empty, instead of returning nil.
tree[:y][2].node! # => "a"
tree[:x][1].node! # => Sycamore::EmptyNodeSet: no node present
As opposed to Hash, the [] operator of Sycamore::Tree also supports multiple arguments which get interpreted as a path.
tree[:y, 2].node # => "a"
For compatibility with Ruby 2.3 Hashes, this can also be done with the dig method.
tree.dig(:y, 2).node # => "a"
fetch, as a more controlled way to access the elements, is also supported.
tree.fetch(:x) # => #<Sycamore::Tree:0x3fea48d24d40 {1=>n/a}>
tree.fetch(:z) # => KeyError: key not found: :z
tree.fetch(:z, :default) # => :default
tree.fetch(:z) { :default } # => :default
Fetching the child of a leaf behaves almost the same as fetching the child of a non-existing node, i.e. the default value is returned or a KeyError gets raised. In order to differentiate these cases, a Sycamore::ChildError as a subclass of KeyError is raised when accessing the child of a leaf.
fetch_path allows dig-like access with fetch semantics, except that it requires the path of nodes to be given as an Enumerable.
tree.fetch_path([:y, 2]).node # => "a"
tree.fetch_path([:y, 3]) # => KeyError: key not found: 3
tree.fetch_path([:y, 3], :default) # => :default
tree.fetch_path([:y, 3]) { :default } # => :default
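For comparison, the same idea can be sketched on plain nested hashes by chaining Hash#fetch along the path. The helper below is hypothetical and only illustrates the semantics; it is not Sycamore's implementation, and it assumes every step of the path lands on a Hash.

```ruby
# Hypothetical helper (not part of Sycamore): walks a nested Hash along
# a path, applying Hash#fetch at every step.
def fetch_path(hash, path, *default, &block)
  path.reduce(hash) { |current, key| current.fetch(key) }
rescue KeyError
  # Fall back to the default value or the block, else re-raise.
  return default.first unless default.empty?
  return block.call if block
  raise
end

data = { x: 1, y: { 2 => "a" } }
fetch_path(data, [:y, 2])               # => "a"
fetch_path(data, [:y, 3], :default)     # => :default
fetch_path(data, [:y, 3]) { :default }  # => :default
```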
The number of nodes of a tree can be determined with size. This will only count direct nodes.
tree.size # => 2
total_size or its short alias tsize returns the total number of nodes of a tree, including the nodes of children.
tree.total_size # => 5
tree[:y].tsize # => 2
The height of a tree, i.e. the length of its longest path, can be computed with the method height.
tree.height # => 3
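How these two numbers come about can be sketched with two recursive functions over plain hashes. This assumes a simplified model in which every node maps to a Hash of its children (an empty Hash for leaves); the function names mirror the Tree methods but are standalone illustrations, not Sycamore's code.

```ruby
# Counts every node on every level.
def total_size(tree)
  tree.sum { |_node, child| 1 + total_size(child) }
end

# Length of the longest path from the implicit root down to a leaf.
def height(tree)
  return 0 if tree.empty?
  1 + tree.values.map { |child| height(child) }.max
end

# The tree from above, Tree[x: 1, y: {2 => "a"}], in the simplified model.
nested = { x: { 1 => {} }, y: { 2 => { "a" => {} } } }
total_size(nested)      # => 5
total_size(nested[:y])  # => 2
height(nested)          # => 3
```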
empty? checks if a tree is empty.
tree.empty? # => false
tree[:x, 1].empty? # => true
leaf? checks if a node is a leaf.
tree.leaf? :x # => false
tree[:x].leaf? 1 # => true
leaves? (or one of its aliases external? and flat?) can be used to determine this for more nodes at once.
Tree[1, 2, 3].leaves?(1, 2) # => true
Without any arguments leaves? returns whether all nodes of a tree are leaves.
Tree[1, 2].leaves? # => true
include? checks whether one or more nodes are in the set of nodes of this tree.
tree.include? :x # => true
tree.include? [:x, :y] # => true
include? can also check whether a tree structure (incl. a hash) is a sub tree of a Sycamore::Tree.
tree.include?(x: 1, y: 2) # => true
to_h returns the tree as a Hash.
tree.to_h # => {:x=>1, :y=>{2=>"a"}}
Accessing absent trees
There is another major difference in the access method behaviour of a Sycamore tree in comparison to hashes: the child access methods return a tree even when it does not exist. When you ask a hash for a non-existent element with the [] operator, you'll get a nil, which is an incarnation of the null problem and the cause of many bug tracking sessions.
hash = {x: 1, y: {2 => "a"}}
hash[:z] # => nil
hash[:z][3] # => NoMethodError: undefined method `[]' for nil:NilClass
Sycamore, on the other hand, returns a special tree, the Nothing tree:
tree = Tree[x: 1, y: {2 => "a"}]
tree[:z] # => #<Sycamore::Nothing>
tree[:z][3] # => #<Sycamore::Nothing>
Sycamore::Nothing is a singleton Tree implementing a null object. It behaves on every query method call like an empty tree.
Sycamore::Nothing.empty? # => true
Sycamore::Nothing.size # => 0
Sycamore::Nothing[42] # => #<Sycamore::Nothing>
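The null object idea itself can be sketched in a few lines of plain Ruby. The NullTree class and the child_of helper below are hypothetical stand-ins built on the standard Singleton module; Sycamore's own Nothing class is more elaborate.

```ruby
require 'singleton'

# Hypothetical null object: answers queries like an empty tree.
class NullTree
  include Singleton

  def empty?
    true
  end

  def size
    0
  end

  # Any child of the null tree is again the null tree.
  def [](*_path)
    self
  end
end

# Hypothetical lookup helper returning NullTree instead of nil.
def child_of(hash, key)
  hash.fetch(key) { NullTree.instance }
end

missing = child_of({ x: 1 }, :z)
missing[3][4].empty?  # => true, no NoMethodError on the way
missing.size          # => 0
```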
Sycamore adheres to a strict command-query-separation (CQS). A method is either a command changing the state of the tree and returning self, or a query method, which only computes and returns the results of the query but leaves the state unchanged. The only exception to this strict separation is made when it is necessary in order to preserve Hash compatibility. All query methods are supported by the Sycamore::Nothing tree with empty tree semantics.
Among the command methods are two subclasses: additive command methods, which add elements, and destructive command methods, which remove elements. These are further refined into pure additive and pure destructive command methods, which either support additions or deletions only, not both operations at once. The Sycamore::Tree extends Ruby's reflection API with class methods to retrieve the respective methods: query_methods, command_methods, additive_command_methods, destructive_command_methods, pure_additive_command_methods, pure_destructive_command_methods.
Tree.command_methods
# => [:add, :<<, :replace, :create_child, :[]=, :delete, :>>, :clear, :compact, :replace, :[]=, :freeze]
Tree.additive_command_methods
# => [:add, :<<, :replace, :create_child, :[]=]
Tree.pure_additive_command_methods
# => [:add, :<<, :create_child]
Tree.pure_destructive_command_methods
# => [:delete, :>>, :clear, :compact]
Pure destructive command methods on Sycamore::Nothing are no-ops. All other command methods raise an exception.
Sycamore::Nothing.clear # => #<Sycamore::Nothing>
Sycamore::Nothing[:foo] = :bar
# => Sycamore::NothingMutation: attempt to change the Nothing tree
But inspecting the Nothing tree returned by Tree#[] further shows that this isn't the end of the story.
tree[:z].inspect
# => absent child of node :z in #<Sycamore::Tree:0x3fc88e04a470 {:x=>1, :y=>{2=>"a"}}>
tree[:z][3].inspect
# => absent child of node 3 in absent child of node :z in #<Sycamore::Tree:0x3fc88e04a470 {:x=>1, :y=>{2=>"a"}}>
We'll actually get an Absence object, a proxy object for the requested, not yet existing tree. As long as we don't try to change it, this Absence object delegates all method calls to Sycamore::Nothing. But as soon as we call a non-pure-destructive command method, the missing tree is created and added to the parent tree, and the method call gets delegated to the now existing tree.
tree[:z] = 3
tree.to_h # => {:x=>1, :y=>{2=>"a"}, :z=>3}
So a Sycamore::Tree is a tree on which the nodes grow automatically, but only when needed. And this works recursively on arbitrarily deep nested absent trees.
tree[:some][:very][:deep] = :node
tree.to_h # => {:x=>1, :y=>{2=>"a"}, :z=>3, :some=>{:very=>{:deep=>:node}}}
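The mechanism of such a proxy can be sketched on top of plain hashes. The LazyChild class and the lookup helper are hypothetical names for this illustration; the sketch only models reading and []= and makes no claim about how Sycamore's Absence is actually implemented.

```ruby
# Hypothetical placeholder for a child that does not exist yet.
class LazyChild
  def initialize(parent, key)
    @parent = parent # a Hash or another LazyChild
    @key = key
  end

  # Reading through a missing child yields another placeholder.
  def [](key)
    LazyChild.new(self, key)
  end

  # Writing creates all missing levels first, then stores the value.
  def []=(key, value)
    hash = materialize
    hash[key] = value
  end

  def empty?
    true
  end

  protected

  # Creates the missing hash in the parent (recursively) and returns it.
  def materialize
    target = @parent.is_a?(LazyChild) ? @parent.materialize : @parent
    target[@key] ||= {}
  end
end

# Hypothetical lookup helper: existing child or a placeholder.
def lookup(hash, key)
  hash.fetch(key) { LazyChild.new(hash, key) }
end

data = { x: 1 }
placeholder = lookup(data, :some)[:very]
placeholder.empty?          # => true, and data is still {x: 1}
placeholder[:deep] = :node  # now the missing levels get created
```

Only the write in the last line changes data; reading through missing levels leaves the hash untouched.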
In order to determine whether a node has no children, you can simply use empty?.
tree = Tree[a: 1]
tree[:a].empty? # => false
tree[:b].empty? # => true
But how can you distinguish an empty from a missing tree?
user = Tree[name: 'Adam', shopping_cart_items: []]
user[:shopping_cart_items].empty? # => true
user[:foo].empty? # => true
One way is the use of the absent? method, which only returns true on an Absence object.
user[:shopping_cart_items].absent? # => false
user[:foo].absent? # => true
Another possibility, without the need to create the Absence in the first place, is the leaf? method, since it also checks for the presence of a node.
user.leaf? :shopping_cart_items # => true
user.leaf? :foo # => false
But the leaf? method has a similar problem in this respect: it doesn't differentiate between absent and empty children.
tree = Tree[foo: nil, bar: []]
tree.leaf? :foo # => true
tree.leaf? :bar # => true
strict_leaf? and strict_leaves? (or their short aliases sleaf? and sleaves?) are more strict in this regard: when a node has an empty child tree it is considered a leaf, but not a strict leaf.
tree.strict_leaf? :foo # => true
tree.strict_leaf? :bar # => false
Besides absent?, the related methods blank? (as an alias of empty?) and its negation present? are available in an ActiveSupport-compatible form. Unfortunately, the natural expectation that Tree#present? and Tree#absent? are mutual opposites is misleading.
user[:shopping_cart_items].absent? # => false
user[:shopping_cart_items].present? # => false
The risk arising from an ActiveSupport-incompatible present? is probably greater than this inconsistency. So, if you want to check whether a tree is not absent, use existent? as the negation of absent?.
Besides these options, fetch is also a method to handle this situation in a nuanced way.
user.fetch(:shopping_cart_items) # => #<Sycamore::Tree:0x3febb9c9b3d4 {}>
user.fetch(:foo)
# => KeyError: key not found: :foo
user.fetch(:foo, :default) # => :default
Empty child trees also play a role when determining equality. The eql? and == equivalences differ exactly in their handling of this question: == treats empty child trees as absent trees, while eql? doesn't.
Tree[:foo].eql? Tree[foo: []] # => false
Tree[:foo] == Tree[foo: []] # => true
All empty child trees can be removed with compact.
Tree[:foo].eql? Tree[foo: []].compact # => true
An arbitrary structure can be compared with a Sycamore::Tree for equality with ===.
Tree[:foo, :bar] === [:foo, :bar] # => true
Tree[:foo, :bar] === Set[:foo, :bar] # => true
Tree[:foo => :bar] === {:foo => :bar} # => true
Changing trees
Let's examine the command methods to change the contents of a tree. The add method, or the << operator as its alias, allows the addition of one node, multiple nodes or a tree structure of nodes.
tree = Tree.new
tree << 1
tree << [2, 3]
tree << {3 => :a, 4 => :b}
puts tree
> Tree[1=>nil, 2=>nil, 3=>:a, 4=>:b]
The []= operator is supported in a Hash-compatible way.
tree[5] = :c
puts tree
> Tree[1=>nil, 2=>nil, 3=>:a, 4=>:b, 5=>:c]
Note that this is just an add with a previous call of clear, which deletes all elements of the tree. This means you can safely assign another tree without having to think about object identity.
If you want to explicitly state, that a node doesn't have any children, you can specify it in the following equivalent ways.
tree[:foo] = []
tree[:foo] = {}
To remove a child tree entirely, you can assign Nothing or nil to the parent node.
tree[:foo] = Nothing
tree[:foo] = nil
If you really want to overwrite the current child nodes with a single nil node, you have to do it in the following way.
tree[:foo] = [nil]
Note that all of these values are interpreted consistently inside input tree structures on creation, addition, deletion etc., i.e. empty Enumerables become empty child trees, Nothing or nil are used as placeholders for the explicit negation of a child, and [nil] is used for a child tree with a single nil node.
puts Tree[ a: { b: nil }, c: { d: []}, d: [nil] ]
> Tree[:a=>:b, :c=>{:d=>[]}, :d=>[nil]]
Besides the deletion of all elements with the already mentioned clear method, single or multiple nodes and entire tree structures can be removed with delete or the >> operator.
tree >> 1
tree >> [2, 3]
tree >> {4 => :b}
puts tree
> Tree[5=>:c, :foo=>[]]
When removing a tree structure, only child trees with no more existing nodes get deleted.
tree = Tree[a: [1,2]]
tree >> {a: 1}
puts tree
> Tree[:a=>2]
tree = Tree[a: 1, b: 2]
tree >> {a: 1}
puts tree
> Tree[:b=>2]
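The rule behind these two results can be sketched as a recursive function over plain hashes. It assumes the simplified model in which every node maps to a Hash of its children (empty for leaves); delete_structure is a hypothetical name, and the sketch does not cover all input forms the >> operator accepts.

```ruby
# Removes the given structure from the tree in place. A node is dropped
# when it is named as a leaf in the structure, or when removing the
# structure's children left it without any children.
def delete_structure(tree, structure)
  structure.each do |node, sub|
    child = tree[node]
    next if child.nil? # node not present, nothing to remove

    if sub.empty?
      tree.delete(node)
    else
      delete_structure(child, sub)
      tree.delete(node) if child.empty?
    end
  end
  tree
end

# Tree[a: [1, 2]] >> {a: 1} in the simplified model: :a keeps node 2.
delete_structure({ a: { 1 => {}, 2 => {} } }, { a: { 1 => {} } })

# Tree[a: 1, b: 2] >> {a: 1}: :a loses its only child and is removed too.
delete_structure({ a: { 1 => {} }, b: { 2 => {} } }, { a: { 1 => {} } })
```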
Iterating trees
The fundamental each, and with it all Enumerable methods, behaves in a Hash-compatible way.
tree = Tree[ 1 => {a: 'foo'}, 2 => :b, 3 => nil ]
tree.each { |node, child| puts "#{node} => #{child}" }
> 1 => Tree[:a=>"foo"]
> 2 => Tree[:b]
> 3 => Tree[]
each_path iterates over all paths to the leaves of a tree.
tree.each_path { |path| puts path }
> #<Path: /1/a/foo>
> #<Path: /2/b>
> #<Path: /3>
The paths are represented by Sycamore::Path objects and are basically an Enumerable of the nodes on the path, specifically optimized for the enumeration of the set of paths of a tree. It does this by sharing nodes between the different path objects. This means that in the set of all paths, every node is contained exactly once, even the internal nodes that are part of multiple paths.
Tree['some possibly very big data chunk' => [1, 2]].each_path.to_a
# => [#<Sycamore::Path["some possibly very big data chunk",1]>,
# #<Sycamore::Path["some possibly very big data chunk",2]>]
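One way to get this kind of sharing is a linked structure in which each path stores only its last node plus a reference to its parent path. The LinkedPath class and the linked_paths helper below are hypothetical and work on plain hashes in which every node maps to a Hash of its children; they illustrate the sharing idea, not Sycamore::Path itself.

```ruby
# Hypothetical linked path: common prefixes are shared, not copied.
class LinkedPath
  include Enumerable

  attr_reader :parent, :node

  def initialize(parent, node)
    @parent = parent
    @node = node
  end

  # Yields the nodes from the first level down to this path's last node.
  def each(&block)
    return enum_for(:each) unless block
    parent&.each(&block)
    yield node
  end
end

# Collects the paths to all leaves of a nested Hash.
def linked_paths(tree, prefix = nil, result = [])
  tree.each do |node, child|
    path = LinkedPath.new(prefix, node)
    child.empty? ? result << path : linked_paths(child, path, result)
  end
  result
end

big = 'some possibly very big data chunk'
paths = linked_paths({ big => { 1 => {}, 2 => {} } })
paths.map(&:to_a)                        # => [[big, 1], [big, 2]]
paths[0].parent.equal?(paths[1].parent)  # => true, the prefix is shared
```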
Searching in trees
search returns the set of all paths to child trees containing a node or tree.
tree = Tree[ 1 => {a: 'foo'}, 2 => :b, 3 => [:a, :b, :c] ]
tree.search :a # => [#<Sycamore::Path[1]>, #<Sycamore::Path[3]>]
tree.search a: 'foo' # => [#<Sycamore::Path[1]>]
If you search for multiple nodes, only the paths to child trees containing all of the given nodes are returned.
tree.search [:b, :c] # => [#<Sycamore::Path[3]>]
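The single-node case of such a search can be sketched as a depth-first walk over plain hashes, again assuming that every node maps to a Hash of its children. search_node is a hypothetical helper that returns paths as plain arrays; it is not Sycamore's implementation and does not handle tree structures as search terms.

```ruby
# Collects the paths to all child trees that contain the wanted node.
def search_node(tree, wanted, path = [], found = [])
  found << path if tree.key?(wanted)
  tree.each do |node, child|
    search_node(child, wanted, path + [node], found)
  end
  found
end

nested = {
  1 => { a: { 'foo' => {} } },
  2 => { b: {} },
  3 => { a: {}, b: {}, c: {} }
}
search_node(nested, :a)  # => [[1], [3]]
search_node(nested, :b)  # => [[2], [3]]
```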
All Tree methods for which it makes sense accept path objects as input, instead of or in combination with nodes or tree structures. This allows the search results to be applied to any of these methods.
Getting help
Contributing
See CONTRIBUTING for details.
License and Copyright
(c) 2015-2016 Marcel Otto. MIT Licensed, see LICENSE for details.