Ternary-tree: structure sharing data for learning purpose

This is a Nim project but the idea is mainly in the scope of Clojure’s persistent data structure. I was alway curious about the details of persistent vectors. And thanks to the excellent post of Understanding Clojure’s Persistent Vectors, pt. 1 a lot of the details is revealed.

I was mostly using JavaScript and don’t have much experience in C or Java, so I choose Nim for my toy and Nim has comparable(kind of) performance to C. Nim has a builtin but optional GC, which makes my at ease.

Clojure is using a 32-branches tree for performance. I choose 3 for learning purposes. At first I thought a tree with 3 branches is more flexible for inserting, for example, it has both left and right, insertion could take place in both directions, and it’s more natural for tries to have left and right branches. Although it could be a not good choice for performance.

Luckily there are still some interesting things to be found in implementing such a data structure.

First, the list TernaryTreeList is implemented in a B+ tree, leaf nodes and internal nodes are different. Leaf nodes contains values, while internal nodes contains sizes of tree, and sizes are used as indexes. In Clojure Vectors, indexes are calculated like i >>> 5 to find each index in every level, that’s very fast. In TernaryTreeList, I have to comparing sizes of each branch from left to right to find out which branch contains the index I wanted, which has a cost in performance.

When a TernaryTreeList list is first initialized, values are distributed sparely but with a minimal depth. It’s not placed in a most compact way, just don’t be too deep. And that mean there will be some holes among the tree.

Concise notations

To show the structure in details, I would show a TernaryTreeList list, with 3 items, which means [1 2 3], structured like

      ^
    / | \
   1  2  3

in a compact form:

(1 2 3)

if empty holes exist, that place would be left empty, which make [1 3] of:

      ^
    / | \
   1     3

to be:

(1 _ 3)

and for deeper tree [1 4 5 6]:

      ^
    / | \
   1     ^
       / | \
      4  5  6

it becomes:

(1 _ (4 5 6))

thats much more concise.

Creating

So when I init lists of ranges from 0 to 20, they are like:

(_ _ _)
1
(1 _ 2)
(1 2 3)
(1 (2 _ 3) 4)
((1 _ 2) 3 (4 _ 5))
((1 _ 2) (3 _ 4) (5 _ 6))
((1 _ 2) (3 4 5) (6 _ 7))
((1 2 3) (4 _ 5) (6 7 8))
((1 2 3) (4 5 6) (7 8 9))
((1 2 3) (4 (5 _ 6) 7) (8 9 10))
((1 (2 _ 3) 4) (5 6 7) (8 (9 _ 10) 11))
((1 (2 _ 3) 4) (5 (6 _ 7) 8) (9 (10 _ 11) 12))
((1 (2 _ 3) 4) ((5 _ 6) 7 (8 _ 9)) (10 (11 _ 12) 13))
(((1 _ 2) 3 (4 _ 5)) (6 (7 _ 8) 9) ((10 _ 11) 12 (13 _ 14)))
(((1 _ 2) 3 (4 _ 5)) ((6 _ 7) 8 (9 _ 10)) ((11 _ 12) 13 (14 _ 15)))
(((1 _ 2) 3 (4 _ 5)) ((6 _ 7) (8 _ 9) (10 _ 11)) ((12 _ 13) 14 (15 _ 16)))
(((1 _ 2) (3 _ 4) (5 _ 6)) ((7 _ 8) 9 (10 _ 11)) ((12 _ 13) (14 _ 15) (16 _ 17)))
(((1 _ 2) (3 _ 4) (5 _ 6)) ((7 _ 8) (9 _ 10) (11 _ 12)) ((13 _ 14) (15 _ 16) (17 _ 18)))
(((1 _ 2) (3 _ 4) (5 _ 6)) ((7 _ 8) (9 10 11) (12 _ 13)) ((14 _ 15) (16 _ 17) (18 _ 19)))

Since items are roughly equally distributed in 3 branches of each internal node, the holes are also sparsely distributed. It’s not a optimal solution because the overall depths of all items can still be reduced, and the existence of internals nodes also takes memory.

An alternative solution is giving the middle branch more items that is makes use of the full spaces of given depth, then the placement is compacter than above.

(_ _ _)
1
(1 _ 2)
(1 2 3)
((1 _ 2) 3 4)
((1 _ 2) 3 (4 _ 5))
((1 _ 2) (3 4 5) 6)
((1 _ 2) (3 4 5) (6 _ 7))
((1 2 3) (4 5 6) (7 _ 8))
((1 2 3) (4 5 6) (7 8 9))
(((1 _ 2) 3 4) (5 6 7) (8 9 10))
(((1 _ 2) 3 4) (5 6 7) ((8 _ 9) 10 11))
((1 _ 2) ((3 4 5) (6 7 8) (9 10 11)) 12)
((1 _ 2) ((3 4 5) (6 7 8) (9 10 11)) (12 _ 13))
((1 2 3) ((4 5 6) (7 8 9) (10 11 12)) (13 _ 14))
((1 2 3) ((4 5 6) (7 8 9) (10 11 12)) (13 14 15))
(((1 _ 2) 3 4) ((5 6 7) (8 9 10) (11 12 13)) (14 15 16))
(((1 _ 2) 3 4) ((5 6 7) (8 9 10) (11 12 13)) ((14 _ 15) 16 17))
(((1 _ 2) 3 (4 _ 5)) ((6 7 8) (9 10 11) (12 13 14)) ((15 _ 16) 17 18))
(((1 _ 2) 3 (4 _ 5)) ((6 7 8) (9 10 11) (12 13 14)) ((15 _ 16) 17 (18 _ 19)))

However to find the best-fit size. I have to divide size by 3 very frequently. That might be slower than dividing 2 in a binary computer…

Inserting

You may also wondering how it would be like if I keep appending elements at the end. This is how it looks like:

(_ _ _)
0
(0 1 _)
(0 1 2)
((0 1 2) 3 _)
((0 1 2) (3 4 _) _)
((0 1 2) (3 4 5) _)
((0 1 2) (3 4 5) 6)
((0 1 2) (3 4 5) (6 7 _))
((0 1 2) (3 4 5) (6 7 8))
(((0 1 2) (3 4 5) (6 7 8)) 9 _)
(((0 1 2) (3 4 5) (6 7 8)) (9 10 _) _)
(((0 1 2) (3 4 5) (6 7 8)) (9 10 11) _)
(((0 1 2) (3 4 5) (6 7 8)) ((9 10 11) 12 _) _)
(((0 1 2) (3 4 5) (6 7 8)) ((9 10 11) (12 13 _) _) _)
(((0 1 2) (3 4 5) (6 7 8)) ((9 10 11) (12 13 14) _) _)
(((0 1 2) (3 4 5) (6 7 8)) ((9 10 11) (12 13 14) 15) _)
(((0 1 2) (3 4 5) (6 7 8)) ((9 10 11) (12 13 14) (15 16 _)) _)
(((0 1 2) (3 4 5) (6 7 8)) ((9 10 11) (12 13 14) (15 16 17)) _)
(((0 1 2) (3 4 5) (6 7 8)) ((9 10 11) (12 13 14) (15 16 17)) 18)
(((0 1 2) (3 4 5) (6 7 8)) ((9 10 11) (12 13 14) (15 16 17)) (18 19 _))

It is quite compact. And take a looks a (range 10), which has 9 + 1 items:

(((0 1 2) (3 4 5) (6 7 8)) 9 _)

It’s consisted of 2 branches(and 1 extra empty branch) of 9 items and 1 elements. So accessing element 9 is quite fast since depth is 1, rather than go to look it up in a branch of a depth of 3.

How about prepending? It’s quite similar to appending:

(_ _ _)
0
(_ 1 0)
(2 1 0)
(_ 3 (2 1 0))
(_ (_ 4 3) (2 1 0))
(_ (5 4 3) (2 1 0))
(6 (5 4 3) (2 1 0))
((_ 7 6) (5 4 3) (2 1 0))
((8 7 6) (5 4 3) (2 1 0))
(_ 9 ((8 7 6) (5 4 3) (2 1 0)))
(_ (_ 10 9) ((8 7 6) (5 4 3) (2 1 0)))
(_ (11 10 9) ((8 7 6) (5 4 3) (2 1 0)))
(_ (_ 12 (11 10 9)) ((8 7 6) (5 4 3) (2 1 0)))
(_ (_ (_ 13 12) (11 10 9)) ((8 7 6) (5 4 3) (2 1 0)))
(_ (_ (14 13 12) (11 10 9)) ((8 7 6) (5 4 3) (2 1 0)))
(_ (15 (14 13 12) (11 10 9)) ((8 7 6) (5 4 3) (2 1 0)))
(_ ((_ 16 15) (14 13 12) (11 10 9)) ((8 7 6) (5 4 3) (2 1 0)))
(_ ((17 16 15) (14 13 12) (11 10 9)) ((8 7 6) (5 4 3) (2 1 0)))
(18 ((17 16 15) (14 13 12) (11 10 9)) ((8 7 6) (5 4 3) (2 1 0)))
((_ 19 18) ((17 16 15) (14 13 12) (11 10 9)) ((8 7 6) (5 4 3) (2 1 0)))

Still take a look a (range 10), just a reverse structure:

(_ 9 ((8 7 6) (5 4 3) (2 1 0)))

It’s optimized on purpose since it’s quite common use case to add new items at head or at tail. The goal is too reduce holes inside the tree.

You may also curious how it looks like when items are inserted randomly, say using assocAfter, here’s how it appears(first integer indicates “depth”):

2 : (0 1 _)
2 : (0 1 2)
3 : ((0 3 _) 1 2)
3 : ((0 3 _) 1 (2 4 _))
4 : (((0 3 _) 1 (2 4 _)) 5 _)
4 : (((0 3 _) (1 6 _) (2 4 _)) 5 _)
4 : (((0 3 _) (1 6 7) (2 4 _)) 5 _)
4 : (((0 3 _) (1 6 7) (2 4 _)) (5 8 _) _)
5 : (((0 3 _) ((1 6 7) 9 _) (2 4 _)) (5 8 _) _)
6 : (((0 3 _) (((1 10 _) 6 7) 9 _) (2 4 _)) (5 8 _) _)
6 : (((0 3 11) (((1 10 _) 6 7) 9 _) (2 4 _)) (5 8 _) _)
6 : (((0 3 11) (((1 10 12) 6 7) 9 _) (2 4 _)) (5 8 _) _)
6 : (((0 3 11) (((1 10 12) 6 7) 9 _) (2 4 _)) (5 8 13) _)
6 : (((0 3 11) (((1 10 12) 6 7) 9 _) (2 4 14)) (5 8 13) _)
7 : (((0 3 11) ((((1 15 _) 10 12) 6 7) 9 _) (2 4 14)) (5 8 13) _)
7 : (((0 3 11) ((((1 15 _) 10 12) 6 7) 9 _) (2 4 14)) ((5 16 _) 8 13) _)
7 : (((0 3 11) ((((1 15 _) 10 12) 6 7) 9 _) (2 (4 17 _) 14)) ((5 16 _) 8 13) _)
7 : (((0 3 11) ((((1 15 _) 10 12) 6 7) 9 _) ((2 18 _) (4 17 _) 14)) ((5 16 _) 8 13) _)
7 : (((0 3 11) ((((1 15 _) 10 12) 6 (7 19 _)) 9 _) ((2 18 _) (4 17 _) 14)) ((5 16 _) 8 13) _)
7 : (((0 3 11) ((((1 15 _) 10 12) 6 (7 19 _)) 9 _) ((2 18 _) (4 17 20) 14)) ((5 16 _) 8 13) _)

; after balanced

4 : (((0 _ 3) (11 1 15) (10 _ 12)) ((6 _ 7) (19 9 2) (18 _ 4)) ((17 _ 20) (14 5 16) (8 _ 13)))

Sometimes it creates new branches, sometimes it fills in the holes. The depth grew quite fast into a depth of 7 since I wanted to share branches in most case. And larger depth means slower lookup and even more memory occupation.

To reduce depth, forceInplaceBalancing is called. The depth goes from 7 to 4. The count of holes and nodes are not necessarily reduced in current algorithm though…

You may also notice that it’s still reusing branches during random updates. Sister branches are reused, while parent branches are recreated.

Taking 13 for example, when it’s inserted, it’s at the tail, so the left branches are reused, only 2 new internal nodes are created, 7 internal nodes are reused.

6 : (((0 3 11) (((1 10 12) 6 7) 9 _) (2 4 _)) (5 8 _) _)
6 : (((0 3 11) (((1 10 12) 6 7) 9 _) (2 4 _)) (5 8 13) _)

And when 7 is inserted, 3 internal nodes are created, while 2 internal nodes are reused. This is not quite good though…

4 : (((0 3 _) (1 6 _) (2 4 _)) 5 _)
4 : (((0 3 _) (1 6 7) (2 4 _)) 5 _)

But if you take a balanced list and insert randomly, for example inserting 888 into (range 17) after each item:

4 : (((0 888 1) (2 _ 3) (4 _ 5)) ((6 _ 7) (8 _ 9) (10 _ 11)) ((12 _ 13) (14 _ 15) (16 _ 17)))
4 : (((0 1 888) (2 _ 3) (4 _ 5)) ((6 _ 7) (8 _ 9) (10 _ 11)) ((12 _ 13) (14 _ 15) (16 _ 17)))
4 : (((0 _ 1) (2 888 3) (4 _ 5)) ((6 _ 7) (8 _ 9) (10 _ 11)) ((12 _ 13) (14 _ 15) (16 _ 17)))
4 : (((0 _ 1) (2 3 888) (4 _ 5)) ((6 _ 7) (8 _ 9) (10 _ 11)) ((12 _ 13) (14 _ 15) (16 _ 17)))
4 : (((0 _ 1) (2 _ 3) (4 888 5)) ((6 _ 7) (8 _ 9) (10 _ 11)) ((12 _ 13) (14 _ 15) (16 _ 17)))
5 : ((((0 _ 1) (2 _ 3) (4 _ 5)) 888 _) ((6 _ 7) (8 _ 9) (10 _ 11)) ((12 _ 13) (14 _ 15) (16 _ 17)))
4 : (((0 _ 1) (2 _ 3) (4 _ 5)) ((6 888 7) (8 _ 9) (10 _ 11)) ((12 _ 13) (14 _ 15) (16 _ 17)))
4 : (((0 _ 1) (2 _ 3) (4 _ 5)) ((6 7 888) (8 _ 9) (10 _ 11)) ((12 _ 13) (14 _ 15) (16 _ 17)))
4 : (((0 _ 1) (2 _ 3) (4 _ 5)) ((6 _ 7) (8 888 9) (10 _ 11)) ((12 _ 13) (14 _ 15) (16 _ 17)))
4 : (((0 _ 1) (2 _ 3) (4 _ 5)) ((6 _ 7) (8 9 888) (10 _ 11)) ((12 _ 13) (14 _ 15) (16 _ 17)))
4 : (((0 _ 1) (2 _ 3) (4 _ 5)) ((6 _ 7) (8 _ 9) (10 888 11)) ((12 _ 13) (14 _ 15) (16 _ 17)))
5 : (((0 _ 1) (2 _ 3) (4 _ 5)) (((6 _ 7) (8 _ 9) (10 _ 11)) 888 _) ((12 _ 13) (14 _ 15) (16 _ 17)))
4 : (((0 _ 1) (2 _ 3) (4 _ 5)) ((6 _ 7) (8 _ 9) (10 _ 11)) ((12 888 13) (14 _ 15) (16 _ 17)))
4 : (((0 _ 1) (2 _ 3) (4 _ 5)) ((6 _ 7) (8 _ 9) (10 _ 11)) ((12 13 888) (14 _ 15) (16 _ 17)))
4 : (((0 _ 1) (2 _ 3) (4 _ 5)) ((6 _ 7) (8 _ 9) (10 _ 11)) ((12 _ 13) (14 888 15) (16 _ 17)))
4 : (((0 _ 1) (2 _ 3) (4 _ 5)) ((6 _ 7) (8 _ 9) (10 _ 11)) ((12 _ 13) (14 15 888) (16 _ 17)))
4 : (((0 _ 1) (2 _ 3) (4 _ 5)) ((6 _ 7) (8 _ 9) (10 _ 11)) ((12 _ 13) (14 _ 15) (16 888 17)))
5 : ((((0 _ 1) (2 _ 3) (4 _ 5)) ((6 _ 7) (8 _ 9) (10 _ 11)) ((12 _ 13) (14 _ 15) (16 _ 17))) 888 _)

Most times, only 2~3 new internal nodes are created among all 12~13 internal nodes, that’s not bad.

So the drawback is this tree does not balance automatically. During most of the lifetime of a list, it’s not balanced, so it would be not quite efficient. However I don’t a simple solution for that at current.

Concat and slice

Two other major operations on lists are slice and concat. If you don’t case about balancing, concat is fast, but increasing depth every time:

(1 (2 _ 3) 4) ; a
(5 (6 _ 7) 8) ; b
(9 (10 _ 11) 12) ; c

((1 (2 _ 3) 4) _ (5 (6 _ 7) 8)) ; a b

(((1 (2 _ 3) 4) _ (5 (6 _ 7) 8)) _ (9 (10 _ 11) 12)) ; a b c

it might trigger rebalancing at times though…

And for slicing, it still tries to reuse some branches but turned out not quite good… but it is supposed to be a little better if more items are sliced…

  # part of nim code 
  for i in 0..<8:
    for j in i..<9:
      echo fmt"{i}-{j} ", d.slice(i, j).formatInline
; original structure

((1 2 3) (4 _ 5) (6 7 8))
0-0 (_ _ _)
0-1 1
0-2 (1 _ 2)
0-3 (1 2 3)
0-4 ((1 2 3) _ 4)
0-5 ((1 2 3) _ (4 _ 5))
0-6 (((1 2 3) _ (4 _ 5)) _ 6)
0-7 (((1 2 3) _ (4 _ 5)) _ (6 _ 7))
0-8 ((1 2 3) (4 _ 5) (6 7 8))
1-1 (_ _ _)
1-2 2
1-3 (2 _ 3)
1-4 ((2 _ 3) _ 4)
1-5 ((2 _ 3) _ (4 _ 5))
1-6 (((2 _ 3) _ (4 _ 5)) _ 6)
1-7 (((2 _ 3) _ (4 _ 5)) _ (6 _ 7))
1-8 (((2 _ 3) _ (4 _ 5)) _ (6 7 8))
2-2 (_ _ _)
2-3 3
2-4 (3 _ 4)
2-5 (3 _ (4 _ 5))
2-6 ((3 _ (4 _ 5)) _ 6)
2-7 ((3 _ (4 _ 5)) _ (6 _ 7))
2-8 ((3 _ (4 _ 5)) _ (6 7 8))
3-3 (_ _ _)
3-4 4
3-5 (4 _ 5)
3-6 ((4 _ 5) _ 6)
3-7 ((4 _ 5) _ (6 _ 7))
3-8 ((4 _ 5) _ (6 7 8))
4-4 (_ _ _)
4-5 5
4-6 (5 _ 6)
4-7 (5 _ (6 _ 7))
4-8 (5 _ (6 7 8))
5-5 (_ _ _)
5-6 6
5-7 (6 _ 7)
5-8 (6 7 8)
6-6 (_ _ _)
6-7 7
6-8 (7 _ 8)
7-7 (_ _ _)
7-8 8

Others

I also did some experiments on persistent map with ternary tree. It’s using a trie but the hashes(which are integers) are used for indexing, and lookup steps are similar. Maps could be a lot deeper since hashes are really sparse. I could perform forced balancing as well but it’s always a lot deeper than lists. No much experience yet…

I’ve using this library in my other toy projects for a while. Some optimizations is added and it should not be very slow now. Anyway it’s quite interesting journey to play with a similar tree structure like the persistent data structure I used for years in Clojure.

Advices on optimizations are welcome. (not going to switching from ternary
branches in this project though…)

3 Likes

This idea of B-trees does generalize to other branching factors, of course, and it even allows for guaranteed logarithmic time of slice and concat operations, with no need to do O(n) work to rebalance a tree, ever. I was looking at how to do efficient concat, guarnateed, after trying to find an RRB-tree implementation without bugs that guarantees that run time, but haven’t found one yet (I begin to suspect whether that is possible – the research papers are not very explicit on some parts of the implementation). Here is an article on how to do the slice and concat operations in O(log n) time on B-trees that represent vectors: https://github.com/jafingerhut/core.btree-vector/blob/master/doc/intro.md

1 Like

The data structures used by PureScript offer an elegant way of using ternary tree(partially) and got a smart solution for rebalancing…