So it took me weeks trying to deal with its bugs and to optimize the performance for persistent list. I found it quite interesting that an array-based solution can be used to simulate persistent data structure in many occasions. And it was quite funny it made my script a lot faster since my own implementation for data-sharing tree is relatively slow.
The other part of the trick is to use
[1,2,3], with two indexing parameters it can represent
 or even
, meanwhile the memory is reused. This trick was learnt from Go slices, but later I noticed since it’s immutable data in Clojure and in calcit-js, the shared array can be used in more situations.
Let’s see how it will work in these common scenarios:
access by index
Accessing index of
x of array is simple enough, just in this case we need to add
x with an extra
startIndex. Simple math~
As mentioned above, in immutable data, we are safe to reuse the same piece of array, and meanwhile we mark a new index:
startIndex + 1. For example,
[1 2 3 4] can be represented with:
arrayData: [1,2,3,4] startIndex: 0 endIndex: 4
[2 3 4] may use same piece of array but with a modified start index:
arrayData: [1,2,3,4] startIndex: 1 endIndex: 4
In this case, array memory is shared, calling
rest is quite cheap. Although not cheap enough like a linked list, but it would support fast accessing.
So same idea to
rest, if we want to slice
[2 3] out of
[1 2 3 4], it’s just changing indexes:
arrayData: [1,2,3,4] startIndex: 1 endIndex: 3
And now you know how to implement
butlast here… And this model will always work even if you slice and keep slicing a list. The cost for each slicing step is just an object of 3 fields.
You may find it cheap for accessing, but how about calling
conj like we normally do in Clojure? Well, it depends. I noticed that in one case, the memory can be still be reused. Say we have a list
(def a0 [1 2]), and we want
(def a1 (conj a0 3)). At first
a0 has a structure like:
arrayData: [1,2] startIndex: 0 endIndex: 2
tricky that we can just put 3 in the same array, then we get
a1 with structure:
arrayData: [1,2,3] startIndex: 0, endIndex: 3
right? But the dirty part is we changed previous data of
a0. So, does it break our code? Nope, now we have
a0 with a structure:
arrayData: [1,2,3] startIndex: 0 endIndex: 2
even though the list has 3 items, being a list, it’s size is still two based on the indexes, and its values
2 are not influenced. So in this say, we can reuse quite some memory, it would even be cheaper than using a tree structure.
Well, we all know it’s a very special case. If our persistent list
[1 2] actually has a structure of:
arrayData: [1,2,4] startIndex: 0, endIndex: 2
then we can’t use this trick, because it would break previous value. Unless we have a very smart compiler and it happens to find out that previous value if no longer used. Well… it’s unrealistic for a dynamic language.
Judging by real-world usages, I suppose in quote some cases we create a list and we will keep filling it with data and at last share it to other functions. For example, in reducing, we don’t share the list of each internal state, we only give out the result to other functions. So, that trick may help in quite some cases.
When we can’t reuse previous array, personally, I prefer fallback mode that converting to tree list and bla…bla…bla…
The special trick for appending can not be used in prepending directly. However, in theory, it’s possible. Let’s see an extended structure of
[1 2 3 4]
arrayData: [1,2,3,4] negativeArrayData:  startIndex: 0, endIndex: 4
When we prepend it with
-1, it turns into:
arrayData: [1,2,3,4] negativeArrayData: [-1] startIndex: -1 endIndex: 4
And then we prepend it with
-2, it turns into:
arrayData: [1,2,3,4] negativeArrayData: [-1, -2] startIndex: -2 endIndex: 4
so… the solution is viable. Just we are not sure if this will really make the code faster, since there are 2 arrays to maintain now.
Also, it’s only available for several specific cases. We may still turn to tree list for general cases.
insertion and deletion
No simple trick for modifications. Tree list is required now.
After the data is turned into a tree, in my case of calcit-js, it’s not turning back to array implicitly. Creating things costs new memory.
There might be some possible tricks, if we modify an element in the middle of a list, maybe, we can reuse multiple slices to represent the final tree. But in this case, the tree can be really a chaos, it would highly likely to be unbalanced and thus makes accessing unpredictably slow. Meanwhile balancing the tree can be heavy operations as well. So chaotic…
I noticed V8 has some tricks for optimizing arrays, by choosing from multiple internal structures to make some of the arrays really fast. Maybe not bad idea for us in persistent list too.
I haven’t heard about such tricks in Clojure community yet. I glanced source code pf
PersistentVector and didn’t notice about this. I’m not sure. Maybe the trick can be helpful. In my own case, with an array representation of persistent list, in many cases, it makes arguments spreading really cheap on js side. So it did bring benefits to calcit-js.