A trick for cheaper persistent list in JavaScript

I’m trying to optimize my own toy language which compiles to JavaScript, calcit-js. And it’s using persistent data as well. For details there was a post.

So it took me weeks trying to deal with its bugs and to optimize the performance for persistent list. I found it quite interesting that an array-based solution can be used to simulate persistent data structure in many occasions. And it was quite funny it made my script a lot faster since my own implementation for data-sharing tree is relatively slow.

So here’s the trick, use two modes to represent a persistent list, first is an array, then the real tree list. The array is optimized by JavaScript itself, and can be used to simple cases like reading, and last, a tree list is used when it’s actually being modified. So when a persistent list is first created, it’s in an array, and only being converted to a tree list when it has to.

The other part of the trick is to use startIndex and endIndex to track how this array is used to represent a persistent list. Suppose a JavaScript array of [1,2,3], with two indexing parameters it can represent [1] [1,2] [1,2,3] [2,3] [2] [3] or even [], meanwhile the memory is reused. This trick was learnt from Go slices, but later I noticed since it’s immutable data in Clojure and in calcit-js, the shared array can be used in more situations.

Let’s see how it will work in these common scenarios:

access by index

Accessing index of x of array is simple enough, just in this case we need to add x with an extra startIndex. Simple math~

rest

As mentioned above, in immutable data, we are safe to reuse the same piece of array, and meanwhile we mark a new index: startIndex + 1. For example, [1 2 3 4] can be represented with:

arrayData: [1,2,3,4]
startIndex: 0
endIndex: 4

And [2 3 4] may use same piece of array but with a modified start index:

arrayData: [1,2,3,4]
startIndex: 1
endIndex: 4

In this case, array memory is shared, calling rest is quite cheap. Although not cheap enough like a linked list, but it would support fast accessing.

butlast, slice(subvec)

So same idea to rest, if we want to slice [2 3] out of [1 2 3 4], it’s just changing indexes:

arrayData: [1,2,3,4]
startIndex: 1
endIndex: 3

And now you know how to implement butlast here… And this model will always work even if you slice and keep slicing a list. The cost for each slicing step is just an object of 3 fields.

append(conj)

You may find it cheap for accessing, but how about calling conj like we normally do in Clojure? Well, it depends. I noticed that in one case, the memory can be still be reused. Say we have a list (def a0 [1 2]), and we want (def a1 (conj a0 3)). At first a0 has a structure like:

arrayData: [1,2]
startIndex: 0
endIndex: 2

tricky that we can just put 3 in the same array, then we get a1 with structure:

arrayData: [1,2,3]
startIndex: 0,
endIndex: 3

right? But the dirty part is we changed previous data of a0. So, does it break our code? Nope, now we have a0 with a structure:

arrayData: [1,2,3]
startIndex: 0
endIndex: 2

even though the list has 3 items, being a list, it’s size is still two based on the indexes, and its values 1 and 2 are not influenced. So in this say, we can reuse quite some memory, it would even be cheaper than using a tree structure.

Well, we all know it’s a very special case. If our persistent list [1 2] actually has a structure of:

arrayData: [1,2,4]
startIndex: 0,
endIndex: 2

then we can’t use this trick, because it would break previous value. Unless we have a very smart compiler and it happens to find out that previous value if no longer used. Well… it’s unrealistic for a dynamic language.

Judging by real-world usages, I suppose in quote some cases we create a list and we will keep filling it with data and at last share it to other functions. For example, in reducing, we don’t share the list of each internal state, we only give out the result to other functions. So, that trick may help in quite some cases.

When we can’t reuse previous array, personally, I prefer fallback mode that converting to tree list and bla…bla…bla…

prepend(cons)

The special trick for appending can not be used in prepending directly. However, in theory, it’s possible. Let’s see an extended structure of [1 2 3 4]

arrayData: [1,2,3,4]
negativeArrayData: []
startIndex: 0,
endIndex: 4

When we prepend it with -1, it turns into:

arrayData: [1,2,3,4]
negativeArrayData: [-1]
startIndex: -1
endIndex: 4

And then we prepend it with -2, it turns into:

arrayData: [1,2,3,4]
negativeArrayData: [-1, -2]
startIndex: -2
endIndex: 4

so… the solution is viable. Just we are not sure if this will really make the code faster, since there are 2 arrays to maintain now.

Also, it’s only available for several specific cases. We may still turn to tree list for general cases.

insertion and deletion

No simple trick for modifications. Tree list is required now.

After the data is turned into a tree, in my case of calcit-js, it’s not turning back to array implicitly. Creating things costs new memory.

There might be some possible tricks, if we modify an element in the middle of a list, maybe, we can reuse multiple slices to represent the final tree. But in this case, the tree can be really a chaos, it would highly likely to be unbalanced and thus makes accessing unpredictably slow. Meanwhile balancing the tree can be heavy operations as well. So chaotic…


I noticed V8 has some tricks for optimizing arrays, by choosing from multiple internal structures to make some of the arrays really fast. Maybe not bad idea for us in persistent list too.

I haven’t heard about such tricks in Clojure community yet. I glanced source code pf PersistentVector and didn’t notice about this. I’m not sure. Maybe the trick can be helpful. In my own case, with an array representation of persistent list, in many cases, it makes arguments spreading really cheap on js side. So it did bring benefits to calcit-js.

3 Likes