Hello everyone,
After being inspired by @ikitommi’s talk about performance, I decided to pick up the glove and see if some core functions could be rewritten for better performance.
The TL;DR is yes: significant speedups, with some caveats.
The meat of things:
Plenty of core functions walk sequences, but if these sequences are known in advance (at the call site or def’ed), the walk can be expanded into other function calls, giving better performance.
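To illustrate the idea, here is a minimal sketch of a macro that expands a literal key path into nested get calls at compile time. It is not the code from the repo: the name inline-get-in is hypothetical, and it assumes the path is written as a literal vector at the call site.

```clojure
;; Hypothetical sketch, not the clj-fast implementation.
;; Assumes ks is a literal vector visible at macro-expansion time.
(defmacro inline-get-in
  [m ks]
  (when-not (vector? ks)
    (throw (IllegalArgumentException. "ks must be a literal vector")))
  ;; Wrap the map expression in one (get ... k) form per key,
  ;; so no sequence is walked at runtime.
  (reduce (fn [acc k] `(get ~acc ~k)) m ks))

(def m {:a {:b {:c 1}}})

(inline-get-in m [:a :b :c])
;; => 1

(macroexpand-1 '(inline-get-in m [:a :b :c]))
;; => (clojure.core/get (clojure.core/get (clojure.core/get m :a) :b) :c)
```

Because the path must be visible to the macro, a path that is only known at runtime still needs the regular get-in.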
The functions I tackled and their respective speedups:
- get-in: 4-8x speedup (depending on depth)
- assoc-in: ~2x speedup
- select-keys: ~8x speedup
- update-in: some speedup, but still needs refinement
There are some other implementations but the speedups gained by them are less significant.
The downside is that they are implemented as macros, so composition takes a hit. There are also slight deviations from the core functions’ behavior; for example, select-keys will add nils if the keys don’t exist.
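The select-keys deviation is easiest to see with a small example. The macro below is a hypothetical sketch (the name inline-select-keys and its shape are assumptions, not the repo's code) that builds the result map directly from a literal key vector, which is why missing keys show up with nil values.

```clojure
;; Hypothetical sketch, not the clj-fast implementation.
;; Assumes ks is a literal vector visible at macro-expansion time.
(defmacro inline-select-keys
  [m ks]
  (when-not (vector? ks)
    (throw (IllegalArgumentException. "ks must be a literal vector")))
  (let [msym (gensym "m")]
    ;; Bind the map once, then emit a map literal with one lookup per key.
    `(let [~msym ~m]
       ~(into {} (map (fn [k] [k `(get ~msym ~k)])) ks))))

(select-keys {:a 1} [:a :b])
;; => {:a 1}

(inline-select-keys {:a 1} [:a :b])
;; => {:a 1, :b nil}
```

Code that checks for key presence with contains? would therefore behave differently with this kind of expansion than with the core function.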
The repo also serves as an educational resource, and contains benchmarks of different ways to get keys from records and maps, shedding some light on their performance characteristics, and a few profiling scenarios to characterize these tests.
I also attempted to make these benchmarks reproducible by anyone who clones the repo.
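For readers unfamiliar with the options, here are some of the ways a key can be looked up on a record or a map. This is only an illustration with a hypothetical Point record; the exact set of variants measured in the repo may differ.

```clojure
;; Illustrative only; Point, p and pm are hypothetical names.
(defrecord Point [x y])

(def p (->Point 1 2))
(def pm {:x 1 :y 2})

(get p :x)                           ; generic lookup through clojure.core/get
(:x p)                               ; keyword invoked as a function
(.-x ^Point p)                       ; direct field access, records only
(.valAt ^clojure.lang.ILookup p :x)  ; calling the ILookup method directly

(get pm :x)                          ; the same generic lookup on a map
(pm :x)                              ; a map invoked as a function (records are not functions)
```

All of these return 1 here; they differ in how much dispatch happens before the value is reached.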
Looks like there are similar efforts tackling this issue from another direction as well: https://github.com/joinr/structural/
In the future:
- There’s more work to be done with merge and memoize
- Provide functions which invoke the underlying data structure’s methods wherever possible
- Dispatch based on type hints?
Check it out here: https://github.com/bsless/clj-fast
Results: https://github.com/bsless/clj-fast/blob/master/doc/results.md
Cheers,
Ben