In my analysis, the reasoning ended up being that `nth` gives you some small speedups since you’re not coercing to a seq all the time (you can in fact get even faster using direct method invocation via `(.nth ^clojure.lang.Indexed some-vector n)`, since `clojure.core/nth` still has some overhead that will drag you down in hot loops). In this case, since you’re traversing the input over and over and over, there are savings to be had just by making access less costly. This is about a 30% gain if I recall, and that gain compounds once we have a better algorithm.
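For illustration, a tiny sketch of the two access styles (the names here are placeholders, not from the original code):

```clojure
;; Going through clojure.core/nth: polymorphic dispatch on every call.
(defn nth-via-core [v i]
  (nth v i))

;; Type-hinted direct method invocation: calls Indexed.nth straight through,
;; skipping clojure.core/nth's dispatch overhead.
(defn nth-direct [^clojure.lang.Indexed v ^long i]
  (.nth v (int i)))

(nth-direct [10 20 30] 1) ;=> 20
```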
The real savings came from being smarter about what you’re computing (typical computer science stuff), which eventually led to the later algorithm. In your original implementation, in order to interpolate an input x, you leveraged the simpler function `interpolate-point`, which takes an x and a vector of sampled points. `interpolate-point` simply traverses the sampled points to find an adjacent pair of points (or none) to average for the interpolation. You did this with two separate O(N), early-terminating scans via `split-with`, i.e. `take-while` and `drop-while`. You do this for every input x in X. So I think the complexity of the original was O(N) + O(N+1) + O(N) [since you also invoked `last` on the preceding values, which is another O(N) scan]; still O(N) in general though.
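For concreteness, a rough reconstruction of that shape (my sketch, not your actual code; it assumes points are [x y] pairs sorted by x):

```clojure
(defn interpolate-point [x points]
  (let [[before after] (split-with (fn [[px _]] (< px x)) points) ; two O(N) scans
        [x0 y0]        (last before)                              ; a third O(N) scan
        [x1 y1]        (first after)]
    (when (and x0 x1)
      ;; weighted average of the two bracketing samples
      (+ y0 (* (- y1 y0) (/ (- x x0) (- x1 x0)))))))
```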
Note that invoking `clojure.core/last` will always yield an O(N) scan of the sequence (or seqable thing); so for cases like getting the last element of an indexed structure (like a vector), where you want O(1) lookup time, avoid `last` and use `peek` (for vectors) or a combination of `nth` and `count`, e.g. `(nth the-vector (dec (count the-vector)))`, to preserve efficiency.
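For example:

```clojure
(def v [1 2 3 4 5])

(last v)                 ;=> 5, O(N): walks the vector as a seq
(peek v)                 ;=> 5, O(1) for vectors
(nth v (dec (count v)))  ;=> 5, O(1) indexed lookup
```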
In relative terms, eliminating the need for the second scan and eliminating the invocation of `last` (another scan) should leave us doing around <= 1/3 of the work (I figured 27% in one concrete case, a vector of 0..8). Then if we get savings from using `nth`, say a 30% reduction, that takes us down to about 19% of the original runtime (0.27 × 0.7 ≈ 0.19). I think the type-hinted function supplied to `split-adjacent` gets us the final gains for the intermediate step.
When we shift towards the `lerp-points` implementation, we’re just building on the gains we made before. This time, instead of multiple scans for every point, we only traverse the input and the interpolation samples once each. So we similarly cut the algorithm down (still technically O(N)) to doing about 1/3rd of the work; even though we’re doing additional work like creating the linear ranges and more sophisticated sampling, it’s trivially affordable.
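A minimal single-pass sketch of that idea (assumed names and shapes, not the actual `lerp-points`; it presumes both the inputs and the sample points are sorted by x, so the two can be walked together without rescanning):

```clojure
(defn lerp [x [x0 y0] [x1 y1]]
  (+ y0 (* (- y1 y0) (/ (- x x0) (- x1 x0)))))

(defn lerp-points [xs points]
  (loop [xs    (seq xs)
         pairs (partition 2 1 points)  ; adjacent sample pairs, consumed once
         acc   (transient [])]
    (if-not xs
      (persistent! acc)
      (let [x (first xs)
            [[x0 _] [x1 _] :as pair] (first pairs)]
        (cond
          ;; current pair lies entirely before x: advance the sample cursor
          (and pair (< x1 x))
          (recur xs (next pairs) acc)
          ;; x falls within the current pair: interpolate, move to the next x
          (and pair (<= x0 x x1))
          (recur (next xs) pairs (conj! acc (lerp x (first pair) (second pair))))
          ;; x has no bracketing pair (out of range): skip it
          :else
          (recur (next xs) pairs acc))))))
```

Each step advances either the input cursor or the sample cursor, so both collections are traversed once in total.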
It does. I’m trying to recall; there was a specific reason I did that (the original version was a lazy seq). I recall doing it for profiling clarity: dumping everything into a vector (via `vec`, which uses a transient) ends up being pretty harmless to overall perf, but substantially aided profiling further down the chain in other functions. When you leave it up to the lazy sequence generation, it gets hard to separate the generation of interpolated points (which technically happens very fast when producing an unrealized lazy seq) from the processing of those points, e.g. in the correlation function, since the realization is deferred. You get a big call stack where the deferred sequence is being realized far from where it was defined (as expected), so it became hard to separate into easily profiled “blocks.” By producing a vector, I was able to get a direct handle on how much faster `interpolate-points` was in a profiler-friendly way (I was using VisualVM).
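Roughly the difference between these two (illustrative only; reusing the `interpolate-point` sketch from above):

```clojure
;; Lazy: the map is unrealized here, so the interpolation cost shows up
;; later in the profile, inside whatever first consumes the seq.
(defn interpolated-lazy [xs points]
  (map #(interpolate-point % points) xs))

;; Eager: vec forces realization on the spot (via a transient), so the
;; cost is attributed to this function in the profile.
(defn interpolated-eager [xs points]
  (vec (map #(interpolate-point % points) xs)))
```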
I am less worried about the map lookups; they are actually very quick (I think for small array maps, doing a linear scan over a small array is about as fast as a direct field lookup in a type/record/object, at least for keyword keys). I’m similarly unconcerned about constructing new array maps every iteration (a small array clone, plus allocation). You could certainly explore optimizing that further, though, by using mutable containers to hold the state (or even just a double array), which would avoid any lookup and map-creation cost. Notice that I *did* resort to manually unpacking the values, as opposed to naive destructuring, since this avoids intermediate calls to `clojure.core/get` and instead defers to the IFn implementation of key lookup for maps, which does not incur the type-dispatch overhead of the more general `get`, which is what the destructuring would emit (the structural library looks into these sorts of optimizations while preserving the syntax sugar).
I definitely think there are further optimizations to consider, the most drastic (and fruitful) of which would be injecting more primitive array stuff in. There are still intermediate steps to explore (