Performance issues with cross-correlation

By coincidence, there is a discussion around int casts for aget here.


Are the vectors of inputs assumed/forced to be sampled at the same frequency?

Where we have vector a and vector b, and we want to compute (cross-correlation a b 0.5), are we assuming that

  • for every pair of sequential points in a, [[x1 y1] [x2 y2]], the distance between x2 and x1 is some constant k (currently 1.0 in the sample data)?
  • (= (map first a) (map first b)), i.e. points are sampled at the same x coordinates in both datasets (as they are in the sample data)?
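For illustration (not code from the thread), a rough check of both assumptions could look something like this, assuming a and b are vectors of [x y] pairs like the sample data:

```clojure
;; Hypothetical helpers, just to make the two assumptions concrete.
(defn constant-step?
  "True if consecutive x coordinates in pts are spaced by the same constant k."
  [pts]
  (let [xs    (map first pts)
        steps (map - (rest xs) xs)]
    (or (empty? steps)
        (every? #(< (Math/abs (- % (first steps))) 1e-9) steps))))

(defn same-xs?
  "True if both datasets are sampled at the same x coordinates."
  [a b]
  (= (map first a) (map first b)))

(constant-step? [[0.0 5.0] [1.0 3.0] [2.0 4.0]])        ;=> true
(same-xs? [[0.0 5.0] [1.0 3.0]] [[0.0 9.0] [1.0 2.0]])  ;=> true
```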

If so, then when the shift step is not a multiple of k, the majority of the shifts are redundant: you get exactly the same answers over and over for the trimmed vectors, which I think get factored out during the correlation computation. Say you shift by 0.1: the computed stats for shifts of 0.1, 0.2, …, 0.9 appear to give exactly the same results from trim-vectors.

There are a bunch of additional optimizations around shifting efficiently and maintaining a running total/count as well, so you don't have to recompute everything from scratch (just update some running stats). I think some of this may apply to the array-based version too (I haven't looked through it thoroughly). If the first assumption turns out to be false (i.e. you have arbitrary step sizes between sampled coordinates), I think that can be addressed as well to enable some optimizations.
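To make the running-stats idea concrete, here is a rough sketch (hypothetical code, not the project's trim-vectors / cross-correlation), assuming the values live in double arrays on the same uniform grid and the shift advances in whole samples: the per-vector sums and the count can be carried forward in O(1) per shift, and only the cross term needs a fresh pass.

```clojure
;; Sketch only: xs and ys are double arrays on the same uniform grid;
;; m is the shift in whole samples, pairing xs[i] with ys[i+m].
(defn corr-at-shifts
  "Returns a vector of [shift pearson-r] for shifts 0..max-shift."
  [^doubles xs ^doubles ys max-shift]
  (let [n (alength xs)]
    (loop [m   0
           ;; running sums over the current overlap:
           ;; xs[0..n-m-1] and ys[m..n-1]
           sx  (areduce xs i acc 0.0 (+ acc (aget xs i)))
           sxx (areduce xs i acc 0.0 (+ acc (* (aget xs i) (aget xs i))))
           sy  (areduce ys i acc 0.0 (+ acc (aget ys i)))
           syy (areduce ys i acc 0.0 (+ acc (* (aget ys i) (aget ys i))))
           out []]
      (if (or (> m max-shift) (>= m (dec n)))
        out
        (let [cnt (- n m)
              ;; the cross term changes wholesale with the shift,
              ;; so it still needs an O(overlap) pass
              sxy (loop [i 0 acc 0.0]
                    (if (< i cnt)
                      (recur (inc i) (+ acc (* (aget xs i) (aget ys (+ i m)))))
                      acc))
              r   (/ (- (* cnt sxy) (* sx sy))
                     (* (Math/sqrt (- (* cnt sxx) (* sx sx)))
                        (Math/sqrt (- (* cnt syy) (* sy sy)))))
              ;; O(1) update for the next shift: xs drops its last
              ;; overlapping point, ys drops its first
              xl  (aget xs (dec cnt))
              yf  (aget ys m)]
          (recur (inc m)
                 (- sx xl) (- sxx (* xl xl))
                 (- sy yf) (- syy (* yf yf))
                 (conj out [m r])))))))
```

The same bookkeeping should carry over to an array-based version; whether it actually pays off is, as always, something to measure.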

Yeah, the step size between points is not necessarily constant, so my test arrays are not exactly representative. Sorry about the confusion :slight_smile:

It’s a bit of a twofold problem. First, there can be gaps in the data (certain sections are missing). Second, there is some artifact of the machine generating the data (an XRF scanner) that gives strange, inconsistent steps at times.

It’s sorta in the nature of these ongoing, unconstrained projects that new requirements emerge (like being able to delete sections in the middle of the scan). So while I’ve thought about normalizing the data to a constant step size, I generally shy away from the idea b/c it places a constraint on valid input. The input is ever-increasing [position, value] pairs, so for the time being I’m trying to accept all valid input.
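A hypothetical way to write that contract down (not code from the project): positions strictly increase, and the step between them is otherwise unconstrained.

```clojure
;; Hypothetical check of the input contract described above.
(defn valid-input?
  "Positions strictly increase; step size is unconstrained."
  [pts]
  (every? (fn [[[x1 _] [x2 _]]] (< x1 x2))
          (partition 2 1 pts)))

(valid-input? [[0.0 5.0] [0.4 3.1] [1.7 2.8] [1.9 2.8]]) ;=> true
```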

Further playing around really gives me the impression that the Java compiler and VM handle a lot of weird edge cases somehow. By aggressively inlining everything I managed to get rid of almost all casts; the emitted Java looked better to me and the flame graph looks much better:

[flame graph: 10-cpu-flamegraph-2020-09-11-18-24-47]

But the final run time is marginally worse (using criterium). As seen earlier, even though casting can take up a large chunk of the perf event time and look bad on the flame graph, for whatever reason it doesn’t equate to a massive change in speed.
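For reference, a minimal Criterium measurement looks something like the sketch below (assuming cross-correlation and test vectors a and b are in scope; the exact call used here isn’t shown in the thread):

```clojure
(require '[criterium.core :as crit])

;; quick-bench gives a fast estimate; bench runs longer for tighter
;; confidence intervals.
(crit/quick-bench (cross-correlation a b 0.5))
```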

In any case, this has all been very educational :slight_smile: - and I’m really glad I learned to use all these tools to inspect my program better. Thanks again for everyone’s help

FYI, inlining everything doesn’t always produce faster code. HotSpot is apparently really good at targeting small methods that it can provably inline. If you have a bunch of manually inlined code, those methods can get too big for the inlining budget, so they don’t get optimized (you may have unintentionally de-optimized your code by trying to optimize it…). So sometimes it’s actually “better” to have things in small function calls. I ran into this a couple of times myself (I had some specific examples in another repository). Using macros and definline all over the place is great in theory, but in practice results may vary (always measure before/after).
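A hypothetical illustration of the trade-off (not code from this thread):

```clojure
;; Manual inlining: the body is spliced into every call site, growing the
;; enclosing method's bytecode (and possibly blowing HotSpot's budget).
(definline scaled* [x w]
  `(* (double ~x) (double ~w)))

;; Small, type-hinted function: a tiny method HotSpot can usually decide
;; to inline on its own, keeping the callers small as well.
(defn scaled ^double [^double x ^double w]
  (* x w))
```

Criterium before/after on the real call sites is the only reliable way to tell which style wins in a given hot loop.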

