Performance issues with cross-correlation

joinr · August 13, 2020, 3:36pm

There are some low hanging fruit:

split-with has to do 2 passes through the data, applying its predicate to each entry to produce the sequence. That’s two O(N) passes that could be done in one.

If you have indexed structures, nth will typically be faster than first or second since there’s some coercion to seq that happens. A little overhead.

Since interpolate-points is the focal point of work (as indicated by profiling), you get some gains by changing it to use nth. On mine (for two 1000 element vectors) that drops runtime from 60s to 42s (meh). The better algorithmic gains come from eliminating the duplicate scans using a helper function:

(defn split-adjacent [f xs]
  (reduce (fn [acc x]
            (if (f x)
              x
              (reduced [acc x]))) nil xs))

(defn interpolate-point
  [x2 v1]
  (let [[preceeding-point following-point]
            (split-adjacent #(<= (nth % 0) x2) v1)]
    (if (or (nil? preceeding-point)
            (nil? following-point))
      nil
      (/ (+ (nth preceeding-point 1)
            (nth following-point 1))
         2.0))))

Which gets runtime down to 5s or so. Still very slow, but showing improvement. May need to add some corner case coverage to split-adjacent but it appears to produce the same results as intended so far. I get the sense that interpolate-points is begging to be more efficient algorithimically.