Introduction
I was trying to take the derivative of some unevenly sampled data while (slowly) learning tech.ml.dataset (TMD) and the related libraries for data science. So I explored a bit, wondering where the performance hogs are and whether there are good ways of using the libraries for better performance.
Basically, the goal of the function is to take two columns, each representing a variable in the dataset (like data over time), take the element-wise diff of each of those columns, and divide the two results.
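In plain Clojure (no libraries), the computation looks roughly like this; `diff` here is a hypothetical stand-in for a pairwise-difference function, not the library's `diff1d`:

```clojure
;; pairwise differences of each column, then element-wise division
(defn diff [xs]
  (mapv - (rest xs) xs))

(defn dx-dt [ts xs]
  (mapv / (diff xs) (diff ts)))

(dx-dt [1 2 4] [10 30 30]) ; => [20 0]
```

The library versions below do the same thing, but on typed dataset columns.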
First attempts
I began with what felt like it should be an 'ok' implementation (keep in mind I'm a n00b with these libraries):
(ns dev.differentiation
  (:require
   [tech.v3.dataset :as ds]
   [tech.v3.dataset.reductions :as ds-reduce]
   [tech.v3.datatype :as dtype]
   [tech.v3.datatype.gradient :as dt-grad]
   [tech.v3.datatype.functional :as dfn]
   [tablecloth.api :as api]))

(set! *warn-on-reflection* true)
(set! *unchecked-math* :warn-on-boxed)
(def example-data
  (api/dataset {:time [1 2 3 6 8 11 203 211]
                :x    (take 8 (repeatedly #(rand 100)))}))
(defn d-dt1 [ds time-col data-col result-col]
  (let [grad-t      (cons 1 (dt-grad/diff1d (time-col ds))) ; diff1d returns n-1 elements,
        grad-x      (cons 0 (dt-grad/diff1d (data-col ds))) ; so prepend one to keep the length
        compute-col (map #(/ ^Double %1 ^Double %2) grad-x grad-t)]
    (api/add-column ds result-col compute-col)))
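My guess, and it is only a guess at this point, is that `cons` plus `map` here build a lazy seq of boxed `Double`s, so every division also pays for object allocation. For contrast, the same element-wise divide over primitive double arrays in plain Clojure:

```clojure
;; element-wise division over primitive double arrays, no boxing
(defn divide-doubles ^doubles [^doubles dx ^doubles dt]
  (amap dx i ret
        (/ (aget dx i) (aget dt i))))

(vec (divide-doubles (double-array [0.0 20.0]) (double-array [1.0 2.0])))
;; => [0.0 10.0]
```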
Some demo data
(defn random-example-data [n]
  (api/dataset {:time (take n (iterate #(+ % (rand)) 1))
                :x    (take n (repeatedly #(rand 100)))}))

(def medium-example-data
  (random-example-data 10000))

(def big-example-data
  (random-example-data 10000000))
And a benchmark
(require '[criterium.core :as c])
(c/quick-bench (-> example-data
                   (d-dt1 :time :x :dt-dx))) ; 110 us

(c/quick-bench (-> medium-example-data
                   (d-dt1 :time :x :dt-dx))) ; 4 ms

(c/quick-bench (-> big-example-data
                   (d-dt1 :time :x :dt-dx))) ; 5 s
Okay, so 10e6 data points x 2 columns x (2 ops for the diff + 1 op for the division) → 60e6 operations minimum, so a rough estimate of about 10e6 operations per second. Slow. I would like a factor of 1000 faster, please. Hardware should be able to do more than one math op per cycle per thread, which is on the order of 1e9 operations per second, times however many cores you have.
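Spelling that estimate out:

```clojure
;; back-of-envelope throughput for the big dataset
(let [rows 1e7
      ops  (* rows 2 3)] ; 2 columns x (2 diff ops + 1 division)
  (/ ops 5.0))           ; 5 s measured => 1.2e7 ops/s
```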
Also, I'm not sure whether the big example data is realized in memory before the benchmark runs, so there might be some measurement error in there.
This clearly illustrates that I'm using the library inefficiently. So, on to the second try: I read some more API docs.
(defn d-dt2 [ds time-col data-col result-col]
  (let [grad-t          (cons 1 (dt-grad/diff1d (time-col ds)))
        grad-x          (cons 0 (dt-grad/diff1d (data-col ds)))
        ds-intermediate (into ds [[:dt grad-t] [:dx grad-x]])
        ds-result       (assoc ds
                               result-col
                               (dfn// (ds-intermediate :dx) (ds-intermediate :dt)))]
    ds-result))
I found the docs for tech.v3.datatype.functional, and saw that you can call its functions directly on columns. No performance increase yet, though.

For the third attempt, I got a factor of 3 improvement in speed on the big example data; not bad, I was thinking at the time!
(defn d-dt3 [ds time-col data-col result-col]
  (let [ds (assoc ds
                  :dx (dtype/->array :float64 (cons 0 (dt-grad/diff1d (ds data-col))))
                  :dt (dtype/->array :float64 (cons 1.0 (dt-grad/diff1d (ds time-col)))))]
    (assoc (dissoc ds :dx :dt)
           result-col (dfn// (ds :dx) (ds :dt)))))
(c/quick-bench (-> example-data
                   (d-dt3 :time :x :dt-dx))) ; 500 us

(c/quick-bench (-> big-example-data
                   (d-dt3 :time :x :dt-dx))) ; 2 s!
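My reading of why this helps (an assumption on my part, not something I've profiled): `dtype/->array` realizes the lazy `(cons …)` seq into a primitive array once, so the division can run over unboxed doubles instead of re-walking a boxed seq. The realization step itself is just:

```clojure
;; realize a lazy, boxed seq into a primitive double[] once, up front
(def lazy-col (cons 1.0 (map double [3 6 8])))
(def arr (double-array lazy-col))
(vec arr) ; => [1.0 3.0 6.0 8.0]
```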
Just one more minute…
While writing this post, I thought, "just one more experiment…"
(defn d-dt-values [time x]
  (let [dt (dt-grad/diff1d time)
        dx (dt-grad/diff1d x)]
    ;; dx/dt, matching d-dt1..3; note that no padding element is
    ;; prepended, so the result is one element shorter than the inputs
    (dfn// dx dt)))

(defn d-dt4 [ds time-col data-col result-col]
  (assoc ds result-col (d-dt-values (ds time-col) (ds data-col))))
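One thing to keep in mind with this version: `diff1d` returns n-1 elements and nothing is prepended anymore, so the result column is one element shorter than the inputs. A plain-Clojure analog (with a hypothetical `diff` helper) shows the off-by-one:

```clojure
;; diff shrinks a column by one; dividing two diffs keeps that shorter length
(defn diff [xs] (mapv - (rest xs) xs))

(let [ts [1 2 4 8]
      xs [1.0 3.0 3.0 11.0]]
  [(count ts) (count (mapv / (diff xs) (diff ts)))])
;; => [4 3]
```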
And to my big surprise, this was way, way faster. I got really excited seeing the benchmarks.
(c/quick-bench (-> example-data
                   (d-dt4 :time :x :dt-dx))) ; 99 us, an improvement!

(c/quick-bench (-> medium-example-data
                   (d-dt4 :time :x :dt-dx))) ; 290 us, even better!!

(c/quick-bench (-> big-example-data
                   (d-dt4 :time :x :dt-dx))) ; 34 ms !!!!
From 6 seconds to 0.034 seconds: a factor of almost 200.
60e6 / 0.034 ≈ 1.7e9 operations per second. Hey, this is getting good! (Unless I missed something and only the first few values are lazily evaluated in the benchmark…)
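The throughput arithmetic for the fast version, checked:

```clojure
;; 60e6 operations in 34 ms
(/ 6e7 0.034) ; => roughly 1.76e9 ops/s
```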
Last thoughts…
And that’s it for my experimentation. I hope that anyone can improve on the performance of this code a bit,
and let me see what speedy data processing code looks like using these libraries!
As always, any input is greatly appreciated
All benchmarks together:
(comment
  (c/quick-bench (-> example-data
                     (d-dt1 :time :x :dt-dx))) ; 110 us
  (c/quick-bench (-> example-data
                     (d-dt2 :time :x :dt-dx))) ; 313 us
  (c/quick-bench (-> example-data
                     (d-dt3 :time :x :dt-dx))) ; 500 us
  (c/quick-bench (-> example-data
                     (d-dt4 :time :x :dt-dx))) ; 99 us

  (c/quick-bench (-> medium-example-data
                     (d-dt1 :time :x :dt-dx))) ; 4 ms
  (c/quick-bench (-> medium-example-data
                     (d-dt2 :time :x :dt-dx))) ; 4 ms
  (c/quick-bench (-> medium-example-data
                     (d-dt3 :time :x :dt-dx))) ; 4 ms
  (c/quick-bench (-> medium-example-data
                     (d-dt4 :time :x :dt-dx))) ; 290 us

  (c/quick-bench (-> big-example-data
                     (d-dt1 :time :x :dt-dx))) ; 5 s
  (c/quick-bench (-> big-example-data
                     (d-dt2 :time :x :dt-dx))) ; 6 s
  (c/quick-bench (-> big-example-data
                     (d-dt3 :time :x :dt-dx))) ; 2 s, some improvement
  (c/quick-bench (-> big-example-data
                     (d-dt4 :time :x :dt-dx)))) ; 34 ms !