Hey folks, I just whipped up a little auto-transducifying thread macro last night. It basically searches for contiguous groups of transducers and comps them together into a function that sequences the values flowing in the thread through the transducers. First the code, then we’ll discuss:
```clojure
(def transducables
  '#{map cat mapcat filter remove take take-while take-nth drop drop-while
     replace partition-by partition-all keep keep-indexed map-indexed distinct
     interpose dedupe random-sample})

(defn transducable? [form]
  (when (sequential? form)
    (contains? transducables (first form))))

(defn compose-transducer-group [xfs]
  (->> xfs
       ;; each step is an evaluated [f & args] vector, e.g. [map inc];
       ;; (apply map [inc]) rebuilds the transducer (map inc).
       ;; `cat` is itself a transducer and takes no arguments, so a (cat)
       ;; step is used as-is instead of being called.
       (map #(if (= cat (first %))
               cat
               (apply (first %) (rest %))))
       (apply comp)))

(defn xfn [xf-group]
  (fn [args]
    (sequence
     (compose-transducer-group xf-group)
     args)))

(defmacro x>>
  "Just like ->> but first composes consecutive transducing fns into a function
  that sequences the threaded value through the transducers. Also calls nth
  for ints. So:
  (x>> [1 2 3]
       (map inc)
       (map (partial + 2)))
  Becomes:
  ((xfn [[map inc]
         [map (partial + 2)]])
   [1 2 3])"
  [x & threads]
  (let [forms (->> threads
                   (partition-by transducable?)
                   ;; runs of two or more transducers collapse into one xfn step
                   (mapv #(if-not (and (transducable? (first %))
                                       (second %))
                            %
                            (list (list `(xfn ~(mapv vec %))))))
                   (apply concat))]
    ;; (seq forms) so that an empty thread, e.g. (x>> 1), just returns x
    (loop [x x, forms (seq forms)]
      (if forms
        (let [form (first forms)
              threaded (cond (seq? form)
                             (with-meta `(~(first form) ~@(next form) ~x) (meta form))
                             (int? form)
                             (list `nth x form)
                             :else
                             (list form x))]
          (recur threaded (next forms)))
        x))))

(defmacro x>
  "Just like -> but first composes consecutive transducing fns into a function
  that sequences the threaded value through the transducers. Also calls nth
  for ints. So:
  (x> [1 2 3]
      (conj 4)
      (map inc)
      (map (partial + 2))
      2)
  Becomes roughly:
  (nth
   ((xfn [[map inc] [map (partial + 2)]])
    (conj [1 2 3]
          4))
   2)"
  [x & threads]
  (let [forms (->> threads
                   (partition-by transducable?)
                   ;; every run of transducers, even a single one, becomes an
                   ;; xfn step, so the threaded value is passed to it last
                   ;; instead of being spliced into the (map inc) form
                   (mapv #(if-not (transducable? (first %))
                            %
                            (list (list `(xfn ~(mapv vec %))))))
                   (apply concat))]
    (loop [x x, forms (seq forms)]
      (if forms
        (let [form (first forms)
              threaded (cond (seq? form)
                             (with-meta `(~(first form) ~x ~@(next form)) (meta form))
                             (int? form)
                             (list `nth x form)
                             :else
                             (list form x))]
          (recur threaded (next forms)))
        x))))
```
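To make the expansion a bit more concrete, here is a stripped-down sketch of what the composed step does at runtime for the `x>>` docstring example. Once evaluated, vectors like `[map inc]` hold function values, and applying the head to the rest rebuilds each transducer. The names `steps`, `xf`, `via-xfn` and `via-thread` are only for illustration; they are not part of the macros above.

```clojure
;; Hypothetical illustration names; this mirrors compose-transducer-group + xfn.
;; After evaluation, each step is a vector of values, e.g. [map inc].
(def steps [[map inc] [map (partial + 2)]])

;; (apply map [inc]) is the same as (map inc), i.e. a transducer.
(def xf (apply comp (map (fn [[f & args]] (apply f args)) steps)))

;; The composed transducer is run over the input by `sequence`.
(def via-xfn (sequence xf [1 2 3]))

;; The plain ->> version, which builds one lazy seq per step.
(def via-thread (->> [1 2 3] (map inc) (map (partial + 2))))
```

Both end up as `(4 5 6)`; the difference is that the transducer version makes a single pass instead of stacking one lazy seq per step.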
Neat, right? Pretty simple. Why would you want to do this? Well, for one thing:
```clojure
(->> (range 10000000)
     (map inc)
     (filter odd?)
     (mapcat #(do [% (dec %)]))
     (partition-by #(= 0 (mod % 5)))
     (map (partial apply +))
     ;; (mapv dec)
     (map (partial + 10))
     (map #(do {:temp-value %}))
     (map :temp-value)
     (filter even?)
     (apply +)
     time)
```
Returns:
"Elapsed time: 5768.05707 msecs"
5000054999994
Whereas:
```clojure
(x>> (range 10000000)
     (map inc)
     (filter odd?)
     (mapcat #(do [% (dec %)]))
     (partition-by #(= 0 (mod % 5)))
     (map (partial apply +))
     ;; (mapv dec)
     (map (partial + 10))
     (map #(do {:temp-value %}))
     (map :temp-value)
     (filter even?)
     (apply +)
     time)
```
Returns:
"Elapsed time: 2102.793256 msecs"
5000054999994
About 2.7 times as fast, with basically the same code.
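For comparison, here is the transducer half of that benchmark written out by hand: a sketch of roughly what `x>>` produces for the pipeline above (with `(mapv dec)` still commented out), namely one `comp` over the whole run of transducers, fed through `sequence`, then `(apply +)`. `via-thread` and `via-xf` are hypothetical helper names, and they take the input size as a parameter so the check can use a smaller range.

```clojure
;; Plain ->> version: every step builds its own lazy seq.
(defn via-thread [n]
  (->> (range n)
       (map inc)
       (filter odd?)
       (mapcat #(do [% (dec %)]))
       (partition-by #(= 0 (mod % 5)))
       (map (partial apply +))
       (map (partial + 10))
       (map #(do {:temp-value %}))
       (map :temp-value)
       (filter even?)
       (apply +)))

;; Hand-fused version: the contiguous transducers become one comp,
;; run in a single pass by sequence.
(defn via-xf [n]
  (->> (range n)
       (sequence (comp (map inc)
                       (filter odd?)
                       (mapcat #(do [% (dec %)]))
                       (partition-by #(= 0 (mod % 5)))
                       (map (partial apply +))
                       (map (partial + 10))
                       (map #(do {:temp-value %}))
                       (map :temp-value)
                       (filter even?)))
       (apply +)))
```

Wrapping each call in `time` with `10000000` lets you compare timings on your own machine; the exact numbers will vary.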
The more transducers you can get lined up contiguously, the fewer intermediate sequences get built in your thread. Let’s uncomment the `(mapv dec)` that is present in both of the threads above. Because `mapv` is not a transducer, the values get realized into a vector halfway through our thread, splitting the transducers into two groups. As a result, performance degrades slightly for `x>>`.
First `->>`:
"Elapsed time: 4590.188052 msecs"
44999977000016
Hmm, `->>` actually goes faster now, perhaps due to `mapv` removing some laziness.
But for `x>>`:
"Elapsed time: 2351.326542 msecs"
44999977000016
So we lost some speed to that intermediate vector, but we’re still running about twice as fast as the default thread macro. Point is, if you want to maximize performance, try to keep your transducers contiguous.
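Here is a small sketch of why `(mapv dec)` splits the chain: the macros group the thread’s forms with `partition-by transducable?`, so any non-transducer form ends the current run. The quoted forms are a trimmed-down version of the benchmark thread, and `transducable-heads` is a hypothetical subset of the `transducables` set, redefined here so the snippet stands alone.

```clojure
;; Hypothetical subset of the transducables set.
(def transducable-heads '#{map filter mapcat partition-by})

(defn transducable? [form]
  (when (sequential? form)
    (contains? transducable-heads (first form))))

;; A trimmed-down version of the benchmark thread, as unevaluated forms.
(def thread-forms
  '[(map inc) (filter odd?) (mapv dec) (map (partial + 10)) (filter even?) (apply +)])

;; Consecutive forms with the same transducable? answer end up together.
(def groups (partition-by transducable? thread-forms))
;; => (((map inc) (filter odd?))
;;     ((mapv dec))
;;     ((map (partial + 10)) (filter even?))
;;     ((apply +)))
```

Each run of two or more transducers becomes its own `xfn` step, with `mapv` realizing a vector in between them.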
One more perk with `x>` and `x>>`: a naked integer in a thread becomes an `nth` on the value threading through, so you can use them as replacements for `get-in` in most cases with heterogeneous nestings:
```clojure
(let [m {:a {:b [0 1 {:c :res}]}}]
  (x> m :a :b 2 :c))
```
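Roughly, `(x> m :a :b 2 :c)` expands to `(:c (nth (:b (:a m)) 2))`. The sketch below writes that out with plain `->` and compares it with `get-in`. It also shows where `nth` buys you something: `get-in` uses `get`, which returns `nil` for a list or lazy seq, while `nth` indexes into them. The names `s`, `via-nth` and friends are only for illustration.

```clojure
(def m {:a {:b [0 1 {:c :res}]}})

;; What the naked 2 turns into: an nth call on the threaded value.
(def via-nth (-> m :a :b (nth 2) :c))
(def via-get-in (get-in m [:a :b 2 :c]))

;; With a list in the middle, get (and so get-in) finds nothing, but nth does.
(def s {:a '(10 20 30)})
(def list-via-nth (-> s :a (nth 1)))
(def list-via-get-in (get-in s [:a 1]))
```

Here `via-nth` and `via-get-in` are both `:res`, while `list-via-nth` is `20` and `list-via-get-in` is `nil`.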
That’s just a personal preference, but some might not want the change in semantics. Also, `x>` is different in that if a transducer is passed in, it acts like thread-last. This too is a semantic divergence from `->`, but in my experience we rarely use functions that take a function as their first argument in thread-first threads, so there probably won’t be too much confusion, and this makes `x>` more useful.
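To see why thread-last is the useful behaviour here: plain `(-> [1 2 3] (map inc))` expands to `(map [1 2 3] inc)`, putting the collection where the function should go. The `x>` docstring example, written out by hand with core macros only, nests a `->>` step inside `->` so the composed transducer receives the value last. `result` is just an illustrative name.

```clojure
;; Roughly the x> docstring example, using only core macros.
(def result
  (-> [1 2 3]
      (conj 4)
      ;; inside ->, (->> (sequence xf)) becomes (->> v (sequence xf)),
      ;; i.e. (sequence xf v), so the threaded value goes last
      (->> (sequence (comp (map inc) (map (partial + 2)))))
      (nth 2)))
;; (conj [1 2 3] 4) => [1 2 3 4], the transducers give (4 5 6 7), nth 2 => 6
```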
Next steps: integrate Christophe Grand’s xforms transducers (maybe automatically, anaphorically, for the core fns like `reduce` and `into`?)
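As a rough idea of where that could go (a sketch using core Clojure only, not xforms, and not something the macros above currently do): a thread that ends in a reduction could hand the composed transducer straight to `transduce` or `into`, so no intermediate `sequence` is built at all. `xf`, `summed` and `collected` are illustrative names.

```clojure
;; A run of transducers, composed once.
(def xf (comp (map inc) (filter odd?)))

;; Instead of (->> (range 10) (map inc) (filter odd?) (reduce +)):
(def summed (transduce xf + (range 10)))

;; Instead of (->> (range 10) (map inc) (filter odd?) (into [])):
(def collected (into [] xf (range 10)))
```

Both produce the same values as their `->>` counterparts (`25` and `[1 3 5 7 9]`), but the reduction consumes the transducer directly.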
Anyway, I just whipped this up, so there may be downsides to this formalism I haven’t noticed, but that’s what this post is for: to debate the merits or lack thereof! If it’s not a bad idea, I’ll probably wrap it up in a lib for general usage. But you are hereby granted, by the power vested in me, the inalienable right to copy and paste any of the words or code in this post anywhere you want.
(edit: oh, and the meat of the threading code is taken from clojure.core anyway) Any other features or constraints y’all’d like to see in this code pattern?
Happy hacking!
(edit: After some discussion below, I released the library version of this idea here: johnmn3/injest)