Just wrote my first blog post about faster reduce in Clojure


#1

Teaser: You can speed up mapv by more than 9x.

Link: https://medium.com/@rauh/faster-clojure-reduce-57a104448ea4

Any comments are welcome. 🙂


#2

I think it’s also helpful to think about what you’re giving up with an iterator model vs either sequences or transducers with reducible collections. What are the tradeoffs? One of the biggest is that you have moved from thread-safe immutable objects to thread-unsafe mutable objects. To me, that’s giving up a lot, and not what I want as a default.
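
To make that tradeoff concrete, here is a minimal REPL sketch. The var names (`shared-it`, `drained`, and so on) are made up for illustration, and it only shows that an iterator carries mutable cursor state while a seq does not; it is not a benchmark.

```clojure
;; Illustrative sketch only; all var names here are made up for the example.
(def v [1 2 3])

;; A java.util.Iterator holds a mutable cursor: every .next call advances it,
;; so two threads sharing one iterator would race on that cursor.
(def ^java.util.Iterator shared-it (.iterator ^Iterable v))
(def drained (vec (iterator-seq shared-it)))  ; consumes the iterator
(def exhausted? (not (.hasNext shared-it)))   ; nothing left for anyone else

;; A seq is an immutable value: any number of readers can walk it again.
(def s (seq v))
(def walk-1 (vec s))
(def walk-2 (vec s))
```

After this runs, `drained` holds all three elements and `exhausted?` is true, while `walk-1` and `walk-2` are equal because walking the seq did not change it.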

A lot of what you’re doing here is the exact same thing transducers take advantage of (that’s why MultiIterator exists - it’s embedded in the transducer machinery), except they don’t expose an unsafe interface.
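
As a rough sketch of what that looks like from the user's side (the transducer `xf` and the sample data are made up for illustration): the iteration happens inside `into`, `transduce`, and `sequence`, and the caller never holds an iterator.

```clojure
;; Illustrative sketch; xf and the sample data are made up for the example.
(def xf (comp (map inc) (filter even?)))

;; Eager, reducible paths: no intermediate seqs, no iterator handed to the caller.
(def as-vector (into [] xf (range 10)))    ; [2 4 6 8 10]
(def total (transduce xf + 0 (range 10)))  ; 30

;; Multi-collection case: sequence walks several inputs in lockstep internally.
(def pairwise (vec (sequence (map +) [1 2 3] [10 20 30])))  ; [11 22 33]
```

The multi-collection call is the kind of case where lockstep iteration over several inputs is needed, and it stays an implementation detail rather than part of the API.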

As an aside, I found it hard to replicate any of the numbers in the post - everything seemed at least an order of magnitude faster on my 4-year-old laptop, making the differences pretty small.


#3

Thanks for the feedback, Alex!

I actually hadn’t thought about thread safety at all. I figured iterating over Clojure’s persistent data structures with an iterator is safe even if the collection is changed in another thread. Is that not the case? I see that reduce over a Java ArrayList will also just use an iterator internally. Could you elaborate on the thread safety issues with my iterator approach? If this isn’t thread safe (for Clojure collections) I would definitely throw this out the window.
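
Here is the small REPL experiment I have in mind, as a sketch only. The names are made up, and it is single-threaded, so it only shows the collection side of the question (an immutable vector versus a mutable ArrayList), not whether sharing one iterator between threads is safe.

```clojure
(import '(java.util ArrayList Iterator ConcurrentModificationException))

;; Persistent vector: swap! installs a new vector in the atom, while the
;; iterator keeps walking the old, unchanged value.
(def state (atom [1 2 3]))
(def ^Iterator pv-it (.iterator ^Iterable (deref state)))
(swap! state conj 4)
(def seen (vec (iterator-seq pv-it)))

;; ArrayList: a structural change after the iterator was created makes the
;; next call fail fast.
(def ^ArrayList al (ArrayList. [1 2 3]))
(def ^Iterator al-it (.iterator al))
(.add al 4)
(def outcome
  (try
    (.next al-it)
    :ok
    (catch ConcurrentModificationException _ :concurrent-modification)))
```

In this sketch `seen` contains only the original three elements even though the atom now holds four, and `outcome` is `:concurrent-modification` for the ArrayList.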

What do you mean by exposing an unsafe interface? The loop-it macro doesn’t actually expose the iterator. I feel like I’m doing something quite similar to what doseq does: provide the values in a collection in the fastest way possible.

I’m not sure what could explain the difference in the numbers. I’m using Criterium to benchmark. You can see the JVM parameters in the project.clj (I’m running an Oracle JVM with Java 1.8.0).