I just wrote my first blog post, about a faster reduce in Clojure.

Teaser: you can speed up mapv by a factor of more than 9.

Link: https://medium.com/@rauh/faster-clojure-reduce-57a104448ea4

Any comments are welcome. :slight_smile:


I think it’s also helpful to think about what you’re giving up with an iterator model vs either sequences or transducers with reducible collections. What are the tradeoffs? One of the biggest is that you have moved from thread-safe immutable objects to thread-unsafe mutable objects. To me, that’s giving up a lot, and not what I want as a default.
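
As a rough illustration of that tradeoff, here is a minimal sketch using only `clojure.core` and `java.util.Iterator`. The vector `v` and the var names are made up for the example and are not taken from the blog post. It shows that the persistent vector never changes, while each iterator obtained from it carries its own mutable cursor.

```clojure
;; Hypothetical sample data, only for illustration.
(def v [1 2 3])

;; Persistent vectors implement java.lang.Iterable.
(def ^java.util.Iterator it (.iterator ^Iterable v))

(def first-val (.next it))   ; advances the iterator's internal cursor
(def second-val (.next it))  ; same call, different result: `it` is stateful

;; The vector itself is untouched, and a fresh iterator starts from the top.
(def fresh-first (.next (.iterator ^Iterable v)))
```

Here `v` can be handed to any number of threads as-is, whereas `it` would need external coordination if more than one thread called `.next` on it.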

A lot of what you’re doing here is the exact same thing transducers take advantage of (that’s why MultiIterator exists - it’s embedded in the transducer machinery), except they don’t expose an unsafe interface.
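
For comparison, a minimal sketch of the transducer route, again using only `clojure.core`. The sample vectors `xs` and `ys` are assumptions for illustration, and this is not a claim about how the blog post's macro works internally. In each form the stepping happens inside the core function, so no iterator is handed to user code.

```clojure
;; Hypothetical sample data, only for illustration.
(def xs [1 2 3 4 5])
(def ys [10 20 30 40 50])

;; mapv-style work expressed with a transducer.
(def incremented (into [] (map inc) xs))

;; Reduce with a transformation, without building an intermediate sequence.
(def total (transduce (map inc) + 0 xs))

;; Stepping several collections in lockstep; the iteration stays
;; inside `sequence` rather than in user code.
(def sums (sequence (map +) xs ys))
```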

As an aside, I found it hard to replicate any of the numbers in here: everything ran at least an order of magnitude faster on my 4-year-old laptop, which made the differences pretty small.


Thanks for the feedback Alex!

I actually hadn’t thought about thread safety at all. I figured that iterating over Clojure’s persistent data structures with an iterator is safe even if the collection is changed in another thread. Is that not the case? I see that reduce over a Java ArrayList will also just use an iterator internally. Could you elaborate on the thread safety issues with my iterator approach? If this isn’t thread safe (for Clojure collections), I would definitely throw it out the window.

What do you mean by exposing an unsafe interface? The loop-it macro doesn’t actually expose the iterator. I feel like I’m doing something quite similar to what doseq does: provide the values in a collection in the fastest way possible.

As for the benchmark numbers, I’m not sure what the difference could be. I’m using Criterium to benchmark. You can see the JVM parameters in the project.clj (I’m running an Oracle JVM with Java 1.8.0).

Has anyone looked further into the thread safety issues of this approach? Were they confirmed? Otherwise I find it a nice new macro, not just for its performance benefits: it’s also a nice way to write certain forms of loops.

Usually, when I have to work on large collections and performance starts to become an issue, I prefer to pipe stuff to tesser: I have an octa-core CPU, and I don’t see why I should leave 7 cores idle when parallel programming with Clojure is so easy to both understand and code.

Same reasoning, but I usually use Claypoole.

Didn’t know about it! I’ll take a closer look ASAP.
