Best concise/idiomatic way to `map` for side-effects?

Webdev_Tory · May 8, 2020, 5:16pm

I have a function that repeatedly processes URLs and adds things to a database. After the first step I have the collection of things that need processing; what’s the most elegant way to do this? (doseq [x xs] ...) has me binding each individual x, which feels unnecessary, but plain map requires me to defeat its laziness with (doall (map ...)). Either will ultimately do the job, but is there a “right way” that is also concise?

mdiin · May 8, 2020, 5:37pm

I usually reach for doseq for that use case, but I guess mapv, as the strict sibling of map, could be used as well. I just feel like using map is hiding the side-effecting nature of the computation. Maybe it is the other way around, that doseq makes the side-effecting nature explicit.
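To make the laziness point concrete, here is a minimal sketch comparing the three options mentioned so far. The `seen` atom and `process!` function are hypothetical stand-ins for the real database write, not anything from the original question.

```clojure
;; Hypothetical stand-in for the real side effect (e.g. a database insert).
(def seen (atom []))
(defn process! [x] (swap! seen conj x))

;; map is lazy: binding the result does not run process! at all.
(def pending (map process! [:a :b :c]))

;; doseq runs eagerly, returns nil, and signals "side effects" to the reader.
(doseq [x [1 2 3]] (process! x))

;; mapv also runs eagerly, but builds a vector of return values
;; that is simply thrown away when only the side effects matter.
(mapv process! [4 5 6])
```

As long as `pending` is never consumed, `:a`, `:b` and `:c` never reach `seen`, which is the trap the original question describes.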

Webdev_Tory · May 8, 2020, 5:40pm

Yeah, it’s a problem: with map I expect the return value to matter, while with doseq I expect specific treatment of the individuals. In truth, all I want is the side effects; I don’t care about either of those implications.

didibus · May 8, 2020, 6:35pm

You want run! I believe.
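A short sketch of what that looks like, assuming a hypothetical `save-url!` function in place of the real processing step. `run!` (in clojure.core since 1.7) takes a function and a collection, calls the function on each item for its side effects, and returns nil.

```clojure
;; Hypothetical stand-in for "process a URL and write to the database".
(def saved (atom []))
(defn save-url! [url] (swap! saved conj url))

;; Eager, no per-item binding, no collected results.
(run! save-url! ["https://example.com/a" "https://example.com/b"])
```

Because it returns nil, there is no lazy sequence to force and no result for a reader to wonder about.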

mdiin · May 8, 2020, 7:09pm

Very nice! I love how even after so many years of Clojure I can still discover useful stuff in Clojure core.

Webdev_Tory · May 8, 2020, 9:22pm

There it is! Beautiful! https://clojuredocs.org/clojure.core/run! This is why I ask these questions.

Webdev_Tory · May 9, 2020, 9:53pm

Extension of the question: is there a way of using a threaded run!, a la pmap?

seancorfield · May 9, 2020, 10:20pm

pmap is almost always the wrong solution: it is a sledgehammer with no way to control how it works.

Using Java’s executors from Clojure is idiomatic and builds on battle-tested solutions. These give you lots of control, both in terms of how many threads and/or how much parallelism, and also which algorithms are used.
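As a rough illustration of the executor approach (one possible sketch, not necessarily how anyone in this thread does it), the helper below behaves like a threaded `run!` with an explicit thread count. The name `run-in-pool!` is made up for this example; the Java classes and methods are from java.util.concurrent.

```clojure
(import '(java.util.concurrent Executors ExecutorService Future))

(defn run-in-pool!
  "Calls (f x) for every x in coll on a fixed pool of n-threads threads,
  blocking until all calls have finished. Returns nil.
  Hypothetical helper, written for illustration only."
  [n-threads f coll]
  (let [^ExecutorService pool (Executors/newFixedThreadPool n-threads)]
    (try
      ;; Clojure functions implement Callable, so they can be handed
      ;; to invokeAll directly. invokeAll blocks until every task is done.
      (let [tasks   (mapv (fn [x] (fn [] (f x))) coll)
            futures (.invokeAll pool tasks)]
        ;; .get rethrows (wrapped in ExecutionException) anything a task threw.
        (run! (fn [^Future fut] (.get fut)) futures))
      (finally
        (.shutdown pool)))))
```

Usage would look like `(run-in-pool! 4 save-url! urls)`, where the `4` is the knob that pmap does not expose.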

Webdev_Tory · May 11, 2020, 7:42pm

Thanks. I’ll look into the executors. In my limited usage (thread-safe file writes and data downloads), pmap never caused trouble and simply improved my speed by a factor of 4. I’m curious: in what scenarios have you seen such problems with it?

seancorfield · May 11, 2020, 8:21pm

Because you have no control over the level of concurrency it uses, you can easily overwhelm a service you are interacting with if it does “too much in parallel”. In some situations you can turn a well-behaved process into a CPU-bound one (without speeding it up much). And you can sometimes get “too little” benefit in the opposite case, when the work could tolerate far more parallel requests than pmap will create.

We’ve run into all three to varying degrees. In the first case, we took down our own search engine service by adding pmap to a process that ran search queries so it could generate and send HTML emails!

These days we always use executors because we can easily control the number of threads used, the algorithm used (e.g., work-stealing, introduced in Java 8), and/or the amount of parallelism. We can start off small and use system configuration to experiment and find the ideal settings for the most “bang for our buck” that still plays nice with others.
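A sketch of the "start small and tune via configuration" idea, assuming a made-up system property name (`myapp.parallelism`) and a hypothetical helper `run-work-stealing!`. `Executors/newWorkStealingPool` is the Java 8 factory for a work-stealing pool with a target parallelism level.

```clojure
(import '(java.util.concurrent Executors ExecutorService Future))

(defn configured-parallelism
  "Reads the parallelism level from the hypothetical system property
  myapp.parallelism, falling back to default when it is not set."
  [default]
  (Long/parseLong (System/getProperty "myapp.parallelism" (str default))))

(defn run-work-stealing!
  "Calls (f x) for every x in coll on a work-stealing pool whose
  parallelism comes from configuration. Blocks until done; returns nil."
  [f coll]
  (let [^ExecutorService pool
        (Executors/newWorkStealingPool (int (configured-parallelism 4)))]
    (try
      (doseq [^Future fut (.invokeAll pool (mapv (fn [x] (fn [] (f x))) coll))]
        (.get fut))
      (finally
        (.shutdown pool)))))
```

Starting the JVM with `-Dmyapp.parallelism=16` would then change the level of parallelism without touching the code.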

didibus · May 12, 2020, 6:40am

That’s true for side effects. I’ll still just put out there that pmap is intended for pure computation, and for that it’s still pretty decent.
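For contrast, a small sketch of the pure-computation case, using a hypothetical `sum-of-squares` function as the "expensive" work. Since the function has no side effects, the only observable difference between map and pmap here should be timing.

```clojure
;; Hypothetical pure, CPU-bound function.
(defn sum-of-squares [n]
  (reduce + (map #(* % %) (range n))))

;; pmap returns a (semi-)lazy seq, so realize it before using the results.
(def results (vec (pmap sum-of-squares [10000 20000 30000])))

;; Same values, same order, as the sequential version.
(= results (mapv sum-of-squares [10000 20000 30000]))
```

One thing to be aware of in one-off scripts: pmap runs on the same thread pool as `future`, so the JVM may linger for a while after the script finishes unless `(shutdown-agents)` is called at the end.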

Webdev_Tory · May 12, 2020, 4:51pm

Thanks for the elaboration. I think that must be true of “real” multi-threaded work, such as systems with long-running processes that repeatedly invoke multiple threads. I’ve not yet had to deal with such things; instead they tend to be one-off scripts and one-time IO stuff, so it’s not hard for me to stay pure and enjoy the easiness of pmap.
