Implications of Project Loom/JEP 425 on core.async?

xji · May 16, 2022, 1:42am

Hi folks, I’m not familiar with Clojure as I program in Elixir professionally (but am quite interested in the comparison between BEAM languages and Clojure, and eager to improve my proficiency in Clojure), therefore sorry if this question doesn’t make much sense or gets something fundamentally wrong:

JEP 425 (Project Loom, virtual threads) has been attracting a lot of attention recently. Does it have any implications for Clojure’s core.async? Would virtual threads be a natural fit for core.async’s go? (I mean, in Golang, go literally launches a goroutine/green thread.) From what I remember, in core.async, when a task/state machine is paused, it’s taken off the thread so that another task/state machine can run, and when execution resumes, it’s put back onto a thread. Also, if I remember correctly, the go tasks launched by core.async share a fixed-size thread pool. Then, wouldn’t it make sense to use a pool of virtual threads for it once Project Loom is done?

And if so, what would be the implications (e.g. performance or otherwise) for core.async? It would be interesting to see a comparison of core.async vs. the BEAM VM’s actor processes in handling massively concurrent workloads, though of course I know such things might not be directly comparable using a single number.

(P.S.: One thing I’d like to clarify about JEP 425: will it be a fundamental addition to the JVM itself, or is it more of an “official virtual thread API built on top of conventional JVM threads” kind of thing? AFAIK there were already some libraries providing lightweight threads on the JVM, e.g. Quasar, which from what I read seems to perform bytecode manipulation. I guess JEP 425 will be something different?)

didibus · May 16, 2022, 2:01am

No, it wouldn’t. The go machinery of core.async already implements a form of lightweight concurrency that multiplexes tasks onto real OS threads, so adding another layer of it would probably just slow things down.

What is more likely to help is adding another macro like thread, but one that spawns a Loom fiber (aka virtual thread) instead. You would probably simply stop using go: since Loom’s virtual threads are stackful, I’d say they’re just all-around better. The advantage of a stackful implementation is that you can take and put from higher-order functions, or from functions called inside the macro, so you don’t have to flatten everything or introduce more channels.
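To make the stackless limitation concrete, here is a minimal sketch (the helper name take-all-blocking is made up for illustration). Inside go, a parking take such as <! has to appear lexically in the block body, because the macro cannot rewrite code inside a nested fn. On a real thread (and, presumably, on a virtual one) the blocking <!! has no such restriction:

```clojure
(require '[clojure.core.async :as async :refer [<!! >!! chan thread]])

;; Hypothetical helper: takes one value from each channel, using the
;; blocking <!! inside a higher-order function (mapv).
(defn take-all-blocking [chans]
  (mapv <!! chans))

(def chans (vec (repeatedly 3 #(chan 1))))

;; Pre-fill each buffered channel with its index.
(doseq [[i c] (map-indexed vector chans)]
  (>!! c i))

;; `thread` runs the body on a real thread, so calling a function that
;; blocks on channels is fine. The go equivalent,
;; (go (mapv #(<! %) chans)), does not work, because the <! sits inside
;; a nested fn that the go macro cannot transform.
(def result (<!! (thread (take-all-blocking chans))))
```

With go, the usual workaround is to inline the takes into a loop in the block body or to introduce extra channels, which is the flattening mentioned above.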

I’m not sure if Loom would be faster than core.async’s go machinery. I feel it’s hard to beat the state machine approach, but fibers have more safepoints for yielding, so from a “use them for concurrency” standpoint I’d say you’ll probably get better use out of them. With go, it’s hard not to choke your go blocks and to coordinate their yield points so that they’re optimal. I think that’ll be easier with Loom: for example, IO will automatically yield, so you won’t accidentally block a thread. In practice I think it’ll be a lot easier to use them for high concurrency.

It is a fundamental addition to the JVM. They had to add continuations (a form of coroutine) to the JVM, they reworked most of the existing blocking APIs to introduce yield points, and I think they did more work to make stack handling work with virtual threads. (Tail-call elimination was among Project Loom’s original goals, but it isn’t part of JEP 425.)

orestis · May 16, 2022, 3:16pm

My understanding of Loom is that they effectively made all Blocking IO calls suspendable, so that you can write blocking IO as usual and not have to think about system resources.

In that universe, core.async becomes more a coordination mechanism (channels) and less about cooperation, so the go macro will become more or less irrelevant (in systems designed for Loom).

So you could probably design an actor-based system on top of core.async, with the caveat that CPU is still an issue. The BEAM has preemptive scheduling (based on reduction counting), so you can really write any kind of code without starving the system.

seancorfield · May 16, 2022, 6:17pm

As I posted on Slack, when a discussion started there, here’s a quick mock-up of how to use core.async blocking operations with a variant of go that uses virtual threads:

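(require '[clojure.core.async :as async :refer [<!! >!!]])
(import 'java.util.concurrent.ThreadFactory)

;; Needs a JDK with virtual threads (preview in JDK 19, final in JDK 21).
(defn vthread-factory [name]
  (-> (Thread/ofVirtual)
      (.name name 0)
      (.factory)))

(defonce ^:private ^ThreadFactory go-factory! (vthread-factory "go-pool-"))

(defmacro go! [& body]
  `(let [c# (async/chan 1) ; buffer of 1 so putting the result never blocks
         t# (.newThread go-factory!
                        (^:once fn* []
                         (try
                           (let [ret# (do ~@body)]
                             ;; nil can't be put on a channel: just close
                             (when-not (nil? ret#)
                               (>!! c# ret#)))
                           (finally
                             (async/close! c#)))))]
     (.start t#)
     c#))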

(defmacro go-loop! [binding & body]
  `(go! (loop ~binding ~@body)))

(comment

  (let [c (async/chan)]
    (go-loop! [ns (range 10)]
      (when (seq ns)
        (>!! c (first ns))
        (recur (rest ns))))
    (go-loop! []
      (tap> (<!! c))
      (recur)))

  )

It’s just playground code but it “works” for basic stuff. I’d love to see someone with more experience with core.async do some experimentation and benchmarking in this area – it may not even make sense to use virtual threads for this (although removing the complex source code manipulation that go currently does should remove some limitations and possibly bugs, compared to the current implementation).

didibus · May 16, 2022, 6:25pm

I think you’re better off either doing something similar to thread or simply replacing the existing thread executor with one that uses virtual threads:

github.com

clojure/core.async/blob/6ac8ed2a26c67bbc3cc43e4e650f97c984666b0c/src/main/clojure/clojure/core/async.clj#L472

      
        
(dispatch/run
 (^:once fn* []
  (let [~@(mapcat (fn [[l sym]] [sym `(^:once fn* [] ~(vary-meta l dissoc :tag))]) crossing-env)
        f# ~(ioc/state-machine `(do ~@body) 1 [crossing-env &env] ioc/async-custom-terminators)
        state# (-> (f#)
                   (ioc/aset-all! ioc/USER-START-IDX c#
                                  ioc/BINDINGS-IDX captured-bindings#))]
    (ioc/run-state-machine-wrapped state#))))
c#)))

(defonce ^:private ^Executor thread-macro-executor
  (Executors/newCachedThreadPool (conc/counted-thread-factory "async-thread-macro-%d" true)))

(defn thread-call
  "Executes f in another thread, returning immediately to the calling
  thread. Returns a channel which will receive the result of calling
  f when completed, then close."
  [f]
  (let [c (chan 1)]
    (let [binds (Var/getThreadBindingFrame)]
      (.execute thread-macro-executor

Personally, what I’d add to core.async is a vthread and vthread-call, similar to thread and thread-call, that use the thread-per-task executor (with a thread factory that creates virtual threads with a nice name and a counter). I feel it’s the only change you need to make to support Loom. Then you have full control over whether you want a go block, which will remain the most portable option and might be faster in some cases; a real thread, which is still probably ideal for compute-heavy tasks; or virtual threads for I/O or light compute tasks.
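As a rough sketch of what such a pair could look like, assuming JDK 21+ (where virtual threads are final): nothing below is part of core.async, and the names vthread-executor, vthread-call, vthread, and the "async-vthread-" prefix are made up for illustration.

```clojure
(require '[clojure.core.async :as async :refer [<!! >!! chan close!]])
(import '(java.util.concurrent Executors ExecutorService))

;; Hypothetical executor: starts one new virtual thread per task, named
;; async-vthread-0, async-vthread-1, ...
(defonce ^ExecutorService vthread-executor
  (Executors/newThreadPerTaskExecutor
   (-> (Thread/ofVirtual)
       (.name "async-vthread-" 0)
       (.factory))))

(defn vthread-call
  "Hypothetical: like thread-call, but runs f on a new virtual thread.
  Returns a channel that receives the result of f (if non-nil), then closes."
  [f]
  (let [c    (chan 1)
        ;; bound-fn* conveys the caller's dynamic bindings to the new thread
        task (bound-fn* (fn []
                          (try
                            (let [ret (f)]
                              (when-not (nil? ret)
                                (>!! c ret)))
                            (finally
                              (close! c)))))]
    (.execute vthread-executor ^Runnable task)
    c))

(defmacro vthread
  "Hypothetical: like thread, but the body runs on a virtual thread."
  [& body]
  `(vthread-call (^:once fn* [] ~@body)))
```

Usage would mirror thread: (<!! (vthread (+ 1 2))) takes the body’s result from the returned channel, and blocking operations such as <!! and >!! can be used freely inside the body.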

I would also probably change the thread pool of thread to be bounded by your number of cores, since you won’t need to use it for I/O anymore, though that would be a backwards-incompatible change. So maybe it would be nice to also add a compute and compute-call that use real threads on a pool bounded by the number of CPU cores. Then you would use vthread for everything except heavy computation, where you’d use compute. And you’d still use go if you want to be portable to ClojureScript or older JDKs, or are just curious to compare how go blocks perform against vthreads.
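A compute / compute-call pair along those lines might look like the sketch below. Again, these names are hypothetical and not part of core.async; the pool uses daemon platform threads so it doesn’t keep the JVM alive, which is a choice made for this sketch only.

```clojure
(require '[clojure.core.async :as async :refer [<!! >!! chan close!]])
(import '(java.util.concurrent Executors ExecutorService))

;; Hypothetical pool: one daemon platform thread per available core.
(defonce ^ExecutorService compute-executor
  (Executors/newFixedThreadPool
   (.availableProcessors (Runtime/getRuntime))
   (-> (Thread/ofPlatform)
       (.daemon true)
       (.name "async-compute-" 0)
       (.factory))))

(defn compute-call
  "Hypothetical: like thread-call, but runs f on the core-bounded pool."
  [f]
  (let [c    (chan 1)
        task (bound-fn* (fn []
                          (try
                            (let [ret (f)]
                              (when-not (nil? ret)
                                (>!! c ret)))
                            (finally
                              (close! c)))))]
    (.execute compute-executor ^Runnable task)
    c))

(defmacro compute
  "Hypothetical: like thread, but for CPU-heavy bodies."
  [& body]
  `(compute-call (^:once fn* [] ~@body)))

(def total (<!! (compute (reduce + (range 1000)))))
```

Because the pool is bounded, tasks submitted beyond the number of cores queue up rather than oversubscribing the CPU, so blocking I/O inside compute would tie up a core’s worth of capacity.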

seancorfield · May 16, 2022, 6:51pm

Well, that’s why it needs someone who uses and knows core.async well to do some actual experiments to see how it might play out for real, rather than us armchair critics just positing theoreticals

Part of the issue is that even the blocking ops currently use O/S threads for the callback that delivers the promise used to unblock them, which is an unnecessary use of threads of any kind. So this probably needs more than just a bit of monkey-patching and keyhole surgery.

Maybe core.async is completely the wrong model altogether for concurrency based on virtual threads?

didibus · May 16, 2022, 7:57pm

Very true.

I think this is something the JDK ecosystem as a whole will have to figure out best practices for. Going forward, is there any reason to have any real threads?

I know virtual threads currently don’t have forced preemption (though there are plans to add it eventually); they only yield on IO or synchronisation. So for heavy compute you might still benefit from a separate pool. But for IO or just simple data shuffling, it might be that you can rely entirely on vthreads.
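To illustrate the “yields on blocking calls” part, here is a small sketch assuming JDK 21+. The helper name sleep-all is made up; Thread/sleep stands in for any blocking call that parks a virtual thread instead of holding an OS thread.

```clojure
(import '(java.util.concurrent Executors ExecutorService Future))

(defn sleep-all
  "Hypothetical helper: runs n tasks on virtual threads, each blocking
  for ms milliseconds, and returns how many of them completed."
  [n ms]
  ;; ExecutorService is AutoCloseable on recent JDKs, so with-open
  ;; shuts the executor down once all tasks have finished.
  (with-open [^ExecutorService exec (Executors/newVirtualThreadPerTaskExecutor)]
    (let [tasks   (vec (repeat n (fn []
                                   (Thread/sleep (long ms)) ; parks the vthread
                                   :done)))
          futures (.invokeAll exec tasks)]
      (count (filter (fn [^Future f] (= :done (.get f))) futures)))))

;; Ten thousand concurrently blocked tasks, one virtual thread each.
(def completed (sleep-all 10000 100))
```

A CPU-bound loop in place of the sleep would not yield in the same way, which is why a separate pool for heavy compute may still be useful.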

In the case of the async pool in core.async, it is currently used for data shuffling and light compute inside go blocks. So would it be better to swap it for vthreads, or would that slow things down? Hard to say.

I don’t know about that. Obviously vthreads make a lot of concurrency easier; even futures and agents backed by vthreads can become a lot better now, since you can spawn a practically unlimited number of them, though you still have that compute/IO dichotomy.

But I feel that the CSP abstraction is still quite powerful and nice to work with for synchronization; that was the case even with real threads. If you look at Go, for example, they made channels their primary way to synchronize between goroutines, so there’s precedent here too.

That said, core.async specifically might have been constrained in its design by having to make do with a stackless cooperative coroutine implementation. Maybe the ergonomics of the API would have been different with stackful preemptive fibers instead, and now that vthreads are there, we might see that play out.

There’s also been competition to CSP more recently, I think especially around structured concurrency abstractions, so maybe those will become more popular.

Lastly, I wonder if promise abstractions could make a comeback, such as with future, and I’m curious whether there’s a way to better mix promise-like things with channel-like things.

Andrey_Antukh · October 5, 2022, 7:21am

After researching it a bit, personally I can say that virtual threads can simply replace the go macro implementation, because it is “no longer needed” (I mean, that work is already handled by the JVM runtime).
With go blocks backed by virtual threads, we can just perform blocking operations on channels instead of building the state machine, because the same things are already handled by the JVM.

Replacing macro-based state machine building with a runtime that is able to park virtual threads on blocking operations will allow puts and takes to be spread across auxiliary functions or inline callbacks (inside the go blocks), instead of needing additional go blocks (an inherent peculiarity of having this done in a macro).

The rest of the core.async / CSP abstractions are still valid and very useful, and apply to the same use cases. In other words, and in my opinion: virtual threads replace the implementation detail of go blocks.

I’m already working on an experiment in funcool/promesa to support core.async-analogous abstractions, but with CompletableFutures and virtual threads (that will make it usable on CLJS, with promesa abstractions, and get the full potential on the JVM with virtual threads). In the future I also expect to run an experiment replacing the core.async code with virtual-thread-backed go blocks in the websockets code of the penpot codebase.
