Cloroutine v1


#1

Hi. I have reached some kind of local optimum in my coroutine journey so it’s time to freeze the API and bundle the first production release of cloroutine. Compared to the previous alpha release, the API has been simplified and the implementation doesn’t rely on core.async anymore, for reasons detailed below.

Applications

I’ve identified 3 major categories of problems that may benefit from this programming style and written a howto for each one (see the guides section in the readme). I’m pretty sure the list is not exhaustive, but those are the ones matching common clojure patterns.

  • generators : the example relies on lazy sequences as it’s pretty straightforward to do so, but it could be easily extended to the fast-path reduce protocols to improve performance.
  • asynchronous processes : I chose a future-based example as this style is quite trendy these days, but the technique is readily transposable to many other concurrency primitives. I dogfooded it in the implementation of missionary's sequential processes (task-based) and made a POC implementation of core.async's go blocks (channel-based).
  • transducers : the idea of leveraging coroutines to define transducers is not mine, but I still found it interesting to compare my implementation to the original.

API

Coroutine-based programming involves two kinds of state :

  • accidental state : when the process must be suspended, most of the time the stack is not empty and needs to be moved to the heap in order to survive until the process is resumed, at which point it must be moved back to the stack. Keeping track of this state is the value proposition of cloroutine.
  • incidental state : coroutine-based programming is a form of imperative programming; when a function call is split into a suspend part and a resume part, it’s very likely that the suspend action will perform some effect and the resume action will somehow depend on this effect. The state associated with this effect is business-related.

core.async IOC machinery is painful to use because it stores both kinds of state in the same object, requiring a basic understanding of its internals. My first attempt to decouple these two kinds of state consisted of passing suspend and resume code to the cr macro, which would then rewrite breaking var calls into inlined suspend code followed by inlined resume code. Inlined code was supposed to close over incidental state. This strategy happened to work quite well but was a bit clumsy, mostly because writing stateful code in clojure tends to be ugly.

It turned out that relying on thread-local state was a more elegant solution, given that coroutines provide strong guarantees of running synchronously in the caller thread. So the final cr API simply takes a map of suspend vars to resume vars and introduces a break on suspend var calls, scheduling the resume var call for the next iteration. Breaking var functions are expected to handle incidental state via a thread-local context set up around the coroutine call.
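Concretely, the generators guide builds lazy-sequence generators on top of this API roughly as follows (a sketch adapted from the guide, so details may differ; `gen-seq`, `yield` and `no-op` are the guide's names, and the dynamic var plays the role of the thread-local context):

```clojure
(require '[cloroutine.core :refer [cr]])

;; Thread-local (dynamic) context holding the incidental state,
;; here the coroutine itself.
(def ^:dynamic *coroutine*)

(defn gen-seq [gen]
  (lazy-seq (binding [*coroutine* gen] (gen))))

;; Suspend var : calling it breaks the coroutine, which returns the
;; cons cell to its caller. The rest of the sequence is computed by
;; resuming the same coroutine.
(defn yield [x]
  (cons x (gen-seq *coroutine*)))

;; Resume var : its return value becomes the value of the (yield ...)
;; expression when the coroutine is resumed.
(defn no-op [])

(defmacro generator [& body]
  `(gen-seq (cr {yield no-op} ~@body nil)))

(generator
  (yield 1)
  (yield 2)
  (yield 3))
;; => (1 2 3)
```

Note that the `cr` call site contains no state management at all: the suspend/resume pair and the dynamic binding carry all the incidental state.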

Implementation

I decided to roll my own implementation for various reasons. From the beginning, I was reluctant about relying on core.async's IOC machinery because it is an implementation detail, not easily customizable as some helpers are namespace-private, and there’s no commitment to a stable API. More importantly, a significant number of defects have been known for a long time and still show no sign of being addressed, and the overall pace of maintenance activity is currently too low for my quality standards. On the technical side, the current design suffers from several flaws making it obvious to me that a major rewrite will eventually become unavoidable. Related issues include :

  • exception handling story is still incomplete, showing wrong behavior in some nontrivial cases (ASYNC-220).
  • letfn body rewriting is not supported although it is technically possible to implement (ASYNC-221).
  • locals clearing seems not to have been considered (ASYNC-219, ASYNC-223).

On the personal side, the spiritual pleasure of compiler design is a reward in itself, largely worth the effort.

The core.async legacy is still visible in the overall strategy, with heavy usage of an SSA intermediate representation. It happens to work very well for this particular problem, and most of the improvements were enabled by having this structure as plain data through the entire compiler pipeline, which performs the following passes :

  1. AST analysis
  2. SSA construction
  3. variable lifespan trace
  4. variable slot assignment
  5. code emission

The AST analysis is performed by clojure.tools.analyzer.jvm for JVM Clojure and cljs.analyzer.api for ClojureScript.

The SSA construction is performed quite similarly to core.async, except that the entire tree is walked. When an asynchronous form is encountered (e.g. a fn body), it is walked as well to keep track of closed-over locals, information necessary to perform locals clearing. Another difference is that exception handling is a first-class citizen, represented as edges in the block graph just like regular branching jumps.

The variable lifespan trace is a pass walking the block graph backwards from each variable read, marking the variable in each visited block until the block that created it is encountered. That way, we know for each block which variables need to stay stored.
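The backward walk can be sketched as follows (a simplification of the actual pass: `preds` maps each block to its predecessors, and whether the defining block itself is marked is elided here):

```clojure
;; Given the predecessor edges of the block graph, the block defining a
;; variable, and a block reading it, collect every block on a path from
;; the read back to (excluding) the definition: the variable must stay
;; stored in all of them.
(defn live-blocks [preds def-block read-block]
  (loop [todo #{read-block} live #{}]
    (if-let [b (first todo)]
      (if (or (contains? live b) (= b def-block))
        (recur (disj todo b) live)          ; already visited, or reached the def
        (recur (into (disj todo b) (preds b))
               (conj live b)))              ; mark and keep walking backwards
      live)))

;; On a linear chain 0 -> 1 -> 2 -> 3, a variable defined in block 0
;; and read in block 3 is live in blocks 1, 2 and 3.
(live-blocks {3 #{2} 2 #{1} 1 #{0} 0 #{}} 0 3)
;; => #{1 2 3}
```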

The variable slot assignment pass looks at variable lifespan overlaps and assigns a slot index to each variable such that overlapping variables get different slots, using a basic graph coloring strategy. That way, no more slots than necessary are allocated, improving the memory footprint.

The code emission generates clojure code, mapping SSA blocks to functions. A plain array is instantiated to store the variables and the currently running block. Each time a block transition happens, variables not needed anymore in the target block are set to nil in the array, effectively clearing locals. A loop ensures blocks are run until a suspended or terminated state is reached, and the whole thing is wrapped in a zero-arg function.
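A hand-written approximation of the emitted shape, for a coroutine computing the sum of two fetched values with a suspension between the two reads (every name here is hypothetical, and the driver loop is elided: each call runs exactly one block):

```clojure
(defn coroutine-sketch [fetch-a fetch-b]
  (let [state  (object-array 2)              ; [current block index, slot for a]
        blocks [(fn []                       ; block 0 : up to the suspension
                  (aset state 1 (fetch-a))   ; a must survive the suspension
                  (aset state 0 1)           ; next resume enters block 1
                  ::suspended)
                (fn []                       ; block 1 : after the resume
                  (let [a (aget state 1)]
                    (aset state 1 nil)       ; a is dead past here : clear its slot
                    (+ a (fetch-b))))]]      ; terminal block, returns the result
    (aset state 0 0)
    ;; the whole thing wrapped in a zero-arg function
    (fn [] ((blocks (aget state 0))))))

(def co (coroutine-sketch (constantly 1) (constantly 2)))
(co) ;; => ::suspended
(co) ;; => 3
```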

Final thoughts

This is my first non-toy compiler project, and giving birth to it was a very exciting experience. I hope you will find it useful, and of course I’m highly interested in feedback of any kind. Special thanks to Tim Baldridge for the initial design inspiration, I probably wouldn’t even have started without it. Thanks to Jean Niklas L’orange for his work on transducers and conduits, very inspiring as well.


#2

Very nice. Any plans to add a ClojureScript version of all three examples?


#3

Generators and conduits should work out of the box on clojurescript (minus the macro-requiring stuff). Async/await requires replacing j.u.c.CompletableFuture with e.g. js/Promise and adapting the API.
I’m not willing to include portability concerns in the guides because it’s not strictly necessary for didactic purposes. If you’re experiencing difficulties porting the guides to clojurescript, feel free to open an issue.


#4

It looks super interesting, congrats for your hard work.

I’m reading the example on transducers and I don’t understand everything. Does your tool allow asynchronous transducers (in the same way you can attach a transducer to a channel)?


#5

Conduits are really just an alternative syntax to define transducers. Transducers produced by conduits are fully compatible with other transducers and have the exact same limitations, including their fundamentally synchronous nature. You can attach them to channels, if you will. The trick is to emulate blocking on input, and rewrite the code in a purely synchronous way.

The benefit of this syntax is mainly about clarity, because you don’t have to explicitly manage state. You can achieve the exact same thing by hand, with a reducing function transformer closing over a stateful construct (that’s how clojure.core’s transducers are written), but this is very OOP-ish and not easy to get right.
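For comparison, here is the hand-written style in question: a reducing-function transformer closing over a volatile, in this case a `dedupe`-like transducer (essentially how clojure.core/dedupe itself is written):

```clojure
(defn dedupe-xf []
  (fn [rf]
    (let [prev (volatile! ::none)]        ; the explicit stateful construct
      (fn
        ([] (rf))
        ([result] (rf result))
        ([result input]
         (let [p @prev]
           (vreset! prev input)
           (if (= p input)
             result                        ; drop consecutive duplicates
             (rf result input))))))))

(into [] (dedupe-xf) [1 1 2 2 3 1])
;; => [1 2 3 1]
```

The conduit version expresses the same logic as straight-line code reading from its input, with the state kept implicitly across suspensions.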


#6

Wow, I am impressed! Thanks for sharing; hope to give it a go soon.


#7

Thank you! Both Cloroutine and Missionary look really interesting.


#8

Having read the readme for both Cloroutine and Missionary again, as well as your proof of concept of the go-block using Cloroutine primitives, and compared it to missionary’s implementation of sequential processes, it seems to me that for most use cases missionary is an alternative to core.async but with a better story for error handling. As I see it the only thing missing from missionary to let it replace core.async is the communicating part of CSP, am I right?

Did you have a specific use case for something like cloroutine/missionary that made you pursue it, or was it mainly an interesting technical challenge? :slight_smile:


#9

Missionary is what I use for all my IO effect coordination needs. It overcomes the following frustrations I experienced with core.async's CSP :

  • functional vs imperative : in missionary the underlying abstraction is the task, a pure value representing any asynchronous effect. Declaring a sp block is a pure operation (vs go blocks or async/await blocks, which are effectful).
  • error handling : core.async requires you to program defensively because channels are unaware of failure, so you are given nothing to recover from uncaught exceptions in go blocks or in transducers attached to channels. In missionary, every asynchronous effect can fail (just like every synchronous effect can fail on the underlying platform) and is cancellable.
  • the frontier between communication and processes is very blurry in core.async, IMO. A channel is supposed to be pure communication, but a channel with a custom buffering strategy and a transducer attached to it is pretty much a process that happens to have an input and an output.
  • channels as the universal communication primitive feel impractical to me. Sometimes what you need is a rendez-vous, sometimes a dataflow variable, sometimes a semaphore. You can implement any one with any other, but having several semantics at hand allows you to choose the primitive suited to the problem.
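On the JVM, those three semantics already exist as distinct blocking primitives, which makes the difference easy to see (missionary exposes its own non-blocking, task-based versions of these):

```clojure
(import '[java.util.concurrent SynchronousQueue Semaphore])

;; dataflow variable : single assignment, readers block until delivery
(def dv (promise))
(future (deliver dv 42))
(def dv-value @dv)                       ; blocks until delivered => 42

;; rendez-vous : producer and consumer must meet, no buffering at all
(def rdv (SynchronousQueue.))
(future (.put rdv :hello))
(def met (.take rdv))                    ; blocks until a producer shows up

;; semaphore : bounded concurrent access to a resource
(def sem (Semaphore. 2))
(.acquire sem)                           ; take a permit (may block)
(.release sem)                           ; give it back
```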

Cloroutine is not strictly necessary to achieve that. In the past I used to rely on monadic composition, and over time I found the IOC syntax to be more expressive, and that’s why I started digging into coroutines.


#10

Ok, that makes a lot of sense. I see the value in both approaches though, as having a single flexible primitive also feels nice. I will have to play with missionary in some of my projects to see more clearly how it compares to/replaces core.async.

What I really like about missionary is the clear approach to handling errors, which is a lot cleaner than some of the libraries adding error handling on top of core.async.


#11

Here is an example of how a promise-based async/await can be done in clojurescript.


#12

Very cool @leonoel, thank you! I’d love to hear your thoughts on this thread if you have any to share.

Also wanted to make sure you’ve seen Andy Wingo’s series of fantastic related posts, starting with https://wingolog.org/archives/2017/06/27/growing-fibers

Cheers!


#13

I’ve read Andy Wingo’s notes on coroutine implementation in scheme, it’s always interesting to learn how things can be done in idealistic environments. Unfortunately, his strategy is applicable neither to the JVM nor to JS because there’s nothing similar to call/cc. That’s why cloroutine provides stackless coroutines while guile (and others, e.g. lua) provide stackful coroutines. Stackless coroutines are more limited because you can’t suspend a computation from within a nested stack frame; core.async's go blocks have the same limitation, and so do modern javascript constructs. It’s possible to work at the bytecode level to circumvent the platform limitations, but this requires major compiler modifications and the solution can’t be macro-based anymore. Therefore, stackless coroutines seem to be a good compromise for clojure.

His post on communication primitives is very interesting as well. I’m not familiar with concurrent ML but as I understand it, it’s a FP-flavored version of CSP. Instead of imperatively performing IO on channels, channel events are wrapped in values and composed with functional operators. Both models are equally powerful, you can implement CML on top of core.async.

I’ve had a look at Nathaniel J Smith’s work as well. I think he did a good job at identifying the accidental complexity carried by asynchronous programming, and I find the comparison with goto brilliant. I’ve not spent much time on it yet and I’m still experimenting with this topic, so the following will be mostly subjective.

I’m not fully convinced by trio; my gut feeling about this approach is that it tries to solve too many problems at once. There’s more than one way to implement parallelism, and eventually you’ll have to interop with something working differently anyway. However, I’m on the same page regarding its fundamental design principles :

  • Every forked process must eventually be joined
  • Every asynchronous operation must be cancellable

I think the functional programming approach is more flexible and able to solve the same problems more elegantly. missionary relies on an agnostic representation of asynchronous effects as values. Error handling and cancellation are first-class, to accurately match the synchronous counterparts of the underlying platform (every operation can throw, threads can receive an interruption signal at any time). Any callback-based API can be made compliant with this convention. When you have a uniform way to deal with effects, you can implement any composition strategy on top of it. missionary provides the basic building blocks (race and join for parallel composition, sp for sequential composition), but the point is really to be allowed to switch to anything else at will.
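That convention can be sketched in plain clojure (this mirrors missionary's task contract: a task is a function taking a success and a failure continuation and returning a zero-arg cancellation thunk; `run-task` is an illustrative helper, not part of missionary):

```clojure
(defn success-task
  "A task completing immediately with x."
  [x]
  (fn [success! failure!]
    (success! x)
    (fn cancel! [])))                     ; nothing to cancel

(defn sleep-task
  "A task completing with x after ms milliseconds. Sketch only : one
   thread per task, interruption used as the cancellation signal."
  [ms x]
  (fn [success! failure!]
    (let [t (Thread. (fn []
                       (try (Thread/sleep ms)
                            (success! x)
                            (catch InterruptedException _
                              (failure! (ex-info "cancelled" {}))))))]
      (.start t)
      (fn cancel! [] (.interrupt t)))))

;; running a task synchronously, for demonstration purposes
(defn run-task [task]
  (let [p (promise)]
    (task #(deliver p [:ok %]) #(deliver p [:error %]))
    @p))

(run-task (success-task 42))      ;; => [:ok 42]
(run-task (sleep-task 10 :done))  ;; => [:ok :done]
```

Because a task is just a value, composition strategies like race and join are ordinary functions over these values.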

I’m still dubious about CSP and concurrent ML. select doesn’t seem to be a feature worth having in application space. These explanations sound right to me, and that’s why I chose to stand away from it from the beginning.