When to use thread local state
Thread local state is all about imperative programming.
Resource reuse
Let’s say I need to perform some heavy string concatenation work, and I want to cut down garbage generation to increase performance. clojure.core/str instantiates a new buffer for each call, although its extent is strictly limited to the evaluation of the function. Buffers could be reused across invocations, but we need to ensure a given buffer won’t be accessed concurrently. ThreadLocal provides exactly this guarantee.
;; One StringBuilder per thread, created lazily on first access.
(def ^ThreadLocal tl
  (ThreadLocal/withInitial
    (reify java.util.function.Supplier
      (get [_] (StringBuilder.)))))

(defn my-str [x & ys]
  (let [^StringBuilder sb (.get tl)]
    ;; Reset the buffer left over from the previous call on this thread.
    (. sb setLength 0)
    (loop [x x ys ys]
      (. sb append x)
      (if ys
        (recur (first ys) (next ys))
        (. sb toString)))))
All good: my-str is still pure from the outside, it doesn’t allocate more objects than necessary, and it’s free of race conditions.
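To make the guarantee concrete, here is a small sketch showing that a ThreadLocal hands the same object back to the thread that owns it, and a different one to any other thread. The name buffer and the two result vars are made up for this illustration, and the second check assumes the future body runs on a thread other than the caller's, which is how clojure.core schedules futures.

```clojure
;; Hypothetical names: buffer and the two vars below are not part of
;; the original example.
(def ^ThreadLocal buffer
  (ThreadLocal/withInitial
    (reify java.util.function.Supplier
      (get [_] (StringBuilder.)))))

;; Two lookups from the same thread return the very same instance.
(def shared-within-thread?
  (identical? (.get buffer) (.get buffer)))

;; A lookup from another thread runs the supplier again and yields
;; a fresh StringBuilder.
(def shared-across-threads?
  (identical? (.get buffer) @(future (.get buffer))))

shared-within-thread?  ;; true
shared-across-threads? ;; false
```

Since no two threads can ever see the same buffer, resetting and mutating it inside a function call needs no synchronization.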
Dynamic scope
STM transactions are a good example of resources with dynamic extent. Because the lifecycle of a transaction must be fully managed by the underlying engine, it would not be a good idea to expose it directly in the API. Instead, we have to wrap the set of actions to be performed in a dosync block, and let the engine take care of running it within an appropriate transactional context. Transactions are never actually exposed; they just implicitly span the execution of the expression block. Indefinite scope + dynamic extent => dynamic scope.
(def names (ref []))

(dosync
  (alter names conj "zack")                ;; ok
  @(future (alter names conj "shelley")))  ;; error, not in transaction
What allows the transaction object to never be exposed to the user is thread-local context. During the execution of a dosync block, every STM action looks up this thread-local variable to get the current transaction and updates it to keep track of intents. The transaction is invisible to other threads, so any attempt to escape the synchronous scope of execution is doomed to fail.
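The mechanism can be sketched with a toy model. This is not how Clojure's STM is implemented; current-tx, record! and run-in-tx are hypothetical names, the "transaction" is just an atom collecting intents, and nesting is deliberately left out.

```clojure
;; Toy model only: every name here is hypothetical.
(def ^ThreadLocal current-tx (ThreadLocal.))

(defn record!
  "Registers an intent in the transaction installed on the calling thread."
  [intent]
  (if-some [tx (.get current-tx)]
    (do (swap! tx conj intent) nil)
    (throw (IllegalStateException. "No transaction running"))))

(defn run-in-tx
  "Runs thunk with a fresh transaction installed on the current thread,
  then returns the recorded intents. Nesting is not handled."
  [thunk]
  (let [tx (atom [])]
    (.set current-tx tx)
    (try
      (thunk)
      @tx
      (finally (.remove current-tx)))))

(run-in-tx #(record! :a)) ;; returns [:a]

;; (run-in-tx #(deref (future (record! :b))))
;; throws: the thread running the future has no transaction installed
```

The user never touches the transaction object, and code that hops to another thread simply cannot find it.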
Why dynamic vars won’t help
If you take the approximation (= dynamic-vars thread-local) for granted, you may be tempted to implement this kind of stuff with dynamic vars. Unfortunately you can’t, because binding conveyance is allowed to break thread locality, effectively exposing your unsynchronized objects to race conditions. Don’t do that.
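A minimal sketch of the leak, assuming a hypothetical dynamic var named *buffer*: future conveys the caller's bindings to its worker thread, so the mutable object ends up visible from two threads at once.

```clojure
;; Hypothetical var, for illustration only.
(def ^:dynamic *buffer* nil)

(def leaked?
  (binding [*buffer* (StringBuilder.)]
    ;; The future body runs on another thread, yet it sees the
    ;; caller's binding and therefore the same StringBuilder.
    (identical? *buffer* @(future *buffer*))))

leaked? ;; true
```

Compare with a ThreadLocal, where the same lookup from another thread never returns the caller's instance.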
When NOT to use thread local state
Now, you can say: OK, I will use thread-local state with pure values only. Then I can safely capture the current context and restore it later in another thread, and nothing wrong can happen. I can use this trick to provide implicit arguments to my functions and my code can be much more concise.
The problem is, adding implicit context breaks referential transparency, increases mental overhead and fights against common Clojure idioms. There are just too many ways things can go wrong. Lazy sequences, and laziness in general, will escape thread-locality. Lambdas won’t capture thread-local context.
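Both failure modes are easy to reproduce. The sketch below uses a hypothetical dynamic var *factor* with a root value of 1, and rebinds it to 10 around a lazy map and around a lambda.

```clojure
;; Hypothetical var, for illustration only.
(def ^:dynamic *factor* 1)

;; The lazy seq is created inside the binding but realized outside of it,
;; so each element sees the root value.
(def lazy-result
  (binding [*factor* 10]
    (map #(* *factor* %) [1 2 3])))

;; Forcing evaluation inside the binding gives the intended result.
(def eager-result
  (binding [*factor* 10]
    (doall (map #(* *factor* %) [1 2 3]))))

;; A lambda reads the var when it is called, not when it is created.
(def scale
  (binding [*factor* 10]
    (fn [x] (* *factor* x))))

(vec lazy-result)  ;; [1 2 3]
(vec eager-result) ;; [10 20 30]
(scale 2)          ;; 2
```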
You should always prefer functional style and explicit arguments. If you think the arity of your function is too high, use maps. The dynamic var system relies on maps anyway.
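As an illustration of the explicit style, here is a sketch where the context travels as a plain map argument. The function scale-all and its option keys are hypothetical names chosen for the example.

```clojure
;; scale-all and its option keys are hypothetical names.
(defn scale-all [{:keys [factor offset]} xs]
  (map #(+ offset (* factor %)) xs))

(def ctx {:factor 10 :offset 1})

;; The options are closed over by the lazy seq, so the result means the
;; same thing wherever and whenever it is realized.
(def ys (scale-all ctx [1 2 3]))

(vec ys)                                 ;; [11 21 31]
@(future (vec (scale-all ctx [1 2 3])))  ;; [11 21 31]
```

Adding an option later means adding a key to the map, not another positional parameter.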
The special library’s design is really typical of how anti-FP this pattern is. First, the library must eagerify results to make sure each evaluation happens within the extent of the function call (synchronously), which means you can’t use infinite sequences anymore. Then, if you want to declare a lambda inside a managed context, you have to be aware the context won’t be propagated to the lambda. Basically, you have introduced non-determinism in your function and you’re not doing FP anymore.
(require '[special.core :refer [condition]])

(defn non-deterministic [n]
  (for [i (range n)]
    (if (odd? i)
      (condition :odd i :normally 100)
      i)))
A functional approach would be something like this:
(defn non-deterministic [n]
  (fn [condition]
    (for [i (range n)]
      (if (odd? i)
        (condition :odd i :normally 100)
        i))))
Non-determinism is now explicit, functions are pure, you can close over your context, you can use infinite sequences, you can test in isolation. Sure, it’s more characters to type, but you need to be aware of the trade-off and ask yourself if what you get is worth giving away referential transparency.
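For instance, the caller can now pick the handler at the call site. The two handlers below are hypothetical and do not come from the special library; they simply accept the same arguments as the condition call in the functional version, which is repeated here so the sketch stands alone.

```clojure
(defn non-deterministic [n]
  (fn [condition]
    (for [i (range n)]
      (if (odd? i)
        (condition :odd i :normally 100)
        i))))

;; Hypothetical handlers, both taking the arguments used above.
(defn use-default [_kind _value & {:keys [normally]}] normally)
(defn negate [_kind value & _] (- value))

(vec ((non-deterministic 5) use-default)) ;; [0 100 2 100 4]
(vec ((non-deterministic 5) negate))      ;; [0 -1 2 -3 4]
```

The same process description yields different results depending only on the argument it is given, and the lazy seq can be realized at any time.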
How all this relates to green threading
A green thread, in the broad sense, is an identity holding a logical sequential process, represented in a way that allows execution to be fully managed in user-space. Because green threads run on actual threads, thread-local state can be leveraged to keep track of the process currently running (but that’s really an implementation detail).
Now some questions arise about dynamic vars.
- When a green thread is spawned, should it inherit current thread-local state? go blocks do that in JVM Clojure, consistently with future et al, but not in ClojureScript.
- If I declare a binding block inside a go block, and there’s an asynchronous boundary inside, what should happen? Should dynamic context be torn down before the asynchronous boundary, and restored after? Should I be allowed to set! bindings in this case? How is it actually done in core.async? What about ClojureScript?
To be honest I don’t think there’s an obvious answer to all these questions, because it’s not clear at all which problem dynamic vars are trying to solve. missionary’s design adds even more questions because process declaration is decoupled from execution. So that’s why I chose to ignore them completely. This resulted in sound, performant, platform-consistent behavior and I’m really happy with it. When I need to use a function relying on dynamic vars, I wrap it in a function taking explicit arguments. I may change my mind in the future if I see a convincing rationale about the current design of dynamic vars, but frankly I doubt it will ever happen.
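Such a wrapper can be sketched as follows, using clojure.core/pr-str, which reads the dynamic var *print-length*. The helper name pr-str-limited is hypothetical. The trick only works because pr-str is eager: all printing happens before the binding is popped.

```clojure
;; pr-str-limited is a hypothetical helper name.
(defn pr-str-limited
  "Prints x to a string, showing at most limit items per collection."
  [limit x]
  ;; pr-str is eager, so the var is only read inside the binding.
  (binding [*print-length* limit]
    (pr-str x)))

(pr-str-limited 3 (range 10)) ;; "(0 1 2 ...)"
```

Callers see an ordinary function of two arguments, and the dynamic var never leaks past the wrapper.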