Won’t that just return another future which, when you deref it, returns another future, ad infinitum? So there’s no stack overflow unless you try to use recursion to dereference them, but also no useful result. A recursive function with no base case is simply not useful, independent of stack considerations.
This is the basic idea behind recursively generated lazy sequences, too. In that case, you use lazy-seq to defer the rest of the sequence, typically as a recursive call. The deferred value lives on the heap, though, so there’s no stack overflow.
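To make that concrete, here’s a minimal sketch (naturals-from is an illustrative name, not from the discussion above): the recursive call is wrapped in lazy-seq, so each step becomes a heap-allocated thunk rather than a stack frame.

```clojure
;; Infinite sequence built by recursion; lazy-seq defers the recursive
;; call as a heap-allocated thunk, so realizing elements doesn't grow
;; the stack.
(defn naturals-from [n]
  (lazy-seq (cons n (naturals-from (inc n)))))

(take 5 (naturals-from 0))
;; => (0 1 2 3 4)
```

There is no base case here either, but unlike the future version, each realized element is a useful result.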
While that is true, the stack limit is usually so high that it only becomes an issue in error cases. I’d say it is extremely rare and not something to worry about. In fact, the overflow might even be useful, since catching an error case with futures involved is much less obvious.
As with everything there are many strategies to mitigate such problems, future would not be on my list.
My use case I think is safe, in that I start with 20 or 30 threads, and they call sleep(60) before calling themselves again, so it is at most 30 threads that create another 30 threads a minute later, and then another 30 a minute after that. I’m assuming the old threads are reclaimed; I can probably fine-tune the speed at which they are reclaimed.
Thank you for all of this. I might have gone down the wrong road by asking about Futures. Thank you for the bit about Delay.
Let’s assume that I decide to instead go back to “Java basics” and use an Executor Service to manage a thread pool, can you recommend any Clojure libraries that would make this activity less painful and more idiomatic for Clojure?
Okay, let me start over and reconsider the problem from the start.
I have some work that I need to do, for the side effects.
I’d like to “fire and forget”
That is, I’d like to have this function fire, and do some work that has side effects. I want this to happen in a thread, away from the main thread.
If I don’t use Futures or Delays or any of that, what might be a clean approach? Should I go back to “Java basics” and use an Executor Service and a thread pool? If yes, can you recommend a Clojure library that makes this less painful than in Java?
Set up as an independent worker? Yes, I could do this; I have done it in the past, though back then I hard-coded the number of workers. You are right that I could set up a variable number of workers instead. But I don’t want this work to happen in the main thread, so I’d still need a way to run it on some thread other than the main thread.
I don’t think this is painful at all. I personally like the control ExecutorServices provide, as you can tweak them however you like. One neat thing is that Clojure functions implement the necessary interfaces (Runnable and Callable), so interop is pretty seamless.
(import 'java.util.concurrent.Executors)
;; can tune size to whatever makes sense for you
(def exec (Executors/newFixedThreadPool 5))
;; there is also newScheduledThreadPool if you want the timing stuff you mentioned
;; i.e. its schedule or scheduleAtFixedRate methods.
;; to submit a task to the pool
(.submit exec #(do-some-work ....))
.submit even returns a future, in case you ever want to wait for a result at a later time.
The reason I don’t like Clojure’s own future is that you have no control over the size of its pool or when to terminate it.
The API is quite reasonable. If you want communication between the threads core.async works pretty well.
That’s a fixed pool though, isn’t it? What if I wanted a dynamic, bounded worker pool operating on a shared input stream? Do you think it’s even worth trying to implement it, instead of relying on whatever Java has on offer – especially if I want to retain granular control over each worker? Or perhaps there’s a native Clojure lib that solves this elegantly?
Sure thing! Let’s imagine I’m creating some sort of a service that communicates with an external provider, say: a chatbot or similar. I hook up to some form of an event stream, and as I receive requests, I dispatch them to some parallel agent (worker etc.) to handle. That worker will ultimately deliver something to an external service, so there’s network IO involved and the work itself is purely for side-effects. Once the work’s done, the worker quits and releases its slot.
Now, I’d like to exercise some form of control over these workers. I don’t expect traffic to be constant, so I don’t need all hands on board at all times. I’d also like to set an upper limit. Lastly, I would need some degree of control in case any of the workers misbehaves (hangs, crashes), or if I simply choose to recall them for whatever reason.
Chances are of course that my approach is wrong from the outset and I should instead have a fixed pool of workers subscribing to a queue or something.
I think core.async checks many of the boxes here. E.g. a request channel grabbing events and pushing them onto a work queue (could be a simple channel, or abstracted to allow for maybe a distributed queue if you are really going for Web Scale etc.).
You can load balance the workers however you want. If they are not compute-intensive but more IO-bound, I would imagine just using lightweight threads (go routines), spinning up as many as you need (e.g. one per task), and letting them complete, fail, or time out and get GC’d. They won’t waste resources when parked (aside from a few KB of state). Maybe tie their lifecycle to a timeout. You’d need some shared resource to ensure the bound (if necessary in this case) is respected; e.g. go routines bump an atom when they receive work, and that count determines whether new go routines are created. Probably less of a problem when using lightweight threads, though…
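A rough sketch of that atom-bounded idea, under assumptions: max-workers and handle-event are illustrative names, and the timeout stands in for parking on real IO.

```clojure
(require '[clojure.core.async :as a :refer [go <!]])

(def max-workers 5)          ;; illustrative bound
(def active (atom 0))

(defn handle-event [ev] ev)  ;; placeholder for side-effecting work

(defn dispatch!
  "Spawn a lightweight worker for ev if we're under the bound.
  Returns true if accepted, false if the bound was hit."
  [ev]
  (if (<= (swap! active inc) max-workers)
    (do (go
          (try
            (<! (a/timeout 200))   ;; stand-in for parking on real IO
            (handle-event ev)
            (finally (swap! active dec))))
        true)
    (do (swap! active dec)         ;; undo the bump, reject the event
        false)))
```

Note the bump-then-check shape: swap! first, and roll back on rejection, so two concurrent dispatches can’t both slip under the bound.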
Otherwise you could spin up a registry of core.async/thread workers that look for work, time out, and die. Have a little load balancer go routine that tries to push work to the current thread pool; if there are no threads looking for work and the bound is not reached, register a new thread, otherwise wait for an available thread etc.
One issue is that go routines are multiplexed over a fixed thread pool (you can configure it) that is shared. If you want compute-intensive dedicated threads, then you can still provision them using the same API, and even tie the life of the thread to the go routine (e.g. the go routine acts as a proxy for receipt of work completion: it spawns the thread with the work and a receipt channel, then parks and waits on the delivery to the receipt channel).
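That proxy pattern could be sketched like so; heavy-work, result-ch, and run-proxied are illustrative. a/thread provisions a real thread and returns a channel that delivers the thread’s result, which is exactly the receipt channel described above.

```clojure
(require '[clojure.core.async :as a])

(defn heavy-work [job]               ;; placeholder compute-heavy task
  (reduce + (range 1000)))

(defn run-proxied
  "Go routine acting as a proxy: spawns a dedicated thread for the work,
  parks until the receipt channel delivers, then forwards the result."
  [job result-ch]
  (a/go
    (let [receipt (a/thread (heavy-work job))]
      (a/>! result-ch (a/<! receipt)))))
```

The go routine costs almost nothing while parked, and the heavy work never touches the shared go-block pool.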
If you need “more granular” control over individual workers, nothing stops you from keeping track of the active workers and wiring in some communication channel (e.g. to send poison pills and the like). That is starting to blur into Erlang actor territory, but not entirely; more like a process model vs. an all-purpose mailbox.
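One hedged sketch of that per-worker control: give each worker its own control channel and keep a registry keyed by id. The names here (handle, work-ch, spawn!, stop!) are assumptions, and closing the control channel plays the role of the poison pill.

```clojure
(require '[clojure.core.async :as a])

(def handled (atom []))
(defn handle [job] (swap! handled conj job))  ;; placeholder work

(defn start-worker
  "Loops taking jobs from work-ch; closing the returned control
  channel acts as a poison pill for this worker only."
  [work-ch]
  (let [ctl (a/chan)]
    (a/go-loop []
      (a/alt!
        ctl     ([_] :stopped)                 ;; poison pill: exit loop
        work-ch ([job] (when (some? job)
                         (handle job)
                         (recur)))))
    ctl))

(def workers (atom {}))  ;; registry: worker id -> control channel

(defn spawn! [id work-ch]
  (swap! workers assoc id (start-worker work-ch)))

(defn stop! [id]
  (when-let [ctl (get @workers id)]
    (a/close! ctl)
    (swap! workers dissoc id)))
```

Because every worker has its own channel, you can recall one misbehaving worker without disturbing the rest, which is the granularity asked about above.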
I can think of analogues using non-core.async stuff here too. You could manage all the lifecycle stuff with threads/futures. Load balancing and result delivery (absent channels) would be interesting (probably queues again). Potentially many choices… I think a work queue with a fixed threadpool is probably the easiest way to explore something. You can still explore failure/exception/interrupt cases there too.
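That fixed-threadpool-plus-work-queue baseline could look roughly like this with plain java.util.concurrent; process-job is a placeholder, and :stop is an assumed per-worker poison pill.

```clojure
(import '(java.util.concurrent Executors LinkedBlockingQueue TimeUnit))

(def queue (LinkedBlockingQueue.))
(def pool  (Executors/newFixedThreadPool 3))

(def done (atom []))
(defn process-job [job] (swap! done conj job))  ;; placeholder side effects

(defn worker-loop []
  (loop []
    (let [job (.take queue)]      ;; blocks until work arrives
      (when-not (= :stop job)     ;; :stop = poison pill, exit the loop
        (process-job job)
        (recur)))))

;; three workers sharing one queue; ^Runnable picks the right overload
(dotimes [_ 3] (.submit pool ^Runnable worker-loop))

;; shutdown: one pill per worker, then close the pool
;; (dotimes [_ 3] (.put queue :stop))
;; (.shutdown pool)
```

Interrupting via .shutdownNow is the hook for exploring the failure/interrupt cases mentioned above, since .take throws InterruptedException when a waiting worker is interrupted.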