I went through the core.async walkthrough before doing some async work, and when I hit line 101 I became curious about the limits of go threads.
I wrote some quick test code (adapted from the walkthrough code) in which n go threads are created, each thread sends “hi” to its own channel, and alts!! is used to collect the results:
(require '[clojure.core.async :refer [<! >! chan go thread alts! alts!!]])

(def timings
  (into
    (sorted-map)
    (for [n [1 10 20 30 40 50 75 100 150 200 250 500 750 1000 1500 2000 3000]]
      (let [cs    (repeatedly n chan)   ; one channel per go thread
            begin (System/currentTimeMillis)]
        ;; start n go threads, each putting "hi" on its own channel
        (doseq [c cs] (go (>! c "hi")))
        ;; collect n results, each time selecting across all n channels
        (dotimes [_ n]
          (let [[v _] (alts!! cs)]
            (assert (= "hi" v))))
        (let [dur (- (System/currentTimeMillis) begin)]
          (println "Read" n "msgs in" dur "ms")
          [n dur])))))
And lo and behold, 1000 channels is 100 times slower than 100 channels: ten times the channels costs a hundred times the time, which looks quadratic. I don’t know what I was expecting really, but I feel like it shouldn’t take seconds to handle a few thousand go threads. 10k threads took 30 seconds on my machine. I’ve clearly hit some limit, and I have a use case where I might be waiting on 1k-10k asynchronous replies from the cloud.
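To put a number on that scaling, here is a minimal sketch that estimates the exponent k in "time grows like n^k" from two measurements. The helper name `growth-exponent` is made up for illustration and is not part of core.async; the inputs below are just the rough "10x the channels, 100x the time" ratio from above, not exact timings.

```clojure
(defn growth-exponent
  "Estimates k in t ~ n^k from two measurements (n1, t1) and (n2, t2).
  Hypothetical helper for illustration only."
  [n1 t1 n2 t2]
  (/ (Math/log (/ (double t2) t1))
     (Math/log (/ (double n2) n1))))

;; 100 channels taken as 1 time unit, 1000 channels as 100 time units
(println "Estimated exponent:" (growth-exponent 100 1 1000 100))
```

An exponent near 1 would mean linear scaling, and an exponent near 2 means the total time grows with the square of the number of channels.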
At first I wondered what made the test code slow. I thought a bit in my ‘hammock’ (chair) and realized that the limiting part should be alts!! when it is given a large collection of channels.
So the ‘fix’, after repl’ing some more, was to run alts!! in parallel, on partitions of 100 channels each:
(into
  (sorted-map)
  (for [n [1 10 20 30 40 50 75 100 150 200 250 500 750 1000 1500 2000 3000]]
    (let [cs    (repeatedly n chan)   ; one channel per go thread
          begin (System/currentTimeMillis)]
      (doseq [c cs] (go (>! c "hi")))
      ;; split the channels into groups of 100 and drain each group
      ;; in parallel, so alts!! never sees more than 100 channels
      (doall (pmap
               (fn [chunk]
                 (dotimes [_ (count chunk)]
                   (let [[v _] (alts!! chunk)]
                     (assert (= "hi" v)))))
               (partition-all 100 cs)))
      (let [dur (- (System/currentTimeMillis) begin)]
        (println "Read" n "msgs in" dur "ms")
        [n dur]))))
And suddenly the time it takes scales linearly. Huh. It also takes 16-20 ms for 3k channels, instead of 2400 ms.
So, lesson learned: alts!! is very slow for a large number (a thousand or more) of channels. I do not understand why, because the source code for alts is quite difficult for me to follow, but it looks like it expands to some giant cond statement.
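For the "waiting on many replies" use case, one way to sidestep alts!! over a big collection is to have every go thread put its result on a single shared channel and take from that. This is only a sketch under assumptions: it assumes replies do not need their own channel (each reply carries its index instead), and `collect-replies` is a hypothetical name, not something from core.async or the walkthrough. I have not benchmarked it against the versions above.

```clojure
(require '[clojure.core.async :refer [chan go >! <!!]])

(defn collect-replies
  "Starts n go threads that each put [i \"hi\"] on one shared channel,
  then takes n values from it with blocking takes.
  Hypothetical helper; n must be at least 1."
  [n]
  (let [out (chan n)] ; buffer of size n, so no producer has to park
    (dotimes [i n]
      (go (>! out [i "hi"])))
    ;; n blocking takes from a single channel, no alts!! involved
    (vec (repeatedly n #(<!! out)))))

(let [replies (collect-replies 1000)]
  (println "Read" (count replies) "msgs"))
```

The index in each reply stands in for the channel identity that alts!! would otherwise return, so the caller can still tell which request a reply belongs to.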
- Have you hit slow cases with async code before? If so, how did you solve them?
- Is my test code garbage? If so, how?