You need to take from this list, meaning it should remove the taken number from the list (that’s the first mutable change you need), and then you need to add the number to the sum (the second mutable change you need).
This means you need two mutable things. Well in Clojure everything is immutable by default, but you can create mutable things, and ref
is one way to do it. It creates a mutable reference.
So now you have a mutable reference to the list of work, and a mutable reference to the sum. To change a ref, there’s a few functions you can use such as: ref-set, alter and commute.
To get and remove from the list you can do:
(first @work) ;; reads first
(ref-set work (next @work)) ;; removes first from ref
To add to sum you can do:
(alter sum + (first @work))
ref-set will just set the value to the new value, where as alter will take the old value and the new value and compute a new value from them. That’s why for sum we use alter
, but for removing from work we use ref-set
.
So putting it all together you have:
(defn dowork
[work-ref sum-ref]
(dosync
(when-let [n (first @work-ref)] ; avoid null exception so we don't try to add nil to sum
(alter sum-ref + n)
(ref-set work-ref (next @work-ref)))))
Okay, but why are we even using dosync here? Well, only because the problem then tells you that you should have multiple workers doing work concurrently. That means you need a way to coordinate them, so that they don’t do the same work more than once, or overwrite each other’s work.
For example, you don’t want two of them adding the same number to sum, that will skew the result. Similarly, you don’t want one of them to change the sum back to a prior state while the other has already removed the number, again that will skew your result.
This is where dosync
comes in. It basically defines the atomic unit of work that each worker can do, and a means for them to check with others if they are about to overlap or overwrite each other so that they don’t.
This is done pretty simply. The worker who enters a dosync
will get the next number from work, and the current value of sum, it’ll remove the item and update the sum on its own workstation, but before commiting the changes back to the ref, it’ll check if anyone else changed the ref while they were working. If so, the worker will realize that they can’t commit, and need to rebase their changes against the new ref value, and then they’ll try to commit again, until they manage to get in.
So how do you create two workers and have them work concurrently? Well future
is one way to do so.
(future (dowork work sum))
Will spawn one worker and have them do work. If you want to spawn two workers you just do:
(future (dowork work sum))
(future (dowork work sum))
Ok, but we don’t want them to add only once and stop working. We want them to work until there is no more work. So we need a loop:
(future (while @work (dowork work sum)))
This will have them work until there is no more work. Note how in dowork
I use next
and not rest
, that’s really important here, because next will return nil when empty, and nil is falsy, which will make the condition in our loop false and thus have them stop.
Alright, but how do you know when they are done? And how do we wait for them to be done before doing something else? Like printing the result of the work being done? Well you can block on the future by derefing on them.
; Spawn two concurrent workers working until there is no more work
(let [work (ref (apply list (range 1e5)))
sum (ref 0)
worker1 (future (while @work (dowork work sum)))
worker2 (future (while @work (dowork work sum)))]
@worker1 ;wait for worker1
@worker2 ;wait for worker2
@sum) ;return the result of their work
There you go.
Now the question has you experiment further with using commute
instead of ref-set
and alter
. That’s an optimization, you can see how summing is a commutative operation, so when your modification is commutative, you can use commute instead of alter, and it’ll be faster. That’s because the workers won’t check that they are doing things in-order, which lets them go faster, but only works if you can do things in any order, like for sum.
Finally the question has you experiment between deref and ensure. That part I’m also a bit confused here, I guess the answer is no, normally you use ensure when you only read from a ref and care that the ref value changes while you were working. Which isn’t the case here, so I guess the answer is no, it won’t make a difference here. That said, I wonder if it could affect the profiling, I don’t know enough to say, but it’s possible ensure would be more strict about having the worker redo work.