Deterministic, parallel randomness in Clojure?

teodorlu · October 17, 2019, 10:30am

Hello!

Introduction

There’s this problem that’s been scratching my mind for a while. I’ve been toying with this Monte Carlo system. With the Monte Carlo method, you sample a bunch of random variables, calculate your results based on each simulation, and can then say something about the probability of the results, and how they relate together.

For this to be the case, I’d like to sample a sequence of maps. Let’s say we sample :x and :y between 0 and 1. Sampling 3 values might get us something like

[{:x 0.434 :y 0.342}
 {:x 0.936 :y 0.126}
 {:x 0.028 :y 0.123}]

Problem statement

I want to be able to generate the same sample (sequence of binding of variables) from a single seed, and I want the computation to be parallelizable.

Deterministic: The same sequence will be produced from the same seed and sample size.
Parallel: The sampling process should be possible to split across cores, where each core gets the global seed and the indexes it should produce (for instance, core 3 may be asked to generate values 200-399)

Ideally, I would like to request the j’th item from the sequence in a fast way.

I feel like I’m missing language to describe what I’m looking for here, and that there might be literature describing what I’m looking for. I’m also looking for work making this accessible in real world implementations.

There’s also test.check. I haven’t looked at the implementation, but with test.check you get deterministic generators. After each run, you are left with a seed, and you can use that seed to generate the same bindings. I’ll take “Go read test.check source” as a reply, but I’d prefer if that came with some reasoning.

Clojure context

I’m able to produce a deterministic, serial solution and a nondeterministic, parallel solution. Reprinting here for some context:

(ns th.scratch.deterministic-parallel)

;; Can we have deterministic, parallel randomness?

;; Deterministic serial randomness:

(defn sample-serial-deterministic [n seed]
  (let [rnd (java.util.Random. seed)]
    (repeatedly n
                (fn []
                  {:x (.nextDouble rnd)
                   :y (.nextDouble rnd)}))))

(let [seed 987324]
  (assert (= (sample-serial-deterministic 5 seed)
             (sample-serial-deterministic 5 seed))))

;; Nondeterministic parallel randomness

(defn sample-parallel-nondeterministic [n]
  (pmap (fn [_]
          (let [rnd (java.util.Random.)]
            {:x (.nextDouble rnd)
             :y (.nextDouble rnd)}))
        (range n)))

(assert (not= (sample-parallel-nondeterministic 5)
              (sample-parallel-nondeterministic 5)))

I’m hoping to get some feedback on this! And I’m putting this thread next to the watercooler, since I’m looking for more of a discussion then help coding up a solution.

Teodor

andy.fingerhut · October 17, 2019, 5:05pm

It seems straightforward to start each thread number with the same random seed deterministically across multiple runs of a program, and for each thread to have its own pseudo-random number generation state that is separate from the other threads, so that each thread will produce the same sequence of random numbers it did on a previous run.

However, if you want the entire parallel program to produce the same result deterministically, then if the multiple threads interact or communicate with each other in any way, e.g. they send messages to each other in a pattern that can change their computation, and the order they send/receive messages relative to generating random numbers can differ from one run to the next, then they might compute different things across different program runs for that reason.

If the threads are not communicating or interacting in any way at all, the results should be repeatable.

jackrusher · October 17, 2019, 6:49pm

Even if the threads don’t communicate with each other, I would be concerned about the scheduler behaving slightly differently between runs. This could cause the ThreadPool to differently distribute units of work among the threads (i.e. packet of work N becomes packet of work M in some thread T).

Assuming that one is operating in parallel to shorten processing time for some compute-heavy task, one trick is to deterministically generate and attach a sequence of random numbers to each work item before putting it in the work queue, thus decoupling randomness generation from the thread in which a given packet of work is later processed.

teodorlu · October 18, 2019, 2:07pm

@andy.fingerhut and @jackrusher,

Thanks for your input.

Slight specification: I don’t require threads communicating with each other, and I’m willing to spend a little effort making sure I don’t spawn randomness here and there.

I think I got a little further on the problem. I found the docs for java.util.SplittableRandom, which seems to help. With it, I could have one seed, create a SplittableRandom from it, and further split that such that I get one random instance that I can use in each thread.

I will have to be mindful of the order in which I call the random function, though. For instance, if I want to create maps, if I’m not careful, I might be depending on the order of the keys:

{:order-1 (let [r (java.util.Random. 101)]
            {:x (.nextDouble r)
             :y (.nextDouble r)})
 :order-2 (let [r (java.util.Random. 101)]
            {:y (.nextDouble r)
             :x (.nextDouble r)})}
;; => {:order-1 {:x 0.7219200581862772, :y 0.36452043826725966},
;;     :order-2 {:y 0.7219200581862772, :x 0.36452043826725966}}

jackrusher · October 19, 2019, 2:25pm

Hm. This doesn’t really help with the problem I mentioned above, so I suspect that I still don’t fully understand your use case.

teodorlu · October 19, 2019, 3:57pm

I don’t fully understand the problem myself, so I fully understand if you don’t! To some extent, I’m thinking aloud, hoping that if I miss something obvious, I might be corrected.

Rough solution sketch:

Fix the “chunk size” globally. For instance at 1000 “runs” per unit of work. If this is too low, there will be too much overhead with creating threads. If it’s too high, I won’t utilize cores sufficiently. Perhaps try to ensure that a “typical unit of work” takes 10-100 ms to finish.
Use a single global seed, and split that up to a random instance for each unit of work
Each unit of work can then generate their own random numbers, in order, based on the instance of Random that they got passed.

Pros:

Deterministic and parallel, if I’m not mistaken
Seems like a sound solution to me

Cons:

Required ordering of calls to .nextDouble within each thread

Something that might work just as well, is just to try to vectorize as much as possible instead. I’ve been looking at tech.v2.tensor, and I have a hypothesis that it might be better to just focus on faster storage.

This is really what I’m trying to do. I just want to do it by passing a seed around, instead of passing a bunch of numbers around, to reduce communication overhead. And I’m exploring ways to have the work done by each thread be deterministic from a single global seed.

Following, it makes sense to look at how to combine the results, without having to have too much overhead. One solution could be to put the data into buckets on the way back, so that instead of all sampled values, you get the number of values in the range (0.01, 0.02), the number of values in the range (0.02, 0.03), and so on.

Teodor

jackrusher · October 20, 2019, 2:37pm

The big question here is how you will make sure to serialize the assignment of SplittableRandom instances to bundles of work, which in turn depends on how you are processing the bundles.

The thing to avoid here is having the SplittableRandom instance be created by the function called by pmap (or, really, probably some other parallelizer), because you can’t be sure about the order in which the bundles will be processed and thus the order in which those random number generators will be assigned from the stateful parent generator.

If you iterate over your collection of bundles first, assigning seeded random number generators as you go, you’ll get more or less the solution I outlined above, except that you’ll be attaching a generator function rather than a sequence of numbers to each bundle.

system · April 20, 2020, 2:37am

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.