Eratosthenes Party Time! (A.k.a. Feedback wanted on this implementation of Eratosthenes Sieve)


#1

In PurelyFunctional.tv Newsletter 314, Eric Normand suggests the puzzle/challenge of implementing Eratosthenes Sieve in Clojure. It seemed easy enough for me to take a stab at.

From the Wikipedia description of the basic steps (that is without consideration of the refinements mentioned in the article, I came up with this:

(defn not-divisible-by [n d]
  (when-not (integer? (/ n d))
    n))

(defn sieve [n]
  (loop [found-primes [2]
         candidates (range 3 (inc n))]
    (let [highest-found-prime (last found-primes)
          remaining (->> candidates
                         (map #(not-divisible-by % highest-found-prime))
                         (remove nil?))]
      (if (= remaining candidates)
        (concat found-primes candidates)
        (recur (conj found-primes (first remaining)) (rest remaining))))))

(comment
  (sieve 30) => (2 3 5 7 11 13 17 19 23 29))

As was Eric’s intention, I am sure, I had lots of fun learning stuff doing the exercise, and now I am eager to learn even more. Therefore this request for feedback. Bring it on, please. What am I doing right and what am I doing wrong? I do note that giving it n = 1000000 takes an awful lot of time, but I am not sure if that is because of the simple version of the algorithm, or if I am doing something really stupid.

I also would love to see other people’s implementations. Let’s throw an Eratosthenes Party!


#2

I’ve stumbled across a mystery. Looking for other people’s solutions I found this one, that I adapted just slightly to match Eric’s formulation of the puzzle:

(defn primes-to-n
  [n]
  (filter (fn [num]
            (loop [end (int (Math/sqrt num)), div 2, re (rem num div)]
              (cond
                (< num 2) false
                (= num 2) true
                (= re 0) false
                (> div end) true
                :else (recur end (inc div) (rem num div))))) (range (inc n))))

Since I can see that abhilater uses one of the refinements I saw mentioned on the Wikipedia article I added that refinement to my implementation (I was also inspired to use (rem) for determining divisibility):

(defn not-divisible-by [n d]
  (when-not (= 0 (rem n d))
    n))

(defn sieve-refined [n]
  (let [sqrt-n (Math/sqrt n)]
    (loop [found-primes [2]
           candidates (range 3 (inc n))]
      (let [highest-found-prime (last found-primes)]
        (if (> highest-found-prime sqrt-n)
          (concat found-primes candidates)
          (let [remaining
                (->> candidates
                     (map #(not-divisible-by % highest-found-prime))
                     (remove nil?))]
            (recur (conj found-primes (first remaining)) (rest remaining))))))))

I then decided to try benchmark my solution against abhilater’s, using Criterium. (Sorry for the verbosity here, but I think it could help in solving the mystery.) So, comparing the two solutions for primes up to 10,000, starting with mine:

eratosthenes=> (with-progress-reporting (quick-bench (sieve-refined 10000) :verbose))
Warming up for JIT optimisations 5000000000 ...
  compilation occurred before 1447 iterations
Estimating execution count ...
Sampling ...
Final GC...
Checking GC...
Finding outliers ...
Bootstrapping ...
Checking outlier significance
x86_64 Mac OS X 10.14.1 4 cpu(s)
Java HotSpot(TM) 64-Bit Server VM 25.131-b11
Runtime arguments: -Dfile.encoding=UTF-8 -XX:-OmitStackTraceInFastThrow -XX:+TieredCompilation -XX:TieredStopAtLevel=1 -Dclojure.compile.path=/Users/petery/Projects/pftv/target/classes -Dpftv.version=0.1.0-SNAPSHOT -Dclojure.debug=false
Evaluation count : 4866 in 6 samples of 811 calls.
      Execution time sample mean : 138.937937 µs
             Execution time mean : 138.937937 µs
Execution time sample std-deviation : 13.163908 µs
    Execution time std-deviation : 14.087452 µs
   Execution time lower quantile : 124.000417 µs ( 2.5%)
   Execution time upper quantile : 159.560233 µs (97.5%)
                   Overhead used : 8.534024 ns

Found 1 outliers in 6 samples (16.6667 %)
	low-severe	 1 (16.6667 %)
 Variance from outliers : 30.5188 % Variance is moderately inflated by outliers

”Not too bad!”, was my reaction, based on no knowledge what so ever on the subject. Then I ran the test for the other solution:

eratosthenes=> (with-progress-reporting (quick-bench (primes-to-n 10000) :verbose))
Estimating sampling overhead
Warming up for JIT optimisations 10000000000 ...
  compilation occurred before 82447 iterations
  compilation occurred before 164838 iterations
  compilation occurred before 10710886 iterations
  compilation occurred before 20762588 iterations
  compilation occurred before 21009761 iterations
  compilation occurred before 31638200 iterations
  compilation occurred before 42184248 iterations
Estimating execution count ...
Sampling ...
Final GC...
Checking GC...
Finding outliers ...
Bootstrapping ...
Checking outlier significance
Warming up for JIT optimisations 5000000000 ...
  compilation occurred before 188813 iterations
Estimating execution count ...
Sampling ...
Final GC...
Checking GC...
Finding outliers ...
Bootstrapping ...
Checking outlier significance
x86_64 Mac OS X 10.14.1 4 cpu(s)
Java HotSpot(TM) 64-Bit Server VM 25.131-b11
Runtime arguments: -Dfile.encoding=UTF-8 -XX:-OmitStackTraceInFastThrow -XX:+TieredCompilation -XX:TieredStopAtLevel=1 -Dclojure.compile.path=/Users/petery/Projects/pftv/target/classes -Dpftv.version=0.1.0-SNAPSHOT -Dclojure.debug=false
Evaluation count : 10074924 in 6 samples of 1679154 calls.
      Execution time sample mean : 54.026565 ns
             Execution time mean : 54.026528 ns
Execution time sample std-deviation : 3.897644 ns
    Execution time std-deviation : 3.935399 ns
   Execution time lower quantile : 50.770124 ns ( 2.5%)
   Execution time upper quantile : 60.040617 ns (97.5%)
                   Overhead used : 8.534024 ns

Wow, so my solution is 2,000 times slower??? Depressing! I wanted to feel this difference in speed so I ran just a regular (time) measured comparison using primes up to 1,000,000 on that speedy solution:

eratosthenes=> (time (str "@abhilater's: " (count (primes-to-n 1000000)) " primes"))
"Elapsed time: 15099.709893 msecs"
"@abhilater's: 78498 primes"

So the Criterium results indicates I would have to wait 8 hours for my solution to produce those 78,498 primes, right? However:

eratosthenes=> (time (str "mine: " (count (sieve-refined 1000000)) " primes"))
"Elapsed time: 7845.699169 msecs"
"mine: 78498 primes"

Yes, just one run each, here, but it seems pretty consistent when I have run it a few times as well, my solution is about twice as fast in delivering the results.

I have also timed a few runs of primes up to 10,000 in that simple manner and it’s a bit less conclusive, but the two solutions seem pretty equal then, and certainly not that mine is 2,000 worse.

Anyone have an idea about what is going on?


#3

Found it! My implementation is not lazy, so the sequence is realized, which does not happen for the other solution.

Are there things I can do to my solution to make it lazy?


#4

I’ll keep talking to myself here :smile:, since I am learning stuff everytime I iterate some on my sieve. Now I’ve gotten it to where it sieves out the primes for n = 1,000,000 in 2.5 seconds, and for n = 100,000 in 0.1 seconds, on my not-all-too-stellar MacBook.

(defn sieve [n]
  (cond
    (< n 2) ()
    (< n 3) (range 2 3)
    :else (let [sqrt-n (Math/sqrt n)]
            (loop [known (range 2 4)
                   candidates (range 5 (inc n) 2)
                   p 3]
              (if-not (< p sqrt-n)
                (concat known candidates)
                (let [remaining (remove #(= 0 (rem % p)) candidates)
                      next-p (take 1 remaining)]
                  (recur (concat known next-p) (drop 1 remaining) (first next-p))))))))

(comment
  (sieve 30) => (2 3 5 7 11 13 17 19 23 29)
  (time (str "mine: " (count (sieve 100000)) " primes"))
  (with-progress-reporting (quick-bench (sieve 100000) :verbose)))

A drawback with this way of doing it is that it gets clumsy to implement some of the refinements mentioned in the Wikipedia article. I will probably try to make a completely other solution as well, that lends itself beter to those refinements.

Still haven’t figured out how to make it lazy, but I have tried to use lazy ways of dealing with the seqs where I can get away with it.


#5

Hello!

Cool problem.

If you’re trying to speed things up, I’d have a look at how you’re storing your data. You’re creating sequences that you’re iterating over, that contain numbers. That means you’re storing some redundant data. In principle here, we should be able to just store bits that are true or false, where sieve[x] == true if x is prime, zero otherwise. Then we should be able to store a table of a million primes in a million bits which is about 100 kilobytes. Which should be quite fast.

A quick search gave me java.util.BitSet, which seems to work like that. And if we’re iterating from 1 to sqrt(n) n times, we should be able to achieve a runtime of O(n * sqrt(n)) (Or less, I think linear time is possible, but I don’t have a good reason at hand). First set all primes to true, then unset the numbers you find from your iteration.

… and possibly wrap that imperative code in something that creates a Clojure representation of the same thing, perhaps a set of integers identifying primes, or a record that also knows the number of primes that has been computed, so that it can expand the sieve if someone ask for identification of a prime larger than the 1,000,000 you already had computed. Or just throws an error.

Teodor


#6

Thanks! I’ll give those bitsets a try next time I return to this problem. Seems it will also allow me to add some of the refinements that my solution makes hard to implement. Or, hard is not the right word, I did implement one of them, but it cost me performance.


#7

So I coudn’t keep my hands off this …

(defn bitset-sieve [n]
  (let [primes (java.util.BitSet. n)
        imax (-> n Math/sqrt int inc)]
    ;; Start by assuming they are all primes
    (.set primes 2 n)
    ;; Now, iterate through the primes.
    (loop [i (.nextSetBit primes 0)]
      ;; Imperative code ...
      (if (or (> i imax)
              (= i -1))
        primes
        (do
          (doseq [j (range (+ i i) n i)]
            ;; Not prime -- multiple of i!
            (.clear primes j))
          (recur (.nextSetBit primes (inc i))))))))

(defn bitset->vec [bs]
  (->> (range (.size bs))
       (filter (fn [x] (.get bs x)))
       (into [])))

With that implementation, I take 20-50 msecs to produce a sieve up to a million. If I mash it into a vector afterwards (bitset->vec), it takes me 1220 msecs, so it seems faster to avoid immutable data structures here. I do start out with all the numbers (range (.size bs)), so that could be sped up, but I think I’d rather avoid creating the vector.

Note that I also use the precomputed prime sequence to jump ahead (recur (.nextSetBit primes (inc i))), not bothering to touch what I know not to be primes.

(defn make-prime-checker [lim]
  (let [sieve (bitset-sieve lim)]
    (fn [n]
      (assert (pos-int? n))
      (assert (< n lim))
      (.get sieve n))))

(time
 (let [checker (make-prime-checker (* 1000 1000))]
   (doseq [i [1001 10321 99097]]
     (prn i 'prime? (checker i))
     )))
;; 1001 prime? false
;; 10321 prime? true
;; 99097 prime? false
;; "Elapsed time: 22.153831 msecs"

Note to self: imperative Clojure code may be fast, and may also hurt the eyes. And I should probably check the result against a list of primes, but now it’s bedtime!


#8

Thanks for sharing that!

I wonder if you were too quick to judge your results from realizing the vector. I tried this:

(defn bitset-sieve-2 [n]
  (let [primes (java.util.BitSet. n)
        imax (-> n Math/sqrt int inc)]
    ;; Start by assuming they are all primes
    (.set primes 2 n)
    ;; Now, iterate through the primes.
    (loop [i (.nextSetBit primes 0)]
      ;; Imperative code ...
      (if (or (> i imax)
              (= i -1))
        (filter #(.get primes %)
                (range 2 n))
        (do
          (doseq [j (range (* i i) n i)]
            ;; Not prime -- multiple of i!
            (.clear primes j))
          (recur (.nextSetBit primes (inc i))))))))

And it gives me the seq from sieving through the first one million in 150 ms. (Just returning the bitset takes 90 ms on my machine.) Also note that this contains a slight refinement of your solution starting testings from i squared, which speeds things up a tiny bit when I try it.

I saw a solution using (boolean-array) that is really fast. So made a version of your solution using that instead:

(defn boolean-array-sieve [n]
  ;; Start by assuming they are all primes
  (let [primes (boolean-array n true)
        imax (-> n Math/sqrt int inc)]
    ;; Now, iterate through the primes.
    (loop [i 2]
      ;; Imperative code ...
      (if (or (> i imax)
              (= i -1))
        (filter #(.get primes %)
                (range 2 n))
        (do
          (doseq [j (range (* i i) n i)]
            ;; Not prime -- multiple of i!
            (aset primes j false))
          (recur (inc i)))))))

It produces the same sieve in 220 ms on my machine. It could be that the loss is due to that I couldn’t figure out an equivalent to .nextSetBit for boolean arrays.


#9

Nice find. I think you’re returning a sequence, though, and not a vector. Not sure how much that would affect performance, and returning a sequence is probably okay.

The .nextSetBit part allows skipping checking numbers like 4, 6 after we have ruled them out by iterating from 2. I think we could achieve something similar with a third condition in the if.


#10

I should have mentioned the sequence vs vector. I could not measure any difference in performance, when I tried it. So I went with less code. :smile:


#11

This week’s PurelyFunctional.tv newsletter gave some other solutions:

You were mentioned, @PEZ!


#12

I’m famous!!! Which was the whole point of course. :metal:

It was Val’s solution that pointed me at boolean arrays. I have now tweaked the array version of your solution so that it beats Val’s, even if just barely. Funny observation is that it looks very similar to his, so maybe I became more inspired by his solution than I first realized. But I’m stealing with pride. The interesting thing is that my latest iterwtiin is without the major optimization of thwtviteration that made me famous. So i should be able to go below half of Val’s times. We’ll see.

Another observation is that if I switch to a BitSet in my latest iteratiion, I lose some little performance.

Now I must try to understand the take away Eric mentions as well. What a rabbit hole! :grinning:


#13

So, this is my latest (and maybe my last for a while) iteration:

(defn sieve [^long n]
  (let [primes (boolean-array (inc n) true)
        sqrt-n (int (Math/ceil (Math/sqrt n)))]
    (if (< n 2)
      '()
      (loop [p 3]
        (if (< sqrt-n p)
          (concat '(2)
                  (filter #(aget primes %)
                          (range 3 (inc n) 2)))
          (do
            (when (aget primes p)
              (loop [i (* p p)]
                (if (<= i n)
                  (do
                    (aset primes i false)
                    (recur (+ i p))))))
            (recur  (+ p 2))))))))

It produces primes in the pace of 1ms per 100K sieved, on my old machine. (10ms for a 1 million, 100ms for 10 million, and so on.) Then if the sequence is realized that takes 2ms per 200K.

On my machine, it is three times faster than Val’s contribution. The main differences from that solution are:

  • The type hint on n
  • Skipping sieving through the even numbers
  • Works for n == 2 and n == some other prime (Eric’s test bench lacks these checks).

Boy, did I enjoy this puzzle!


#14

Nice! Quite some speedup.


#15

Well, I’m back. :blush:

First: I made a test with my latest implementation in ClojureScript. It works almost out-of-the-box, you only need to use int-array instead, and performance is quite good, even if not as fast as on the jvm. Thinking that for cljc a reader conditional can let you have the cake and eat it. It did seem flaky on cljs, though… If I find some time, I’ll investigate that flakyness.

In that process I returned to my first implementation of the sieve and it got totally different when I had the experience from many more iterations. I decided to try optimize for readability and here’s the result, implementing the description from that Wikipedia page, including refinements, in an almost more readable form than the wiki text (no pun intended):

(defn sieve [n]
  (if (< n 2)
    ()
    (let [sqrt-n (Math/sqrt n)]
      (loop [primes (set (range 3 (inc n) 2))
             p 3]
        (if-not (< p sqrt-n)
          (concat '(2) (sort primes))
          (recur (difference primes (set (range (* p p) n p))) (inc p)))))))

It loses performance even from my first iteration, probably mostly because I realize a quite huge sequence of primes to remove from the found primes in that innermost loop. But it is still nice to see the algorithm this clearly, w/o it being as imperative as the wiki page describes it.

Producing the sieve for 1 mil, using this on ClojureScript, only works sometimes. Which might be an nrepl thing, I am not sure.


#16

Looked at my ”more readable” implementation again and saw that I did not fully utilize the refinement of not considering even numbers, and could more than double its performance:

(defn sieve [n]
  (if (< n 2)
    ()
    (let [sqrt-n (Math/sqrt n)]
      (loop [primes (set (range 3 (inc n) 2))
             p 3]
        (if-not (< p sqrt-n)
          (concat '(2) (sort primes))
          (recur (difference primes (set (range (* p p) n (+ p p)))) (+ p 2)))))))

It now sieves through 100K in 110 ms on my machine.

Since I had part of the same bug in my fastest implementation I fixed that too:

(defn sieve [^long n]
  (let [primes (boolean-array (inc n) true)
        sqrt-n (int (Math/ceil (Math/sqrt n)))]
    (if (< n 2)
      '()
      (loop [p 3]
        (if (< sqrt-n p)
          (concat '(2)
                  (filter #(aget primes %)
                          (range 3 (inc n) 2)))
          (do
            (when (aget primes p)
              (loop [i (* p p)]
                (if (<= i n)
                  (do
                    (aset primes i false)
                    (recur (+ i p p))))))
            (recur  (+ p 2))))))))

As expected the gain is less, only 10%, but anyway :smile:. 100K in 2.9 ms.