How to efficiently convert a Java BitSet to a vector?

I’m stumbling across a strange (to me) performance issue, trying to convert a BitSet to a vector using this approach:

(defn bitset->vec
  [bits]
    (->> bits
       .size
       range
       (filter (fn [x] (.get bits x)))
       (into [])))

(comment
  (def bits (java.util.BitSet.))
  (.set bits 0 499999)
  (time (count (bitset->vec bits))))

"Elapsed time: 3258.975805 msecs"

Why does that takes such a long time?

Here is a faster way of doing it:

(defn bitset->vec2
  [bits]
  (as-> (.toString bits) v
    ((fn [s] (subs s 1 (dec (count s)))) v)
    (string/split v #", ")
    (mapv #(Integer/parseInt %) v)))

(comment
  (time (count (bitset->vec2 bits))))

"Elapsed time: 180.76525 msecs"

But it feels a bit insane to have to resort to that…

How would you do it?

Why are you converting to a vector in the first place? Given that Clojure defaults to using longs for numbers that would mean using 64bit to represent one bit? With the additional overhead of PersistentVector this seems like a mad way to represent the entire thing?

Why not use something like .toLongArray or .toByteArray from BitSet instead?

Yes, I should be, and am, looking for other ways to represent the data.

It represents pixels on a 2D grid, and I have an API that expects it as a vector of [x y] points. I’ve tried to go at it a couple of different ways, but haven’t come around the performance problem with reading the bits.

You’re just facing the reality of how slow reflection based interrop is:

(set! *warn-on-reflection* true)

(defn bitset->vec
   [bits]
   (->> bits
        .size
        range
        (filter (fn [x] (.get bits x)))
        (into [])))

;> Reflection warning, NO_SOURCE_PATH:6:24 - call to method get can't be resolved (target class is unknown).
;> Reflection warning, NO_SOURCE_PATH:6:8 - reference to field size can't be resolved.

(time (count (bitset->vec bits)))
"Elapsed time: 1016.910122 msecs"

(defn bitset->vec
   [^java.util.BitSet bits]
   (->> bits
        .size
        range
        (filter (fn [x] (.get bits x)))
        (into [])))

(time (count (bitset->vec bits)))
"Elapsed time: 33.692455 msecs"

Tl;DR: Just type hint bits and it will be fast again.

3 Likes

Oh, wow. Thanks!

*warn-on-reflections*… It showed the way for some more cpu cycles lost in that module! :smile:

Reflection is generally the #1 reason for slow Clojure code. Especially when used inside loops.

Other things to keep in mind sometimes is the use of checked math, for math heavy code, disabling the checks for overflows can help speed things up.

The fact that some reflection doesn’t show up with warn-on-reflection, specifically aget and aset don’t, and their multi-arity variant uses reflection under the hood making them super slow.

Then using sequences can slow things a bit if a lot of intermediate data is created, moving to transducers can help speed it up. Or loop/recur for utmost performance.

When using persistent data-structures and doing a lot of modification, transient can help speed things up. And some problems that really need fast updates or inserts/deletes can benefit from switching to arrays or using Java’s mutable data-structures instead.

Finally, primitive calls to loop/recur and functions can help speed boxing overhead in code that does a lot of primitive data manipulation.

Oh and I guess there’s also the use of protocols to speed up polymorphism if that’s an issue.

Anyways, for all these, I’d only reach for them if I notice slowness. And I like to use a profiler to identify where to improve things, such as https://github.com/clojure-goes-fast/clj-async-profiler or Visual VM

2 Likes

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.