# Partition by capacity

I’ve implemented a solution to a problem I’m calling “partition by capacity” (described below), which I believe is correct but not particularly idiomatic. I’m wondering if others have suggestions for a cleaner implementation.

Partition by capacity
I have a sequence of objects (specifically maps), where certain keys will be vectors. I want to partition these sequences such that the total size of all these vectors within the partition does not exceed a pre-specified capacity. There will in general be multiple keys with different capacities. In essence this is related to a Multi-dimensional knapsack problem (I think), while hopefully avoiding the NP-hardness by only requiring partitioning in order.

An example (the first argument is the capacities):

``````(partition-by-capacity {:a 4 :b 3} [{:a [1 2 3 4] :b []} {:a [1] :b []}])
``````

returns (e.g. two partitions)

``````(({:a [1 2 3 4], :b []}) ({:a [1], :b []}))
``````

because the first map completely fills the first partition (just through `:a` in this case)

My solution
Below is my current implementation, which I believe works as required but presumably could be improved for readability (and quite likely efficiency)

``````(defn partition-by-capacity
[capacities coll]
(let [part (loop [ret []
cur 0
coll coll
;; Before adding anything, initial size is zero
current-size (zipmap (keys capacities) (repeat 0))]
(if (seq coll)
(let [[c & coll] coll
;; calculate the updated sizes if we add this newest
;; collection
new-size (reduce-kv (fn [m k v] (update m k + (count v)))
current-size
(select-keys c (keys capacities)))]
;; Check if any new size is over capacity
(if (seq (filter identity
(vals (reduce-kv (fn [m k v] (update m k > v))
new-size
capacities))))
;; If it is, start a new partition
(recur (conj ret (inc cur))
(inc cur)
coll
(reduce-kv (fn [m k v] (assoc m k (count v))) {} c))
;; Otherwise continue
(recur (conj ret cur) cur coll new-size)))
ret))]
;; partition by the generated assignments, making sure to drop the
;; annotations before returning
(->> (map assoc coll (repeat ::partition) part)
(partition-by ::partition)
(map #(map dissoc % (repeat ::partition))))))
``````
1 Like

Semantics aren’t clear to me for cases where partition size is exceeded or the entries are not granular enough to pack properly.

What is the result for ?

``````(partition-by-capacity {:a 3 :b 3} [{:a [1 2 3 4] :b []} {:a [1] :b []}])
``````

for now I’m allowing cases that exceed capacity to just create their own partition. so the solution returned by my current implementation, which I’m happy with, would be:

``````(({:a [1 2 3 4], :b []}) ({:a [1], :b []}))
``````

I’m not sure I understand your second case about granularity, could you give an example?

my loosely verified take:

``````(defn partition-by-capacity [capacities coll]
(let [aux  (fn aux [caps part xs]
(if-let [x (first xs)]
(let [new-caps (reduce-kv
(fn [acc k v]
(let [remaining (- (acc k) (count v))]
(if (pos? remaining)
(assoc acc k remaining)
(reduced :empty))))
caps x)]
(if (= new-caps :empty)
(lazy-seq (cons (conj part x) (aux capacities [] (rest xs))))
(aux new-caps (conj part x) (rest xs))))
[part]))]
(aux capacities [] coll)))
``````
1 Like

this is great, thank you!

As a funny addendum, I looked at this and said “why not just leverage `partition-by` and get a transducer variant for free…just supply a custom key function…?”

So

``````(defn capacity-key [capacities]
(let [caps      (atom capacities)
n         (atom 0)]
(fn key-fn [x]
(let [p @n
res (reduce-kv
(fn [acc k v]
(let [remaining (- (acc k) (count v))]
(if (pos? remaining)
(assoc acc k remaining)
(reduced nil))))
@caps x)]
(if res
(reset! caps res)
(do (reset! caps capacities)
(swap! n inc)))
p))))

;;invokes the seq path of partition-by...this will matter later
(defn partition-by-capacity2 [capacities xs]
(partition-by (capacity-key capacities) xs))

;;some testing utils.
(def raw (->> (fn []
{:a (vec (range 1 (inc (rand-int 4))))
:b (vec (range 1 (inc (rand-int 3))))})
(repeatedly 10)))

(defn check [xs]
(for [part xs]
(->> part
(reduce (fn [acc {:keys [a b]}]
(-> acc
(update :a + (count a))
(update :b + (count b)))) {:a 0 :b 0}))))

(check (partition-by-capacity {:a 4 :b 3} raw))
;;({:a 5, :b 2} {:a 5, :b 4} {:a 2, :b 3} {:a 2, :b 3})
(check (partition-by-capacity2 {:a 4 :b 3} raw))
;;({:a 5, :b 2} {:a 2, :b 0} {:a 1, :b 2} {:a 2, :b 2} {:a 2, :b 1} {:a 0, :b 2} {:a 2, :b 2} {:a 0, :b 1})
(check (into [] (partition-by (capacity-key {:a 4 :b 3})) raw))
;;({:a 5, :b 2} {:a 5, :b 4} {:a 2, :b 3} {:a 2, :b 3})
``````

So I think the subtle issue I forgot about (and an undocumented one) is that the seq version of `partition-by` will invoke `f` up to 2x for an entry (at the partition boundary). The transducer variant does not. So we get an oddity where the legacy seq variant ends up giving unexpected behavior when we use a key-fn that has internal state and is expected to be called 1x per entry, while the transducer version is fine. I think there was a semi-recent thread where this popped up. In any case, here’s another way to do it - with caveats. The seq implementation could probably be made to conform though.

oh yea that’s great, I had been trying to think up a good way to make a key function for this! I definitely would have had a hard time debugging that issue though.

I also found a strange behavior from your first implementation, which for the life of me I can’t track down the source of:

``````(partition-by-capacity {:a 4, :b 3}
[{:a [], :b [1 2]} {:a [1 2 3], :b []} {:a [1], :b [1]}
{:a [1 2 3], :b []}])
``````

will return (the last element is not in a vector)

``````([{:a [], :b [1 2]} {:a [1 2 3], :b []} {:a [1], :b [1]}]
{:a [1 2 3], :b []})
``````

it seems to be a problem with ending the on a partition with only one element, as this is fine:

``````(partition-by-capacity {:a 4, :b 3}
[{:a [], :b [1 2]} {:a [1 2 3], :b []} {:a [1], :b [1]}
{:a [1 2 3], :b []} {:a [1 2 3], :b []}])
``````

returns

``````([{:a [], :b [1 2]} {:a [1 2 3], :b []} {:a [1], :b [1]}]
[{:a [1 2 3], :b []} {:a [1 2 3], :b []}])
``````

Yeah it was a screwup, should have returned [part] in that case. I updated the original answer.