4-item limit on Mapcat

I was running into a mysterious bug in my program, which populates generations of growth using mapcat and an atom. On the last designated step of the population growth, it was only processing the first four items, which seriously threw off my population count. I examined my code closely and then started looking into mapcat, where I finally found the single note at the bottom of the documentation here: https://clojuredocs.org/clojure.core/mapcat. Using that as inspiration, I solved my problem by adding a doall to force evaluation. My question is: why is there a magic number 4 with mapcat, and is there some way I should have known about that?

My code for anyone interested:

(defn create-generations
  "Create `num-generations` generations for `username` base-name; if `num-generations` is 0, no action is taken"
  [username num-generations]
  (when (< 0 num-generations)
    (let [parents (create-couple {:username username}) ;; create-couple returns a tuple [person-a person-b]
          after-parents 2
          CURRENT-COUPLES (atom parents)]
      (doseq [n (range after-parents (inc num-generations))]
        (log/info (str "Adding " (* 2 (count @CURRENT-COUPLES)) " people for generation " n
                       " :\n\t" (prn-str @CURRENT-COUPLES)))
        (let [new-couples (doall (mapcat ;; << right here, without the doall, is where it was short-circuiting after 4
                                  (fn mapcat-generations [[father-id mother-id]]
                                    (log/info (str "Creating couple:" (prn-str {:fa father-id :mo mother-id})))
                                    (create-couple {:username username
                                                    :father-id father-id
                                                    :mother-id mother-id
                                                    :generation-n n}))
                                  @CURRENT-COUPLES))]
          (reset! CURRENT-COUPLES new-couples))))))

mapcat, like many other Clojure functions, returns a lazy sequence. Lazy sequences may sometimes be evaluated in “chunks” for performance reasons. I don’t know the specifics of mapcat, but that really doesn’t matter here.

Fundamentally, you should never put a lazy sequence into an atom if you intend to consume it elsewhere. doall forces a lazy seq to realize all of its elements. It is also quite common to use (into [] lazy-seq) or (vec lazy-seq), which realizes the lazy seq into a vector (instead of a list).
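As a minimal sketch of that advice: `make-pair`, `calls`, and `state` below are hypothetical names made up for illustration (`make-pair` stands in for a side-effecting step such as create-couple). The point is only that `vec` realizes everything before the value reaches the atom.

```clojure
;; Hypothetical counter so we can see how often the function really ran.
(def calls (atom 0))

;; Hypothetical stand-in for a side-effecting step such as create-couple.
(defn make-pair [x]
  (swap! calls inc)
  [x (* 10 x)])

(def state (atom nil))

;; vec walks the whole lazy seq, so every call to make-pair
;; has already happened by the time reset! stores the result.
(reset! state (vec (mapcat make-pair (list 1 2 3 4 5 6))))

(println @calls)  ; 6
(println @state)  ; [1 10 2 20 3 30 4 40 5 50 6 60]
```

Swapping `vec` for `doall` would realize the elements just the same, but leave a seq in the atom rather than a vector.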


Ah! I think the key I was missing was that mapcat, like for, is lazy (which I hadn’t realized); I would have been much more cautious about basing my algorithm on a mutable thing had I realized that, which is why I ended up using doall (I could just as well have used into [] or vec, as you mentioned). I’d forgotten that laziness is one of the old “gotchas” I’ve heard new Clojure programmers bemoan.

Actually, the issue with mapcat is that it’s not lazy enough, which is why it works for the first 4 elements.
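Here is a rough way to observe this at the REPL. The names `tag`, `calls`, and `calls-before` are made up for the sketch, and the input is a plain list so that chunking is not a factor. The original poster saw four items processed; the exact number may depend on the Clojure version, so the sketch prints the count instead of assuming it.

```clojure
;; Hypothetical counter that records how many times the function has run.
(def calls (atom 0))

(defn tag [x]
  (swap! calls inc)  ; side effect: count each call
  [x])

;; A list is not a chunked seq, so any early calls come from mapcat itself.
(def result (mapcat tag (list 1 2 3 4 5 6 7 8)))

;; Nothing has consumed `result` yet, but some calls have already happened.
(def calls-before @calls)
(println "calls before consuming:" calls-before)

;; Forcing the seq runs the remaining calls.
(dorun result)
(println "calls after dorun:" @calls)
```

So without a doall (or similar), only that first handful of side effects runs, and the rest wait until something consumes the sequence.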

There are a few things in Clojure that aren’t completely lazy, just mostly lazy.

I’m not sure of the cause of all of them, but I think it mostly boils down to apply not being lazy enough.

This is on top of chunked sequences.
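Chunking can be seen with a similar sketch (again, `bump`, `calls`, and `calls-for-first` are hypothetical names). Asking for only the first element of a map over a range runs the function for a whole chunk, not just once. The chunk size is an implementation detail, so the count is printed rather than assumed.

```clojure
;; Hypothetical counter for the number of times the function has run.
(def calls (atom 0))

(defn bump [x]
  (swap! calls inc)
  x)

;; (range 100) yields a chunked seq, and map processes one chunk at a time,
;; so taking just the first element still calls bump many times.
(first (map bump (range 100)))

(def calls-for-first @calls)
(println "calls to get the first element:" calls-for-first)
```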

In that sense, @thheller’s recommendation is probably the best: just avoid mixing side effects and laziness. Every time you have side effects, including I/O reads, make sure you properly force realization of the lazy seq, or use transducers or eager fns like mapv, run!, doseq, etc.

https://dev.clojure.org/jira/browse/CLJ-1218
https://dev.clojure.org/jira/browse/CLJ-1583
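For reference, a small sketch of the eager options mentioned above. `make-pair`, `calls`, `pairs`, `squares`, and `seen` are hypothetical names for illustration only.

```clojure
;; Hypothetical counter and side-effecting function.
(def calls (atom 0))

(defn make-pair [x]
  (swap! calls inc)
  [x (- x)])

;; Transducer form of mapcat: into drives it eagerly and returns a vector.
(def pairs (into [] (mapcat make-pair) (list 1 2 3)))

;; mapv is the eager, vector-returning counterpart of map.
(def squares (mapv #(* % %) [1 2 3]))

;; run! is for side effects only and returns nil.
(def seen (atom []))
(run! #(swap! seen conj %) pairs)

(println pairs)    ; [1 -1 2 -2 3 -3]
(println squares)  ; [1 4 9]
(println @calls)   ; 3
```

In each case all the work is done by the time the expression returns, so nothing is left pending inside a lazy seq.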

