4-item limit on Mapcat

Webdev_Tory · May 28, 2019, 3:37am

I was running into a mysterious bug in my program, which populates generations of growth using mapcat and an atom. On whichever step was the last designated generation of the population growth, only the first four items were being processed (seriously throwing off my population count). I examined my code closely and then started looking into mapcat, where I finally found the single note at the bottom of the documentation here: https://clojuredocs.org/clojure.core/mapcat. Using that as inspiration, I solved my problem by adding a doall that forced evaluation. My question is: why is there a magic number 4 with mapcat, and is there some way I should have known about that?

My code for anyone interested:

(defn create-generations
  "Create `num-generations` generations for `username` base-name; if `num-generations` is 0, no action is taken"
  [username num-generations]
  (when (< 0 num-generations)
    (let [parents (create-couple {:username username}) ;; create-couple returns a tuple [person-a person-b]
          after-parents 2
          CURRENT-COUPLES (atom parents)]
      (doseq [n (range after-parents (inc num-generations))]
        (log/info (str "Adding " (* 2 (count @CURRENT-COUPLES)) " people for generation " n
                       " :\n\t" (prn-str @CURRENT-COUPLES)))
        (let [new-couples (doall (mapcat ;; << right here, without the doall, is where it was short-circuiting after 4
                                  (fn mapcat-generations [[father-id mother-id]]
                                    (log/info (str "Creating couple:" (prn-str {:fa father-id :mo mother-id})))
                                    (create-couple {:username username
                                                    :father-id father-id
                                                    :mother-id mother-id
                                                    :generation-n n}))
                                  @CURRENT-COUPLES))]
          (reset! CURRENT-COUPLES new-couples))))))

thheller · May 28, 2019, 8:28am

mapcat, like many other Clojure functions, returns a lazy sequence. Lazy sequences may sometimes be evaluated in “chunks” for performance reasons. I don’t know the specifics of mapcat, but that really doesn’t matter here.

Fundamentally, you should never put a lazy sequence into an atom if you intend to consume it elsewhere. doall forces a lazy seq to realize all of its elements. It is also quite common to use (into [] lazy-seq) or (vec lazy-seq), which forces the lazy seq into a vector (instead of a list).
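As a minimal sketch of the options above: `make-pair`, `calls`, and `state` are hypothetical names invented for this illustration (`make-pair` stands in for a side-effecting function such as create-couple). Each form realizes the whole result before it is stored in the atom.

```clojure
;; Hypothetical stand-in for a side-effecting function such as create-couple.
(def calls (atom 0))

(defn make-pair [x]
  (swap! calls inc)   ; the side effect that should run once per input item
  [x (* 10 x)])

(def state (atom nil))

;; Option 1: doall walks the whole lazy seq and returns it (still a seq).
(reset! state (doall (mapcat make-pair (list 1 2 3 4 5 6))))

;; Option 2: vec copies the results into a vector, which is never lazy.
(reset! state (vec (mapcat make-pair (list 1 2 3 4 5 6))))

;; Option 3: into [] does the same as vec here.
(reset! state (into [] (mapcat make-pair (list 1 2 3 4 5 6))))
```

With any of the three, every call to `make-pair` has already happened by the time `reset!` returns, so later readers of the atom see plain data rather than pending work.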

Webdev_Tory · May 28, 2019, 12:16pm

Ah! I think the key I was missing was that mapcat, like for, is lazy (which I hadn’t realized); I would have been much more cautious about basing my algorithm on a mutable thing had I realized that, which is why I ended up using doall (I could just as well have used into [] or vec, as you mentioned). I’d forgotten that laziness is one of the old “gotchas” I’ve heard new Clojure programmers bemoan.

didibus · May 28, 2019, 5:05pm

Actually, the issue with mapcat is that it’s not lazy enough, which is why it works for the first 4 elements.

There are a few things in Clojure which aren’t completely lazy, only mostly lazy.

I’m not sure of the cause of all of them, but I think it mostly boils down to apply not being lazy enough.

This is on top of chunked sequences.
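A small experiment can make this visible. This is only a sketch: `noisy`, `realized`, and the other names are made up for the illustration, and the exact number of elements processed early is an implementation detail (the ClojureDocs note mentioned at the top of the thread reports 4 for mapcat).

```clojure
(def realized (atom 0))

(defn noisy [x]
  (swap! realized inc)   ; count how many inputs have been processed
  [x])

;; (into () ...) builds a plain list, whose seq is not chunked, so any
;; early realization seen here comes from mapcat itself, not from chunking.
(def lazy-result (mapcat noisy (into () (range 10))))

;; Nothing has consumed lazy-result yet, but some inputs may already have
;; been processed just by constructing it.
(def before-forcing @realized)

;; dorun walks the whole sequence, forcing the remaining elements.
(dorun lazy-result)
(def after-forcing @realized)

(println "processed before forcing:" before-forcing)
(println "processed after forcing: " after-forcing)
```

If the input were a vector or a range instead, chunking would typically cause even more elements to be processed up front, which is the second effect mentioned above.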

In that sense, @thheller’s recommendation is probably the best: just avoid mixing side effects and laziness. Every time you have side effects, including I/O reads, make sure you properly force realization of the lazy seq, or use transducers or eager fns like mapv, run!, doseq, etc.

https://dev.clojure.org/jira/browse/CLJ-1218
https://dev.clojure.org/jira/browse/CLJ-1583
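The eager options named in this post could look roughly like this. `record!` and `seen` are hypothetical names used only for the sketch; the core functions (`mapv`, `into` with a `mapcat` transducer, `run!`, `doseq`) are standard in Clojure 1.7 and later.

```clojure
(def seen (atom []))

(defn record! [x]        ; hypothetical side-effecting function
  (swap! seen conj x)
  [x x])

;; mapv is eager and returns a vector of the results.
(def a (mapv record! [1 2 3]))

;; into with a mapcat transducer: an eager counterpart of (mapcat f coll).
(def b (into [] (mapcat record!) [4 5 6]))

;; run! and doseq are for side effects only; both return nil.
(run! record! [7 8])
(doseq [x [9 10]]
  (record! x))
```

In each case the side effects have all run by the time the form returns, so there is no partially realized sequence left to be forced later.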
