Mapv & vec

taf · March 22, 2020, 3:50am

looking at mapv:

(defn mapv
  "Returns a vector consisting of the result of applying f to the
  set of first items of each coll, followed by applying f to the set
  of second items in each coll, until any one of the colls is
  exhausted.  Any remaining items in other colls are ignored. Function
  f should accept number-of-colls arguments."
  {:added "1.4"
   :static true}
  ([f coll]
     (-> (reduce (fn [v o] (conj! v (f o))) (transient []) coll)
         persistent!))
  ([f c1 c2]
     (into [] (map f c1 c2)))
  ([f c1 c2 c3]
     (into [] (map f c1 c2 c3)))
  ([f c1 c2 c3 & colls]
     (into [] (apply map f c1 c2 c3 colls))))

looking at vec:

(defn vec
  "Creates a new vector containing the contents of coll. Java arrays
  will be aliased and should not be modified."
  {:added "1.0"
   :static true}
  ([coll]
   (if (vector? coll)
     (if (instance? clojure.lang.IObj coll)
       (with-meta coll nil)
       (clojure.lang.LazilyPersistentVector/create coll))
     (clojure.lang.LazilyPersistentVector/create coll))))

how should one think about the roles of these two, and how they relate to each other?

for example, looking at:

(->> [42 424 2 332 45 6]
     (sort >)
     (mapv inc)
     (mapv #(* % 2)))

(->> [42 424 2 332 45 6]
     (sort >)
     (map inc)
     (map #(* % 2))
     vec)

how do these expressions compare?

andy.fingerhut · March 22, 2020, 6:14am

The return value of (vec coll) is pretty much the return value of (mapv identity coll), but the former is more efficient in many cases: it does not call identity on every element, and depending upon the type of coll, it might avoid other computational work that mapv would do. mapv is significantly more general, since it applies an arbitrary function f to each element and can take several collections at once.
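A minimal sketch of that comparison, for illustration only. The name xs is made up for this example; everything else is clojure.core.

```clojure
;; xs is a hypothetical sample input (a list, so vec has real work to do)
(def xs '(3 1 2))

;; Same contents either way; vec just skips the per-element call to identity
(def same-contents? (= (vec xs) (mapv identity xs)))

;; mapv can transform while it builds the vector...
(def incremented (mapv inc xs))                  ; [4 2 3]

;; ...and accepts several collections, stopping at the shortest one
(def summed (mapv + [1 2 3] [10 20 30 40]))      ; [11 22 33]
```

So vec is the tool for "give me this collection as a vector", and mapv is for "transform each element and give me a vector".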

andy.fingerhut · March 22, 2020, 6:16am

Also mapv is eager, realizing the entire result and returning it, whereas map returns a lazy sequence, which may or may not ever traverse its entire input, depending upon what you do with the returned value.
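One way to see the difference is to count how often the mapped function runs. This is only a sketch; calls and noisy-inc are names invented for the example.

```clojure
;; calls and noisy-inc are hypothetical helpers used to observe evaluation
(def calls (atom 0))

(defn noisy-inc [x]
  (swap! calls inc)   ; record that the function ran
  (inc x))

;; map only builds a lazy sequence; noisy-inc has not run yet
(def lazy-result (map noisy-inc [1 2 3]))
(def calls-after-map @calls)     ; 0

;; mapv runs noisy-inc on every element before returning
(def eager-result (mapv noisy-inc [1 2 3]))
(def calls-after-mapv @calls)    ; 3

;; Laziness also lets map work on an infinite input, as long as
;; only part of the result is consumed
(def first-three (take 3 (map inc (iterate inc 0))))   ; (1 2 3)
```

Calling mapv on an infinite sequence such as (iterate inc 0) would never return, because it tries to realize the whole result.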

dave.liepmann · March 22, 2020, 10:08am

I generally don’t use mapv inside data pipelines unless there’s a specific reason, so the second code example is closer to what I would reach for. Leave everything lazy until it needs to be realized.

Another difference is that I usually reach for (into []) over vec. I end up using into more than mapv just because into can always go at the end, whereas mapv might not make sense if the code changes so it’s not the outermost/last function. I believe I saw a mailing list discussion a couple years ago involving some of the core team that convinced me (into []) was modern idiom, but I can’t find a link and my memory could be completely backwards.
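For what it's worth, here is a small sketch of the options at the end of a pipeline. The names data, via-vec, via-into and via-xform are made up for the example, and the transducer form is an extra I'm adding, not something the discussion I remember necessarily covered.

```clojure
;; data is a hypothetical sample input
(def data [42 424 2 332 45 6])

;; vec and (into []) are interchangeable as the last step of a thread
(def via-vec  (->> data (map inc) vec))
(def via-into (->> data (map inc) (into [])))

;; into also has a transducer arity, which builds the vector directly
;; without an intermediate lazy sequence
(def via-xform (into [] (map inc) data))
```

All three produce [43 425 3 333 46 7].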

It’s probably an artifact of your example, but if sorting at the beginning and sorting at the end are equivalent, I would prefer sorting at the end. Here’s how I might write it:

(->> [42 424 2 332 45 6]
     (map (comp #(* % 2) inc)) ; do all the calculation
     (sort >) ; prepare for output: sort then put in the desired data structure
     (into []))

I considered not threading the expressions, but since everything stays as a sequence until the end (which is a great reason not to mapv) I think threading is superior.
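As a quick check, the two expressions from the question and the rewrite above can be compared side by side. The def names here are invented for the sketch.

```clojure
(def data [42 424 2 332 45 6])

;; First expression from the question: two eager passes, two vectors built
(def eager-steps
  (->> data (sort >) (mapv inc) (mapv #(* % 2))))

;; Second expression: lazy steps, one vector built at the end
(def lazy-then-vec
  (->> data (sort >) (map inc) (map #(* % 2)) vec))

;; Rewrite: calculate first, sort last, then pour into a vector
(def sort-last
  (->> data (map (comp #(* % 2) inc)) (sort >) (into [])))

(def all-equal? (= eager-steps lazy-then-vec sort-last))   ; true
```

All three give [850 666 92 86 14 6]. Moving the sort to the end is only safe here because incrementing and doubling preserve the order of the numbers; with a function that doesn't, the results would differ.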

taf · March 23, 2020, 1:40am

absolutely love your take on things! thx for sharing!
