Turn a “flat” data structure into a “normalized” one

Say I have a data structure like:

["2021-01-01" "2022-01-01" "2022-01-02" "2022-01-03" "2022-02-01" "2022-02-02"]

How would I turn it into:

  {2021 {1 [1]}
   2022 {1 [1 2 3]
         2 [1 2]}}


I am looking for what is idiomatic Clojure… that is to say, I know how to do this, but I’m looking for something that is a beautiful piece of code, as I think there must be an elegant way of expressing this in Clojure.

Try this:

(require '[clojure.string :as s])

(defn normalize [dates]
  (->> dates
       (map #(mapv parse-long (s/split % #"-")))
       (group-by (juxt first second))
       (reduce-kv (fn [m k v] (assoc-in m k (mapv #(% 2) v))) {})))

(normalize ["2021-01-01" "2022-01-01" "2022-01-02" "2022-01-03" "2022-02-01" "2022-02-02"])
=> {2021 {1 [1]}, 2022 {1 [1 2 3], 2 [1 2]}}
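
For anyone following along, here is a rough step-by-step sketch of what that pipeline does, with the intermediate values written out. The names dates, parsed and grouped are only illustrative and are not part of the answer above; parse-long assumes Clojure 1.11 or later.

```clojure
(require '[clojure.string :as s])

(def dates ["2021-01-01" "2022-01-01" "2022-01-02" "2022-01-03" "2022-02-01" "2022-02-02"])

;; Step 1: each string becomes a [year month day] vector of longs.
(def parsed (map #(mapv parse-long (s/split % #"-")) dates))
;; => ([2021 1 1] [2022 1 1] [2022 1 2] [2022 1 3] [2022 2 1] [2022 2 2])

;; Step 2: (juxt first second) yields [year month], which group-by uses as the key.
(def grouped (group-by (juxt first second) parsed))
;; => {[2021 1] [[2021 1 1]],
;;     [2022 1] [[2022 1 1] [2022 1 2] [2022 1 3]],
;;     [2022 2] [[2022 2 1] [2022 2 2]]}

;; Step 3: each [year month] key doubles as an assoc-in path;
;; #(% 2) calls the vector as a function to pick the day at index 2.
(reduce-kv (fn [m k v] (assoc-in m k (mapv #(% 2) v))) {} grouped)
;; => {2021 {1 [1]}, 2022 {1 [1 2 3], 2 [1 2]}}
```

The trick that makes it compact is that the grouping key and the nested path are the same vector, so no extra bookkeeping is needed in the final reduce.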

My friend, the clarity with which you must understand Clojure, to suggest what you suggested, is impressive. I already doubt that anybody can do better!

I want to suggest that the fact that I am spending time figuring out how to do this rather basic data manipulation, after years of learning Clojure, is something holding our language back.

Parsing as datetimes produces results similar to splitting on -. I wanted to try updating map values instead of reducing to a new map.

(require '[clojure.instant :refer [parse-timestamp]])

(defn normalize [dates]
  (let [third #(% 2)
        years (->> (mapv #(parse-timestamp vector %) dates)
                   (group-by first))
        months #(group-by second %)
        days #(update-vals % (partial mapv third))]
    (update-vals years (comp days months))))

(def test-dates ["2021-01-01" "2022-01-01" "2022-01-02" "2022-01-03" "2022-02-01" "2022-02-02"])

(normalize test-dates)
;; => {2021 {1 [1]}, 2022 {1 [1 2 3], 2 [1 2]}}
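
To see why parsing as a datetime gives much the same thing as splitting on -, it may help to look at what parse-timestamp hands to vector for a single date. This is a small sketch; the name fields is only illustrative. The entries after the first three are time and offset fields, which is why the code above only looks at positions 0, 1 and 2.

```clojure
(require '[clojure.instant :refer [parse-timestamp]])

;; parse-timestamp calls its first argument with the parsed fields,
;; starting with years, months and days; `vector` just collects them.
(def fields (parse-timestamp vector "2022-01-02"))

;; The first three entries are the same numbers that splitting on "-" gives.
(subvec fields 0 3)
;; => [2022 1 2]
```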

How can I improve this?
It becomes nested access gymnastics fast.

No group-by, just a single pass using reduce/update-in and map, in both seq and transducer versions. I really wish clojure.core had injected function arity information in fn (actually fn* I think) forms so we could dispense with the need for completing entirely and just infer it from the arities provided.

(def xs ["2021-01-01" "2022-01-01" "2022-01-02" "2022-01-03" "2022-02-01" "2022-02-02"])

(->> xs
     (map #(map parse-long (clojure.string/split % #"-")))
     (reduce (fn [acc [y m d]]
               (update-in acc [y m] (fn [x] (conj (or x []) d)))) {}))

(transduce
 (map #(eduction (map parse-long) (clojure.string/split % #"-")))
 (completing
  (fn [acc [y m d]]
    (update-in acc [y m] (fn [x] (conj (or x []) d)))))
 {} xs)

I bet meander or similar libraries (maybe specter or even malli) offer some really nice declarative ways to do this, but whether they are idiomatic is up for debate.


Thanks everyone, some solid food-for-thought here!

The best friend of update-in is probably fnil, which makes it very convenient to supply an initial value when the existing one is nil. (fnil conj []) initializes a vector when reaching an empty key.
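
As a minimal sketch of that suggestion, here is the single-pass reduce from earlier in the thread with (fnil conj []) in place of the hand-written (or x []) check. The name normalize-fnil is made up for this example, and parse-long assumes Clojure 1.11 or later.

```clojure
(require '[clojure.string :as str])

;; Hypothetical name; same single-pass shape as the reduce version above.
(defn normalize-fnil [dates]
  (reduce (fn [acc date]
            (let [[y m d] (map parse-long (str/split date #"-"))]
              ;; update-in passes the old value (nil for a new [y m] path)
              ;; and d to the function; fnil swaps the nil for [].
              (update-in acc [y m] (fnil conj []) d)))
          {}
          dates))

(normalize-fnil ["2021-01-01" "2022-01-01" "2022-01-02" "2022-01-03" "2022-02-01" "2022-02-02"])
;; => {2021 {1 [1]}, 2022 {1 [1 2 3], 2 [1 2]}}
```

Because update-in forwards extra arguments to the update function, the anonymous (fn [x] ...) wrapper disappears as well.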