Turn a “flat” data structure into a “normalized” one

rmschindler · November 28, 2022, 12:06am

Say I have a data structure like:

["2021-01-01" "2022-01-01" "2022-01-02" "2022-01-03" "2022-02-01" "2022-02-02"]

How would I turn it into:

{2021
  {1 [1]}
 2022
  {1 [1 2 3]
   2 [1 2]}}

?

I am looking for what is idiomatic Clojure… that is to say, I know how to do this, but I’m looking for something that is a beautiful piece of code, as I think there must be an elegant way of expressing this in Clojure.

MartinPuda · November 28, 2022, 12:44am

Try this:

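;; s is an alias for clojure.string; parse-long requires Clojure 1.11 or later
(require '[clojure.string :as s])

(defn normalize [dates]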
  (->> dates
       (map #(mapv parse-long (s/split % #"-")))
       (group-by (juxt first second))
       (reduce-kv (fn [m k v] (assoc-in m k (mapv #(% 2) v))) {})))

(normalize ["2021-01-01" "2022-01-01" "2022-01-02" "2022-01-03" "2022-02-01" "2022-02-02"])
=> {2021 {1 [1]}, 2022 {1 [1 2 3], 2 [1 2]}}
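
For anyone following along, here is a sketch of the same pipeline with each intermediate value given a name, so it is easier to see why the [year month] key produced by juxt can be handed straight to assoc-in as a path. The names dates, parsed, grouped, and result are made up for illustration, and parse-long assumes Clojure 1.11 or later.

```clojure
(require '[clojure.string :as s])

(def dates ["2021-01-01" "2022-01-01" "2022-01-02" "2022-01-03" "2022-02-01" "2022-02-02"])

;; Step 1: split each string on "-" and parse the parts into longs.
(def parsed (map #(mapv parse-long (s/split % #"-")) dates))
;; => ([2021 1 1] [2022 1 1] [2022 1 2] [2022 1 3] [2022 2 1] [2022 2 2])

;; Step 2: group by the [year month] pair, which doubles as an assoc-in path.
(def grouped (group-by (juxt first second) parsed))
;; => {[2021 1] [[2021 1 1]]
;;     [2022 1] [[2022 1 1] [2022 1 2] [2022 1 3]]
;;     [2022 2] [[2022 2 1] [2022 2 2]]}

;; Step 3: for each [year month] key, keep only the day (index 2) and
;; place the resulting vector at that path in a fresh map.
(def result
  (reduce-kv (fn [m k v] (assoc-in m k (mapv #(% 2) v))) {} grouped))
;; => {2021 {1 [1]}, 2022 {1 [1 2 3], 2 [1 2]}}
```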

rmschindler · November 28, 2022, 12:56am

My friend, the clarity with which you must understand Clojure, to suggest what you suggested, is impressive. I already doubt that anybody can do better!

rmschindler · November 28, 2022, 1:47am

I want to suggest that the fact that I am spending time figuring out how to do this rather basic data manipulation, after years of learning Clojure, is something holding our language back.

kees · November 28, 2022, 4:46am

Parsing as datetimes produces results similar to splitting on -. I wanted to try updating map values instead of reducing to a new map.

(require '[clojure.instant :refer [parse-timestamp]])

(defn normalize
  [dates]
  (let [third #(% 2)
        years (->> (mapv #(parse-timestamp vector %) dates)
                   (group-by first))
        months #(group-by second %)
        days #(update-vals % (partial mapv third))]
    (update-vals years (comp days months))))

(def test-dates ["2021-01-01" "2022-01-01" "2022-01-02" "2022-01-03" "2022-02-01" "2022-02-02"])

(normalize test-dates)
;; => {2021 {1 [1]}, 2022 {1 [1 2 3], 2 [1 2]}}

How can I improve this?
It becomes nested access gymnastics fast.

joinr · November 28, 2022, 5:26am

No group-by, just a single pass using reduce/update-in and map, in seq and transducer versions. I really wish clojure.core had injected function arity information into fn (actually fn*, I think) forms, so we could dispense with the need for completing entirely and just infer it from the arities provided.

(require 'clojure.string)

(def xs ["2021-01-01" "2022-01-01" "2022-01-02" "2022-01-03" "2022-02-01" "2022-02-02"])

;;seq
(->> xs
     (map #(map parse-long (clojure.string/split % #"-")))
     (reduce (fn [acc [y m d]]
               (update-in acc [y m] (fn [x] (conj (or x []) d)))) {}))

;;xducer
(transduce
 (map #(eduction (map parse-long) (clojure.string/split % #"-")))
 (completing
  (fn [acc [y m d]]
    (update-in acc [y m] (fn [x] (conj (or x []) d))))) {} xs)
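
On the completing remark: transduce calls the reducing function once more with a single argument when the input is exhausted, and completing adds that one-argument arity (identity by default) to a plain two-argument function. A minimal sketch of the hand-written equivalent, where add-date and the two def names are hypothetical and mapv is swapped in for eduction to keep it short:

```clojure
(require '[clojure.string :as str])

(def xs ["2021-01-01" "2022-01-01" "2022-01-02" "2022-01-03" "2022-02-01" "2022-02-02"])

;; Hypothetical name: a reducing function with the completion arity
;; written by hand, which is what `completing` would otherwise add.
(defn add-date
  ([acc] acc)                       ; completion step: return the result unchanged
  ([acc [y m d]]                    ; reduction step: append day d under [y m]
   (update-in acc [y m] (fn [x] (conj (or x []) d)))))

(def with-hand-written-arity
  (transduce (map #(mapv parse-long (str/split % #"-"))) add-date {} xs))

(def with-completing
  (transduce (map #(mapv parse-long (str/split % #"-")))
             (completing (fn [acc [y m d]]
                           (update-in acc [y m] (fn [x] (conj (or x []) d)))))
             {} xs))

(= with-hand-written-arity with-completing)
;; => true
```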

I bet meander or similar libraries (maybe specter or even malli) offer some really nice declarative ways to do this, but whether they are idiomatic is up for debate.

rmschindler · November 30, 2022, 12:20am

Thanks everyone, some solid food-for-thought here!

Linus_Ericsson · January 23, 2023, 11:59pm

The best friend of update-in is probably fnil, which makes it very convenient to supply an initial value when the current value is nil. (fnil conj []) initializes a vector when it reaches a key that has no value yet.

joinr:

(require 'clojure.string)

(def xs ["2021-01-01" "2022-01-01" "2022-01-02" "2022-01-03" "2022-02-01" "2022-02-02"])

;;seq
(->> xs
     (map #(map parse-long (clojure.string/split % #"-")))
     (reduce (fn [acc [y m d]]
               (update-in acc [y m] (fnil conj []) d)) {}))
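
To see what fnil contributes on its own, here is a small sketch outside the reduce. The name conj-vec is made up for illustration.

```clojure
;; fnil wraps a function so that a nil first argument is replaced by a default.
(def conj-vec (fnil conj []))   ; conj-vec is a hypothetical name

(conj-vec nil 1)      ;; => [1]      nil is replaced by [] before conj runs
(conj-vec [1 2] 3)    ;; => [1 2 3]  a non-nil argument is passed through

;; Without fnil, conj on nil yields a list rather than a vector:
(conj nil 1)          ;; => (1)

;; update-in passes nil for a missing path, so fnil supplies the vector:
(update-in {} [2022 1] (fnil conj []) 5)
;; => {2022 {1 [5]}}
```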
