# How to transform nested map into flat sequence of [path, value] pairs

Hi!

Some conversation on the Clojurians Slack got me curious about normalizing maps into a vector of [path value] pairs.

``````(defn todo [m])

;; such that

(= (todo
{:a 99
:x {:b 999
:y {:z 123
:w 456}}})
[[:a] 99
[:x :b] 999
[:x :y :z] 123
[:x :y :w] 456])
``````

How would you write `todo`?

On the Clojurians Slack, this started in a thread, then I asked a separate question.

Teodor

3 Likes

I resorted to dumb recursion:

``````ed. recovery implementation was incorrect per sample output, now is correct.
``````
``````(defn recover [xs]
(reduce (fn [acc x]
(if  (or (vector? x) (not (seq x)))
(into acc x)
(into acc (recover x))))
[] xs))

(defn normalize
([path m]
(if (map? m)
(for [[k v] m]
(normalize (conj path k) v))
[path m]))
([m] (-> (normalize [] m)
recover)))

(def m {:a 99, :x {:b 999, :y {:z 123, :w 456}}})
;;user=> (normalize m)
;;[[:a] 99 [:x :b] 999 [:x :y :z] 123 [:x :y :w] 456]
``````

seems like this is probably abstract-able into some kind of general reducible path walk thing.

1 Like

whereâ€™s the faster, asynchronous (and obviously more superior) transducer implementation?

@zcaudate I would love to see you give it a shot

Bonus points if arbitrary deep maps donâ€™t blow up the stack.

Have you tried already? Awesome. How deep do the maps have to get to blow up?

Iâ€™m not feeling masochistic enough to try but hereâ€™s a reduce implementation that may come in handy.

hereâ€™s a js version thatâ€™s faster:

``````function nested_keys(m,path){
let out = [];
for(let [k,v] of Object.entries(m)){
if((null != v) && ("object" == (typeof v)) && !Array.isArray(v)){
for(let e of nested_keys(v,[...path,k])){
out.push(e);
};
}
else{
out.push([[...path,k],v]);
}
};
return out;
}

nested_keys({"a":{"b":{"c":1,"d":2},"e":{"f":4,"g":5}}},[]);
=> [[["a", "b", "c"], 1]
[["a", "b", "d"], 2]
[["a", "e", "f"], 4]
[["a", "e", "g"], 5]]
``````

I didnâ€™t catch the arbitrary nesting requirement. I think 0 seqs, with the ability to handle arbitrary nesting (I also think the docstring for clj jvm iterate is outdated, as it returns an internally reducible object that can optionally be seq coerced) [happy to be shown otherwise]:

``````(require '[net.cgrand.xforms :as x])

(defn compute-paths [{:keys [paths]}]
(->> (x/for [[p c] paths]
(if (map? c)
(eduction cat
(x/for [[chld xs] c]
[(conj p chld) xs]))
[p c]))
(eduction cat (partition-all 2))))

(defn update-paths [{:keys [acc paths]} new-paths]
(group-by (fn [[_ v]]
(if (map? v) :mods :adds)) new-paths)
:paths mods}]
(if (empty? mods)
(reduced res)
res)))

;;swap (into [] (comp x/last (map :acc) cat cat)))) to eduction.
(defn normalize [m]
(->> {:acc [] :paths (into {} (map (fn [[k v]] [[k] v])) m)}
(iterate (fn [seed]
(->> seed
compute-paths
(update-paths seed))))
(eduction x/last (map :acc) cat cat)
(into [])))

;;user=> (normalize m)
;;[[:a] 99 [:x :b] 999 [:x :y :z] 123 [:x :y :w] 456]
``````
2 Likes

shorter, faster recursive impl with mutation (unsafe results), possible blown stack.

``````(require '[net.cgrand.xforms :as x])
(import '[java.util List ArrayList])

(defn normalize [m]
(let [aux (fn normalize! [^List acc ^List p m]
(if (map? m)
(->> (x/for [[k v] m]
(normalize! acc (doto p (.add k)) v))
(reduce (fn [_ _] acc)))
(aux (ArrayList.) (ArrayList.) m)))
``````
1 Like

I think this one will leverage some parallelism (although retaining order) while allowing arbitrary nesting depth. An unordered variant (pipeline preserves order) would get more gainz. Could be a fold variant around here somewhere.

``````(require '[net.cgrand.xforms :as x])
(require '[clojure.core.async :as a])

(defn normalize [m]
(let [out        (a/chan 10)
pending    (a/chan 10)
remaining  (atom 0)
push-work! (fn [n xs]
(do (swap! remaining + n)
(a/onto-chan! pending xs false)))
_          (push-work! (count m) (x/for [[k v] m] [[k] v]))
walk    (fn [[p xs]]
(let [res (if (map? xs)
(do (push-work! (count xs)
(x/for [[k v] xs]
[(conj p k) v]))
:ignore)
[p xs])
_   (when (zero? (swap! remaining dec))
(a/close! pending))]
res))]
(a/pipeline-blocking
(+ 2 (.availableProcessors (Runtime/getRuntime)))
out
(comp (map walk)
(filter #(not= % :ignore))
cat)
pending)
(->> out
(a/into [])
a/<!!)))
``````
2 Likes

Using core.async internally for this is really interesting - I would never have thought of that.

Fun toy problem.

1 Like

@joinr, itâ€™s really interesting what youâ€™ve done.

Iâ€™ll put what Iâ€™ve extracted out from your example as a pseudo lisp targeting js (hopefully itâ€™s pretty readable). Because JS is single threaded, Iâ€™m using a standard array for accumulation but on the JVM, either an atom or a concurrent deque would work.

The most important bit is the step function for `nested-keys-step-fn`.
It takes three arguments â†’ `args`, `queue` and accumulated `output`

It linearises a nested loop by taking an entry from a queue, doing an operation and putting more tasks on the queue if a child is an object:

``````(defn.js nested-keys-step-fn
[args queue out]
(var [path m] args)
(k/for:object [[k v] m]
(cond (k/obj? v)
;; add to queue if object
(x:arr-push queue [[(:.. path) k] v])

:else
;; add to accumulated output otherwise
(x:arr-push out [[(:.. path) k] v])))
(return [queue out]))
``````

Examples:

``````(walk-step-fn [[] {"a" {"b" 2 "c" 3}}]
[]
[])
=> [[[["a"] {"b" 2, "c" 3}]]  ;; queue
[]] ;; output

(walk-step-fn [["a"] {"b" 2, "c" 3}]
[]
[])
=> [[] ;; queue
[[["a" "b"] 2] [["a" "c"] 3]]] ;;output
``````

Because the step function turns a nested loop into a linear one, we can use a generic `walk` implementation (using iterators, transducers, parallell threads, whatever).

The simplest is a while loop:

``````(defn.js walk [step-fn queue]
(var out [])
(while (<  0 (k/len queue))
(var args (x:arr-pop queue))
(step-fn args queue out))
(return out))
``````

and nested keys can be defined in terms of the walk function:

``````(defn.js nested-keys
[m]
(return (-/walk -/walk-step-fn [[[] m]])))

(nested-keys {:a {:b 1 :c 2}
:d {:e 1 :f 2}})
=> [[["d" "e"] 1]
[["d" "f"] 2]
[["a" "b"] 1]
[["a" "c"] 2]]
``````

The simplest async version of walk on JS looks like this:

``````(defn.js walk-async [step-fn queue out]
(return (j/future
(var args (x:arr-pop queue))
(step-fn args queue out)
(if (< 0 (k/len queue))
(return (-/walk-async step-fn queue out))
(return out)))))

(defn.js nested-keys-async
[m]
(return (-/walk-async -/walk-step-fn [[[] m]] [])))
``````

and output:

``````(j/<! (-/nested-keys-async {:a {:b 1 :c 2}
:d {:e 1 :f 2}}))
[[["d" "e"] 1] [["d" "f"] 2] [["a" "b"] 1] [["a" "c"] 2]]
``````

I donâ€™t think it blows the stack (because of the micro-tasklet implementation) but am not completely sure.

1 Like

The runnable javascript is here:

``````function walk_step_fn(args,queue,out){
let [path,m] = args;
for(let [k,v] of Object.entries(m)){
if((null != v) && ("object" == (typeof v)) && !Array.isArray(v)){
queue.push([[...path,k],v]);
}
else{
out.push([[...path,k],v]);
}
};
return [queue,out];
}

function walk(step_fn,queue){
let out = [];
while(0 < (queue).length){
let args = queue.pop();
step_fn(args,queue,out);
}
return out;
}

function walk_async(step_fn,queue,out){
return new Promise(function (resolve,reject){
try{
resolve(      (function (){
let args = queue.pop();
step_fn(args,queue,out);
if(0 < (queue).length){
return walk_async(step_fn,queue,out);
}
else{
return out;
}
})());
}
catch(e){
reject(e);
}
});
}

function nested_keys(m){
return walk(walk_step_fn,[[[],m]]);
}

function nested_keys_async(m){
return walk_async(walk_step_fn,[[[],m]],[]);
}
``````

What part of â€śtreeâ€ť do you not respect? Resistance is futile!

1 Like

a simple and stack-safe solution

``````(defn todo [m]
(let [out (transient [])]
(loop [path []
stack (list m)]
(if (not-empty stack)
(let [[x & xs] stack]
(if (empty? x)
(if (not-empty path)
(recur (pop path) xs))
(let [[k v] (first x)
kvs (rest x)
next-path (conj path k)]
(if (map? v)
(do
(recur next-path (conj xs kvs v)))
(do
(conj! out [next-path v])
(recur path (conj xs kvs)))))))))
(persistent! out)))
``````
4 Likes

I think (in general) you need to put the `out` transient into an atom and `swap!` via `conj!` to mutate it (and capture the result), or include it in the loop bindings and explicitly pass the result of `conj!` to maintain a reference to the returned result. Depending on undefined behaviors, if you treat it as a â€śbash in placeâ€ť mutable container and apply updates via a side-effecting model without capturing the result of `conj!` (as in the current `do` form), you can end up with incorrect data since the original reference may be modified under the hood (the result of `conj!` - the current state of the transient - may be a different object than the original bound value - which is not reflected in the binding). I had this error affect me years ago; uncertain what the exact failure semantics are, but the docs deter the bash-in-place pattern: â€śNote in particular that transients are not designed to be bashed in-place. You must capture and use the return value in the next call. In this way, they support the same code structure as the functional persistent code they replace.â€ť

incorrect example reference

3 Likes

Ah, so each time you conj! onto a mutable container, you get the â€śnewâ€ť container back (but the last one canâ€™t be used), and have to use that?

Thanks for explaining.

Shouldnâ€™t this be possible to fix in @Josh_Lemerâ€™s answer by adding the transient as a loop/recur argument?

Shouldnâ€™t this be possible to fix in @Josh_Lemerâ€™s answer by adding the transient as a loop/recur argument?

Yes, that was an alternative in the original suggestion.

Using `swap!` and an atom probably requires slightly less transformation of the original solution though.

1 Like

Sorry I always forget about that!

``````(defn todo [m]
(loop [path []
stack (list m)
out (transient [])]
(if (not-empty stack)
(let [[x & xs] stack]
(if (empty? x)
(if (empty? path)
(persistent! out)
(recur (pop path) xs out))
(let [[[k v] & kvs] x
next-path (conj path k)]
(if (map? v)
(recur next-path (conj xs kvs v) out)
(recur path (conj xs kvs) (conj! out [next-path v])))))))))
``````
1 Like