Extract text content from Hickory data using Specter


I have the following data structure:

    (def root
      {:tag :a,
       :content [{:tag :b,
                  :content [{:tag :c, :content [1 2 "da" {:tag :d, :content ["hello" "c"]}]}]}]})

How can I select or extract the text from the above data structure? I tried something like the following, but couldn't get it to work:

    (select [??SOME-SELECTOR] root)  ;; Couldn't figure out what to put here


Welcome to Clojureverse!

A reason you haven’t got any replies yet might be that you haven’t described what you expect to get out of your call. Should your call return just a list of the strings like ["da" "hello" "c"]? Or do you want to keep any other information?

From a quick review, I couldn’t find any way to do “arbitrary depth” traversals with Specter. Here’s a way to get all strings recursively from a map using the standard library:

(require '[clojure.walk])

(def root
  {:tag :a,
   :content [{:tag :b,
              :content [{:tag :c, :content [1 2 "da" {:tag :d,
                                                      :content ["hello" "c"]}]}]}]})

(defn filter-recursive [pred coll]
  (let [matches (atom [])]
    (clojure.walk/postwalk (fn [el]
                             (when (pred el)
                               (swap! matches conj el))
                             el) ; return el unchanged so the walk continues
                           coll)
    @matches))

(filter-recursive string? root)
;; => ["da" "hello" "c"]

;; Or try to handle just the "valid" nodes
(->> root
     (filter-recursive (fn [m]
                         (and (map? m)
                              (contains? m :content))))
     (map :content)
     (mapcat #(filter string? %)))
;; => ("hello" "c" "da")
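
If you'd rather avoid the atom, here is a minimal sketch of the same idea using tree-seq from clojure.core. It assumes every map keeps its children under :content, as in your example; content-strings is just a name I made up for illustration.

```clojure
(def root
  {:tag :a
   :content [{:tag :b
              :content [{:tag :c
                         :content [1 2 "da" {:tag :d
                                             :content ["hello" "c"]}]}]}]})

(defn content-strings
  "Returns every string reachable through :content vectors, depth-first."
  [node]
  (->> (tree-seq map? :content node) ; maps are branches, :content holds the children
       (filter string?)))

(content-strings root)
;; => ("da" "hello" "c")
```

Because tree-seq only descends through :content, strings stored under other keys are ignored.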

Does this work for you? It doesn’t use Specter, but personally, I wouldn’t pull in a library for this. Feel free to ask if you have any questions.


Thanks! Yes, I only need the strings in the :content vectors.

I finally managed to get the following working with Specter:

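(require '[com.rpl.specter :refer [recursive-path cond-path select ALL STAY]])

;; Recursive path: descend into vectors and :content, stop at strings
(def STR-VAL
  (recursive-path [] p (cond-path
                        vector? [ALL p]
                        map? [:content ALL p]
                        string? STAY)))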

(select STR-VAL root)
;; ["da" "hello" "c"]

Glad you got it working! It seems recursive-path was the magic sauce. For further reading, the Specter wiki page "Using Specter Recursively" seems to cover our use case.
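
For anyone wondering what the recursive path is doing, here is a rough plain-Clojure sketch of the same traversal. It is not how Specter implements it, just a hand-written recursion mirroring the three cond-path branches; str-vals is a made-up name, and anything that is not a vector, map, or string is simply skipped.

```clojure
(def root
  {:tag :a
   :content [{:tag :b
              :content [{:tag :c
                         :content [1 2 "da" {:tag :d
                                             :content ["hello" "c"]}]}]}]})

(defn str-vals
  "Collects strings by recursing through vectors and :content values."
  [x]
  (cond
    (vector? x) (mapcat str-vals x)            ; like vector? [ALL p]
    (map? x)    (mapcat str-vals (:content x)) ; like map? [:content ALL p]
    (string? x) [x]                            ; like string? STAY
    :else       []))                           ; no branch matched: stop here

(str-vals root)
;; => ("da" "hello" "c")
```

The Specter version has the advantage that the same path can also be used for updates, not just selection.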