Extract text content from Hickory data using Specter

#1

I have the following data structure:

    (def root
      {:tag :a,
       :content
       [{:tag :b,
         :content
         [{:tag :c, :content [1 2 "da" {:tag :d, :content ["hello" "c"]}]}]}]})

How can I select or extract text from the above data structure? I tried something like the following, but it didn't work:

    (select [??SOME-SELECTOR  ;; Couldn't figure out what to put here
             :content
             ALL
             string?]
            root)
#2

Hello!

Welcome to Clojureverse!

A reason you haven’t got any replies yet might be that you haven’t described what you expect to get out of your call. Should your call return just a list of the strings like ["da" "hello" "c"]? Or do you want to keep any other information?


From a quick review, I couldn’t find any way to do “arbitrary depth” traversals with Specter. Here’s a way to get all strings recursively from a map using the standard library:

    (require '[clojure.walk])

    (def root
      {:tag :a,
       :content
       [{:tag :b,
         :content
         [{:tag :c, :content [1 2 "da" {:tag :d,
                                        :content ["hello" "c"]}]}]}]})

    (defn filter-recursive [pred coll]
      (let [matches (atom [])]
        (clojure.walk/postwalk (fn [el]
                                 (when (pred el)
                                   (swap! matches conj el))
                                 el)
                               coll)
        @matches))

    (filter-recursive string? root)
    ;; => ["da" "hello" "c"]

    ;; Or try to handle just the "valid" nodes
    (->> root
         (filter-recursive (fn [m]
                             (and (map? m)
                                  (contains? m :content))))
         (map :content)
         (mapcat #(filter string? %)))
    ;; => ("hello" "c" "da")

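For comparison, here is a shorter standard-library sketch of the same idea that uses `tree-seq` instead of collecting into an atom. It assumes the same `root` as above; `all-strings` is a hypothetical helper name chosen for illustration, not something from the original code.

```clojure
(def root
  {:tag :a,
   :content
   [{:tag :b,
     :content
     [{:tag :c, :content [1 2 "da" {:tag :d, :content ["hello" "c"]}]}]}]})

;; Hypothetical helper: lazily walk every nested collection depth-first
;; (pre-order) and keep only the strings.
(defn all-strings [data]
  (filter string? (tree-seq coll? seq data)))

(all-strings root)
;; => ("da" "hello" "c")
```

Like `(filter-recursive string? root)`, this picks up every string anywhere in the structure, so it would also return string values stored under keys other than `:content`.
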
Does this work for you? It doesn’t use Specter, but personally, I wouldn’t pull in a library for this. Feel free to ask if you have any questions.

Teodor

#3

Thanks! Yes, I only need the strings in the :content vectors.

I finally got the following working with Specter:

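    (require '[com.rpl.specter :refer [recursive-path cond-path select ALL STAY]])

    ;; Recursive navigator: descend through vectors and through the
    ;; :content of maps, stopping at every string.
    (def STR-VAL
      (recursive-path [] p
        (cond-path vector? [ALL p]
                   map?    [:content ALL p]
                   string? STAY)))

    (select STR-VAL root)
    ;; => ["da" "hello" "c"]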
#4

Glad you got it working! recursive-path seems to have been the missing piece. For further reading, the Specter wiki page "Using Specter Recursively" covers this use case.
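
For anyone reading along without Specter, here is a rough plain-Clojure sketch of what that recursive path does, as I understand it: recurse into vectors and into the `:content` of maps, and keep only strings. The name `content-strings` is hypothetical, and this is only meant as an illustration, not a replacement for the Specter version.

```clojure
(def root
  {:tag :a,
   :content
   [{:tag :b,
     :content
     [{:tag :c, :content [1 2 "da" {:tag :d, :content ["hello" "c"]}]}]}]})

;; Hypothetical helper mirroring the three cond-path branches:
;; vectors -> recurse into each element,
;; maps    -> recurse into the elements of :content only,
;; strings -> keep; anything else is dropped.
(defn content-strings [node]
  (cond
    (vector? node) (mapcat content-strings node)
    (map? node)    (mapcat content-strings (:content node))
    (string? node) [node]
    :else          []))

(content-strings root)
;; => ("da" "hello" "c")
```

Because maps are only entered through `:content`, strings stored under other keys are ignored, which is the difference from walking every value in the structure.
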