How to search XML in cljs?

I’m having trouble searching XML in a purely client-side fashion. I’ve been vacillating between using the browser’s built-in DOM functionality, trying to leverage Google Closure, and trying to leverage clojure.data.xml. I can fetch and read the XML in each of these ways, but I’m struggling to search it. In my example, I want to find every <title> element and get its text content. Even this seems difficult, though. Here’s what I’ve scratched together so far, with limited success:

;; this is all cljs
(ns demo.search
  (:require [clojure.data.xml :as xml]))

;; With clojure.data.xml: parsing works, but the result is non-trivially nested and
;; there are no search capabilities (CSS/hiccup-style selectors would be best, or at least XPath)
(let [x (xml/parse-str "<title>Tech.ToryAnderson.com</title>")]
  (-> x :content) ; ("Tech.ToryAnderson.com")
  #_(js/console.log x))
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;; With raw JavaScript interop
(let [s "<title>Tech.ToryAnderson.com</title>"
      p (js/DOMParser.)
      doc (.parseFromString p s "text/xml")]
  (-> (.getElementsByTagName doc "title")
      ;; vec ; can't get to a place where I can use cljs `map`:
      ;; repl/invoke error Error: [object HTMLCollection] is not ISeqable
      (aget 0)
      .-innerHTML)) ; "Tech.ToryAnderson.com" ;; works for just one
;; but how to do this for a large collection with nested data? 

Yeah, navigating XML is tedious… Feel free to contribute a CLJS or CLJC engine to xml-pull :wink:


I’m thinking of trying to port Enlive to CLJS; it’s a real shame it isn’t there already.

Have you considered an XPath library?


Ooh – that’s lovely

You can convert the HTMLCollection to a Clojure sequence with array-seq and then map over it as you normally would. (querySelectorAll, used below, returns a NodeList instead, which array-seq handles the same way.)

(comment
  (let [s "<title>one</title> <title>two</title> <title>three</title>"
        p (js/DOMParser.)
        doc (.parseFromString p s "text/html")
        node-list (.querySelectorAll doc "title")] ; a NodeList, array-like like HTMLCollection
    (map #(.-innerHTML %) (array-seq node-list))))
; => ("one" "two" "three")

See

Alternatively, consider using Hickory and its selectors:


Hickory example:

(ns demo.scratch
  (:require [hickory.core :as h]
            [hickory.select :as s]))

(comment
  ;; the local is `html` rather than `s` so it isn't confused with the hickory.select alias
  (let [html "<title>one</title> <title>two</title> <div><title>three</title></div>"
        tree (-> html h/parse h/as-hickory)
        title-elements (s/select (s/tag :title) tree)]
    (map #(first (:content %)) title-elements)))
; => ("one" "two" "three")

You can also use zippers; here’s an example from Stack Overflow. You probably don’t need io/reader, and you might have to use x/parse-str instead of x/parse. Caveat: I haven’t tried this out myself yet.
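For a dependency-free taste of the zipper approach (a minimal sketch, not the Stack Overflow example), the code below uses only clojure.zip, which ships with both Clojure and ClojureScript. Assumptions: the tree is hand-written in the {:tag :attrs :content} shape that clojure.data.xml produces, so no parser is needed, and `rss` and `item-titles` are illustrative names. It visits every location with z/next and uses z/up to look at each node's parent, which is exactly the kind of question a flat seq can't answer directly.

```clojure
(ns demo.zip-search
  (:require [clojure.zip :as z]))

;; Assumption: hand-written tree in the {:tag :attrs :content} shape that
;; clojure.data.xml's parse-str returns, so this runs without a parser.
(def rss
  {:tag :rss :attrs {}
   :content [{:tag :channel :attrs {}
              :content [{:tag :title :attrs {} :content ["Channel title"]}
                        {:tag :item :attrs {}
                         :content [{:tag :title :attrs {} :content ["Tech.ToryAnderson.com"]}]}
                        {:tag :item :attrs {}
                         :content [{:tag :title :attrs {} :content ["Second item"]}
                                   {:tag :guid :attrs {} :content ["foo"]}]}]}]})

(defn item-titles
  "Hypothetical helper: visits every zipper location depth-first and keeps the
  text of each :title element whose parent is an :item (the channel's own
  title is skipped because its parent is :channel)."
  [root]
  (loop [loc (z/xml-zip root)
         acc []]
    (if (z/end? loc)
      acc
      (let [node   (z/node loc)
            ;; z/up returns nil at the root, so some-> stops there
            parent (some-> loc z/up z/node)]
        (recur (z/next loc)
               (if (and (map? node)
                        (= :title (:tag node))
                        (= :item (:tag parent)))
                 (into acc (:content node))
                 acc))))))

(item-titles rss)
;; => ["Tech.ToryAnderson.com" "Second item"]
```

In the browser you would pass the result of xml/parse-str instead of the literal map; the walk itself doesn't change.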

In ClojureScript, clojure.data.xml uses the browser’s DOMParser.

If the run-time cost of conversion to Clojure data is OK, then zippers are the Cadillac of next steps. Traversing the zipper, you can “see” up and down and all directions from the current node, which can be convenient. At the other extreme is the standard library’s best-kept secret: xml-seq! Demonstrated here on an “RSS” feed which has its own title, in addition to items with titles. We select only the items’ titles:

user> (let [x (xml/parse-str "<rss><channel><title>Channel title</title><item><title>Tech.ToryAnderson.com</title></item><item><title>Second item</title><guid>foo</guid></item></channel></rss>")]
         (->> (xml-seq x)
              (filter #(= (:tag %) :item))
              (mapcat :content)
              (filter #(= (:tag %) :title))
              (mapcat :content)))

("Tech.ToryAnderson.com" "Second item")

+1 for showing me xml-seq. It works as a Clojure-native search method, but it still lacks the advanced querying of something like XPath; for example, it’s non-trivial to express a query like “all title nodes under doc.type=movie”. Or maybe I just need to embrace a more Clojure way of thinking here.
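One Clojure-flavoured way to get closer to that movie query is to chain ordinary predicates, the way a CSS descendant selector chains tags. This is a sketch under stated assumptions: `tag=`, `attr=`, `descendants-of`, and `select-chain` are hypothetical helpers (not from any library), the document is a hand-written map in clojure.data.xml's {:tag :attrs :content} shape, and the walk uses tree-seq, so the same code runs in CLJ and CLJS.

```clojure
(ns demo.select-chain)

(defn tag=
  "Hypothetical predicate builder: element has the given tag keyword."
  [tag]
  (fn [el] (= tag (:tag el))))

(defn attr=
  "Hypothetical predicate builder: element's attribute k equals v."
  [k v]
  (fn [el] (= v (get-in el [:attrs k]))))

(defn descendants-of
  "Element maps strictly below node, depth-first; text strings are dropped."
  [node]
  (filter map? (rest (tree-seq map? :content node))))

(defn select-chain
  "Like a CSS descendant selector `a b c`: each predicate must match an element
  nested somewhere below an element matched by the previous predicate.
  Overlapping matches (e.g. a movie doc inside another movie doc) can yield
  duplicates; this sketch does not de-duplicate."
  [root preds]
  (reduce (fn [matches pred]
            (mapcat #(filter pred (descendants-of %)) matches))
          [root]
          preds))

;; Assumption: hand-written sample data standing in for parsed XML.
(def library
  {:tag :library :attrs {}
   :content [{:tag :doc :attrs {:type "movie"}
              :content [{:tag :title :attrs {} :content ["Alien"]}]}
             {:tag :doc :attrs {:type "book"}
              :content [{:tag :title :attrs {} :content ["Dune"]}]}
             {:tag :doc :attrs {:type "movie"}
              :content [{:tag :meta :attrs {}
                         :content [{:tag :title :attrs {} :content ["Heat"]}]}]}]})

(defn movie-titles
  "All title text nested anywhere under a doc whose type attribute is movie."
  [root]
  (->> (select-chain root [(every-pred (tag= :doc) (attr= :type "movie"))
                           (tag= :title)])
       (map #(apply str (:content %)))))

(movie-titles library)
;; => ("Alien" "Heat")
```

Because selectors are just functions, adding a new condition means writing another predicate rather than learning a query language.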

Working in ClojureScript, in a browser, there’s no dishonor in using the browser’s built-in XPath. In “pure Clojure”, Enlive accomplished something more flexible than XPath with zippers, but Enlive’s notation will seem abstruse unless it’s obvious that XPath would have been harder. (The zippery part of Enlive is here: https://github.com/cgrand/enlive/blob/master/src/net/cgrand/enlive_html.clj)


Sadly, xml-seq appears to be CLJ-only, not CLJS.

How strange! This deficiency of ClojureScript is not mentioned in “Differences from Clojure”: https://clojurescript.org/about/differences.

On the bright side, xml-seq is a one-liner, an application of tree-seq, which ClojureScript does have.
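A sketch of that one-liner, copying the definition Clojure's own xml-seq uses: tree-seq with `(complement string?)` as the branch test and `(comp seq :content)` to get children. The name `xml-seq*` is hypothetical, chosen so it can't clash with clojure.core/xml-seq when the same code is loaded on the JVM; the hand-written map stands in for the output of xml/parse-str.

```clojure
(ns demo.xml-seq)

(defn xml-seq*
  "Same definition as clojure.core/xml-seq: every node (elements and text)
  in depth-first order."
  [root]
  (tree-seq (complement string?) (comp seq :content) root))

;; Assumption: hand-written equivalent of the RSS string parsed with xml/parse-str.
(def rss
  {:tag :rss :attrs {}
   :content [{:tag :channel :attrs {}
              :content [{:tag :title :attrs {} :content ["Channel title"]}
                        {:tag :item :attrs {}
                         :content [{:tag :title :attrs {} :content ["Tech.ToryAnderson.com"]}]}
                        {:tag :item :attrs {}
                         :content [{:tag :title :attrs {} :content ["Second item"]}
                                   {:tag :guid :attrs {} :content ["foo"]}]}]}]})

(defn item-titles
  "The same pipeline as the xml-seq REPL example, but using xml-seq*."
  [root]
  (->> (xml-seq* root)
       (filter #(= (:tag %) :item))   ; keyword lookup on a string is just nil
       (mapcat :content)
       (filter #(= (:tag %) :title))
       (mapcat :content)))

(item-titles rss)
;; => ("Tech.ToryAnderson.com" "Second item")
```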

Big thanks for the comments and suggestions here. I learned a lot and formed some strong opinions and appreciations. Blog link forthcoming.