Getting content from parsed XML

Consider the following XML string:

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title type="text">Economia e finanças</title>
    <subtitle type="text"></subtitle>
    <id>painel_indicadores</id>
    <updated>2019-09-08T14:38:38-03:00</updated>
    <category term="Economia e finanças" />
    <category term="" />
    <link rel="alternate" href="https://www.bcb.gov.br/" />
    <entry>
        <id>painel_indicadores_JUROS</id>
        <title type="text">Taxa Selic</title>
        <updated>2019-09-06T22:56:03-03:00</updated>
        <link rel="alternate" href="https://www.bcb.gov.br/" />
        <content type="html">&lt;div id=label&gt;Meta:&lt;/div&gt;&lt;div id=rate&gt;&lt;div id=ratevalue&gt;6&lt;/div&gt;&lt;div=ratedate&gt;31/07/2019&lt;/div&gt;&lt;/div&gt;&lt;div id=label&gt;Diária:&lt;/div&gt;&lt;div id=dailyrate&gt;&lt;div id=dailyratevalue&gt;5,9&lt;/div&gt;&lt;div id=dailyratedate&gt;06/09/2019&lt;/div&gt;&lt;/div&gt;</content>
    </entry>
</feed>

I’m writing an app that parses this XML text into a map so that I can get the text inside the content tag:

<content type="html">&lt;div id=label&gt;Meta:&lt;/div&gt;&lt;div id=rate&gt;&lt;div id=ratevalue&gt;6&lt;/div&gt;&lt;div=ratedate&gt;31/07/2019&lt;/div&gt;&lt;/div&gt;&lt;div id=label&gt;Diária:&lt;/div&gt;&lt;div id=dailyrate&gt;&lt;div id=dailyratevalue&gt;5,9&lt;/div&gt;&lt;div id=dailyratedate&gt;06/09/2019&lt;/div&gt;&lt;/div&gt;</content>

Currently I’m doing it this way:

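(:require
  [clojure.xml :as xml])
(:import
  [java.io ByteArrayInputStream])

(defn decode-xml-response [response-body]
  ;; Encode as UTF-8 so the bytes match the encoding declared in the XML prolog.
  (let [xml-response (xml/parse
                      (ByteArrayInputStream. (.getBytes ^String response-body "UTF-8")))]
    ;; Positional: last child of feed is entry, last child of entry is content, first child of content is the text.
    (first (:content (last (:content (last (:content xml-response))))))))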

Then I call decode-xml-response with the XML string. It works as intended, but I wonder whether there is a more elegant or idiomatic way to get what is inside the content tag shown at the top of the question.
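One core-only option, sketched here under the assumption that clojure.xml (bundled with Clojure 1.10) stays the parser: clojure.core/xml-seq flattens the parsed tree into a depth-first sequence of nodes, so the content element can be found by its tag instead of by its position. The helper names parse-xml-string and first-tag-content, and the shortened sample string, are hypothetical.

```clojure
(ns example.xml-seq
  (:require [clojure.xml :as xml])
  (:import [java.io ByteArrayInputStream]
           [java.nio.charset StandardCharsets]))

(defn parse-xml-string
  "Parses an XML string with clojure.xml. The string is encoded as UTF-8
  so the bytes agree with the encoding declared in the XML prolog."
  [^String s]
  (xml/parse (ByteArrayInputStream. (.getBytes s StandardCharsets/UTF_8))))

(defn first-tag-content
  "Searches the parsed tree depth-first (via xml-seq) for the first element
  whose :tag equals tag and returns its first child, or nil if none matches.
  Text nodes are plain strings, so (:tag some-string) is simply nil."
  [root tag]
  (->> (xml-seq root)
       (filter #(= tag (:tag %)))
       first
       :content
       first))

;; Hypothetical, shortened version of the feed from the question.
(def sample
  "<?xml version=\"1.0\" encoding=\"utf-8\"?>
<feed xmlns=\"http://www.w3.org/2005/Atom\">
  <entry>
    <title type=\"text\">Taxa Selic</title>
    <content type=\"html\">&lt;div id=label&gt;Meta:&lt;/div&gt;</content>
  </entry>
</feed>")

(first-tag-content (parse-xml-string sample) :content)
;; => "<div id=label>Meta:</div>"
```

Because the lookup is by tag, it keeps working if the feed gains extra elements after entry, which would break the last/last/first chain.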

You might consider using zippers. There is an example on Stack Overflow here.
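A minimal zipper sketch, assuming only clojure.zip and clojure.xml from Clojure itself (the separate org.clojure/data.zip library adds xml-> and xml1-> helpers on top of this, but is not used here). clojure.zip/xml-zip wraps the parsed tree, zip/next walks it depth-first, and the first location tagged :content is returned. The function names find-first-loc and content-text and the sample string are hypothetical.

```clojure
(ns example.zipper
  (:require [clojure.xml :as xml]
            [clojure.zip :as zip])
  (:import [java.io ByteArrayInputStream]
           [java.nio.charset StandardCharsets]))

(defn find-first-loc
  "Walks the zipper depth-first with zip/next and returns the first
  location whose node has the given :tag, or nil if none matches."
  [loc tag]
  (cond
    (zip/end? loc) nil
    (= tag (:tag (zip/node loc))) loc
    :else (recur (zip/next loc) tag)))

(defn content-text
  "Parses xml-string, finds the first content element and returns its
  first child (the decoded text), or nil if there is no such element."
  [^String xml-string]
  (some-> (ByteArrayInputStream. (.getBytes xml-string StandardCharsets/UTF_8))
          xml/parse
          zip/xml-zip
          (find-first-loc :content)
          zip/down   ; step into the element's children
          zip/node)) ; the text node itself

;; Hypothetical, shortened version of the feed from the question.
(def sample
  "<?xml version=\"1.0\" encoding=\"utf-8\"?>
<feed xmlns=\"http://www.w3.org/2005/Atom\">
  <entry>
    <title type=\"text\">Taxa Selic</title>
    <content type=\"html\">&lt;div id=ratevalue&gt;6&lt;/div&gt;</content>
  </entry>
</feed>")

(content-text sample)
;; => "<div id=ratevalue>6</div>"
```

Compared with a flat search, a zipper location also remembers its parents and siblings, so from the match you can still move with zip/up, zip/left or zip/right.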

Hello!

What library are you using?

I’m asking because the require for org.clojure/data.xml is

(require '[clojure.data.xml])

Teodor

I gave it a shot using org.clojure/data.xml. My strategy: make it simple to step down one level, by asking for the first child with a given tag.

(ns th.scratch.usexml
  (:require [clojure.java.io :as io]
            [clojure.data.xml :as xml]))

(def sample-resource-path
  "th/scratch/usexml/sample-1.xml")

(defn load-xml-resource [resource-path]
  (-> resource-path
      io/resource
      io/reader
      xml/parse))

(defn first-child-of [xml tag]
  (->> xml
       :content
       (filter (fn [item]
                 (= tag (:tag item))))
       first))

(def xml-ns "xmlns.http%3A%2F%2Fwww.w3.org%2F2005%2FAtom")

(defn qualify [kw]
  (keyword xml-ns (name kw)))

(qualify :feed)
;; => :xmlns.http%3A%2F%2Fwww.w3.org%2F2005%2FAtom/feed

(-> (load-xml-resource sample-resource-path)
    (first-child-of (qualify :entry))
    (first-child-of (qualify :content))
    :content
    first)
;; => "<div id=label>Meta:</div><div id=rate><div id=ratevalue>6</div><div=ratedate>31/07/2019</div></div><div id=label>Diária:</div><div id=dailyrate><div id=dailyratevalue>5,9</div><div id=dailyratedate>06/09/2019</div></div>"

Was this something akin to what you were looking for?

Teodor

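If the namespace keyword should not be hard-coded, one option is to compare only the local part of each tag, since clojure.core/name drops a keyword's namespace. A minimal sketch with hypothetical helpers first-child-named and content-at; it assumes only that elements expose :tag and :content, and the hand-written tree below merely mimics the namespaced output shown above rather than coming from data.xml itself.

```clojure
(ns example.local-name)

(defn first-child-named
  "Like first-child-of, but matches on the local part of each child's :tag,
  so the namespace does not need to be known. Text children have no :tag
  and are skipped by some->."
  [node local-name]
  (->> (:content node)
       (filter #(some-> (:tag %) name (= local-name)))
       first))

(defn content-at
  "Follows a path of local tag names from node and returns the first child
  (usually the text) of the element at the end of the path, or nil."
  [node path]
  (-> (reduce first-child-named node path)
      :content
      first))

;; Hypothetical stand-in for a parsed Atom feed with namespaced tags.
(def atom-tag #(keyword "xmlns.http%3A%2F%2Fwww.w3.org%2F2005%2FAtom" %))

(def sample-tree
  {:tag (atom-tag "feed")
   :attrs {}
   :content ["\n  "
             {:tag (atom-tag "entry")
              :attrs {}
              :content [{:tag (atom-tag "content")
                         :attrs {:type "html"}
                         :content ["<div id=label>Meta:</div>"]}]}]})

(content-at sample-tree ["entry" "content"])
;; => "<div id=label>Meta:</div>"
```

The trade-off of this design is that elements from different namespaces that share a local name become indistinguishable.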

Hello! Well, that’s just how I saw it done in the Clojure docs about XML (clojure.xml, which ships with Clojure). I’m using Clojure 1.10.0.

It also works in this REPL.

Very clever @teodorlu! Even though I marked @xfthhxk’s answer as the solution because I tested it first, your solution works perfectly as well.

Thank you!

I like to use Jsoup for this kind of thing. Here’s how I’d do it for your example. Because there’s only one <content> tag, you can jump straight to it instead of navigating the whole structure.

In project.clj add [org.jsoup/jsoup "1.11.3"] to dependencies.

In the source file:

(ns my-project.xml-parse
  (:import [org.jsoup Jsoup]))

(defn decode-xml-response-v2
  [xml-string]
  (-> (Jsoup/parse xml-string)
      (.select "content")
      (.text)))

(comment
  (decode-xml-response-v2 example-xml-string)
  ;; => "<div id=label>Meta:</div><div id=rate><div id=ratevalue>6</div><div=ratedate>31/07/2019</div></div><div id=label>Diária:</div><div id=dailyrate><div id=dailyratevalue>5,9</div><div id=dailyratedate>06/09/2019</div></div>"
  )

I think Jsoup is mainly intended for HTML, but it works for XML too. I like it because its selector syntax is very similar to CSS selector syntax, which I’m already familiar with from web development.

More jsoup examples here (not my article).
https://paultopia.github.io/posts-output/jsoup-is-awesome/
