Getting content from parsed XML

Consider the following XML string:

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title type="text">Economia e finanças</title>
    <subtitle type="text"></subtitle>
    <id>painel_indicadores</id>
    <updated>2019-09-08T14:38:38-03:00</updated>
    <category term="Economia e finanças" />
    <category term="" />
    <link rel="alternate" href="https://www.bcb.gov.br/" />
    <entry>
        <id>painel_indicadores_JUROS</id>
        <title type="text">Taxa Selic</title>
        <updated>2019-09-06T22:56:03-03:00</updated>
        <link rel="alternate" href="https://www.bcb.gov.br/" />
        <content type="html">&lt;div id=label&gt;Meta:&lt;/div&gt;&lt;div id=rate&gt;&lt;div id=ratevalue&gt;6&lt;/div&gt;&lt;div=ratedate&gt;31/07/2019&lt;/div&gt;&lt;/div&gt;&lt;div id=label&gt;Diária:&lt;/div&gt;&lt;div id=dailyrate&gt;&lt;div id=dailyratevalue&gt;5,9&lt;/div&gt;&lt;div id=dailyratedate&gt;06/09/2019&lt;/div&gt;&lt;/div&gt;</content>
    </entry>
</feed>

I’m writing an app that parses this XML text into a map so that I can get the content of the content tag:

<content type="html">&lt;div id=label&gt;Meta:&lt;/div&gt;&lt;div id=rate&gt;&lt;div id=ratevalue&gt;6&lt;/div&gt;&lt;div=ratedate&gt;31/07/2019&lt;/div&gt;&lt;/div&gt;&lt;div id=label&gt;Diária:&lt;/div&gt;&lt;div id=dailyrate&gt;&lt;div id=dailyratevalue&gt;5,9&lt;/div&gt;&lt;div id=dailyratedate&gt;06/09/2019&lt;/div&gt;&lt;/div&gt;</content>

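Currently I’m doing it this way:

(:require
  [clojure.xml :as xml])
(:import
  [java.io ByteArrayInputStream])

(defn decode-xml-response [response-body]
  (let [xml-response (xml/parse
                      ;; the XML declares utf-8, so encode the string the same way
                      (ByteArrayInputStream. (.getBytes response-body "UTF-8")))]
    ;; feed -> last child (entry) -> last child (content) -> first text node
    (first (:content (last (:content (last (:content xml-response))))))))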

Then I call decode-xml-response passing the XML string. It works as I wish, but I wonder if there is a more elegant/idiomatic way to get what is inside the content tag from the top of the question.


You might consider using zippers. There is an example on Stack Overflow here.
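As a rough sketch of the zipper idea, using only clojure.xml and clojure.zip (both ship with Clojure): the helper child-loc and the function name content-via-zipper are made up for illustration, and the tag names :entry and :content assume the feed layout from the question.

```clojure
(require '[clojure.xml :as xml]
         '[clojure.zip :as zip])
(import 'java.io.ByteArrayInputStream)

(defn child-loc
  "Zipper location of the first child of `loc` whose tag is `tag`, or nil.
  Hypothetical helper, not part of clojure.zip."
  [loc tag]
  (->> (zip/down loc)            ; first child, or nil if there are none
       (iterate zip/right)       ; walk over the siblings
       (take-while some?)        ; stop when we run off the end
       (filter #(= tag (:tag (zip/node %))))
       first))

(defn content-via-zipper
  "Text inside feed > entry > content, or nil if any step is missing."
  [xml-string]
  (some-> (ByteArrayInputStream. (.getBytes xml-string "UTF-8"))
          xml/parse
          zip/xml-zip
          (child-loc :entry)
          (child-loc :content)
          zip/node
          :content
          first))
```

Compared with chaining last and first, this selects children by tag, so it should keep working if the feed adds or reorders elements, and some-> makes it return nil instead of throwing when a tag is absent.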

Hello!

What library are you using?

I’m asking because the import for org.clojure/data.xml is

(require '[clojure.data.xml])

Teodor

I gave it a shot using org.clojure/data.xml. Strategy: make it simple to step down one level at a time, by asking for the first child with a given tag.

(ns th.scratch.usexml
  (:require [clojure.java.io :as io]
            [clojure.data.xml :as xml]))

(def sample-resource-path
  "th/scratch/usexml/sample-1.xml")

(defn load-xml-resource [resource-path]
  (-> resource-path
      io/resource
      io/reader
      xml/parse))

(defn first-child-of [xml tag]
  (->> xml
       :content
       (filter (fn [item]
                 (= tag (:tag item))))
       first))

(def xml-ns "xmlns.http%3A%2F%2Fwww.w3.org%2F2005%2FAtom")

(defn qualify [kw]
  (keyword xml-ns (name kw)))

(qualify :feed)
;; => :xmlns.http%3A%2F%2Fwww.w3.org%2F2005%2FAtom/feed

(-> (load-xml-resource sample-resource-path)
    (first-child-of (qualify :entry))
    (first-child-of (qualify :content))
    :content
    first)
;; => "<div id=label>Meta:</div><div id=rate><div id=ratevalue>6</div><div=ratedate>31/07/2019</div></div><div id=label>Diária:</div><div id=dailyrate><div id=dailyratevalue>5,9</div><div id=dailyratedate>06/09/2019</div></div>"

Was this something akin to what you were looking for?

Teodor


Hello! Well, that’s just how I saw it done in the Clojure docs about XML. I’m using Clojure 1.10.0.

It also works in this REPL.

Very clever @teodorlu! Even though I marked @xfthhxk as the solution because I tested it first, your solution works perfectly as well.

Thank you!

I like to use Jsoup for this kind of thing. Here’s how I’d do it for your example. Because there’s only one <content> tag you can jump straight to it instead of navigating the whole structure.

In project.clj add [org.jsoup/jsoup "1.11.3"] to dependencies.

In src file:

(ns my-project.xml-parse
  (:import [org.jsoup Jsoup]))

(defn decode-xml-response-v2
  [xml-string]
  (-> (Jsoup/parse xml-string)
      (.select "content")
      (.text)))

(comment
  (decode-xml-response-v2 example-xml-string)
  ; => "<div id=label>Meta:</div><div id=rate><div id=ratevalue>6</div><div=ratedate>31/07/2019</div></div><div id=label>Diária:</div><div id=dailyrate><div id=dailyratevalue>5,9</div><div id=dailyratedate>06/09/2019</div></div>"
  )

I think the intended use case of Jsoup is HTML, but it works for XML too. I like it because the syntax is very similar to CSS selector syntax, which I’m already familiar with from web development.

More Jsoup examples here (not my article).
https://paultopia.github.io/posts-output/jsoup-is-awesome/
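For comparison, the same "jump straight to the tag" idea can probably be had without an extra dependency by walking the parsed tree with xml-seq from clojure.core. This is only a sketch: the name first-content-text is made up here, and it assumes the first content element in the document is the one you want.

```clojure
(require '[clojure.xml :as xml])
(import 'java.io.ByteArrayInputStream)

(defn first-content-text
  "Text of the first <content> element anywhere in the document, or nil.
  Hypothetical name, used only for this example."
  [xml-string]
  (->> (ByteArrayInputStream. (.getBytes xml-string "UTF-8"))
       xml/parse
       xml-seq                            ; depth-first seq of every node
       (filter #(= :content (:tag %)))    ; text nodes have no :tag and are skipped
       first
       :content
       first))
```

Unlike a CSS selector this does not express any nesting, so if a feed had several content elements you would need to filter on the parent entry first.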

