Consider the following XML string:
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title type="text">Economia e finanças</title>
<subtitle type="text"></subtitle>
<id>painel_indicadores</id>
<updated>2019-09-08T14:38:38-03:00</updated>
<category term="Economia e finanças" />
<category term="" />
<link rel="alternate" href="https://www.bcb.gov.br/" />
<entry>
<id>painel_indicadores_JUROS</id>
<title type="text">Taxa Selic</title>
<updated>2019-09-06T22:56:03-03:00</updated>
<link rel="alternate" href="https://www.bcb.gov.br/" />
<content type="html"><div id=label>Meta:</div><div id=rate><div id=ratevalue>6</div><div=ratedate>31/07/2019</div></div><div id=label>Diária:</div><div id=dailyrate><div id=dailyratevalue>5,9</div><div id=dailyratedate>06/09/2019</div></div></content>
</entry>
</feed>
I’m writing an app that parses this XML text into a map so that I can get the contents of the content tag:
<content type="html"><div id=label>Meta:</div><div id=rate><div id=ratevalue>6</div><div=ratedate>31/07/2019</div></div><div id=label>Diária:</div><div id=dailyrate><div id=dailyratevalue>5,9</div><div id=dailyratedate>06/09/2019</div></div></content>
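Currently I’m doing it this way:
;; Inside the ns form:
(:require
 [clojure.xml :as xml])
(:import
 [java.io ByteArrayInputStream])
(defn decode-xml-response [response-body]
  (let [xml-response (xml/parse
                      ;; Encode explicitly as UTF-8 to match the XML declaration.
                      (ByteArrayInputStream. (.getBytes response-body "UTF-8")))]
    ;; <feed> -> last child (<entry>) -> last child (<content>) -> first child (the text).
    (first (:content (last (:content (last (:content xml-response))))))))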
Then I call decode-xml-response, passing the XML string. It works as I wish, but I wonder if there is a more elegant/idiomatic way to get what is inside the content tag from the top of the question.
xfthhxk
September 8, 2019, 6:10pm
You might consider using zippers. There is an example on Stack Overflow here.
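The linked example is not reproduced here, but a minimal sketch of the zipper idea, using only clojure.xml and clojure.zip from core Clojure, might look like the following. The names sample-xml, parse-xml-string and content-via-zipper are hypothetical, and the sample is a shortened version of the feed that assumes the markup inside the content tag is entity-escaped in the real response.

```clojure
(require '[clojure.xml :as xml]
         '[clojure.zip :as zip])
(import '[java.io ByteArrayInputStream])

;; Shortened, hypothetical stand-in for the feed in the question.
;; Assumption: the HTML inside <content> arrives entity-escaped.
(def sample-xml
  (str "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
       "<feed xmlns=\"http://www.w3.org/2005/Atom\">"
       "<title type=\"text\">Economia</title>"
       "<entry>"
       "<id>painel_indicadores_JUROS</id>"
       "<content type=\"html\">&lt;div id=label&gt;Meta:&lt;/div&gt;</content>"
       "</entry>"
       "</feed>"))

(defn parse-xml-string
  "Parses an XML string into clojure.xml's {:tag :attrs :content} maps."
  [s]
  (xml/parse (ByteArrayInputStream. (.getBytes s "UTF-8"))))

(defn content-via-zipper
  "Walks the tree depth-first with a zipper and returns the text of the
  first :content element, or nil if there is none."
  [xml-string]
  (->> (zip/xml-zip (parse-xml-string xml-string))
       (iterate zip/next)                    ; every location, depth-first
       (take-while (complement zip/end?))    ; stop once the walk is done
       (map zip/node)
       (filter #(= :content (:tag %)))       ; text nodes have no :tag
       first
       :content
       first))

(content-via-zipper sample-xml)
;; => "<div id=label>Meta:</div>"
```

Compared with the chain of last/first calls, this does not depend on where the entry and content elements sit among their siblings.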
Hello!
What library are you using?
I’m asking because the import for org.clojure/data.xml is
(require '[clojure.data.xml])
Teodor
I gave it a shot using org.clojure/data.xml. Strategy: try to make it simple to step down one level. I’d do that by asking for the first tag of a given type.
(ns th.scratch.usexml
  (:require [clojure.java.io :as io]
            [clojure.data.xml :as xml]))

(def sample-resource-path
  "th/scratch/usexml/sample-1.xml")

(defn load-xml-resource [resource-path]
  (-> resource-path
      io/resource
      io/reader
      xml/parse))

(defn first-child-of [xml tag]
  (->> xml
       :content
       (filter (fn [item]
                 (= tag (:tag item))))
       first))

(def xml-ns "xmlns.http%3A%2F%2Fwww.w3.org%2F2005%2FAtom")

(defn qualify [kw]
  (keyword xml-ns (name kw)))

(qualify :feed)
;; => :xmlns.http%3A%2F%2Fwww.w3.org%2F2005%2FAtom/feed

(-> (load-xml-resource sample-resource-path)
    (first-child-of (qualify :entry))
    (first-child-of (qualify :content))
    :content
    first)
;; => "<div id=label>Meta:</div><div id=rate><div id=ratevalue>6</div><div=ratedate>31/07/2019</div></div><div id=label>Diária:</div><div id=dailyrate><div id=dailyratevalue>5,9</div><div id=dailyratedate>06/09/2019</div></div>"
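The "step down one level" strategy above can be extended to a whole path of tags by reducing over them. This is only a sketch: child-in and sample-tree are hypothetical names, and the hand-written tree stands in for parser output so the example runs without any XML library.

```clojure
(defn first-child-of
  "Returns the first child of node whose :tag equals tag, or nil."
  [node tag]
  (->> (:content node)
       (filter #(= tag (:tag %)))
       first))

(defn child-in
  "Steps down through node one tag at a time, like get-in for XML trees.
  Returns nil as soon as a tag on the path is missing."
  [node tags]
  (reduce first-child-of node tags))

;; Hand-written tree in the {:tag :attrs :content} shape that the
;; Clojure XML parsers produce (unqualified tags, for brevity).
(def sample-tree
  {:tag :feed
   :content [{:tag :title :content ["Economia e finanças"]}
             {:tag :entry
              :content [{:tag :id :content ["painel_indicadores_JUROS"]}
                        {:tag :content
                         :attrs {:type "html"}
                         :content ["<div id=label>Meta:</div>"]}]}]})

(-> sample-tree
    (child-in [:entry :content])
    :content
    first)
;; => "<div id=label>Meta:</div>"
```

With data.xml's namespaced tags, the path would presumably be built with the qualify helper from the post, for example (map qualify [:entry :content]).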
Was this something akin to what you were looking for?
Teodor
Hello! Well, that’s just how I saw it done in the Clojure docs about XML. I’m using Clojure 1.10.0.
It also works in this REPL.
Very clever, @teodorlu! Even though I marked @xfthhxk’s answer as the solution because I tested it first, your solution works perfectly as well.
Thank you!
I like to use Jsoup for this kind of thing. Here’s how I’d do it for your example. Because there’s only one <content> tag, you can jump straight to it instead of navigating the whole structure.
In project.clj, add [org.jsoup/jsoup "1.11.3"] to the dependencies.
In src file:
(ns my-project.xml-parse
  (:import [org.jsoup Jsoup]))

(defn decode-xml-response-v2
  [xml-string]
  (-> (Jsoup/parse xml-string)
      (.select "content")
      (.text)))

(comment
  (decode-xml-response-v2 example-xml-string))
; => "<div id=label>Meta:</div><div id=rate><div id=ratevalue>6</div><div=ratedate>31/07/2019</div></div><div id=label>Diária:</div><div id=dailyrate><div id=dailyratevalue>5,9</div><div id=dailyratedate>06/09/2019</div></div>"
I think the intended use case of Jsoup is HTML, but it works for XML too. I like it because the syntax is very similar to CSS selector syntax, which I’m already familiar with from web development.
More Jsoup examples here (not my article).
https://paultopia.github.io/posts-output/jsoup-is-awesome/
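The "jump straight to the only content tag" idea can also be sketched without Jsoup, using xml-seq from clojure.core together with clojure.xml. This is a different technique from the CSS-selector approach above, not a replacement for it. The names short-feed and content-via-xml-seq are hypothetical, and the shortened feed assumes the markup inside the content tag is entity-escaped.

```clojure
(require '[clojure.xml :as xml])
(import '[java.io ByteArrayInputStream])

;; Shortened, hypothetical stand-in for the feed in the question.
(def short-feed
  (str "<feed xmlns=\"http://www.w3.org/2005/Atom\">"
       "<id>painel_indicadores</id>"
       "<entry>"
       "<title type=\"text\">Taxa Selic</title>"
       "<content type=\"html\">&lt;div id=label&gt;Meta:&lt;/div&gt;</content>"
       "</entry>"
       "</feed>"))

(defn content-via-xml-seq
  "Flattens the parsed tree with xml-seq and returns the text of the
  first :content element, wherever it sits in the document."
  [xml-string]
  (->> (ByteArrayInputStream. (.getBytes xml-string "UTF-8"))
       xml/parse
       xml-seq                               ; depth-first seq of all nodes
       (filter #(= :content (:tag %)))
       first
       :content
       first))

(content-via-xml-seq short-feed)
;; => "<div id=label>Meta:</div>"
```

Like the selector version, this only makes sense when the tag is known to be unique, or when the first match is the one that is wanted.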