How to get latest XML record from the list

It turns out the heavily namespaced xml here makes the result of the prior answer a bit harder to turn back into xml. Normally you could just call xml/emit-str on the returned element and get a formatted xml string back. Since namespaces are involved, though, the xml element for the entry contains maps like #:m{:type …}, which is shorthand for {:m/type …}, and data.xml doesn’t know what the alias m refers to. If you want to preserve the exact xml from the input, you need to either supply these aliases or bake the namespaces into the keywords in the xml element. So xml/emit-str (under data.xml 0.0.8) complains that it doesn’t know what the alias ‘m’ means in that context. You can still trivially munge the data; you just have unknown namespace aliases that have to be provided (every one of them) when you emit the xml, and data.xml 0.0.8 doesn’t support namespaces directly (although the underlying java reader/writer does).

user=> (-> xs latest-entry second xml/emit-str)
Execution error (StreamExceptionBase) at com.fasterxml.aalto.out.StreamWriterBase/throwOutputError (StreamWriterBase.java:1662).
Unbound namespace URI 'm'

We can do one of two things. We can walk the xml tree and coerce all qualified (i.e. namespaced) tags to unqualified tags, which avoids namespacing altogether at the expense of changing the output (e.g. eliding the m: and xmlns parts), which may or may not matter. Or we can upgrade to 0.2.0-alpha, which supports xmlns directly.
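The first option (stripping namespaces) could be sketched roughly as below. This is only an illustration on plain maps shaped like parsed elements; the names unqualify and strip-namespaces and the sample data are made up for the example, and with real data.xml elements you may need to rebuild them (e.g. via the library's element constructor) before emitting. Note that attributes sharing a local name across different namespaces would collide.

```clojure
;; Hypothetical helper: drop the namespace part of a keyword, leave anything else alone.
(defn unqualify [k]
  (if (keyword? k)
    (keyword (name k))
    k))

;; Hypothetical helper: recursively unqualify the tag and attribute keys of every
;; element-like map; strings and other non-element content pass through untouched.
(defn strip-namespaces [node]
  (if (and (map? node) (:tag node))
    (-> node
        (update :tag unqualify)
        (update :attrs (fn [attrs]
                         (into {} (map (fn [[k v]] [(unqualify k) v])) attrs)))
        (update :content (fn [children] (mapv strip-namespaces children))))
    node))

;; Made-up sample shaped like one parsed property element.
(def sample
  {:tag     :d/NEW_DATE
   :attrs   {:m/type "Edm.DateTime"}
   :content ["2022-11-15T00:00:00"]})

(strip-namespaces sample)
;; => {:tag :NEW_DATE, :attrs {:type "Edm.DateTime"}, :content ["2022-11-15T00:00:00"]}
```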

Using 0.2.0-alpha violates a few assumptions since the namespacing is accomplished again by qualified keywords, except this time they are fully baked into the tag:

{:tag :entry ...} ;;v 0.0.8
{:tag :xmlns.http%3A%2F%2Fwww.w3.org%2F2005%2FAtom/entry ...} ;;v 0.2.0-alpha

So we need to change tag? accordingly to compare the namespace keywords with the submitted unqualified tag:

(defn tag? [k]
  (fn [e]
    (when
        (and (not (string? e)) ;;data.xml 0.2.0 pulls in a bunch of \n strings now.
             (= (name (:tag e)) (name k)))
      e)))
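To see why comparing with name works, here is a small check of how name and namespace split a qualified keyword like the one shown above. The keyword is built with the keyword function just to avoid any reader quirks with the % characters.

```clojure
;; A tag keyword shaped like the 0.2.0-alpha example above.
(def qualified-tag
  (keyword "xmlns.http%3A%2F%2Fwww.w3.org%2F2005%2FAtom" "entry"))

(name qualified-tag)
;; => "entry"

(namespace qualified-tag)
;; => "xmlns.http%3A%2F%2Fwww.w3.org%2F2005%2FAtom"

;; The comparison tag? performs: local names match even though the keywords differ.
(= (name qualified-tag) (name :entry))
;; => true
(= qualified-tag :entry)
;; => false
```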

An added surprise in 0.2.0-alpha is that :content may now include whitespace strings directly, so we get values like “\n” where before we only had xml nodes. So we define a function to help with querying instead of using the raw :content key:

(defn child-elements [xs]
  (->> xs :content (filter #(instance? clojure.data.xml.node.Element %))))

and update the other functions to use it:

(defn entries [xs]
  (->> xs child-elements (filter (tag? :entry))))

(defn entry->date [e]
  (some->> e
           child-elements
           (some (tag? :content))
           child-elements
           first
           child-elements
           (some (tag? :NEW_DATE))
           :content
           first
           clojure.instant/read-instant-date))
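The last step of entry->date matters because it turns the NEW_DATE text into a java.util.Date, which is what lets the entries be sorted chronologically later on. A minimal illustration, using two date strings of the form seen in the feed (the vars d1 and d2 are just for the example):

```clojure
(require '[clojure.instant :as inst])

;; Timestamps of the form found in the NEW_DATE elements.
(def d1 (inst/read-instant-date "2022-11-15T00:00:00"))
(def d2 (inst/read-instant-date "2022-11-16T00:00:00"))

(instance? java.util.Date d1)
;; => true

;; Dates are comparable, so sort-by on [date entry] pairs orders them by time.
(neg? (compare d1 d2))
;; => true

(second (last (sort-by first [[d2 :later] [d1 :earlier]])))
;; => :later
```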

For completeness, the other functions don’t change but here they are, with an added function to get an xml string:

(defn ordered-entries [root]
  (->> (for [e (entries root)]
         [(entry->date e) e])
       (sort-by first)))

(defn latest-entry [root]
  (->> root ordered-entries last))

(defn latest-entry-xml [root]
  (->> root latest-entry second xml/emit-str))

(def xs
  (->> url
       slurp
       xml/parse-str))

user=> (latest-entry-xml xs)
"<?xml version='1.0' encoding='UTF-8'?><entry xmlns:d=\"http://schemas.microsoft.com/ado/2007/08/dataservices\" xmlns:m=\"http://schemas.microsoft.com/ado/2007/08/dataservices/metadata\" xmlns=\"http://www.w3.org/2005/Atom\">\n<title type=\"text\"/>\n<updated>2022-11-15T15:54:28Z</updated>\n<author><name/></author>\n<category term=\"TreasuryDataWarehouseModel.DailyTreasuryYieldCurveRateDatum\" scheme=\"http://schemas.microsoft.com/ado/2007/08/dataservices/scheme\"/>\n<content type=\"application/xml\">\n<m:properties>\n<d:NEW_DATE m:type=\"Edm.DateTime\">2022-11-15T00:00:00</d:NEW_DATE>\n<d:BC_1MONTH m:type=\"Edm.Double\">3.77</d:BC_1MONTH>\n<d:BC_2MONTH m:type=\"Edm.Double\">4.10</d:BC_2MONTH>\n<d:BC_3MONTH m:type=\"Edm.Double\">4.31</d:BC_3MONTH>\n<d:BC_4MONTH m:type=\"Edm.Double\">4.40</d:BC_4MONTH>\n<d:BC_6MONTH m:type=\"Edm.Double\">4.54</d:BC_6MONTH>\n<d:BC_1YEAR m:type=\"Edm.Double\">4.60</d:BC_1YEAR>\n<d:BC_2YEAR m:type=\"Edm.Double\">4.37</d:BC_2YEAR>\n<d:BC_3YEAR m:type=\"Edm.Double\">4.17</d:BC_3YEAR>\n<d:BC_5YEAR m:type=\"Edm.Double\">3.93</d:BC_5YEAR>\n<d:BC_7YEAR m:type=\"Edm.Double\">3.88</d:BC_7YEAR>\n<d:BC_10YEAR m:type=\"Edm.Double\">3.80</d:BC_10YEAR>\n<d:BC_20YEAR m:type=\"Edm.Double\">4.20</d:BC_20YEAR>\n<d:BC_30YEAR m:type=\"Edm.Double\">3.98</d:BC_30YEAR>\n<d:BC_30YEARDISPLAY m:type=\"Edm.Double\">3.98</d:BC_30YEARDISPLAY>\n</m:properties>\n</content>\n</entry>"

You are getting nil because I did not specify I was using data.xml version 0.0.8 (stable), and if you pull in the 0.2.0-alpha instead, it will use namespacing (present in your xml) and change the tag keywords to namespace-qualified keywords, which quietly invalidates assumptions I had when using 0.0.8 and returns nil. See updated answer for one with 0.2.0-alpha that appears to work with xmlns.

That’s so great of you to help me here. One last thing: can we create a file out of this, named for example (str "tycr-" (s/join "-" (filter identity [year month day])) ".xml")?

sure why not
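A minimal sketch of how that could look, assuming year, month and day are already available as values (they are not derived from the entry here), and with entry-filename and save-entry-xml! being names made up for the example:

```clojure
(require '[clojure.string :as s])

;; Hypothetical helper: build the file name from whichever date parts are non-nil.
(defn entry-filename [year month day]
  (str "tycr-" (s/join "-" (filter identity [year month day])) ".xml"))

;; Hypothetical helper: write an xml string to the derived file name and return that name.
(defn save-entry-xml! [xml-str year month day]
  (let [path (entry-filename year month day)]
    (spit path xml-str)
    path))

(entry-filename 2022 11 15)
;; => "tycr-2022-11-15.xml"

(entry-filename 2022 nil nil)
;; => "tycr-2022.xml"

;; Usage with the functions from this thread would be something like:
;; (save-entry-xml! (latest-entry-xml xs) 2022 11 15)
```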

@joinr Sorry to tag you again. Is there a way we can remove the \ and \n characters from the XML file? Their presence seems to make it an invalid xml file.

I don’t think it matters. When I spit the output to an xml file, the whitespace/newlines do not impact parsing the xml (either by the web browser or by emacs):

(def xs
  (-> url
      slurp
      xml/parse-str))

user=>  (re-seq #"\n" (latest-entry-xml xs))
("\n" "\n" "\n" "\n" "\n" "\n" "\n" "\n" "\n" "\n" "\n" "\n" "\n" "\n" "\n" "\n" "\n" "\n" "\n" "\n" "\n" "\n" "\n" "\n")

(spit "out.xml" (latest-entry-xml xs))
<?xml version='1.0' encoding='UTF-8'?><entry xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices" xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata" xmlns="http://www.w3.org/2005/Atom">
<title type="text"/>
<updated>2022-11-16T16:07:02Z</updated>
<author><name/></author>
<category term="TreasuryDataWarehouseModel.DailyTreasuryYieldCurveRateDatum" scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme"/>
<content type="application/xml">
<m:properties>
<d:NEW_DATE m:type="Edm.DateTime">2022-11-16T00:00:00</d:NEW_DATE>
<d:BC_1MONTH m:type="Edm.Double">3.81</d:BC_1MONTH>
<d:BC_2MONTH m:type="Edm.Double">4.15</d:BC_2MONTH>
<d:BC_3MONTH m:type="Edm.Double">4.32</d:BC_3MONTH>
<d:BC_4MONTH m:type="Edm.Double">4.43</d:BC_4MONTH>
<d:BC_6MONTH m:type="Edm.Double">4.54</d:BC_6MONTH>
<d:BC_1YEAR m:type="Edm.Double">4.62</d:BC_1YEAR>
<d:BC_2YEAR m:type="Edm.Double">4.35</d:BC_2YEAR>
<d:BC_3YEAR m:type="Edm.Double">4.13</d:BC_3YEAR>
<d:BC_5YEAR m:type="Edm.Double">3.83</d:BC_5YEAR>
<d:BC_7YEAR m:type="Edm.Double">3.77</d:BC_7YEAR>
<d:BC_10YEAR m:type="Edm.Double">3.67</d:BC_10YEAR>
<d:BC_20YEAR m:type="Edm.Double">4.03</d:BC_20YEAR>
<d:BC_30YEAR m:type="Edm.Double">3.85</d:BC_30YEAR>
<d:BC_30YEARDISPLAY m:type="Edm.Double">3.85</d:BC_30YEARDISPLAY>
</m:properties>
</content>
</entry>

You could also trivially just replace all the "\n" substrings in the xml string with "" (e.g. using clojure.string/replace) as a quick hack.
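For example, a quick sketch of that hack (strip-newlines is just a name for the example). Keep in mind it removes every newline in the string, including any that might appear inside text content:

```clojure
(require '[clojure.string :as str])

;; Remove every newline character from an xml string.
(defn strip-newlines [xml-str]
  (str/replace xml-str "\n" ""))

(strip-newlines "<content>\n<m:properties>\n</m:properties>\n</content>")
;; => "<content><m:properties></m:properties></content>"
```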

Apparently there is a (still) undocumented option, :skip-whitespace, that you can pass into any of the parse functions (like parse-str). The parser honors it and strips the whitespace items during parsing. It appears to fix this issue (and would simplify the additional cases I added in processing where there were whitespace elements included):

(def xs
  (-> url
      slurp
      (xml/parse-str :skip-whitespace true)))

user=>  (re-seq #"\n" (latest-entry-xml xs))
nil

Thanks a lot :slight_smile:

Syntax error (ClassNotFoundException) compiling at (test_treasury_dot_gov.clj:36:29).
clojure.data.xml.node.Element

I already have this in the classpath.

That doesn’t add up. Push your project to a repo; there are too many unknowns to answer at this point. You must use clojure.data.xml 0.2.0-alpha to reproduce what I did.
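Without seeing the project this is only a guess, but one possible cause is that the Element class has a different name (or has not been loaded) in the data.xml version that is actually on the classpath. A sketch that sidesteps the class reference entirely is to keep only the children that have a :tag, since strings like "\n" do not. The name child-elements-by-tag and the sample data are made up for the example, and it is shown on plain maps rather than real parsed elements:

```clojure
;; Hypothetical alternative to child-elements that never names the Element class:
;; keyword lookup of :tag on a string returns nil, so whitespace strings are dropped.
(defn child-elements-by-tag [node]
  (filter :tag (:content node)))

;; Made-up sample shaped like a parsed node with whitespace strings mixed in.
(def sample-node
  {:tag     :content
   :attrs   {}
   :content ["\n"
             {:tag :properties :attrs {} :content []}
             "\n"]})

(map :tag (child-elements-by-tag sample-node))
;; => (:properties)
```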

Can we connect over some other platform like instagram?

No, I don’t use social media outside of here, reddit, ask.clojure, etc. for clojure related stuff specifically. Best option for now (for me at least) is to submit all the files you are using so that I or anyone else can look at what your setup is like, and try to reproduce the error directly. I would typically create a repository on github and put all the files there (e.g. if you are using leiningen, add the project.clj and the folders like src etc., if you are using deps.edn do the same, except include the deps.edn and any source files).

There is this mentoring setup that just posted, maybe that is more useful.

Requested help on Unstuck

@joinr But no one is responding there.


I am not able to find the dependency clojure.data.xml 0.2.0-alpha; the first version that is available is clojure.data.xml 0.2.0-alpha1.
@joinr

leiningen coords

deps.edn coords
