Roundtrip of XML data

habruening · April 6, 2024, 12:19pm

I have problems creating XML files with apostrophes in attributes.

(require 'clojure.xml)

(let [input (java.io.ByteArrayInputStream.
             (.getBytes "<?xml version=\"1.0\" encoding=\"UTF-8\"?>
                         <foo x=\"x'y\"></foo>"))]
  (-> input
      clojure.xml/parse
      clojure.xml/emit))

That runs, but the output is incorrect. The foo node is emitted as <foo x='x'y'/>, which is not well-formed XML. I read in a comment here that this problem has been known for 12 years. I read in the Clojure bug tracker that this problem will not be fixed, because emit is not part of the public API. What is the internal use of emit if the API is never meant to generate strings?

I tried the same with clojure.data.xml with the functions parse and emit-str. That works. But clojure.data.xml is much different. It does not use simple maps. That makes everything more complicated. I have to programmatically work with the data. And that works best with maps. But clojure.data.xml/emit-str does not accept the maps that are created by clojure.xml/parse.

So my question: is there a way to convert something that has been parsed by clojure.xml back into a string?

I also have a completely different question about this. The emit function takes a writer and writes to it. Isn’t this regarded as a side effect? Is there a reason for this? I find this very unusual for Clojure.

p-himik · April 6, 2024, 12:32pm

But clojure.data.xml is much different. It does not use simple maps.

It uses records, yes.

I have to programmatically work with the data. And that works best with maps.

Records work just like maps. I think their only relevant limitation is that you can’t call them the way you can call maps, but you can still use them as arguments to keywords or get or any function that expects a collection of any kind.
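
A minimal sketch of that point, assuming a recent org.clojure/data.xml is on the classpath. The alias dxml and the vars el and el2 are made up for this example. Keyword lookup, get-in and assoc-in are used on the parsed element exactly as they would be on a plain map:

```clojure
(require '[clojure.data.xml :as dxml])

;; Parse a small document; the result is a data.xml element, not a plain map.
(def el (dxml/parse-str "<foo x=\"x'y\"><bar/></foo>"))

(:tag el)                ; keyword lookup => :foo
(get-in el [:attrs :x])  ; nested lookup  => "x'y"

;; "Updating" returns a new value, just like with a map.
(def el2 (assoc-in el [:attrs :x] "changed"))

;; Sequence functions work on the children as usual.
(map :tag (:content el))
```

What does not work is calling the element as a function, e.g. (el :tag); use (:tag el) or (get el :tag) instead.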

The emit function takes a writer and writes to it. Isn’t this regarded as a side effect? Is there a reason for this? I find this very unusual for Clojure.

Yes, it’s a side-effect.
Because it’s pragmatic. And if you need to end up with a string, there’s clojure.data.xml/emit-str.
It’s not unusual at all, it’s everywhere in Clojure (spit, add-tap, set!, and so on and so forth). Maybe you’re confusing it with Haskell?
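
To make the difference concrete, here is a small sketch, again assuming org.clojure/data.xml is available (dxml, el, sw and s are names chosen for illustration). emit writes into a java.io.Writer that you supply, while emit-str hands back a string:

```clojure
(require '[clojure.data.xml :as dxml])

(def el (dxml/element :foo {:x "x'y"}))

;; Side-effecting variant: the XML is written into the writer we pass in.
(def sw (java.io.StringWriter.))
(dxml/emit el sw)

;; String-returning variant.
(def s (dxml/emit-str el))

;; Both routes should yield the same text.
(= (str sw) s)
```

Passing a writer is handy when the target is a file or a network stream, because the whole document never has to be held in memory as one string.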

seancorfield · April 6, 2024, 8:08pm

clojure.xml is older, legacy XML processing code that doesn’t handle all cases and will not be updated. That’s why what you tried doesn’t work.

clojure.data.xml is more modern XML processing code that is intended to handle all cases and is actively maintained. That’s why using the clojure.data.xml functions works. But you cannot mix’n’match the old clojure.xml functions and the newer clojure.data.xml functions because, to properly support all the cases efficiently, a different data representation is used internally: the element and element* functions can be used to create those records from raw data.
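
For data that has already been parsed with clojure.xml, one possible bridge is to rebuild it with element* before emitting. This is only a sketch under assumptions: legacy->element is a hypothetical helper written for this example, not part of either library, and it has not been checked against prefixed or namespaced tags.

```clojure
(require 'clojure.xml
         '[clojure.data.xml :as dxml])

(defn legacy->element
  "Hypothetical helper: recursively turns a clojure.xml map into a data.xml element."
  [node]
  (if (map? node)
    (dxml/element* (:tag node)
                   (or (:attrs node) {})                  ; clojure.xml uses nil for "no attributes"
                   (map legacy->element (:content node)))
    node))                                                ; text content (strings) passes through

;; Parse with the legacy API, as in the original question.
(def legacy
  (clojure.xml/parse
   (java.io.ByteArrayInputStream.
    (.getBytes "<?xml version=\"1.0\" encoding=\"UTF-8\"?><foo x=\"x'y\"></foo>"
               "UTF-8"))))

;; Convert, then emit with data.xml so the attribute is quoted correctly.
(def xml-str (dxml/emit-str (legacy->element legacy)))
```

Parsing xml-str again with dxml/parse-str should give back the attribute value x'y, which is a quick way to check the round trip.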

habruening · April 7, 2024, 10:31am

I understand now that the internal data representation is completely different, so I switched to data.xml.
But just one further question. The following command in the REPL prints the same as clojure.xml/parse. Why does it look identical? Never trust the REPL output?

(-> (java.io.ByteArrayInputStream.
       (.getBytes "<?xml version=\"1.0\" encoding=\"UTF-8\"?>
                               <foo x=\"x'y\"></foo>"))
      clojure.data.xml/parse)
-> {:tag :foo, :attrs {:x "x'y"}, :content ()}

seancorfield · April 7, 2024, 4:48pm

It depends on your REPL… With a plain Clojure REPL, you can see that this produces a record:

(~/clojure)-(!2007)-> clj -Sdeps '{:deps {org.clojure/data.xml {:mvn/version "RELEASE"}}}'
Downloading: org/clojure/data.xml/maven-metadata.xml from central
Downloading: org/clojure/data.xml/maven-metadata.xml from sonatype
Clojure 1.12.0-alpha9
user=> (require 'clojure.data.xml)
nil
user=> (-> (java.io.ByteArrayInputStream.
       (.getBytes "<?xml version=\"1.0\" encoding=\"UTF-8\"?>
                               <foo x=\"x'y\"></foo>"))
      clojure.data.xml/parse)
#xml/element{:tag :foo, :attrs {:x "x'y"}}
user=> (type *1)
clojure.data.xml.node.Element
user=>
