Now I’m curious: what well-formed XML did it choke on?
It was nothing weird. Here’s a sanitized version of the offending markup:
<?xml version="1.0" encoding="UTF-8"?>
<secret-tag FIRST-ATTR="en-US00001" attr2="XYY-Y000" attr3="XY1234567" xml:lang="en-US">
  <title>
    <other-tag category="" id="an ID" phrase-urn="urn:secret-stuff">Some text about some things</other-tag>
  </title>
</secret-tag>
It turns out that JSoup has some HTML-specific hacks around
title tags that cause all tags nested under any
title tag to be converted into a single HTML-escaped text node, which was… unexpected.
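For contrast, here is a rough sketch of what a strict XML parser makes of the same markup. It uses Python's standard-library `xml.etree.ElementTree` rather than JSoup, so it only illustrates the tree one would expect and does not reproduce JSoup's behavior. It also assumes the closing tag inside `title` matches its opening `other-tag`; the variable names are made up for the example.

```python
import xml.etree.ElementTree as ET

# The sanitized sample from above, assuming the inner element is
# closed with a matching </other-tag>.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<secret-tag FIRST-ATTR="en-US00001" attr2="XYY-Y000" attr3="XY1234567" xml:lang="en-US">
  <title>
    <other-tag category="" id="an ID" phrase-urn="urn:secret-stuff">Some text about some things</other-tag>
  </title>
</secret-tag>"""

# fromstring raises ParseError if the input is not well formed.
root = ET.fromstring(SAMPLE)
title = root.find("title")

# A strict XML parser keeps the nested element as a real child node.
children = [child.tag for child in title]
nested_text = title.find("other-tag").text

# The title element itself carries only whitespace, not escaped markup.
own_text = (title.text or "").strip()

print(children)     # ['other-tag']
print(nested_text)  # Some text about some things
print(repr(own_text))  # ''
```

If a parser hands back a `title` with no child elements and the nested markup sitting in its text as `&lt;other-tag ...&gt;`, that is the HTML-style handling described above, not anything wrong with the XML.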