scraper.npr> (defn comic-titles [n]
               (let [dom (html/html-resource (java.net.URL. "http://xkcd.com/archive"))
                     title-nodes (html/select dom [:#middleContainer :a])
                     titles (map html/text title-nodes)]
                 (take n titles)))
#'scraper.npr/comic-titles
scraper.npr> (comic-titles 5)
UnknownServiceException no content-type  java.net.URLConnection.getContentHandler (URLConnection.java:1241)

This is the documented Enlive example from the Clojure Cookbook: https://github.com/clojure-cookbook/clojure-cookbook/blob/master/07_webapps/7-11_enlive.asciidoc. As you'll find, the problem is that the response for the target URL doesn't include a Content-Type header, which breaks java.net.URL's getContent() (the exception comes from URLConnection.getContentHandler, as the trace shows). You'll note that the same code works on compliant pages like google.com.
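
For anyone who wants to check the header themselves, here is a minimal sketch using plain Java interop. The helper name `content-type-of` is something I made up for illustration; it is not part of Enlive. It only asks the connection what Content-Type it reports, and I'm assuming a nil result corresponds to the "no content-type" case in the exception above.

```clojure
;; Hypothetical helper (not part of Enlive): report the Content-Type
;; that java.net.URLConnection sees for a given URL string.
(defn content-type-of
  "Returns the Content-Type reported for url-string, or nil if none is reported."
  [url-string]
  (let [conn (.openConnection (java.net.URL. url-string))]
    ;; getContentType returns null (nil in Clojure) when the value is unknown
    (.getContentType conn)))

(comment
  ;; Compare the failing page with one that works:
  (content-type-of "http://xkcd.com/archive")
  (content-type-of "http://www.google.com"))
```

The calls in the `comment` form hit the network, so what they return depends on what the servers send at the time.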
This is a really stupid error, and it doesn't seem to be a problem in any non-Java language I know of; I just want the HTML! How have folks gotten around this?