I don’t have all the answers here, but some thoughts:
It’s understandable that Codox requires all source files, given the dynamic nature of Clojure, but this poses some serious sandboxing challenges. How long before someone uploads a jar that, when required, starts mining bitcoins?
The vast majority of libraries are well-behaved enough that they could be analyzed statically to pull out the docs. Presumably it should be possible to feed that into Codox’s generation backend. I’m not saying this is necessarily the best way to go, but I wouldn’t rule it out, as it would simplify things a lot.
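To make the static idea a bit more concrete, here is a deliberately naive sketch in Python (picked only for illustration). The name `extract_docs` and the regex are my own assumptions, not anything Codox provides. It only sees literal `defn`/`defmacro` forms with a docstring directly after the name, skips private `defn-` vars, and would be fooled by commented-out code or by macros that generate defs, so a real version would want a proper Clojure reader. The point is just that nothing from the jar gets loaded or executed.

```python
import re

# Naive pattern: (defn name "docstring" ...) or (defmacro name "docstring" ...).
# Assumption: the docstring sits directly after the symbol name. Plain `def` is
# left out on purpose, since (def x "value") would look like a docstring.
DEF_WITH_DOC = re.compile(
    r'\(\s*(defn-?|defmacro)\s+'   # defining form
    r'([^\s()\[\]{}"]+)\s+'        # symbol name
    r'"((?:[^"\\]|\\.)*)"',        # string literal, allowing escaped characters
    re.DOTALL,
)


def extract_docs(source):
    """Return {var-name: docstring} for public vars, without evaluating anything."""
    docs = {}
    for kind, name, doc in DEF_WITH_DOC.findall(source):
        if kind == "defn-":
            continue  # private var, not part of the public docs
        docs[name] = doc.replace('\\"', '"')  # undo escaped quotes
    return docs
```

Called on the text of a `.clj` file, this returns a plain dict of names to docstrings, which is roughly the shape of data a generation backend would need per namespace.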
I don’t think it’s necessary to eagerly generate all docs for all packages and all versions; make it lazy instead. When someone requests package x version y for the first time, show a message saying “your docs will be ready in a minute”, and then fetch and generate the docs in the background.
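Roughly what I have in mind for the lazy approach, as a minimal Python sketch. Everything here is hypothetical: `LazyDocs`, the `build` callback (which would do the actual fetch-and-generate step), and the in-memory job table are stand-ins, and a real service would persist results and cap the queue.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class LazyDocs:
    """Builds docs for a (package, version) pair on first request only."""

    def __init__(self, build, max_workers=2):
        self._build = build  # hypothetical callable: (package, version) -> html
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._lock = threading.Lock()
        self._jobs = {}  # (package, version) -> Future

    def get(self, package, version):
        key = (package, version)
        with self._lock:
            future = self._jobs.get(key)
            if future is None:
                # First request for this pair: start the build in the background.
                future = self._pool.submit(self._build, package, version)
                self._jobs[key] = future
        if not future.done():
            return ("pending", "Your docs will be ready in a minute.")
        if future.exception() is not None:
            return ("failed", str(future.exception()))
        return ("ready", future.result())
```

The first `get("x", "y")` kicks off the build and returns the "pending" message straight away; later calls reuse the same job, so each package and version is only ever built once.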
Being able to read people’s Codox config is pretty important, as it specifies which format the docstrings are in (e.g. Markdown), and also lists extra files to be included (e.g. doc/GETTING_STARTED.md). Either we read those from project.clj, or we create some standardized way to include this metadata in a jar, so that library authors could use that going forward.
I think it’s fine to start with a subset that we can easily handle, e.g. projects with a GitHub link on Clojars that use Leiningen or Boot. That should already cover a pretty large number of libs; you can deal with other cases later.