Should Linux distributions ship Clojure byte compiled (AOT) or not?

I’m working on packaging Clojure tools and libraries for a Linux distribution called Guix. We compile Java programs and libraries to byte-code. The Clojure community seems to favour NOT compiling to byte-code.

  • Why does Clojure not compile to byte-code for distribution?
  • What is the technical limitation that requires this difference from Java libs?
  • Can we distribute mixed byte-compiled Clojure libraries and non-byte-compiled libraries in different packages?

We byte-compile Java to improve performance and to avoid the wasted resources of many users doing it. A change to NOT aot’ing by default (email thread for context) is quite controversial so I’m seeking clarity.

As a Linux distribution we have both users and developers using what we package. A user using an application is the final step of distribution (e.g. clj-tools, babashka or puppet-server). A developer using what we package will use our libraries (and their own) to put code into production (e.g. clojure-data-csv, clojure.java-time).

We are also not Clojure experts - we package from many development communities 🙂

Resources I found that address this area, though none address the situation of an ‘intermediate distributor’:

  • Deploying AOT compiled code
    This is for someone who’s deploying to final production, not an intermediate distributor like a Linux distribution.

  • Problems with Clojure AOT compilation: summarises that the ABI is unstable, so “code compiled with Clojure version X is incompatible with code compiled with Clojure version Y”. But I couldn’t find any public reference explaining WHY this is the case, or an official statement from Clojure specifying it.

@alexmiller, @seancorfield, @didibus, @thheller - would appreciate your comments if possible!

Since you summoned me: I’ll repeat that libraries should not be AOT compiled. Tools can be.

For all the reasons already outlined in your email thread. I don’t really have anything new to add.

@thheller really appreciate you commenting - thank you (and sorry for the summoning). Maybe one additional thing you could answer that I don’t understand:

  • Is it possible to mix byte-compiled Clojure libraries and non-byte-compiled Clojure libraries?
    I’m thinking of a situation in Guix where one library package has been byte-compiled, and a different library has not been byte-compiled. I’m unclear if the two libraries can be used together or not.

  • What is the underlying cause of Clojure not being able to guarantee the ABI?
    The packagers who come from a Java background are used to byte-compiling packages. So it’s unclear why Clojure is different.

They can, but only partially. You quoted the answer to this. AOT compiled classes can only load other AOT compiled classes. If one transitive dependency is missing those, it’ll fail. A non-AOT library can load both, but once AOT, always AOT.

The compiler and runtime are built around the REPL, meaning that at any point you are free to re-define almost everything at runtime. Loading a file is exactly identical to eval’ing every single Clojure form in the REPL manually. Producing the exact same thing byte for byte just hasn’t been a priority, as far as I can tell. Achievable, sure, just not with the current compiler.

Let me ask you back: What value do you see in “repackaging” library packages from one package manager to another? And how would anyone even use it?

Let’s assume the user downloads your linked clj-tools package, which then uses deps.edn to resolve further dependencies. It’ll always get them via maven repositories, not yours. Same for all the other “tools” that know how to build a classpath (e.g. shadow-cljs, lein). So, why even go through the ordeal of packaging them in the first place?

Tools are fine, that is useful and can be AOT compiled. Intermediate libraries not so much, unless of course you also offer a way to build a classpath out of those and run a JVM with that? Effectively replacing clj-tools? If you want to build a “custom” guix ecosystem you are free to do so of course, but why?

I read this thread earlier and was waiting for a time when I could write feedback, and this is exactly what I wanted to write.

I think Guix is great, and I am actively exploring it to see how I would replace my current dev workflow based on toolbox (the Fedora container thing). As part of that I have needed to upgrade the Maven package, and was surprised to see that it seems to be byte compiled using an Ant script, even though the Maven team removed that way of building Maven in favour of a bootstrapped approach. This surprised me. I am not trying to turn this into a discussion of Guix with this comment, but rather to highlight that I think maybe (some?) packagers are making things harder than they should be.

Thinking about it, I might see why you want the individual dependencies to be native Guix packages, since that would allow sharing them across multiple tools through the Guix store. So if your goal is to treat them only as dependencies for building packages for other tools, that might make sense in a Java context. However, many tools in the Clojure ecosystem are designed to auto update by pulling a new jar-file on request, so there is a mismatch between the Guix way of thinking about packages and how much of the JVM ecosystem works. Maybe?

You could also argue that packaging a JVM library for Guix could amount to downloading the library jar from the repository where it is released and putting that in the store. That would allow you to interop more transparently because the jars can be byte-compiled or not and you would not have to care. It still leaves building the classpath for other packaged applications though, which may or may not be difficult.

Is it worth the effort, when what you could do instead to package a JVM-based tool using e.g. clj-tools is to use clj-tools to pull the dependencies and wrap the result in a Guix package? I don’t know; that depends on the level of control Guix wants over the packages, and the vision of the project. I don’t have any involvement in Guix so I can only provide the perspective of a potential user, and from that perspective I don’t think it is worth repackaging everything in every language ecosystem.

That’s great, I’m looking forward to the Guix integration.

Here is an example I came across recently:

This Clojure namespace dynamically creates new functions when it is loaded. I created a checkpoint of the JVM. During this phase none of the security providers are registered. Therefore the dynamic functions were missing, which broke other parts of our code. As others have already said, Clojure can be very dynamic, so it’s not possible to AOT compile all Clojure code.

Since you also summoned me…

I cannot understand the obsession Linux package manager folks have with trying to “package the world” from scratch. The JVM world simply doesn’t work like that – and there is a perfectly good (and well-trusted) repository of libraries already that avoids all this duplication.

I can see some benefit in a distro packaging the clojure / clj CLI so that it’s available via the distro package manager, but these things always seem to lag behind the official installer and that then leads to all sorts of confusion when people install the CLI via the package manager and then can’t get a lot of stuff to work because it’s an old version. For a while there was a rogue clojure package on at least one distro that was completely unrelated to the official CLI (and did not work at all the same way).

Packaging up the Clojure CLI also means that every package manager maintainer is a) duplicating work, since there’s a perfectly good POSIX installer script for the CLI already, and b) setting themselves up for a lot of future maintenance tracking all the new versions of the CLI. You link to the Guix clj-tools package: that is almost 18 months out of date – see Clojure - Tools Releases for all the subsequent releases. Packaging old versions and not keeping them maintained is actively counter-productive and causes more problems for developers than it solves. Packaging those other libraries is even more counter-productive since that’s just not how they get used in the JVM world.

The question in the subject line doesn’t even really make sense. The Clojure core libraries on Maven are already AOT-compiled but pretty much all other Clojure libraries (even the Contrib stuff on Maven) are all source code by design. AOT-compiling Clojure libraries is an anti-pattern.

Even the core AOT-compiled library isn’t repeatable, I believe, which is something some of the package manager folks seem very concerned about. It’s a non-goal. The only reason people AOT stuff in Clojure is for improved startup speed so, aside from core itself, only applications tend to be AOT’d (not libraries).

This specific part is why I proposed changing the default in Guix.

All the modern language ecosystems have tools that “just” download new versions of libraries for the developer (e.g. Rust, Node/JS, Clojure).

It means IF the distribution has libraries for a language ecosystem in it, developers might use it in a few ways:

  1. Developer installs tools from the distribution and mixes libraries
    For Clojure install Clj from Guix. Guix’s tooling provides some libraries. The developer installs the rest from Clojars. As I understand it, the libraries coming from the distribution would have to be source (not byte-compiled) so that they can be used correctly.

  2. Developer installs tools from outside the distribution, and mixes libraries
    This is the same as (1) from the perspective of library usage.

  3. Developer installs tools from distribution and all libraries come from the language’s tools.
    Install clj from Guix, and use clj-tools to download from Clojars. This is what you’re pointing to.

  4. Developer installs tools and libraries from the language’s tools.
    Perhaps the ‘normal’ way these days if you are coming from a specific language ecosystem.

  5. Developer installs all tools and libraries from their Linux distribution (or general package manager)
    This is how distribution developers and package management developers think of the world.

It follows that IF a Linux distribution (or general package manager) is going to have libraries, they have to be usable in the mixed situations (1) and (2) above. I presented a version of this on the Guix mailing list. If the libraries are not mixable, then they won’t be used, since the easiest path for a developer will be (4).

Which is why I believe byte-compiling Clojure libraries is a problem: from what I understand, it makes them unusable in (1) and (2).

Ultimately, whether someone prefers ‘language specific tooling’ or ‘packages and environments across multiple language ecosystems’ will come down to their situation. These are two different types of tools really.

Guix is like Nix. Our tools can create an environment, a Docker container, or a whole system that is guaranteed to be reproducible. Given the same inputs, it’s always the same output. This can be applied to packages, services and configuration: it’s the same functional DSL to configure the system and all software. It’s the same to deploy to containers, VMs or multiple systems, whether you’re setting up your laptop or deploying an HPC system. Finally, it’s the same tooling and workflow across every language ecosystem.

Some ways this can be used:

  1. Simplify development environments: code, tools and utilities that are the same and reproducible across every language ecosystem
  2. Deploy whole environments (system, databases, libs and code) using a single, transactional system
  3. Have security through a guaranteed supply-chain with SBOM capabilities
  4. Guarantee reproducibility of environments for things like science or regulatory reasons

I know I’ve abstracted way above the ‘why package libraries when your distro won’t have them all’. I’m not trying to convince you that any particular approach is better. As I am familiar with Clojure it felt natural to look at it!

And, I know I’ve hand-waved through the whole ‘changing the classpath’ question.

Hi Sean,

Thank you. I appreciate that the viewpoints of Linux package managers and Clojure developers are very different. That’s actually the bridge I feel I’ve failed to get across, which is why I’m asking for clarity from Clojure developers who are much more knowledgeable than me.

I’ll happily address why Guix might be useful, but before the thread spins off, what I really need is clarity on:

  1. What is the Clojure team/community position on Linux distributions byte-compiling libraries? I think you’ve given that in this:

If you are not a Clojure developer and deal with many different language communities then it seems ‘odd’ that Clojure is not byte-compiling while Java does. I have been unable to clearly answer the questions of other packagers in Guix as to:

  2. Why?

As I said, I’ve done my best at explaining and I linked the thread, but I’m not being clear enough, so I’m looking for help and context, which will be read by people who don’t “do Clojure” full time.

In both cases, libraries come from an existing, trusted repository – they are opaque as far as the host O/S and any package manager should be concerned. And there are plenty of languages for which libraries are distributed in source form – where they are either interpreted or compiled-on-demand when the source is loaded.

The “must compile everything” thinking is very outdated: it’s the thinking of 80s and earlier. I started with assembler, COBOL, C back in the 80s. Yes, everything had to be compiled to some sort of bytecode (or binary) and distributed as compiled libraries and then linked together to make “programs”. I was a compiler writer myself back then.

Nowadays, programming languages pretty much all have tools that handle dependency management and there are centralized repositories that can be used across pretty much any environment – even macOS and Windows 🙂 Linux distros should not be doing this.

Your linked email thread says:

Guix’s clojure-build-system turns on AOT compilation by default. I would like
to advocate that ‘as a distributor’ we should not ship Clojure code AOT’d, so
we should change the default.

I would go further: “as a distributor” you should not be shipping Clojure libraries at all, nor any other JVM libraries. It’s a completely wrong-headed approach.

And you should only be shipping the CLI tooling if you commit to keeping it up-to-date on every distro that decides to ship it and that just doesn’t happen. Therefore, “distributors” are actively making things worse for developers in this regard.

You sort of skipped the real #1, which is to create a system that can actually do that?

Doesn’t matter one bit if the distribution provides some packages, byte compiled or not, if the common tooling doesn’t use them?

This isn’t C, where some application you install might be dependent on some .so library provided by the system in some way. I even find your arguments for Java pre-built libraries a bit weird, since the last time I checked most Java projects also use mvn for package management.

As a tool author I welcome others trying to build “better” tooling, so you are absolutely welcome to try.

I second this. Getting this up to date is far more relevant and beneficial than a distro offering some pre-built (or not) libraries, which the official tools can already get themselves.

I’m reading over more of the linked email thread. Maxime says:

This reasoning does not follow – yes, it is tied to the Clojure version, so what? Guix automatically rebuilds dependents when the dependency (in this case, the Clojure compiler) changes.

A developer can pick any version of Clojure to use – they specify it in each project – and libraries generally work with a wide range of Clojure versions. It’s common for projects to include multi-version testing where they specify which versions of Clojure to test against but the exact same versions of the libraries.

A Clojure library is not tied to a specific version of Clojure and developers often update library versions and Clojure versions independently of each other.

In Maxime’s world of “rebuilding”, every version of every Clojure library would need to be built (AOT’d) with every version of Clojure and they would all need to be available in Guix… That’s… just not a sane approach.
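To put rough numbers on that, here is a small Python sketch of the build matrix. The library names and version lists are made up purely for illustration; only the arithmetic matters.

```python
from itertools import product

# Hypothetical inputs, chosen only to illustrate the arithmetic.
clojure_versions = ["1.10.3", "1.11.1", "1.12.0"]
library_versions = {
    "lib-a": ["0.1.0", "0.2.0"],
    "lib-b": ["1.0.0", "1.1.0", "1.2.0"],
}


def aot_build_matrix(clojure_versions, library_versions):
    """Return every (library, library version, Clojure version) combination."""
    builds = []
    for lib, versions in sorted(library_versions.items()):
        for lib_version, clj_version in product(versions, clojure_versions):
            builds.append((lib, lib_version, clj_version))
    return builds


builds = aot_build_matrix(clojure_versions, library_versions)

# Shipping source needs one package per library version, whatever
# Clojure version the developer later picks.
source_packages = sum(len(v) for v in library_versions.values())

print(len(builds), "AOT builds versus", source_packages, "source-only packages")
```

The AOT count grows with every new Clojure release, while the source-only count does not depend on the Clojure version at all.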

Agreed, that approach would not be sane. But why does a Clojure library have to be AOT compiled for each version of Clojure it’s used with?

For Java, Guix byte-compiles (AOT) the library and it works with every version of Java that we have in the archive. Since distributions byte-compile Java libraries, what is different about Clojure (the technical reason) that means it’s not possible, and how can that be explained? If you’re used to other VM-based languages where you can byte-compile, then Clojure not being able to do so looks like a bug.

As seen here, the Core team has never prioritized making the compiler produce deterministic bytecode.

I don’t have an answer for Clojure.

Generally, Guix is able to install libraries into a system directory and point the language’s compiler/interpreter at that location.

If you are a Guix person and you use Python, we put the Python libs that you install through Guix into a directory that’s part of site-packages. As a Python developer I can either choose to use those libraries or I can use tools from my language community (e.g. pip and env) to ignore the libraries that came with the ‘system’.

Why shouldn’t it be possible to do the same thing with Clojure?

I agree that if you’re shipping tools they need to be up to date. That was what I thought was going to take a couple of hours when I started looking at this last week 🤣

That’s a great link @NoahTheDuke - Thanks!

Scala is another language that doesn’t promise bytecode compatibility across different versions. The 2.7 to 2.8 migration didn’t even maintain compatibility across milestone builds of a single release – the entire tool chain had to be rebuilt for each new milestone build, including libraries. 2.9 wasn’t much better. We abandoned Scala at work at that point.

I’m sure there are other examples even on the JVM. If anything, Java is the odd one out since they have tried to promise this compatibility (they haven’t always been successful but they do pretty well).

Clojure tries very hard to provide source compatibility but not bytecode compatibility across versions.

Clojure’s compiler is part of its runtime – even in production with a fully AOT’d application you can still run a REPL and add new code, which is compiled on-the-fly from source as you enter it. In 1.12, as long as you have the CLI installed, you can even add new dependencies on-the-fly and it will fetch them from Maven/Clojars and load them into the running JVM (and compile them from source if necessary).

Hi everyone,

Since I have a good amount of Clojure experience, and enough Guix experience to be dangerous, I thought I’d take a swing at responding broadly to some of the points raised here.

In short: instead of dealing with multiple language-specific tools to set up dev and build environments, using Guix means you only need one.

At my work, we have a bunch of projects which are written in Clojure; with IaC definitions written in Python; which get deployed to AWS with CDK. In order for this to work, you need:

  • NVM, to install the right version of Node (CDK is implemented in NodeJS, and runs on Node, even if your IaC isn’t JS).
  • NPM to install the CDK CLI and its dependencies.
  • pyenv, to install the right version of Python.
  • Poetry, to install Python dependencies for the project IaC.
  • OpenJDK.
  • Leiningen, to install Clojure & Java dependencies, build the code, etc.
  • Homebrew, to install the language tooling, so it can install the

Basically, every language’s tooling is solving extremely similar problems, but only within the narrow scope of that one language. Now, I understand why this is so, and the tools can be nice within the scope they’re applicable, but the complexity — particularly when these systems combine & overlap — leaves much to be desired.

For example, Poetry clobbers a bunch of the environment when you activate its venv, so you learn the hard way that to get a working dev environment, the correct order is Poetry, then NVM, then aws sso login, because stuff breaks in inconsistent and non-obvious ways if you don’t.

For another example, the versions of the Node CDK CLI and the Python CDK SDK need to match, otherwise the SDK creates a version of the output the CLI doesn’t understand how to process.

Guix solves these problems by, essentially, giving you a system-wide virtual environment. You can write a manifest for a project saying that it needs OpenJDK 11, and Node 18, and Clojure 1.11.1, and left-pad 5.4.99, and then guix shell -m manifest.scm and you get a subshell where those are installed. And if another project needs OpenJDK 17 and Node 21, you can handle that the same way, and the concerns of the other project just don’t apply, because nothing is installed “on the system” like it is with other distros.

Just to connect this up to a mechanism you’re probably already familiar with: When Leiningen downloads a project’s dependencies, they all land in ~/.m2/repository. This can contain any mix of artifacts, different versions of the same library, etc. And this isn’t a problem, because Leiningen knows which JARs a project needs and constructs a classpath which points at them. Guix is the same idea, applied to the whole system: everything goes into /gnu/store, and guix shell changes the environment so (I’m grossly simplifying here, but this is the essence) $PATH points to the right locations, such that java runs the desired version within a given context.
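As a rough illustration of the classpath idea above, here is a minimal Python sketch. It is not how Leiningen is implemented: it skips dependency resolution entirely and only shows how coordinates map onto the standard Maven repository layout (group ID split into directories, then artifact, then version). The function names and the two example coordinates are my own choices for illustration.

```python
import os
from pathlib import Path


def jar_path(repo, group, artifact, version):
    """Locate a JAR using the standard Maven repository layout."""
    return Path(repo).joinpath(
        *group.split("."), artifact, version, f"{artifact}-{version}.jar"
    )


def build_classpath(repo, coordinates):
    """Join the JAR paths for each (group, artifact, version) into a classpath."""
    return os.pathsep.join(str(jar_path(repo, *coord)) for coord in coordinates)


# Example coordinates only; a real tool would resolve these (and their
# transitive dependencies) from the project definition.
repo = Path.home() / ".m2" / "repository"
deps = [
    ("org.clojure", "clojure", "1.11.1"),
    ("org.clojure", "data.csv", "1.0.1"),
]

classpath = build_classpath(repo, deps)
print(classpath)
```

The point is that the repository can hold any number of versions side by side, because only the paths selected for a given project end up on its classpath. Guix does the analogous thing with /gnu/store and the environment variables of a guix shell session.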

There are, of course, other tools addressed at this same problem space; asdf seems to be a popular one. I haven’t used it, so I can’t compare it; I’ll just note that it’s yet another third-party addon, and it still relies on OS-level packages in order to work, so it can’t be a single-system solution in the way Guix (when run as a complete distro) is.

While it’s certainly possible to use language-specific tooling, it’s disharmonious with the way things are supposed to work in Guix, and forfeits the most compelling features. For Guix to manage projects in the way I outlined, it needs to install all required tooling, libraries, etc. To do that, it needs Guix packages.

I can’t say I’m always satisfied with the Guix journey, but the destination is pretty nice.

Adding,

Isn’t the REPL the most common use case where mixing AOT (libraries) and non-AOT (whatever you’re hacking on) is likely to cause issues? I’ve seen some other situations mentioned where mixing AOT/not can be a problem, but haven’t seen mention of REPL development. I’m not sure if it’s not a problem there, or got overlooked.