Resources for learning how the Clojure compiler works?

Hi, I’m starting a low-level study of how the Clojure compiler works, and I want to be open to writing about it as I go if it makes sense. I’m reading through the compiler’s core.clj and Compiler.java of course, and I know that tools.reader/analyzer/emitter are also available for study. But I’d like to also read or watch anything available about compiler internals that others have shared.

I found what looks like an old wiki for design docs here:
https://archive.clojure.org/design-wiki/display/design/Home.html

And I really enjoyed this low-level guide here:

Are there any more resources out there like this? Thank you so much!

4 Likes

Not many. There are a few old talks that are probably useful:

In general, the compiler is actually less complicated than you may think based on the quantity of code. It’s given Clojure data (supplied by the reader), analyzed into expressions (all the nested classes in the Compiler), then each of those is emitted into bytecode. The structure of each of those nested classes is almost the same - you’ll see a Parser (given a form, make one of the class exprs), an eval if it’s possible to run this without compilation, and an emit() method which gen’s bytecode into an asm context. The bytecode expansion is usually pretty straight forward - there is very little optimization done, this is really more of a translator.

tools.analyzer is basically the same thing but uses nested classes instead of Java classes to represent the expressions.

7 Likes

Thank you, Alex for your rundown and for the video links! I just watched them— exactly what I was looking for.

Daniel’s talk is fun. He motivates the deep-dive as a way to discover the “magic” behind Rich Hickey’s ant demo (all chalking up to these humble things called Vars). It’s full of good related nuggets that are squeezed into the talk, and the end is a fun historical picture of the genesis of the “direct linking” feature I believe.

Gary’s talk is a great explanation of all the side effects of eval’ing a namespace, and how the *__init.class file is created to try to perform all the same side effects as a full eval. It seems there can be a disparity between loading that vs a full eval though sometimes [1][2], which I’ve also experienced but haven’t been able to isolate well enough yet. This talk is a great starting point though:

Related to bytecode caching— Alex, I noticed you said this a few years ago:

The place that is the most interesting to me is in reducing the time per-ns and per-var to load code… If code is not AOT compiled (which is where we live at dev time), then we need to look at the time to read, compile, and load the code. There might be opportunities to cache certain parts of this process (given that most of our code is identical every time we load it).

We have a large codebase where 68% of our 40s startup time is spent in clojure.lang.Compiler.analyze*. Maybe a naive (but non foolproof) idea for faster dev reloads is to cache the analyzed expression of each top-level form in a namespace, just using source text— is this what you meant? Still trying to get my head around what might be possible.

4 Likes

I’ve worked on several spikes related to this over the years. I’ll summarize by saying… it’s tricky. :slight_smile: The last time we worked on this we decided the mechanism to compile and cache already existed … aot! And we wrote Clojure - Improving Development Startup Time to help people use it for dev time work. Generally, I find that many people have been able to use that technique to get a big percentage of the effects (by precompiling their deps, which aren’t changing) without any of the fragileness of the approaches I’ve looked at.

2 Likes

This talk has some insights

1 Like

Here’s a map I made of the expression classes from Compiler.java. It’s pretty small:

SYMBOL         INTERN                EXPR CLASS   SUBCLASSES                      INTERFACES

DEF            def                      DefExpr
ASSIGN         set!                  AssignExpr
THE_VAR        var                   TheVarExpr
IMPORT         core/import*          ImportExpr
DOT            .                       HostExpr > FieldExpr  > InstanceFieldExpr  (AssignableExpr) (MaybePrimitiveExpr)
                                       HostExpr > FieldExpr  >   StaticFieldExpr  (AssignableExpr) (MaybePrimitiveExpr)
                                       HostExpr > MethodExpr > InstanceMethodExpr                  (MaybePrimitiveExpr)
                                       HostExpr > MethodExpr >   StaticMethodExpr                  (MaybePrimitiveExpr)
MONITOR_ENTER  monitor_enter        UntypedExpr > MonitorEnterExpr
MONITOR_EXIT   monitor_exit         UntypedExpr > MonitorExitExpr
THROW          throw                UntypedExpr > ThrowExpr
TRY            try                      TryExpr
NEW            new                      NewExpr
IF             if                        IfExpr                                                    (MaybePrimitiveExpr)
INSTANCE       core/instance?    InstanceOfExpr                                                    (MaybePrimitiveExpr)
FN             fn*                      ObjExpr > FnExpr
DEFTYPE        deftype*                 ObjExpr > NewInstanceExpr.DeftypeParser
REIFY          reify*                   ObjExpr > NewInstanceExpr.ReifyParser
DO             do                      BodyExpr                                                    (MaybePrimitiveExpr)
LETFN          letfn*                 LetFnExpr
LET            let*                     LetExpr                                                    (MaybePrimitiveExpr)
LOOP           loop*                    LetExpr                                                    (MaybePrimitiveExpr)
RECUR          recur                  RecurExpr                                                    (MaybePrimitiveExpr)
CASE           case*                   CaseExpr                                                    (MaybePrimitiveExpr)
                                    LiteralExpr > KeywordExpr
                                    LiteralExpr > NumberExpr                                       (MaybePrimitiveExpr)
QUOTE          quote                LiteralExpr > ConstantExpr
                                    LiteralExpr > NilExpr
                                    LiteralExpr > BooleanExpr
                                    LiteralExpr > StringExpr
(Collections)
                                      EmptyExpr
                                       ListExpr (unused)
                                        MapExpr
                                        SetExpr
                                     VectorExpr
                                       MetaExpr
(Resolved symbols)
                                        VarExpr                                   (AssignableExpr)
                               LocalBindingExpr                                   (AssignableExpr) (MaybePrimitiveExpr)
                              UnresolvedVarExpr
                                MethodParamExpr                                                    (MaybePrimitiveExpr)
(Calls)
                              KeywordInvokeExpr
                               StaticInvokeExpr                                                    (MaybePrimitiveExpr)
                                     InvokeExpr

2 Likes

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.