CLJ Commons: Building a formatter like gofmt for Clojure

As you can tell from the almost absurd level of configurability in zprint, I am skeptical of the overall utility of a formatter which enforces the “one true way” for formatting Clojure source. However, it seems clear that there is at least some interest in a Clojure source formatter which doesn’t have any options.

Personally, I haven’t yet found a set of formatter options I like 100% of the time for my own code, which is why I implemented the ;!zprint {} directive capability in source files – so I could sometimes change the formatting for a particular function or, more usually, a particular data definition in a file.
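For example, a hand-aligned data definition could be protected with a directive comment on the line before it. This sketch assumes zprint's `{:format :skip}` directive option, which tells zprint to leave the next form as written. Check the zprint documentation for the options your version supports. The `transitions` map is a hypothetical example.

```clojure
;; Hypothetical data definition. The directive comment on the line
;; above it asks zprint (assumed option: {:format :skip}) to leave the
;; next form exactly as hand-formatted.
;!zprint {:format :skip}
(def transitions
  {:idle    {:start :running}
   :running {:pause :paused
             :stop  :idle}
   :paused  {:resume :running
             :stop   :idle}})
```

To the Clojure reader the directive is just a comment, so the file behaves the same whether or not zprint is used.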

I have put considerable effort into creating the capability in zprint to implement the community style, and to get that you just say {:style :community}. That said, I have seen very little code in the wild that actually adheres to my understanding of the community style. But maybe the time has come for a single style.

I am currently working on an issue in zprint which involves some source code rewriting in Leiningen project.clj files, and my implementation approach would equally support modifying ns declarations, so zprint is already moving in that direction. If there are other things it can’t do, I’d be more than happy to look into enhancing it.

I don’t imagine removing the current level of configurability from zprint, but I would certainly be open to both:

  1. Creating a new style which embodies the “fixed” formatting that you all agree on.
  2. Creating a version of zprint which has no configurability but just does that fixed style.

With GraalVM it runs very fast (<55ms startup), and in environments where GraalVM isn’t suitable, the AppCDS/JVM approach works well (<900ms startup). It also runs fine in cljs.

I will be happy to do considerable implementation to help realize the goals that you have set out. Just to be clear, I’m not volunteering to play a significant role in driving the community to a consensus on what the format should look like. Not least because my personal opinions on the subject (e.g. the current zprint defaults) are moderately far afield from the current community approach, though I greatly respect the work it has taken to create the community approach.


Thanks @danielcompton for bringing up this idea. I am generally all in favor of having a zero-configuration formatter, with a few caveats.

I consider code a form of personal expression, and so I’ve often felt strong resistance to tools like these, but having worked on teams where four different editors are used, and doing a lot of open source, I absolutely see the value in this. Sometimes making a whole category of bikeshedding go away is extremely valuable.

I think it’s worth remembering that a tool like this will be adopted (or not) on a per-project basis. If you don’t want it, don’t use it. If you don’t agree with it, use something configurable like zprint. If you don’t want to think about it and just want the discussion to go away then pick the default.

The more projects adopt “the standard”, the more benefit you get, as it lessens everyone’s cognitive overhead. We should encourage people to use it, but it has to be OK to do things differently. I hope we can recognize that up front and not start shaming projects into adopting “standard” formatting.

I 100% agree with this. The tool should not have more rules than I can keep in my head while coding. Once it is stable and starts getting adoption it should never change its rules or add new rules.

For this I present the case of Rubocop (more a linter than a formatter, admittedly). Every version of Rubocop would add more rules. Every time you update it, you need to spend time updating code that used to be just fine. It also became a pain to work with, as there was always some rule you had no clue about that you accidentally ended up breaking. Rubocop is one of the reasons I don’t do Ruby anymore; it took all the joy out of programming.

To me these things have nothing to do with formatting. A formatter should limit itself to rearranging whitespace. That’s it. If you want checks on how the language is used, use a linter.

This, I think, is where it will be very hard to reach consensus. There are really two schools of thought when it comes to formatting code: optimize for readability, or optimize for smaller diffs. I am 100% on the readability side. I read code more often than I read diffs. I like my maps and let blocks aligned, for instance.

This I think is an important feature, sometimes hand-formatting is preferable. Think: big literal EDN data structures. Having a safety hatch so you can selectively opt-out will go a long way towards lessening resistance.


This is the exact thing that should be avoided by this tool. We can’t rely on certain forms being formatted differently, as we need this tool to be “once and for all”, not something that changes and grows and evolves. Once rules are set, this is it.

I also think it’s quite possible to alter the Clojure Style Guide so that no form gets special treatment. It’ll make the rules simpler and more consistent.


I’m not sure I follow. The let form usually has special indentation rules, where the body is not aligned with the vector:

(let [result :foo]
  (if result
    (do-something-with result)
    (do-something-else)))

If let were indented like other forms, the (if would align with the [result. Are you proposing that this tool should format everything the same, or that any let-like user macros would be formatted differently from the built-in let forms?
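The special-casing described above can be sketched as a lookup table of "body" forms. The names `body-forms` and `continuation-indent` are hypothetical, and the table is an assumption, not the contents of any real formatter. The sketch shows why a let-like user macro falls back to argument alignment unless someone adds it to the table.

```clojure
;; Hypothetical, deliberately incomplete set of forms whose body gets a
;; fixed two-space indent. A real formatter would need a much larger table.
(def body-forms #{"let" "if" "when" "do" "fn" "defn" "loop"})

(defn continuation-indent
  "Column for the continuation lines of a list that opens at `paren-col`
  and whose first element is the symbol named `head`."
  [paren-col head]
  (if (contains? body-forms head)
    ;; special-cased form: fixed two-space body indent
    (+ paren-col 2)
    ;; ordinary call: align with the first argument, i.e. skip "(",
    ;; the head symbol and one space
    (+ paren-col 1 (count head) 1)))
```

For example, `(continuation-indent 0 "let")` returns 2, `(continuation-indent 0 "foo")` returns 5, and a user macro such as `(continuation-indent 0 "my-let")` returns 8 because it is not in the table.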

Then there’s the question of how to deal with macros like this (which you may be familiar with :smile:):

(rum/defcs component
  < rum/static
    rum/reactive
    (rum/local 0 ::count)
    (rum/local "" ::text)
  [state label]
  (let [count-atom (::count state)
        text-atom  (::text state)]
    [:div])

Today at the Conj @kkinnear gave an unsession of zprint and I was seriously impressed by it.

It seemed to me that it provides a reasonable starting point for all the above requirements except the zero-config part, which Kim would probably be happy to add. That shows he has the right attitude for such an important piece of the ecosystem, IMHO. The options are there with reasonable defaults, but everybody can tweak them.

I also got to see how to configure zprint’s indent-by-function-style options, and I am kind of sold :slight_smile:: unlike CIDER’s indent-based metadata, the function style lets you choose a function, say cond, whose formatting is applied to your own custom function myothercond. All of this is passed in the options map or in a static .zprintrc file.

I think this has an advantage over indent-only config, or at least the two ways of customizing could happily coexist.

So my :+1: goes to expanding on zprint (if even necessary).


I tried to use zprint in Calva Formatter but had to give up on getting it to behave the way I needed. I mentioned it in this thread, which asked for ways to make zprint not alter newlines. It could be that zprint can meet my format-as-you-type requirements and I just haven’t found out how. I also mentioned earlier in this thread that I think a formatter that is to support the editor use cases needs to be able to consider the cursor position and the selection before and after reformatting the text. For performance reasons I would also like a way to ask zprint to format only a given range of the code, as well as a “minimal” range from the cursor.

I mentioned a few other things that the editor use cases need early in the thread. If zprint can be made to support being used by editors while code is being edited, I would say it is definitely a way of implementing this Commons formatter that should be considered.

I have been using zprint for years now on work projects. I really like it, even when I do not agree with how it formats things, because it solves the problem of people commenting on formatting during code review.
It is not perfect, though. It makes some macros illegible out of the box. There are cases for which I could not find a good solution even with configuration, and maybe there isn’t one, such as clojure.test/are forms. We override the default configuration for several forms, and almost all of them are macros. Most of those overrides are for small gains in legibility, but a few forms are very hard to read with the default configuration. I would think that macro formatting would be a problem for having a standard way to format Clojure(Script) code.
Also, the zprint native images are really fast unless you have a giant nested form in the file. I run them on save without a problem.

@pez I apologize for missing the thread that you mentioned. I’ve only recently started using Clojureverse, and when I searched for zprint issues the other day, I became distracted by this thread and didn’t continue searching further. I don’t know if you are still interested in exploring what zprint can do for Calva, but I will go to that thread today to respond about the specific issues that you mention there. For this thread, I want to highlight a particular issue right now:

  • We should probably make a distinction between an “indenter” and a “formatter”. An indenter preserves the existing line breaks and, within those constraints, places the elements of the code in the right place on the line. A formatter ignores the existing line breaks and moves things around in order to use the available space efficiently within the constraints of the formatting rules. I’m not wedded to the terms indenter and formatter, but we do need some way to discuss these two concepts. There is something of a spectrum of behavior between an indenter and a formatter, and it isn’t clear if one tool can be tuned in fine steps between these two poles. zprint is clearly a formatter, and due to a recent issue, I’m currently exploring if and how much it can move toward the indenter end of the spectrum. I honestly don’t yet know how that is going to work out. For the purposes of this thread – a “no config” Clojure source code tool – I rather thought that the goal was a formatter, not an indenter, but maybe that is yet to be decided.
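To make the indenter side of the distinction concrete, here is a toy indenter. It is a sketch, not how zprint or any other tool works. It keeps every line break, strips each line's leading whitespace, and re-indents with an assumed rule: two spaces inside a list, and one column past an unclosed `[` or `{`. The function names are hypothetical, and the sketch does not skip brackets that appear inside strings, character literals or comments.

```clojure
(require '[clojure.string :as str])

(defn- scan-line
  "Update a stack of [column opener] entries for one line: push every
  opening bracket, pop on every closing one. Simplification: brackets
  inside strings, character literals and comments are not skipped."
  [stack line]
  (reduce (fn [stk [col ch]]
            (case ch
              (\( \[ \{) (conj stk [col ch])
              (\) \] \}) (if (seq stk) (pop stk) stk)
              stk))
          stack
          (map-indexed vector line)))

(defn- indent-for
  "Assumed rule: 2 spaces inside a list, otherwise 1 column past the
  innermost unclosed bracket; column 0 at top level."
  [stack]
  (if-let [[col ch] (peek stack)]
    (+ col (if (= ch \() 2 1))
    0))

(defn indent-lines
  "Toy indenter: keeps every existing line break and only rewrites the
  leading whitespace of each line. Blank lines become empty."
  [text]
  (loop [lines (str/split-lines text), stack [], out []]
    (if-let [line (first lines)]
      (let [body     (str/triml line)
            new-line (if (str/blank? body)
                       ""
                       (str (str/join (repeat (indent-for stack) " ")) body))]
        (recur (rest lines) (scan-line stack new-line) (conj out new-line)))
      (str/join "\n" out))))
```

Given `"(let [a 1\n b 2]\n(+ a b))"`, this returns `"(let [a 1\n      b 2]\n  (+ a b))"`. The three lines stay three lines, `b` lines up under `a`, and the body gets two spaces. A formatter, by contrast, would be free to join or split those lines.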

@mynomoto – I’m glad you have found some value in using zprint. Regardless of whether or not zprint ends up meeting the needs for a “no config” formatter, please file issues when you find something that zprint doesn’t do well! I’ll go off and look at clojure.test/are forms, but if you have some particular examples that would be great to see. I use expectations, so I haven’t run into that.

In general, if you use zprint, please file issues for things zprint doesn’t format well! I only know how well it does on code that I try it on, and I only spend a certain amount of my time looking at other people’s code and seeing what zprint will do with it. When filing an issue, it is great if it comes with “this is what it does now” as well as “this is how it should look”. I can’t guarantee that I will be able to fix everything, though so far I’ve done pretty well. Sometimes I can just say “use this configuration”, and sometimes I have to enhance zprint in some way (often with more configuration). Thanks!


The distinction you make between formatting and indenting is helpful, @kkinnear. I guess what I mean by “relaxed mode” while typing is that it is mostly about indenting (and aligning/justifying, but that might be a separate discussion). I see indenting as a subset of the formatter’s tasks. The formatter will also be doing indenting, right? To me it makes sense to use the same tool for both needs. I already have the situation where the tools I use have different ideas about formatting, and it creates a less than stellar experience.

(Yes, I am still very interested in using zprint for Calva, btw. See you in that other thread! :heart:)

@kkinnear To be clear, I find zprint a very valuable tool. Thank you so much for that.

About clojure.test/are formatting, it should ideally depend on the first argument. That’s a template for the rest of the form. A couple of examples:

(are [x y] (= x y)
  2 (+ 1 1)
  4 (* 2 2))

(are [x y z] (= x y z)
  2 (+ 1 1) (- 4 2)
  4 (* 2 2) (/ 8 2))

Notice how in the second example there should be 3 forms on each line if possible because the first argument of are is a vector with 3 elements.
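The rule described above, one row of test data per line with as many forms per row as the binding vector has symbols, can be sketched as follows. `format-are` is a hypothetical helper that works on already-printed strings rather than on a real parse tree.

```clojure
(require '[clojure.edn :as edn]
         '[clojure.string :as str])

(defn format-are
  "Hypothetical sketch: print an (are ...) form with one row of test data
  per line, each row holding as many forms as the binding vector has
  symbols. `bindings` and `expr` are source strings; `args` is a sequence
  of already-printed argument strings."
  [bindings expr args]
  (let [n    (count (edn/read-string bindings)) ; e.g. "[x y z]" -> 3
        rows (map #(str "  " (str/join " " %)) (partition-all n args))]
    (str "(are " bindings " " expr "\n"
         (str/join "\n" rows)
         ")")))
```

Calling `(format-are "[x y z]" "(= x y z)" ["2" "(+ 1 1)" "(- 4 2)" "4" "(* 2 2)" "(/ 8 2)"])` reproduces the second example above, with three forms per line.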

Thanks. Isn’t that a pain! Not only does the format vary, but I have nothing that handles triples or larger groups of things. Pairs, sure. Other counts, no. Perhaps the best short-term approach would be to somehow leave the (are ...) forms as they are, just skipping formatting them altogether. Though even that is an enhancement. I’ll give this some thought. Thanks for the insight.

Some thoughts from an old-time Lisper who sometimes gets frustrated by what he sees as people ignoring established Lisp-y practice…

These are important for me:

I think the above things would be massively useful, and I believe, or at least hope, that most people would be happy with them.

Compared to other Lisps, there’s one thing that often causes me problems: the indentation of let forms, cond forms, etc, where the syntax involves pairs of things that are not bracketed. When the second item of a pair starts on a newline, I’d like a little extra indentation (probably two spaces). IMO that would be a significant improvement on what I’m used to, and I hope it would be something most people would be happy with.
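The suggested extra indentation can be sketched for a cond-like form as follows. `format-pairs` is hypothetical, works on already-printed strings, and uses a simple width check that ignores the closing paren on the last line.

```clojure
(require '[clojure.string :as str])

(defn format-pairs
  "Hypothetical sketch: lay out the clauses of a cond-like form. A clause
  that fits within `width` stays on one line; otherwise its second
  element goes on the next line with two extra spaces of indentation, so
  it does not start in the same column as the test. Clauses are given as
  [test expr] pairs of already-printed strings."
  [head pairs width]
  (let [indent "  "
        lines  (mapcat (fn [[a b]]
                         (let [one-line (str indent a " " b)]
                           (if (<= (count one-line) width)
                             [one-line]
                             [(str indent a) (str indent "  " b)])))
                       pairs)]
    (str "(" head "\n" (str/join "\n" lines) ")")))
```

With a width of 30, a short clause such as `(neg? x) :negative` stays on one line. A long clause is split, and its expression starts two columns to the right of its test, so a reader can tell tests from expressions without blank lines or `;;` separators.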

For any rules beyond that, there are probably always going to be times I would want to break them.

I do think it would be useful to have options for some things, especially when working on projects where there may be a lack of understanding of or consensus on good style.

I wonder if it would be possible to have options that could be chosen independently, but to also have a set of defined formatting “levels” that progressively define stricter and more contentious sets of options. Then people could choose different levels depending on the needs of different projects. I’d like to be able to say “we’ll go for level-N formatting” for this project rather than fighting over lots of options. There would probably be a big fight over how many levels to have and what goes into what level, but that would be a big fight done just once when designing this new formatter rather than a big fight for every new Clojure adopter or project.

EDIT:
Removed mention of line breaks, because it’s more general than that. I think by default whitespace shouldn’t be changed except to change indentation and, perhaps, to remove trailing whitespace and empty lines at the end of a file. That’s because I think it’s sometimes good to use whitespace in non-standard ways to improve readability. There could be options to do further things with whitespace.
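The default described above, changing only trailing whitespace and trailing empty lines and leaving all other whitespace alone, is small enough to sketch. `tidy-whitespace` is a hypothetical name. The sketch is naive because it would also trim trailing spaces inside multi-line string literals.

```clojure
(require '[clojure.string :as str])

(defn tidy-whitespace
  "Remove trailing whitespace from every line and blank lines at the end
  of the file, ending with a single newline. All other whitespace,
  including interior blank lines and line breaks, is left alone.
  Naive: also trims inside multi-line string literals."
  [text]
  (let [lines (map str/trimr (str/split-lines text))
        kept  (reverse (drop-while str/blank? (reverse lines)))]
    (str (str/join "\n" kept) "\n")))
```

The blank line between two top-level forms survives, while trailing spaces and tabs and the extra empty lines at the end of the file are removed.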


Most of the forms that I skip fall into two categories. The first is triples, which in general are test helpers like juxt/iota or metosin/testit; the other is large forms that take a while to format. I think triples are used widely enough to be considered in the formatting options, but they would need to be combined with arg1 and arg2 at least.

Interesting that you mentioned that. I have exactly the same issue. I suggested the idea of some indentation for these cases a long time ago for the community standards — and the response was that people wouldn’t be happy with that. There are a variety of ways that people get around this confusion — blank lines being the most common, though some kind of indentation is also pretty common.

This issue was the original reason that I modified several code formatters and that led me to ultimately write zprint. zprint does exactly what you suggest by default. It can be turned off, of course. Given your statements about changing line-breaks, I suspect zprint isn’t going to be your tool of choice since it totally ignores existing line breaks, but it does indent the second of a pair of things if they would otherwise end up starting in the same column.



Interesting thought that people wouldn’t be happy with it. I can’t really see any downsides. Oh well.

I tend to use blank lines or a “;;” on a line by itself to separate things, but I don’t like either of these.


I collected my thoughts here: http://tonsky.me/blog/clojurefmt/. I also propose a simpler set of formatting rules that does not rely on custom formatting for any specific forms.
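Assuming the post's core rule is "multi-line lists that start with a symbol are indented two spaces, and all other multi-line lists, vectors, maps and sets align with their first element", the whole rule fits in one function that never looks at a form's name. The sketch below follows that reading. The name `fixed-indent` and its parameters are hypothetical.

```clojure
(defn fixed-indent
  "Indent column for continuation lines of a multi-line collection that
  opens at `open-col` with the opening delimiter string `opener`,
  e.g. ( [ { or #{. `first-is-symbol?` says whether the first element is
  a symbol. No form names (let, cond, my-let, ...) are consulted."
  [open-col opener first-is-symbol?]
  (if (and (= opener "(") first-is-symbol?)
    ;; list starting with a symbol: always two spaces
    (+ open-col 2)
    ;; anything else: align with the first element, just past the opener
    (+ open-col (count opener))))
```

Under this rule `let`, `cond`, and a user macro such as `my-let` all get the same two-space indent. A list headed by a non-symbol, such as `((juxt a b) x)`, aligns one column in. A set literal aligns two columns in because `#{` is two characters wide.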


I like your suggestion! If your suggested style is adopted, I think I’ll start writing multi-line and/or forms with each argument on its own line. It should look nice:

(or 
  (dog? x)
  (cat? x))

May I also suggest removing extra spaces between forms on one line?

(+  1 2) 
;; becomes
(+ 1 2)

{:short              1
 :very-very-long-key 2}
;; becomes
{:short 1
 :very-very-long-key 2}
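Here is a sketch of the proposed rule under stated assumptions. Runs of spaces after the first non-blank character of a line are collapsed to a single space, while leading indentation and the contents of string literals are left alone. `collapse-inner-spaces` is a hypothetical name. The sketch does not handle comments or character literals such as `\space`.

```clojure
(defn collapse-inner-spaces
  "Collapse runs of spaces after a line's leading indentation to a
  single space, leaving the indentation and the inside of string
  literals untouched. Naive: ignores comments and character literals."
  [line]
  (let [indent (re-find #"^ *" line)
        body   (subs line (count indent))]
    (loop [cs (seq body), in-str? false, escaped? false, prev nil, out []]
      (if-let [c (first cs)]
        (cond
          ;; inside a string literal: copy verbatim, tracking backslash escapes
          in-str?
          (recur (rest cs)
                 (not (and (= c \") (not escaped?)))
                 (and (= c \\) (not escaped?))
                 c
                 (conj out c))
          ;; a second space in a row outside a string: drop it
          (and (= c \space) (= prev \space))
          (recur (rest cs) false false c out)
          ;; anything else: copy it, and note whether a string starts here
          :else
          (recur (rest cs) (= c \") false c (conj out c)))
        (str indent (apply str out))))))
```

Applied line by line, this turns `(+  1 2)` into `(+ 1 2)` and `{:short              1` into `{:short 1`. It also removes exactly the hand alignment that the reply below wants to keep, which is why a rule like this is contentious.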


Nah. I like to format let bindings and map keys this way. It’s easier to read.


It sounded totally wild and crazy to me at first, but having thought about it briefly, it starts to make sense. I think I’ll try to make an experimental formatter for VS Code that works like this and just see what it feels like to use. (My experience with the indent syntax of VS Code is that it is not very useful for this task, but I might be wrong.)

Agree! With the addition that I think that any rules for folding the paren trail and deleting newlines should be relaxed while you type and can be applied more strictly by an explicit format command and on save.

However, your article does not even mention the paren trail nor empty lines. Does that mean you think that should not be part of the job description for clojurefmt?

Have you used the Format and Align Current Form command in Calva? (It’s experimental, but it often does the right thing, IMO, and when it doesn’t, restoring things is just an undo away.)
