CLJ Commons: Building a formatter like gofmt for Clojure

For what it’s worth, I wound up forking cljfmt a while ago and rewriting most of it. This was primarily to support some long-omitted features like namespace reformatting and format-ignore metadata, and to address some differences I had with the common style guide (like varying indents for macros vs functions). I never quite got time to polish it to my release standards, but it may be worth checking out for an extended cljfmt feature set.

It doesn’t (currently) meet all of the goals described here, and in particular I think exploring Graal as a native-image compiler has a lot of promise for speed - something I’ve been working on in a few other tools. The zipper approach would also lend itself to other static-analysis checks like linting and idioms.
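
As a rough illustration of why a zipper walk lends itself to linting as well as formatting, here is a minimal sketch. It uses the built-in clojure.zip over plain data as a stand-in; a real formatter would need a zipper that also keeps whitespace and comments. The name find-forms is hypothetical.

```clojure
(require '[clojure.zip :as zip])

(defn find-forms
  "Walks `form` depth-first with a zipper and returns every list whose
  first element is the symbol `head`."
  [form head]
  (loop [loc   (zip/seq-zip form)
         found []]
    (if (zip/end? loc)
      found
      (let [node (zip/node loc)]
        (recur (zip/next loc)
               (if (and (seq? node) (= head (first node)))
                 (conj found node)
                 found))))))

;; A lint-style query: collect every `if` form, including nested ones.
(find-forms '(defn f [x] (if (pos? x) (if (even? x) :a :b) :c)) 'if)
```

The same traversal could drive a rewrite step with zip/edit instead of collecting nodes.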

We have adopted a fork of the fork at Amperity as our standard formatter, to reasonable success, so this is definitely an open area that the Clojure toolset could benefit from.

Related to reordering of namespaces, it seems that cljfmt already has 2 PRs that promise to do that to some extent:

That is about exactly the same API as is present in the client. It’s a bit off-topic in this thread, but I’d be interested in listening to the case for LSP for formatting in some other thread somewhere. :smile:

Great initiative! I strongly support having a standardized way to format Clojure code. For something like this to be successful, I think it should have 0 configuration options, and as much as possible, should optimize for smaller diffs.

For those who don’t know me, I develop Cursive.

I really like this idea, and I’ve argued online that I wish something like this had existed from the start for Clojure. If I ever develop my own language, it will definitely have a gofmt style thing from the start. But I suspect that that ship has sailed now for Clojure, if only because I suspect the core team would never use it for Clojure itself and because people have had time to develop bad habits :slight_smile:

I think this is a fairly common reaction, and probably more so in the lisp world which for better or worse encourages individual approaches to pretty much everything (as opposed to Java or Go, which have features and culture encouraging team development and inter-developer consistency). Here’s my favourite quote about this from Kent Beck:

We didn’t miss formatting. We are both fussy about code formatting, but almost as soon as we were constrained to what the pretty-printer (esprima) gave us, we didn’t waste any more thought on it.

Formatting is something that devs tend to think is very important until they can’t control it; after that, as long as the output is decent, they generally don’t care. I think the popularity of gofmt, prettier and the like is a testament to this - consistency really is more important than any individual style choice.

That said, I think there are some significant obstacles, some because this is now fairly late in Clojure’s development and some just due to the nature of the language:

  • I think consensus will be hard now. Map and let value alignment is a good example - some people value the readability it provides more than the fact that it creates larger diffs than necessary, some are the other way around. Your reaction to this may depend on the diff tool you use - whitespace-only diffs aren’t a problem for me because the IntelliJ diff tool is great at identifying them, but that’s not true for all tools.
  • Formatting Clojure accurately really requires symbol resolution, which is a hard problem. e.g. you may want to format clojure.core/defn differently from schema.core/defn but someone using Schema may have the second one referred so if you’re looking at (defn ... ) you don’t know which rule to apply.
  • Similarly, the indentation spec has one main problem for this use case which is that it lives in metadata, and thus requires you to eval code in order to format. Worse, the macro you’re using may exist in some other part of your project or a library, so you need your whole project configured in order to find the source of the macro and eval it so that you can get the spec from the metadata. I think the format spec is good and could be used, but the config for forms really needs to be provided to the formatter in some external way (and also relies on symbol resolution, as above).
  • Once you go to rearranging ns forms then it’s a massive can of worms, especially if you want to support CLJC (which is really required). This is both from an implementation and a consensus point of view. Coming up with a canonical way to rearrange reader conditionals in an ns form is a super hard problem.
  • Many of these choices affect technical aspects of the formatter, for example anything using alignment or rearranging forms can’t be purely top-down. This was linked above, but I’d encourage everyone interested to read both the linked article and the comments - they’re fascinating and frightening in equal measure.
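
To make the symbol-resolution point concrete, here is a minimal sketch of what a formatter would have to do before it can pick a rule for (defn ...). The rule table, the shape of the ns data, and the name resolve-head are all hypothetical, and the fallback to clojure.core is an assumption that breaks down for local definitions and :refer :all.

```clojure
(def indent-rules
  "Hypothetical rule table keyed by fully qualified symbol."
  {'clojure.core/defn :defn-style
   'schema.core/defn  :schema-defn-style})

(defn resolve-head
  "Resolves the head symbol of a form using only data taken from the ns form:
  `aliases` maps alias -> namespace, `refers` maps referred name -> namespace.
  Unqualified names that are not referred are assumed to be clojure.core."
  [{:keys [aliases refers]} sym]
  (let [ns-part (some-> (namespace sym) symbol)
        nm      (name sym)]
    (cond
      ns-part                (symbol (str (get aliases ns-part ns-part)) nm)
      (contains? refers sym) (symbol (str (get refers sym)) nm)
      :else                  (symbol "clojure.core" nm))))

;; The same text, (defn ...), maps to different rules depending on the ns form.
(indent-rules (resolve-head {:refers {'defn 'schema.core}} 'defn))
(indent-rules (resolve-head {} 'defn))
```

Even this simplified version needs the ns form parsed first, which is why the rule lookup cannot be a purely local, per-form decision.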

Realistically I don’t have time to work on implementing this but I’d be interested in helping out with feedback. Hopefully the above isn’t too discouraging - even at this stage I think a tool like this would be useful.

I think if someone wants to make yet another code formatter, that’s great. But I don’t think that’s what I’ve been personally missing when it comes to the greater Clojure ecosystem, and the chances are very high that I don’t bother using it or even checking it out.

Something I think would be more valuable is a pre-commit hook + existing formatter and integrate that in standard lein templates.

There’s no point to these formatters without integration with source control or the compiler itself, as far as standardization goes.

On my team, I’ve been wanting to set one up, but never bothered with figuring out how to get it integrated with our source control. And if we did, the priority would be that it should never ever fail and become a blocker for us pushing code.

I think the tension arises with the term “canonical”. To me that implies universality and proselytizing. If that’s the case then my vote is for minimalism: cover the least and most agreed-upon subset of the formatting problem space.

Part of that is my personal preference to keep my personal preferences but another part is agreeing with Colin that this ship has sailed, which means that we’re stuck in xkcd’s “There are now 15 competing standards” situation. To me that means a tool would need to have either:

  • many strong opinions, without trying to be canonical
  • few strong opinions that aims to be the minimal agreed subset, even for stubborn folks like me
  • lots of configuration options

For instance, in my mind there’s tension in the Clojure Style Guide between being canonical and what seems to be continual accretion of more opinionated (and therefore less canonical) rules. My preference is for it to stay on the minimal/canonical side so I can recommend it without caveats.

I totally get that, it’s entirely valid to target that space, and in fact I want a version of this tool…if its strong opinions aren’t too large a change from my preferences. For instance, Colin pointed out some hard parts of formatting ns, but things like alphabetizing requires should get everyone on board. But even setting aside the technical challenges and just focusing on ns, it’s instructive to consider Stuart’s rules. I mostly follow them, but when I don’t, I don’t want to be bothered:

  • I too abhor :use and think it should probably be disallowed by the formatter
  • I too discourage :refer :all…but sometimes it’s the best choice and I don’t want a formatter getting in my way for those rare cases
  • I too discourage :rename…but in some rare cases it’s the right choice
  • I agree that non-keyword use and require in ns are flat wrong and should be disallowed by the formatter
  • his rule to have a line break between :require and the first vector is not my preference and is based on an implementation detail; I’d vote to omit it
  • :as before :refer sounds like a fine rule to make, but if some contingent disagrees, then in the spirit of minimalism we should drop it or make it configurable
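
For the least controversial of these rules, alphabetizing requires, a minimal sketch might look like the following. It works on the ns form as plain data and ignores the hard cases raised earlier in the thread (reader conditionals, prefix lists, comments); the function names are hypothetical.

```clojure
(defn libspec-name
  "Namespace symbol of a libspec, whether written as [foo.bar :as fb] or foo.bar."
  [spec]
  (if (vector? spec) (first spec) spec))

(defn sort-requires
  "Returns `ns-form` with the libspecs of each (:require ...) clause sorted
  by namespace name. All other clauses are left untouched."
  [ns-form]
  (map (fn [clause]
         (if (and (seq? clause) (= :require (first clause)))
           (cons :require (sort-by (comp str libspec-name) (rest clause)))
           clause))
       ns-form))

(sort-requires
 '(ns my.app
    (:require z.util
              [clojure.string :as str]
              [a.core :as a])))
```

Because it operates on read data, this sketch loses the original whitespace and comments; a real implementation would have to reorder nodes in a whitespace-preserving tree.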

I am not going to chime in with an opinion here, but I think that the tool/tools used need to let this be configurable from an editor integration point of view. There the formatter should be ready to help pretty print evaluation results or big chunks of EDN or whatever.

I think gofmt experience is pretty clear re: configuration. There should be exactly 0 knobs. gofmt’s style is nobody’s favorite, gofmt itself is everyone’s favorite.

I use prettier in javascript, and I can’t imagine ever not using it now. Many, me included, actually hook it into the save command of our editors. There is some kind of evil delight in writing a badly formatted piece of code, hitting save, and having it all reformatted perfectly for you.

For people who are worried about having another’s style forced on you, I’ll direct you to Joel Spolsky’s great essay, “Choices.” Sometimes there is real tension in style decisions, but most of the time, it’s more important to make any choice and stick with it. For me, the surprising result of prettier is that virtually no style decision actually matters. For a language as complex as javascript, I only count 10 options that actually relate to style.

However, you’re never going to get a zero-config formatter in clojure because of what @colinfleming raised: namely the practice of formatting certain forms differently (e.g., defn vs let) and the need to have custom macros follow those formatting templates.

2 small points:

Prettier started off with zero options and stayed that way for a long time. It’s what differentiated it from other JS formatters and was the main reason for its massive success.

When Prettier was started, JS was already 20 years old. It’s never too late.

As you can tell from the almost absurd level of configurability in zprint, I am skeptical of the overall utility of a formatter which enforces the “one true way” for formatting Clojure source. However, it seems clear that there is at least some interest in a Clojure source formatter which doesn’t have any options.

Personally, I haven’t yet found a set of formatter options I like 100% of the time for my own code, which is why I implemented the ;!zprint {} directive capability in source files – so I could sometimes change the formatting for a particular function or, more usually, a particular data definition in a file.

I have put considerable effort into creating the capability in zprint to implement the community style, and to get that you just say {:style :community}. That said, I have seen very little code in the wild that actually adheres to my understanding of the community style. But maybe the time has come for a single style.
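
For readers who have not seen these two mechanisms, a small sketch follows. The {:style :community} map is taken from the post above. The {:format :skip} option inside the directive is written from memory of the zprint documentation and should be treated as an assumption to verify; the transitions data is made up for the example.

```clojure
;; In a source file: a ;!zprint comment carries an options map.
;; Assumption: {:format :skip} asks zprint to leave the next form as written.
;!zprint {:format :skip}
(def transitions
  {:idle    {:start :running}
   :running {:pause :paused, :stop :idle}})

;; In the options map given to zprint, selecting the community style:
;; {:style :community}
```

The directive is an ordinary comment to the Clojure reader, so the file still loads normally without zprint present.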

I am currently working on an issue in zprint which involves some source code rewriting in Leiningen project.clj files, and my implementation approach would equally support modifying ns declarations, so zprint is already moving in that direction. If there are other things it can’t do, I’d be more than happy to look into enhancing it.

I don’t imagine removing the current level of configurability from zprint, but I would certainly be open to both:

  1. Creating a new style which embodies the “fixed” formatting that you all agree on.
  2. Creating a version of zprint which has no configurability but just does that fixed style.

With graalvm it runs very fast (<55ms startup), and for environments where graalvm isn’t suitable, the appcds/jvm approach works well (<900ms startup). It also runs fine in cljs.

I will be happy to do considerable implementation to help realize the goals that you have set out. Just to be clear, I’m not volunteering to play a significant role in driving the community to a consensus on what the format should look like. Not least because my personal opinions on the subject (e.g. the current zprint defaults) are moderately far afield from the current community approach, though I greatly respect the work it has taken to create the community approach.

Thanks @danielcompton for bringing up this idea. I am generally all in favor of having a zero-configuration formatter, with a few caveats.

I consider code a form of personal expression, and so I’ve often felt strong resistance to tools like these, but having worked on teams where four different editors are used, and doing a lot of open source, I absolutely see the value in this. Sometimes making a whole category of bikeshedding go away is extremely valuable.

I think it’s worth remembering that a tool like this will be adopted (or not) on a per-project basis. If you don’t want it, don’t use it. If you don’t agree with it, use something configurable like zprint. If you don’t want to think about it and just want the discussion to go away then pick the default.

The more projects adopt “the standard” the more benefit you get, as it lessens everyone’s cognitive overhead, and we should encourage people to use it, but it has to be ok to do things differently. I hope we can recognize that up front and not start shaming projects into adopting “standard” formatting.

I 100% agree with this. The tool should not have more rules than I can keep in my head while coding. Once it is stable and starts getting adoption it should never change its rules or add new rules.

For this I present the case of Rubocop (more a linter than a formatter, admittedly). Every version of Rubocop would add more rules. Every time you update it you need to spend time updating code that used to be just fine. It also became a pain to work with, as there was always some rule you had no clue about that you accidentally ended up breaking. Rubocop is one of the reasons I don’t do Ruby anymore, it took all the joy out of programming.

To me these things have nothing to do with formatting. A formatter should limit itself to rearranging whitespace. That’s it. If you want checks on how the language is used, use a linter.
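
One way to hold a formatter to the "whitespace only" rule is to check that the code reads to the same data before and after. This is a minimal sketch with hypothetical names; note that the reader discards comments, so it would not notice a formatter that deleted or altered them, and it does not handle reader conditionals.

```clojure
(defn read-all
  "Reads every top-level form in the string `s` and returns them as a vector."
  [s]
  (let [rdr (java.io.PushbackReader. (java.io.StringReader. s))
        eof (Object.)]
    (binding [*read-eval* false]
      (loop [forms []]
        (let [form (read {:eof eof} rdr)]
          (if (identical? eof form)
            forms
            (recur (conj forms form))))))))

(defn whitespace-only-change?
  "True when `before` and `after` read to the same sequence of forms."
  [before after]
  (= (read-all before) (read-all after)))

(whitespace-only-change? "(let [a 1]\n(inc a))" "(let [a 1]\n  (inc a))")
```

A check like this could run inside the formatter itself, refusing to write output that changes what the reader sees.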

Here is where I think it will be very hard to get consensus. There are really two schools of thought when it comes to formatting code: optimize for readability, or optimize for smaller diffs. I am 100% on the readability side. I read code more often than I read diffs. I like my maps and let blocks aligned, for instance.
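
The trade-off is easiest to see side by side. In this sketch the binding names are made up; imagine that retry-timeout-ms was the last binding added.

```clojure
;; Aligned values: adding the longer name retry-timeout-ms means the two
;; existing lines must be re-padded, so a one-line change touches three lines.
(def aligned
  (let [host             "localhost"
        port             8080
        retry-timeout-ms 500]
    [host port retry-timeout-ms]))

;; Unaligned values: the same addition touches only the new line.
(def unaligned
  (let [host "localhost"
        port 8080
        retry-timeout-ms 500]
    [host port retry-timeout-ms]))
```

Both forms evaluate to the same value; the disagreement is only about which cost to pay, scanning effort or diff size.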

This I think is an important feature, sometimes hand-formatting is preferable. Think: big literal EDN data structures. Having a safety hatch so you can selectively opt-out will go a long way towards lessening resistance.

This is the exact thing that should be avoided by this tool. We can’t rely on certain forms being formatted differently, as we need this tool to be “once and for all”, not something that changes and grows and evolves. Once rules are set, this is it.

I also think it’s quite possible to alter the Clojure Style Guide so that no form has special treatment. It’ll make the rules simpler and more consistent.

I’m not sure I follow. The let form usually has special indentation rules, where the body is not aligned with the vector:

(let [result :foo]
  (if result
    (do-something-with result)
    (do-something-else)))

If let followed other forms, the (if would align with the [result. Are you proposing that this tool should format everything the same, or are you proposing that any let-like user macros would be formatted differently from the built in let forms?
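
To spell out the two outcomes being compared, here is the same form under both conventions. The two helper functions are hypothetical stand-ins so that the forms can be evaluated.

```clojure
(defn do-something-with [x] [:did x]) ; hypothetical stand-in
(defn do-something-else [] :else)     ; hypothetical stand-in

;; Body-style indentation: let and if get special treatment.
(def body-style
  (let [result :foo]
    (if result
      (do-something-with result)
      (do-something-else))))

;; No special forms: every argument lines up under the first argument.
(def arg-aligned
  (let [result :foo]
       (if result
           (do-something-with result)
           (do-something-else))))
```

The code is identical to the reader either way; the second layout is what "format everything the same" would mean for let and if.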

Then there’s the question of how to deal with macros like this (which you may be familiar with :smile:):

(rum/defcs component
  < rum/static
    rum/reactive
    (rum/local 0 ::count)
    (rum/local "" ::text)
  [state label]
  (let [count-atom (::count state)
        text-atom  (::text state)]
    [:div]))

Today at the Conj @kkinnear gave an unsession of zprint and I was seriously impressed by it.

It seemed to me it provides a reasonable starting point for all the above requirements except the zero-conf part, which Kim would probably happily add, meaning he has the right attitude for such an important piece of the ecosystem IMHO. The options are there with reasonable defaults but everybody can tweak them.

I also got to see how to configure the zprint indent-by-function-style options and I am kind of sold :slight_smile:: contrary to CIDER’s indent-based metadata, the function style lets you choose a function, say cond, to apply to your own custom function myothercond. All of this is passed in the config map or a static .zprintrc.

I think this has an advantage over indent-only config, or at least the two ways of customizing could happily coexist.
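
As a sketch of what such a configuration might look like, here is an options map of the kind that could live in a .zprintrc. The key names are written from memory of the zprint documentation and should be checked against it: the assumption is that :fn-map maps a function name to a function style and that cond uses the :pair-fn style. myothercond is the hypothetical macro from the paragraph above.

```clojure
;; Options map, for example the contents of a .zprintrc file.
;; Assumption: :pair-fn is the function style zprint applies to cond,
;; so myothercond would be laid out the same way cond is.
{:fn-map {"myothercond" :pair-fn}}
```

The point of the design is that the rule lives in configuration given to the formatter, so no code has to be evaluated to discover it.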

So my :+1: goes to expanding on zprint (if even necessary).

I tried to use zprint in Calva Formatter but had to give up on trying to make it behave the way I needed. I mentioned it in this thread, which asked for ways to make zprint not alter newlines. It could be that zprint can meet my format-as-you-type requirements, and I just haven’t found out how. I also mentioned earlier in this thread that I think a formatter that should support the editor use cases needs to be able to consider the cursor position and the selection before and after reformatting the text. For performance reasons I would also wish for a way to ask zprint to format only a given range of the code, as well as a “minimal” range from the cursor.

I mentioned a few other things that the editor use cases need, early in the thread. If zprint can be made to support being used by the editors while code is being edited, I would say it is definitely a way to implement this Commons formatter that should be considered.
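
The cursor requirement can be sketched without any particular formatter. Assuming the formatter changes only whitespace, the cursor can be mapped by counting the non-whitespace characters in front of it. The function names are hypothetical, and this is one simple approach, not how Calva or zprint do it.

```clojure
(defn whitespace? [c]
  (Character/isWhitespace (char c)))

(defn cursor-after-format
  "Maps the index `cursor` in `before` to an index in `after`, assuming the
  two strings differ only in whitespace. The cursor ends up directly after
  the same non-whitespace character it followed before formatting."
  [before after cursor]
  (let [n (count (remove whitespace? (subs before 0 cursor)))]
    (loop [i 0, seen 0]
      (cond
        (= seen n)           i
        (>= i (count after)) i
        :else (recur (inc i)
                     (if (whitespace? (nth after i)) seen (inc seen)))))))

;; Cursor just before `bar` in the unformatted text.
(cursor-after-format "(foo   bar)" "(foo bar)" 7)
```

A cursor that sat inside a run of whitespace is pulled back to the preceding token, so an editor integration would probably want an extra rule for that case.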

I have been using zprint for years now on work projects. I really like it, even when I do not agree with how it formats things, because it simply solves the problem of people commenting on formatting in code review.

It is not perfect though. It makes some macros illegible out of the box. There are cases for which I could not find a good solution even with configuration, and maybe there isn’t one, as for clojure.test/are forms. We override the default configuration for several forms, and almost all of them are macros. Most of those overrides are for small gains in legibility, but a few forms are very hard to read in the default configuration. I would think that macro formatting would be a problem for having a standard way to format Clojure(Script) code.

Also, the zprint native images are really fast unless you have a giant nested form in the file. I run those on save without a problem.

@pez I apologize for missing the thread that you mentioned. I’ve only recently started using Clojureverse, and when I searched for zprint issues the other day, I became distracted by this thread and didn’t continue searching further. I don’t know if you are still interested in exploring what zprint can do for Calva, but I will go to that thread today to respond about the specific issues that you mention there. For this thread, I want to highlight a particular issue right now:

  • We should probably make a distinction between an “indenter” and a “formatter”. An indenter preserves the existing line breaks and, within those constraints, places the elements of the code in the right place on the line. A formatter ignores the existing line breaks and moves things around in order to use the available space efficiently within the constraints of the formatting rules. I’m not wedded to the terms indenter and formatter, but we do need some way to discuss these two concepts. There is something of a spectrum of behavior between an indenter and a formatter, and it isn’t clear if one tool can be tuned in fine steps between these two poles. zprint is clearly a formatter, and due to a recent issue, I’m currently exploring if and how much it can move toward the indenter end of the spectrum. I honestly don’t yet know how that is going to work out. For the purposes of this thread – a “no config” Clojure source code tool – I rather thought that the goal was for a formatter, not an indenter, but maybe that is something that is yet to be decided.
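
To make the indenter end of that spectrum concrete, here is a deliberately naive sketch: it never moves a line break and only rewrites leading whitespace, using two spaces per open bracket. The function names are hypothetical, and the bracket counting ignores strings, comments and character literals, which a real tool could not do.

```clojure
(require '[clojure.string :as str])

(defn depth-change
  "Opened minus closed brackets on one line (naive: no string or comment handling)."
  [line]
  (reduce (fn [d c]
            (cond
              (#{\( \[ \{} c) (inc d)
              (#{\) \] \}} c) (dec d)
              :else d))
          0
          line))

(defn reindent
  "Keeps every line break in `text` and rewrites only the leading whitespace."
  [text]
  (loop [lines (str/split-lines text), depth 0, out []]
    (if-let [[line & more] (seq lines)]
      (let [trimmed (str/triml line)
            indent  (apply str (repeat (* 2 (max depth 0)) \space))]
        (recur more
               (+ depth (depth-change trimmed))
               (conj out (if (str/blank? trimmed) "" (str indent trimmed)))))
      (str/join "\n" out))))

(println (reindent "(defn f [x]\n(if x\n:a\n:b))"))
```

A formatter in the sense described above would instead be free to join or split those four lines to fit the available width.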

@mynomoto – I’m glad you have found some value in using zprint. Regardless of whether or not zprint ends up meeting the needs for a “no config” formatter, please file issues when you find something that zprint doesn’t do well! I’ll go off and look at clojure.test/are forms, but if you have some particular examples that would be great to see. I use expectations, so I haven’t run into that.

In general, if you use zprint, please file issues for things zprint doesn’t format well! I only know how well it does on code that I try it on, and I only spend a certain amount of my time looking at other people’s code and seeing what zprint will do with it. When filing an issue, it is great if it comes with “this is what it does now” as well as “this is how it should look”. I can’t guarantee that I will be able to fix everything, though so far I’ve done pretty well. Sometimes I can just say “use this configuration”, and sometimes I have to enhance zprint in some way (often with more configuration). Thanks!

It’s good that you make the distinction between formatting and indenting, @kkinnear. I guess that what I mean by “relaxed mode” while typing is that it is mostly about indenting, then (and aligning/justifying, but that might be a separate discussion). I see indenting as a subset of the formatter’s tasks. The formatter will also be doing indenting, right? To me it makes sense to use the same tool for both needs. I already have the situation where the tools I use have different ideas about formatting, and it creates a less than stellar experience.

(Yes, I am still very interested in using zprint for Calva, btw. See you in that other thread! :heart:)