CLJ Commons: Building a formatter like gofmt for Clojure

Elixir has also recently introduced a standard formatter in core. See https://hexdocs.pm/mix/1.6.0/Mix.Tasks.Format.html

I think this is where the CIDER indentation specification comes into play.

On our team we have Cursive users and a Vim user. There are some differences in how code is formatted, which sometimes causes pointless git diffs and tiresome discussions. It’d be great to have a tool that Just Works for any editor, and that prevents the seemingly inevitable bikeshedding about formatting on teams that use several editors. I would be more than happy to trade control for consistency when it comes to code formatting.

I like the idea of starting with something simple like ns formatting. There shouldn’t be too much controversy surrounding those rules. There’s a lot of infrastructure and integration with editors and plugins that needs to be written at the same time.

I think that the source code layout section of bbatsov/clojure-style-guide is a very good resource. It shows that there are options. I think that these options should be used judiciously.

I would welcome a tool that does the same as Emacs/CIDER: reindent, remove trailing whitespace.

I would accept removing newlines before closing parentheses/braces/brackets.

I am wary of removing/adding other newlines. Allow the user to put the newlines, then indent rigorously.

gofmt goes too far.

I’m extremely wary of any attempt to establish universal canonical formatting, for the simple reason that the rules for these tools inevitably accrete and never shrink. I don’t think I’m unusual in that I generally follow orthodox Clojure formatting, but I differ quite a bit from Stuart’s style guide, and in a few specific cases I use whitespace in slightly idiosyncratic ways. I don’t want some far-away format committee-dictatorship deciding that those are unacceptable and I need to conform. If a client or my team decides, fine, no problem, that’s a different story.

As an analogy, I use lein-bikeshed and love that Bozhidar established the Clojure Style Guide. With both of them I totally agree with 95%, but I find the remaining 5% to be unconvincing overreach, and I never want that 5% imposed on me.

My vote is to dampen our enthusiasm for making a lot of universal rules. The formatter should stick to the very few rules that are truly universal, and the rest (like Stuart’s style guide or really any of the individual questions in the OP) should be configuration options that need to be explicitly flipped on.

Oh, wow, this is awesome. I am very interested in trying to help the Community Guidelines influence more code out there. There are so many things I want to comment on, but most of it will have to wait.

However, let me +1 on the importance of speed and ability to integrate into editors.

You say:

I would say that this is really important in the editor case, especially for matters of speed with format-as-you-type. For this reason, Calva Formatter only reformats the current enclosing list-ish form. Since I want this formatter to help the Community Standard, I use cljfmt as the formatting engine. Cljfmt isn’t really meant for this to begin with, so it does not have the performance that is really needed. That makes this minimal-range formatting extra important, but I think it will always be needed for large files.

Other things needed by the editor integrations are:

  • That the formatting rules can be relaxed when used as you type. (Entering a newline in a paren-trail, moving some brackets down, should not immediately cause them to fold up again, for instance.)
  • That the formatter can take the current cursor position and selection into account, and report where the cursor should move to in the reformatted text. Today’s editors with multiple cursors and selections make this extra interesting, but we can start with the single cursor/selection case.
  • The formatter should offer some low-level API to its AST/zipper or whatever it uses, so that the integration doesn’t need to parse the text itself to figure out things that the formatter has already figured out. (I think I sometimes have three parallel ASTs and allocations of the entire buffer text.)
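
To make the cursor requirement above concrete, here is a minimal sketch of what such an entry point could look like. Everything in it is an assumption for illustration: `format-with-cursor` and `strip-trailing-whitespace` are hypothetical names, not part of cljfmt or Calva, and the stand-in formatter only removes trailing whitespace. The cursor is mapped by counting the non-whitespace characters before it, which only works for formatters that change whitespace and nothing else.

```clojure
(require '[clojure.string :as str])

(defn strip-trailing-whitespace
  "Stand-in formatter (hypothetical): removes spaces and tabs at line ends."
  [text]
  (str/replace text #"[ \t]+(?=\n|$)" ""))

(defn format-with-cursor
  "Formats text with format-fn and returns the new text together with the
  index the cursor should move to. Assumes format-fn only changes whitespace."
  [format-fn text cursor]
  (let [;; number of non-whitespace characters in front of the cursor
        solid     (count (remove #(Character/isWhitespace %)
                                 (subs text 0 cursor)))
        formatted (format-fn text)
        ;; walk the formatted text until the same number has been passed
        new-cursor (loop [i 0, seen 0]
                     (cond
                       (= seen solid)                i
                       (>= i (count formatted))      i
                       (Character/isWhitespace
                        (.charAt formatted i))       (recur (inc i) seen)
                       :else                         (recur (inc i) (inc seen))))]
    {:text formatted :cursor new-cursor}))
```

With this sketch the cursor lands directly after the last non-whitespace character that preceded it, so a cursor sitting in the middle of a whitespace run is ambiguous. A real integration would need extra rules for that case, and for multiple cursors and selections.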

There’s more on my mind regarding this, but I’m short of time and will have to return to this. Again thanks for picking up this torch!

No, one more thing. I really hope this formatter can be made available on Clojars for consumption by ClojureScript programs, because that’s what Calva needs. Zarro startup times and no management of external processes, please. :slight_smile:

There has been a suggestion to use fipp but it specifically says on the README:

Fipp is great for printing large data files and debugging macros, but it is not suitable as a code reformatting tool.

With a link to the explanation why – in essence, fipp wants to maintain linear time complexity.

The counterargument is Prettier, which has a handful of config options but has still seen very wide adoption in the JS community.

Would be very interested in hearing what @colinfleming thinks about this, since ideally it would be a part of Cursive. Otherwise I guess it could be a separate Cursive plugin altogether.

Way into this idea, I think there’s a strong value proposition in making static analysis more reliable and decoupling the format for ‘code at rest’ and ‘code in an editor’.

I would use this thing, hands down. I really like the no (or very limited) configuration idea. zprint is cool, but yeah the config options are overwhelming.

Yes!
Keep the number of options low, if any.
Whatever the default is will be suboptimal for most users and that is okay.
The value of keeping the code consistent between persons and editors is more than some small formatting gains.

The idea would be to go even further than those tools in formatting. One good example would be formatting and reordering namespaces, so that all of these ns forms would be reformatted to the same thing.

(ns test1
     (:require [clojure.edn :as edn]
               [my.app :as app]
               [clojure.java.io :as io]))

(ns test1 (:require [my.app :as app] [clojure.edn :as edn] [clojure.java.io :as io]))

(ns test1
(:require [clojure.edn :as edn] [my.app :as app])
            (:require [clojure.java.io :as io]))

;; All format to this (for example) =>

(ns test1
  (:require [clojure.edn :as edn]
            [clojure.java.io :as io]
            [my.app :as app]))
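
As a rough illustration of the reordering step only, the sketch below merges every `:require` clause of an ns form and sorts the libspecs by namespace name. The names `libspec-name` and `normalize-ns` are hypothetical, and the sketch makes simplifying assumptions: it works on the form as read data, so comments and the original whitespace are lost, and it ignores reader conditionals and prefix lists. A real formatter would have to operate on a whitespace-preserving representation instead.

```clojure
(defn libspec-name
  "Namespace symbol of a libspec, which may be a bare symbol or a vector
  such as [clojure.edn :as edn]."
  [libspec]
  (if (sequential? libspec) (first libspec) libspec))

(defn normalize-ns
  "Merges all (:require ...) clauses of an ns form into one clause with the
  libspecs sorted by name. A docstring or attribute map stays in front, and
  other reference clauses keep their relative order after :require."
  [[_ns ns-name & clauses]]
  (let [require?          #(and (seq? %) (= :require (first %)))
        ;; docstring and attr-map are not lists, reference clauses are
        [meta-forms refs] (split-with (complement seq?) clauses)
        libspecs          (->> refs
                               (filter require?)
                               (mapcat rest)
                               distinct
                               (sort-by (comp str libspec-name)))
        others            (remove require? refs)]
    (concat (list 'ns ns-name)
            meta-forms
            (when (seq libspecs) [(cons :require libspecs)])
            others)))
```

Calling `normalize-ns` on any of the three quoted input forms above yields the same data, `(ns test1 (:require [clojure.edn :as edn] [clojure.java.io :as io] [my.app :as app]))`. Laying that out over several lines is a separate printing step.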

I’ve updated my original post, but I should have made this clearer: I think zprint is a great tool and is a good candidate as a starting point to build a more opinionated tool on. We would need to investigate the different contexts that this tool needs to run in and whether zprint is suitable in those spaces, but it’s definitely a front-runner in my mind.

This is going to be a purely optional tool, I don’t imagine ever enforcing this on anybody else (nor can I even think how you’d do that). I totally get the desire for flexibility in how you format your code, and it seems like this kind of tool is not something you’re after. However, I do see people in many programming language communities who enjoy the constraints of having a tool make formatting decisions for them, even if those decisions are suboptimal. That’s the space that I’m targeting here.

This is a really key point. In VS Code I have “auto format on save” and “auto save on switch window” set. The only issue I have with this is that if I am partway through a sentence in Markdown and have typed a space, then switch away to check something and switch back, my space is gone. Slightly tangential to your point, but I think being editor-aware is really important.


Another thing I just thought to check was if the Language Server Protocol has any support for document formatting, and it does: https://microsoft.github.io/language-server-protocol/specification#textDocument_formatting.

For what it’s worth, I wound up forking cljfmt a while ago and rewriting most of it. This was primarily to support some long-omitted features like namespace reformatting and format-ignore metadata, and to address some differences I had with the common style guide (like varying indents for macros vs functions). I never quite got time to polish it to my release standards, but it may be worth checking out for an extended cljfmt feature set.

It doesn’t (currently) meet all of the goals described here, and in particular I think exploring Graal as a native-image compiler has a lot of promise for speed - something I’ve been working on in a few other tools. The zipper approach would also lend itself to other static-analysis checks like linting and idioms.

We have adopted a fork of the fork at Amperity as our standard formatter, to reasonable success, so this is definitely an open area that the Clojure toolset could benefit from.

Related to reordering of namespaces, it seems that cljfmt already has two PRs that promise to do that to some extent:

That is about exactly the same API as is present in the client. It’s a bit off-topic in this thread, but I’d be interested in listening to the case for LSP for formatting in some other thread somewhere. :smile:

Great initiative! I strongly support having a standardized way to format Clojure code. For something like this to be successful, I think it should have 0 configuration options, and as much as possible, should optimize for smaller diffs.

For those who don’t know me, I develop Cursive.

I really like this idea, and I’ve argued online that I wish something like this had existed from the start for Clojure. If I ever develop my own language, it will definitely have a gofmt-style thing from the start. But I suspect that that ship has sailed now for Clojure, if only because I suspect the core team would never use it for Clojure itself and because people have had time to develop bad habits :slight_smile:

I think this is a fairly common reaction, and probably more so in the lisp world which for better or worse encourages individual approaches to pretty much everything (as opposed to Java or Go, which have features and culture encouraging team development and inter-developer consistency). Here’s my favourite quote about this from Kent Beck:

We didn’t miss formatting. We are both fussy about code formatting, but almost as soon as we were constrained to what the pretty-printer (esprima) gave us, we didn’t waste any more thought on it.

Formatting is something that devs tend to think is very important until they can’t control it, and then, as long as the output is decent, they generally don’t care. I think the popularity of gofmt, Prettier and the like is a testament to this: consistency really is more important than any individual style choice.

That said, I think there are some significant obstacles, some because this is now fairly late in Clojure’s development and some just due to the nature of the language:

  • I think consensus will be hard now. Map and let value alignment is a good example - some people value the readability it provides more than the fact that it creates larger diffs than necessary, some are the other way around. Your reaction to this may depend on the diff tool you use - whitespace-only diffs aren’t a problem for me because the IntelliJ diff tool is great at identifying them, but that’s not true for all tools.
  • Formatting Clojure accurately really requires symbol resolution, which is a hard problem. e.g. you may want to format clojure.core/defn differently from schema.core/defn but someone using Schema may have the second one referred so if you’re looking at (defn ... ) you don’t know which rule to apply.
  • Similarly, the indentation spec has one main problem for this use case which is that it lives in metadata, and thus requires you to eval code in order to format. Worse, the macro you’re using may exist in some other part of your project or a library, so you need your whole project configured in order to find the source of the macro and eval it so that you can get the spec from the metadata. I think the format spec is good and could be used, but the config for forms really needs to be provided to the formatter in some external way (and also relies on symbol resolution, as above).
  • Once you go to rearranging ns forms then it’s a massive can of worms, especially if you want to support CLJC (which is really required). This is both from an implementation and a consensus point of view. Coming up with a canonical way to rearrange reader conditionals in an ns form is a super hard problem.
  • Many of these choices affect technical aspects of the formatter, for example anything using alignment or rearranging forms can’t be purely top-down. This was linked above, but I’d encourage everyone interested to read both the linked article and the comments - they’re fascinating and frightening in equal measure.
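
The symbol resolution and external-config points above can be sketched together. This is only an illustration under stated assumptions: `indent-rules`, `resolve-sym` and `rule-for` are hypothetical names, the rule keywords are placeholders, and the alias/refer maps are assumed to have been extracted from the file’s ns form beforehand. Treating every unqualified, unreferred symbol as `clojure.core` is a deliberate simplification.

```clojure
(def indent-rules
  "Hypothetical external config: formatting rule per fully qualified symbol."
  {'clojure.core/defn :defn-style
   'schema.core/defn  :schema-defn-style})

(defn resolve-sym
  "Resolves sym to a fully qualified symbol using the aliases and refers
  of the current ns form. Simplification: unknown unqualified symbols are
  assumed to come from clojure.core."
  [{:keys [aliases refers]} sym]
  (let [ns-part (some-> (namespace sym) symbol)
        nm      (name sym)]
    (if ns-part
      ;; s/defn with alias s -> schema.core/defn
      (symbol (str (get aliases ns-part ns-part)) nm)
      ;; defn referred from schema.core -> schema.core/defn
      (symbol (str (get refers sym 'clojure.core)) nm))))

(defn rule-for
  "Looks up the rule for the head symbol of form, or :default."
  [ctx form]
  (get indent-rules (resolve-sym ctx (first form)) :default))
```

For example, `(rule-for {} '(defn f [x] x))` picks `:defn-style`, while the same form with `{:refers '{defn schema.core}}` picks `:schema-defn-style`. The formatter cannot tell the two apart from the `(defn ...)` text alone, which is the problem described above.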

Realistically I don’t have time to work on implementing this but I’d be interested in helping out with feedback. Hopefully the above isn’t too discouraging - even at this stage I think a tool like this would be useful.

I think if someone wants to make yet another code formatter, that’s great. But I don’t think that’s what I’ve been personally missing when it comes to the greater Clojure ecosystem, and the chances are very high that I don’t bother using it or even checking it out.

Something I think would be more valuable is to combine a pre-commit hook with an existing formatter and integrate that into the standard lein templates.

There’s no point to these formatters without integration with source control or the compiler itself, with regard to standardization, that is.

On my team, I’ve been wanting to set one up, but never bothered with figuring out how to get it integrated with our source control. And if we did, the priority would be that it should never ever fail and become a blocker for us pushing code.

I think the tension arises with the term “canonical”. To me that implies universality and proselytizing. If that’s the case then my vote is for minimalism: cover the least and most agreed-upon subset of the formatting problem space.

Part of that is my personal preference to keep my personal preferences, but another part is agreeing with Colin that this ship has sailed, which means that we’re stuck in xkcd’s “There are now 15 competing standards” situation. To me that means a tool would need to have one of:

  • many strong opinions, without trying to be canonical
  • few strong opinions, aiming to be the minimal agreed subset, even for stubborn folks like me
  • lots of configuration options

For instance, in my mind there’s tension in the Clojure Style Guide between being canonical and what seems to be continual accretion of more opinionated (and therefore less canonical) rules. My preference is for it to stay on the minimal/canonical side so I can recommend it without caveats.

I totally get that, it’s entirely valid to target that space, and in fact I want a version of this tool…if its strong opinions aren’t too large a change from my preferences. For instance, Colin pointed out some hard parts of formatting ns, but things like alphabetizing requires should get everyone on board. But even setting aside the technical challenges and just focusing on ns, it’s instructive to consider Stuart’s rules. I mostly follow them, but when I don’t, I don’t want to be bothered:

  • I too abhor :use and think it should probably be disallowed by the formatter
  • I too discourage :refer :all…but sometimes it’s the best choice and I don’t want a formatter getting in my way for those rare cases
  • I too discourage :rename…but in some rare cases it’s the right choice
  • I agree that non-keyword use and require in ns are flat wrong and should be disallowed by the formatter
  • his rule to have a line break between :require and the first vector is not my preference and is based on an implementation detail; I’d vote to omit it
  • :as before :refer sounds like a fine rule to make, but if some contingent disagrees, then in the spirit of minimalism we should drop it or make it configurable