CLJ Commons: Building a formatter like gofmt for Clojure


#1

In the last ten years or so, source code formatters with limited or no configuration have become popular. Go is the best-known example, shipping with gofmt, but there are similar tools for Rust, JavaScript, and Python.

Clojure and Lisps in general have historically allowed very flexible formatting of s-expressions. This can aid readability, but adds a cognitive overhead for readers used to different styles. It can also be challenging to match existing source code formatting if you are using a different editor to the original author.

I believe it would be useful for the Clojure community to be able to develop (or adopt) a single source-code formatter which is able to format Clojure source code to a canonical format. The purpose of this thread is to help develop the problem space, hear from different stakeholders, and determine whether such a tool is desired, possible, and likely to be useful. It seems unlikely that 100% of the community would want such a tool, but I feel like there is enough desire for a common formatting tool that this could still be valuable.

The goals and thoughts put down here are a starting point for discussion, not the final word. I’ve been thinking about this for a while, but there are lots of other people who have also thought about this kind of thing. Many Clojurists bring valuable experiences from other language communities. I’m really interested in finding common ground to build a tool that the community can get behind.

This effort would be part of CLJ Commons, a community effort to build up the supporting infrastructure around Clojure to make a better experience for Clojurists.

Why not use “existing tool X”?

There are several existing tools for formatting Clojure source code: cljfmt, zprint, Emacs, fipp, and Cursive (other editors have formatters too). None of these quite fits the goals I have for this project.

  • zprint is extremely configurable which is great for some use-cases, but doesn’t move towards the goal of having a single common format. (zprint looks like a very good base to build this kind of tool on though)
  • cljfmt doesn’t have a goal of providing a canonical format
  • The Emacs and Cursive formatters are both part of their editors and don’t have an easy way to run outside of them.
  • fipp isn’t currently suited for code formatting, but could be in the future with more work.

Stakeholders:

These are the stakeholders I’ve identified when thinking about building a tool like this. For a tool like this to get adoption it needs to have support from a wide section of the community, not just a single stakeholder or tool.

  • Clojure developers, i.e. You!
  • IDE authors: CIDER, Cursive, Counterclockwise, Calva, etc.
  • Other tooling authors, e.g. Parinfer, cljfmt, zprint
  • Clojure Core may want to provide input
  • Others? Who else should be involved here?

Goals:

Here are some of the aspirational goals and outcomes I could imagine coming from this:

  • Fast cold start time
  • Fast to run - 10-100k LOC/second seems ambitious, but probably doable.
  • The production of a reference implementation formatter
  • The production of a specification which different editors and tools can use to implement a common code formatting style. This spec should be independent from the reference implementation, i.e. the formatting rules are not defined by the behaviour of the formatter.
  • The spec should be opinionated rather than flexible, providing as few config options as possible, ideally none.
  • Creating tooling to be able to report deviations in continuous integration, pre-commit hooks, etc.
  • Able to run on a single file, maybe even a subset of a file?
  • Able to run without having to evaluate the Clojure code
  • Works across Clojure, ClojureC, and ClojureScript
  • A free service that can be installed as a Check in GitHub to check formatting and suggest formatting changes for open source projects.
  • Able to provide the same output even in the face of many whitespace changes, i.e. whitespace (mostly) doesn’t matter
  • An online playground for reformatting people’s Clojure code, similar to the Prettier playground
  • Ability to use the CIDER indentation specification for controlling custom indentation.
  • Ability for IDE authors to build tooling that follows the spec. The supporting tooling has to be a first-class citizen.
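As a rough sketch of the "report deviations in continuous integration" goal, a check mode can be as small as comparing each file's text with its formatted form. Everything named here is hypothetical: `stand-in-format` is a placeholder that only trims trailing whitespace, not a real formatter, and `deviations` and `check-exit-code` are illustrative names.

```clojure
(require '[clojure.string :as str])

;; Stand-in formatter (an assumption, NOT a real formatter): trims trailing
;; whitespace from the whole text and ends it with a single newline.
(defn stand-in-format [text]
  (str (str/trimr text) "\n"))

(defn deviations
  "Return the names of the sources whose text differs from (format-fn text).
  `sources` is a sequence of [name text] pairs."
  [format-fn sources]
  (vec (for [[source-name text] sources
             :when (not= text (format-fn text))]
         source-name)))

(defn check-exit-code
  "0 when every source is already formatted, 1 otherwise, as a CI job or
  pre-commit hook would expect."
  [format-fn sources]
  (if (empty? (deviations format-fn sources)) 0 1))

(def example-sources
  [["a.clj" "(ns a)\n"]
   ["b.clj" "(ns b)   \n\n"]])

(deviations stand-in-format example-sources)
```

A check like this is only stable if the formatter is idempotent, i.e. formatting already formatted text changes nothing, which is also what the "whitespace (mostly) doesn’t matter" goal implies.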

Inspirations/Prior Art:

I haven’t seen many of these kinds of very opinionated formatters for Lisps, but I’d be very interested if anyone knew of any. Does the Lisp culture select against these kinds of rigid tools?

Contexts where formatting needs to run:

Formatting happens in different places; we should design a solution that works well in each of these contexts.

  • Running a reformatting command in an editor
  • Typing in an editor
  • Command line usage for detecting format deviation
  • Command line usage for fixing format deviation
  • In an online playground like environment?

Benefits:

  • Consistent formatting when reading code
  • Consistent code formatting amongst team members using different editors (or even the same editors!)
  • Reduced git diff noise when making changes as formatting and whitespace is consistently applied
  • Eliminate time spent worrying about formatting, or nitpicking it in code reviews

Non goals:

  • Maintaining compatibility with any particular code base

Things to consider when making decisions about formatting rules:

  • Readability of the code
  • How it impacts common code idioms, i.e. look at examples of what it would do to real code
  • Git diff impact when things change, e.g. lining up map values will often make extra whitespace changes if you add/remove map keys
  • Community conventions, both written and unwritten.
  • Pathological cases
  • Difficulty to implement
  • Lisp heritage
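To make the git diff point about lining up map values concrete, here is a small sketch that counts how many lines of a new version do not appear in the old one. It is a crude stand-in for a real diff, and the map contents and the name `added-lines` are invented for the example.

```clojure
(defn added-lines
  "Count the lines in `after` that do not appear verbatim in `before`.
  Both arguments are vectors of lines. This is only a rough proxy for a diff."
  [before after]
  (count (remove (set before) after)))

;; Unaligned values: adding a key touches the old last line and the new one.
(def plain-before ["{:id 1"
                   " :port 8080"
                   " :retries 3}"])
(def plain-after  ["{:id 1"
                   " :port 8080"
                   " :retries 3"
                   " :timeout-ms 500}"])

;; Aligned values: the longer key shifts the value column on every line.
(def aligned-before ["{:id      1"
                     " :port    8080"
                     " :retries 3}"])
(def aligned-after  ["{:id         1"
                     " :port       8080"
                     " :retries    3"
                     " :timeout-ms 500}"])

(added-lines plain-before plain-after)     ;; => 2
(added-lines aligned-before aligned-after) ;; => 4
```

With alignment, the number of touched lines grows with the size of the map whenever the longest key changes, while the unaligned form stays at two.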

Decisions:

The purpose of this thread isn’t to figure out the answer to all of the different formatting decisions that would need to be made, but here are some of the kinds of big and little decisions that would need to be made.

  • Do we only support UTF-8 files?
  • Should we break lines at a certain line length or not? If so, should the line length be configurable? - https://news.ycombinator.com/item?id=17273616
  • Should trailing whitespace be removed?
  • Should we format to a single trailing newline at the end of a file?
  • Do we want to reorder ns forms to follow something like Stuart Sierra’s style guide? (I’m really in favour of this personally)
  • Do you remove the whitespace on a blank line between two indented lines or keep it in?
  • Should there be a space between #_ and the next form? What should happen with multiple #_s?
  • How do we deal with comments? Is there a difference between ; and ;;?
  • How should reader conditionals be formatted? Should the conditional itself be outdented so that the code flows better, or just treat it as a regular form?
  • What point on the continuum of formatter and linter do we want to hit?
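Two of the smaller decisions above (trailing whitespace and a single trailing newline) can be sketched directly. This assumes both questions are answered "yes", which is only one possible outcome of the discussion, and `normalize-whitespace` is a made-up name.

```clojure
(require '[clojure.string :as str])

(defn normalize-whitespace
  "Strip trailing whitespace from every line and end the text with exactly
  one newline. Caveat: this works on raw text, so it would also strip trailing
  whitespace inside a multi-line string literal, which a real formatter
  would have to leave alone."
  [text]
  (let [lines (map str/trimr (str/split-lines text))]
    ;; split-lines drops trailing empty lines, so joining the lines and
    ;; appending "\n" leaves a single final newline.
    (str (str/join "\n" lines) "\n")))

(normalize-whitespace "(ns a)  \n\t\n(def x 1)\n\n\n")
;; => "(ns a)\n\n(def x 1)\n"
```

Note that this also answers the blank-line question one way: whitespace on a blank line between two indented lines is removed.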

Further tools that could be built from this

In the future, I could imagine this kind of tool may be useful for other tooling that works with source code like:

  • Building automatic rewriters for upgrading libraries or tools, e.g. an automatic migrator from clojure.spec.alpha to clojure.spec.alpha2.
  • Linting Clojure code (e.g. Eastwood)
  • Spotting better Clojure idioms (e.g. Kibit)

Process:

The process I imagine for building this tool is:

  • Discussion continues on this thread from interested parties about the idea
  • Find a group of people who want to be part of building it
  • Do a survey of existing formatting tools to determine if one of the existing ones is suitable to adapt/modify/collaborate with on the goals of this project
  • Work together to figure out the shared values and goals of the project, to make sure we are aligned before beginning work
  • Start a GitHub repository in clj-commons and create some issues for the different formatting rules that would need to be decided
  • In parallel, start building/adapting a formatter to implement the formatting rules. At this point it might be useful to do a few spikes in different directions to see what is the most promising route.
  • Find common ground on formatting rules and incrementally add more and more rules over time, releasing early and often, developing a spec and a reference implementation. I had thought that starting with an ns formatter based on Stuart Sierra’s guide (or similar) would be a really useful starting point, as I don’t think anything like that currently exists.

What’s next?

What haven’t I thought about? Does this tool already exist in a form I’m unaware of? Are there other people who we should be talking to about this? Do Clojurists value flexibility over regularity so much that you would never use a tool like this? Is such a tool impossible to make in Clojure? Is this a tool that you’d like to use? Is this a tool you’d like to help build? What are your thoughts?


#2

I have quickly glanced at this and the first thing I have thought about was: static analysis tool yes! :smile:

One opinionated opinion :smile: excluding Clojure because of its startup time, I would develop the tool in the only other language we care about: ClojureScript.

Distribution can happen over npm and we have things like abio that can keep interop far away.

More input coming in the next couple of days, I will also bring this up in conversations at the Conj. Wonderful idea!


#3

GraalVM’s native-image could be used for faster startup time.

I am pretty happy with existing tools/conventions though. For conventions I use https://github.com/bbatsov/clojure-style-guide, as my editor I use Emacs + CIDER, and on CI I use cljfmt. I don’t remember any clashes so far between CIDER and cljfmt.

So for now I am unsure what the new formatter would offer that the existing ones don’t. (I use the default config in both CIDER and cljfmt; I didn’t tweak anything. The possibility to tweak things is nice, though.)


#4

There is also Joker, which I believe was made with this use case in mind. It only handles a subset of Clojure-looking code, but it is also a single binary in the end.

It already has a linter so perhaps adding a formatter might be a natural extension.


#5

There is a nice thread about clojure doc and zprint: “How to best integrate zprint as pre-commit hook?”

Is there any point in building a new tool since zprint seems really well designed & fast https://github.com/kkinnear/zprint/blob/master/doc/graalvm.md ?

My first thought would be contributing to zprint so that it can handle all the formatting rules. Then have a fork or build of zprint with no config options (just the style-guide) or having the zprint default be the style-guide.

I think it was mentioned that using something like zprint in Cursive is a bit tricky (since Cursive has its own internal formatting stuff). A common export/import format between Cursive and zprint could be a nice solution.


#6

I’ve been talking to the Joker author about using the tool for code formatting and he said it’s a non-goal for the project. But perhaps a fork could be used to build a formatter on top.


#7

What about formatting userland macros that introduce “custom” syntax and thus bring their own formatting/indentation rules? I’m not sure that common rules can cover all possible variations. Perhaps this is worth more thought/studying?

As an example I can immediately think of Rum’s components syntax.


#8

Elixir has also recently introduced a standard formatter in core. See https://hexdocs.pm/mix/1.6.0/Mix.Tasks.Format.html


#9

I think this is where the CIDER indentation specification comes into play.


#10

On our team we have Cursive users and a Vim user. There are some differences in how code is formatted, which sometimes causes pointless git diffs and tiresome discussions. It’d be great to have a tool that Just Works for any editor, and that prevents the seemingly inevitable bikeshedding about formatting on teams that use several editors. I would be more than happy to trade control for consistency when it comes to code formatting.

I like the idea of starting with something simple like ns formatting. There shouldn’t be too much controversy surrounding those rules. There’s a lot of infrastructure and integration with editors and plugins that needs to be written at the same time.


#11

I think that the source code layout section of bbatsov/clojure-style-guide is a very good resource. It shows that there are options. I think that these options should be used judiciously.

I would welcome a tool that does the same as Emacs/CIDER: reindent, remove trailing whitespace.

I would accept removing newlines before closing parentheses/braces/brackets.

I am wary of removing/adding other newlines. Allow the user to put the newlines, then indent rigorously.

gofmt goes too far.


#12

I’m extremely wary of any attempt to establish universal canonical formatting, for the simple reason that the rules for these tools inevitably accrete and never shrink. I don’t think I’m unusual in that I generally follow orthodox Clojure formatting, but I differ quite a bit from Stuart’s style guide, and in a few specific cases I use whitespace in slightly idiosyncratic ways. I don’t want some far-away format committee-dictatorship deciding that those are unacceptable and I need to conform. If a client or my team decides, fine, no problem, that’s a different story.

As an analogy, I use lein-bikeshed and love that Bozhidar established the Clojure Style Guide. I totally agree with 95% of both, but I find the remaining 5% to be unconvincing overreach and I never want that 5% imposed on me.

My vote is to dampen our enthusiasm at making a lot of universal rules. The formatter should stick to the very few rules that are truly universal, and the rest (like Stuart’s style guide or really any of the individual questions in the OP) should be configuration options that need to be explicitly flipped on.


#13

Oh, wow, this is awesome. I am very interested in trying to help the Community Guidelines influence more code out there. There are so many things I want to comment on, but most of it will have to wait.

However, let me +1 on the importance of speed and ability to integrate into editors.

You say:

I would say that this is really important in the editor case, especially for speed with format-as-you-type. For this reason, Calva Formatter only reformats the current enclosing list-ish form. Since I want this formatter to help the Community Standard, I use cljfmt as the formatting engine. cljfmt isn’t really meant for this, so it doesn’t have the performance that format-as-you-type really needs, which makes this minimal-range formatting extra important. I think it will always be needed for large files, though.

Other things needed by the editor integrations are:

  • That the formatting rules can be relaxed when used as you type. (Entering a newline in a paren-trail, moving some brackets down, should not immediately cause them to fold up again, for instance.)
  • That the formatter can take the current cursor position and selection into account, and inform about where the cursor should move on the reformatted text. Today’s editors with multiple cursors and selections make this extra interesting, but we can start with the single cursor/selection case.
  • The formatter should offer some low-level API to its AST/zipper or whatever it uses so that the integration doesn’t need to parse the text itself for figuring out things that the formatter already has figured out. (I think I sometimes have three parallel ASTs and allocations of the entire buffer text.)
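To illustrate what "only reformat the current enclosing form" and "take the cursor position into account" could mean, here is a deliberately naive sketch that finds the innermost bracketed form around a cursor offset. It is not how Calva or cljfmt do it. `enclosing-form-range` is a hypothetical name, and the scan ignores strings, comments and character literals, which a real implementation would get from the formatter's own parser.

```clojure
(defn enclosing-form-range
  "Return [start end] (end exclusive) of the innermost (...), [...] or {...}
  around the zero-based `cursor` offset, or nil if there is none.
  Naive: brackets inside strings, comments and char literals are not skipped."
  [text cursor]
  (loop [i 0, stack []]
    (when (< i (count text))
      (let [ch (nth text i)]
        (cond
          ;; Remember where each form opens.
          (contains? #{\( \[ \{} ch)
          (recur (inc i) (conj stack i))

          ;; The first form to close around the cursor is the innermost one.
          (contains? #{\) \] \}} ch)
          (let [start (peek stack)]
            (cond
              (nil? start) nil ; unbalanced input, give up
              (and (< start cursor) (<= cursor i)) [start (inc i)]
              :else (recur (inc i) (pop stack))))

          :else (recur (inc i) stack))))))

(let [text "(defn f [x] (inc x))"
      [start end] (enclosing-form-range text 14)]
  (subs text start end))
;; => "(inc x)"
```

An editor integration could pass only that range to the formatter and map the cursor back by its offset within the range.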

There’s more on my mind regarding this, but I’m short of time and will have to return to this. Again thanks for picking up this torch!

No, one more thing. I really hope this formatter can be made available on Clojars for consumption by ClojureScript programs, because that’s what Calva needs. Zarro startup times and no management of external processes, please. :slight_smile:


#14

There has been a suggestion to use fipp but it specifically says on the README:

Fipp is great for printing large data files and debugging macros, but it is not suitable as a code reformatting tool.

With a link to the explanation why – in essence, fipp wants to maintain linear time complexity.


#15

A counterargument is Prettier, which has a handful of config options but has still seen very wide adoption in the JS community.


#16

I would be very interested in hearing what @colinfleming thinks about this, since ideally it would be a part of Cursive. Otherwise I guess it could be a separate Cursive plugin altogether.


#17

Way into this idea, I think there’s a strong value proposition in making static analysis more reliable and decoupling the format for ‘code at rest’ and ‘code in an editor’.


#18

I would use this thing, hands down. I really like the no (or very limited) configuration idea. zprint is cool, but yeah the config options are overwhelming.


#19

Yes!
Keep the number of options low, if any.
Whatever the default is will be suboptimal for most users and that is okay.
The value of keeping the code consistent between people and editors is worth more than some small formatting gains.


#20

The idea would be to go even further than those tools in formatting. One good example would be formatting and reordering namespaces so all of these namespaces would be reformatted to the same thing.

(ns test1
     (:require [clojure.edn :as edn]
               [my.app :as app]
               [clojure.java.io :as io]))

(ns test1 (:require [my.app :as app] [clojure.edn :as edn] [clojure.java.io :as io]))

(ns test1
(:require [clojure.edn :as edn] [my.app :as app])
            (:require [clojure.java.io :as io]))

;; All format to this (for example) =>

(ns test1
  (:require [clojure.edn :as edn]
            [clojure.java.io :as io]
            [my.app :as app]))
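A minimal sketch of the reordering step behind the example above, assuming the ns form is read as plain data with clojure.edn/read-string. The names `require-clause?`, `lib-name` and `canonical-ns` are made up for illustration, and putting the merged :require clause first is an arbitrary choice. Reading as data throws away comments and whitespace and cannot handle reader conditionals, so a real formatter would need a parser that preserves them; the line layout itself would be a separate printing step.

```clojure
(require '[clojure.edn :as edn])

(defn require-clause? [form]
  (and (seq? form) (= :require (first form))))

(defn lib-name
  "A libspec is either a bare symbol or a vector such as [my.app :as app]."
  [spec]
  (str (if (vector? spec) (first spec) spec)))

(defn canonical-ns
  "Merge all :require clauses of an ns form into one, sorted by library name."
  [ns-form]
  (let [;; head = ns symbol, name, optional docstring and attr-map
        [head clauses] (split-with (complement seq?) ns-form)
        libspecs (->> clauses
                      (filter require-clause?)
                      (mapcat rest)
                      (sort-by lib-name)
                      distinct)
        others (remove require-clause? clauses)]
    (concat head
            (when (seq libspecs) [(cons :require libspecs)])
            others)))

(def inputs
  ["(ns test1 (:require [clojure.edn :as edn] [my.app :as app] [clojure.java.io :as io]))"
   "(ns test1 (:require [my.app :as app] [clojure.edn :as edn] [clojure.java.io :as io]))"
   "(ns test1 (:require [clojure.edn :as edn] [my.app :as app]) (:require [clojure.java.io :as io]))"])

;; All three inputs canonicalize to the same data.
(apply = (map (comp canonical-ns edn/read-string) inputs))
;; => true
```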

I’ve updated my original post, but I should have made this clearer: I think zprint is a great tool and a good candidate as a starting point for building a more opinionated tool. We would need to investigate the different contexts that this tool needs to run in and whether zprint is suitable in those spaces, but it’s definitely a front-runner in my mind.

This is going to be a purely optional tool; I don’t imagine ever enforcing this on anybody else (nor can I even think how you’d do that). I totally get the desire for flexibility in how you format your code, and it seems like this kind of tool is not something you’re after. However, I do see people in many programming language communities who enjoy the constraints of having a tool make formatting decisions for them, even if those decisions are suboptimal. That’s the space that I’m targeting here.

This is a really key point. In VS Code I have “auto format on save” and “auto save on switch window” set. The only issue I have with this is that if I am partway through a sentence in Markdown and have typed a space, when I switch away to check something and switch back, my space is gone. Slightly tangential to your point, but I think being editor aware is really important.


Another thing I just thought to check was if the Language Server Protocol has any support for document formatting, and it does: https://microsoft.github.io/language-server-protocol/specification#textDocument_formatting.
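For reference, this is roughly the shape of an LSP document formatting exchange, written as Clojure data rather than JSON. The method name and field names follow the LSP specification; the file URI, the option values and the returned edit are invented for the example, and serializing the maps to JSON-RPC is left to whatever JSON library a server would use.

```clojure
;; Request sent by the editor (client) to the language server.
(def formatting-request
  {:jsonrpc "2.0"
   :id 1
   :method "textDocument/formatting"
   :params {:textDocument {:uri "file:///project/src/test1.clj"} ; example URI
            :options {:tabSize 2
                      :insertSpaces true
                      ;; optional hints, available in newer protocol versions
                      :trimTrailingWhitespace true
                      :insertFinalNewline true}}})

;; The result is a list of text edits (or null). Lines and characters
;; are zero-based.
(def example-result
  [{:range {:start {:line 0 :character 0}
            :end {:line 2 :character 0}}
    :newText "(ns test1\n  (:require [clojure.edn :as edn]))\n"}])
```

A formatter with few or no options fits this protocol well, since it can largely ignore the client-supplied options and return the canonical text.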