CLJ Commons: Building a formatter like gofmt for Clojure

I totally agree with @tonsky and have been formatting my code the simple way he suggests for a while now. For people who use Emacs and want to try out this formatting instead of the default clojure-mode indentation, set clojure-defun-style-default-indent to t in your Emacs config.

True. For me there are two kinds of zero-config app. One is zero-config because it works sanely out of the box with no fiddling, so in most cases you’ll be happy with it, yet it still allows some configuration if need be. The other is zero-config because it simply provides no configuration features. The latter doesn’t imply the former, and I think the former matters more: if a tool can’t be configured, its out-of-the-box experience has to be perfect in every environment.

I was just about finished with my tonsky-formatter, but this will prove a bit trickier. (I totally agree the update is needed, it’s just that it is a surprisingly (to me) different beast.)

Yeah. I was also considering a simpler solution: always indent everything with 1 space. No conditions required then. But I imagine people would have a hard time accepting such a radical change.

I like the general simplification idea, but one space is too little; it would affect how I read code. What makes two spaces harder than one?

Parinfer. If we format code like I initially suggested:

(defn f
  ([x]
    body))

Parinfer will move body inside the vector, because the indentation makes it look as if it belongs there. To fix this, we have two options: indent lists that do not start with a symbol with 1 space (which might be tricky to implement), or indent all lists with 1 space (easier to implement, but too big of a jump). So I changed my article to the first option (special handling of lists starting with a symbol).
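To make the two options concrete, here is a rough illustration of how the same form would look under each. It is only a formatting sketch: `f`, `x` and `body` are placeholders, and the snippet is meant to be read, not evaluated.

```clojure
;; Option 1 (the one adopted): a list that starts with a symbol is indented
;; by two spaces; a list that starts with anything else, such as ([x] ...),
;; is indented by one space so `body` lines up with `[x]`.
(defn f
  ([x]
   body))

;; Option 2 (considered too big a jump): one space everywhere,
;; so no special case is needed.
(defn f
 ([x]
  body))
```

In both variants `body` sits to the right of the opening paren of `([x] ...)` but not to the right of `[x]`, which is what keeps Parinfer from pulling it into the vector.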

I never said it was a non-goal :). What I said is that the current pprint implementation was not suitable for code formatting and that code formatting would be a pretty big project, even if building on top of Joker: https://github.com/candid82/joker/issues/42. FWIW, I am experimenting in this area right now, but don’t know yet if anything will come out of it.

Excellent. Is it time to do the following?

  • Start a GitHub repository in clj-commons and create some issues for the different formatting rules that would need to be decided
  • In parallel, start building/adapting a formatter to implement the formatting rules. At this point it might be useful to do a few spikes in different directions to see what is the most promising route.

I’d love to see the discussion of each of the ten decisions in its own issue, as Daniel mentioned.

I would love to see “Standard Clojure Style”. As others have mentioned, this will help with maturity in the Clojure ecosystem 👍

The StandardJS project might be worth looking at as a reference: one tool, no configuration, opinionated, easy to use, etc. Under the hood StandardJS uses ESLint for checking and formatting code, but that could be swapped out with another tool without changing the rules.

I am trying to get cljfmt to follow @tonsky’s simple rules, but I am stuck on the “special” handling of lists that do not start with a symbol. I asked for help here:

If anyone wants to help, please do it on that issue, to avoid totally derailing this thread.

For some inspiration, here is Google’s style guide for Common Lisp: https://google.github.io/styleguide/lispguide.xml#Formatting

Line length
No line is longer than 100 characters.

Indentation
Indent your code the way a properly configured GNU Emacs does.

Vertical white space
One blank line between top-level forms.

Horizontal white space
None around parentheses. No tabs.

Yep, I think this is a good time to start getting into some more specifics. I’ve created the clj-commons/formatter repository on GitHub (“Building blocks and discussion for building a common Clojure code formatter”) and put some of the issues up for discussion. A few of the ones I put at the top were a little bit tricky to write up quickly, so I’ll come back to them later.

You and others should feel free to open issues on the different decisions a formatting tool would need to make.

I 100% agree. I’ve been exploring a few different programming languages lately, and having a code formatter that idiomatically formats my novice code is very nice. It means that as I’m learning the language, I’m also learning the style idioms and getting nicer-looking code than I could have produced by myself.

I think this points us strongly in the direction of @tonsky’s suggestion, which minimizes what we have to learn about code formatting idioms and lets us concentrate on learning the language and its idioms instead.

I started an issue for discussing indentation:

I also added a comment where I root for going with @tonsky’s indentation suggestions. All who want this tool to go that way, please give that comment a thumbs up.

I find Nikita’s proposed new formatting system unconvincing for two basic reasons:

  • deciding how to format code based on implementation simplicity is backwards
  • creating a new opinionated formatting system that everyone should switch to rather than encouraging standardization on one of the existing popular ones is not appealing (again, it’s a clear example of https://xkcd.com/927/)

These are totally separate from my concerns about the actual content of the proposed rules.

The xkcd comic is funny, and relevant when the new ground being explored isn’t different enough from the old ground to merit the exploration. If you don’t think this is new enough ground, can you point to the specific objectives of the proposal that another system already accomplishes?

As for simplicity, Nikita’s article mentions that a simple implementation has the effect of combatting the reasons for lack of adoption without adversely affecting current standards.

My proposal is not based on simplicity. It actually could’ve been very complex. What it can’t do is have versions or depend on custom formatting of specific forms. Without those, it’s actually quite hard to come up with a complex format, so simplicity is achieved naturally.

I’m all for the existing ones, but they all depend on exceptions rather than universal rules. This is a red flag for anything you want to be used by the whole community.

I don’t think we spent enough time addressing why cljfmt is insufficient?

I just made a native-image build of it and hooked it up so it runs every time we lint our code.

The default options seem perfect to me. I like how it mostly follows the clojure-mode indentation, and it’s also used by Emacs and Cursive for buffer formatting.

OP listed this reason: https://github.com/weavejester/cljfmt/issues/120

But line breaks are a highly contentious rule, and one that tonsky’s rules don’t cover either.

So for now the only reason why cljfmt doesn’t already fit the bill is that it is not a total formatter. So is there interest in a total formatter?

AKA one where every possible combination of white space, including tabs and newlines, always reformats to exactly the same output. So you can’t even control line breaks or any alignment whatsoever.

I think that’s what I understood from the issue, correct me if I’m wrong @danielcompton
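To show what I mean by “total”, here is a toy sketch (not a proposal for how a real tool should work). It simply reads a form and prints it back, so the author’s white space cannot survive. `canonical` and `variants` are made-up names, and the approach assumes a single form with no comments or reader macros, since the reader drops or expands those.

```clojure
;; Toy "total formatter": read the form, then print it back.
;; Whatever white space the author typed is discarded by the reader.
(defn canonical [source]
  (pr-str (read-string source)))

;; Three spellings of the same form, differing only in white space
;; (spaces, tabs, newlines and commas).
(def variants
  ["(defn f [x] (inc x))"
   "(defn  f\n  [x]\n\t(inc x))"
   "(defn f,[x]\n\n\n(inc x)   )"])

;; Every variant collapses to the same string.
(println (apply = (map canonical variants)))
```

A real total formatter would also decide where to break lines; this sketch only shows the “same output for every input spelling” property.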

“I don’t think we spent enough time addressing why cljfmt is insufficient?”

Definitely happy to discuss more about cljfmt and whether it could be suitable. I outlined in the original post that I wanted something that made more formatting decisions; cljfmt is relatively conservative compared to things like zprint, or formatters in other languages. However, I’m not set on this, and it’s possible that cljfmt is what people can agree on.

“So for now the only reason why cljfmt doesn’t already fit the bill is that it is not a total formatter. So is there interest in a total formatter?

“AKA one where every possible combination of white space, including tabs and newlines, always reformats to exactly the same output. So you can’t even control line breaks or any alignment whatsoever.”

That’s what I’m interested in, and where I see a gap in the tool space for Clojure. I don’t think a total formatter would mean that you couldn’t control line breaks and alignment. I’d probably prefer alignment to be automatic, but I can definitely see control over line breaks being very important for keeping code readable.

“But line breaks are a highly contentious rule, and one that tonsky’s rules don’t cover either.”

There are two related issues, line breaks and indentation. Depending on your formatting rules, you may have one control the other, or have your formatting rules control both.

You could imagine a very aggressive formatter which automatically breaks lines when they get over n characters as well as indenting the re-broken lines. That is at one end of the spectrum and is the design space of some formatters like Black. At the other end of the spectrum is something like cljfmt which does some indenting, but mostly leaves idiomatic code alone, even if it isn’t indented correctly.
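As a rough illustration of the aggressive end of that spectrum, clojure.pprint can already re-break a form to fit a right margin. This is only a sketch: `reflow`, the sample form and the 40-column margin are arbitrary choices for the example, and as noted earlier in the thread, pprint is not considered suitable as a real code formatter.

```clojure
(require '[clojure.pprint :as pp])

;; A form that would be roughly 95 characters wide on a single line.
(def form
  '(defn handler [request]
     (let [user (:user request)]
       (if user (greet user) (redirect "/login")))))

;; Print `form` as code, breaking lines so they try to fit within `margin`
;; columns. The original line breaks are ignored entirely.
(defn reflow [form margin]
  (binding [pp/*print-right-margin* margin]
    (with-out-str
      (pp/with-pprint-dispatch pp/code-dispatch
        (pp/pprint form)))))

(println (reflow form 40))
```

Because the input is data rather than text, the author’s own line breaks have no influence on the result, which is exactly the trade-off under discussion.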

Tonsky’s guide lets line breaks drive indentation, which seems like a pretty reasonable way to do it.

I think that custom formatting of specific forms leads you down some pretty tricky paths.

I’m not as concerned about the formatter slowly evolving over time. You definitely want to pick an approach that you can stick with and not be reformatting everyone’s code every three months. While the spec is being developed and adopted there will be bugs or new situations that haven’t been considered and you would like to be able to address them. I could see an argument for setting the threshold for making changes quite high though.
