CLJ Commons: Building a formatter like gofmt for Clojure

didibus · December 12, 2018, 6:16am

Maybe we should start listing some examples of what cljfmt does and doesn’t do?

For example, it has specific indent and block rules for certain forms here: cljfmt/cljfmt/resources/cljfmt/indents at master · weavejester/cljfmt · GitHub

But I don’t know what it defaults to otherwise.

I’m not sure I follow people’s objections against having more readable rules for common forms. But say that’s where we wanted to go, it could just be that we need to create a bundle of cljfmt with no specific rules.

It’s probably like a day’s work to wrap cljfmt into a container that freezes its options. So it would basically be easy to create a cljfmt bundle that prevents any configuration and defaults to a canonical one we all agree on. And then make a native-image of that for fast startup time.

Because of how simple and quick this option is, I feel it would be nice to discuss cljfmt more. Can we not make it work for our use case? And if so, why not? Give some examples, concrete reasons about its behavior, etc.

PEZ · December 12, 2018, 8:36am

If the prevention of configuration is not of value in itself, cljfmt can already be instructed to ignore its built-in rules using the ^:replace hint. So this project can provide its own rules.
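For readers who have not used the hint, a configuration along these lines is what is meant. This is a sketch that assumes the Leiningen plugin's `:cljfmt` key in project.clj; the single catch-all rule is only an illustration of replacing the defaults, not a recommended rule set.

```clojure
;; project.clj fragment (assumed Leiningen setup).
;; ^:replace tells cljfmt to discard its built-in indentation rules
;; instead of merging the supplied map into them.
;; The regex key matches every symbol; [[:inner 0]] gives each form
;; a plain two-space body indent.
:cljfmt {:indents ^:replace {#".*" [[:inner 0]]}}
```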

As I am hoping we will go for Tonsky’s rules I tried to make cljfmt adhere to them, to build a formatter for VS Code where people can try those rules out, but ran into what is probably a bug in cljfmt (https://github.com/weavejester/cljfmt/issues/154). I would greatly appreciate help with fixing that issue.

danielcompton · December 12, 2018, 8:05pm

I’m starting this project more from the end of defining a formatting spec that can meet the goals outlined in the initial post. Once we know what format we’d like to pick, we can then examine existing tools to see if any are close and can be modified to match the spec. I don’t want to just pick a tool and say “This is the tool”, because that defines the formatting spec as the implementation of that particular tool. Rather, we should think carefully about what an ideal world would look like, and then work towards it.

There are many different contexts that formatting Clojure code needs to operate in, so I think it would be useful for there to be a spec and set of test cases defined so that people can write their own implementations. As an example, Cursive, CIDER, and Calva all have JVM parts to them, but I’m not sure if the architecture of these editors would suit just dropping in cljfmt.

didibus · December 13, 2018, 7:42pm

Could still be useful to start listing example code snippets, and use cljfmt defaults as the baseline. Then people could argue on a per-example basis what they don’t like about the way it’s indented, aligned and line broken, and what they’d prefer.

By the way, I use indentation to mean how much space is at the beginning of the line. I use alignment to specify how much space to have in between symbols, and line breaks as where line breaks should be put. You probably also can have blank line adjustments, as the number of blank lines between forms. There’s also normalization, such as trimming of whitespace at the end, conversion of tabs to spaces, etc.
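Of these categories, normalization is the easiest to pin down in code. The following is a minimal sketch, not how cljfmt or any other tool does it; the function names and the two-space tab width are assumptions made for the example.

```clojure
(require '[clojure.string :as str])

(defn normalize-line
  "Replaces each tab with two spaces (assumed width), then trims
  whitespace at the end of the line."
  [line]
  (-> line
      (str/replace "\t" "  ")
      (str/trimr)))

(defn normalize
  "Applies normalize-line to every line of source."
  [source]
  (->> (str/split-lines source)
       (map normalize-line)
       (str/join "\n")))

;; Trailing spaces and tabs disappear, the leading tab becomes two spaces.
(normalize "(foo  \n\tbar)\t")
```

Indentation, alignment and line breaking need knowledge of the surrounding forms, which is why they are harder to specify than this line-by-line pass.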

bbatsov · December 14, 2018, 12:21pm

Hey there!

For those of you who don’t know me - I’m the author of CIDER, the editor of the Community Clojure Style, and in the Ruby community I’ve spent years working on a formatter and linter for Ruby code (RuboCop) and a Ruby style guide that goes with it.
I think all of this gives me a somewhat unique perspective, as I’m both really passionate about setting up (community) standards and I’ve also experienced how painful all of this can be in practice.

I’m a couple of weeks late to the party, but here are my thoughts…

On the original proposal

I agree that ideally the tool should have no configuration options, but I doubt that’s feasible in practice. When I started work on RuboCop many years ago, it wasn’t configurable and many people were outraged by this. The Ruby community was 15 years old at the time, many formatting patterns existed and very few people cared about global code consistency - most cared about getting consistency in their projects, and of course - with their own style preferences. Luckily for us Lisp’s semantics are much simpler than Ruby’s. Making RuboCop configurable was instrumental to its wide adoption in the Ruby community - we never got complete alignment in the style department, but we got some alignment and this definitely beats none. I know some people are still trying and maybe they’ll succeed at some point, but I lost my desire to participate actively in this, as I’m tired of endless debate over trivial matters.

I think that for the proposed formatter tool to be useful in editors it should certainly be able to operate on lines (and groups of lines), as reformatting whole files all the time is somewhat annoying for users and not always an option in the first place. That’s probably the biggest reason why most editors have their own indentation engines - they give them the most flexibility with respect to whether you want to reformat everything, just a few lines, etc.

Some people might think that’s not a big concern, but few established projects would accept global changes to indentation and formatting just for the sake of uniformity. Clojure is well-known to be one of them.

I guess for editors a great “API” would be - you send a filename and a range of lines (or characters if you want to be extra granular) and you get their formatted version. If it’s configurable it might be nice if editors can override some of its configuration options (prettier style).
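One possible shape for such an entry point, as a rough sketch: `format-range` and its arguments are hypothetical names, it takes the source text rather than a filename to stay self-contained, and the actual formatting is left to a caller-supplied function from lines to lines.

```clojure
(require '[clojure.string :as str])

(defn format-range
  "Applies format-fn (a function from a seq of lines to a seq of lines)
  to lines start..end (1-based, inclusive) of source and leaves all
  other lines untouched."
  [source start end format-fn]
  (let [lines  (str/split-lines source)
        before (take (dec start) lines)
        target (->> lines (drop (dec start)) (take (inc (- end start))))
        after  (drop end lines)]
    (str/join "\n" (concat before (format-fn target) after))))

;; Example: only line 2 is handed to the stand-in formatter,
;; which here just trims trailing whitespace.
(format-range "a  \nb  \nc  " 2 2 #(map str/trimr %))
```

A real formatter would need more context than the selected lines alone, since the indentation of a line depends on the forms enclosing it.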

Nikita’s Proposal

I’m an old-time Lisper and the proposal definitely made me grind my teeth. I understand just as well as everybody else that this is the simplest way (and the only way without some extra formatting specs) to achieve uniformity, but I think the price we’ll pay for this is semantics.

“Special” indentation rules usually exist to relay “special” semantics. Once you put everything under the same denominator you’re in effect saying that the semantics don’t really matter, which is always debatable. I might expand on this in a separate blog post if I find the inspiration to write one.

I do agree there are some rules in existing standards that probably can be simplified, but overall similar constructs have reasonably similar formatting rules and the only real problem is relaying those semantics to formatters.

The approach of metadata has always been appealing to me, because:

it’s self-documenting
it’s easy to extract this info when doing static analysis
you have the info at your disposal when doing typically REPL-driven development (as CIDER does it)

It’s also something that Common Lisp and Emacs Lisp have proven to work well - but there we have universal consensus about the metadata and tools that understand it.
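To make the metadata approach concrete, here is a small sketch using the `:style/indent` key that CIDER reads. The macro itself is hypothetical and exists only to carry the metadata.

```clojure
(defmacro with-label
  "Hypothetical macro for illustration: prints label, then evaluates body."
  {:style/indent 1} ; one special argument, the rest is indented as a body
  [label & body]
  `(do (println "running" ~label)
       ~@body))

;; A metadata-aware editor indents the body like a block:
(with-label "sum"
  (+ 1 2))

;; The same information is available to tools at the REPL:
(:style/indent (meta #'with-label)) ; => 1
```

The spec lives next to the definition, so it documents itself and can be read either from the source or from the running var.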

Community Uptake

I think that adopting a universal formatter/code style 10 years into the existence of a language is unlikely to (fully) succeed if it’s not driven from the top. There will always be strong opposition to whatever we decide, as people have built strong preferences at this point and they’ll need extremely compelling arguments to change them. Change is hard, and no one really wants to deal with it, especially if they don’t have to.

gofmt succeeded mostly because it was pushed from the top, pushed from the start and everyone was expected to use it. I’m reasonably sure this ship has sailed for Clojure, just as it has sailed for Ruby. I don’t know how successful the similar projects for other languages are. I know only that Prettier is quite successful, but it’s also configurable to some extent and it didn’t really propose anything novel in terms of formatting.

That’s why I think that the only way for a tool to gain much traction would be if it’s aiming to enforce something relatively close to what people are doing currently.

Conclusion

I think a formatting tool that can be used from editors would be quite useful, so I’m looking forward to seeing someone create one, although if we settle on a simplistic indentation scheme obviously it renders much of the need for such a tool redundant, as any editor can trivially implement this (Emacs/clojure-mode has been supporting this “consistent” indentation for years now).

I think we should also set our expectations accordingly about what can be achieved in a mature community - a common formatter would probably have some uptake (as proven by cljfmt), but it won’t be adopted by everyone (also proven by cljfmt).

tonsky · December 15, 2018, 7:25pm

What about “align function arguments by default”? Am I the only one having trouble with this? I mean, it only works for really short functions, like or or and. Not for something like my-namespace/my-method. This leads to code indented way too much. Maybe defaults should be reversed here? No aligning is the default, aligning carries a special meaning?
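A small illustration of the difference, with a made-up function name standing in for a long namespaced one:

```clojure
;; Hypothetical function, only here to have a long name.
(defn summarize-order-line-items [& args] (vec args))

;; Aligned under the first argument: fine for short names such as or.
(def short-name
  (or nil
      false
      :fallback))

;; The same rule with a long name pushes every argument far to the right.
(def aligned
  (summarize-order-line-items :first-argument
                              :second-argument
                              :third-argument))

;; Fixed two-space indentation keeps the arguments near the left margin.
(def fixed
  (summarize-order-line-items :first-argument
    :second-argument
    :third-argument))
```

Both layouts evaluate to the same value; only the amount of horizontal space differs.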

bbatsov · December 15, 2018, 11:12pm

I think this came from the fact that people were looking for a way to differentiate a list literal from a function call (and perhaps to highlight a function’s name), but that might be just a guess. I’ve noticed that people typically write multi-line literal lists with one element per line, but for functions they’d use the alignment you mentioned. If I had to speculate - perhaps this was influenced by how most Algol-like languages do such indentation. This arguably makes the name of the function stand out and the code becomes easier to process by a human reader.

Engelberg · December 18, 2018, 10:06pm

If this existed, I would gladly use it.

Every few years, I try out all the latest Clojure IDEs, and switch to whichever one works best for me at the time. It has been frustrating for me when the IDEs disagree over how to format my existing code base. I would love to have one standard.

pjstadig · December 18, 2018, 11:02pm

I have strong opinions about code formatting, but even stronger is my desire to not think about it. I agree 100% with Stuart Sierra’s “How to ns,” but if there was a fast, standalone, zero config, Clojure code formatter that did the “wrong” thing, I would use it (probably with bitterness in my heart, but I would use it), because I care, but I want to not care.

Like, do I think that vertically aligning let forms is easier to read…probably…I think it could be problematic because it would shift things further to the right, but mostly I don’t do it because it is annoying to have to do that all manually. If there is a tool doing it, I am much happier.
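For reference, the two let styles side by side. The function and binding names are made up for the example.

```clojure
;; Bindings with a single space between name and value.
(defn area-unaligned [w h]
  (let [width w
        height-in-pixels h
        area (* width height-in-pixels)]
    area))

;; Values vertically aligned: everything shifts right to match
;; the longest binding name.
(defn area-aligned [w h]
  (let [width            w
        height-in-pixels h
        area             (* width height-in-pixels)]
    area))
```

Adding a longer binding name to the aligned version means re-spacing every other line, which is the manual work a tool would take over.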

Actually, my ideal world would be a pre-commit hook that condensed things down into some kind of whitespace minimal canonical format, and a post-checkout hook that automatically formatted everything the way I like it, others could have their own personal view. That combined with a tree differ. I don’t even want to diff lisp code as lines of text.

I want to encourage this effort and not discourage it, but I do want to point out that the fact that you have to understand code to format it is a feature not a bug. A file of Clojure code is essentially a REPL script. Each form interacts with all the lovely, living objects that were created in memory by all the previous forms. You can define a macro and then use it in the very next form. You can add metadata to things and use it for formatting. Clojure code is best written with an active connection to a REPL.

I know this makes static analysis hard, and causes heartache for tool maintainers like Colin, but that’s the way I see it.

iyedb · January 4, 2019, 7:38pm

As a golang developer using gofmt I can’t imagine a language not having a canonical source code formatter with no settings at all. It’s simply part of the language: no controversies, no useless debates, no tiresome and pointless style flame wars, no frustrating code reviews because my editor changed the formatting with respect to yours. What you call dictatorship, I see as a practical, pragmatic move to make developers’ lives easier when dealing with large Clojure code bases within a team of more than one developer.

didibus · January 5, 2019, 11:20pm

Does gofmt rewrite whitespace also? Like if I put a newline somewhere, would it get rid of it? So I can’t even control line wrapping and spacing?

iyedb · January 7, 2019, 8:28pm

No, it’s not very rigid with respect to empty lines.

greinseth · February 8, 2019, 9:10am

I’d argue that it is harder to read. You seldom start looking at the values, but instead read the map as an index table, scanning the index (keys) and then looking up the value. If the values are aligned, then you’d have to trace the line from the key to value. And there are no visual aids (like the grid lines in a spreadsheet) to help you with that tracing.

(edit: including the quote I’m replying to)
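As a concrete reference for the style under discussion, here is the same map written with and without aligned values. The keys and values are invented for the example.

```clojure
;; Values follow their keys directly.
(def unaligned
  {:id 42
   :display-name "Ada"
   :email-address "ada@example.com"})

;; Values aligned in a column: the eye has to travel from a short key
;; such as :id across the gap to its value.
(def aligned
  {:id            42
   :display-name  "Ada"
   :email-address "ada@example.com"})
```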

grzm · April 3, 2019, 3:37am

The primary purpose of code formatting is to make code easier to understand for the reader. People (clearly!) vary in what they perceive to be easier to read, and it’s hard to say they’re wrong, that they should prefer some other representation. After all, it’s idiosyncratic.

With source version control, it’s often desirable to limit changes to only what’s meaningful. Some projects try to limit whitespace changes (which are what formatting consists of) either by periodically running an indent/formatting script and committing only those changes, or by enforcing such a run before each commit using commit hooks. Such issues would largely be obviated if we had “logical” source control which preserved images or canonical ASTs or some such, but we’re not there yet, at least for Clojure. Having a canonical representation for a project is often desired and useful.

These are really two separate use cases. One is for the reader, and one is for the repo. In our editors and IDEs, we have lots of options to make reading and editing easier, including formatting choices like typefaces and sizes, and syntax highlighting. We rarely expect everyone to use the same editor, much less the same style. I think code formatting and these other style choices are of a piece.

What I’d like to see is the ability for editors to present the code in the way each reader wants. When the editor persists the code to disk, it’s formatted following the standards for the project. Besides the code formatting tools, at least one additional piece of tooling would be needed, something along the lines of source maps, so code references can be mapped between contexts (say, servers that are reading the code in the canonical format, and each developer’s own format) for interpreting stack traces and the like.

As an aside, having tools which format the code as it’s read into the reader allows us to persist the code into other stores besides files, such as databases.

Richard_Heller · April 6, 2019, 11:17pm

I’d be all for a tool like that, although I’d prefer one with more options than a no options version. For me, the gold standard for code formatters is clang because almost every possible thing it formats can be customized. The main reason I code ClojureScript in VSCode is because it’ll format it without needing a REPL connection, which is a pain to get set up. Whatever VSCode is doing, I’d be ok if that could be done in other editors. I think it uses cljfmt, but I’m not sure how it gets by without the REPL. I’ve tried with vim and emacs and they both want a REPL to do formatting. I have a couple issues with how it formats things, but I’d rather have that than nothing.

Having source control format the code is how we did things back in the 2000s before git took over. The more popular source control tools had server side formatting, so things got formatted when checked in. If somebody didn’t like the format, they would have a client side formatter on their machine to convert it to what they wanted. There’s no good way to do that with git, though.
