I am experimenting with using zprint as the formatting engine for Calva Formatter. I am hoping that I will be able to offer it as an alternative to cljfmt, which I am currently using.
Calva Formatter currently has two “modes” of formatting: one that happens as you type, and an explicit one that is invoked on command (and on save, if that is enabled). The former is more relaxed and lets you keep empty lines around and such, while the latter is stricter and tidies things up more.
I am having trouble getting zprint to relax about removing newlines, and I get lost in the documentation, so I’d like to see some examples if any exist; my Google-fu fails me. General pointers from people using zprint would also be nice.
I can follow up with more explicit requirements if that makes it easier to help me.
There is a :community option in zprint, but it is still very opinionated about where to place newlines. Out of the box, cljfmt is much more willing to let me keep my newlines where I have put them. Right or wrong, formatting-while-typing gets weird if the formatter refuses to let me enter newlines, or adds and removes newlines.
My ambition with Calva Formatter is that it should contribute to a community style. But I also want it to remain an option for teams that have different style guides. So I am planning to make Calva Formatter accept custom cljfmt configuration maps. When I saw that thread you are referring to, I realised that I should probably also support zprint and its configuration maps.
But… I will need to be able to configure any formatter I use to mainly/only do indentation and justification (zprint’s term) while the code is being typed. And I am currently not sure that I can have that with zprint.
@pez, to start off with the issue that started this thread: yes, zprint is almost totally opinionated about where to place newlines. You couldn’t find any way to configure it to just indent code because none exists. There is a very limited capability to respect new-lines in vectors, but that isn’t at all what you are looking for. To briefly recap what I said in the “community formatter” thread, there are indenters and there are formatters, and the difference is how much respect they have for newlines in the code (with indenters having a lot, and formatters having little to none). Obviously there is a spectrum of behavior from one to the other, and zprint was written explicitly to be on the formatter end of the spectrum. I am currently exploring how and whether it is possible for zprint to be enhanced to move some distance toward the indenter end of the spectrum (due in part to a current issue, and in part because this is important to you). I honestly don’t know how this will work out. Prior to your interest I had about decided that respecting blank lines was what I was going to try for, and ignore all other newlines. But that’s not going to meet your needs. So, we’ll have to see about this.
Also, you indicate that Calva has two modes: “as you type”, and “on command or save”. Zprint is a great option for a “command or save” formatter, and would hopefully be easy to integrate for that mode. That is how it is used today in the existing editor integrations and by anyone using it as a unix “filter” with their editor or IDE. If there is something that it needs to do to be a “command or save” formatter, please let me know, as nothing I’ve noticed in these posts suggests that it isn’t ready for that role in Calva. I’ll be glad to help with any issues or questions you have there.
Certainly zprint was never designed to be a formatter that would do “while you type” indentation. That doesn’t mean that it can’t be stretched to do so. I’d like to gather the list of things you need a “while you type” indenter to do here, so we can both assess how close zprint already is and what needs to be done to get there.
1. “More respect for new-lines.” Does this mean that you need a pure indenter, that is, something that respects all existing new-lines? Or is there some intermediate ground? This is by far the most challenging of the five requirements I’ve listed here, just FYI.
2. “Consider cursor position and selection before and after reformatting the text.” This makes sense, but I’m not clear on the meaning of “consider”. zprint has an intermediate output format that contains a lot more data than shows up in the string output. I have some escapes to output this information for testing, but it would be simple to make it a defined output format. This is essentially a vector of 3-tuples, where each tuple has [<string> <color-keyword> <element-keyword>]. It has :left and :right elements in it, and currently accepts “paths” to specific elements. This is because zprint supports a “focus” highlighting mode, where on input you tell it what expression to highlight. The way you tell it what expression to highlight is to give it a path in a vector, where each element is how far you go to the right at this level before you go down. It works for the uses to which it has been put, but it only gets you to an expression, not a character. As it happens, this path doesn’t change regardless of what zprint does to the formatting. This is a bit of a long shot, but if it would help, you could give zprint a path before you called it, and you could accept the intermediate format on output, and zprint could easily tag the element that your path pointed to in that intermediate format. To recover the output string, you just (apply str (map first <intermediate-format-vector>)). Which is what zprint does on output. Just a thought.
3. “Format a given range of code.” zprint will format any “top-level” form now. That is what it does best; doing a whole file is actually a bit more work. It doesn’t know how to do less than a top-level form, since it wouldn’t know where to indent it. I can imagine that if you were to give zprint the current place on the line, and the current indent, then I might be able to enhance it to operate within those boundaries. It is always going to have to be for full expressions, that is a given. The parser it uses does not parse things that are not expressions. Which may be a blocking issue right there, I don’t know.
4. “Format a given range from the cursor.” There are two ways to think about this. This could just be a situation where you would do #3, and you would find an expression “up” from where you are, and tell zprint to format that. Alternatively, zprint will take a full top-level expression/form, and will format it, and then only output a configurable number of lines before and a configurable number of lines after the “focus” point. Which might (or might not) be more what you are looking for here.
5. “Access to the internal AST/zipper format.” zprint is called zprint because it does zippers. I haven’t documented a zipper as an input because nobody seemed to care, but that would be easy to do. zprint doesn’t modify the zipper it creates from parsing the source, it builds its own output (as mentioned above). But if your editor kept a zipper updated with changes, you could keep giving that zipper to zprint and save the parsing time. That would work great! Overall, I think using a zipper as the input would make most of the requirements easier. Not #1 though, unfortunately.
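To make the “path” idea from the second requirement a bit more concrete, here is a rough sketch of how such a path could be followed. It is written in Python only so that it runs standalone; nested lists stand in for the parsed code, and `follow_path` is a hypothetical helper, not part of zprint. Whether zprint counts whitespace or comment nodes when stepping right is not something this sketch tries to model.

```python
def follow_path(form, path):
    """Walk a nested list: each path element says how many steps to the
    right to take at the current level before going down into that element."""
    node = form
    for steps_right in path:
        if not isinstance(node, list):
            raise ValueError("path descends into a non-collection")
        node = node[steps_right]
    return node


# (defn f [x] (inc x)) modelled as nested lists of strings (an assumption
# made for illustration; zprint itself works on a rewrite-clj zipper).
form = ["defn", "f", ["x"], ["inc", "x"]]
```

With this model, `follow_path(form, [3, 0])` lands on `"inc"`. The path refers to structure only, which is why it stays valid however the whitespace around the expression changes.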
These are the 5 requirements that I’ve gleaned from this thread and your comments in the other thread, but I may well have missed some. Please add any additional requirements, and let me know your comments on these. It would be useful if you could prioritize these as “must have” and “nice to have”. Also, I believe these 5 requirements are all for the “as you type” formatting. If there are any requirements for the “command or save” more complete formatting, please let me know. If any of the things I’ve mentioned as possible solutions to the 5 requirements above would be useful for the “command or save” formatting, that would be good to know.
We can keep using this thread to discuss things for a while. When and if we think we have a plan, I’ll want to turn these into specific issues in the zprint repo for tracking.
Oh, wow. Thanks for taking these things up for honest consideration. After having tried to bend zprint to do what I needed, I sort of concluded that it wasn’t really meant for that. I am still planning to offer it as a “Format Document” command, supporting feeding it with a config map so that it can be used by people who have such maps and zprint in their workflows. (Format on save is tricky in VS Code because one needs to be super quick to perform it or VS Code will just skip it.) I am super happy that you are opening up a discussion to see if something could be done so that I can offer zprint for all the use cases in Calva Formatter.
Some of the stuff I mentioned in the CLJ Commons Formatter thread was in this context: “We are making a new formatter and need to consider its place inside the editors as well”. So it is not necessarily stuff that must be handled by a formatter/indenter used by Calva today. I hack my way past some limitations (even though I would rather not have to, of course).
Let me start, like you did, with the Format Document/on-save use case. I think I know how to easily do that without you having to support it in any particular way. One thing that would help me throw out one of my hacks is if zprint could inform me where the cursor should move to in order to “stay” in the same place in the structure after formatting. I have a regexp-based hack for that today, but kicking it out is on my wish list.
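For reference, one structure-agnostic way to keep the cursor “in the same place” is to count the non-whitespace characters before it and find the same count in the formatted text. The sketch below is not the regexp hack Calva uses; it is a hypothetical illustration (in Python, so it runs standalone) and it assumes the formatter only changes whitespace.

```python
def remap_cursor(before, after, cursor):
    """Return the offset in `after` that follows the same number of
    non-whitespace characters as `cursor` does in `before`."""
    target = sum(1 for ch in before[:cursor] if not ch.isspace())
    if target == 0:
        return 0
    seen = 0
    for offset, ch in enumerate(after):
        if not ch.isspace():
            seen += 1
            if seen == target:
                # Place the cursor right after the matching character.
                return offset + 1
    return len(after)
```

The cursor ends up directly after the last non-whitespace character that preceded it, so whitespace between that character and the cursor is not preserved. It also breaks if the formatter adds or drops commas, since Clojure treats those as whitespace and this sketch does not.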
Using your numbering from above:
1. I hear you that “keeping existing newlines” is the most challenging one on my wish list. It is also the most important one, and it goes a bit beyond keeping just newlines, as it is about some of the whitespace in general, as this, fixed, issue on cljfmt shows. Yes, I think that while in this “on type” mode, the formatter should be an “indenter only” thing (and aligner/justifier). There might be intermediate ground, of course, but maybe we should try to find that in a voice call some day? I do think I can hack my way to mitigating this, now that I think about it. But in my experience my hacks tend to make things a bit brittle, and I think formatting needs to be a really solid and reliable thing.
2. I will experiment a bit with the info you provided. I do have a hack to mitigate this somewhat today, as mentioned above.
3. I think that idea you had about Calva keeping a zipper updated with the help of zprint might be a way forward for this “format a range only” case. But it looks a bit like this: In order to keep the formatting as you type snappy enough, Calva only formats the current enclosing (by some type of brackets) form. The issue with newlines keeps me from trying out whether zprint is fast enough for the current top-level form to be used. In any case I can feed the formatter with any form and deal with padding the needed indent on it, I guess. With cljfmt I can do this, however:
It pads the indent from the first line on the following ones, which is sort of necessary with cljfmt, because it indents some things but not others, so I can’t do the padding myself. Doing the same thing with zprint, for reference:
Apropos only formatting expressions: I do have an issue filed against Calva Formatter about being able to format non-expressions, but I am not obsessed about that yet myself, so it is not a blocker.
4. I solve this like #3 today. But what I mean with this wish is that I’d like it if I didn’t have to extract the form to be formatted, but could rely on the formatter to do this for me, given the cursor position within a larger chunk of code. (The whole text, preferably, but maybe it would have to be the current top-level form, given zprint’s way of doing things?) Doing it line-based does not really work, I think.
5. I’ll have to think about this a bit. I am not a very experienced Clojure programmer and haven’t even ventured into zippers yet. But it sounds like a good idea. If I understand it correctly, I would “back” the editor with a zipper corresponding to the current text and feed this zipper (or even parts of it?) to zprint? Would zprint give me instructions on how to edit my zipper?
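To illustrate the “padding the needed indent” idea from the range-formatting point above: if a formatter returns a form formatted as though it started at column 0, the editor side can shift the continuation lines itself. This is a minimal sketch in Python (chosen only so it runs standalone); `pad_to_column` is a hypothetical helper, not something Calva, cljfmt or zprint provides.

```python
def pad_to_column(formatted, column):
    """Indent every line after the first by `column` spaces, so a form that
    was formatted at column 0 can be put back at its original column."""
    first, *rest = formatted.split("\n")
    pad = " " * column
    # Empty lines are left empty here; non-empty lines are shifted right.
    return "\n".join([first] + [pad + line if line else line for line in rest])
```

Note that this naive version also shifts the continuation lines of multi-line strings, which changes their content, so a real implementation would need to know which lines are inside a string.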
I’m running out of time right now. Will have to return to this. Again thanks for lending me your ear this generously!
Let’s work on this one first. You have to be able to tell zprint where the cursor is to start with, and zprint has to be able to tell you where it “ends up”. zprint doesn’t think in terms of lines and characters on input: it parses the code, and deals with the resulting zipper. Currently, the only input of this sort is for using the “focus” feature of zprint, described here. If you were to give zprint that information then zprint could give it back to you in one of two ways:
If you were willing to accept the zprint “rich” format for the output (which is a vector of 3-tuples), then zprint could tag the 3-tuple that was matched up with your path on input.
Alternatively, zprint could (with some work) probably figure out the line and character in the resulting string output and return that to you as well.
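The second option can be sketched directly on top of the first: once one tuple in the “rich” output is known to be the tagged one, its line and column follow from the strings that precede it. This is an illustration in Python (so that it runs standalone), with the tuples modelled as plain Python tuples; the color and element names in the example data are made up, and `position_of` is hypothetical, not a zprint function.

```python
def position_of(tuples, index):
    """Return (line, column), both 0-based, at which the string of
    tuples[index] starts in the concatenated output."""
    preceding = "".join(t[0] for t in tuples[:index])
    line = preceding.count("\n")
    # Column is the distance from the last newline (or from the start).
    column = len(preceding) - (preceding.rfind("\n") + 1)
    return line, column


# Illustrative data only: (string, color, element-kind).
tuples = [
    ("(", "green", "left"),
    ("inc", "blue", "element"),
    ("\n  ", "none", "indent"),
    ("x", "black", "element"),
    (")", "green", "right"),
]
```

Here `position_of(tuples, 3)` gives `(1, 2)`: the `x` starts on the second line, two columns in.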
I think the big issue is – how would you tell zprint where the cursor is on input?
If you were to give zprint a zipper to format, with the current loc being where the cursor was, that would be very straightforward. Note that zprint doesn’t alter the incoming zipper – it walks over it and generates its output from what it encounters. It doesn’t change or edit the zipper to produce the output.
I don’t think I’ve been clear on zippers. zprint will take a string and parse it with rewrite-clj into a zipper, and then it looks at the zipper internally to generate an internal form of the formatted output, which is a vector of 3-tuples (that is, vectors with three things in them). From this internal format it will generate a single string for output with or without ANSI color escape sequences in it. zprint will accept a zipper created by rewrite-clj as input, and skip the parsing that it would otherwise do, and can be trivially extended to offer up the vector of 3-tuples as output if that would be useful. It never changes the zipper at present. That is, it doesn’t modify the zipper internally as part of the formatting.
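As a small illustration of that last step (3-tuples to a single string, with or without color), here is a sketch in Python, used only so it runs standalone. The tuple contents and color names are assumptions for the example; `render` is a hypothetical function and does not reflect how zprint implements its output.

```python
ANSI = {"red": "\x1b[31m", "green": "\x1b[32m", "blue": "\x1b[34m"}
RESET = "\x1b[0m"


def render(tuples, color=False):
    """Concatenate the string of each (string, color, kind) tuple,
    optionally wrapping colored strings in ANSI escape sequences."""
    parts = []
    for text, color_name, _kind in tuples:
        if color and color_name in ANSI:
            parts.append(ANSI[color_name] + text + RESET)
        else:
            parts.append(text)
    return "".join(parts)
```

Without color this is the same operation as the `(apply str (map first ...))` one-liner: only the first element of each tuple contributes to the text.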
To try to answer your question: if you ask zprint to parse and format unformatted or previously formatted text, it will generate the same output, though the zipper that rewrite-clj produces from the input will differ between the two. But zprint doesn’t care about the “whitespace” today, so the parts of the zipper that zprint cares about will be the same regardless of the formatting, as the formatting is only “whitespace”. Now, I’m working to see if I can change that, but that is what is happening today. I’m not sure this answers your question, exactly, so please ask again if I didn’t get it.
Oh, you have probably been clear enough. But I do appreciate this extra explanation. I will have to experiment a bit with both zippers and zprint and those paths and stuff. I’ll look at that three-tuple data as well. It is a bit different to deal with the formatters, like cljfmt and zprint, than with the editing and indenting libs I am also using, like Paredit and Parinfer. Those latter use editor semantics. I think I might need to write some layer between the formatters and the editor that can give me some of those semantics back even though I am dealing with a formatter.
Yes, namespaced keywords and maps are a trial. There has been some work in rewrite-clj for this: https://github.com/xsc/rewrite-clj/issues/54. I don’t actually know if it has solved the problem. I think the comment toward the end of that issue is important: exactly how to model all of these things in the resulting zipper matters. Nobody has complained to me about the particular issues you raised, and I don’t believe that the syntax you mention is common. But I’m sure it won’t be long before someone has this problem with zprint.
If rewrite-clj has actually solved the problem, there is some possibility that the solution would be transportable to rewrite-cljs, as I believe much of the code was just ported over. I saw this coming a while back, but haven’t taken the time to try to figure out a solution myself.
In thinking about the cursor position, the real issue is how would we communicate about it. I don’t know how you think of the data, that is, what sort of data structures you use inside the editor. Would it be convenient for you to give zprint a line number and column number as the current cursor position? I think I could relate that to the zipper that I parse the input string into, and probably keep track of it, and then reconstruct something like that on output — return a line number and column number. Would that be a big help to you?
I have spent time thinking about indenters and formatters. I think there are several “levels” of indenter/formatter:
1. Don’t add or remove any new-lines. Just indent things on the lines they are on. This level of indenter doesn’t have any concept of width.
2. Don’t remove any new-lines, and only add new-lines if the line is going to be too long for the configured width.
3. Don’t remove any new-lines, and add any new-lines that make sense to make the code better formatted, including if the lines get too long.
4. Add or remove new-lines to make the code format “better”, but try to keep the existing new-lines if possible.
5. Add or remove new-lines without regard for where they are in the incoming code. This defines a formatter, not an indenter.
This is far from complete, but it does give some structure to the discussion. There are cases I didn’t include, for sure.
Today, zprint is largely #5. I think you want something which is #1 for the “as-you-type” formatting you are doing. I wonder if #2 would work for you for the “as-you-type” formatting?
I don’t yet know whether I can whack zprint into doing any of this well, since it was designed from scratch to do #5, but #2 seems like a reasonable target. It is hard to imagine #1 being interesting in general, but let me know if that is really what you need.
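To show how little a level #1 indenter needs to know, here is a toy sketch in Python (used only so it runs standalone). It re-indents each line by bracket depth and never adds or removes a new-line. It is deliberately naive: it uses a fixed step per nesting level rather than real Clojure indentation rules, and it ignores strings, comments and character literals. `indent_only` is a hypothetical name, unrelated to zprint or cljfmt.

```python
OPEN, CLOSE = "([{", ")]}"


def indent_only(source, step=2):
    """Re-indent each line by bracket depth without adding or removing
    new-lines. Toy version: ignores strings, comments and alignment."""
    depth = 0
    out = []
    for line in source.split("\n"):
        body = line.strip()
        # A line is indented by the depth reached before it starts.
        out.append(" " * (step * depth) + body if body else "")
        for ch in body:
            if ch in OPEN:
                depth += 1
            elif ch in CLOSE:
                depth = max(depth - 1, 0)
    return "\n".join(out)
```

The number of lines in equals the number of lines out, which is the defining property of level #1. Levels #2 and up additionally need a notion of width, and that is where a formatter’s machinery comes in.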
That would be perfect. As I said before I do have ways to figure this out using knowledge about the text before and after formatting, but it feels brittle and rather something that should be taken care of on the structural level. It would be of great help if zprint took care of this for me.
I am trying to imagine what this implies, but my gut feeling is that it would work with #3 as well. Calva Formatter (or maybe it is VS Code) isn’t ready for full-on formatting-as-you-type, so today it happens mainly when new lines are typed. And I think that it would be convenient to have the formatter tidy up the code by adding newlines that improve the formatting. It is when newlines are removed that it risks becoming a fight between the coder and the formatter. I see my colleagues put a few lines around the code being typed while they think and experiment, and then when they are done they fold up the paren trail and delete extraneous newlines.
A requirement that I might not yet have expressed is that I would like to have empty lines around the cursor indented such that entering something at the end of them would have that something correctly indented. I “fool” cljfmt into helping me with this today, by inserting a symbol in the text where the cursor is before formatting and then removing that symbol when replacing the text in the editor with the formatted version. Speaking of this, I recently had an issue fixed where cljfmt trimmed trailing space on empty lines (despite :remove-trailing-whitespace? being set to false). So that is also important. (I think I posted a link to that issue above with a GIF showing why it is important.)
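The symbol trick described above can be sketched roughly like this. It is an illustration in Python (so that it runs standalone), not Calva’s actual code: the marker name is made up, and `squeeze_spaces` is a stand-in for whatever formatter is really called.

```python
import re

MARKER = "zzcursorzz"  # hypothetical placeholder symbol


def squeeze_spaces(text):
    """Stand-in formatter for the demo: collapses runs of spaces."""
    return re.sub(r" +", " ", text)


def format_with_cursor(text, cursor, format_fn):
    """Insert a marker symbol at the cursor, format, then remove the
    marker again and report the offset where it ended up."""
    marked = text[:cursor] + MARKER + text[cursor:]
    formatted = format_fn(marked)
    new_cursor = formatted.index(MARKER)
    return formatted.replace(MARKER, "", 1), new_cursor
```

For example, `format_with_cursor("(inc   x)", 7, squeeze_spaces)` returns `("(inc x)", 5)`. On an otherwise empty line the marker gives the formatter something to indent, and whatever whitespace precedes it is left behind when the marker is removed. The sketch assumes the marker text does not already occur in the code.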
That would be a much easier goal to strive for, and it would fit well into the “respect new-lines” approach that I have already been working on. I’m still not sure I can pull that off, but it is a lot more likely to happen, for two reasons. One: it fits into the existing code better. Two: it is likely to be interesting and useful for many more use cases than just an “as-you-type” formatter, so I’m more interested in making it happen even if it takes a lot of development time. So thanks for letting me know that, I wouldn’t have guessed from our previous conversation!
That … would never have occurred to me. Thanks for mentioning it. It is too early in the work to know if that will just sort of fall out, or if it will be a big deal to make it happen. But it is good to know about it sooner than later, that’s for sure.
While formatting, I don’t expect to be able to tell when I’m “around the cursor”. I would expect to do this to all of the blank lines – indent them all to the “right” place, regardless of whether they are around the cursor or not. That would also take care of the “don’t trim trailing whitespace off of blank lines”, since I wouldn’t trim it off – I’d adjust it to the correct indent level. Would that work for you?
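As a rough picture of “adjust every blank line to the correct indent”, here is a post-processing sketch in Python (used only so it runs standalone). It uses the indentation of the next non-blank line as a stand-in for the “right” indent, which is a simplifying assumption; inside zprint the indent would presumably come from the formatter’s own state instead. `indent_blank_lines` is a hypothetical name.

```python
def indent_blank_lines(formatted):
    """Give every whitespace-only line the indentation of the next
    non-blank line (a heuristic stand-in for the "right" indent)."""
    lines = formatted.split("\n")
    indent = ""
    # Walk backwards so each blank line can see the line that follows it.
    for i in reversed(range(len(lines))):
        line = lines[i]
        if line.strip():
            indent = line[: len(line) - len(line.lstrip())]
        else:
            lines[i] = indent
    return "\n".join(lines)
```

Because blank lines are rewritten rather than trimmed, any existing trailing whitespace on them is replaced by the computed indent, whether it was too short or too long.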
Yeah, that would work just great. In fact it would make the cursor almost always correctly positioned on “empty” lines when moving the cursor up and down in the text. It will at least always be as easy as telling the editor to move the cursor to the end of the line. In Calva, an explicit formatting command will let zprint do its pure formatting thing and those indented, otherwise empty, lines will go up in smoke (as we say in Sweden).