X> & x>>: auto-transducifying thread macros (now with parallelizing |>> and =>>)

joinr · September 8, 2021, 4:39pm

I’m seeing a 6 times (or more) improvement in speed!

very nice, although I wonder why the implementation is so much more effective on the cljs side (I am a cljs dabbler and have little idea how the seq implementation is optimized or not on there). I know there are issues with things like locals clearing on cljs, so transducers/reducers are implicitly preferred. Maybe there is a connection.

didibus · September 8, 2021, 5:30pm

Could you make this a separate macro? I feel your initial macro was focused on rewriting things as a transducer pipeline, but now it’s slowly becoming some kind of better-thread library; it’d be nice to separate those. I’d prefer the semantics of the macro to remain as close to normal threading as possible personally.

Just feedback, and my opinion, feel free to ignore, you owe me nothing and I’m very happy for your library either way.

John_Newman · September 8, 2021, 5:56pm

@joinr: Because nums and strings are not applicable, they throw exceptions when you try to put them in the operator position in thread macros. That’s not currently a usable semantic. If they were applicable, I wouldn’t be able to load up on their behavior without changing their usable semantics. So I’d argue that this is an additive change to thread behaviors, from a usability perspective. But you’re right that the additions make it more difficult to migrate a thread back to ->> semantics.

@didibus: The points y’all bring up are good. I guess my reasoning is that, if people got used to the path navigation abilities of threads in x>>, then their obvious utility would be moreso common knowledge, and then I’d probably attempt to advocate/lobby to have those behaviors added back into -> and ->> in Clojure core. Because numbers and strings currently throw errors in the current thread macros, there’s no reason they couldn’t be upgraded as well - the new behavior would not break any existing code. I’d imagine it’d be harder to lobby for an addition to Clojure core for an idiom that nobody even knows about yet, but I’ve never tried.

I feel like having another in-> macro that did just the path navigation and form threading, while really cool, would get only a small amount of adoption - perhaps too small to put a real impression on the collective unconscious of the Clojure community, so to speak. And if you want to reach for x>> optimistically, now you’re opting yourself out of the better navigation mechanics, but why? Why can’t you have both? Why shouldn’t all thread macros have these better navigation mechanics? Especially when it’s an additive change?

But I’m also torn about minimizing the surface area of this change… Like guaranteeing backward compatibility from x>> to ->> would be interesting. Anybody else have an opinion on this matter?

didibus · September 8, 2021, 6:25pm

Well, I think you’re bringing something that is a drop-in replacement which gives a performance boost if you need to, and adding a “hot-topic” to it, where there will be strong opinions for and against, both in a community, but even on an individual team. Now not only must the team agree that instead of just rewriting the threading to use the normal transducer syntax, we use this macro to help us, but also that having numbers and strings as keys in our threading is now a good practice and a new standard we should use. It’s just an extra hurdle, and for me personally, I disagree with the latter, but wouldn’t mind the former.

John_Newman · September 8, 2021, 7:24pm

I’d argue the team lead has a few avenues of recourse:

1. Just don’t advertise the availability of the new behaviors or administratively disallow them as a matter of code policy, or
2. Fork the code and provide the less capable version, or
3. Come here and explain why these new thread semantics aren’t better. If someone can actually articulate a good enough argument, I may change my mind.

I agree, my trojan-horsing of the additional semantics will be viewed as a potentially contentious change by some Clojure veterans. But that’ll be true of almost any addition - we love to rag on unfamiliar aesthetics. But I’m challenging them to justify their position.

Truth is, I’ve been mulling a potential get-in-like thread macro for far longer than the auto-transducifying one. Probably years now. And it only gets more annoying over time, knowing the semantics are “missing.” So it’s an argument I’m willing to have. But again, if someone provides a convincing argument against the additional semantics, I’m definitely willing to change my mind!

joinr · September 8, 2021, 7:43pm

As it stands, the “navigation semantics” are meaningless for the threading macros, since they literally are the simplest code transformations possible (outside of cond->> and as-> and other variations). They only fold operations according to position, and do nothing to infer access semantics or otherwise transform the code (with the notable exception that individual symbols are wrapped as lists, e.g. (-> 2 inc (* 3)) is equivalent to (-> 2 (inc) (* 3))). In this sense, the threading macros are very straightforward and yield no surprises.
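To make the "fold by position" point concrete, here is a small sketch using only clojure.core. It shows what the reader-level rewrite looks like for the example above; nothing here is specific to injest.

```clojure
;; Bare symbols are wrapped in a list, then each form receives the
;; previous result in first position (->) or last position (->>).
(macroexpand-1 '(-> 2 inc (* 3)))
;; => (* (inc 2) 3)

(macroexpand-1 '(->> 2 inc (* 3)))
;; => (* 3 (inc 2))

;; Evaluating the thread-first form:
(-> 2 inc (* 3))
;; => 9
```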

Your new semantics constrain the context, e.g. by defining what it means to apply values that are otherwise non-applicative (i.e. not supported by eval). In the context of a macro, literally a local extension of the language, that’s within your purview and that of anyone using the library.

What you really seem to be asking is why eval doesn’t extend keyword-access semantics to primitive values in the function position (numbers, strings, booleans, etc.). That is a deeper discussion, likely one for ask.clojure.org if it hasn’t already been discussed.

John_Newman · September 8, 2021, 8:10pm

I’m not really doubting that wisdom. If strings and nums were invokable, I wouldn’t be able to make this addition without introducing breaking changes. Because they’re not, this is not breaking anything.

Also, it doesn’t introduce too much murkiness regarding which new value types are applicative - the contract is: “just think of it like a get-in, but with optional forms in between path values.”

If someone extends the num and string types in their own project and then intends to use them to invoke things in threads, then they wouldn’t be able to in these threads, but nobody really does that.

John_Newman · September 8, 2021, 8:16pm

Also,

the “navigation semantics” are meaningless for the threading macros, since they literally are the simplest code transformations possible

I’ve seen a lot of code in the wild that navigates maps with (-> m :a :b :c), rather than using get-in. It feels natural, it’s convenient and it unwinds into something faster than get-in. So, sure, the navigation by keywords in threads was a happy accident, due to the invokability of keywords, but nevertheless we have this semantic in the wild, as a primary use case for threading. So I see this as an extension of that existing use case.
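A quick sketch of the idiom being described, using a made-up sample map `m` (hypothetical data, not from the library). It shows that the keyword thread expands into plain nested keyword calls and agrees with get-in:

```clojure
(def m {:a {:b {:c 42}}})   ; hypothetical sample data

;; The thread unwinds into nested keyword invocations:
(macroexpand-1 '(-> m :a :b :c))
;; => (:c (:b (:a m)))

;; Both styles return the same value:
(= (-> m :a :b :c)
   (get-in m [:a :b :c]))
;; => true
```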

didibus · September 8, 2021, 9:29pm

My default position is to avoid bringing in custom macros that don’t provide substantial value, because they can create a kind of tribal knowledge. Every macro extends syntax and semantics, and if you have that in your code, people don’t “know it” simply by knowing Clojure, so they need to spend a bit more time learning about the particular macro, semantics, and getting familiar with it.

I just don’t have maps that have numbers or strings as keys, and the rare times I might have had one, adding (get) is like a one second thing. So it doesn’t meet my threshold for substantial value add.

I’m also a bit confused, like what if you want to thread the int or string as an argument to a function? Do you introduce a runtime type check to see that the threaded element is a map before introducing get?

John_Newman · September 8, 2021, 10:07pm

what if you want to thread the int or string as an argument to a function

That’s currently impossible in existing threads.

(-> x 2) will unwind into (2 x), which will throw an exception.

There is no major existing capability that is being removed from your toolbox of options by introducing this new semantic.

John_Newman · September 8, 2021, 10:18pm

I’m not currently checking the type of the first param to get. The caller knows it’s going to be used in a get. Nums and strings would not have had a purpose in those locations anyway, so we’re not messing anyone’s existing semantics. If you don’t put nums and strings there, it’ll continue acting like the old macro.
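For readers who want to see the shape of such a rewrite, here is a minimal sketch. The names `path-step` and `path->` are hypothetical and this is not injest's actual implementation; it only assumes that literal numbers and strings in the thread are turned into `get` calls at macroexpansion time, with no runtime type check, as described above.

```clojure
(defn path-step
  "Hypothetical helper: turn a literal number or string into a (get k) step.
  Any other form is passed through unchanged."
  [form]
  (if (or (number? form) (string? form))
    (list `get form)
    form))

(defmacro path->
  "Hypothetical thread-first variant; a sketch, not injest's implementation."
  [x & forms]
  `(-> ~x ~@(map path-step forms)))

;; Numbers and strings become get steps; keywords are left alone:
(macroexpand-1 '(path-> m "a" 2 :b))
;; => (clojure.core/-> m (clojure.core/get "a") (clojure.core/get 2) :b)

(path-> {"a" [10 20 {:b 30}]} "a" 2 :b)
;; => 30
```

Threads that contain no literal numbers or strings expand exactly as they would under clojure.core/-> in this sketch.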

joinr · September 9, 2021, 9:50am

Looks like at least one fundamental reason strings and numbers are not accessors in eval, and likely never will be, is performance related. Since the related classes are final (at the JVM level), there is no capacity to extend IFn to them and get efficient invocation as it currently exists. So lifting that to the language level (via eval) would imply adding relevant checks to every function call, whether or not the language semantics are desired (a separate question).

In the case of your lib, it probably makes sense to provide your own ->>, -> implementations (uncertain if other threading macros would be affected) that extend the accessor semantics you desire. Callers could opt in at the library level and have a consistent experience, e.g. using injest/->> and the like, which would be interchangeable with injest/x>> but not necessarily clojure.core/->>. Maybe concurrently submit a patch with proposed changes to the core threading macros to Jira and see if it gets traction with the core devs; or post it on ask.clojure.org as an enhancement (I think that’s the non-Jira means of communicating with core dev folks).

John_Newman · September 9, 2021, 1:12pm

Yeah, I think it’s possible to do in ClojureScript. But it’s discouraged. And I wouldn’t recommend letting your lib leak those callable numbers and strings into user’s application code.

Yeah, I may submit a patch/proposal to ask.clojure.org one day, after these bits settle a little. And true, providing injest/-> and injest/->> would allow easy migration back to fully lazy semantics while preserving the path navigation features.

Before spinning up the gears of Rich and the Clojure Core team on a possible proposal, I’d like to have a thorough debate about the pros and cons, just for my own understanding - I think I’ve considered most possible ergonomics, but I could have missed something.

Objections so far have really boiled down to unfamiliar aesthetics. That’s a fair default objection to have in general, but I’m arguing that this addition brings both syntactic and semantic simplicity by extending existing idioms, so much so that it outweighs the aesthetic unfamiliarity. So if y’all have more objections outside of aesthetics, keep them coming!

joinr · September 9, 2021, 2:12pm

I think at this point, perhaps you are asking in the wrong place, and your sample size will be limited. Core Dev / language design is a useful area to discuss these things. Many decisions w.r.t. language design do boil down to aesthetics, principle of least surprise, and other intangibles. A lot of this used to be discussed on the google group for Clojure; then it migrated to Jira patch notes; I am uncertain where the meatier discussions are today (maybe slack). Alex Miller is at least attentive to ask.clojure, clojureverse, and reddit.

Perhaps the implication of having any value (or simply “more” types of primitive values) be interpreted as an applicable function is that expressions like (1 {1 :hello}) → (get {1 :hello} 1) work, whereas (1 1) → (get 1 1) will just return nil under your implicit interpretation of get (get is somewhat liberal). Should that actually be an error? It’s not novel under existing semantics though (due to get): (:a :a) returns nil too, so there is at least symmetry with existing treatment of keywords and symbols.
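The leniency of get mentioned here is easy to check at a REPL. The following uses only clojure.core and shows the values the hypothetical rewrites would produce:

```clojure
;; What (1 {1 :hello}) would mean under the proposed rewrite:
(get {1 :hello} 1)
;; => :hello

;; What (1 1) would mean: get on a non-collection returns nil, no error.
(get 1 1)
;; => nil

;; Existing keyword behavior is similarly forgiving:
(:a :a)
;; => nil
```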

Does this convenience create problems for reasoning down the road? It is - by virtue of history now - idiomatic that numbers and strings do not have a function representation. If I see numbers or strings applied in the function position (or say something like clj-kondo does), do we introduce a slew of false positive errors when trying to reason about the code? Maybe this is irrelevant if you are the only one reading the code, or readers will be versed in the expanded idiom.

It would be interesting to see what people who have put much more thought into these questions would have to say.

John_Newman · September 9, 2021, 3:00pm

Yeah, I’ll probably do that at some point soon.

Okay, so I want to conduct a survey. We have a few options with regard to handling numbers in threads.

Always producing an nth is great because it allows us to index into both vectors and lists, but then we’re not as ergonomic with maps with numbers as keys (which is rare, granted)

Always producing get works for both vectors and maps, but then we can’t index into sequences flowing down the thread, which would be awesome

The best of all worlds would be calling get for map values but nth for vectors or lists, but that would require introducing a new runtime function that doesn’t come with core.
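The trade-offs between the three options can be sketched with plain clojure.core calls. The function name `get-or-nth` is hypothetical, standing in for the "new runtime function" of the third option; it is not part of core or, as far as this thread states, of injest.

```clojure
;; get works on vectors and maps, but not on lists:
(get [10 20 30] 1)     ;; => 20
(get {1 :one} 1)       ;; => :one
(get '(10 20 30) 1)    ;; => nil

;; nth works on vectors and lists, but not on maps:
(nth [10 20 30] 1)     ;; => 20
(nth '(10 20 30) 1)    ;; => 20
;; (nth {1 :one} 1) throws, since nth is not supported on maps

;; Hypothetical runtime helper for the third option:
(defn get-or-nth
  "Uses get for maps and nth for everything else."
  [coll k]
  (if (map? coll)
    (get coll k)
    (nth coll k)))

(get-or-nth {1 :one} 1)      ;; => :one
(get-or-nth '(10 20 30) 1)   ;; => 20
```

One design consequence worth noting: nth throws on an out-of-range index where get returns nil, so the two branches of such a helper would fail differently.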

Which would you prefer?

Numbers always produce an nth (works on vecs and lists)
Numbers always produce a get (works on maps and vecs)
Numbers should produce get when arg is map, otherwise nth (works on all three, but requires new runtime fn)


seancorfield · September 9, 2021, 7:24pm

Option 4: None of the above.

I’ve been watching this thread for a while without contributing because I think what you’re trying to do is just inherently a bad idea – but it seems common practice for folks who fall in love with macros.

Every macro introduced adds semantic complexity to the language of code that uses it. It’s something that has to be learned by each new person that encounters it and if it isn’t an official core macro, that person has to figure out where it’s coming from and then go read that library’s documentation (and hope it’s good enough).

Because ->> and transducers have different semantics, hiding that difference in a “very similar” x>> macro is kind of the worst of all worlds as far as macro usage goes: the “uncanny valley” where the surface similarity leads people to assume one behavior (because ->> is well-known and well-documented) when the actual behavior is different, and subtly so.
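One concrete difference between the two worlds is when work happens. The sketch below compares a lazy ->> thread with a hand-written transducer pipeline run through into; this is an assumption chosen for illustration, and x>> may use a different transducing context with different laziness. The helper `traced-inc` and the `calls` counter are hypothetical.

```clojure
(def calls (atom 0))

(defn traced-inc
  "Increments x and records that it was called."
  [x]
  (swap! calls inc)
  (inc x))

;; Lazy: building the thread does no work yet.
(def lazy-result (->> [1 2 3] (map traced-inc)))
@calls
;; => 0

;; Eager: into runs the transducer immediately.
(def eager-result (into [] (map traced-inc) [1 2 3]))
@calls
;; => 3

;; The values agree once the lazy seq is realized:
(= lazy-result eager-result)
;; => true
```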

And on top of that, you’re proposing making your x>> / x> semantics even more misleading by silently supporting constructs that can’t be changed back to ->> / -> (because you’re giving semantics in the x world to constructs that are errors in the core world).

Where you started off – with a very simple syntactic transform – wasn’t too bad (although I would never use it in my code and would never let it come in via a PR review either) but you’re way off the deep end at this point, creating a monstrous “kitchen sink” DSL-in-a-macro.

John_Newman · September 9, 2021, 8:05pm

Thanks for the feedback, Sean!

Every macro introduced adds semantic complexity to the language of code that uses it. It’s something that has to be learned by each new person that encounters it and if it isn’t an official core macro, that person has to figure out where it’s coming from and then go read that library’s documentation (and hope it’s good enough).

Isn’t this always true though? For all new semantics?

Because ->> and transducers have different semantics, hiding that difference in a “very similar” x>> macro is kind of the worst of all worlds as far as macro usage goes: the “uncanny valley” where the surface similarity leads people to assume one behavior (because ->> is well-known and well-documented) when the actual behavior is different, and subtly so.

What then would differentiate an uncanny macro from a canny one? It’s not as if someone would be using x>> unintentionally, or by some accident or without knowing what the purpose of x>> is. Its utility isn’t really ambiguous either. What subtle differences would we not know about when deciding to transducify a thread-last thread?

And on top of that, you’re proposing making your x>> / x> semantics even more misleading by silently supporting constructs that can’t be changed back to ->> / -> (because you’re giving semantics in the x world to constructs that are errors in the core world).

For it to be misleading, it would have to be conveying something not true. I think you’re thinking that people will have wrong expectations about how it will behave. I don’t understand why you think that though. The advertised behaviors of the new macros are not exaggerating or making things up. The eagerness semantics of transducers aren’t extremely mysterious. Regarding the new navigational capabilities, there’s not a lot of mystery there either.

Where you started off – with a very simple syntactic transform – wasn’t too bad (although I would never use it in my code and would never let it come in via a PR review either) but you’re way off the deep end at this point, creating a monstrous “kitchen sink” DSL-in-a-macro.

Kitchen-sink!?? It’s a two-line addition, you cantankerous troglodyte!

Again, these are all aesthetic objections, unrelated to technical merits or lack thereof. And I appreciate your aesthetic opinion on it too. But I wouldn’t be making the proposal if I didn’t already disagree with you on all those aesthetic judgements.

seancorfield · September 9, 2021, 8:47pm

People coming fresh to a code base that already uses it – I’m coming at this from a maintenance p.o.v. Functions are far more obvious since they are part of the core semantics.

My purely technical criticism here was about complecting multiple semantic changes, hence “Option 4: none of the above.” by which I mean “if using value X in ->> is an error, using value X in x>> should be a similar error”.

That’s why I haven’t chipped in until this last step where you asked for feedback on how/whether to extend the basic transformation to add semantics that make thread-land → x-land essentially a one-way trip (because x-land → thread-land becomes multiple transformations and they are context-sensitive/value-sensitive).

John_Newman · September 9, 2021, 9:43pm

Uy, y’all keep claiming that introducing both semantics at the same time is a technical factor and not an aesthetic one. I think they are both orthogonal to each other and to existing core behavior, and so do not ergonomically complect at all.

But I’ll make a compromise. The legacy behavior will be requireable like:

  (:require [injest.core :refer [x>>]]
    ...

Where as the new behaviors will be available like:

  (:require [injest.path :refer [x>>]]
    ...

Code shops would have to decide which they are going to be using in their code base. I’m going to recommend injest.path, y’all can recommend injest.core.

Is that a fair compromise?

mjmeintjes · September 9, 2021, 10:31pm

Just my 2 cents - I like the idea of rewriting threading into transducers, but I would personally prefer if it was a refactoring instead of a macro. In other words, if I could run a function in my IDE to change the actual threading code to become a well formatted transducer flow, instead of it happening automatically at compile time.
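As a rough picture of what such a one-time refactoring might produce, here is a before/after sketch done by hand. The sample data `xs` is hypothetical, and `sequence` is just one possible target; a tool could equally emit `into` or `transduce` depending on the desired result type.

```clojure
(def xs (range 10))   ; hypothetical sample input

;; Before: ordinary thread-last over lazy sequence functions.
(->> xs (map inc) (filter odd?) (take 2))
;; => (1 3)

;; After: the same steps composed as a single transducer.
(sequence (comp (map inc) (filter odd?) (take 2)) xs)
;; => (1 3)
```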