Sorting a rich sequence without (sort)

zcaudate · March 15, 2022, 7:02am

Old news or not, there’s a tremendous amount of jargon to it that leads me to think that there was a marketing purpose to it. It has strong proponents of it in many languages such as Js, Julia, Rust, etc… extolling the virtues of it and rubbishing everything else.

To me, it’s kinda cultish. I have no problem with cults or cult members and you shouldn’t be so defensive about it either. But yeah, I avoid using transducers for the reasons I described.

outrovurt · March 15, 2022, 7:33am

(Following is a reply to the OP, and is has pretty much nothing to do with the above entertainment.)

I’m going to try to summarize everything I know about transducers.

a large number of core sequence transformation functions work by taking a sequence and producing a lazy sequence (f.e. map, filter, mapcat, take*, drop*)
if you want to perform more than one of these operations on a single sequence without transducers, you would have to feed the result of the first into the subsequent, and so on. For example:

(->> coll
     (map map-fn)
     (filter filter-fn))

(NOTE: just to be absolutely clear, here I am using the ->> macro for convenience, so each call in the stack is an ordinary call and not a transducer.)

The above works absolutely fine functionally, however it also creates one intermediate sequence (albeit a lazy one) for each operation.
Another possibly unwanted consequence of using these lazy-sequence generating functions is that they only generate lazy-sequences. What if you wanted your output to be a vector, map, string and so on? You then have to coerce the result, usually like so:

(->> coll
     (map map-fn)
     (filter filter-fn)
     ,,,
     (into []))

So transducers solve these two problems: they bypass the generation of intermediate collections, and they allow you to choose any output type you feel appropriate. But what exactly is a transducer, and how do you go about creating one? I’ll answer these in reverse order.

Most sequence functions which return a lazy sequence will also return a transducer if you call that function without the final argument (usually the sequence). You can then use that transducer (itself a function) by passing it to one of a number of functions which accepts transducers. The most important functions which I use are:

transduce
into
eduction
sequence
and of course reduce (I’ll explain this below)

Which function you use depends on the result you are after. Let’s say for example that you want to increment a range of numbers and return the result as a vector. Without transducers, you would write:

(into []
      (map inc (range 10)))

which is just the standard 2-arity call to into which takes the lazy sequence generated by the map function, iterates over each item, and adds it to our empty vector.

But to turn the above into something which accepts a transducer, in this case all we need to do is to spit our sequence out of the call to map to the right:

(into []
      (map inc)
      (range 10))

We now invoke the 3-arity version of into, which accepts a transducer (returned by calling (map inc)) and the range. And what this does is iterates directly over range (rather than the lazy sequence which we used previously), calls inc on each item, and calls conj to add that item to the vector.

I could rewrite the above using transduce as follows:

(transduce (map inc) ;;transducer
           conj ;; reducing function
           [] ;; initial value
           (range 10) ;; sequence
)
;; [1 2 3 4 5 6 7 8 9 10]

It’s a little more verbose by does the same thing, and in fact is what the 3-arity form of into calls internally. The above generates a vector of 10 items. But what if I wanted to generate a string instead? I would use almost the same code, swapping out conj for str, and [] for "", like so:

(transduce (map inc) ;;transducer
           str ;; reducing function
           "" ;; initial value
           (range 10) ;; sequence
)
;; "12345678910"

Et voila!

So a transducer is something which performs some operation on an item of a sequence, and which then takes the result and “adds” it to an running total (which begins with an initial value, in the first case [], in the second case "") using a specified operation (in the first case conj, in the second case str.) I’ve quoted “adds” on purpose, in this case it happens to be correct, however it can actually do anything it wants with the result. This is actually a reducing function, which accepts the running total (often called accumulator) and an item, in this case the result of the operation on the original item.

While (map inc) generates a mapping transducer, there are several other built-in transducers available. For example (filter even?) will generate a filtering transducer, and this will function by calling the supplied predicate on each item of the sequence, and only if that predicate is successful will it then call the reducing function (conj, str) to add the original item to the resultant collection (initially [] or "").

[16/03/2022: EDIT]
Now to answer the question “What is a transducer?” This is a little trickier to wrap ones’ head around. A transducer is a function which accepts one reducing function (say conj, str, essentially a function we will use to combine the result of our operation with a collection) and returns another reducing function, one which calls our original reducing function and which performs our original intended operation, e.g. mapping inc, filtering even? or whatever. Why? This is best illustrated by revisiting our above example:

calling (map inc) creates a transducer which can be used to consume any seq and which will output every item in that seq incremented by 1
how it will output that item can be chosen at the time of transduction
in our example we have chosen to output it to a vector, so we call transduce passing in our transducer function (map inc), conj (our reducing function) and an empty vector (our initial value)

The real magic happens inside transduce itself (which by the way is a ridiculously simple function given what transducers are and do), with the call to (transducer reducing-function), which in our example becomes ((map inc) conj). (map inc) (our transducer) takes conj (our reducing function) and transforms it into a new reducer function, one which is then passed to reduce together with an initial value of [], and it is that call to reduce which actually performs a) the inc on each item of the sequence we pass to it, and b) the conjing of the resultant item onto our accumulator, which is initially [].

So any given transducer contains the essence of a given operation we want to do, such as mapping inc, filtering even? or whatever, and the ability to combine it with a value (often a collection) we supply to it during transduction via a combining operation (e.g. conj) which itself is a reducing function. If you look at the source for conj, str, even arithmetic operators such as +, - and so on, you will note that they all have 0-arity and 1-arity versions as well as the 2-arity versions we expect. That’s because they are all, wait for it, yes, reducing functions.

the 0-arity version generally returns an empty collection, nil-value or a general default value; this is the initial value of our reduction
the 1-arity version which is often the identity function, but this depends on the operation
the 2-arity version, which accepts a collection and an item, performs some operation involving the two, and generally returns the collection, perhaps with the item added to it somewhere

When writing reducing functions ourselves, most of us will just create a 2-arity version because that is all the reduce function requires. However if we want to call reduce with one of our own functions but also wanted to perform some operation on the sequence beforehand, we can’t just pass a transducer directly to reduce.

For example, before I learnt about transducers, I would write this:

(->> [:a :b :c :d :e]
     (map-indexed vector)
     (reduce (fn [result [idx item]]
               (conj result [idx item]))
             []))
;; [[0 :a] [1 :b] [2 :c] [3 :d] [4 :e]]

This works, but creates an intermediate sequence which is the result of (map-indexed vector coll).

Immediately after learning about transducers, I thought I could write something like this:

(transduce (map-indexed vector)
           (fn [result [idx item]]
             (conj result [idx item]))
           []
           [:a :b :c :d :e])
;; [[0 :a] [1 :b] [2 :c] [3 :d] [4 :e] [nil nil]]

After all, isn’t that just a reducing function after the transducer? Well, yes it sort of is, but it isn’t quite “complete”, at least not as far as how transduce works is concerned. While the above certainly runs, you can see the result is not quite correct, as it adds an extra [nil nil] entry at the end of our sequence. To fix this, we wrap the reducing function in completing like so:

(transduce (map-indexed vector)
           (completing
            (fn [result [idx item]]
              (conj result [idx item])))
           []
           [:a :b :c :d :e])
;; [[0 :a] [1 :b] [2 :c] [3 :d] [4 :e]]

Which gives us the correct result.

The whole point of completing is to take a 2-arity reducer function that we would normally pass to reduce, and add 0-, and 1-arity versions to it, thus making it fully compatible with transduce.

[END EDIT]

eduction does something similar, allowing us just to pass our sequence through any number of transducers first then two reduce:

(->> coll
     (eduction (map-indexed vector))
     (reduce (fn [result [idx item]]
               ,,,)
             []))

eduction basically works by creating an Eduction object which implements IReduce, so when the result of the eduction above is passed to the reduce, it basically turns the operation into a transduce and once again avoids the creation of an intermediate lazy sequence. So on each iteration of the reduce function, the original item in the collection is transformed into [idx item], and that is passed to the reduce function. joinr’s function does something which is functionally identical, he just uses completing to turn the reducing function into a transducer because it wouldn’t work otherwise.

outrovurt · March 15, 2022, 7:54am

Of course it is: platform-specific code using mutable data structures is always going to be faster than using immutable structures. There is always this trade-off we have to make, and it’s down to the individual to determine what they need to use.

Most of my code isn’t performance-critical, though some certainly is. So while I use immutable structures in the majority of cases, as and when I need performance, I generally follow these optimization steps in this order:

convert to/from transients (this is faster than using atoms/volatiles)
using native data structures, e.g. #js[] in this case for left/right only, together with native interop
write something in the native platform

Only the first option is platform agnostic. Most of the time the first or second is sufficient, and failing that, the third is always available.

zcaudate · March 15, 2022, 8:35am

Loved your write up of transducers btw.

That’s why I had the pseudo-lisp code. This is not a concept native to js.

I’m glad you mentioned tradeoffs. Readability is a tradeoff. Debugging is also a tradeoff. How must faster does transducer code into an immutable vector need to be in order for you to pick it over the more readable, less piecemeal, and generally more understood option? When would you go for transients? I mean in all seriousness, if you are considering transients due to performance reasons for a library outside of clojure.core the better option still would be to start with mutable data and work backwards to provide a porcelain api compatible with Clojure.

The case made for transducers is that the same concept can be used for immutable datastructures as for streams. I get that. What I’m saying is that iterators provide a better, simpler, faster and less marketed option.

joinr · March 15, 2022, 9:05am

tremendous amount of jargon

Mentioning monads and then complaining of jargon is an unexpected juxtaposition.

that leads me to think that there was a marketing purpose to it.

Sounds like a conspiracy. Either that, or it’s just a “Good Idea” that - by your observation - a seemingly diverse group [that you are not a part of] seemed to partake in. Probably a conspiracy though. Maybe carpenters are “cultists” in your world too because they seemingly all choose to use hammers. Maybe anything with any degree of popularity is indicative of cult membership.

I have no problem with cults or cult members…

This is where you made the heel turn a bit too obvious. I know of no place where being associated with or called a cultist is a form of praise or compliment or even a benign observation. Under your customs and courtesies, I guess I could say “you seem like prime Jonestown material” without recrimination. Or maybe “you would fit right in with the Manson Family.” If not, then you shouldn’t pretend to be unoffensive while presenting oblique remarks and insults and then kind of scamper to the side like that; it’s juvenile, clumsy, and just disingenuous in the face of considered technical arguments. It appears sophistry adjacent.

you shouldn’t be so defensive about it either

I posted code examples and entertained questions and initially benign technical arguments. Then the implication of being a cultist because I disagreed with you. I think you came in with an axe to grind. Conversely: you shouldn’t be so offensive about it; it’s like Rich Hickey spit in your coffee and there’s an unspoken vendetta against some inert tooling. Strange, but eerily familiar.

zcaudate · March 15, 2022, 11:28am

Seriously? I wasn’t expecting this.

I’m really thankful for Rich and Clojure - and looking back to the incident you are implying - especially so after he opened my eyes to the competitive nature of consultancies. I also learnt from Rich that good code shouldn’t be so freely shared, especially if one expects to make money from it. I guess some people just can’t let things go.

I’m slagging off transducers and making fun of it’s marketing, as someone with an opinion should. You may not think transducers are overblown, overhyped and a little bit cultish. Fine. I do. And you haven’t really done anything to convince me otherwise besides the fact that people drinking the transducer Kool Aid will do anything to defend it.

outrovurt · March 15, 2022, 12:20pm

I’m not getting dragged into your quarrel or whatever you want to call it, but I really would like to know, where is all of this coming from exactly? Specifically, what do you base your conclusions such as “overblown, overhyped and a little bit cultish”, and “people drinking the transducer Kool Aid will do anything to defend it” on? Is what you are claiming specific to only transducers, or does it extend to Clojure, to immutable data structures, or even to Functional Programming, or to any subset of programming or computer science which a small but enthusiastic group find really really cool?

Whatever it is you are claiming, can’t you say the same about f.e. OO programming, Bitcoin/Ethereum/blockchain, anything that Apple has ever released, graph databases, and so on?

On your last point that I have quoted, if people “drinking the transducer Kool Aid will do anything to defend it”, then what exactly is wrong with “it”, and more importantly how would you fix it? I really have no idea since I’m very much in the “transducers are a really cool thing” camp and I have never come across any problems as such, either wrapping my head around them or using them on an ongoing basis. Looking through this entire thread, I haven’t actually seen anything you’ve posted which describes in any detail whatsoever what exactly is wrong with transducers, so I’d be really happy if you could share your thoughts on this.

zcaudate · March 15, 2022, 1:19pm

Reducers are great, immutable collections were a game changer (although I don’t use it for performant code), the emacs integration is awesome. The functional programming (with immutable data) is great and Vars/Thread binding is pretty awesome, edn is great.

The subset of clojure that I try to avoid are cljs, clojure.spec and transducers.

You’re right, there’s nothing wrong with transducers. I see it as a very clever hack of the 1-arity function which is doing nothing anyway in a reducing context. But it’s not the way I want to wire my code. Iterators are really overlooked at I wanted to highlight that fact.

The examples you gave are single stage transducer examples. Multistage transducers, built using comp are extremely difficult to debug - especially when one or more stages in the pipeline involve a caching layer (which most streaming apps need). It’s much easier to reason about iterator composition than transducer composition.

Like I said before, iterators compose, they work on collections - both mutable and persistent, they are simple, faster, and more performant, especially in streaming code and are fundamental to js, python, lua etc. It’s something that’s worthwhile to know.

zcaudate · March 15, 2022, 2:05pm

Don’t you talk about ETH like that you heathen.

But yes. You’re exactly right.

outrovurt · March 15, 2022, 2:39pm

Understood, and all fair enough, but can you explain just one thing to me:

What exactly do you mean by this?

zcaudate · March 15, 2022, 3:25pm

Reducers have a initialiser(0-arity) and and step(2-arity). Transducers add a 1-arity function which does not allow the function to curry properly. It is cool and quite mind-bending… but that’s not what I want in a function.

the best way I can describe it is that you’re threading the continuation for the rest of the pipeline through the 1-arity function. (map inc) is not (partial map inc). It’s a completely different beast altogether.

if you compare iterator code for the map equivalent, you can see that it is extremely straight forward and does not rely on the 1-arity threading mechanism I described. In fact, when compared to the clojure transducer implementation, the guava iterator java implementation are shorter and more succinct than Clojure’s.

github.com

google/guava/blob/master/android/guava/src/com/google/common/collect/Iterators.java#L814-L832

      
        
            /**
             * Returns a view containing the result of applying {@code function} to each element of {@code
             * fromIterator}.
             *
             * <p>The returned iterator supports {@code remove()} if {@code fromIterator} does. After a
             * successful {@code remove()} call, {@code fromIterator} no longer contains the corresponding
             * element.
             */
            public static <F extends @Nullable Object, T extends @Nullable Object> Iterator<T> transform(
                Iterator<F> fromIterator, Function<? super F, ? extends T> function) {
              checkNotNull(function);
              return new TransformedIterator<F, T>(fromIterator) {
                @ParametricNullness
                @Override
                T transform(@ParametricNullness F from) {
                  return function.apply(from);
                }
              };
            }

outrovurt · March 16, 2022, 6:56am

Ah bugger, I made a massive mistake in my original explanation concerning transducers themselves. For anyone who has read it please re-read the section marked by EDIT and END EDIT markers as I really don’t want to mislead anyone with incorrect information.

joinr · March 16, 2022, 9:12am

Might be worth mentioning John Newman’s library injest which among other things, provides threading macros that will automatically detects and composes efficient transducers from familiar lazy threading pipelines.

While simplistic, this is identical to the earlier transducer implementations and creates no intermediate collections. It holds for arbitrarily complex threading pipelines, and can mix-and match transducers and non-transducers (falling back minimally to sequences where necessary).
Note the use of partition-all instead of partition, as partition-all provides a transducing function variant.

(require '[injest.path :refer [x>>]])

(def raw '[A D B E C F])

(x>> raw
     (partition-all 2)
     (reduce (fn [[l r] [x y]] [(conj l x) (conj r y)]) [[] []]))

;;[[A B C] [D E F]]

There is even an implementation that coerces these into parallel folds using reducers.

outrovurt · March 16, 2022, 2:01pm

I’m going to have to take your word for it that it’s a bad thing - at least for you and perhaps others who have a similar issue with it. I however don’t have the same problems, but possibly because my use cases for transducers are considerably simpler than yours.

system · September 15, 2022, 2:01am

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.