X> & x>>: auto-transducifying thread macros (now with parallelizing |>> and =>>)

John_Newman · September 9, 2021, 10:47pm

cljr-unwind-all could be tweaked to do that pretty easily

didibus · September 10, 2021, 1:12am

That’s a good idea, even a linter that could tell you: this could be rewritten as a transducer, would be useful.
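For reference, a minimal sketch of the kind of rewrite such a linter might suggest, using only clojure.core. The names `lazy-version`, `xf`, and `transducer-version` are made up for illustration and are not part of injest.

```clojure
;; Lazy sequence version: each step produces an intermediate lazy seq.
(defn lazy-version [coll]
  (->> coll
       (map inc)
       (filter odd?)
       (map #(* % 2))))

;; Transducer version: the same steps composed into a single pass.
;; Under comp, transducers apply left to right, matching the thread order.
(def xf
  (comp (map inc)
        (filter odd?)
        (map #(* % 2))))

(defn transducer-version [coll]
  (sequence xf coll))

(= (lazy-version (range 10))
   (transducer-version (range 10)))
;; => true, both yield (2 6 10 14 18)
```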

As for the discussion, I think maybe in ClojureScript you face that need more often? That your keys are strings or numbers? Due to interop maybe?

For me, I think it’s just confusing because numbers and strings are not valid functions, so how come they work in the threading macro? Now you’d have to learn about the fact that this is a “special” threading macro that doesn’t only thread things, but it also treats numbers and strings specially.

I wouldn’t say it’s an aesthetic thing, it’s more an expectation from the reader thing. It’s just not what I expect, so it would be surprising and confusing at first. And it has the problem that if I get used to it, and I’m suddenly in a context with only the core threading macros, my muscle memory is broken, and I might again be surprised…

If it saved me a lot of verbosity, I might still consider the trade off, but I don’t know, you can use get and get-in and it’s barely any lengthier.
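To make that concrete, here is a small sketch with plain clojure.core. The `data` map is a hypothetical stand-in for JSON that was parsed without keywordizing its keys.

```clojure
;; Keywords implement IFn; strings and numbers do not.
(ifn? :a)   ; => true
(ifn? "a")  ; => false
(ifn? 1)    ; => false

;; Hypothetical data with string keys and a vector to index into.
(def data {"user" {"emails" ["a@example.com" "b@example.com"]}})

;; Core threading with explicit get calls:
(-> data (get "user") (get "emails") (get 1))
;; => "b@example.com"

;; Or a single get-in with a path vector:
(get-in data ["user" "emails" 1])
;; => "b@example.com"
```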

I think it would be different if it was consistent across the language, but like joinr pointed out, it seems it might not be possible to implement IFn for numbers and strings.

Something else for me is more about the entire library. With the macros as x>> and x>, it’s like, okay, this is a transducing threader, I get it, it is conceptually consistent within itself: use transducers like they were sequence functions you could thread together. But now suddenly it’s like… Oh, and also this other unrelated feature… And now it makes me think, okay, so are you going to add more unrelated features as well as the library evolves? And is this library actually better thought of as better-thread, which is more like: threading with a whole lot of added convenience?

So ya, my vote would be, have one macro for every logically consistent feature set, and if you personally want a macro that has all the features, well create a macro with all of them combined as a better-thread where you can say, this is a full featured threading macro with all the things I always wanted thread-first and thread-last to also support.

John_Newman · September 10, 2021, 3:23am

Okay, the new ns scheme is up on the repo, with a new release:

clj -Sdeps \
    '{:deps 
      {net.clojars.john/injest {:mvn/version "0.1.0-alpha.12"}
       criterium/criterium {:mvn/version "0.4.6"}
       net.cgrand/xforms {:mvn/version "0.19.2"}}}'

As described above, you can opt into the new path navigation with:

(ns ...
  (:require [injest.path :as injest :refer [x> x>> +>> =>> <>>]]
   ...

injest.path also provides non-transducifying ‘path threads’ +> and +>> so that you can restore laziness to a thread without having to remove any path navigation semantics you have added.

@didibus Yeah, I’m planning on having a separate one for the parallelized semantic as well.

> As for the discussion, I think maybe in ClojureScript you face that need more often? That your keys are strings or numbers? Due to interop maybe?

I’ve actually seen quite a bit of backend code as well, in the wild, having to deal with cheshirized json that, for whatever reason, couldn’t be keywordized. Super common on integrations. Data wrangling. I’d prefer all the keys be keywords but it’s just not like that out there for most dev shops, for a significant slice of their code. So this’ll come in super handy for threading into data coming from json that couldn’t be keywordized.

seancorfield · September 10, 2021, 3:25am

Thank you for saying more politely, more elaborately, and more convincingly what I was trying to say

seancorfield · September 10, 2021, 3:29am

Just FYI, if you’re using a recent CLI version, you can do this instead:

clj -Sdeps \
    '{:deps 
      {io.github.johnmn3/injest 
       {:git/tag "v0.1-alpha.3" 
        :git/sha "71a03de"}}}'

It’s good to get into the habit of using VGN - Verified Group Names - in coordinates for libraries (instead of groups like johnmn3).

John_Newman · September 10, 2021, 3:37am

@seancorfield Nice, thanks! I’ll update the repo and above references.

seancorfield · September 10, 2021, 3:42am

You know me: on a mission to get everyone using the latest version of the official Clojure tools

John_Newman · September 10, 2021, 3:49am

Yeah, prolly should’a spun the lib up with that new new goodness you put out recently. I’m still catching up with the latest tools.

John_Newman · September 11, 2021, 6:43pm

Added lambda wrapping, per Should the threading macros handle lambdas? - Clojure Q&A (updated release coordinates above)

Wrapping lambdas makes threads more clear and concise and has the added benefit of conveying to the reader that the author intends for the anonymous function to only take one parameter. In the classical thread syntax, the reader would have to scan all the way to the end of (#(... in order to know if an extra parameter is being passed in - so the intention of the author is more explicit. It also prevents people from creating unmaintainable abstractions involving the threading of values into a literal lambda definition, which I would rather not have to maintain.
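For readers who have not run into this, a minimal sketch of why the core macros need the extra parentheses. It uses only clojure.core; the var name `expansion` is just for illustration.

```clojure
;; -> inserts the threaded value as the second item of each list form.
;; (fn [x] ...) is itself a list, so the value lands inside the fn form:
(def expansion (macroexpand-1 '(-> 5 (fn [x] (* x 2)))))
expansion
;; => (fn 5 [x] (* x 2)), which is not a valid fn definition

;; The classical workaround wraps the lambda in an extra pair of parens,
;; so the threaded value becomes the argument of a call instead:
(-> 5 ((fn [x] (* x 2))))  ; => 10
(->> 5 (#(* % 2)))         ; => 10
```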

With regard to proposals to clojure.core, I don’t think there’s any reason to rush. We could let folks kick the tires for a few months or years, just using the lib. Whether these semantics contribute to more or less code maintainability should start to become more obvious over time.

Personally, I’m a big fan of Clojure’s simplicity. If Rich and crew were not so disciplined about keeping the basic abstractions simple and non-complected, I would not have been able to build the x>> macros. Heck, they couldn’t have made transducers so ergonomic if the 1-arity collection functions were already squatted on. It’s the foresight of not complecting abstractions that prevents it from becoming another Javascript and has allowed for new, unforeseen capabilities. Sometimes adding less worse things now lets you add more better things later. So I’m very sympathetic to knee-jerk aversions to new semantics.

However, with regard to these new semantics, if you analyze their impact, you can see that we’re not barring any potential directions of semantic growth and abstraction that we would want to entertain, nor are we introducing any new abstractions. We’re simply reclaiming unusable tokens for usage in the already existing thread abstractions.

Oh, I also got rid of the :exclude [-> ->>] requirement and introduced +> and +>>, which have these path thread semantics without transducifying their forms (so they keep the lazier behavior). When x> or x>> are required from the injest.path namespace, they have the path thread +>/+>> semantics.

John_Newman · September 16, 2021, 2:16pm

Parallel `=>` and `=>>`

Got a new update out last night. Try it out with criterium and net.cgrand/xforms:

clj -Sdeps \
    '{:deps 
      {net.clojars.john/injest {:mvn/version "0.1.0-alpha.12"}
       criterium/criterium {:mvn/version "0.4.6"}
       net.cgrand/xforms {:mvn/version "0.19.2"}}}'

This release comes with parallel versions of x> and x>> which use the equals sign’s two horizontal bars to denote parallelism: => and =>>

The improvements are interesting: Instead of using sequence on the thread, => and =>> leverage core.async's parallel pipeline in order to execute singular or consecutive stateless transducers over a pool of threads equal to (+ 2 your-number-of-cores). Remaining contiguous stateful transducers are dealt with in the same manner as in x> and x>>. It doesn’t work well for small data payloads though, so for demonstration purposes let’s augment our previous example threads:

(require '[clojure.edn :as edn])

(defn work-1000 [work-fn]
  (range (last (repeatedly 1000 work-fn))))

(defn ->>work [input]
  (work-1000
   (fn []
     (->> input
          (map inc)
          (filter odd?)
          (mapcat #(do [% (dec %)]))
          (partition-by #(= 0 (mod % 5)))
          (map (partial apply +))
          (map (partial + 10))
          (map #(do {:temp-value %}))
          (map :temp-value)
          (filter even?)
          (apply +)
          str
          (take 3)
          (apply str)
          edn/read-string))))  

(defn x>>work [input]
  (work-1000
   (fn []
     (x>> input
          (map inc)
          (filter odd?)
          (mapcat #(do [% (dec %)]))
          (partition-by #(= 0 (mod % 5)))
          (map (partial apply +))
          (map (partial + 10))
          (map #(do {:temp-value %}))
          (map :temp-value)
          (filter even?)
          (apply +)
          str
          (take 3)
          (apply str)
          edn/read-string))))

Same deal as before but we’re just doing a little extra work in our thread, repeating it a thousand times and then preparing the results for handoff to the next stage of execution.

Now let’s run the classical ->> macro:

(->> (range 100)
     (repeat 10)
     (map ->>work)
     (map ->>work)
     (map ->>work)
     (map ->>work)
     (map ->>work)
     (map ->>work)
     last
     count
     time)
; "Elapsed time: 18309.397391 msecs"
;=> 234

Just over 18 seconds. Now let’s try the x>> version:

(x>> (range 100)
     (repeat 10)
     (map x>>work)
     (map x>>work)
     (map x>>work)
     (map x>>work)
     (map x>>work)
     (map x>>work)
     last
     count
     time)
; "Elapsed time: 6252.224178 msecs"
;=> 234

Just over 6 seconds. Much better. Now let’s try the parallel =>> version:

(=>> (range 100)
     (repeat 10)
     (map x>>work)
     (map x>>work)
     (map x>>work)
     (map x>>work)
     (map x>>work)
     (map x>>work)
     last
     count
     time)
; "Elapsed time: 2862.172838 msecs"
;=> 234

Under 3 seconds. Much, much better!

All those times come from GitHub’s browser-based vscode. When running in a local vscode instance (or in a bare repl), the times above look more like: 11812.604504, 5096.267348 and 933.940569 msecs. Compared to ->>, that is a performance increase of a bit over 2 fold for the x>> version and more than 10 fold for the =>> version.

In the future I’d like to explore using parallel folder instead of core.async but this works pretty well.
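As an aside on the stateless/stateful split mentioned earlier, here is a minimal sketch of why only stateless transducers are safe to spread across workers. This is not injest's implementation and it runs on a single thread: `chunked-into` is a hypothetical helper that just applies a transducer to each chunk independently, the way separate workers would each see only their own slice.

```clojure
;; Pool size as described in the post: number of cores plus two.
(def pool-size (+ 2 (.availableProcessors (Runtime/getRuntime))))

(defn chunked-into
  "Applies xf to each chunk of coll separately, then concatenates the
  per-chunk results. Hypothetical helper, for illustration only."
  [xf chunk-size coll]
  (into [] (mapcat #(into [] xf %)) (partition-all chunk-size coll)))

(def stateless-xf (comp (map inc) (filter odd?)))
(def stateful-xf (partition-all 3))

(def input (range 10))

;; Stateless: splitting the input does not change the result.
(= (into [] stateless-xf input)
   (chunked-into stateless-xf 4 input))  ; => true

;; Stateful: chunk boundaries leak into the result, so this kind of
;; step cannot simply be handed to independent workers.
(= (into [] stateful-xf input)
   (chunked-into stateful-xf 4 input))   ; => false
```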

After a few days or weeks - after folks have had a bit to kick the tires - I’ll release a beta version on Clojars and put out a more formal release announcement in a separate set of posts. In the meantime, please give it a whirl and let me know if you find any issues. BTW, there was a bug in the last release that made it impossible to define a thread within a function with bindings - that’s been fixed but sorry if anyone got bit by that; it would have been pretty confusing. Anyway, enjoy!

John_Newman · September 24, 2021, 12:14am

So I’ve got another alpha out, this time with parallel r/fold's Fork/Join under the hood. It’s pretty fantastic. More robust than the pipeline version and much less of a foot-gun when working with smaller workloads.

Bottom line, when trying to parallelize work, if the work is too small, parallelization can actually make the whole job take longer. This is especially true of pipeline: when it is used on large sequences with small workloads, the problem compounds and it becomes unusable. r/fold is a little more forgiving in this regard, dividing sequences into more manageable partitions. I’m exploring doing automatic partitioning of sequences being passed into the pipeline, but I haven’t come up with anything satisfying yet.
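To illustrate the partitioning point, a minimal sketch that calls clojure.core.reducers directly. This is not injest's internals; the var names and the partition size of 8192 are arbitrary choices for the example.

```clojure
(require '[clojure.core.reducers :as r])

;; r/fold only parallelizes foldable collections such as vectors.
(def v (vec (range 100000)))

;; Sequential baseline.
(def sequential (reduce + (filter odd? (map inc v))))

;; Parallel fold with the default partition size of 512.
(def folded (r/fold + (r/filter odd? (r/map inc v))))

;; Explicit partition size: larger groups mean fewer, bigger tasks,
;; which helps when each element needs very little work.
(def folded-coarse (r/fold 8192 + + (r/filter odd? (r/map inc v))))

(= sequential folded folded-coarse)  ; => true
```

Since + is associative and has an identity of 0, it can serve as both the combining and the reducing function here.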

This pretty much sums up the features I wanted on the roadmap, so I’m very close to releasing a beta. My only issue left is naming…

Initially, I named the pipeline-thread-last operator =>>

Then I named the fold-thread-last operator =>> and renamed the pipeline-thread-last to |>>, since I wanted fold-thread-last to be the more used operator and I thought =>> denotes parallelism better and |>> is a little ugly.

Then I figured fold-thread-last might be better represented as <>>, where < denotes a fold. So I renamed the pipeline one back to =>>, since I thought |>> was kinda ugly.

It’s nice though that |>> starts with a “pipe” character, which might be better from a mnemonic perspective. OTOH, = looks like a pipe or a parallel set of pipes.

So what do y’all think? Have a preference over names? Answer below or just respond to this poll:

<>> for fold and =>> for pipeline
=>> for fold and |>> for pipeline
=>> for fold and o>> or *>> or anything (answer below)


Anyway, the alphas are now available on clojars as well:

clj -Sdeps \
    '{:deps 
      {net.clojars.john/injest {:mvn/version "0.1.0-alpha.12"}
       criterium/criterium {:mvn/version "0.4.6"}
       net.cgrand/xforms {:mvn/version "0.19.2"}}}'

Once we settle on good names I’ll probably move it into beta and make a more formal announcement on the proper channels.

system · March 25, 2022, 12:15pm

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.
