cljr-unwind-all could be tweaked to do that pretty easily
That's a good idea. Even a linter that could tell you "this could be rewritten as a transducer" would be useful.
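To make the "could be rewritten as a transducer" idea concrete, here is a minimal sketch using only clojure.core. The names xf, lazy-version and transducer-version are made up for illustration; this is the kind of rewrite such a linter might suggest, not output from any existing tool.

```clojure
;; A lazy thread-last pipeline: every step builds an intermediate lazy seq.
(defn lazy-version [coll]
  (->> coll
       (map inc)
       (filter odd?)
       (map #(* % %))))

;; The same steps as one composed transducer. With transducers, comp
;; applies the steps left to right, matching the order in the thread.
(def xf
  (comp (map inc)
        (filter odd?)
        (map #(* % %))))

(defn transducer-version [coll]
  (sequence xf coll))

(transducer-version (range 6)) ;=> (1 9 25)
```

Both versions return the same elements; the transducer version just avoids the intermediate sequences between steps.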
As for the discussion, I think maybe in ClojureScript you face that need more often? That your keys are strings or numbers? Due to interop maybe?
For me, I think it's just confusing because numbers and strings are not valid functions, so how come they work in the threading macro? Now you'd have to learn that this is a "special" threading macro that doesn't only thread things but also treats numbers and strings specially.
I wouldn't say it's an aesthetic thing; it's more about what the reader expects. It's just not what I expect, so it would be surprising and confusing at first. And it has the problem that if I get used to it and I'm suddenly in a context with only the core threading macros, my muscle memory is broken, and I might again be surprised...
If it saved me a lot of verbosity, I might still consider the trade-off, but I don't know; you can use get and get-in and it's barely any lengthier.
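For reference, this is roughly what the get and get-in alternative looks like with the core threading macros. The payload map is hypothetical sample data, shaped like JSON that was parsed without keywordizing its keys.

```clojure
;; Hypothetical sample data with string keys and a vector index.
(def payload {"users" [{"name" "Ada" "langs" ["clj" "cljs"]}]})

;; One get per step in a plain thread-first.
(def first-lang-step-by-step
  (-> payload
      (get "users")
      (get 0)
      (get "langs")
      first))

;; The same navigation collapsed into a single get-in.
(def first-lang-with-get-in
  (-> payload
      (get-in ["users" 0 "langs"])
      first))

first-lang-with-get-in ;=> "clj"
```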
I think it would be different if it was consistent across the language, but like joinr pointed out, it seems it might not be possible to implement IFn for numbers and strings.
Something else for me is more about the entire library. With macros like x>> and x>, it's like, okay, this is a transducing threader, I get it. It is conceptually consistent within itself: use transducers like they were sequence functions you could thread together. But now suddenly it's like... oh, and also this other unrelated feature... And now it makes me think, okay, so are you going to add more unrelated features as the library evolves? And is this library actually better thought of as better-thread, which is more like: threading with a whole lot of added convenience?
So ya, my vote would be: have one macro for every logically consistent feature set, and if you personally want a macro that has all the features, create a macro with all of them combined as a better-thread, where you can say, this is a full-featured threading macro with all the things I always wanted thread-first and thread-last to also support.
Okay, the new ns scheme is up on the repo, with a new release:
clj -Sdeps \
'{:deps
{net.clojars.john/injest {:mvn/version "0.1.0-alpha.12"}
criterium/criterium {:mvn/version "0.4.6"}
net.cgrand/xforms {:mvn/version "0.19.2"}}}'
As described above, you can opt into the new path navigation with:
(ns ...
(:require [injest.path :as injest :refer [x> x>> +>> =>> <>>]]
...
injest.path also provides non-transducifying "path threads" +> and +>> so that you can restore laziness to a thread without having to remove any path navigation semantics that may have been added.
@didibus Yeah, I'm planning on having a separate one for the parallelized semantic as well.
As for the discussion, I think maybe in ClojureScript you face that need more often? That your keys are strings or numbers? Due to interop maybe?
I've actually seen quite a bit of backend code as well, in the wild, having to deal with cheshirized JSON that, for whatever reason, couldn't be keywordized. Super common on integrations. Data wrangling. I'd prefer all the keys be keywords, but it's just not like that out there for most dev shops, for a significant slice of their code. So this'll come in super handy for threading into data coming from JSON that couldn't be keywordized.
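To illustrate the general idea of path navigation over string-keyed data, here is a toy sketch of a threading macro that treats bare strings and numbers as lookups. The names path-step and path-> are hypothetical, and this is not injest's implementation; the real macros may resolve numbers and other literals differently.

```clojure
;; Hypothetical helper: turn a bare string or number into a (get k) step
;; and leave every other form untouched.
(defn- path-step [form]
  (if (or (string? form) (number? form))
    (list `get form)
    form))

;; Hypothetical macro: a thread-first that rewrites path literals first.
(defmacro path-> [x & forms]
  `(-> ~x ~@(map path-step forms)))

;; Hypothetical response, shaped like JSON parsed without keywordizing.
(def response {"data" {"items" [{"id" 7 "tags" ["a" "b"]}]}})

(path-> response "data" "items" 0 "id" inc) ;=> 8
```

Because the rewrite happens at macro-expansion time, the expanded code is just an ordinary -> with get calls in it.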
Thank you for saying more politely, more elaborately, and more convincingly what I was trying to say
Just FYI, if you're using a recent CLI version, you can do this instead:
clj -Sdeps \
'{:deps
{io.github.johnmn3/injest
{:git/tag "v0.1-alpha.3"
:git/sha "71a03de"}}}'
It's good to get into the habit of using VGN - Verified Group Names - in coordinates for libraries (instead of groups like johnmn3).
@seancorfield Nice, thanks! I'll update the repo and above references.
You know me: on a mission to get everyone using the latest version of the official Clojure tools
Yeah, prolly should'a spun the lib up with that new new goodness you put out recently. I'm still catching up with the latest tools.
Added lambda wrapping, per Should the threading macros handle lambdas? - Clojure Q&A (updated release coordinates above)
Wrapping lambdas makes threads clearer and more concise, and has the added benefit of conveying to the reader that the author intends for the anonymous function to take only one parameter. In the classical thread syntax, the reader would have to scan all the way to the end of (#(... in order to know if an extra parameter is being passed in, so with wrapping the intention of the author is more explicit. It also prevents people from creating unmaintainable abstractions involving the threading of values into a literal lambda definition, which I would rather not have to maintain.
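As a rough illustration of what lambda wrapping means, here is a toy thread-first macro that wraps anonymous functions in an extra pair of parens so they get called with the threaded value. The names lambda-form? and wrap-> are hypothetical; this is a sketch of the idea, not the code injest uses.

```clojure
;; Hypothetical helper: #(...) reads as (fn* [...] ...), and a literal
;; (fn [...] ...) starts with the symbol fn.
(defn- lambda-form? [form]
  (and (seq? form)
       (contains? #{'fn 'fn*} (first form))))

;; Hypothetical macro: wrap each lambda so that -> calls it
;; instead of threading the value into its definition.
(defmacro wrap-> [x & forms]
  `(-> ~x
       ~@(map (fn [form]
                (if (lambda-form? form) (list form) form))
              forms)))

;; Classical syntax needs the extra parens around the lambda:
(-> 5 inc (#(* % 2)) (- 1))   ;=> 11

;; With wrapping, the lambda can be written bare:
(wrap-> 5 inc #(* % 2) (- 1)) ;=> 11
```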
With regard to proposals to clojure.core, I don't think there's any reason to rush. We could let folks kick the tires for a few months or years, just using the lib. Whether these semantics contribute to more or less code maintainability should start to become more obvious over time.
Personally, I'm a big fan of Clojure's simplicity. If Rich and crew were not so disciplined about keeping the basic abstractions simple and non-complected, I would not have been able to build the x>> macros. Heck, they couldn't have made transducers so ergonomic if the 1-arity collection functions were already squatted on. It's the foresight of not complecting abstractions that prevents it from becoming another JavaScript and has allowed for new, unforeseen capabilities. Sometimes adding less worse things now lets you add more better things later. So I'm very sympathetic to knee-jerk aversions to new semantics.
However, with regard to these new semantics, if you analyze their impact, you can see that we're not barring any potential directions of semantic growth and abstraction that we would want to entertain, nor are we introducing any new abstractions. We're simply reclaiming unusable tokens for usage in the already existing thread abstractions.
Oh, I also got rid of the :exclude [-> ->>] requirement and introduced +> and +>>, which have these path thread semantics without transducifying their forms (so they keep the lazier behavior). When x> or x>> are required from the injest.path namespace, they have the path thread +>/+>> semantics.
Parallel => and =>>
Got a new update out last night. Try it out with criterium and net.cgrand/xforms:
clj -Sdeps \
'{:deps
{net.clojars.john/injest {:mvn/version "0.1.0-alpha.12"}
criterium/criterium {:mvn/version "0.4.6"}
net.cgrand/xforms {:mvn/version "0.19.2"}}}'
This release comes with parallel versions of x> and x>> which use the equals sign's two horizontal bars to denote parallelism: => and =>>.
The improvements are interesting: instead of using sequence on the thread, => and =>> leverage core.async's parallel pipeline in order to execute singular or consecutive stateless transducers over a pool of threads equal to (+ 2 your-number-of-cores). Remaining contiguous stateful transducers are dealt with in the same manner as in x> and x>>. It doesn't work well for small data payloads though, so for demonstration purposes let's augment our previous example threads:
(require '[clojure.edn :as edn])

(defn work-1000 [work-fn]
  (range (last (repeatedly 1000 work-fn))))

(defn ->>work [input]
  (work-1000
   (fn []
     (->> input
          (map inc)
          (filter odd?)
          (mapcat #(do [% (dec %)]))
          (partition-by #(= 0 (mod % 5)))
          (map (partial apply +))
          (map (partial + 10))
          (map #(do {:temp-value %}))
          (map :temp-value)
          (filter even?)
          (apply +)
          str
          (take 3)
          (apply str)
          edn/read-string))))

(defn x>>work [input]
  (work-1000
   (fn []
     (x>> input
          (map inc)
          (filter odd?)
          (mapcat #(do [% (dec %)]))
          (partition-by #(= 0 (mod % 5)))
          (map (partial apply +))
          (map (partial + 10))
          (map #(do {:temp-value %}))
          (map :temp-value)
          (filter even?)
          (apply +)
          str
          (take 3)
          (apply str)
          edn/read-string))))
Same deal as before, but we're just doing a little extra work in our thread, repeating it a thousand times and then preparing the results for handoff to the next stage of execution.
Now let's run the classical ->> macro:
(->> (range 100)
     (repeat 10)
     (map ->>work)
     (map ->>work)
     (map ->>work)
     (map ->>work)
     (map ->>work)
     (map ->>work)
     last
     count
     time)
; "Elapsed time: 18309.397391 msecs"
;=> 234
Just over 18 seconds. Now let's try the x>> version:
(x>> (range 100)
     (repeat 10)
     (map x>>work)
     (map x>>work)
     (map x>>work)
     (map x>>work)
     (map x>>work)
     (map x>>work)
     last
     count
     time)
; "Elapsed time: 6252.224178 msecs"
;=> 234
Just over 6 seconds. Much better. Now let's try the parallel =>> version:
(=>> (range 100)
     (repeat 10)
     (map x>>work)
     (map x>>work)
     (map x>>work)
     (map x>>work)
     (map x>>work)
     (map x>>work)
     last
     count
     time)
; "Elapsed time: 2862.172838 msecs"
;=> 234
Under 3 seconds. Much, much better!
All those times come from GitHub's browser-based VS Code. When running in a local VS Code instance (or in a bare REPL), the above times look more like 11812.604504, 5096.267348 and 933.940569 msecs - a performance increase of more than 2 fold for the x>> version and more than 10 fold for the =>> version, when compared to ->>.
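For anyone who wants to check the ratios, here is the arithmetic on the timings quoted in this post. The speedup helper and the timings map are just illustrative names; the numbers are the ones reported above, rounded to one decimal place.

```clojure
;; Ratio of baseline time to candidate time, rounded to one decimal.
(defn speedup [baseline-ms candidate-ms]
  (/ (Math/round (* 10.0 (/ baseline-ms candidate-ms))) 10.0))

;; Timings copied from the runs reported in this post.
(def timings
  {:browser {:->> 18309.397391 :x>> 6252.224178 :=>> 2862.172838}
   :local   {:->> 11812.604504 :x>> 5096.267348 :=>> 933.940569}})

(defn speedups [env]
  (let [{base :->> x :x>> p :=>>} (get timings env)]
    {:x>> (speedup base x)
     :=>> (speedup base p)}))

(speedups :browser) ;=> {:x>> 2.9, :=>> 6.4}
(speedups :local)   ;=> {:x>> 2.3, :=>> 12.6}
```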
In the future I'd like to explore using parallel folder instead of core.async, but this works pretty well.
After a few days or weeks - after folks have had a bit to kick the tires - I'll release a beta version on Clojars and put out a more formal release announcement in a separate set of posts. In the meantime, please give it a whirl and let me know if you find any issues. BTW, there was a bug in the last release that made it impossible to define a thread within a function with bindings - that's been fixed, but sorry if anyone got bit by that; it would have been pretty confusing. Anyway, enjoy!
So I've got another alpha out, this time with parallel r/fold's Fork/Join under the hood. It's pretty fantastic. More robust than the pipeline version and much less of a foot-gun when working with smaller workloads.
Bottom line: when trying to parallelize work, if the work is too small, parallelization can actually make the whole job take longer. This is especially true of pipeline, and when it is used on large sequences with small workloads, the problem compounds and it becomes unusable. r/fold is a little more forgiving in this regard, dividing sequences into more manageable partitions. I'm exploring doing automatic partitioning of sequences being passed into the pipeline, but I haven't come up with anything satisfying yet.
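For readers who have not used it, here is a small standalone example of r/fold partitioning a vector, using only clojure.core.reducers. The function names and the data are made up for illustration, and this says nothing about how injest wires fold into its macros.

```clojure
(require '[clojure.core.reducers :as r])

;; fold only runs in parallel on foldable collections such as vectors.
(def data (vec (range 100000)))

;; Sequential baseline using a composed transducer.
(defn sequential-sum [coll]
  (transduce (comp (map inc) (filter odd?)) + coll))

;; Fork/Join version: the vector is split into partitions of roughly
;; partition-size elements, each partition is reduced with +, and the
;; partial sums are combined with +.
(defn folded-sum [partition-size coll]
  (r/fold partition-size + + (r/filter odd? (r/map inc coll))))

(= (sequential-sum data) (folded-sum 512 data)) ;=> true
```

The partition size is the knob that matters for small workloads: 512 is the default r/fold uses when no size is given, and larger partitions mean fewer, chunkier tasks.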
This pretty much sums up the features I wanted on the roadmap, so I'm very close to releasing a beta. My only issue left is naming...
Initially, I named the pipeline-thread-last operator =>>.
Then I named the fold-thread-last operator =>> and renamed the pipeline-thread-last to |>>, since I wanted fold-thread-last to be the more used operator and I thought =>> denotes parallelism better and |>> is a little ugly.
Then I figured fold-thread-last might be better represented as <>>, where < denotes a fold. So I renamed the pipeline one back to =>>, since I thought |>> was kinda ugly.
It's nice though that |>> starts with a "pipe" character, which might be better from a mnemonic perspective. OTOH, = looks like a pipe or a parallel set of pipes.
So what do y'all think? Have a preference over names? Answer below or just respond to this poll:
- <>> for fold and =>> for pipeline
- =>> for fold and |>> for pipeline
- =>> for fold and o>> or *>> or anything (answer below)
Anyway, the alphas are now available on clojars as well:
clj -Sdeps \
'{:deps
{net.clojars.john/injest {:mvn/version "0.1.0-alpha.12"}
criterium/criterium {:mvn/version "0.4.6"}
net.cgrand/xforms {:mvn/version "0.19.2"}}}'
Once we settle on good names I'll probably move it into beta and make a more formal announcement on the proper channels.