Hello @vvvvalvalval
I'd like to highlight some of the things I see when comparing Pathom to d2q (disclaimer: Pathom author here).
What seems interesting to me about d2q is the CPU performance. I didn't dig in, but from our conversation at Clojure Days I have some impressions (please correct me if I'm wrong). I remember you told me that you do a preprocessing analysis and then try to batch and parallelize the processing in a way that makes it really fast, which is nice, and I think d2q is optimized in that direction. Pathom, on the other hand, was never optimized for CPU speed; instead Pathom is optimized for reach and IO, and I'd like to elaborate on what I mean by that.
One thing that makes Pathom different from everything else, IMO, is that we embrace the idea that you might have multiple ways to reach the same data. To give an example, let's say you have a user API with two different entry points: one endpoint gets the user via :user/id, and another gets the user via :user/email. In Pathom you can write 2 resolvers to address that:
(require '[com.wsscode.pathom.connect :as pc])

(pc/defresolver user-by-id [env input]
  {::pc/input  #{:user/id}
   ::pc/output [:user/id :user/name :user/email]}
  (do-your-request-here ...))

(pc/defresolver user-by-email [env input]
  {::pc/input  #{:user/email}
   ::pc/output [:user/id :user/name :user/email]}
  (do-your-request-here ...))
Note that we don't have to say anything about a user type; it's all about attributes, and those get indexed. The index looks like this:
{::pc/index-oir
 {:user/id    {#{:user/email} #{com.wsscode.pathom.playground/user-by-email}},
  :user/name  {#{:user/email} #{com.wsscode.pathom.playground/user-by-email},
               #{:user/id}    #{com.wsscode.pathom.playground/user-by-id}},
  :user/email {#{:user/id}    #{com.wsscode.pathom.playground/user-by-id}}}}
Note that what matters is that the output attributes are the keys at the root of the index, and under each one we have the paths (input sets and resolvers) that can reach it. Because we have no types or entities, any attribute can be connected to any other attribute; I think this is a very powerful concept.
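To make this concrete, here is roughly how those two resolvers could be registered and queried with a Pathom 2 parser. It's a minimal sketch: it assumes the usual p alias for com.wsscode.pathom.core, and the email value is just an example.

(require '[com.wsscode.pathom.core :as p])

(def parser
  (p/parser
    {::p/env     {::p/reader [p/map-reader
                              pc/reader2
                              pc/open-ident-reader]}
     ::p/plugins [(pc/connect-plugin {::pc/register [user-by-id user-by-email]})]}))

;; Starting from an email ident, the engine uses the index above to find
;; the user-by-email path to the other attributes.
(parser {} [{[:user/email "ann@example.com"] [:user/id :user/name]}])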
So to me, the most important thing is achieving correctness when providing the data the user asked for. It's not that I didn't think about batching before; it's just how I incrementally add those features: by having a simpler implementation first, I can iterate and improve performance as I see the pain points, and batching was one of them. Another interesting detail about batching is that not everything is batchable. For example, you might rely on some service that can only give you one record at a time (think of a user/ID API that doesn't provide a way to request many ids at once), so in Pathom we use batch only when it's worth it. What does it mean to be batchable in d2q when that's the case?
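For reference, this is how a resolver opts into batching in Pathom; a source that can only serve one record at a time simply stays a regular resolver. A rough sketch, where fetch-users-by-ids and fetch-user-by-id are hypothetical API calls:

(pc/defresolver users-by-id-batch [env input]
  {::pc/input  #{:user/id}
   ::pc/output [:user/id :user/name :user/email]
   ::pc/batch? true}
  ;; With ::pc/batch? true the resolver may receive a sequence of input maps,
  ;; so one round trip can serve many entities at once.
  (if (sequential? input)
    ;; batch-restore-sort realigns the results with the order of the inputs.
    (pc/batch-restore-sort {::pc/inputs input
                            ::pc/key    :user/id}
                           (fetch-users-by-ids (map :user/id input)))
    (fetch-user-by-id (:user/id input))))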
This query processing can do a lot. When you request an attribute, it considers the data you have available and computes the possible paths to reach the attribute you asked for. Since we embrace the idea that the system might have multiple paths, we also need strategies to select one; the current strategy is to use the path with the least weight (we compute the weight every time we call a resolver and store it in an atom, keeping a recent average). So if you work on a distributed system where you might have many ways to reach some data, and one of the sources runs into a problem, the engine can rebalance the calls to some other resource that's responding better. If a path fails we also backtrack and try other paths, while keeping a request cache that ensures no resolver is called twice with the same input. It's different from full pre-planning, but it's also more tolerant to failures.
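To illustrate the weighting idea in isolation (this is just a sketch of the concept, not Pathom's actual internals): keep a recent average cost per resolver in an atom and pick the candidate path whose resolvers add up to the smallest weight.

;; Hypothetical sketch, not Pathom's real implementation.
(def resolver-weights (atom {}))   ; resolver symbol -> recent average cost

(defn record-call! [resolver-sym elapsed-ms]
  ;; Blend each new observation into the running average for that resolver.
  (swap! resolver-weights update resolver-sym
         (fn [avg] (if avg (/ (+ avg elapsed-ms) 2) elapsed-ms))))

(defn path-weight [path]
  ;; A path here is a sequence of resolver symbols; unknown resolvers cost 1.
  (reduce + (map #(get @resolver-weights % 1) path)))

(defn pick-path [candidate-paths]
  ;; Prefer the cheapest path given what has been observed so far.
  (apply min-key path-weight candidate-paths))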
I'm also about to release the new parallel parser. The parallel parser takes advantage of the fact that the paths are computed ahead of time, so as it goes processing your attributes it can tell whether some data is already expected to be returned by a running process or whether it's new, and if it's new that work can go in parallel.
Async support is also important, I think; that's even truer when you have to support CLJS, but even on CLJ, by supporting async I can have resolvers taking advantage of async IO, so we can avoid having thread pools just to call remote services.
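As a rough illustration of the resolver side of that, assuming the async parser and a hypothetical non-blocking fetch-user-async that returns a core.async channel:

(require '[clojure.core.async :refer [go <!]])

(pc/defresolver user-by-id-async [env {:user/keys [id]}]
  {::pc/input  #{:user/id}
   ::pc/output [:user/id :user/name :user/email]}
  ;; Returning a core.async channel lets the async parser park on IO
  ;; instead of blocking a thread from a pool.
  (go
    (let [user (<! (fetch-user-async id))]
      {:user/id    id
       :user/name  (:name user)
       :user/email (:email user)})))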
These days I think it's fair to describe Pathom as a “big controller to connect disparate sources of information”, so yeah, it's optimized for that, and its design encourages a world where we can just establish data in terms of keywords that know how to navigate from one to the other.
So to summarize my view: I see d2q being a very nice fit when you want speed in processing, while Pathom focuses on correctly fetching data in environments where there are ambiguous paths and a need for more robust fault tolerance.
Please let me know if that makes sense to you, and if you have a different perspective.
Thanks for sharing!