Releasing d2q, a simple, efficient, generally applicable engine for implementing graph-pulling API servers in Clojure. Feedback wanted!

I’m happy to release d2q, a library for implementing graph-pulling server backends, playing in the same space as Lacinia or Pathom, with an emphasis on simplicity, generality and performance:

https://github.com/vvvvalvalval/d2q

This library has been used internally on a previous project for months; I’ve just removed the application-specific parts and added documentation.

Before developing this any further, I’m now looking for feedback:

  • Do you find the value proposition clear and appealing?
  • Any criticism of the design choices?
  • I see many potential directions for future development; which are most important to you? (enhance query expressiveness with new features / better docs / add specs / integration with GraphQL or Om.Next / more helpers for integrating data sources)

Cheers,


Pathom uses Om.Next’s query syntax, inspired from Datomic Pull, which in my opinion sacrifices some extensibility for the sake of concision

I disagree.

I did look over some of your examples, and except for one thing, your query syntax seems to be pretty much covered by the Om/Fulcro/Pathom query syntax, only less verbosely.

[{:d2q-fcall-field :myblog/post-by-id
  :d2q-fcall-key "p1"
  :d2q-fcall-arg {:myblog.post/id "why-french-bread-is-the-best"}
  :d2q-fcall-subquery [:myblog.post/title
                       {:d2q-fcall-field :myblog.post/author
                        :d2q-fcall-subquery
                        [:myblog.user/id
                         :myblog.user/first-name
                         :myblog.user/last-name
                         :myblog.user/full-name
                         :myblog.user/n-posts]}
                       {:d2q-fcall-field :myblog.publication/last-n-comments
                        :d2q-fcall-key :last-5-comments
                        :d2q-fcall-arg {:n 5}
                        :d2q-fcall-subquery [:myblog.comment/id
                                             :myblog.comment/content
                                             :myblog.comment/published-at
                                             {:d2q-fcall-field :myblog.comment/about
                                              :d2q-fcall-subquery [:myblog.post/id
                                                                   :myblog.comment/id]}]}]}
 {:d2q-fcall-field :myblog/user-by-id
  :d2q-fcall-key "no-such-user"
  :d2q-fcall-arg {:myblog.user/id "USER-ID-THAT-DOES-NOT-EXIST"}
  :d2q-fcall-subquery [:myblog.user/id
                       :myblog.user/first-name
                       :myblog.user/last-name]}
 {:d2q-fcall-field :myblog/user-by-id
  :d2q-fcall-key "invalid-fcall"
  :d2q-fcall-subquery [:myblog.user/id
                       :myblog.user/first-name
                       :myblog.user/last-name]}]

Pathom

[{(:myblog/post-by-id {:myblog.post/id "why-french-bread-is-the-best"} "p1")
  [:myblog.post/title
   {:myblog.post/author
    [:myblog.user/id
     :myblog.user/first-name
     :myblog.user/last-name
     :myblog.user/full-name
     :myblog.user/n-posts]}
   {(:myblog.publication/last-n-comments {:n 5} :last-5-comments)
    [:myblog.comment/id
     :myblog.comment/content
     :myblog.comment/published-at
     {:myblog.comment/about
      [:myblog.post/id
       :myblog.comment/id]}]}]}
 {(:myblog/user-by-id {:myblog.user/id "USER-ID-THAT-DOES-NOT-EXIST"} "no-such-user")
  [:myblog.user/id
   :myblog.user/first-name
   :myblog.user/last-name]}
 {(:myblog/user-by-id {} "invalid-fcall")
  [:myblog.user/id
   :myblog.user/first-name
   :myblog.user/last-name]}]

The only thing not currently supported by Pathom/Fulcro is alias support. I wondered about that myself recently, and I think it should be an easy addition: every parameterized query already takes a map argument, and could either take a secondary alias parameter or just a namespaced keyword arg. It’s also extensible in this way, since you can put anything in there.

(<query-id> {<key> <val>, ...} <alias>)
(<query-id> {:query/alias <val>, <key> <val>, ...})
(<query-id> {:myblog.query/extension <val>, <key> <val>, ...})

Completely ignoring the implementations, I wonder what other extensibility you’d want?

The Om query syntax is not perfect (I hate that you have to quote lists and symbols when writing it directly in Clojure code), but it is a pretty good baseline “wire” query format, isn’t it?

Well, it is just my opinion, so I’m not surprised people disagree :slight_smile:

I’m surprised you would say that; I would have said d2q’s syntax is actually more verbose. And that’s fine: an explicit choice of d2q is that it will not compromise programmability for concision. That’s not to say that concision is not an important concern, rather that it should be addressed by another component than d2q (maybe another input language like GraphQL or Om Next, or maybe a little Clojure or EDN DSL like #d2q/fcall [:myblog/post-by-id {:myblog.post/id "why-french-bread-is-the-best"} "p1" [:myblog.post/title ,,,]]).
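To make the tagged-literal idea concrete, here is a hedged sketch (not part of d2q) of a helper that such a #d2q/fcall reader tag could delegate to, expanding a compact [field arg key subquery] tuple into d2q's verbose map form:

```clojure
;; Hypothetical helper, not part of d2q's API: expands a compact
;; [field ?arg ?key ?subquery] vector into a d2q Field Call map.
(defn fcall
  [[field arg k subq]]
  (cond-> {:d2q-fcall-field field}
    (some? arg)  (assoc :d2q-fcall-arg arg)
    (some? k)    (assoc :d2q-fcall-key k)
    (some? subq) (assoc :d2q-fcall-subquery subq)))

(fcall [:myblog/post-by-id
        {:myblog.post/id "why-french-bread-is-the-best"}
        "p1"
        [:myblog.post/title]])
;; => {:d2q-fcall-field :myblog/post-by-id
;;     :d2q-fcall-arg {:myblog.post/id "why-french-bread-is-the-best"}
;;     :d2q-fcall-key "p1"
;;     :d2q-fcall-subquery [:myblog.post/title]}
```

Registering this behind a reader tag (or just calling it as a function) would keep the verbose maps as the canonical, programmable representation while offering a concise surface syntax.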

Regarding extensibility, I would argue that having one flat map is more straightforward to extend and manipulate programmatically than a tuple where maybe you have an additional element, or maybe the argument element is a map and there’s a special key in it even though it has not much to do with the actual argument, etc.

But even without going that far, how would you annotate a Pathom query? I can see plenty of use cases for that: static analysis for security, hints for optimizing execution, ‘anchors’ for mutually recursive queries, etc.

That’s a bit of a general issue I have with Pathom: a lot of nice features are there, but they were really added as an afterthought, and because there wasn’t really room for them initially, the API is somewhat irregular. For instance, Pathom eventually got batching resolution, but you have to make extra case analyses such as “You must detect if the input is a sequence and also be ready to handle the case when it is not”. In contrast, d2q is always batching, and you can explicitly add a generic helper for when you don’t want to compute in batches.
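To illustrate that last point, here is a minimal sketch of such a generic helper, assuming (for simplicity) that a resolver is just a function from a batch of entities to a vector of values; d2q's actual resolver signature is richer than this:

```clojure
;; Hypothetical sketch, not d2q's actual resolver API: lifting a
;; one-entity-at-a-time function into an always-batched resolver.
;; The reverse direction (recovering batching from a one-at-a-time
;; API) is impossible, which is why batching is the more general model.
(defn one-by-one
  [f]
  (fn batched-resolver [entities]
    (mapv f entities)))

(def resolve-titles
  (one-by-one (fn [post] (:title post))))

(resolve-titles [{:title "A"} {:title "B"}])
;; => ["A" "B"]
```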

Yeah, sorry. I meant to say that Pathom is less verbose.

I only meant to comment on the query syntax itself. I did not look at the implementation, and so far I have only used Pathom Connect in a setup where I don’t care much about performance. In that case it almost feels like magic.

Hello @vvvvalvalval :slight_smile:

I’d like to highlight some of the things I see when comparing Pathom to d2q (disclaimer: Pathom author here).

What seems interesting to me about d2q is the CPU performance. I didn’t dig in, but from our conversation at Clojure Days I have some feelings (please correct me if I’m wrong). I remember you told me that you do a preprocessing analysis and then try to batch and parallelize the resolution process in a way that makes it really fast, which is nice, and I think it looks like it’s optimized in that direction. Pathom, on the other hand, was never optimized for CPU speed; instead, Pathom is optimized for reach and IO, and I’d like to elaborate on what I mean by that.

One thing that makes Pathom different from everything else, IMO, is that we embrace the idea that you might have multiple ways to reach the same data. To give an example, let’s say you have an API with two different entry points: a user API where one endpoint gets the user via :user/id, and another via the user’s :user/email. In Pathom you can write 2 resolvers to address that:

(pc/defresolver user-by-id [env input]
  {::pc/input #{:user/id}
   ::pc/output [:user/id :user/name :user/email]}
  (do-your-request-here ...))

(pc/defresolver user-by-email [env input]
  {::pc/input #{:user/email}
   ::pc/output [:user/id :user/name :user/email]}
  (do-your-request-here ...))

Note that we don’t have to say anything about a user type: it’s all about attributes, and that gets indexed. The index looks like this:

{::pc/index-oir
 {:user/id    {#{:user/email} #{com.wsscode.pathom.playground/user-by-email}},
  :user/name  {#{:user/email} #{com.wsscode.pathom.playground/user-by-email},
               #{:user/id}    #{com.wsscode.pathom.playground/user-by-id}},
  :user/email {#{:user/id} #{com.wsscode.pathom.playground/user-by-id}}}}

Note that what matters is that the outputs are at the start of the index; from there we have paths to get to them, and because we have no types or entities, any attribute can be connected to any other attribute. This is a very powerful concept, I think.

So to me, the most important thing is achieving correctness when providing the data the user asked for. It’s not that I didn’t think about batching before; it’s just how I incrementally add those features: by having a simpler implementation first, I can iterate and improve performance as I see the pain points, and batching was one of them. Another interesting detail about batching is that not everything is batchable. For example, you may need to rely on a service that can only bring you one record at a time (think of a user-by-id API that doesn’t provide a way to request many ids at once). So in Pathom we use batching only when it’s worth it; what does it mean to be batchable in d2q when that’s the case?

This query processing can do a lot. When you request an attribute, it will consider the data you have available and compute the possible paths to reach the attribute you asked for. Since we embrace the idea that the system might have multiple paths, we also need strategies to select one; the current strategy is to use the path with the least weight (we compute the weight every time we call a resolver and store it in an atom, keeping a recent average). If you work on a distributed system where you might have many ways to reach some data, and one of the sources runs into a problem, the engine can rebalance the calls to some other resource that’s responding better. If a path fails, we also backtrack and try other paths, while keeping a request cache that ensures no resolver is called twice with the same input. It’s different from full pre-planning, but it’s also more tolerant to failures.

I’m also about to release the new parallel parser. The parallel parser takes advantage of the fact that the paths are computed ahead of time, so as it goes processing your attributes, it can know whether some data is already expected to be returned by a running process or whether it’s new; if it’s new, it can go in parallel.

Async support is also important, I think. That’s truer when you have to support CLJS, but even on CLJ, by supporting async I can have resolvers taking advantage of async IO, so we can avoid having thread pools to call remote services.

These days I think it’s fair to describe Pathom as a “big controller to connect disparate sources of information”, so yeah, optimized for that. Its design encourages a world where we can just establish data in terms of keywords, and they know how to navigate from one to the other.

So to summarize my view: I see d2q being a very nice thing when you want speed in processing, while Pathom focuses on correctly fetching data in environments where there are ambiguous paths, with a more robust fault-tolerance system.

Please let me know if that makes sense to you, and if you have a different perspective.

Thanks for sharing!


It looks a lot like a load balancer or a service mesh like Linkerd. Did you take inspiration from similar tools when designing this behavior?

Hello, I had heard about Linkerd before, but I never had the chance to use it. I think the closest inspiration would be Finagle, which we use at the company. The current implementation was more of a natural need: given that the engine has to choose a path at some point, I had to come up with some way to do it. To be honest, the current heuristic is not so great; here is how it works:

  1. Every resolver starts with weight 1 (this is recorded per instance).
  2. Once a resolver is called, its execution time is recorded, and the weight is updated in the map using the formula: new-value = (old-value + last-time) * 0.5.
  3. If a resolver call throws an exception, double its weight.
  4. Every time we mention some resolver in a path calculation, its weight is reduced by one.
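The update rules above can be sketched in a few lines (hypothetical function names; Pathom's internals may differ):

```clojure
;; Sketch of the weight-update heuristic described above.
;; `weights` is a map from resolver symbol to its current weight;
;; a missing entry defaults to the starting weight of 1.
(defn record-call
  "Rolling-average update after a resolver call:
  new = (old + last-time) * 0.5"
  [weights resolver last-time-ms]
  (update weights resolver
          (fnil (fn [old] (* (+ old last-time-ms) 0.5)) 1)))

(defn record-error
  "Double the weight of a resolver whose call threw."
  [weights resolver]
  (update weights resolver (fnil #(* 2 %) 1)))

(-> {}
    (record-call :user-by-id 10)   ;; (1 + 10) * 0.5 = 5.5
    (record-error :user-by-id))
;; => {:user-by-id 11.0}
```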

Those numbers are quite arbitrary; they have been working fine in my case, but I can see that this is not very general (it needs custom tuning per case). I hope we have a better heuristic in the future; suggestions are very welcome. In the end, it’s just a sort function to pick a path, enabling users to extend and improve it over time (and have different heuristics depending on each system’s properties).

Right now it’s not customizable, but it could be made so with a few code changes.

Hi Wilker, glad to see you commenting on this - the comparison to Pathom was a tricky one to write, so I’m glad you’re reviewing it. I see 2 aspects in your comment: performance and resolution strategies.

Performance

Performance has been an important point of consideration in d2q’s design, but not so much CPU performance as latency, load and contention. Most data-resolution workloads will be IO-bound, not CPU-bound; so there’s not much to gain by optimizing CPU speed.

The main performance issues with a naive (synchronous and non-batching) data resolution algorithm are latency and load:

  • latency: things get computed serially which could be computed in parallel
  • load: many threads are blocked by pending queries (increasing memory usage), the database server gets overwhelmed by tons of small queries, etc.

d2q prevents these issues by encouraging batching and asynchronous data resolution:

  • asynchrony makes independent computations automatically parallel, reducing latency
  • asynchrony prevents threads from being blocked by IO, reducing load
  • batching keeps the number of database queries and IO roundtrips small, reducing both latency and load

It’s important to realize that async and batching resolution is a more general, not a more constraining, way of resolving data. It’s very straightforward to emulate non-batching resolution with batching resolvers, and synchronous resolution with async resolvers; the other direction is impossible. In particular:

In such a case, your d2q resolver will receive a batch of user entities, and then indeed issue one API call per user. But even then, batching benefits us, not for performance but for control: for instance, you may choose to issue no more than 5 API calls at a time, or refuse to serve the query if there are too many users by returning an error, etc.
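As an illustration of that "control" point, here is a hedged sketch of a batched resolver body that enforces both policies; fetch-user! and the limit of 100 users are made up for the example, and real throttling would await each chunk (e.g. with deferreds) before issuing the next:

```clojure
;; Hypothetical example: a batched resolver that refuses oversized
;; batches and issues the per-user API calls in chunks of 5.
(defn resolve-users
  [fetch-user! users]
  (when (> (count users) 100)
    (throw (ex-info "Batch too large, refusing to serve query"
                    {:n-users (count users)})))
  (->> (partition-all 5 users)          ;; at most 5 calls per chunk
       (mapcat (fn [chunk] (mapv fetch-user! chunk)))
       (vec)))

(resolve-users (fn [u] (assoc u :fetched? true))
               [{:id 1} {:id 2}])
;; => [{:id 1, :fetched? true} {:id 2, :fetched? true}]
```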

Resolution strategies

This is indeed a key distinction: Pathom assumes there are several data-resolution paths, and provides an engine for automatically choosing an optimal one.

I think the example you just gave is not challenging enough to clearly differentiate Pathom from d2q: you can achieve the same effect in d2q by writing resolvers for e.g. :find-user-by-id and :find-user-by-email, and you could also imagine those resolvers talking to different services, and prefetching some User fields to cache them in the returned Entity objects while they’re at it.

Of course, there are more advanced use cases than this, in which case I totally see how Pathom’s strategy is valuable in the sort of distributed (/ chaotic) environments you described to me; I only think this is a relatively specific need, and I don’t want d2q to be that specific. In particular, I appreciate that d2q is less constraining about the server-side representation of Entities (I’ve benefited from this for authorization logic in particular, representing d2q Entities as a record holding both a Datomic Entity and authorization information, which would not have conformed to Pathom’s expectation of being a map), and I would worry that a sophisticated algorithm like Pathom’s would prevent me from enhancing query expressiveness in the future.

That being said, hopefully I’ve made d2q programmable enough to make it possible to implement advanced resolution optimizations such as Pathom’s: for instance, I could imagine having an upstream stage which would choose an optimal resolution strategy by static analysis of the query, annotate the query accordingly, and then have Resolvers read those annotations to know which service to fetch data from. Of course, as of today, if a user needs that right away they’re probably better off with Pathom :slight_smile:
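As a rough sketch of that idea: because d2q Field Calls are plain maps, an upstream pass could attach annotations that resolvers later read. The :myapp/source key and the choose-source function are hypothetical, purely for illustration:

```clojure
;; Hypothetical annotation pass, not part of d2q: walks a query and
;; tags each field-call map with a preferred data source, recursing
;; into subqueries. Plain keywords (simple fields) pass through.
(defn annotate-sources
  [choose-source query]
  (mapv (fn [fcall]
          (if (map? fcall)
            (cond-> (assoc fcall :myapp/source (choose-source fcall))
              (:d2q-fcall-subquery fcall)
              (update :d2q-fcall-subquery
                      (partial annotate-sources choose-source)))
            fcall))
        query))

(annotate-sources (constantly :service-a)
                  [{:d2q-fcall-field :myblog/post-by-id}])
;; => [{:d2q-fcall-field :myblog/post-by-id, :myapp/source :service-a}]
```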

