(Lazy) Sequences, deferred results, composable queries, oh my (please help)


#1

I’d really appreciate some help with a few design decisions for an API I’m building for an internal system; I think the general problem is very common in many database-backed systems.

We want to have a core domain/model API that enforces business rules, e.g. given a user, when asked for a list of widgets, show me the widgets this particular user has access to. In our specific case, we do that by constructing a particular query against Mongo.

On top of that, we also have some presentation rules: extra filtering (e.g. to support search), sorting, pagination, and so on. Or perhaps we want to fetch and join some relationships as well. This happens by taking the core query and adding more clauses, or perhaps by dropping down to some lower-level Mongo functionality.

I’d like to keep the core API surface fairly agnostic to these presentation rules; that is, I don’t want the core functions to take seven different arguments to cover all the different presentation rules (or a map of options, which amounts to the same thing).

What I think I want to do is have the core API return something that can be further refined by calling other functions, but also eventually realised into a collection of items (or a count, etc). So perhaps it’d look like this:

(def square-widgets
  (-> (q/get-widgets db user)
      (q/filter-by {:type "square"})
      (q/sort-by :name)
      (q/paginate {:limit 20 :skip 3})))

Now, at this point, square-widgets could be further changed (perhaps add another filter clause, by using another q/filter-by) or actually realised into a collection of results – or just a count.

I could imagine this going two ways:

  1. The result of all these functions is just a hash map. Add q/execute, q/count and other “top level” operations that know how to interpret this map and give you back a seq (potentially lazy) or a number.
  2. The result is actually a Query record that implements Seqable, Counted and perhaps other Clojure interfaces. Then the result of these functions can be passed around the system as-is, and only when something needs to actually iterate over it (e.g. for serialisation purposes) is the query run against the DB.
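For what it’s worth, option 1 might look something like this rough sketch. All names here are hypothetical, and the interpreter runs against an in-memory collection instead of Mongo just so the sketch is self-contained:

```clojure
;; A minimal sketch of option 1, with made-up names: the query is just
;; a hash map, and only the "top level" operations interpret it.
(defn get-widgets [db user]
  ;; the core API enforces the access rule up front
  {:db db :collection :widgets :where {:owner (:id user)}})

(defn filter-by [query clause]
  (update query :where merge clause))

(defn paginate [query {:keys [limit skip]}]
  (assoc query :limit limit :skip skip))

(defn execute [{:keys [db collection where limit skip]}]
  ;; stand-in interpreter: filters an in-memory collection; a real
  ;; version would build and run a Mongo query here instead
  (cond->> (get db collection)
    where (filter #(= where (select-keys % (keys where))))
    skip  (drop skip)
    limit (take limit)))
```

Because the query stays plain data right up until execute, it can be printed, diffed, or logged before it ever touches the database.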

It would also be nice to be able to cache these results, so if you have executed the query once, you don’t need to execute it again (or if you have fetched all the results already, you don’t need to issue a separate count DB operation).

It kind of feels that the record/protocols approach is too magicky, since reading the code might be confusing, but I can’t think of any other drawbacks. Consumers of the API can just call vec if they want to execute the query and get back non-lazy results. I’d also need to implement a bunch of protocols if I want to support things like reverse/nth and so on (though there could be ways around it, I think).

It also feels, though, that the record approach allows more flexibility: you could implement other useful protocols, or realise the results while keeping the original query around, which some other layer could use to log slow queries.

I’d be very interested to hear your thoughts about this! Thanks!


#2

Hey, I’ve been tinkering with a similar thing for MongoDB on top of monger for a while. I actually arrived at the Query (and Transaction) record which can be built up using protocol functions. While this worked quite well initially (especially Query using MongoDB’s aggregation pipeline) I encountered a few issues with this approach:

  1. The “ending” functions like execute or count are actually quite painful to work with. Sooner or later I found myself either forgetting them and then tracing weird errors, or wanting to break out of the setup->execution pattern. Adding laziness/async-like functionality on top, which was supposed to “just solve” caching, did not turn out well either.
  2. Actually creating a set of protocols around queries/transactions that come close to forming an elegant algebra around persistence is quite hard. I found (at least with my own probably slightly crazy implementations) that very often operations on the Query/Transaction would secretly expect a certain ordering, or show deep issues when used in more complex setups. While tinkering with it, I often found myself implementing features of an ORM.
  3. Which brings me to what I identified as the root issue with this approach: essentially, I was building little machines to pass around the app. Not only did those machines require a varying degree of experience to construct, but they contained internal state of some form or another (laziness didn’t help here), grew exceedingly unpredictable (was this called already? how many times? what did that entail again? easy enough to answer for a read operation, maybe, but quite head-scratchy in terms of control flow for writes), and they needed to be passed around everywhere. I was basically trying to force a deeply OOP-ish solution, which in Clojure often leads to a lot of headache, as it did in my case.

I’m pretty sure others can expand on this last point much better than me, but at some point I realized that my approach just wasn’t going to work. This code serves as a sort of backbone library for a handful or two of web apps I develop, and I was running in circles trying to make the domain entity/model/database portion work in all cases. In the end, the use cases (although very similar!) were just too diverse – I admit it might very well have been my inexperience with functional programming and OOP “muscle memory” that ruined any semblance of sanity – and I found myself wishing for a splatter of functions that “just do map/collection stuff”.

I abandoned all that record/protocol work in favor of a set of really useful functions. monger already provides a lot of functionality, and to use it in a more coherent, domain-oriented way I wrote a single namespace containing all sorts of validation, query and transaction helpers around it (e.g. for “looking up” related documents from other collections, I have (find-belongs-to), (find-has-one), … instead of trying to reimplement an ORM).
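A guess at what such a helper could look like. Here the database access is abstracted behind a fetch-by-id function (in real code that might wrap something like monger.collection/find-map-by-id), and the foreign-key naming convention is an assumption for illustration:

```clojure
;; Hypothetical sketch of a "look up related documents" helper.
;; fetch-by-id stands in for the actual monger call.
(defn find-belongs-to
  "Given a document holding a foreign key (e.g. :user-id), fetch the
  parent document from the named collection."
  [fetch-by-id doc parent-coll fk-field]
  (fetch-by-id parent-coll (get doc fk-field)))
```

Keeping the fetch function as an argument also makes this trivially testable against an in-memory map.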

For domain entities, I also use a multimethod define-model which returns a well-defined hash map describing a particular document type (db/collection name, fields, indexes, validation rules) mostly as plain data, which I can call sort of like a static class member: (define-model :myproject.shop/order). Combined with an (easily testable!) set of functions, writing those domain-y namespaces like myproject.shop.orders has become a breeze, and they now serve as what I would have written a Repository for in Java.
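The define-model idea might be sketched like this; the dispatch on the model keyword and the contents of the description map are guesses at what such a hash map could hold:

```clojure
;; Sketch: a multimethod dispatching on the model keyword, returning a
;; plain-data description of the document type. Field names are invented.
(defmulti define-model identity)

(defmethod define-model :myproject.shop/order [_]
  {:collection "orders"
   :fields     #{:user-id :items :total}
   :indexes    [[:user-id]]
   :validate   {:total number?}})
```

Because the result is just data, a domain namespace like myproject.shop.orders can destructure it in plain functions, which keeps everything easy to test.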

Regarding caching, I moved the problem to where the data from the database functions was actually used, which in turn made it much easier to control. With a component-based architecture this was actually quite easy using Redis & carmine, and I skipped the whole Cacheable protocol idea that was creeping up on me again :wink:


#3

Some advice:

  • Don’t make a domain-specific query language that compiles to a generic query language – you’ll probably be biting off more than you can chew, ending up with a DSL that is at once too weak, under-specified, and opaque. You should probably just write some functional helpers that output the MongoDB query language.
  • Don’t mix laziness with IO, it’s full of caveats. Instead of returning a lazy sequence, consider parameterizing the result with a transducer to apply to the returned documents.
  • If you want a higher-level, storage-agnostic API, it might be interesting to use a GraphQL-like language on top of MongoDB to implement your logic, using libraries like Pathom for instance.
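The second point might translate into something like this sketch; fetch-widgets and the in-memory “database” are hypothetical, and a real version would walk a Mongo cursor eagerly:

```clojure
;; Eager fetch parameterized by a transducer: no lazy seq over IO, but
;; the caller still controls per-document transformation.
(defn fetch-widgets
  ([db query] (fetch-widgets db query identity))
  ([db query xform]
   ;; query translation elided; a real version would run the query and
   ;; pour the cursor's documents through xform as they arrive
   (into [] xform (get db :widgets))))
```

The caller decides what happens per document – (map shape-fn), (filter pred), (take n) – without the function ever handing out a lazy seq backed by an open cursor.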

#4

Thanks for the advice!

But it’s so much fun! In seriousness, I agree that it might mean headaches down the road. It seems, though, that Clojure is extremely well suited for this, as my “mapper” namespace is full of small functions that I can pick apart later (I’m still in the exploratory phase; concepts are more important than code).

I gave up on this; instead I’m just constructing a “query” hash map that contains the expected shape of the documents that will be returned – converting the documents is trivial with that. I didn’t think of using a transducer, but it’s a good idea to add one as needed.
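One possible reading of the “expected shape” idea, with made-up shape keywords: the query map carries a shape describing the documents, and conversion is a simple walk over that shape:

```clojure
;; The shape maps document fields to coercions; both are illustrative.
(def widget-query
  {:collection :widgets
   :shape      {:name :string :size :long}})

(defn convert [shape doc]
  (reduce-kv (fn [m k t]
               (assoc m k (case t
                            :string (str (get doc k))
                            :long   (long (get doc k))
                            (get doc k))))
             {} shape))
```

Fields not named in the shape simply drop out, so the same mechanism doubles as a projection.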

I’m not going storage-agnostic at this point, but a GraphQL implementation seems straightforward to build on what I’m working on.


If anyone is following this, I ended up constructing Mongo pipelines out of my mini-DSL. MongoDB 3.6’s aggregation operators can do effective joins with sub-queries, so SQL-like joins are actually quite doable.
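For anyone curious, a pipeline of that kind (sub-pipeline $lookup, available since MongoDB 3.6) expressed as Clojure data might look like this; all collection and field names are invented for illustration:

```clojure
;; "Join" widgets to their owners, roughly SELECT ... JOIN users.
(def widgets-with-owner
  [{"$match"  {"type" "square"}}
   {"$lookup" {"from"     "users"
               "let"      {"ownerId" "$ownerId"}
               "pipeline" [{"$match" {"$expr" {"$eq" ["$_id" "$$ownerId"]}}}]
               "as"       "owner"}}
   {"$unwind" "$owner"}])
```

Since the pipeline is just a vector of maps, a mini-DSL only has to emit data – no interop or query-object construction needed until the driver call.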


#5

That sounds quite similar to what I’m making – mostly data-driven things, with the occasional function to get more custom behaviour.