How to profile Datalog DBs?

yeah, it’s a pretty general question, but it gets at a pain point I’ve heard repeated for a while now. Serious and semi-serous DB architects/managers are familiar with various tools that can help profile performance in SQL databases. However, in Datalog DBs like Crux or Datomic the common complaint I hear is how difficult it is to know if something is running as efficiently as it could, or to debug run-time issues. Can anyone speak to this issue, either from experiencing the problem or knowing solutions?

1 Like

I can only speak about our experience with running Datomic in production. We log the query execution times, so that we can spot slow queries in our monitoring dashboard.

However I’m not aware of any tools for Datomic that provide something similar like Postgres’s “explain analyse”. The first :where clause should be the one that limits the result set the most. To find the slow parts of a datalog query I do:

  • comment out everything except the first :where clause
  • measure the query execution time
  • then I uncomment the next where clause
  • measure the query execution time
  • and so on

The pull parts of the datalog query can also be slow. We didn’t partitioned our entity ids. Therefore the data for one customer is oftentimes spread across a lot of Datomic segments. This causes that probably thousands of segments needs to be loaded into the memory, before the execution of the datalog query can even start. While the first query execution can take several seconds, the subsequent executions runs in 100-400ms. This is our biggest performance challenge at the moment. The described cause is only a hypothesis, since I cannot analyse this further without having Datomic’s source code. It would be great to have a Datomic datalog “explain analyse” tool that also reports the number of loaded segments.

3 Likes

@Webdev_Tory, I was just asking about this (in relation to Crux) to some JUXT guys the other day over at their Zulip, and the answer was that where clauses are reordered based on the internal stats they keep (I suspect it’s similar to what you mention about Datomic). You can actully see how the query is done if you enable debug logging.

The thread in question is here: https://juxt-oss.zulipchat.com/#narrow/stream/194466-crux/topic/query.20plan.2C.20.60where.60.20ordering

3 Likes

It looks like Datomic has no query planner

The :where clauses of Datomic queries are executed in order. This gives you total control over how the query engine performs with your data set. One rule of thumb is to place more restrictive clauses before less restrictive clauses

2 Likes

As @mvarela pointed out, Crux’s Datalog query engine ignores the user-provided ordering of :where clauses in a query and will always determine its own query plan, based on cardinalities/statistics and other heuristics. Computed query plans are required for the Worst-Case Optimal Join engine design to operate efficiently. This design enables lazy query execution that isn’t constrained by large memory requirements associated with constructing intermediate relations. It also means the queries that users submit are effectively more “declarative” in the general case, where the need to worry about execution ordering is avoided.

Inevitably though there may be cases where fine-tuning of the calculated join order is required and short of adding news heuristic to the query engine (or waiting for the Crux team to do so) you can usually restructure the query to force some predetermined subset of the final calculated order.

Debugging the join order is currently achieved via logging (again, as @mvarela helpfully mentioned) and looking for obvious ordering issues, although friendlier tooling for this is on the backlog here: https://github.com/juxt/crux/issues/1234. There is also a slowest-queries API which would be useful during testing and production to prioritise efforts.

Hope that helps :slight_smile:

1 Like