Do you use partitions for your Datomic entity ids? Do you think they would help improve query speed in a multi-tenant SaaS system?
We noticed that our Datomic database queries are becoming significantly slower. This only happens when the (in-memory) object cache is cold. The execution time of one of the problematic queries drops from over 12 seconds on the first run to under 200 ms on the second run.
This query already uses an appropriate index to quickly find the entity ids belonging to the current customer. One part of it pulls 2 attributes from 100 entities of a customer, and even this part alone takes over 1 second with a cold cache.
My first hypothesis was that the latency of fetching index segments causes these long query durations. We do not use Datomic partitions, so the entities of a single customer are presumably spread across the index, and far more segments must be loaded from storage than the customer's own data would require. Probably over 95% of the datoms in these segments belong to other customers, so most of what is fetched for the current customer's query is irrelevant to it.
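To make this hypothesis concrete, here is a toy Python model of it. It is not Datomic code and does not use any Datomic API. All numbers (20 tenants, 1000 entities per tenant, 10 datoms per entity, 1000 datoms per segment) are made-up assumptions, as is the simplification that entity ids without partitions are handed out round-robin across tenants. The only property it borrows from Datomic is that an index sorted by entity id keeps entities of the same partition next to each other, which is modelled here by putting the tenant number into the high bits of the id.

```python
def segments_touched(all_entity_ids, wanted_ids, datoms_per_entity, segment_size):
    """Count the distinct segments that hold datoms of the wanted entities.

    The index is modelled as all datoms sorted by entity id and cut into
    fixed-size segments (an assumption, real segment sizes vary).
    """
    ordered = sorted(all_entity_ids)
    position = {eid: index for index, eid in enumerate(ordered)}
    touched = set()
    for eid in wanted_ids:
        first_datom = position[eid] * datoms_per_entity
        last_datom = first_datom + datoms_per_entity - 1
        touched.update(range(first_datom // segment_size,
                             last_datom // segment_size + 1))
    return len(touched)


def simulate(tenants=20, entities_per_tenant=1000, datoms_per_entity=10,
             segment_size=1000, sample=100):
    # No partitions: ids are assigned in creation order, and tenants
    # create entities in turn, so each tenant's ids are interleaved.
    interleaved = [[] for _ in range(tenants)]
    next_id = 0
    for _ in range(entities_per_tenant):
        for tenant in range(tenants):
            interleaved[tenant].append(next_id)
            next_id += 1

    # One partition per tenant: the tenant number sits in the high bits,
    # so each tenant's ids form one contiguous range in sort order.
    partitioned = [[(tenant << 42) | n for n in range(entities_per_tenant)]
                   for tenant in range(tenants)]

    results = {}
    for name, layout in (("interleaved", interleaved),
                         ("partitioned", partitioned)):
        all_ids = [eid for ids in layout for eid in ids]
        wanted = layout[0][:sample]  # 100 entities of one tenant
        results[name] = segments_touched(all_ids, wanted,
                                         datoms_per_entity, segment_size)
    return results


if __name__ == "__main__":
    print(simulate())
```

With these assumed numbers the model reads 20 segments for the 100 entities in the interleaved layout and 1 segment in the partitioned layout. In the interleaved case 19 of every 20 datoms in the fetched segments belong to other tenants, which is the 95% situation described above. Whether a real database behaves like this depends on how its entities were actually created over time.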
To validate my hypothesis I started a managed memcached instance on Google Cloud and configured the Datomic peers accordingly. Unfortunately, with a warm memcached the query execution time only dropped from 12 to around 8 seconds on a peer with a cold object cache. I ran a similar experiment with memcached installed on the peer itself to rule out network latency, but the execution time only dropped by a few hundred milliseconds more and stayed over 8 seconds.
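As a rough back-of-the-envelope reading of these measurements, the small Python sketch below splits the cold query time into the part that a warm memcached removed and the part that remained. The inputs are the rounded figures from this post (12 s, 8 s, 0.2 s), so the shares are only approximate, and the function name `breakdown` is just an illustration.

```python
def breakdown(cold_s=12.0, warm_memcached_s=8.0, warm_object_cache_s=0.2):
    """Split the cold query time using the three rounded measurements."""
    saved_by_memcached = cold_s - warm_memcached_s
    remaining_cold_cost = warm_memcached_s - warm_object_cache_s
    return {
        # Time that disappeared once segments came from memcached
        "saved_by_memcached_s": round(saved_by_memcached, 2),
        "saved_by_memcached_share": round(saved_by_memcached / cold_s, 2),
        # Time still spent beyond the fully warm run, despite warm memcached
        "remaining_cold_cost_s": round(remaining_cold_cost, 2),
        "remaining_cold_cost_share": round(remaining_cold_cost / cold_s, 2),
    }


if __name__ == "__main__":
    for key, value in breakdown().items():
        print(f"{key}: {value}")
```

On these figures, the warm memcached accounts for roughly 4 of the 12 seconds (about a third), while roughly 7.8 seconds (about 65%) remain even though the segments no longer come from storage.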
My hypothesis now is that parsing the segments and loading them into Datomic's (in-memory) object cache is the main cost driver, not fetching the segment data. Unfortunately, this is even harder to validate without access to the Datomic source code.