Runtime optimizing compiler for Datalog: who's interested?

vvvvalvalval · November 11, 2020, 5:00pm

Problem: the current Datalog engines for Datomic and DataScript often add a lot of overhead. Basic OLTP queries can typically see their execution time improve from 100µs to 1µs by rewriting them from Datalog to direct index lookups in Clojure (at the expense of readability).
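To make the contrast concrete, here is a minimal sketch in Python (rather than Clojure, for brevity). It assumes a toy in-memory model where datoms are `(entity, attribute, value)` tuples. The `eav`/`ave` dicts, `interpret`, and `names_by_email` are hypothetical names for illustration, not Datomic or DataScript APIs, and the naive interpreter is far cruder than a real engine. It only shows the shape of "generic clause matching" versus "direct index lookups".

```python
# Toy model only: datoms are (entity, attribute, value) tuples held in memory.
# None of these names are Datomic or DataScript APIs.
datoms = [
    (1, "user/email", "a@example.com"),
    (1, "user/name", "Alice"),
    (2, "user/email", "b@example.com"),
    (2, "user/name", "Bob"),
]

# Hypothetical EAVT- and AVET-style indexes, modeled as plain dicts.
eav, ave = {}, {}
for e, a, v in datoms:
    eav.setdefault((e, a), set()).add(v)
    ave.setdefault((a, v), set()).add(e)


def is_var(term):
    """Logic variables are strings starting with '?', as in Datalog."""
    return isinstance(term, str) and term.startswith("?")


def unify(clause, datom, env):
    """Return env extended by matching clause against datom, or None."""
    new_env = dict(env)
    for term, val in zip(clause, datom):
        if is_var(term):
            if new_env.setdefault(term, val) != val:
                return None
        elif term != val:
            return None
    return new_env


def interpret(where, inputs):
    """Generic path: every clause is matched against every datom."""
    envs = [dict(inputs)]
    for clause in where:
        next_envs = []
        for env in envs:
            for datom in datoms:
                matched = unify(clause, datom, env)
                if matched is not None:
                    next_envs.append(matched)
        envs = next_envs
    return envs


def names_by_email(email):
    """Hand-written path: two direct index lookups, no clause matching."""
    return {
        name
        for e in ave.get(("user/email", email), ())
        for name in eav.get((e, "user/name"), ())
    }


where = [("?e", "user/email", "?email"), ("?e", "user/name", "?name")]
generic = {env["?name"] for env in interpret(where, {"?email": "a@example.com"})}
direct = names_by_email("a@example.com")
```

Both paths compute the same answer. The second is harder to read as a query but does no per-clause dispatch or unification at run time.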

I’m considering working on an ‘optimizing compiler’, that would accept a Datalog query and return a function ready to be run on its arguments, with much less interpretation overhead.

Before starting work on this, I’d like to know who’s interested? Any reasons why this may not be valuable to your projects?

Vote with a reaction (for or against).

whilo · November 11, 2020, 7:33pm

At lambdaforge we have been looking into design ideas for a Datalog just-in-time compiler for Datahike for more than a year now and fleshing it out is next on our agenda after merging the current pull requests. We would be very happy to team up and try to make this also more generally applicable to DataScript, Datomic and other parts of the ecosystem like we already did for our Datalog parser.

dustingetz · November 11, 2020, 8:32pm

The main perf concern I see in the wild is Datomic queries that take many seconds to complete. Do you think the interpretation overhead is also significant there?

vvvvalvalval · November 11, 2020, 10:02pm

Thanks! I’m not sure yet that I can commit to implementing this; it might be more than I can chew. But I’m happy to at least brainstorm.

At first glance, it seems to me that most Datalog queries could be transformed into (transduce ...) expressions with index lookups inside the transducer fns, and that with some minor optimizations this might reduce most of the overhead.
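As a rough illustration of that idea, the sketch below uses Python generators as a stand-in for a Clojure transducer pipeline. Everything in it is an assumption made for the example: the dict-based `eav`/`ave` indexes, the `compile_query` name and signature, and the restriction to `[?e attr v]` clauses whose entity or value is already known when the clause is reached. The point is only that the choice of index per clause is made once, at compile time, and the returned function just runs the lookups.

```python
def build_db(datoms):
    """Build hypothetical EAVT- and AVET-style indexes as plain dicts."""
    eav, ave = {}, {}
    for e, a, v in datoms:
        eav.setdefault((e, a), set()).add(v)
        ave.setdefault((a, v), set()).add(e)
    return {"eav": eav, "ave": ave}


def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def _val(env, term):
    """Resolve a term: look up variables, pass constants through."""
    return env[term] if is_var(term) else term


def check_stage(e, a, v):
    # Entity and value both known: keep the row only if the datom exists.
    def stage(db, envs):
        for env in envs:
            if _val(env, v) in db["eav"].get((env[e], a), ()):
                yield env
    return stage


def bind_value_stage(e, a, v):
    # Entity known, value unknown: EAVT-style lookup binds the value.
    def stage(db, envs):
        for env in envs:
            for val in db["eav"].get((env[e], a), ()):
                yield {**env, v: val}
    return stage


def bind_entity_stage(e, a, v):
    # Value known, entity unknown: AVET-style lookup binds the entity.
    def stage(db, envs):
        for env in envs:
            for ent in db["ave"].get((a, _val(env, v)), ()):
                yield {**env, e: ent}
    return stage


def compile_query(find, inputs, where):
    """Pick one index lookup per clause up front; return a plain function."""
    bound = set(inputs)
    stages = []
    for e, a, v in where:
        e_known = e in bound
        v_known = (not is_var(v)) or v in bound
        if e_known and v_known:
            stages.append(check_stage(e, a, v))
        elif e_known:
            stages.append(bind_value_stage(e, a, v))
        elif v_known:
            stages.append(bind_entity_stage(e, a, v))
        else:
            raise ValueError("sketch needs a known entity or value per clause")
        bound.update(t for t in (e, v) if is_var(t))

    def run(db, *args):
        envs = iter([dict(zip(inputs, args))])
        for stage in stages:  # chain the lazy stages, like composed transducers
            envs = stage(db, envs)
        return {tuple(env[var] for var in find) for env in envs}

    return run


db = build_db([
    (1, "user/email", "a@example.com"),
    (1, "user/name", "Alice"),
    (2, "user/email", "b@example.com"),
    (2, "user/name", "Bob"),
])
name_by_email = compile_query(
    find=["?name"],
    inputs=["?email"],
    where=[("?e", "user/email", "?email"), ("?e", "user/name", "?name")],
)
result = name_by_email(db, "a@example.com")
```

A real compiler would also need clause reordering, rules, predicates, and aggregates. None of that is attempted here.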

vvvvalvalval · November 11, 2020, 10:06pm

No, I think these are really caused by Datomic’s index data structures not being well-suited to fast analytical queries that span a lot of data. I don’t think that can be helped by a smarter query engine.

But there is another perf concern I see in my projects, and that’s really the N+1 problem, e.g. having 1000 Datalog queries each taking 100µs to complete. That issue is especially perceptible when you’re implementing a GraphQL-like API and want to leverage your Datalog rules to compute some derived fields.
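One way a compiled-query approach could help in the N+1 case is by paying the compilation cost once and reusing the resulting function for every call. The Python sketch below is a toy under stated assumptions: `compile_query` is a hypothetical stand-in in which a "query" is reduced to a single attribute name, and `q` is a hypothetical wrapper, not any real engine's entry point. It only demonstrates memoizing the compiled function by query.

```python
from functools import lru_cache

# Counter used only to show how often compilation actually happens.
stats = {"compilations": 0}


@lru_cache(maxsize=None)
def compile_query(query):
    """Stand-in compiler (hypothetical): the 'query' is just an attribute name.

    The query must be hashable so it can serve as a cache key.
    """
    stats["compilations"] += 1
    attr = query

    def run(db, entity):
        # The compiled function is a single direct lookup.
        return db.get((entity, attr))

    return run


def q(query, db, *args):
    """Hypothetical wrapper: compile on first use, then reuse the function."""
    return compile_query(query)(db, *args)


# Simulate the N+1 pattern: the same query shape run once per entity.
db = {(i, "user/name"): f"user-{i}" for i in range(1000)}
names = [q("user/name", db, i) for i in range(1000)]
```

Here the 1000 calls trigger a single compilation, so the per-call cost is the lookup itself plus one cache hit.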

whilo · November 13, 2020, 4:09am

OK, brainstorming is perfectly fine for us. I was just mentioning that we are interested in the topic as well. I think your translation could be done directly with the help of our Datalog parser, by wrapping the q function for databases that speak the same Datalog dialect (Datomic, DataScript, Datahike, et al.).

vvvvalvalval · November 13, 2020, 12:02pm

That would be great news!

whilo · November 13, 2020, 7:12pm

Cool. If you like, we can have a call to discuss how to do it, and maybe do some pair programming along the way.

system · May 15, 2021, 7:12am

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.