Incremental updates with datomic

mjmeintjes · July 15, 2022, 8:35am

I’m trying to use the datomic transaction log to incrementally update some values in a search index.

For example, say I have a database that looks like:

[{:db/id 1
  :project/title "The title"
  :project/todos [2 3]}
 
 {:db/id 2
  :todo/title "Todo 1"
  :todo/tags [4]}
 {:db/id 3
  :todo/title "Todo 2"}
 {:db/id 4
  :tag/title "Tag"}]

Now I want to create something like:

[{:todo/id 2
  :search-title "The title / Todo 1 #Tag"}
 {:todo/id 3
  :search-title "The title / Todo 2"}]

Creating this transformation once is simple using the entity API. However, I want to be able to incrementally update it based on the datomic transaction log. But I am completely stuck and just cannot figure out how to do this.
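
For reference, the one-off build might look roughly like the sketch below. It is a minimal sketch in plain Clojure over the example maps rather than a live database, so `entities`, `by-id`, `search-title` and `search-docs` are hypothetical names; against datomic, the lookups through `by-id` would presumably become entity API or pull calls.

```clojure
(require '[clojure.string :as str])

;; The example data from above, as plain maps (a stand-in for a datomic db).
(def entities
  [{:db/id 1 :project/title "The title" :project/todos [2 3]}
   {:db/id 2 :todo/title "Todo 1" :todo/tags [4]}
   {:db/id 3 :todo/title "Todo 2"}
   {:db/id 4 :tag/title "Tag"}])

;; eid -> entity map, playing the role of an entity lookup.
(def by-id (into {} (map (juxt :db/id identity)) entities))

(defn search-title
  "Builds '<project> / <todo> #<tag> ...' for one todo of one project."
  [project todo]
  (let [tags (map #(str "#" (:tag/title (by-id %))) (:todo/tags todo))]
    (str/join " " (cons (str (:project/title project) " / " (:todo/title todo))
                        tags))))

(defn search-docs
  "One search document per todo, reached by walking project -> todos -> tags."
  []
  (vec (for [project entities
             todo-id (:project/todos project)]
         {:todo/id todo-id
          :search-title (search-title project (by-id todo-id))})))
```

Calling `(search-docs)` on the example data yields the two documents shown above. The hard part is that each document depends on three entities (project, todo, tag), and nothing in the output records that.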

The transaction log gives me the list of eids and attributes that changed, but how do I know, for example, that I need to update the search-title for :todo/id 2 when the tag title changes from "Tag" to "Tag2"?

Paul_Iannazzo · July 15, 2022, 1:34pm

why don’t you have real references between these 2 entities? then when the ent with tag/title changes you can do a pull on todo/id or todo/_id and update the field in your related entity.

i’m making a lot of assumptions here, though, because the premise is unclear to me.

mjmeintjes · July 18, 2022, 1:20am

Thanks. I’m trying to add a full text search interface on top of data stored in datomic, but struggling to keep the full text search index up to date with changes in the datomic data.

I think what you suggested is probably the best way forward (storing references to all dependent data in the full text index). I was just hoping that there was some simpler way that I was missing.
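
A minimal sketch of that idea, in plain Clojure with no search engine involved: each index document records the eids it was built from, and an inverted map answers "which documents depend on this eid?" when the transaction log reports a change. The `:deps` key and the function names are assumptions for illustration, not part of datomic or of any search library.

```clojure
;; Each search document remembers every eid that contributed to it:
;; the todo itself, its project, and its tags (eids from the example above).
(def index-docs
  [{:todo/id 2 :search-title "The title / Todo 1 #Tag" :deps #{1 2 4}}
   {:todo/id 3 :search-title "The title / Todo 2"      :deps #{1 3}}])

(defn deps->todos
  "Inverts the documents into a map of eid -> set of todo ids depending on it."
  [docs]
  (reduce (fn [m doc]
            (reduce (fn [m dep]
                      (update m dep (fnil conj #{}) (:todo/id doc)))
                    m
                    (:deps doc)))
          {}
          docs))

(defn stale-todos
  "Given the eids touched by new transactions, returns the todo ids to rebuild."
  [dep-index changed-eids]
  ;; Eids that no document depends on contribute nothing.
  (into #{} (mapcat dep-index) changed-eids))
```

With this, `(stale-todos (deps->todos index-docs) [4])` returns `#{2}`, so renaming the tag only triggers a rebuild of the one document that mentions it. The `:deps` set has to be rewritten on every rebuild; adding a new tag to a todo shows up as a change on the todo's own eid, which is already in its `:deps`.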

mjmeintjes · July 18, 2022, 1:38am

Just some more research I found about this:

https://vvvvalvalval.github.io/posts/2018-11-12-datomic-event-sourcing-without-the-hassle.html#detecting_indirect_changes_is_still_hard

There are several strategies to mitigate this problem, all with important caveats:

You can ‘denormalize’ the Event Types to add more data to them, effectively doing some pre-computations for the Aggregates. This means the code that produces the Events needs to anticipate all the ways in which the Events will be consumed - the sort of coupling we’re trying to get away from with Event Sourcing.

You can enrich each Aggregate to keep track of relational information it needs. This makes Event Handlers more complex to implement, and potentially redundant.

You can add an ‘intermediary’ Aggregate that only keeps track of relational information and produces a stream of ‘enriched’ Events. This is probably better than both solutions above, but it still takes work, and it still needs to be aware of the needs of all downstream Aggregates.

And later on in the same article:

For instance, here’s a query that determines which Users must have their reputation re-computed because of Votes:

(comment "Computes a set of Users whose reputation may have been affected by Votes"
  (d/q '[:find [?user-id ...]
         :in $ ?log ?t1 ?t2                                 ;; query inputs
         :where
         [(tx-ids ?log ?t1 ?t2) [?tx ...]]                  ;; reading the Transactions
         [(tx-data ?log ?tx) [[?vote ?a ?v _ ?op]]]         ;; reading the Datoms
         [?vote :vote_question ?q]                          ;; navigating from Votes to Questions
         [?q :question_author ?user]                        ;; navigating from Questions to Users
         [?user :user_id ?user-id]]
    db (d/log conn) t1 t2)
  => ["jane-hacker3444"
      "john-doe12232"
      ;; ...
      ]
  ;; Now it will be easy to update our 'UserReputation' Aggregate
  ;; by re-computing the reputation of this (probably small) set of Users.
  )
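
Adapted to the todo example from the start of the thread, the same pattern might look like the sketch below. It is untested against a live database and assumes the attribute names from the example (`:todo/title`, `:todo/tags`, `:project/todos`); `affected-todos-query` is a hypothetical name. The query is defined as plain data so that it can be passed to `d/q` with the same inputs as the article's query.

```clojure
;; Finds todos whose search-title may be stale after the transactions
;; between ?t1 and ?t2. Each or-join branch is one path from a changed
;; entity ?e back to a todo.
(def affected-todos-query
  '[:find [?todo ...]
    :in $ ?log ?t1 ?t2
    :where
    [(tx-ids ?log ?t1 ?t2) [?tx ...]]       ;; transactions in the range
    [(tx-data ?log ?tx) [[?e _ _ _ _]]]     ;; every entity they touched
    (or-join [?e ?todo]
      (and [?e :todo/title]                 ;; the todo itself changed
           [(identity ?e) ?todo])
      [?todo :todo/tags ?e]                 ;; one of its tags changed
      [?e :project/todos ?todo])])          ;; its project changed
```

It would be run as `(d/q affected-todos-query db (d/log conn) t1 t2)`. Because the navigation happens in the current `db`, a todo that was retracted in that range will not be found this way and would need separate handling, for example by also looking at retraction datoms in the log.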

Paul_Iannazzo · July 18, 2022, 1:56am

that is what datomic is good at, storing meta data that links to outside data.
but keeping things in sync is a pain in the ass, and that’s what event sourcing is ok at, but event sourcing has its own pains. i would advise trying to come up with a strategy that keeps your event sourcing code as small as possible.

also, if you are using something like elastic search, you are going to have to deal with it silently dropping your commits, and you’ll have to come up with a way to figure out if its data is in sync with datomic, or your event source.

the vvvvalvalval guy is cool, though. i use some of his libraries. i should read that article.
