Recollect: quick diff/patch for data in EDN

“Recollect” is a small library I created to sync data from a server to its clients. The idea behind Recollect comes from React’s DOM diff/patching.

To sync state across all clients in real time, we face a similar problem: every client’s data is derived from the same database, so when the database is updated, all clients need to update the copies of the data in their browsers. Since networks are slow, we should send only the changed parts over the network. This is almost exactly what we already do with React.

Recollect is implemented in ClojureScript since it relies heavily on persistent data structures. Many tricks are required to make it performant, because we all know diffing can be slow, not to mention doing it on the server side.
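To make the role of persistent data structures a bit more concrete, here is a minimal sketch of a map diff. It is not Recollect’s actual code: `diff-maps` is a hypothetical name, and it assumes `:m/!` can also be used to set a key that did not exist before. The point is the `identical?` check: an update to a persistent map leaves the untouched subtrees as the very same objects, so they can be skipped without walking them.

```clojure
;; Hypothetical helper for illustration only, not part of Recollect's API.
(defn diff-maps
  "Returns a vector of ops that turn `a` into `b`.
   `coord` is the path (a vector of keys) to the current position."
  [coord a b]
  (cond
    ;; Same object in memory: the whole subtree is unchanged, skip it.
    (identical? a b) []

    (and (map? a) (map? b))
    (let [removed (for [k (keys a)
                        :when (not (contains? b k))]
                    [:m/- coord k])
          changed (mapcat (fn [[k v]]
                            (if (contains? a k)
                              (diff-maps (conj coord k) (get a k) v)
                              ;; Assumption: :m/! also creates new keys.
                              [[:m/! (conj coord k) v]]))
                          b)]
      (vec (concat removed changed)))

    ;; Different objects that are still equal by value: nothing to send.
    (= a b) []

    ;; Anything else is replaced wholesale.
    :else [[:m/! coord b]]))

(def old-db {:users {:a {:name "A"} :b {:name "B"}} :topic "x"})

(def new-db (-> old-db
                (assoc-in [:users :a :name] "AA")
                (update :users dissoc :b)))

(diff-maps [] old-db new-db)
;; => [[:m/- [:users] :b] [:m/! [:users :a :name] "AA"]]
```

Only two small operations go over the wire here, and the `:topic` entry is never compared by value because both maps share it.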

Currently there are several updating operations in Recollect:

Note: the protocol details below are outdated; follow the latest protocol in the README.

[:m/!   coord x]      ; reset data
[:m/-   coord k]      ; remove key from map
[:v/+!  coord xs]     ; append to vector
[:v/-!  coord k]      ; remove after index k
[:st/++ coord xs]     ; add to set
[:st/-- coord xs]     ; remove from set
[:sq/-+ coord [k xs]] ; drop k items and add sequence
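As a rough illustration of how a client might apply such operations, here is a small sketch. It is not Recollect’s implementation: `apply-op` and `apply-patch` are hypothetical names, and the semantics are only my reading of the comments above. In particular it assumes that `coord` is a path of keys, that `:v/-!` with `k` keeps the first `k` items, and that `:sq/-+` drops `k` items from the head and then prepends `xs`.

```clojure
(require '[clojure.set :as set])

;; Hypothetical helpers for illustration only.
(defn apply-op
  "Applies a single [op coord arg] operation to `data`."
  [data [op coord x]]
  (let [;; update-in misbehaves with an empty path, so handle the root here.
        change (fn [f]
                 (if (empty? coord)
                   (f data)
                   (update-in data coord f)))]
    (case op
      :m/!   (if (empty? coord) x (assoc-in data coord x))
      :m/-   (change #(dissoc % x))
      :v/+!  (change #(into % x))
      ;; Assumption: keep the first k items, drop the rest.
      :v/-!  (change #(subvec % 0 x))
      :st/++ (change #(set/union % (set x)))
      :st/-- (change #(set/difference % (set x)))
      ;; Assumption: x is [k xs]; drop k from the head, then prepend xs.
      :sq/-+ (change #(concat (second x) (drop (first x) %))))))

(defn apply-patch
  "Applies a sequence of operations in order."
  [data ops]
  (reduce apply-op data ops))

(apply-patch {:topic "a" :tags #{:x} :msgs [1 2 3] :tmp 0}
             [[:m/!   [:topic] "b"]
              [:m/-   []       :tmp]
              [:v/-!  [:msgs]  1]
              [:v/+!  [:msgs]  [9 8]]
              [:st/++ [:tags]  [:y]]
              [:st/-- [:tags]  [:x]]])
;; => {:topic "b", :tags #{:y}, :msgs [1 9 8]}
```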

For vectors and sequences, since they are operated on from the head or tail, it’s slow to update in the middle.
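A small sketch shows why a change in the middle is costly when only tail operations are available. `diff-vectors` is a hypothetical name, not Recollect’s API, and it assumes `:v/-!` with `k` means “keep the first `k` items”. It finds the shared prefix, truncates after it, and appends everything that follows.

```clojure
;; Hypothetical helper for illustration only.
(defn diff-vectors
  "Returns ops turning vector `a` into vector `b` using only
   truncate (:v/-!) and append (:v/+!)."
  [coord a b]
  (let [;; Number of leading items the two vectors have in common.
        shared (count (take-while true? (map = a b)))]
    (cond-> []
      (< shared (count a)) (conj [:v/-! coord shared])
      (< shared (count b)) (conj [:v/+! coord (subvec b shared)]))))

;; Appending at the tail is cheap: only the new item is sent.
(diff-vectors [:msgs] [1 2 3] [1 2 3 4])
;; => [[:v/+! [:msgs] [4]]]

;; Changing one item in the middle resends everything after it.
(diff-vectors [:msgs] [1 2 3 4 5] [1 2 9 4 5])
;; => [[:v/-! [:msgs] 2] [:v/+! [:msgs] [9 4 5]]]
```

The second case is the slow one: the items `4` and `5` did not change, but they travel over the network again because they sit behind the modified position.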

That’s fine at the current stage of experimenting. I’m wondering whether we can achieve better performance and reliability with better algorithms. Any ideas?


Update: I’ve got some examples:

Hey @jiyinyiyong, I finally had a chance to have a better look at this. It looks pretty cool! I’ve been thinking a lot about what my dream Clojure/ClojureScript stack would look like, and the one part where I just can’t decide on the right way forward is server/client sync.

I have to admit I also haven’t had enough of a chance to experiment with what’s out there.

I like what Christopher Small is doing with Datsys. It uses Datomic and DataScript, and keeps server and clients synced through a re-frame-like event loop over a websocket. It does feel a bit overengineered, and it assumes the client is allowed to see the full database, which is a bit of a blocker, but I still find this general approach very appealing.

Then there’s GraphQL / om.next style “let the UI ask for what it needs”. I always stayed clear of om.next because its documentation is just too confusing, but I heard the defn episode about Fulcro recently (used to be Untangled), and that seems to be designed to take the pain out of om.next.

Then there’s the CRDT approach (Conflict-free Replicated Data Type) taken by Replikativ, which is based on some very solid CS foundations.

The main problem for me is that all of these have a pretty steep learning curve. Just to get to the point where you understand how one of these works, have built a demo app, and understand the trade-offs will take several full days.

I’ve heard of CRDTs before but have never tried them. In the Node.js community there are Meteor and Derby doing realtime syncing, but I don’t find them sufficient for my needs.

Cumulo (my project) is a very rough solution to data syncing. The main problem is that it’s very slow and very unfriendly to databases, so currently the best use case is my realtime collaborative editor. I say it’s rough because it keeps all data in memory and runs a diff on every data change. That could be really slow and nearly impossible to scale.

The good part of Cumulo is that developing apps is fast, since most of the duplicated work is handled by diffing. Making prototypes of chatrooms is very quick. I would prefer to improve Cumulo in that direction.