Hi there, I would like to add some more Crux-specific points to this, since all the different ways of running Crux can be rather confusing.
I see three broad setups for running Crux at the moment:
- Crux runs in standalone mode (RocksDB, JDBC with SQLite, …) embedded in a JVM process (see the configuration sketch after this list)
- Crux runs with Kafka or JDBC (Postgres, MySQL, MS SQL Server) as storage, and there are one or more client nodes
- Crux runs in one of the two modes above, but you also run crux-http-server, and your application logic runs separately from Crux itself
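To make the first setup concrete, here is a minimal sketch of starting a standalone node with everything persisted in RocksDB. This assumes a recent Crux 1.x release with crux-rocksdb on the classpath; the directories are just placeholders:

```clojure
(require '[crux.api :as crux]
         '[clojure.java.io :as io])

;; Standalone node: transaction log, document store and indexes all
;; live in local RocksDB directories inside this one JVM process.
(def node
  (crux/start-node
   {:crux/tx-log         {:kv-store {:crux/module 'crux.rocksdb/->kv-store
                                     :db-dir (io/file "/tmp/crux/tx-log")}}
    :crux/document-store {:kv-store {:crux/module 'crux.rocksdb/->kv-store
                                     :db-dir (io/file "/tmp/crux/docs")}}
    :crux/index-store    {:kv-store {:crux/module 'crux.rocksdb/->kv-store
                                     :db-dir (io/file "/tmp/crux/indexes")}}}))
```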
Crux also splits its data across different stores (see the sketch after this list):
- The transaction log
- The document store
- The indexes inside the client nodes
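To illustrate how independently these three stores can be configured, here is a sketch of the second setup: Kafka holds the shared transaction log and document store while the node keeps local indexes in RocksDB. Module and option names follow recent Crux 1.x docs and may differ slightly between versions; the broker address is a placeholder:

```clojure
(require '[crux.api :as crux]
         '[clojure.java.io :as io])

;; Cluster node: Kafka holds the shared, golden stores, while this
;; node maintains its own local index store in RocksDB.
(def node
  (crux/start-node
   {:crux/tx-log         {:crux/module 'crux.kafka/->tx-log
                          :kafka-config {:bootstrap-servers "localhost:9092"}}
    :crux/document-store {:crux/module 'crux.kafka/->document-store
                          :kafka-config {:bootstrap-servers "localhost:9092"}}
    :crux/index-store    {:kv-store {:crux/module 'crux.rocksdb/->kv-store
                                     :db-dir (io/file "/tmp/crux/indexes")}}}))
```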
Backups
The indexes that live directly in the nodes (they can be in-memory only, but are typically persisted in RocksDB) can be rebuilt from the transaction log and don’t need backups.
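For example, if a node loses its index directory, you can simply start it against a fresh directory and wait for it to replay the log. A small sketch using crux.api/sync, which blocks until indexing has caught up:

```clojure
(require '[crux.api :as crux])

;; A node started with an empty index directory replays the
;; transaction log to rebuild its indexes; `sync` blocks until the
;; node has caught up (here with a one-hour timeout).
(crux/sync node (java.time.Duration/ofHours 1))
```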
The document store usually lives in the same place as the transaction log, but it is now also possible to keep it somewhere like S3.
In general, only the transaction log and the document store require backups.
With Kafka or JDBC you can rely on the native backup mechanisms of the underlying storage.
In standalone mode you can use Crux's backup utilities, or, with an embedded database like SQLite or H2, the database's own backup mechanism.
Connection Pooling
crux-jdbc uses HikariCP for connection pooling under the hood.
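For reference, pool options can be passed through to HikariCP via the connection pool configuration. A sketch assuming a recent crux-jdbc with Postgres; the db-spec values are placeholders, and exact option names may vary between versions:

```clojure
(require '[crux.api :as crux])

;; JDBC-backed node sharing one connection pool between the tx-log
;; and document store; :pool-opts is handed through to HikariCP.
(def node
  (crux/start-node
   {:crux.jdbc/connection-pool {:dialect {:crux/module 'crux.jdbc.psql/->dialect}
                                :pool-opts {:maximumPoolSize 10}
                                :db-spec {:host "localhost"
                                          :dbname "cruxdb"
                                          :user "crux"
                                          :password "crux"}}
    :crux/tx-log         {:crux/module 'crux.jdbc/->tx-log
                          :connection-pool :crux.jdbc/connection-pool}
    :crux/document-store {:crux/module 'crux.jdbc/->document-store
                          :connection-pool :crux.jdbc/connection-pool}}))
```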
Migrations
Since there is no explicit schema, no explicit schema has to be migrated.
When changing the shape of documents, the implicit schemas still have to be considered for the different usage scenarios:
There is an implicit schema for writing new documents to Crux, which might change over time. It often helps with data integrity to also enforce this write-time schema with a tool like spec (see the sketch below).
For the schema on read, you have to consider that Crux retains all of history, which means your code must always be able to read old data as well.
You can get pretty far by avoiding breaking changes in documents. Namespaced keys make it easy to keep attributes unique, and adding attributes or making attributes optional can be done without migrating any data.
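As a sketch of enforcing the write-time schema with clojure.spec (the user attributes here are hypothetical):

```clojure
(require '[clojure.spec.alpha :as s]
         '[crux.api :as crux])

;; Hypothetical write-time schema for a user document.
(s/def :crux.db/id keyword?)
(s/def :user/email string?)
(s/def :user/name string?)
(s/def ::user (s/keys :req [:crux.db/id :user/email]
                      :opt [:user/name]))

(defn put-user!
  "Validates the document against the write-time schema before submitting it."
  [node doc]
  (if (s/valid? ::user doc)
    (crux/submit-tx node [[:crux.tx/put doc]])
    (throw (ex-info "invalid user document" (s/explain-data ::user doc)))))
```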
With a document store like Crux you can model your different data types simply with unique attribute names, or you can introduce a separate attribute that maps a document to a certain type, schema, or version. If you are explicit about mapping data to types, a breaking change requires mapping to a new type or a new version of a type.
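A sketch of that pattern with a hypothetical :app/version attribute, where a multimethod upgrades any historical shape to the latest one at read time:

```clojure
(require '[clojure.string :as str])

;; Dispatch on the (hypothetical) :app/version attribute so most of
;; the code base only ever sees the latest document shape.
(defmulti ->current-user :app/version)

;; v1 stored a single :user/full-name attribute.
(defmethod ->current-user 1 [doc]
  (let [[fname lname] (str/split (:user/full-name doc) #" " 2)]
    (-> doc
        (assoc :user/first-name fname
               :user/last-name lname
               :app/version 2)
        (dissoc :user/full-name))))

;; v2 is already the latest shape.
(defmethod ->current-user 2 [doc] doc)
```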
Many use cases work only with the latest version of documents while history is only relevant for a few specific features.
It might be helpful to migrate data to the latest schema to simplify the majority of the code base.
In Crux you can even migrate historic data to the latest version by writing these history documents with the appropriate valid-time. Then only for features where you also need to consider all transaction-times do you actually have to handle all past shapes of a document.
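Concretely, such a historic migration is just a put with explicit valid-time bounds. A sketch with a hypothetical entity and dates:

```clojure
(require '[crux.api :as crux])

;; Rewrite the slice of valid time during which the old shape was
;; valid, using the migrated document. The original shape stays
;; visible at earlier transaction-times, since those are immutable.
(crux/submit-tx node
  [[:crux.tx/put
    {:crux.db/id :user/jane           ; hypothetical entity
     :user/first-name "Jane"
     :user/last-name  "Doe"
     :app/version 2}
    #inst "2019-06-01"                ; start of that version's validity
    #inst "2020-02-01"]])             ; end of that version's validity
```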
The most appropriate strategy is highly specific to each use case.
I hope all of this is somewhat understandable. And of course there might also be other aspects to this which I have not considered. Happy to learn more!