Hi there, I would like to add some more Crux-specific points to this, since all the different ways of running Crux can be rather confusing.
I see three broad setups for running Crux at the moment:
- Crux runs in standalone mode (RocksDB, JDBC with SQLite, …) embedded in a JVM process (see the configuration sketch after this list)
- Crux runs with Kafka or JDBC (Postgres, MySQL, MS SQL Server) as storage, and there are one or more client nodes
- Crux runs in one of the two modes above, but you also run crux-http-server, and your application logic runs separately from Crux itself
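To make the first setup concrete, here is a minimal sketch of starting a standalone node with everything persisted in RocksDB. This assumes a recent Crux 1.x release with crux-rocksdb on the classpath; the directories are just placeholders:

```clojure
(require '[crux.api :as crux]
         '[clojure.java.io :as io])

;; Standalone node: transaction log, document store and indexes all
;; live in local RocksDB directories inside this one JVM process.
(def node
  (crux/start-node
   {:crux/tx-log         {:kv-store {:crux/module 'crux.rocksdb/->kv-store
                                     :db-dir (io/file "/tmp/crux/tx-log")}}
    :crux/document-store {:kv-store {:crux/module 'crux.rocksdb/->kv-store
                                     :db-dir (io/file "/tmp/crux/docs")}}
    :crux/index-store    {:kv-store {:crux/module 'crux.rocksdb/->kv-store
                                     :db-dir (io/file "/tmp/crux/indexes")}}}))
```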
Crux also splits its data across different stores (see the sketch after this list):
- The transaction log
- The document store
- The indexes inside the client nodes
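To illustrate how independently these three stores can be configured, here is a sketch of the second setup: Kafka holds the shared transaction log and document store while the node keeps local indexes in RocksDB. Module and option names follow recent Crux 1.x docs and may differ slightly between versions; the broker address is a placeholder:

```clojure
(require '[crux.api :as crux]
         '[clojure.java.io :as io])

;; Cluster node: Kafka holds the shared, golden stores, while this
;; node maintains its own local index store in RocksDB.
(def node
  (crux/start-node
   {:crux/tx-log         {:crux/module 'crux.kafka/->tx-log
                          :kafka-config {:bootstrap-servers "localhost:9092"}}
    :crux/document-store {:crux/module 'crux.kafka/->document-store
                          :kafka-config {:bootstrap-servers "localhost:9092"}}
    :crux/index-store    {:kv-store {:crux/module 'crux.rocksdb/->kv-store
                                     :db-dir (io/file "/tmp/crux/indexes")}}}))
```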
Backups
The indexes that live directly in the nodes (they can be in-memory only, but are typically persisted in RocksDB) can be rebuilt from the transaction log and don’t need backups.
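For example, if a node loses its index directory, you can simply start it against a fresh directory and wait for it to replay the log. A small sketch using crux.api/sync, which blocks until indexing has caught up:

```clojure
(require '[crux.api :as crux])

;; A node started with an empty index directory replays the
;; transaction log to rebuild its indexes; `sync` blocks until the
;; node has caught up (here with a one-hour timeout).
(crux/sync node (java.time.Duration/ofHours 1))
```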
The document store usually lives in the same place as the transaction log, but it is now also possible to keep it somewhere like S3.
In general, only the transaction log and the document store require backups.
With Kafka or JDBC you can rely on the native backup mechanisms of the underlying storage.
In standalone mode you can use Crux's backup utilities, or, with an embedded database like SQLite or H2, the database's own backup mechanism.
Connection Pooling
crux-jdbc uses HikariCP for connection pooling under the hood.
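For reference, pool options can be passed through to HikariCP via the connection pool configuration. A sketch assuming a recent crux-jdbc with Postgres; the db-spec values are placeholders, and exact option names may vary between versions:

```clojure
(require '[crux.api :as crux])

;; JDBC-backed node sharing one connection pool between the tx-log
;; and document store; :pool-opts is handed through to HikariCP.
(def node
  (crux/start-node
   {:crux.jdbc/connection-pool {:dialect {:crux/module 'crux.jdbc.psql/->dialect}
                                :pool-opts {:maximumPoolSize 10}
                                :db-spec {:host "localhost"
                                          :dbname "cruxdb"
                                          :user "crux"
                                          :password "crux"}}
    :crux/tx-log         {:crux/module 'crux.jdbc/->tx-log
                          :connection-pool :crux.jdbc/connection-pool}
    :crux/document-store {:crux/module 'crux.jdbc/->document-store
                          :connection-pool :crux.jdbc/connection-pool}}))
```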
Migrations
Since there is no explicit schema, no explicit schema has to be migrated.
When changing the shape of documents, the implicit schemas still have to be considered for the different usage scenarios:
There is an implicit schema for writing new documents to Crux, which might change over time. It often helps with data integrity to also enforce this write-time schema with a tool like spec (see the sketch below).
For the schema on read, you have to consider that Crux retains all of history, which means your code must always be able to read old data as well.
You can get pretty far by avoiding breaking changes in documents. Namespaced keys make it easy to keep attributes unique, and adding attributes or making attributes optional can be done without migrating any data.
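As a sketch of enforcing the write-time schema with clojure.spec (the user attributes here are hypothetical):

```clojure
(require '[clojure.spec.alpha :as s]
         '[crux.api :as crux])

;; Hypothetical write-time schema for a user document.
(s/def :crux.db/id keyword?)
(s/def :user/email string?)
(s/def :user/name string?)
(s/def ::user (s/keys :req [:crux.db/id :user/email]
                      :opt [:user/name]))

(defn put-user!
  "Validates the document against the write-time schema before submitting it."
  [node doc]
  (if (s/valid? ::user doc)
    (crux/submit-tx node [[:crux.tx/put doc]])
    (throw (ex-info "invalid user document" (s/explain-data ::user doc)))))
```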
With a document store like Crux you can model your different data types simply with unique attribute names, or you can introduce a separate attribute that maps a document to a certain type, schema, or version. If you are explicit about mapping data to types, a breaking change requires mapping to a new type or a new version of a type.
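A sketch of that pattern with a hypothetical :app/version attribute, where a multimethod upgrades any historical shape to the latest one at read time:

```clojure
(require '[clojure.string :as str])

;; Dispatch on the (hypothetical) :app/version attribute so most of
;; the code base only ever sees the latest document shape.
(defmulti ->current-user :app/version)

;; v1 stored a single :user/full-name attribute.
(defmethod ->current-user 1 [doc]
  (let [[fname lname] (str/split (:user/full-name doc) #" " 2)]
    (-> doc
        (assoc :user/first-name fname
               :user/last-name lname
               :app/version 2)
        (dissoc :user/full-name))))

;; v2 is already the latest shape.
(defmethod ->current-user 2 [doc] doc)
```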
Many use cases work only with the latest version of documents while history is only relevant for a few specific features.
It might be helpful to migrate data to the latest schema to simplify the majority of the code base.
In Crux you can even migrate historic data to the latest version by writing these history documents with the appropriate valid-time. Then only for features where you also need to consider all transaction-times do you actually have to handle all past shapes of a document.
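Concretely, such a historic migration is just a put with explicit valid-time bounds. A sketch with a hypothetical entity and dates:

```clojure
(require '[crux.api :as crux])

;; Rewrite the slice of valid time during which the old shape was
;; valid, using the migrated document. The original shape stays
;; visible at earlier transaction-times, since those are immutable.
(crux/submit-tx node
  [[:crux.tx/put
    {:crux.db/id :user/jane           ; hypothetical entity
     :user/first-name "Jane"
     :user/last-name  "Doe"
     :app/version 2}
    #inst "2019-06-01"                ; start of that version's validity
    #inst "2020-02-01"]])             ; end of that version's validity
```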
The most appropriate strategy is highly specific to each use case.
I hope all of this is somewhat understandable. And of course there might also be other aspects to this which I have not considered. Happy to learn more!