I think I’m missing something important; there’s something about how to use Datomic that I don’t understand…
Put more simply:
How do you design a system that justifies Datomic?
If you do traditional ER design, you can still use Datomic if you want… but is that the intended path for a system based on Datomic’s ideas?
If you decide FIRST that Datomic is the database to use (because you are learning Datomic), what does the design process look like? Is there any literature explaining this?
Well, doesn’t it seem mostly like a liberation from a straitjacket? So it all depends on which straitjacket is your context. Supposing that you begin from the SQL straitjacket:
Start with the old-and-obvious plan that is now called CQRS.
Let the center of your program’s processing be pure. Push database changes to the fringe.
Gleefully skip building a half-baked transaction log into your program. Use the real one.
Embrace the inevitable. Your program might remain in service for many years. Design for schema growth and change.
Design solid algorithms based on stable input. Your program always has a whole-db-value (as of some time x, which you may choose).
Factor “consequences” out into separate programs that follow the transaction log.
Banish all thought about caching. It’s not your problem.
Design an open system. Let the program be forthright about the distinct concerns that attach to each entity.
Like @felipegmarques and (I assume) the others who liked your post, I would like to see you expand on your points. I am also in the process of building my first Datomic system, and backend work is mostly new ground for me. Hypothetically, if you were asked, would you be willing to take part in one of the running Clojure podcasts? Not that I am in a position to decide what the podcasters do, but this is a discussion I’d really like to hear in long-form audio. A blog post would be interesting as well, but I find that having a host who gently prods with questions is a good way to tackle a complex topic.
Also, thanks for the Fowler link. Good read.
Teodor
Related: I made a thread a while back with some questions on backend architecture. @didibus provided some thorough insight that helped me then.
Gleefully skip – In pretty much any data-based project, you have tables with fields like changed_by_whom, changed_when. Or tables X and X_History. And changed_by_whom is a foreign key, so you can never delete from Who; instead you set deleted=‘Y’ and try to remember to check it in all 937 select statements. This is what I call half-baked. Half-baking takes a ton of work! >> With Datomic, the program can see the context, even the whole world, as of some previous state, at no extra charge. But what about changed_by_whom, changed_when? You can add those as attributes of the transaction in Datomic. That’s what I mean by “use the real transaction log instead” of programming your own.
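To make the “real transaction log” point concrete, here is a toy in Python. It is a hypothetical sketch, not Datomic’s API: `ToyLog`, `as_of`, `who_changed`, and the `tx/by` attribute are made-up names. Facts are appended and never overwritten, and who/when metadata is stored as facts about the transaction itself rather than as columns on every table.

```python
from typing import Any, NamedTuple


class Datom(NamedTuple):
    e: Any       # entity id
    a: str       # attribute name
    v: Any       # value
    tx: int      # transaction that asserted or retracted the fact
    added: bool  # True = assertion, False = retraction


class ToyLog:
    """Hypothetical append-only fact log (a toy, not Datomic's API)."""

    def __init__(self) -> None:
        self.datoms = []
        self.next_tx = 1000  # transaction ids double as entity ids

    def transact(self, assertions=(), retractions=(), tx_meta=None) -> int:
        tx = self.next_tx
        self.next_tx += 1
        for e, a, v in assertions:
            self.datoms.append(Datom(e, a, v, tx, True))
        for e, a, v in retractions:
            self.datoms.append(Datom(e, a, v, tx, False))
        # who/when/why are facts about the transaction entity itself,
        # so no table needs changed_by_whom / changed_when columns
        for a, v in (tx_meta or {}).items():
            self.datoms.append(Datom(tx, a, v, tx, True))
        return tx

    def as_of(self, t: int) -> set:
        """Facts that were current as of transaction t (replays the log)."""
        facts = set()
        for d in self.datoms:
            if d.tx > t:
                break  # datoms are appended in transaction order
            if d.added:
                facts.add((d.e, d.a, d.v))
            else:
                facts.discard((d.e, d.a, d.v))
        return facts

    def who_changed(self, e, a) -> list:
        """(tx, added, tx/by) for every change ever made to attribute a of e."""
        # transaction metadata = datoms whose entity is the transaction itself
        tx_facts = {(d.e, d.a): d.v for d in self.datoms if d.e == d.tx}
        return [(d.tx, d.added, tx_facts.get((d.tx, "tx/by")))
                for d in self.datoms if d.e == e and d.a == a]


log = ToyLog()
t1 = log.transact([("alice", "user/name", "Alice")], tx_meta={"tx/by": "admin"})
t2 = log.transact(retractions=[("alice", "user/name", "Alice")],
                  tx_meta={"tx/by": "bob"})
```

Retracting Alice’s name does not destroy the history: `log.as_of(t1)` still sees it, `who_changed` reports who made each change, and there is no `deleted='Y'` flag to remember in every query. Datomic indexes datoms rather than replaying a list, so this only illustrates the shape of the idea.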
Based on stable input – In pretty much any data-based project, procedures sometimes fail because of inputs that do not match up. The system architect imposed structure or exercised discipline to try to provide each procedure with consistent inputs, but compromises are made. The programmer fights a losing battle with failure modes that do not exist in Datomic. >> With Datomic, you write functions-of-a-database-value. All the inputs are there. All are consistent. Do not waste time on rigor of the kind you expected to need when working with other databases.
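A minimal sketch of “functions of a database value”, assuming a toy immutable snapshot built from Python’s `MappingProxyType` (the helpers `with_fact` and `order_total` and the key layout are hypothetical). The point is that the function receives one consistent value and never sees a write that lands halfway through its work.

```python
from types import MappingProxyType


def with_fact(db, key, value):
    """Return a NEW read-only db value; the old value is left untouched.

    Toy only: it copies the whole dict, whereas a real immutable database
    can share structure between versions.
    """
    new = dict(db)
    new[key] = value
    return MappingProxyType(new)


def order_total(db, order_id):
    """Pure function of ONE db value: every lookup reads the same snapshot,
    so the order lines and the item prices can never be out of step."""
    order = db[("order", order_id)]
    return sum(db[("item", item)]["price"] * qty for item, qty in order["lines"])


db1 = MappingProxyType({
    ("item", "pen"): {"price": 2},
    ("order", 1): {"lines": (("pen", 3),)},
})
db2 = with_fact(db1, ("item", "pen"), {"price": 5})  # a price change = new value
```

`db2` is a new value; anything still computing from `db1` keeps seeing the old price, so `order_total(db1, 1)` and `order_total(db2, 1)` are each internally consistent.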
Consequences – “After changes to entities X, do Y…” Your app does X. That’s enough! Let a second program do Y. Thus, you don’t tangle the program code related to distinct policies, you can restart X without bothering Y, you can run X and Y on separate machines, you can run Y only during the hours when power is cheapest. Now, how will Y know when to take action? >> Datomic’s transaction log is available via API. Y can efficiently “follow along” with the progress made by X.
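The “follow along” idea can be sketched as a follower that remembers its own position in the log. This is a hypothetical Python toy (`follow`, `notify_on_placed`, and the log-entry shape are made up); in Datomic’s Peer library, reading the log with `tx-range` or receiving live reports from `tx-report-queue` plays this role.

```python
from typing import Callable, List


def follow(log: List[dict], position: int, handle: Callable[[dict], None]) -> int:
    """Run `handle` on each log entry after `position`; return the new position.

    `log` stands in for the database's transaction log. Y keeps its own
    position, so it can stop, restart, or lag behind X independently.
    """
    pending = log[position:]
    for entry in pending:
        handle(entry)
    return position + len(pending)


# Hypothetical consequence "Y": react when an order is placed.
notified: List[str] = []


def notify_on_placed(entry: dict) -> None:
    for e, a, v in entry["datoms"]:
        if a == "order/status" and v == "placed":
            notified.append(e)  # stand-in for sending an email, etc.


log = [{"tx": 1, "datoms": [("order-1", "order/status", "placed")]}]
position = follow(log, 0, notify_on_placed)          # Y catches up
log.append({"tx": 2, "datoms": [("order-2", "order/status", "placed")]})  # X works on
position = follow(log, position, notify_on_placed)   # Y resumes where it left off
```

Because Y owns its `position`, X never has to know Y exists. A real follower would persist that position durably so it survives restarts.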
I have used Datomic in only one personal project and have no experience maintaining it, but the fact that you’re deeply tied to the timeline of datoms scares me, because that timeline is not necessarily the history you’re looking for.
Basically if you have to change your “database schema” or rather make any change that is pervasive, by default your correction will take effect “now” even though what you intended was to perform a fundamental change that should be seen as happening very early in the datoms timeline.
One solution is to introduce an abstraction layer to better articulate these kinds of necessary, unplanned maintenance acts: it consists of holding two time values instead of one in each datom:
the time at which the fact gets written into the db.
the time at which the fact becomes readable/is considered by the db engine during requests.
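Here is a toy version of the two-times-per-fact idea (usually called valid time and transaction time), assuming integer timestamps and made-up names (`Fact`, `lookup`). It is not the API of any particular database, just an illustration of how a retroactive correction can apply “early in the timeline” while still being recorded as written later.

```python
from typing import Any, List, NamedTuple, Optional


class Fact(NamedTuple):
    entity: str
    attr: str
    value: Any
    valid_from: int  # when the fact takes effect (business / "valid" time)
    tx_time: int     # when the fact was written into the db


def lookup(facts: List[Fact], entity: str, attr: str,
           valid_at: int, known_at: int) -> Optional[Any]:
    """Value in effect at `valid_at`, using only facts written by `known_at`."""
    candidates = [f for f in facts
                  if f.entity == entity and f.attr == attr
                  and f.valid_from <= valid_at and f.tx_time <= known_at]
    if not candidates:
        return None
    # Latest valid time wins; for equal valid times, the later write
    # (i.e. the correction) wins.
    return max(candidates, key=lambda f: (f.valid_from, f.tx_time)).value


facts = [
    Fact("emp-1", "salary", 100, valid_from=1, tx_time=10),
    Fact("emp-1", "salary", 120, valid_from=1, tx_time=20),  # retroactive fix
    Fact("emp-1", "salary", 150, valid_from=30, tx_time=30),
]
```

Asking `lookup(facts, "emp-1", "salary", valid_at=5, known_at=15)` gives 100 (what the system believed before the fix), while `known_at=25` gives 120 for the same business date.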
A new business-developed open source db will be announced a few days from now at clojurenorth, based on this concept of time called bi-temporality.
Check out this reddit post and my comment to get a better understanding of what’s at stake.
OP asked about Datomic. There are so many senses of time (bug, postmark, corrected “upstream”), and so many kinds of historical reconstruction (satisfying different laws or questions) that real-world times are inevitably the system architect’s responsibility. I like how Datomic is explicit about this. Go ahead and put whatever timestamps you want on the transaction.
While we’re at it, let’s chop off another arm from the straitjacket!
Don’t duplicate structure or schema in code. Insofar as the program’s behavior should parallel the schema or persisted data structure, express that structure as data and let the program discover it. Remember that Datomic Datalog can follow joins discovered at run-time, e.g., “where” clauses like [?e ?a ?v], in which ?a gets unified with something concrete in another clause.
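A toy illustration of letting the program discover structure from data, assuming a made-up `ui/show` flag stored as ordinary facts about attributes. The tiny matcher below (`match` and `query` are hypothetical, and far simpler than Datomic Datalog) binds `?a` in the first clause and then uses it as the attribute position of an `[?e ?a ?v]`-style clause.

```python
def match(pattern, fact, binding):
    """Unify one (e, a, v) pattern with one fact; '?x' strings are variables."""
    result = dict(binding)
    for p, x in zip(pattern, fact):
        if isinstance(p, str) and p.startswith("?"):
            if p in result and result[p] != x:
                return None  # variable already bound to a different value
            result[p] = x
        elif p != x:
            return None  # constant does not match
    return result


def query(clauses, facts):
    """Naive left-to-right nested-loop join; returns one binding per match."""
    bindings = [{}]
    for clause in clauses:
        next_bindings = []
        for b in bindings:
            for f in facts:
                extended = match(clause, f, b)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Schema-ish facts about attributes live next to ordinary data.
facts = [
    ("person/name", "ui/show", True),
    ("person/email", "ui/show", True),
    ("person/ssn", "ui/show", False),
    ("p1", "person/name", "Ada"),
    ("p1", "person/email", "ada@example.com"),
    ("p1", "person/ssn", "000-00-0000"),
]

# ?a is discovered by the first clause, then used as the attribute
# position of the second clause, like [?e ?a ?v] in Datalog.
rows = query([("?a", "ui/show", True), ("p1", "?a", "?v")], facts)
```

To display a new attribute, you would add one more `ui/show` fact; the query code does not change.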
P.S. There’s so much to say (especially before the reveal) about the competition… Let’s open another topic for that.
If you think programming with immutability and values is a good idea, afaik Datomic is the only enterprise-grade database that extends these ideas through to the database. So you don’t need to justify Datomic. In my opinion, immutability & functional programming is the default paradigm that all data processing systems want, so you need to justify not using Datomic.
In other words, as soon as a database other than Datomic is part of your application, you have lost the benefit of functional programming in the 50% of your implementation that interfaces with said imperative database.
(Unless you are a low level distributed systems engineer who is implementing an immutable database.)
When I say “justify”, it’s because I think it is a good idea to decide on the database engine later if possible (following Clean Architecture ideas).
Using what you say, I could say: “if immutability, values, and functional programming, then Datomic”.
So if you design a solution that benefits from, implies, or needs those attributes, it makes sense to choose them, and finally to choose Datomic.
Now, it is in that first design phase that I feel the “straitjacket” (good spot, @Phill) is affecting my understanding, probably because I’m so used to designing to please OO programming.
I didn’t want to ask it directly, but the honest question would be: how do you design to please Datomic?
I think I’ll open a different topic, “How do you Clojure people design software?”, but in this thread the question about Datomic is still valid.
I have to thank you all for such a large amount of great information and recommendations on the topic.
That’s a good question and a harder one. Generally speaking you want to model your domain as if it were Clojure maps, with direct references to one another.
So there are direct object references, which can be walked directly, as opposed to SQL’s foreign-key IDs, which must be JOINed and then unpacked into objects. Note the above has no explicit IDs, just object references.
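A plain-Python analogy of the difference (hypothetical data and helper names; this is not Datomic’s entity or pull API, which navigate reference attributes for you): entities that hold direct references can be walked like nested maps, while rows holding foreign-key IDs need a lookup at every hop.

```python
from functools import reduce


def walk(entity, *path):
    """Follow reference attributes the way you would navigate nested maps."""
    return reduce(lambda node, attr: node[attr], path, entity)


# Entity style: references point directly at other entities; no ID columns.
alice = {"person/name": "Alice"}
bob = {"person/name": "Bob", "person/friend": alice}
order = {"order/total": 42, "order/customer": bob}

# Row style, for contrast: foreign-key IDs must be resolved at every hop.
people = {1: {"name": "Alice", "friend_id": None},
          2: {"name": "Bob", "friend_id": 1}}
orders = {10: {"total": 42, "customer_id": 2}}


def customer_friend_name_rows(order_id):
    customer = people[orders[order_id]["customer_id"]]  # "join" 1
    return people[customer["friend_id"]]["name"]         # "join" 2
```

Both `walk(order, "order/customer", "person/friend", "person/name")` and `customer_friend_name_rows(10)` reach Alice, but only the row version needs to know which ID column leads where.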
Trying hard to keep your application-specific notions of time fully paired to the tx times is an anti-pattern with Datomic, IMO. As Phill mentions, there is absolutely nothing wrong with tracking first-class time values on entities where it matters, and in this case you still benefit from the internally tracked times (for example, to track when a migration happened). Plus, the tx times reflect the time it takes to process a transaction, which could be quite a bit after the time you actually want to track. Having multiple notions of time is a feature!
Not to be too much of a fanboy, but perhaps you should justify not using it.
For instance, are you processing clickstream/IoT/etc. data? Then yep, it’s not the best fit (but probably awesome for projections/summaries/etc. of that data).
IMO, it’s super useful in a wide variety of typical db use cases, with the added benefits of just enough schema and no need to go multi-db when you need something more graphy or documenty; when you actually do need to, say, rope in Elasticsearch, it’s super painless. And inside the Clojure ecosystem, you have a consistent language/model/abstractions from the browser to the db.
It’s been so liberating to realize that perhaps SQL (plus the obligatory something to hide it), Java/Python/etc., and FillInTheBlankScript on the frontend shouldn’t be what we consider normal for application development.