Data modelling approaches?

I’ve been thinking lately about how I’ve been approaching data modelling in a way that directly maps to the underlying storage mechanism, e.g. if I use SQL I’ll think in tables and foreign keys, if I use a document store I’ll think in nested documents etc.

However it occurs to me that in many cases this will prematurely lock the entire software design into those limitations, whereas I’d like to keep some kind of flexibility and move the persistence logic down to another abstraction layer.

Especially when starting out thinking about a problem domain, most times I’d want to use plain in-memory data structures that reflect the ideal state of my data, and then serialise those to some frontend for presentation and back-end for storage.

Does Clojure provide any mechanisms that will aid with this approach?

For example, take the usual “blog” domain with “articles” and “authors”. If two articles have the same author, I’d like to represent that author in memory only once, which means I’d use something like a var or atom to add one level of indirection. But that starts to complicate things, since the usual functions that work with nested maps will stumble.
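
To make the stumbling concrete, here is a minimal sketch, assuming the shared author is held in an atom. The names `dave`, `article-1` and `article-2` are made up for illustration.

```clojure
;; One shared author, wrapped in an atom for indirection (hypothetical names).
(def dave (atom {:author/name "dave"}))

(def article-1 {:article/title "foo" :article/author dave})
(def article-2 {:article/title "bar" :article/author dave})

;; Both articles point at the very same atom.
(identical? (:article/author article-1)
            (:article/author article-2))
;; => true

;; get-in does not look inside the atom, so the nested path yields nil.
(get-in article-1 [:article/author :author/name])
;; => nil

;; An explicit deref is needed at the point of indirection.
(:author/name @(:article/author article-1))
;; => "dave"
```

Every function that walks the nested structure would need to know where the derefs go, which is the complication described above.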

Any thoughts?

The least harmful approach I’ve found so far (the one least likely to destroy my brain) is Datomic.

IM-highly-opinionated-O “in-memory” is only great for the very first napkin draft (I prefer
Any playable data sample (say 3-7 pieces) is already too heavy for my brain to hold for even a two-hour period.

Why Datomic?

Still don’t understand why Datomic Free is not a part of Clojure RT by default.
“in-memory” is not really a database.

RICH HICKEY: the other big place you have left once you switch to a functional programming language is your database. So you can do whatever you want, you can use Clojure or Scala or Haskell, and then this database ruins everything for you, because it is a place, and most databases updated in place. There’s all the complexity associated with that, that there is with using places in memory. What Datomic endeavors to do is to say “Let’s stop doing that. We have a lot more storage space than we ever did, and we have enough that we could take a functional approach to storing our data.”

I agree that Datomic does seem targeted to this kind of problem. Unfortunately most of the snippets in the linked article rely on the Peer API which is not in the Cloud version.

If we’re talking about the early prototyping phase of a project, I’ll often do exactly as you describe: describe the ideal form of my data with in-memory data structures, serializing my prototype data to an EDN file or MapDB via spicerack.
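
As a rough sketch of the EDN-file variant (the MapDB/spicerack route is not shown), something like the following may be enough for a prototype. The function names `save-db!` and `load-db` are assumptions for illustration, not part of any library.

```clojure
(require '[clojure.edn :as edn])

(defn save-db!
  "Writes the prototype data to path as EDN text."
  [path db]
  (spit path (pr-str db)))

(defn load-db
  "Reads the prototype data back from the EDN file at path."
  [path]
  (edn/read-string (slurp path)))
```

This works as long as the data consists of plain EDN values (maps, vectors, sets, keywords, strings, numbers). Atoms and functions do not round-trip, so you would save the dereferenced value.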

For your blog/articles/authors example, I might use a single nested “database” map, stored in an atom. To store authors only once I could have an author function that takes an ID and performs a lookup on the “database”. During prototyping, the innards of that function would probably be something like (get-in @db [:authors id]).

How would the article entity have a reference to the author, so that could be later resolved? Using a namespaced key like {:article/author {:user/id 123}}?

Your suggested model is certainly workable. Depending on the complexity of the prototype’s data model I might elide the inner map, since (maybe) the only things I’ll store about a blog’s author are over in the author info. Maybe the db atom would be something like:

{:authors  {123 {:author/name "dave"}}
 :articles {456 {:article/title "foo"
                 :article/author 123}}}
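
Putting the atom and the lookup function together, a prototype might look roughly like this. It is only a sketch: `article` and `article-with-author` are hypothetical helpers added for illustration.

```clojure
;; The whole prototype "database" lives in one atom.
(def db
  (atom {:authors  {123 {:author/name "dave"}}
         :articles {456 {:article/title  "foo"
                         :article/author 123}}}))

(defn author
  "Looks up an author by ID."
  [id]
  (get-in @db [:authors id]))

(defn article
  "Looks up an article by ID."
  [id]
  (get-in @db [:articles id]))

(defn article-with-author
  "Returns the article with its author ID replaced by the author map."
  [id]
  (when-let [a (article id)]
    (update a :article/author author)))

(article-with-author 456)
;; => {:article/title "foo", :article/author {:author/name "dave"}}
```

Because the author is stored once, a single `(swap! db assoc-in [:authors 123 :author/name] "david")` is visible from every article that refers to ID 123.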

I think it’s important to note that until you have a clear idea of the APIs between different pieces of functionality, you’re still liable to find assumptions baked into your data model. This is true even if the data is modeled with Clojure data structures instead of tables or Datomic entities. That’s because Clojure’s abstract types (maps, sets, vectors, etc.) have shape and therefore embody choices just like tables with relations or a bag of namespaced facts. In some ways they embody more choices, for instance whether the data is sequential or not.
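
As a small illustration of how shape embodies a choice, here are the same two articles stored as a vector and as a map keyed by ID. All names and the second article are made up for the example.

```clojure
;; Sequential shape: order is part of the data, lookup by ID needs a scan.
(def articles-vec
  [{:article/id 456 :article/title "foo"}
   {:article/id 789 :article/title "bar"}])

;; Associative shape: lookup by ID is direct, but the map carries no order to rely on.
(def articles-map
  {456 {:article/title "foo"}
   789 {:article/title "bar"}})

(defn find-in-vec
  "Scans the vector for the article with the given ID."
  [articles id]
  (first (filter #(= id (:article/id %)) articles)))

(defn find-in-map
  "Looks the article up directly by its key."
  [articles id]
  (get articles id))
```

Code written against one shape has to change when you switch to the other, which is the kind of baked-in assumption to watch for.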

The key is to prioritize the ability to modify or throw away old toy data and data models quickly, without fuss, until you’ve thought about the problem enough to be sure you’re ready to move to the next level of commitment. So first whiteboarding, then maybe diagramming software, then in-memory data structures, then pick your storage. The hassle of change should scale inversely with your skepticism about your model.

Om Next (and its fork Fulcro) uses a vector of two elements to indicate ref types. Also, the convention is to use singular forms instead of plurals. The above becomes:

{:author/by-id  {123 {:author/name "dave"}}
 :article/by-id {456 {:article/title "foo"
                      :article/author [:author/by-id 123]}}}

Here [:author/by-id 123] just means (get-in app-state [:author/by-id 123]). Because the ref carries all the info needed to look up the associated data, you don’t need to declare somewhere (in some kind of schema?) which keyword refers to which “table”.
Rendering is just a matter of recursively looking up such associated data until you have all you need. With two-element ref types you are free of such a schema, and refactoring becomes a breeze.
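
The recursive lookup can be sketched in plain Clojure as below. This is not how Om Next or Fulcro implement it. `ref?` and `resolve-refs` are hypothetical helpers, a vector counts as a ref only if its first element names a top-level table, and cyclic refs are assumed not to occur (they would recurse forever).

```clojure
(def app-state
  {:author/by-id  {123 {:author/name "dave"}}
   :article/by-id {456 {:article/title  "foo"
                        :article/author [:author/by-id 123]}}})

(defn ref?
  "True when x is a two-element vector whose first element is a table in state."
  [state x]
  (and (vector? x)
       (= 2 (count x))
       (contains? state (first x))))

(defn resolve-refs
  "Recursively replaces refs inside x with the data they point to."
  [state x]
  (cond
    (ref? state x) (resolve-refs state (get-in state x))
    (map? x)       (into {} (map (fn [[k v]] [k (resolve-refs state v)])) x)
    (vector? x)    (mapv #(resolve-refs state %) x)
    :else          x))

(resolve-refs app-state [:article/by-id 456])
;; => {:article/title "foo", :article/author {:author/name "dave"}}
```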

