How to remodel hierarchical data as flat relationships with Datomic/Datascript?


#1

Hello!

TL;DR: I’m trying to learn how to model data as relationships with Datomic/Datascript. I’m running into a problem where data references are “local”, and I don’t really know how to handle that. Wall of text incoming.

I’d appreciate any hunches from more experienced users of Datomic/Datascript!


Problem definition

I’m working with data that’s hierarchical in the context I’m coming from, and I’d like to build a flat Datomic model that can reproduce the hierarchy. Deriving the hierarchy from a flat model sounds like the right approach to me.

Let me describe the crux of the challenge. The data I’m working with already has relations, but these are always local: they only make sense within one context in the hierarchy. Concretely: I’m modeling structures with computational mechanics. The foundational building blocks here are nodes and elements. A node has a position, and an element connects nodes. Something like this in a Clojurish syntax:

(def fem-model {:model.doc/description "A truss model of 1d elements for demonstration!"
                :model.doc/figure "   (3)
                                     / |
                                    /  |
                                   /   |
                                  /    |
                               (1) -- (2)"
                :model/nodes [{:node/id 1 :node/x 0.0 :node/y 0.0}
                              {:node/id 2 :node/x 1.0 :node/y 0.0}
                              {:node/id 3 :node/x 1.0 :node/y 1.0}]
                :model/elements [{:element/id 1
                                  :element/nodes [1 2]
                                  :element/kind :element.kind/one-dimensional}
                                 {:element/id 2
                                  :element/nodes [2 3]
                                  :element/kind :element.kind/one-dimensional}
                                 {:element/id 3
                                  :element/nodes [1 3]
                                  :element/kind :element.kind/one-dimensional}]})

But I want to store multiple models in a single database. In fact, that’s the main value proposition. We currently have self-contained models in JSON files floating around, but that prevents us from building applications that reason across models. For example: how much have the results of different analyses improved while we’ve been working on the model?

Possible approaches

  • Model the data directly, don’t use Datomic relationships.
    • Advantage: simple
    • Disadvantage: I can’t use the database to resolve relationships, and would instead have to query a whole model and build the indexes myself. Sounds like just storing a blob in Datomic.
  • Just use Datomic references directly
    • Advantage: can query with Datomic
    • Disadvantages
      • I will need to map to local IDs when I export a model
      • Relationship from element to nodes is ordered. :element/nodes [1 2 3] gives a triangle with an orientation opposite to :element/nodes [3 2 1].
  • Ignore references, and store the whole model as EDN
    • This smells like avoiding the problem.
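A minimal sketch of the “just use Datomic references” option, in plain Clojure with no Datomic or DataScript calls: each model-local id becomes a string tempid, which both Datomic and DataScript resolve within a single transaction, so node 1 of one model does not collide with node 1 of another as long as each model goes in its own transaction. The helper names and the "node-"/"element-" prefixes are hypothetical, and it assumes :node/id and :element/id are not unique/identity attributes.

```clojure
;; Hypothetical helpers: derive a transaction-local string tempid from a
;; model-local id. Uniqueness only matters within one model/transaction.
(defn node-tempid [local-id] (str "node-" local-id))
(defn element-tempid [local-id] (str "element-" local-id))

(defn model->tx-data
  "Builds entity maps for ONE transaction from a self-contained model.
  Assumes :node/id and :element/id are NOT unique/identity attributes,
  so ids from different models never upsert into each other."
  [model]
  (let [node-ents (for [n (:model/nodes model)]
                    {:db/id   (node-tempid (:node/id n))
                     :node/id (:node/id n)
                     :node/x  (:node/x n)
                     :node/y  (:node/y n)})
        elem-ents (for [e (:model/elements model)]
                    {:db/id         (element-tempid (:element/id e))
                     :element/id    (:element/id e)
                     :element/kind  (:element/kind e)
                     ;; Cardinality-many ref: the database stores a set, so
                     ;; the orientation of [1 2] vs [2 1] is NOT preserved.
                     :element/nodes (mapv node-tempid (:element/nodes e))})
        model-ent {:model.doc/description (:model.doc/description model)
                   :model/nodes    (mapv (comp node-tempid :node/id)
                                         (:model/nodes model))
                   :model/elements (mapv (comp element-tempid :element/id)
                                         (:model/elements model))}]
    (vec (concat node-ents elem-ents [model-ent]))))
```

Exporting would go the other way: pull a model and map each entity back to its :node/id / :element/id.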

I feel like I’m in foreign territory here. Modeling relationships instead of objects feels like the right thing to do, but I’m not quite sure where to start. “Read the f* manual” may be the right response here. If so, I’ll take that; I haven’t read the docs in full detail.

So!

Does this problem seem familiar? How would you approach it? Loose hunches and hard facts welcome. I’ll happily provide more details, but I guess this post is long enough by now.

Thanks!






#2

Some thoughts after sleeping on the problem:

  • This is a bunch of different (specific) questions complected into one, which makes it difficult to answer.
  • A starting approach could be to just store a whole EDN model as a single value, with some metadata as attributes, and instead write some library functions for working with models, for instance merging two models while ensuring that IDs don’t collide.
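The merge helper could look roughly like this. It is a hypothetical sketch that assumes ids are positive integers: the second model’s node and element ids are shifted past the first model’s maxima, and the element-to-node references are shifted along with them. Doc attributes are simply dropped.

```clojure
(defn shift-ids
  "Returns `model` with every node id shifted by `node-offset` and every
  element id by `element-offset`; element->node references move too."
  [model node-offset element-offset]
  (-> model
      (update :model/nodes
              (fn [nodes] (mapv #(update % :node/id + node-offset) nodes)))
      (update :model/elements
              (fn [elems]
                (mapv (fn [e]
                        (-> e
                            (update :element/id + element-offset)
                            (update :element/nodes
                                    (fn [ids] (mapv #(+ % node-offset) ids)))))
                      elems)))))

(defn merge-models
  "Hypothetical library function: concatenates two models, renumbering the
  second so its ids start after the first model's largest ids."
  [a b]
  (let [max-of (fn [k coll] (reduce max 0 (map k coll)))
        b'     (shift-ids b
                          (max-of :node/id (:model/nodes a))
                          (max-of :element/id (:model/elements a)))]
    {:model/nodes    (into (vec (:model/nodes a)) (:model/nodes b'))
     :model/elements (into (vec (:model/elements a)) (:model/elements b'))}))
```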

I could choose the attribute names carefully (:fem.model/raw-data) in case I want a more detailed representation later. Something like:

;; Datomic entity
{:fem.model/name "Test model"
 :fem.model/version 1
 ;; But store the raw data as EDN
 :fem.model/raw-data {:fem.model/elements [...]
                      :fem.model/nodes [...]}
 ;; With the ability to add a "real" semantic data representation later
 }

Then I could defer any model design decisions until I have a better understanding of the use cases.
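A Datomic attribute can’t hold an arbitrary nested map, so in practice :fem.model/raw-data would be a :db.type/string attribute holding the serialized model. A minimal round-trip sketch with pr-str and clojure.edn/read-string (the function names are hypothetical); very large models may run into string size limits, depending on the Datomic flavor.

```clojure
(require '[clojure.edn :as edn])

(defn model->entity
  "Wraps a whole model as one entity; the model itself is stored as an EDN
  string, so :fem.model/raw-data must be a :db.type/string attribute."
  [model-name version model]
  {:fem.model/name     model-name
   :fem.model/version  version
   ;; Plain map syntax and no truncation, so any EDN reader can parse it.
   :fem.model/raw-data (binding [*print-namespace-maps* false
                                 *print-length*         nil]
                         (pr-str model))})

(defn entity->model
  "Reads the stored EDN string back into Clojure data."
  [entity]
  (edn/read-string (:fem.model/raw-data entity)))
```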


#3

See these related topics:

As an alternative to the proposed solutions, you could also store each node under its own key on the element: :element/node_1, :element/node_2, etc., and use :element/kind to determine which keys to look for. You might then want to write some rules or generators so you can work with these as arrays directly in your queries.
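A sketch of the positional-key idea: the slot keywords are generated from the index, and :element/kind says how many slots to read back. The slots-per-kind table and the :element.kind/triangle kind are assumptions for illustration; in the database the slots would be ref attributes, while here they just hold local ids.

```clojure
(defn node-key
  "Keyword for the i-th (1-based) node slot, e.g. :element/node_1."
  [i]
  (keyword "element" (str "node_" i)))

;; Assumption: how many node slots each element kind has.
(def slots-per-kind {:element.kind/one-dimensional 2
                     :element.kind/triangle        3}) ; hypothetical kind

(defn element->positional
  "Replaces the ordered :element/nodes vector with one key per slot."
  [{:element/keys [nodes] :as element}]
  (-> element
      (dissoc :element/nodes)
      (merge (zipmap (map node-key (range 1 (inc (count nodes)))) nodes))))

(defn positional->nodes
  "Reads the slots back in order, using :element/kind to know how many."
  [element]
  (let [n (slots-per-kind (:element/kind element))]
    (mapv #(get element (node-key %)) (range 1 (inc n)))))
```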

Oh and this: http://www.hypergraphdb.org/


#4

I don’t have experience with Datomic, but if it is relational, then a one-to-many relation (which is what hierarchical data is; you can think of it as a tree) is normally represented by giving each child a reference to its parent.

You can then find all children of a parent by querying for those whose parent ID equals that parent’s ID, and recreate the hierarchy by joining children to parents.

Oh, and order should be explicit. So siblings that need to be ordered should have an Order attribute which can be sorted on to give you the desired order.
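In plain Clojure, the parent-reference-plus-order idea might look like this (a sketch, not Datomic code): each element’s node list is flattened into rows that point at their parent element and carry an explicit :order, and the hierarchy is rebuilt by grouping on the parent and sorting each group. The row keys :parent, :order and :node are hypothetical.

```clojure
(defn flatten-element
  "Turns one element's ordered :element/nodes vector into flat rows that each
  reference the parent element and store their position explicitly."
  [{:element/keys [id nodes]}]
  (map-indexed (fn [i node] {:parent id :order i :node node}) nodes))

(defn rebuild
  "Recreates parent -> ordered children: group rows by :parent, then sort each
  group by :order so the original vector order comes back."
  [rows]
  (into {}
        (for [[parent children] (group-by :parent rows)]
          [parent (mapv :node (sort-by :order children))])))
```

For the truss model in the question, (rebuild (mapcat flatten-element (:model/elements fem-model))) gives {1 [1 2], 2 [2 3], 3 [1 3]}, regardless of the order in which the rows come back.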


#5

@TristeFigure, @didibus

Thanks for your replies. They were helpful to clear up my understanding.

  • I had expected Datomic to be able to store arbitrary EDN, which isn’t the case.
  • I need to differentiate between raw data (which can make sense to store as the serialized string in the format I got it) and structured data.
  • I can’t expect to “just throw the data at a database” and have structure emerge. This is a non-trivial design problem, and I need to decide what structure I want.

#6

This is not too hard to model in datomic.

I use the following notation for the schema: R = reference, R* = reference with cardinality many.

:model.doc/description string
:model.doc/figure well, string
:model/nodes R*
:node/id unique/identity
:node/x, :node/y double (I guess)
:model/elements R*
:element/id unique/identity
:element/nodes R*
:element/kind either keyword, or R to an entity with {:db/ident :element.kind/one-dimensional}

So, throw in more reference types than you think you need. There is nothing wrong with several parallel reference types either; the reference itself carries information.
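The schema above written out as Datomic schema data, as a sketch. It picks the keyword option for :element/kind and :db.type/long for the ids; both are assumptions. Note that with several models in one database, a unique/identity :node/id would make node 1 of every model the same entity on upsert, so a model-scoped key would be needed in that case. DataScript expects a differently shaped schema (a map keyed by attribute).

```clojure
(defn attr
  "Shorthand for one Datomic attribute definition; extra key/value pairs
  (e.g. :db/unique) are merged in."
  [ident value-type cardinality & {:as extra}]
  (merge {:db/ident       ident
          :db/valueType   value-type
          :db/cardinality cardinality}
         extra))

(def one  :db.cardinality/one)
(def many :db.cardinality/many)

(def fem-schema
  [(attr :model.doc/description :db.type/string  one)
   (attr :model.doc/figure      :db.type/string  one)
   (attr :model/nodes           :db.type/ref     many)
   ;; Globally unique: collides across models sharing one database.
   (attr :node/id               :db.type/long    one :db/unique :db.unique/identity)
   (attr :node/x                :db.type/double  one)
   (attr :node/y                :db.type/double  one)
   (attr :model/elements        :db.type/ref     many)
   (attr :element/id            :db.type/long    one :db/unique :db.unique/identity)
   (attr :element/nodes         :db.type/ref     many)  ; a set: no order
   (attr :element/kind          :db.type/keyword one)])
```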

One thing that can be confusing is that the references you put in a vector, like [1 2], are not ordered once they are in Datomic: a cardinality-many attribute holds a set of values. You can of course sort them by their ids for sane printing and similar, but the data stored in Datomic itself is not ordered in a “user-controlled” way.
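One way to keep the order despite cardinality-many refs being a set is to reify each position as its own small entity. A sketch, assuming hypothetical attributes :element/slots (ref, cardinality many, :db/isComponent true), :slot/position (long) and :slot/node (ref); the second function sorts a pull result back into the original order.

```clojure
(defn element->tx
  "Entity map for one element: each node reference becomes a component
  'slot' entity that records its position explicitly."
  [{:element/keys [id kind nodes]} node-tempid]
  {:element/id    id
   :element/kind  kind
   :element/slots (vec (map-indexed (fn [i n] {:slot/position i
                                               :slot/node     (node-tempid n)})
                                    nodes))})

(defn ordered-nodes
  "Restores node order from a pulled element, e.g. the result of a pull with
  [{:element/slots [:slot/position {:slot/node [:node/id]}]}]."
  [pulled-element]
  (->> (:element/slots pulled-element)
       (sort-by :slot/position)
       (mapv (comp :node/id :slot/node))))
```

With this, [1 2 3] and [3 2 1] stay distinguishable, at the cost of one extra entity per element-node pair.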