Design question: DataScript vs. Nested Maps as data model

What are the advantages or disadvantages of using DataScript for modeling domain data structures, vs. using nested maps?

My feeling is that the DataScript approach is cleaner and easier to reason about, but as a newcomer I worry I am missing some design issues that will bite me later.

  • Are there performance/scalability limitations?
  • Any other situations or downsides?
  • Why is this approach not the default?
2 Likes

I’ve worked about 2 months on my project in clojure/script.

I know about Datascript, because I love working with rum and from time to time I also read tonsky’s blog.

In fact, in my project I decided to use nested maps, because my app was simple and small. I didn’t see see any extra benefits from using Datascript.

My main map represents app state, inside this map I have nested maps where each map contains state for view/widget. I could compose view/widgets to build target page.

The big difference between maps and Datascript is visible when you need shared entity across two or more views.

In case with nested maps you could repeat that entity value inside nested maps, but it is problematic if you want to update entity and still have consistent state across all views.

To avoid this you could extract only id from entity to your nested maps. But:

  1. to render components you need to select data from the app state
  2. sometimes you must search your entity in unexpected way
  3. sometimes entity gets extra data required in browser like a information about which entities are marked/selected

Datascript in this case is better then nested maps because offers API to mix data, select it (with projection) in optimized way.

1 Like

Thanks. I also found specter (library to work with nested data structures).

In an Object oriented approaches, you’d create a graph of mutable objects to represent state. A lot of the application code would then be to query/search/iterate through this object graph.

I do think that the more genric DataScript like approach will give up some performance.

I have never used Datascript in anger, but I love this question - especially “why is it not the default?” I really wonder if it could indeed be a solid default choice for many apps that need an in-memory database in which to:

  • cleanly and flexibly hold state
  • get your data model right

I particularly like the idea of prototyping fast this way.

I do have experience with large UI that uses a redux store - basically nested maps (albeit with some control over changes to the data structure) - which has become more and more unwieldy, particularly with the same data lying about at different levels. It feels very messy and hard to reason about.

I guess it also depends on your domain. Does it involve dynamically structured data (are you making an editor?) that suits nested maps. Or are you making a business automation / decision support system (like the rest of us) where writing down atomic facts as a representation of the world works just fine?

The query and transaction semantics for datascript are much more expressive.

As a result, there is more to know. One Way to view the difference is between a document store and a graph one.

I have never seen a domain model that didn’t benefit from graph over document. That being said, start less complex.

I went through the same decision a couple months ago for a small web site project. Basically, DataScript gives you a much more sophisticated query language (Datalog). Maps are maps and you can manipulate them with all the standard Clojure functions. For me, I ended up with just a large map stored in an atom, persisted to disk with Duratom. Perfectly fine for small data sets with minimal requirements.

For now I’ve decided to follow an approach similar to Fulcro/Pathom, using a map of maps, and representing entities as tuples like - [:person/id 1]

See this.

I’m still early in my journey, so I’m trying to stay within core libraries.

I worked a lot with both pure map as a storage and DataScript.
Now I think pure map is a good choice but only for very small apps (like veeery small). The data normalization can be a very complex process and you can lose a lot of performance here (even with awesome clojure functions).
For example, if I write a chat and want to have conversations and messages. If you put messages into the conversation:

{ :conversations [{:id 1 :messages [{:id 2 :text "Hello"}]}]}

Then you are fine in terms of performance (you can access messages of conversations in O(1) time, but it’s very difficult to use this from outside, like finding messages by id would be O(n^2).
Or you can normalize your store like this:

{:conversations {1 {:id 1 :messages [2]}}
 :messages {2 {:id 2 :text}}}

This is much more convenient way (redux authors recommend always do this), but querying data can be very complex and even accessing all messages of a single conversation means O(n) instead of O(1) in previous case.
And this is a very simple example, you can have like 5 levels instead of one and you almost always will loose a lot with simple atom.

On the other hand, datascript has indexes, it’s quite fast (it depends of your query but it’s anyway faster than mapping and querying manually). Also, you don’t need to normalize data (it’s already done for you). So, I would vote for DataScript.
Here are two libraries which can help you with that:
Posh If you want to just transform your queries into atoms and re-posh if you want to use datascript as your data storage inside for re-frame

I have some interesting anecdotes about the subject.
I once started writing a relational engine for in-memory Clojure data. The idea was to keep data as close to nested maps as possible, and to adapt and optimize the schema behind, add indexes, etc … by observing how data was linked and how queries were made.
Wrote something like 200 lines of code and had a hard time debugging them so I discarded all my work and started work on a debugger instead.

This debugger was fairly simple: traverse the code and add println statements everywhere. Actually, those weren’t exactly calls to println, but rather calls storing debugging data into a store.

At first i used datascript, but it was too slow (it was way before it got optimized, but still I was working on a debugger and that didn’t cut it). Eventually I turned to nested-maps and got several orders of magnitude of improvement in speed.
Thinking about this tradeoff between expressibility and speed, another solution might be https://github.com/redplanetlabs/specter, a library that relies on “selectors” (somewhat similar to queries) to fetch data into nested Clojure structures, to try and get the best of both worlds. (Specter is claimed to be even faster than vanilla Clojure).

Recently, as i was helping a friend design a schema for a C(ontent)MS, i started writing it in EDN and when I was done I put in 30 minutes more of additional coding, and ended up with a fairly respectable pseudo-sql engine. It’s not performant at all, but it should be enough to deal with small hand-crafted datasets used at compile time (for instance to generate static pages or code within macros). Let me show you its features.

So here is the schema

(def schema
  ;; So I'll try to explan along what all of this means.

  ;; here we store each entity class in a map by name (obviously)
  {:entities {;; abstract here means this entity-class should never be instantiated
              ;; only inherited by another class. It isn't implemented (yet) but it
              ;; doesn't hurt clarity.
              :base-entity  {:abstract true}
              :named-entity {:abstract true
                             ;; We specify which entity-class(es) this one
                             ;; descends from. It's based on merging maps,
                             ;; accepts multiple inheritance, so it's closer
                             ;; to how prototypes works actually.
                             ;; Still very useful in order to avoid repeating
                             ;; the same stuff over and over like in
                             ;; datascript/datomic
                             :include  [:base-entity]
                             ;; I'm pretty proud of this one: since we're
                             ;; building an engine for a hand-curated data,
                             ;; we'd like to avoid arbitrary ids (ints, uuids).
                             ;; So rather than having `id` as a field let's have
                             ;; it as a function instead: it must return a submap
                             ;; from an entity (a map) that garantees it will be
                             ;; unique with respect to ids computed from
                             ;; entities of the same type.
                             ;; When :id is associated to a keyword (here :name)
                             ;; it will get compiled into
                             ;; `(fn [entity]  (select-keys entity [kw]))`
                             ;; The same applies if :id is associated to a seq
                             ;; of keywords (=> this sql engine supports
                             ;; composite ids).
                             :id       :name
                             :fields   {:name :keyword}}
              ;; As a result of inheritance we get these kind of beautiful
              ;; one-liners
              :epreuve      {:include  [:named-entity]} 
              :genre        {:include  [:named-entity]
                             ;; Some fields to know what we can query and
                             ;; how to validate data and ensure those ‘hands'
                             ;; I talked about above did not introduce data
                             ;; breaking the schema. I suppose I'll have to
                             ;; find a way to state which fields are required
                             ;; or optional
                             :fields   {:titre :string}}
              :menu         {;; This entity class isn't identified by 
                             ;; `:name`. No, this one uses a composite
                             ;; identifier.
                             :include  [:base-entity]
                             :id       [:epreuve :genre]
                             ;; You'll notice fields can be primary values as
                             ;; well as entities themselves. What this means in
                             ;; practice is that the :epreuve field (epreuve
                             ;; means exam in French) will be an `id` for an
                             ;; entity of the epreuve class: it will look like
                             ;; this:
                             ;; {... :epreuve {:name :impossibru-exam} ...}
                             ;; They can also be sequences of such ids, as it is
                             ;; the case for the `:categories` field.
                             :fields   {:epreuve    :epreuve
                                        :genre      :genre
                                        :categories [:categorie]
                                        :intro      :string}}
              :categories   {;; To query entities that are related to a given
                             ;; one, you take the id map that represents the
                             ;; linked entity {:name :hardcore-category},
                             ;; inject the given entity id into it like this
                             ;;   {:name    :hardcore-category
                             ;;    :epreuve {:name :impossibru-exam}}
                             ;, Then you just filter the linked entity
                             ;; instances, something like this:
                             ;; (filter (fn [m]
                             ;;           (= (select-keys m [:name :epreuve])
                             ;;              {:name    :hardcore-category
                             ;;               :epreuve {:name :impossibru-exam}}))
                             ;;         categories)
                             ;; For now, the foreign key is named after the
                             ;; linked entity class, there's room for improvement
                             :include  [:named-entity]
                             :id       [:name :epreuve :genre]
                             :fields   {:epreuve :epreuve
                                        :genre   :genre
                                        :articles [:article]}}
              :article      {:include  [:named-entity]
                             :id       [:name :categorie]
                             :fields   {:categorie :categorie}}}})

The data looks like this:

(def data
  {:epreuves   [{:name :oral}
                {:name :ecrit}]
   :genres     [{:name :theatre :titre "Théâtre"}
                {:name :roman   :titre "Roman et le récit"}
                {:name :poesie  #_...}
                {:name :idees   #_...}]
   :menus      [{:epreuve    :oral
                 :genre      :theatre
                 :categories [:racine :beaumarchais :voltaire]
                 :intro      "Hoc libro ..."}
                {:epreuve    :ecrit
                 :genre      :theatre
                 :categories [:astuces :dissertations :commentaires]
                 :intro      "Hoc libro ..."}]
   :categories [(defaults :genre :theatre
                  (defaults :epreuve :oral
                    {:name     :racine
                     :articles [:parcours :passion-et-tragedie :un-autre-extrait]}
                    {:name     :beaumarchais
                     :articles [#_...]}
                    {:name     :molière
                     :articles [#_...]})
                  (defaults :epreuve :ecrit
                    {:name     :astuces
                     :articles [:les-bons-tuyaux :une-autre-astuce :encore-un]}
                    {:name     :dissertations
                     :articles [#_...]}
                    {:name     :commentaires
                     :articles [#_...]}))]
   :articles   [(defaults :genre :theatre
                  (defaults :epreuve :oral
                    (defaults :oeuvre "Racine" :auteur "Phède"
                      {:name :parcours            :titre "Parcours"}
                      {:name :passion-et-tragedie :titre "Passion et Tragédie"}
                      {:name :un-autre-extrait    :titre "Un autre extrait"}))

                  (defaults :epreuve :ecrit
                    {:name :les-bons-tuyaux       :titre "Les bons tuyaux"}
                    {:name :une-autre-astuce      :titre "Une autre astuce"}
                    {:name :encore-une-astuce     :titre "Encore une astuce"}

                    {:name :un-commentaire        :titre "Un commentaire"}
                    {:name :un-autre-commentaire  :titre "Un autre commentaire"}
                    {:name :encore-un-commentaire :titre "Encore un autre commentaire"}))]})

A note about defaults. It’s a macro that will merge default key-value pairs in the enclosed maps. Yet another trick to avoid repeating oneself. It’s based on the section macro (not show here) that I used to tree-structure these records. In the end I hope to write two similar fns/macros:

  • directory: to have these list of records in their own separate file, in a hierarchy of directory derived from data added by the section macro.
  • file: ditto, but applies to a record’s single field rather than the whole record.

At last, here is the implementation (still a work in progress):

(def ^:dynamic *config*)
(def ^:dynamic *schema*)
(def ^:dynamic *data*)

;; ---------- U T I L S
(defn- position-in [coll x]
  (.indexOf coll x))

;; ------------------ A S S O C I A T I V E
(defn plan-merge
  "Like `merge-with` except that the combination fn of a specific pair
  of entries is determined by looking up their key in `plan`. If not
  found, falls back to the function found under key `:else` or if not
  provided to a function that returns the value in the right-most map,
  thus providing the behavior of `merge`.
  In addition to a map, `plan` can also be a function accepting a key
  and returning a combination fn for the two values to merge."
  [plan & maps]
    (when (some identity maps)
      (let [merge-entry (fn [m e]
                          (let [k (key e) v (val e)]
                            (if (contains? m k)
                              (let [else-f (get plan :else #(identity %2))
                                    f (get plan k else-f)]
                                (assoc m k (f (get m k) v)))
                              (assoc m k v))))
            merge2 (fn [m1 m2]
                     (reduce merge-entry (or m1 {}) (seq m2)))]
        (reduce merge2 maps))))

(defn map-keys [f m]
  (->> m
       (map (fn [[k v]]  [(f k) v]))
       (into (empty m))))

(defn map-vals [f m]
  (->> m
       (map (fn [[k v]]  [k (f v)]))
       (into (empty m))))

;; ------------------ W E A V I N G (function composition)
(defn- relative-key? [kw]
  (-> kw name first (= \?)))

(defn- absolutize [kw]
  (-> kw
      (when->> relative-key?
        (->> name rest (apply str) keyword))))

(defn- bound-together? [m pred & ks]
  (let [finds (map->> ks absolutize (find m))]
    (and (every? some? finds)
         (apply pred (map->> finds val)))))

(defn- pred-map| [m outer]
  (letfn [(transform
            [[k v]]
            [k (condp #(%1 %2) v
                 (or| fn? set?) v
                 relative-key?  (| bound-together? = k v)
                 sequential?    (walk transform
                                      (fn [preds]
                                        (fn [e]
                                          (every? identity (map #(% e) preds))))
                                      v)
                 associative?   (pred-map| v outer)
                 ;; else
                 (fn [e]  (and (contains? e k)
                               (= (get e k) v))))])]
    (if (fn? m)
      m
      (->| (juxtm| (map transform m))
           (|| into {})
           outer))))

;; ------------------ P L U R A L
(def ^:private *singularizer* nil)

(defn- simple-singular [k]
  (->> k name butlast (apply str) keyword))

(declare entity-schema)
(defn- singularize
  ([table-name] (singularize  (entity-schema table-name)  table-name))
  ([schem table-name]
    (or-> (-> schem :plural)
          (*singularizer* table-name))))

;; ---------- S O U R C E S
(declare config schema data)

(def sources
  {:default {:config config
             :schema schema
             :data   data}})

(defmacro with-source [k & body]
  `(let [src# (get sources k)]
     (binding [*config* (:config src#)
               *schema* (:schema src#)
               *data*   (:data   src#)]
       [email protected])))

(defn set-source! [k]
  (let [src (get sources k)]
    (alter-var-root #'*config* (constantly (:config src)))
    (alter-var-root #'*schema* (constantly (:schema src)))
    (alter-var-root #'*data*   (constantly (:data   src)))))

;; ---------- M I N I - S Q L
(defn from [table]
  ^{:name :from}
  (fn []
    (-> table
        (when-> keyword?
          *data*
          (with-meta {::from table})))))

(defn AND [m]
  (pred-map| m (->| vals (|| every? identity))))

(defn OR [m]
  (pred-map| m (->| vals (|| some identity))))

(defn where [pred-map]
  ^{:name :where}
  (fn [table]
    (let [pred  (-> pred-map (when-not-> fn? AND))]
      (filter pred table))))

(defn fields [& fields]
  ^{:name :fields}
  (fn [table]
    (map-> table (select-keys fields))))

(defn order [& fields]
  ^{:name :fields}
  (fn [table]
    (let [fields   (-> fields (map-> (when-not-> sequential? (vector :asc))))
          ks       (map first fields)
          ords     (map second fields)
          get-vals (|| map  (|| get table)  fields)
          cmpr     (comparator (fn [a b]
                                 (->> (map (fn [x y ord]
                                             (let [res (compare x y)]
                                               (case ord
                                                 :asc  res
                                                 :desc (- res))))
                                           a b ords)
                                      (drop-while zero?)
                                      first)))]
      (sort-by get-vals cmpr table))))

(def ^:private exec-order
  [:from :hydrate :include :where :fields :order])

(declare hydrate)
(defn select [& fs]
  (let [fs (vec fs)
        fs (->> (cons (hydrate)
                      fs)
                (sort-by (juxt (->| meta :name (|| position-in exec-order))
                               (|| position-in fs))))]
    ((apply ->| fs))))

;; ---------- S T R U C T U R A T I O N
(def ^:dynamic ^:private *section* nil)

(defn- parse-section-args [args]
  (->> (split-with (not| coll?) args)
       (map vec)))

(defmacro section [& args]
  (let [[keys contents] (parse-section-args args)]
    `(let [ks# ~keys]
       (binding [*section* (vec (concat *section* ks#))]
           (with-meta [[email protected]] {::section ks#})))))

(defn- parse-defaults-args [args]
  (->> args
       (partition-all 2)
       (split-with #(some-> % first coll? not))
       (>>- (let-> [m        (->> first (into (ordered-map)))
                    contents (->> second (apply concat) vec)]
              (<- [m contents])))))

(defmacro defaults [& args]
  (let [[m contents] (parse-defaults-args args)]
    (doto `(section [email protected](apply concat m)
                    (with-meta [[email protected]] {::defaults ~m}))
          pprint
          (<- (newline)))))


;; ---------- H Y D R A T I O N
(def ^:private merge-entity-schemas
  (partial plan-merge {:include (->| concat vec)
                       :fields  merge}))

(defn- expand-entity-schema [entity-name]
  (let [entity     (-> *schema* :entities entity-name)
        inclusions (:include entity)]
    (->> inclusions
         (map (->| expand-entity-schema (| merge-entity-schemas entity)))
         (apply merge-entity-schemas))))

(defn- entity-schema [entity-name]
  (-> (expand-entity-schema entity-name)
      (when-> identity (assoc :type entity-name))))

(defn- apply-sections+defaults [data]
  (let [merge-defaults (|| merge (-> data meta ::defaults))
        add-section    (| update :section
                          (->| (|| concat (-> data meta ::section))
                               vec))]
    (-> data
        (when-not->> (every? map?)
          (mapcat #(-> (apply-sections+defaults %)
                       (map-> merge-defaults
                              add-section)))))))

(defn install-type [e schem table-name]
  (when-not-> e :type
    (assoc :type (singularize schem table-name))))

(defn- install-id [e schem]
  (when-not-> e :id
    (assoc :id (select-keys e (:id schem)))))

(defn- hydrate []
  ^{:name :hydrate}
  (fn [table]
    (let [table-name (-> table meta ::from)
          schem (entity-schema table-name)]
      (-> table
          (apply-sections+defaults *data*)
          (map-> (install-type schem table-name)
                 (install-id schem))))))

(print-table
  (select #_(fields :titre)
          (from :articles)
          #_(where {:epreuve :oral :genre :theatre})
          #_(order [:id :desc])))
; (select (from :articles) (where {:name :les-bons-tuyaux}))