Spec-ing entities and how to organize the specs and our entity keys? A new approach?

didibus · November 12, 2020, 8:45am

TL;DR Jump to “Fourth pass idea” section at the very end, where I explain a new approach I’m thinking of using to name my specs and qualify my entity map keys.

Otherwise read the whole thing if you want to follow my iteration towards the idea, and the issues I’m trying to solve with it.

First pass

When I first used Spec, I used :: to declare my domain model specs, and I put them all in a single namespace, say com.org.app.data-model. So in it you’d have:

(ns com.org.app.data-model
  (:require [clojure.spec.alpha :as s]))

(s/def ::person
  (s/keys :req [::name ::address]))

This turned out to be a bit of a mistake, mostly because some of those entities eventually crossed the boundary of my app: I persisted a ::person map to a DB, a client passed me one in a request, and so on. Once that happened, :: became a liability. Someone moved the data model from one place to another, so the namespace became (ns com.org.another-app.data-model), which of course meant that all the spec names and keys changed. Data generated before the move had keys such as :com.org.app.data-model/name, which no longer matched the spec for :com.org.another-app.data-model/name.

It wasn’t the worst thing to fix: we could use the long form in s/keys, i.e. (s/keys :req [:com.org.app.data-model/name ...]). That is uglier and took a little time, but it wasn’t that bad.

That said, there was a worse problem on the other side: we were creating ::person maps in some other namespace, also using ::.

(ns com.org.app
  (:require [com.org.app.data-model :as dm]))

(defn make-person
  [name address]
  #::dm{:name name :address address})

And when we moved the data model to com.org.another-app.data-model, the namespace of the keys in our person maps changed too. Those maps had also been exchanged with clients and persisted.
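To make the mechanism concrete: :: is resolved by the reader against whatever namespace is current when the code is read, so identical source text yields different keywords once the namespace is renamed. A small REPL sketch; `read-in-ns` is a hypothetical helper that reads a string as if it appeared in the given namespace:

```clojure
(defn read-in-ns
  "Reads the string s as if it appeared in the source of namespace ns-sym."
  [ns-sym s]
  (binding [*ns* (create-ns ns-sym)]
    (read-string s)))

;; Same source text, two different namespaces, two different keywords.
(read-in-ns 'com.org.app.data-model "::name")
;; => :com.org.app.data-model/name

(read-in-ns 'com.org.another-app.data-model "::name")
;; => :com.org.another-app.data-model/name
```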

So now we have two types of person map in prod:

{:com.org.another-app.data-model/name "..."
 :com.org.another-app.data-model/address "..."}

;; and

{:com.org.app.data-model/name "..."
 :com.org.app.data-model/address "..."}

Well, now what do we do with the Spec?

(s/def ::person
  (s/keys :req [(or (and :com.org.app.data-model/name :com.org.app.data-model/address)
                    (and :com.org.another-app.data-model/name :com.org.another-app.data-model/address))]))

So we learned our lesson, and stopped using ::.
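Another way out of the two-shapes situation, instead of keeping the `or` in the spec indefinitely, is to normalize legacy maps to the current key namespace wherever data enters the app. A minimal sketch using `clojure.set/rename-keys`; `legacy-key-renames` and `legacy->current` are hypothetical names, and only the two keys from the example are handled:

```clojure
(require '[clojure.set :as set])

;; Maps each old key to its replacement under the new namespace.
(def legacy-key-renames
  {:com.org.app.data-model/name    :com.org.another-app.data-model/name
   :com.org.app.data-model/address :com.org.another-app.data-model/address})

(defn legacy->current
  "Rewrites old-namespace keys to the current namespace; other keys pass through."
  [person]
  (set/rename-keys person legacy-key-renames))

(legacy->current {:com.org.app.data-model/name    "Ann"
                  :com.org.app.data-model/address "1 Main St"})
;; => {:com.org.another-app.data-model/name "Ann",
;;     :com.org.another-app.data-model/address "1 Main St"}
```

The spec then only needs the new keys, at the cost of making sure every read path goes through the normalization.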

Second pass

Next time, I thought, well, using :: is prone to issues, so I won’t use it, and I will just type fully qualified namespaces everywhere. But that’s ugly, and I am lazy, so I thought, let’s not make it a really long namespace then:

(ns com.org.app.data-model
  (:require [clojure.spec.alpha :as s]))

(s/def :app/person
  (s/keys :req [:app/name :app/address]))

(ns com.org.app
  (:require [com.org.app.data-model]))

(defn make-person
  [name address]
  #:app{:name name :address address})

This has been fine so far, with no collision on app yet, but you can see how one could happen in theory, so I still don’t find this ideal. Also, typing :app all the time is annoying (our real app has a long name, not a nice little 3-letter one like app).

With this approach, you can rename your namespaces and move things around, and everything still works, since your spec keys and your data keys never change and therefore always stay in sync. If you ever rename your app, though, you’re stuck with the old name in your keys.

Third pass

I also wondered: do I even need my keys to be namespaced in my data? The only use case I can think of is someone finding the data without knowing where it came from or who owns it, and wanting to know what the authority on it is. The namespace would make the data self-describing in that sense: they could figure out where it came from, and they’d know exactly which spec is its schema. But I’m not sure how valuable that is, so I thought, screw it, I’m going back to unqualified keys. When you do that, the problem I had with :: partially disappears, because :: is now only used to look up the spec in code, so you can use it again:

(ns com.org.app.data-model
  (:require [clojure.spec.alpha :as s]))

(s/def ::person
  (s/keys :req-un [::name ::address]))

(ns com.org.app
  (:require [com.org.app.data-model]))

(defn make-person
  [name address]
  {:name name :address address})

Now I could freely move my make-person function elsewhere, or rename my data-model namespace, and everything would still work.
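This works because `:req-un` matches map keys by the name part of each spec keyword only, so the code namespace never shows up in the data. A quick check, written with fully qualified keywords so it runs from any namespace (`:address` has no registered spec, so only its presence is checked):

```clojure
(require '[clojure.spec.alpha :as s])

(s/def :com.org.app.data-model/name string?)
(s/def :com.org.app.data-model/person
  (s/keys :req-un [:com.org.app.data-model/name
                   :com.org.app.data-model/address]))

;; Unqualified :name and :address satisfy the spec.
(s/valid? :com.org.app.data-model/person {:name "Ann" :address "1 Main St"})
;; => true

;; :name is still checked against string?.
(s/valid? :com.org.app.data-model/person {:name 42 :address "1 Main St"})
;; => false

;; :address is required even though no spec is registered for it.
(s/valid? :com.org.app.data-model/person {:name "Ann"})
;; => false
```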

This is currently my favorite approach.

Fourth pass

But yesterday, I had an issue with the Third pass approach:

(ns com.org.app.data-model
  (:require [clojure.spec.alpha :as s]))

(s/def ::name string?)

(s/def ::person
  (s/keys :req-un [::name ::address]))

;; Uh oh! I have a key conflict!
(s/def ::name #{:sony :microsoft :nintendo})

(s/def ::business
  (s/keys :req-un [::name ::address]))
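The conflict is more than cosmetic: the spec registry is global and keyed by keyword, and `s/keys` looks key specs up at validation time, so the second `s/def` silently replaces the first and `::person` starts validating names against the business set. A sketch with fully qualified keywords:

```clojure
(require '[clojure.spec.alpha :as s])

(s/def :com.org.app.data-model/name string?)
(s/def :com.org.app.data-model/person
  (s/keys :req-un [:com.org.app.data-model/name]))

(s/valid? :com.org.app.data-model/person {:name "Ann"})
;; => true

;; Redefining the same key replaces the earlier spec for every user of it.
(s/def :com.org.app.data-model/name #{:sony :microsoft :nintendo})

(s/valid? :com.org.app.data-model/person {:name "Ann"})
;; => false
```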

So the solution is to create another namespace for the business entity, but go down this path and you end up with a lot of little files, and my code base starts to look like Java (unless I put multiple namespaces in a single file, which I feel I shouldn’t, though that might just be an uncalled-for fear of mine).

(ns com.org.app.data-model.person
  (:require [clojure.spec.alpha :as s]))

(s/def ::name string?)

(s/def ::person
  (s/keys :req-un [::name ::address]))

(ns com.org.app.data-model.business
  (:require [clojure.spec.alpha :as s]))

(s/def ::name #{:sony :microsoft :nintendo})

(s/def ::business
  (s/keys :req-un [::name ::address]))

This would happen with the Second pass approach as well:

(ns com.org.app.data-model
  (:require [clojure.spec.alpha :as s]))

(s/def :app/name string?)

(s/def :app/person
  (s/keys :req [:app/name :app/address]))

;; Uh oh! I have a key conflict!
(s/def :app/name #{:sony :microsoft :nintendo})

(s/def :app/business
  (s/keys :req [:app/name :app/address]))

Here I can fix it by calling it :app.business/name to distinguish it, which comes with its own drawbacks, such as the awkward mixed literal #:app{:app.business/name "" :address ""}. You could choose to always include the entity, so you’d have :app.business/address as well, but then the keywords get longer and longer to type.
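For reference, this is how the reader expands such a mixed literal: keys that are already qualified keep their own namespace, and only the bare keys pick up `app`:

```clojure
;; A namespaced map literal only qualifies keys that are not already qualified.
(def business #:app{:app.business/name "Sony" :address "addr"})

business
;; => {:app.business/name "Sony", :app/address "addr"}
```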

Fourth pass idea

So I thought a little about all of this, and I came up with this:

(ns cool-def-key-lib)

(def key-ns
  (atom {}))

(defn defkey
  [alias keyns]
  (swap! key-ns assoc (str alias) (str keyns)))

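;; Ideally this would be registered once, globally. (A data_readers.clj file only
;; maps specific tag symbols to reader vars, so it can't declare a catch-all like this.)
(set! *default-data-reader-fn*
      (fn [tag-sym value]
        (if-not (.startsWith (str tag-sym) "key")
          ;; Not one of our tags: fail loudly instead of silently reading nil.
          (throw (ex-info (str "No reader function for tag " tag-sym) {:tag tag-sym}))
          (if (map? value)
            ;; #key:alias {...}: qualify every unqualified key with alias's namespace.
            (let [tag-key-str (second (re-find #"key:(.*)" (name tag-sym)))]
              (if-let [tag-key (@key-ns tag-key-str)]
                (reduce (fn [acc [k v]]
                          (if (qualified-keyword? k)
                            (assoc acc k v)
                            (assoc acc (keyword tag-key (name k)) v)))
                        {} value)
                (throw (ex-info (str "No keyword namespace defined for " value) {}))))
            ;; #key alias or #key alias/name: look up alias's keyword namespace.
            (let [ns (namespace value)
                  na (name value)]
              (if-let [k (@key-ns (str (or ns na)))]
                (keyword (name k) na)
                (throw (ex-info (str "No keyword namespace defined for " value) {}))))))))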

This gives you defkey, which registers a short alias for a keyword namespace in a registry that is separate from code namespaces. It also gives you the #key and #key:foo {:bar "baz"} tagged literals, which use that alias to produce fully qualified keywords decoupled from the current namespace or any other code namespace.

With this I can now do:

(ns com.org.app.data-model
  (:require [cool-def-key-lib :refer [defkey]]
            [clojure.spec.alpha :as s]))

(defkey 'person 'com.org.app.data-model.person)

(s/def #key person/name
  string?)

(s/def #key person
  (s/keys :req [#key person/name
                #key person/address]))

(defkey 'business 'com.org.app.data-model.business)

(s/def #key business/name
  #{:sony :microsoft :nintendo})

(s/def #key business
  (s/keys :req [#key business/name
                #key business/address]))

(ns com.org.app
  (:require [com.org.app.data-model]
            [cool-def-key-lib :refer [defkey]]))

(defn make-person
  [name address]
  #key:person {:name name :address address})

To help you understand:

(defkey 'foo 'com.my.long.namespace.foo)

#key foo
;=> :com.my.long.namespace.foo/foo

#key foo/bar
;=> :com.my.long.namespace.foo/bar

#key:foo {:bar "baz" :biz "fuzz" :some/other "ns"}
;=> {:com.my.long.namespace.foo/bar "baz"
;    :com.my.long.namespace.foo/biz "fuzz"
;    :some/other "ns"}
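To try the examples above without a global `set!`, the same reader logic can be wrapped in a named function and bound locally with `binding`. This is a restructured sketch of the code above rather than a published library; `key-reader` and `lookup-key-ns` are hypothetical names, and unlike the original it leaves non-keyword map keys untouched:

```clojure
(def key-ns
  "Registry of alias string -> keyword namespace string."
  (atom {}))

(defn defkey
  [alias keyns]
  (swap! key-ns assoc (str alias) (str keyns)))

(defn- lookup-key-ns
  "Returns the keyword namespace registered for alias, or throws."
  [alias form]
  (or (get @key-ns alias)
      (throw (ex-info (str "No keyword namespace defined for " form)
                      {:alias alias}))))

(defn key-reader
  "Handles #key sym and #key:alias {map}; any other unknown tag is an error."
  [tag value]
  (let [tag-str (str tag)]
    (cond
      ;; #key foo => :<ns of foo>/foo, #key foo/bar => :<ns of foo>/bar
      (and (= tag-str "key") (symbol? value))
      (keyword (lookup-key-ns (or (namespace value) (name value)) value)
               (name value))

      ;; #key:foo {...} => qualify each unqualified keyword key with foo's namespace
      (and (.startsWith tag-str "key:") (map? value))
      (let [kns (lookup-key-ns (subs tag-str 4) value)]
        (reduce-kv (fn [acc k v]
                     (assoc acc
                            (if (and (keyword? k) (not (qualified-keyword? k)))
                              (keyword kns (name k))
                              k)
                            v))
                   {} value))

      :else
      (throw (ex-info (str "No reader function for tag " tag) {:tag tag})))))

(defkey 'foo 'com.my.long.namespace.foo)

(binding [*default-data-reader-fn* key-reader]
  (read-string "#key:foo {:bar \"baz\" :some/other \"ns\"}"))
;; => {:com.my.long.namespace.foo/bar "baz", :some/other "ns"}
```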

I don’t love relying on what is essentially new, non-standard syntax, but I also like this approach quite a bit. It means my keys can be fully qualified and guaranteed to be globally unique, which makes them self-describing and tells you exactly which spec, anywhere in the world, specifies their value.

It solves my problem: I can move things around and refactor, and everything still works, because code namespaces and keyword namespaces are separate.

And it creates a convention for naming entities and their keys: the entity is :uri.entity-name/entity-name and its keys are :uri.entity-name/key-name.

So I’m curious how others deal with this, if there are strategies I missed, what you think of these approaches, and especially your thoughts on that Fourth pass one.

P.S.: @alexmiller told me on the slack:

just fyi, I am currently working on a solution to this (lightweight alias) for Clojure with Rich (probably 1.11)

No idea if this will resemble at all my Fourth pass or not, but something to look out for.
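For readers on Clojure 1.11 or later: the lightweight alias mentioned here shipped as the `:as-alias` option of `require`, which creates an alias without loading a namespace of that name (none needs to exist). A sketch of the Fourth pass example written that way, reusing the namespace names from the post:

```clojure
;; Requires Clojure 1.11+. com.org.app.data-model.person doesn't need to exist as a file.
(ns com.org.app
  (:require [clojure.spec.alpha :as s]
            [com.org.app.data-model.person :as-alias person]))

(s/def ::person/name string?)
(s/def ::person/person
  (s/keys :req [::person/name ::person/address]))

(defn make-person
  [name address]
  ;; Expands to {:com.org.app.data-model.person/name ..., :com.org.app.data-model.person/address ...}
  #::person{:name name :address address})

(s/valid? ::person/person (make-person "Ann" "1 Main St"))
;; => true
```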

thheller · November 12, 2020, 9:39am

In my opinion all approaches are relying too much on the namespace and too little on the name.

I stopped using “generic” keyword names everywhere and use more specific names instead. So instead of :name I’ll use :person-name or :product-name, or whatever other “prefix” makes sense. I’ll often still add an app-specific namespace because namespaces still make sense. Often I’ll go with an otherwise empty namespace (or the one the specs live in) like app.model instead of dedicated app.person namespaces. Yes, the model ns can get huge if you add specs there, but I’m fine with that. The major goal here is to have a namespace that everything can alias easily. It is often a .cljc namespace shared between CLJ/CLJS.

Apart from solving the alias headaches, this also solves the problem of accidentally shadowing clojure.core functions or conflicting destructuring names.

(defn order-list-item [{::m/keys [product-name]} {::m/keys [person-name]}] ...)
;; vs.
(defn order-list-item [{::product/keys [name]} {::person/keys [name]}] ...)
;; can't do this, also hides clojure.core/name

Maybe this is something worth considering. It does work well for me.

alexmiller · November 12, 2020, 2:10pm

Seems like you have reinvented aliases and alias support in namespace map syntax. Aliases are the right answer here - use long names for proper qualification, use short names in code for conciseness.

The only downside of aliases is that they require a real loadable namespace right now. We are working on fixing this problem in Clojure, probably for 1.11. In the meantime, I think using the create-ns/alias trick is probably what I’d recommend.

didibus · November 12, 2020, 7:29pm

@alexmiller A question about migrating to whatever Clojure 1.11 or next will have for lightweight aliases. Would the :: just operate the same over them, so if I’m currently creating namespaces with create-ns and calling alias on it, in the future I’d be able to just swap out my aliasing function to create a more lightweight variant? But the usage pattern would be the same?

This was the create-ns/alias trick, by the way (it was my first attempt at the Fourth pass, on Slack):

(defn defalias
   "Aliases the given alias-sym in the current namespace to a new namespace
    of `<current-namespace>.alias-sym`."
   [alias-sym]
   (let [sym (symbol (str *ns* "." alias-sym))]
     (create-ns sym)
     (alias alias-sym sym)))
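A quick check of what that `defalias` produces. Because the alias target is derived from the current code namespace, renaming that namespace would still change the resulting keywords, which is the same coupling described in the First pass. The `binding` is only there so the example runs from any namespace:

```clojure
(defn defalias
  "Aliases alias-sym in the current namespace to `<current-namespace>.alias-sym`."
  [alias-sym]
  (let [sym (symbol (str *ns* "." alias-sym))]
    (create-ns sym)
    (alias alias-sym sym)))

(binding [*ns* (create-ns 'com.org.app.data-model)]
  (defalias 'person)
  ;; The reader resolves ::person/... through the alias just created.
  (read-string "::person/name"))
;; => :com.org.app.data-model.person/name
```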

alexmiller · November 12, 2020, 8:27pm

Would the :: just operate the same over them, so if I’m currently creating namespaces with create-ns and calling alias on it, in the future I’d be able to just swap out my aliasing function to create a more lightweight variant?

It’s a little early to definitively answer, but yes, that’s the plan.

jwr · November 16, 2020, 1:39pm

I’ll offer some thoughts, based on my experiences developing a complex application. I’ve gone through a similar thinking process, and my code was reworked several times.

First, I do not use unqualified keywords anymore, except when the usage of the data structures is very limited. I found that unqualified keywords always came back to bite me eventually. When I needed to implement coercions (specified alongside specs), I ran into clashes. When I wanted to freely merge data tuples, I could not, because of keyword collisions. And I couldn’t use the keys in my data to reference something else (UI transformations, table columns, etc).

However, long names (namespace+keyword) are a problem for me. Namespace aliasing solves this to some extent in Clojure, but when my data makes its way into JSON, they are still an issue.

Eventually I settled on a compromise: for all data types internal to a module, I’ll use the local namespace and ::qualified-keywords. Problem domain data types are qualified using a non-namespace symbol, like :membership/uid, which maps well into a JSON database, JSON-based API or JSON data export. Non-qualified keywords are used sparingly, mostly in maps with optional function parameters.

One additional observation: I consider “greppability” of the code to be a factor. Emacs with helm-ag is an extremely useful tool (so is ag itself, ack, or even grep -r if you’re desperate). I make a conscious effort to name things so that it’s easier to find all usages in a large codebase later. With namespace-qualified keys this is much easier to achieve.

didibus · November 16, 2020, 11:52pm

Thanks for explaining how you’ve been handling it. Personally, I’ve been separating JSON from my internal Clojure model; I’ve found that mixing JSON with Clojure made things worse for me in the past. So I have a json-entity spec and an entity spec: the former is a spec for a JSON string, the latter for its Clojure representation. It might look like:

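(require '[clojure.spec.alpha :as s]
         '[cheshire.core :as json])

;; Assumes `user` is already an alias for the entity's keyword namespace.
(s/def ::user/user
  (s/keys :req [::user/name]))

(s/def ::user/json-user
  (s/and string?
         (fn [json-user]
           ;; `keyword` as key-fn turns a key like "com.org.app.user/name" into a
           ;; qualified keyword; with plain string keys ::user/user could never match.
           (s/valid? ::user/user (json/parse-string-strict json-user keyword)))))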

So internally I work exclusively with ::user/user, and only when I’m about to receive/read or send/write outside my application I convert it to/from JSON, and possibly alter keys to whatever I want them to be or not.

It’s a bit more annoying when you convert to/from JSON, since you need to process keys as well, but I like how explicit things are. Internally, I can work freely with nice, Clojure-friendly data, and externally I get a chance to convert things so that people get JSON that is also friendlier to work with in non-Clojure languages.
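The key processing at the boundary can stay small. A minimal sketch, assuming the JSON side uses plain unqualified field names and that each entity’s fields share one keyword namespace; `entity->wire` and `wire->entity` are hypothetical helpers, and the resulting string-keyed map is what you would hand to a JSON library such as Cheshire (`cheshire.core/generate-string`):

```clojure
(defn entity->wire
  "Drops keyword namespaces for the JSON side, e.g. :com.org.app.user/name -> \"name\"."
  [entity]
  (reduce-kv (fn [m k v] (assoc m (name k) v)) {} entity))

(defn wire->entity
  "Re-qualifies string keys under kw-ns, e.g. \"name\" -> :com.org.app.user/name."
  [kw-ns wire]
  (reduce-kv (fn [m k v] (assoc m (keyword kw-ns k) v)) {} wire))

(def user {:com.org.app.user/name "Ann" :com.org.app.user/email "ann@example.com"})

(entity->wire user)
;; => {"name" "Ann", "email" "ann@example.com"}

(= user (wire->entity "com.org.app.user" (entity->wire user)))
;; => true
```

Dropping namespaces is lossy if two fields in the same map share a name, so this only fits entities whose keys all live under one namespace.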

didibus · November 17, 2020, 12:04am

Where I’ve landed with my experimentation is something very similar to how symbols and namespaces work, except for keywords.

I have a defalias function that works the same as alias, except that under the hood it calls create-ns first, so the alias can be created in the current code namespace even when the target namespace doesn’t exist:

(defalias 'foo 'com.bar.baz.foo)
;; => ::foo/bar now resolves to :com.bar.baz.foo/bar in the current namespace

But I also created in-scope which works like in-ns except it creates an “alias scope”. So I can do:

(in-scope 'com.bar.baz)
(defalias 'foo)
;; => ::foo/bar now resolves to :com.bar.baz.foo/bar in the current namespace
(in-scope 'com.fizz.buzz)
(defalias 'foo)
;; => ::foo/bar now resolves to :com.fizz.buzz.foo/bar in the current namespace

So what I do now is I can have something like this:

(ns com.org.app.data-model ...)

(in-scope 'com.org.app)

(defalias 'user) ;; Means ::user will be an alias for com.org.app.user because that's the current scope

(s/def ::user/name string?) ;; ::user/name resolves to :com.org.app.user/name
(s/def ::user/email string?)
(s/def ::user/user
  (s/keys :req [::user/name ::user/email]))

And in another file:

(ns com.org.app.core
  (:require [com.org.app.data-model]))

;; Can use in-scope here as well for the default alias scope so you don't have
;; to repeat the full namespace in all defalias calls here.
(in-scope 'com.org.app)
(defalias 'user)
(defalias 'cart)
(defalias 'transaction)

;; If you have an entity name conflict, you can use the 2-ary variant
;; to alias it directly
(defalias 'other-user 'com.org.other-app.user)

(defn user->other-user [user] ...)
(s/fdef user->other-user
  :args (s/cat :user ::user/user)
  :ret ::other-user/user)
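One possible way to implement `in-scope` and `defalias` as described in this post: a registry from code namespace to its current scope. This is a hypothetical sketch (`alias-scopes` is a made-up name), not the code used above. Because `clojure.core/alias` refuses to repoint an existing alias at a different namespace, the sketch removes any previous alias first, so the `foo` example above can be re-aliased:

```clojure
;; Hypothetical registry: code namespace name -> current alias scope (a string).
(def alias-scopes (atom {}))

(defn in-scope
  "Sets the default alias scope for the current namespace."
  [scope-sym]
  (swap! alias-scopes assoc (ns-name *ns*) (str scope-sym)))

(defn defalias
  "Aliases alias-sym to target-ns, creating target-ns if needed. With one
  argument, target-ns is `<current-scope>.alias-sym`."
  ([alias-sym]
   (let [scope (or (get @alias-scopes (ns-name *ns*))
                   (throw (ex-info "No alias scope set; call in-scope first"
                                   {:ns (ns-name *ns*)})))]
     (defalias alias-sym (symbol (str scope "." alias-sym)))))
  ([alias-sym target-ns]
   (create-ns target-ns)
   ;; clojure.core/alias throws if alias-sym already points somewhere else.
   (ns-unalias *ns* alias-sym)
   (alias alias-sym target-ns)))

(in-scope 'com.bar.baz)
(defalias 'foo)
(read-string "::foo/x")
;; => :com.bar.baz.foo/x

(in-scope 'com.fizz.buzz)
(defalias 'foo)
(read-string "::foo/x")
;; => :com.fizz.buzz.foo/x
```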

didibus · November 17, 2020, 1:27am

Got feedback from @alexmiller: arguably, the in-scope thing is a bit unintuitive. You need to keep track mentally of where you are in the file, what the current scope could be, etc. It’s probably error-prone and a bad idea.

So, rethinking it again, I thought: maybe it’s really just some kind of require, but for keyword namespaces. So now I came up with this:

(ns com.org.app.data-model ...)

(alias '[com.org.app :as app :refer [user account transact cart]]
       '[com.org.other-app :as other-app :refer [user] :rename {user other-user}])

::user/name
;; => :com.org.app.user/name
::app/user
;; => :com.org.app/user
::other-user/name
;; => :com.org.other-app.user/name

Not sure about the DSL here. Maybe it doesn’t have to shoehorn in :refer, :as and :rename; it could use some other DSL. But with this one, I think I’ll be able to tell clj-kondo to lint it as require, which is nice.
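A minimal sketch of how the `:as`/`:refer`/`:rename` subset of that DSL could be built on the create-ns/alias trick. It is named `alias-keys` here (a hypothetical name) to avoid shadowing `clojure.core/alias`, and each `:refer`’d entity `e` is aliased to `<base>.e`:

```clojure
(defn- key-alias
  "Creates ns-sym if needed, then aliases alias-sym to it in the current namespace."
  [alias-sym ns-sym]
  (create-ns ns-sym)
  (alias alias-sym ns-sym))

(defn alias-keys
  "Each spec is [base-ns & opts]. :as aliases base-ns itself; each :refer'd
  entity e is aliased to base-ns.e, under the name given by :rename if any."
  [& specs]
  (doseq [[base-ns & opts] specs
          :let [{:keys [as refer rename]} (apply hash-map opts)]]
    (when as
      (key-alias as base-ns))
    (doseq [entity refer]
      (key-alias (get rename entity entity)
                 (symbol (str base-ns "." entity))))))

(alias-keys '[com.org.app :as app :refer [user cart]]
            '[com.org.other-app :as other-app :refer [user] :rename {user other-user}])

(read-string "::user/name")       ;; => :com.org.app.user/name
(read-string "::app/user")        ;; => :com.org.app/user
(read-string "::other-user/name") ;; => :com.org.other-app.user/name
```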

didibus · November 17, 2020, 3:40am

Hopefully this will be my last update here, but I think others might find my process through this interesting, so I’ll leave all the posts above in place.

Again, with more help hammocking this from @alexmiller and @seancorfield on Slack (which I greatly appreciate), I think I finally boiled it all down:

My real problem can be re-formulated as follows:

The way I want to use namespaces on specs and keywords for my entities is like so:

Have my entity specs keyed on: :unique-namespace/entity and my entity fields keyed on: :unique-namespace.entity/key

So, at the end of the day, I just need a way to make unique-namespace shorter, because to make it unique I make it really long, following the format com.my-company-name.my-app-name. I also need it not to be tied to code namespaces, for the reasons I outlined in my initial post (basically, so I can refactor freely without worrying that I’ll break my entities or my specs).

Here’s an example to make that clear:

(s/def :com.my-company.my-service/user
  (s/keys :req [:com.my-company.my-service.user/id
                :com.my-company.my-service.user/name
                :com.my-company.my-service.user/address
                :com.my-company.my-service.user/email
                :com.my-company.my-service.user/dob]))

Well turns out alias is the answer after all:

(alias 'my-service 'com.my-company.my-service)
(alias 'user 'com.my-company.my-service.user)

(s/def ::my-service/user
  (s/keys :req [::user/id
                ::user/name
                ::user/address
                ::user/email
                ::user/dob]))

With alias in current Clojure (1.10 and earlier), this does not work, since you can’t alias a namespace that doesn’t exist. This is what is being worked on for 1.11: some other form of alias, whose exact API, semantics, and implementation are TBD, but the idea is that you’ll be able to alias keyword namespaces even when no real namespace exists for them. For now, you can solve it pretty simply:

(defn key-alias
  [sym ns]
  (create-ns ns)
  (alias sym ns))

(key-alias 'my-service 'com.my-company.my-service)
(key-alias 'user 'com.my-company.my-service.user)

(s/def ::my-service/user
  (s/keys :req [::user/id
                ::user/name
                ::user/address
                ::user/email
                ::user/dob]))

That’s it. Now in another namespace that uses the spec, you can do the same:

(key-alias 'my-service 'com.my-company.my-service)
(key-alias 'user 'com.my-company.my-service.user)

(defn make-user-api
  [request]
  (let [user {::user/id (java.util.UUID/randomUUID)
              ::user/name (:name request)
              ::user/address (:address request)
              ::user/email (:email request)
              ::user/dob (:dob request)}]
    (if (s/valid? ::my-service/user user)
      user
      (throw (ex-info "Failed to make a valid user." {:request request :user user})))))

Alex recommended this, but also thought that just fully typing the long namespace on every use of the key was fine as well, so to each their own.

In my case, I started with this and ended up with key-alias definitions spanning 10 to 20 lines, because some of our big namespaces use a lot of entities. Each entity needs two key-alias calls (one for the entity spec namespace and another for the entity field namespace). That triggered my OCD and my hatred of verbosity. Maybe I should fight it in this case, but I might play around with a util or macro that shrinks those definitions further when the unique-namespace between them is just repeated. I won’t be reporting back on that here, though; maybe it is best avoided, and doing so is madness.
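For what it’s worth, a helper along those lines could be quite small. This is a hypothetical sketch (`key-aliases` is a made-up name) that assumes the layout above, with entity specs under one base namespace and each entity’s fields under `<base>.<entity>`:

```clojure
(defn key-alias
  [sym ns-sym]
  (create-ns ns-sym)
  (alias sym ns-sym))

(defn key-aliases
  "Aliases base-alias to base-ns, and each entity e to `<base-ns>.e`."
  [base-ns base-alias entities]
  (key-alias base-alias base-ns)
  (doseq [entity entities]
    (key-alias entity (symbol (str base-ns "." entity)))))

;; One call instead of one key-alias for the base plus one per entity.
(key-aliases 'com.my-company.my-service 'my-service '[user cart transaction])

(read-string "::my-service/user") ;; => :com.my-company.my-service/user
(read-string "::cart/id")         ;; => :com.my-company.my-service.cart/id
```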

mdiin · November 20, 2020, 6:05am

Thanks for documenting your process on this @didibus! It’s interesting to read about the rabbit holes other people fall into.

system · May 21, 2021, 6:05pm

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.