Do you encapsulate domain entity maps?

Recently I found myself struggling with the endless refactoring that results from spec’ed entity maps being referenced all over my codebase. I remembered the talk You are in a maze of deeply nested maps, all alike by @ericnormand and rewatched it. While my problem is more about the need to change entity specs during development than about avoiding the complexity of deeply nested maps, I realized that his solution, encapsulating domain entities, would solve my problem as well.

I have one repo, let’s call it myapp.spec, dedicated to spec’ing domain entities. I renamed it to myapp.domain and started adding API functions for CRUD operations. The result: each domain entity has one namespace in which functions encapsulate specs, namespaced keywords, and namespaced maps. As Eric mentions in the talk, this is encapsulation and duplication, which we normally try to avoid, but I strongly agree with him that it’s a necessary evil to avoid refactoring. I now feel that I can develop entity specs as I go along, knowing I only need to refactor in one place.

One self-critique that occurred to me is whether I’m doing something against the spirit of clojure.spec, which, in my interpretation, is embracing global semantics. I think the answer is no, because I’m still benefiting from shared specs when defining entity semantics, but I haven’t fully seen what this act of encapsulation entails, so I’m not entirely sure.

I’m curious what others think about this type of encapsulation.

Not sure I understand. Would you give a brief example?

I didn’t know what the OP meant either until I watched Eric’s talk from IN/Clojure. With that background, I think this is a great question to ask and it isn’t something I’d really thought about (even with ten years of production Clojure under my belt).

There’s a lot to unpack here.

First off, I think “the need to change entity specs in development” perhaps speaks to trying to write specs that are too detailed before the problem space is understood. I write fairly minimal specs when I’m starting out on a new solution because “I don’t know” – but as I learn more by exploring the problem space (via the REPL), I formalize my data structures further and refine the specs, making them more specific only as I become sure that I’m “right”. If I don’t know about an attribute, I just leave it out and let it be an unchecked map element. So I don’t run into any pressures of (constantly) needing to change specs while developing – and I don’t find those specs change much once I have an MVP up and running.
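That incremental approach might look like this in spec terms (a sketch; the `::user` spec names are hypothetical):

```clojure
(require '[clojure.spec.alpha :as s])

;; early in development: a deliberately minimal spec -- attributes I
;; don't understand yet are simply left out and pass through unchecked
(s/def :user/name string?)
(s/def ::user (s/keys :req [:user/name]))

;; a map with extra, not-yet-spec'ed keys still validates
(s/valid? ::user {:user/name "iamgroot" :user/nickname "groot"})
;; => true

;; later, once the problem space is better understood, tighten things
(s/def :user/email (s/and string? #(re-matches #".+@.+" %)))
(s/def ::user-v2 (s/keys :req [:user/name :user/email]))
```

Because `s/keys` leaves unregistered keys unchecked, you can grow the spec without breaking existing data as you learn.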

Second, regarding CRUD operations, I am fairly ambivalent. In principle, I try to avoid having CRUD for every entity – if I can just leverage basic clojure.java.jdbc functions or next.jdbc.sql functions since those libraries have generic insert!, get-by-id/find-by-keys, update!, delete! operations on tables/hash-maps. But there’s also something to be said for isolating those behind an API if you want to avoid leaking the database schema into your code – the trade-off being a lot more (repetitive) functions and the overhead of transforming between the domain model and the persistence model (which can seem really tiresome if they are, in fact, very similar). So I’m on the fence there and my code sometimes has a CRUD API and sometimes doesn’t.

One thing that stood out for me as a big omission from Eric’s talk was that it doesn’t mention namespace-qualified keys at all. Those are very useful for providing globally unique and meaningful names for things in a hash map and they can really reduce the need for nesting in data structures, e.g., having :address/street, :address/city, :address/country instead of :address {:street "..." :city "..." :country "..."} – proper global names address a lot of what Eric was talking about in terms of the difficulties of data access, update, and even just basic comprehension.
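A minimal illustration of that flattening (made-up data):

```clojure
;; nested: anonymous, unqualified keys
(def nested
  {:name    "iamgroot"
   :address {:street "1 Park Ave" :city "NY" :country "US"}})

;; flat: namespace-qualified keys carry the grouping in their names
(def flat
  {:user/name       "iamgroot"
   :address/street  "1 Park Ave"
   :address/city    "NY"
   :address/country "US"})

;; access becomes a plain keyword lookup instead of get-in
(= (get-in nested [:address :city]) (:address/city flat))
;; => true
```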

All that said, I think if you have a naturally layered domain (and not all domains are), then having the code match that layered structure is a good approach. In general, having code organized to match the domain, using verbs and nouns from the domain, is going to make the code easier to read and navigate. I don’t think that having, essentially, getters and setters for every domain entity is valuable in general (because of the duplication and cognitive overhead of “more code”).

So I guess that’s a long-winded way of saying “It depends!”.


@Phill @seancorfield

Let’s consider an example of user entity with a focus on its address. In my previous approach, I would always reference the repository where the entity is defined and use namespaced keywords.

(ns myapp.spec.user 
  (:require [clojure.spec.alpha :as s]))
(s/def ::name    string?)
(s/def ::address string?)
(s/def ::user    (s/keys :req [::name ::address]))

;; construct a user entity somewhere
(require '[myapp.spec.user :as user])
(let [user-name    "iamgroot"
      user-address "1 Park Ave, NY"
      user         {::user/name user-name ::user/address user-address}]
   (swap! *users assoc user-name user))

;; update the address somewhere else
(require '[myapp.spec.user :as user])
(let [user-name   "iamgroot"
      new-address "2 Union Sq, SF"]
  (swap! *users assoc-in [user-name ::user/address] new-address))

Consider now a new requirement that

  • user can register up to 5 addresses
  • user can choose a color for each address
  • user can update the priority of the addresses by UI dragging.

This necessitates refactoring. The following is just one way of doing it.

(ns myapp.spec.user.address
  (:require [clojure.spec.alpha :as s]))
(s/def ::index   (s/int-in 0 6))
(s/def ::color   string?)
(s/def ::address string?)
(s/def ::entity  (s/keys :req [::index ::address] :opt [::color]))

(ns myapp.spec.user 
  (:require [clojure.spec.alpha :as s]
            [myapp.spec.user.address :as address]))
(s/def ::name      string?)
(s/def ::addresses (s/coll-of ::address/entity))
(s/def ::user      (s/keys :req [::name ::addresses]))

;; construct a user entity somewhere
(require '[myapp.spec.user :as user])
(require '[myapp.spec.user.address :as address])
(let [address {::address/index   0 
               ::address/color   "orange"
               ::address/address "1 Park Ave, NY"}
      user    {::user/name      "iamgroot" 
               ::user/addresses [address]}]
   (swap! *users assoc "iamgroot" user))

;; update a primary address somewhere else
(require '[myapp.spec.user :as user])
(require '[myapp.spec.user.address :as address])
(let [user-name   "iamgroot"
      new-address "2 Union Sq, SF"]
  (swap! *users assoc-in [user-name ::user/addresses 0 ::address/address] new-address))

In an alternative approach that is more domain-driven than data-driven, I would’ve first written

(ns myapp.domain.user
  (:require [clojure.spec.alpha :as s]))

;; * Specs
(s/def ::name    string?)
(s/def ::address string?)
(s/def ::user    (s/keys :req [::name ::address]))

;; * APIs
(defn new [name address]
  {::name name ::address address})

(defn set-address [user new-address]
  (assoc user ::address new-address))

;; There should be as many APIs here as the domain requires.

;; construct a user entity somewhere
(require '[myapp.domain.user :as user])
(let [user-name    "iamgroot" 
      user-address "1 Park Ave, NY"
      user         (user/new user-name user-address)]
   (swap! *users assoc user-name user))

;; update the address somewhere else
(require '[myapp.domain.user :as user])
(let [user-name   "iamgroot"
      new-address "2 Union Sq, SF"]
  (swap! *users update user-name user/set-address new-address))

This is definitely more code than before, but refactoring becomes more localised.

(ns myapp.domain.user.address
  (:require [clojure.spec.alpha :as s]))
(s/def ::index   (s/int-in 0 6))
(s/def ::color   string?)
(s/def ::address string?)
(s/def ::entity  (s/keys :req [::index ::address] :opt [::color]))

(ns myapp.domain.user 
  (:require [clojure.spec.alpha :as s]
            [myapp.domain.user.address :as address]))

;; * Specs
(s/def ::name      string?)
(s/def ::addresses (s/coll-of ::address/entity))
(s/def ::user      (s/keys :req [::name ::addresses]))

;; * APIs
(defn new [name address]
  (let [address-entity {::address/index   0
                        ::address/color   "orange" ;; default color
                        ::address/address address}]
    {::name      name
     ::addresses [address-entity]}))

(defn set-primary-address [user new-address]
  (assoc-in user [::addresses 0 ::address/address] new-address))

;; update a primary address somewhere else 
(require '[myapp.domain.user :as user])
(let [user-name   "iamgroot"
      new-address "2 Union Sq, SF"]
  (swap! *users update user-name user/set-primary-address new-address))

Here, most of the refactoring takes place in the myapp.domain repository. No change is needed in the repository where a user entity gets created. I did need to touch the repository where the setter API is called, because of the renaming, but that is a trivial change.

What I wanted to highlight in this made-up example is that if you strive for domain-driven programming and reference namespaced keywords across multiple repositories, you may end up leaking the implementation details of your domain entities.

@seancorfield
I agree with you; this is a tradeoff between the verbosity of wrapping/hiding and the ease of refactoring. In an environment where the design of domain entities is more or less stable, I would avoid this type of encapsulation. But in an environment where enriching and improving domain semantics drives the business and unexpected refactoring is the norm rather than the exception, which is where I find myself, I am willing to pay some verbosity upfront. How costly will this price be? I don’t know. But one thing I know for sure is that I don’t need setters and getters for all entity keywords. The number of API functions should be no more than the domain requires.


The first thing I want to say is: forget the Specs. Specs do not define entities in any way. They are not like Classes.

The second thing I want to say is, feel free to have a function that helps you construct a data-structure of the shape you want, but do not think of this as if you are creating an Object. You are only factoring out the code for creating the structure into a common utility function for re-use.

;;; Pure Core Functions

(defn make-user
  [user-name address]
  #:user{:name user-name
         :address address})

(defn change-address
  [user new-address]
  (assoc user :user/address new-address))

(defn make-users
  []
  {})

(defn add-user
  [users user-name user]
  (assoc users user-name user))

;;; Impure State Management at the boundaries

(def users (atom (make-users)))

;; construct a user entity somewhere
(swap! users add-user "iamgroot" (make-user "iamgroot" "2 Union Sq, SF"))

;; update the address somewhere else
(swap! users update "iamgroot" change-address "1 Park Ave, NY")

A few notes here:

  1. The core logic was kept functionally pure, and impure state was pushed to the edge.
  2. Nothing was encapsulated. All we did was create utility functions to help us factor out code which manipulates our data-structures with regards to the invariants we want for our domain entities. It is still possible to modify the data-structures and all their elements directly, without making use of our functions, thus not providing any real encapsulation.
  3. Since you brought up DDD, in DDD, it is acknowledged that domain entities are the hardest to modify over time, and that’s why the emphasis is on spending lots of time upfront on getting them right.
  4. I think your example feels like you’re trying to introduce some in-app entity layer, but if you have a DB, it’s much easier to just create stateless APIs that operate over the DB schema directly, and that bypasses your entire problem altogether.
  5. I got rid of Spec, but you could bring it back. The idea of Spec here would be validation, you could validate that after each change to an entity the entity still satisfies the Spec for it. If not, you have a bug in your transformation logic, or you forgot to update your Spec.
  6. Factoring code into re-usable functions can help with refactoring, but can also make it harder depending on the change needed. If you need the same change applied everywhere, having factored all usage to a common place means you only need to touch that common place. But if you need to make the change to only a subset of all places, you now need to factor out all those places so they can deviate from the rest.
  7. In a production scenario, that change to your entity is most likely backwards-incompatible, since your old persisted entities are not of this new shape (no longer requiring :user/address and now requiring :user/addresses). That means you’ll either need a data-migration project from the old entity to the new one in your persistence store, or you need to make the change backwards-compatible, either by creating a whole new type of user entity or by making both the address and addresses keys optional. And then all your code which manipulates user entities must be made aware of both possibilities existing. So, as I said in the DDD comment, a change to your domain entities will be costly in real production scenarios.
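Point 5, validating after each transformation, might be sketched like this (hypothetical spec and function names):

```clojure
(require '[clojure.spec.alpha :as s])

(s/def :user/name    string?)
(s/def :user/address string?)
(s/def ::user (s/keys :req [:user/name :user/address]))

(defn change-address [user new-address]
  (assoc user :user/address new-address))

;; validate after each transformation: a failure means either a bug in
;; the transformation logic or a spec that wasn't updated
(let [u  #:user{:name "iamgroot" :address "1 Park Ave, NY"}
      u' (change-address u "2 Union Sq, SF")]
  (when-not (s/valid? ::user u')
    (throw (ex-info "transformation broke the entity"
                    (s/explain-data ::user u'))))
  u')
```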

Thanks for your input.

Encapsulation was the wrong word. The title should have been something along the lines of “What’s the optimal boundary of namespaced maps?” or “Is it a code smell to reference namespaced maps all over the place?”

The point I wanted to discuss from my learning by doing is that placing the right boundary/interface is important, and this applies not just to private variables and functions but also to specs, namespaces, and spec’ed entities. I made a big transition from not using spec’ed entities effectively to using them everywhere, and during that transition I wasn’t paying attention to placing semantic boundaries and ended up leaking implementation details. Right now, I’m trying to find a point of balance.

Yes, I strive for DDD and am currently shrinking what I thought was my domain to something smaller and more stable. You’re absolutely right that since everything depends on domain semantics, I should do my best to minimize the risk of changing them. Actually, it’s not at the core domain semantic layer but at the next few layers that I want to keep open the possibility of improvements/refactoring by placing right boundaries and hiding implementation details.

I feel comfortable with clojure.spec and appreciate its multiple functionalities as documentation, definition, validation, and instrumentation tools. I think my lesson learned is not to share spec’ed entities beyond their semantic boundaries.

The thing is, none of the code in this thread actually does anything; it’s just how you organize things. Ultimately your code will have to do stuff, and that will guide the organization more than anything else. Spec is relevant because it allows you to specify how things should be, and then you can check that wherever you want: maybe in a constructor, maybe at the call site where it does something. It helps you separate validation from the other things you might want to do in a constructor (coercion, populating defaults).

I think it might be useful to start with the functions that do stuff and then work backwards to the domain entities. Imagine there was a rule that every function can only take a single map argument, and write out your function calls:

(email-user!
  {:user/email "foo@bar.com"
   :message/subject "hi"
   :message/body "hello, world"
   :email/service {:email.service/api-key "asdf1234"}})

(save-address!
  {:user/id 1234
   :user/address {:address/street "123 main st" :address/zip 12345}
   :database/conn { ,,, }})

Now each thing that does something takes its own custom tailored entity thing.

Using positional-argument constructors and what are essentially getters and setters works OK for small examples, but I think it quickly becomes unwieldy in real environments. user is a good example because it’s extremely common and, in real life, will tend to grow a lot. Do you want to add X new positional arguments to your user constructor, and 2X new functions to your user namespace, every time you want your user to be able to do a new thing? Versus just making a function that does the new thing and passing it the stuff it needs.

Spec cleanly separates one of the aspects of what these constructors do: validation. And now you can validate anywhere you want; you can do it, for example, in the above functions that do stuff! What else do these constructors do? Coerce values and add defaults? Why not make those their own functions too and use them where needed.
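A sketch of that separation, with each concern as its own function (all names here — add-defaults, coerce, email-user!, ::email-request — are hypothetical):

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.string :as str])

(s/def :user/email string?)
(s/def :message/subject string?)
(s/def ::email-request (s/keys :req [:user/email :message/subject]))

;; defaults are one function...
(defn add-defaults [req]
  (merge {:message/subject "(no subject)"} req))

;; ...coercion is another...
(defn coerce [req]
  (update req :user/email #(some-> % str/lower-case)))

;; ...and validation happens in the function that does stuff
(defn email-user! [req]
  (let [req (-> req add-defaults coerce)]
    (when-not (s/valid? ::email-request req)
      (throw (ex-info "bad request" (s/explain-data ::email-request req))))
    ;; actual email sending would happen here
    req))
```

Each piece can then be reused independently wherever it is needed, rather than being trapped inside a constructor.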

None of the constructors/getters/setters frees you from having to name the concepts of your domain entity. Why not just use the first-class names Clojure offers, keywords, for the names? And for the stuff you actually need to do with your entities, Clojure has a first-class thing for that too: functions :slight_smile:


Please ignore the example if it confused you. I made it up to illustrate the tradeoff between ease of development and ease of refactoring. And I swear I’m not trying to revive OOP by any means.

I just wanted to share my experience that I had one core repository that contained entity specs to which many other repositories had unconstrained access. When I wanted to rewrite some entity spec, I found that I’d need to refactor all over the place because everyone knew too much about the entity internals.

In short, I’m just rediscovering the importance of setting interfaces at semantic boundaries.

I would say it depends. In my opinion, it is a code smell to have a function operate over the aggregate root when it is only concerned with an entity within it.

For example:

(defn change-address
  [users user-name new-address]
...)

I’d say this is bad, because it takes the aggregate users as input and returns the modified aggregate, but all it does is modify the one user inside it.

So it’s better to keep functions local to their related aggregate level. That means some functions, like change-address, would relate to the user aggregate, and others, like add-user, would relate to the aggregate root.

And if you have address-related logic, say validate-address, it should operate over the address itself, not the user:

(defn validate-address
  [address]
  ...)

Basically, always make your function inputs as small in scope as possible. That’s my take, at least.

And it totally makes sense not to re-implement the same change-address code all over the place. Factoring code that manipulates a user into a user namespace can make sense, but I’d say more important than the namespace is factoring it into functions. The namespace is just organizational; it is the function that gives you the ease of refactoring. By sharing the same code through a shared function, a change to that code is reflected everywhere at once.

When it comes to the entities themselves, I’d say you want to bound them by context. Like DDD says, bounded contexts. The value specs should be shared globally, but a user entity is more plausibly a :marketing/user, an :auth/user, and a :report/user, where each of these different contexts has potentially different keys for its user aggregate.
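That bounded-context split could be sketched as follows (the :auth/user and :marketing/user specs and their keys are hypothetical):

```clojure
(require '[clojure.spec.alpha :as s])

;; value-level specs are shared globally
(s/def :user/id    int?)
(s/def :user/email string?)
(s/def :user/name  string?)

;; each bounded context defines its own user *entity* from those values
(s/def :auth/user      (s/keys :req [:user/id :user/email]))
(s/def :marketing/user (s/keys :req [:user/email :user/name]))

(def u {:user/id 1 :user/email "abc@abc.com" :user/name "Me"})

(s/valid? :auth/user u)       ;; => true
(s/valid? :marketing/user u)  ;; => true
```

The same map can satisfy several context-level entity specs, while the attribute semantics stay globally shared.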

It is a code smell in my opinion to have a function operate over the aggregate-root when it only is concerned with an entity within it.

Forgive me, this part of the example was just bad. I wouldn’t write it that way in my codebase.

And it totally makes sense not to re-implement the same change-address code all over the place. Factoring code that manipulates a user into a user namespace can make sense, but I’d say more important than the namespace is factoring it into functions. The namespace is just organizational; it is the function that gives you the ease of refactoring. By sharing the same code through a shared function, a change to that code is reflected everywhere at once.

I think this is the lesson I recently learned. Instead of calling functions that operate on entities, I was manipulating the insides of entities freely, often using assoc-in and update-in, and sometimes even changing entities inside entities. Given the idea of global semantics and the fact that namespace dependence was explicit, I’d somehow developed an illusion that specs would protect me from any hazard. In reality, the primary role of specs is validation, and they alone don’t provide semantic boundaries. While it is handy to operate on entity maps directly, delegating operations to functions that do not expose the insides of the entity maps leads to better code organization in the long run.

Speaking of DDD, do you have a strict preference between a flat map and an organized map with one additional layer (but not an overly nested map)? For example,

{::account/id    1
 ::account/name  "Me"
 ::account/email "abc@abc.com"
 ...}

versus

{::account/id    1
 ::account/profile {::account/name  "Me"
                    ...}
 ::account/contact {::account/email "abc@abc.com"
                    ...}
 ...}

or perhaps

{::account/id    1
 ::account/profile {::profile/name  "Me"
                    ...}
 ::account/contact {::contact/email "abc@abc.com"
                    ...}
 ...}

I find it difficult to work with the format of {::A/B {::A/C nil}}, so I’d choose the first or the last option. The latter gives me better organization but creates more entities inside one entity (profile, contact, status, preferences, family-members). The former gives me a map with many, many keys, but I can still organize it well using the idea of schema/select in clojure.spec-2. In that case, the groupings of profile, contact, status, preferences, and family-members are schemas, not entities. Currently I’m gravitating towards this option. I guess it’s a matter of personal taste in the end, but I’m curious to know if you have a take on this as a practitioner of DDD.

Entity maps should be as flat as possible (but no flatter). I’d go with the first option as a default and just destructure the keys you need in the functions you use the entity in. Or you can use select-keys if you need just a “profile” subset. A reason to nest is if you have a one-to-many relationship, like your addresses example earlier, but otherwise I’d keep things flat.

In the latter two examples you’re really building a view on your data into the data structure itself, so that you can do something like (::account/profile user). But I think it’s better to do something like the following if you want a view on your data:

(def user
  {::account/id    1
   ::account/name  "Me"
   ::account/email "abc@abc.com"
   ::profile/image "me.jpg"
   ::profile/nickname "meeee"})

(def profile-keys [::profile/image ::profile/nickname])

(defn profile [user] (select-keys user profile-keys))

A flat map is inherently less complex than a nested one. A user entity with any number of attributes from wherever is still one concept. A user that “has a profile” is two concepts. Sometimes you need that but often, like in your example, you don’t.

You will need to have aggregate values in your user entity like addresses, but IMO you shouldn’t just nest for organizational purposes. The user is an entity and the user map should be all of the user’s attributes.


I understand your reasoning. In general, I prefer flat maps over nested maps. One point where I’d disagree with you: I care about code organization if it helps reduce the cognitive cost of working with code. For example, I prefer to organize my codebase in such a way that symbols and namespaces match up nicely:

(defn phone-number [entity]
  (let [contact      (::entity/contact entity)
        phone-number (::contact/phone-number contact)]
    phone-number))

That said, I also understand your point. Creating separate namespaces/entities leads to more design considerations.

To analyze it from a DDD point of view, it’s ultimately a matter of whether the concept in question is a first-class domain entity. Without doubt, “account” is a first-class entity. “Profile,” “contact,” or “preferences” may or may not be. If one wants rich functionality on a concept, and the concept appears in multiple entities, one could justify giving it first-class treatment by offering it a proper namespace and upgrading it to an entity.

From the stylistic and organizational perspective I discussed above, I have so far avoided having keywords with different namespaces in the same map, so that an account entity can contain ::account/some-key but not ::profile/some-key. I am curious which side Clojurians tend to take.

I am also very interested in several areas raised in this thread. All my “insights” are second hand - if you’ll allow me to name drop…

  • APIs between code “modules” - reminds me of Uncle Bob’s “clean architecture” which I’d broadly recommend, and see often in Java now, but rarely in Clojure.
  • spec, information and code growth - Rich Hickey’s Spec-ulation and Effective Programs (and Datomic) talks really opened my brain about how information has no “internals” to leak, is “sparse” and is easy to reuse when defined as leaves/datoms but very hard to reuse when defined inside entities. Also, if you only grow the semantics (i.e. no breaking changes) there’s no refactoring needed at all. If an attribute or relationship needs a breaking change, give it a new name.
  • DDD and “semantic boundaries” - reminds me of the “semantic coupling” Michael Nygard talks about in his “Uncoupling” talks. Also Eric Evans on “context maps”. They make clear that although information has no internals to leak, it can still be leaked itself - so passing around too much information should be avoided. As I’ve said before - the namespacing of information is an important tool here. I do begin to wonder whether separate bounded contexts are better described as separate namespaces of information, allowing clarity over whose definition holds sway when passing info between them.
  • flat vs nested maps. I have been promoting flat over nested maps for a while, but had to accept there was more to it. A user modelled as an entity map with 3 address-line values at the root level causes problems when it changes to a 2-line address; or indeed changes to next door (was that a typo edit or a house move?) In the end, the underlying graph-like way of presenting information as triples [entity attribute value] is much less ambiguous. Mark Bastian’s “Data Modeling for Heroes” is a great place to start with this. With that in the tank, the flat vs nested question reduces somewhat, as it’s no longer a data model and just depends what suits you as a client at the time. But in general - to use my example above - using a nested map at least allows you to name the :hasAddress relationship between the user entity and the address entity. (Following up this question was actually what took me to Eric’s talk on this in the first place :slight_smile: )

Information is easy to reuse when defined as leaves/datoms but very hard to reuse when defined inside entities.

I second this. Before my recent refactoring, I’d modeled things in such a way that a user entity owned a language entity that in turn owned a region entity. For example, (is (= "US" (get-in some-user [::user/language ::language/region ::region/name]))). I removed the language and region entities and made the user entity own ::user/language ("en") and ::user/language-region ("US") as attributes. I initially resisted the sound of “language-region”, but in my use case, I didn’t see any rationale for defining language and region as entities. When I removed these superfluous entities from my codebase, not only could I simplify the relevant APIs, but I could also see more clearly what the essential entities in my domain really are.
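The refactoring described above, in miniature (illustrative maps, written with fully qualified keywords rather than the ::alias form):

```clojure
;; before: language and region modelled as nested entities
(def nested-user
  {:user/name "iamgroot"
   :user/language {:language/code "en"
                   :language/region {:region/name "US"}}})

;; after: the same information as two flat user attributes
(def flat-user
  {:user/name            "iamgroot"
   :user/language        "en"
   :user/language-region "US"})

;; the deep get-in disappears
(= "US"
   (get-in nested-user [:user/language :language/region :region/name])
   (:user/language-region flat-user))
;; => true
```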

If you only grow the semantics (i.e. no breaking changes) there’s no refactoring needed at all. If an attribute or relationship needs a breaking change, give it a new name.

I am grateful that the Clojure core team commits to not introducing breaking changes, and I think this is a good principle for language designers and library maintainers. But I would not apply it to the programming of an information system that is domain-specific and, as such, private in nature. There is value in deleting semantics, especially if they are incorrect, odd-sounding, or outdated. In those cases, keeping them in a codebase is bad, confusing, or distracting.

Agreed - but if those old semantics are in your production data, or exposed via APIs to other teams (other domains? other contexts?), then you’re stuck with them, barring some nasty data migrations or coordinated changes to get rid of them. If those teams are in the same organisation, then “expand and contract” usually works out just fine here - but the expand bit is exactly the same process as “growing” in the sense mentioned above. So in the end it’s not just an issue for language and library designers but for everyone who persists or exposes information beyond their standup, I think?

I suppose that’d be the norm. I just want to challenge the norm, as I happen to be a business owner and can exercise full control over my codebase. I haven’t deployed my code to production yet and can’t make any assertive claim, but my plan is to keep everything up to date at all stages. I’ll have to see how this strategy plays out, but I see it as the only way for me to keep my sanity.

From a theoretical point of view, I would claim that even in that “expand and contract” situation, the right thing would be to go from an old name with code that doesn’t do justice to the name, to some temporary name, and eventually back to the original name with good code. My reasoning here is the presence of limited cognition, plus the idea that there is no such thing as a true synonym. It’s just an idealistic thought that there should be a 1-to-1 mapping between the space of the signifier and that of the signified.

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.