Should we really use Clojure's syntax for namespaced keys?

chpill · February 2, 2018, 9:45am

My knowledge of Spec is still a bit superficial, but I think this is what :req-un and :opt-un are for?

You can definitely check unqualified keywords using spec. But namespace-qualified keys get special treatment when used in maps, as their spec is globally enforced once registered. Consider the following example:

(require '[clojure.spec.alpha :as s])
(require '[clojure.spec.test.alpha :as stest])

(s/def :mc.cust/first-name string?)
(s/def :mc.contact/email string?)

(defn greet [customer]
  (str "Hello, " (:mc.cust/first-name customer)))

(s/fdef greet
        :args (s/cat :customer (s/keys :req [:mc.cust/first-name])))

(stest/instrument `greet)


(greet {:mc.cust/first-name "Bob" :mc.contact/email {}})
;; Throws the following:
ExceptionInfo Call to #'user/greet did not conform to spec:
In: [0 :mc.contact/email] val: {} fails spec: :mc.contact/email at: [:args :customer :mc.contact/email] predicate: string?
clojure.core/ex-info (core.clj:4739)

Our spec’ed function only declares :mc.cust/first-name, and really it is the only thing it cares about. But because :mc.contact/email is a fully qualified keyword in the standard sense, spec enforces that if it is present in the map argument, it has to be a string.
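The same behaviour can be seen without instrumentation. This is a minimal sketch, assuming a hypothetical spec name :example/customer that is not part of the example above; it only requires the first name, yet the registered :mc.contact/email spec is still applied when that key is present.

```clojure
(require '[clojure.spec.alpha :as s])

(s/def :mc.cust/first-name string?)
(s/def :mc.contact/email string?)

;; :example/customer is a hypothetical name used only for this sketch.
;; It requires the first name and says nothing about the email.
(s/def :example/customer (s/keys :req [:mc.cust/first-name]))

(def ok-customer  {:mc.cust/first-name "Bob" :mc.contact/email "bob@example.com"})
(def bad-customer {:mc.cust/first-name "Bob" :mc.contact/email {}})

(s/valid? :example/customer ok-customer)   ;=> true
(s/valid? :example/customer bad-customer)  ;=> false, the email is not a string

;; Human-readable description of the failure:
(println (s/explain-str :example/customer bad-customer))
```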

I think it’s a really smart way of keeping dynamicity in the system. You do not have to provide for each function spec all the :opt-un [...] for every possible piece of information that might flow through your system. But you still maintain a strong integrity of the value attached to your global names, and hopefully you get an error message closer to where you corrupted your data.

As I said before, maybe it’s just a matter of convincing spec to treat “all-terrain” keywords as fully-qualified to regain that property.

vvvvalvalval · February 2, 2018, 9:52am

@chpill In your opinion, what do I lose if I transform your example to the following?

(require '[clojure.spec.alpha :as s])
(require '[clojure.spec.test.alpha :as stest])
(require '[my-company.specs :as msp])

(s/def ::msp/mc_cust_first_name string?)
(s/def ::msp/mc_cust_email string?)

(defn greet [customer]
  (str "Hello, " (:mc_cust_first_name customer)))

(s/fdef greet
        :args (s/cat :customer (s/keys :req-un [::msp/mc_cust_first_name])))

chpill · February 2, 2018, 10:24am

@vvvvalvalval calling (greet {:mc_cust_first_name "Bob" :mc_cust_email 42}) won’t raise any error here. We would need to provide an :opt-un [::msp/mc_cust_email] to detect an issue early.
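To make the difference concrete, here is a small sketch using s/valid? directly. The spec names are written out in full under a hypothetical my-company.specs namespace rather than through the msp alias, so the snippet does not depend on that namespace being loaded.

```clojure
(require '[clojure.spec.alpha :as s])

(s/def :my-company.specs/mc_cust_first_name string?)
(s/def :my-company.specs/mc_cust_email string?)

;; Only the first name is listed, so the unqualified email key is never checked.
(def req-only
  (s/keys :req-un [:my-company.specs/mc_cust_first_name]))

;; Listing the email under :opt-un is what ties :mc_cust_email to its spec.
(def req-and-opt
  (s/keys :req-un [:my-company.specs/mc_cust_first_name]
          :opt-un [:my-company.specs/mc_cust_email]))

(def customer {:mc_cust_first_name "Bob" :mc_cust_email 42})

(s/valid? req-only customer)     ;=> true, the bad email goes unnoticed
(s/valid? req-and-opt customer)  ;=> false
```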

The example here isn’t very interesting because the function does not return a map, but we write and compose functions that assoc, dissoc, update on maps all the time. They only deal with a subset of possible keys, and do not assume much about the rest. That brings us great composability, and also great clarity thanks to the -> thread-first macro.

Being able to continue using that style while enforcing globally that pieces of information are valid wherever they may be is what spec is about, I think. You lose that by not using classical namespace-qualified keywords in the maps you pass around your program.
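A rough sketch of that style, assuming two hypothetical helper functions (add-email and shout-name) that each touch a single key. An empty (s/keys) requires nothing, but it still checks every namespace-qualified key in the map that has a registered spec.

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.string :as str])

(s/def :mc.cust/first-name string?)
(s/def :mc.contact/email string?)

;; Hypothetical helpers: each one only knows about the key it touches.
(defn add-email [customer email]
  (assoc customer :mc.contact/email email))

(defn shout-name [customer]
  (update customer :mc.cust/first-name str/upper-case))

(def good (-> {:mc.cust/first-name "Bob"}
              (add-email "bob@example.com")
              shout-name))

(def bad (-> {:mc.cust/first-name "Bob"}
             (add-email 42)          ; corrupts the map
             shout-name))

;; No required keys, yet all registered qualified keys are validated.
(s/valid? (s/keys) good)  ;=> true
(s/valid? (s/keys) bad)   ;=> false
```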

vvvvalvalval · February 2, 2018, 1:01pm

Oh I see. Well, as I said above, I believe that this is a limitation of Spec, not of the program - that’s spec saying “I will encourage you to use Clojure’s convention for namespacing keys and I refuse to cooperate with a system that doesn’t use this convention.”.

This could be solved by allowing spec to accept non-Clojure-namespaced keys. Maybe with an s/def-un macro:

(s/def-un :mc_cust_first_name string?)

This way the fact that the key needs to be unique would still be very ‘in the face’ of the user.

didibus · February 2, 2018, 9:32pm

I believe that this is a limitation of Spec

There’s no such restriction. The name of the spec is not the same as the key on an associative data structure. A spec must be keyed by a namespaced keyword, but the data being specced need not be. You can use s/keys with req-un and opt-un for that.

So you’d have:

(s/def :customer/customer_name string?)
(s/def :customer/customer (s/keys :req-un [:customer/customer_name]))

Which would spec the following associative:

(s/valid? :customer/customer {:customer_name "John Doe"})
true

So as long as your other systems can support a colon as their first character, you can keep their name the same across Clojure. You’ll still need to coerce the keyword into a string and back at the boundary though.

There’s actually also a way to do this with string keys or any other using s/keys* I was told, but I haven’t tried it. That way you could even keep the type a string in Clojure, meaning you wouldn’t even need to coerce the types back and forth.

My recommendation, if you were really interested, would be to rally around JSON and model your data inside Clojure to always be valid JSON; that seems like it’ll give you the biggest reach. If you need more powerful modeling than JSON affords, I would look into Transit or ION as the next level up; both will have a good reach and compatibility across languages.

vvvvalvalval · February 2, 2018, 10:10pm

you can use s/keys with req-un and opt-un for that.

It seems to me though that @chpill just demonstrated that req-un and opt-un have limitations that req and opt don’t have?

This seems to be confirmed by Spec’s rationale: “Note that this cannot convey the same power to unqualified keywords as have namespaced keywords - the resulting maps are not self-describing.”

So as long as your other systems can support a colon as their first character, you can keep their name the same across Clojure.

I really don’t think the colon matters: it’s fine in practice if the key is :customer_first_name in Clojure and customer_first_name in JS / GraphQL / Postgres / ElasticSearch / whatever. They’re equivalent for most practical purposes (including text-based search, which is really what I want to emphasize here).

alex-dixon · February 3, 2018, 3:52am

I’d follow the conventions of the language or the format I’m serializing to.

For JS/JSON I’d convert namespaced keywords to camelCase with no hyphens or dots, strip out com.my-comp if it’s there. Convert to the Clojure convention if I ever got it back.

vvvvalvalval · February 3, 2018, 7:45am

I’ve been there (I just went through a massive refactoring of our JS code to use namespaced keys, because keeping track of the data was harder and harder). That’s essentially Approach 1. I strongly recommend against it. Here’s what I took away from this experience:

The benefits of having namespaced / globally-unique keys outweigh the convenience of following JS convention for keys (or of having short keys, for that matter).
What matters is the ability to identify a key at first sight without any more context, and to perform whole-system searches of the uses of a key.

timothypratley · February 7, 2018, 2:13am

Just an off-the-wall thought here that might be obvious:
If your primary concern is searchability… consider thinking of the separator as a dot instead of an underscore or hyphen.

$ ag foo.bar
foo.txt
1:foo_bar
2:foo-bar

baz.cljs
64: (let [m {:foo.bar/booz {"baz%?" 2}}]

Instead of searching for my.company/foo-bar or my_company_foo_bar, if you just search for
"my.company.foo.bar" you will find all references (assuming your search supports regex, which most do). Regex matching “.” doesn’t care if you use underscores, hyphens, or slashes.

So maybe it’s worth thinking of the separator as . even if it’s not /shrug

For camel case it’s a bit harder to just think of it as . because you need .? to match:
$ ag foo.?bar
foo.txt
1:fooBar
2:foo_bar
3:foo-bar

Possible, but not fun.
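The same idea can be tried at a Clojure REPL with re-find. This is only a sketch with made-up sample strings. One assumption worth flagging: the fooBar match in the ag output relies on case-insensitive matching (ag's smart-case default, as far as I know), so the Clojure pattern needs an explicit (?i) flag to behave the same way.

```clojure
;; "." matches any single character, so one pattern covers several separators.
(def spellings ["my.company/foo-bar"
                "my_company_foo_bar"
                "my-company-foo-bar"])

(every? #(re-find #"my.company.foo.bar" %) spellings)  ;=> true

;; For camelCase the separator may be absent, hence ".?" and a
;; case-insensitive match.
(def foo-bars ["fooBar" "foo_bar" "foo-bar"])

(every? #(re-find #"(?i)foo.?bar" %) foo-bars)  ;=> true

;; Without (?i) the camelCase spelling is missed:
(re-find #"foo.?bar" "fooBar")  ;=> nil
```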

I realize it is somewhat tangential to the discussion… I’m just offering this as a practical approach to surviving where various styles exist.

Anthony_Leonard · May 31, 2018, 12:01am

Thanks for the discussion. I am having exactly the same thoughts. For my purposes the benefits of namespaced keys are:

Reducing complexity from ill-designed (“subjective”, “parochial”) nested document data and instead sharing flattened maps with longer keys. This avoids the problems of nested data, including duplication of the same data item at different levels, ambiguous “deep” merging policies, loss of transparency about which functions use which data items at the calling site, and terrible Java support for nested data structures, full stop.
Defining a catalogue of data items (perhaps even collecting their sources and usage), principally to give BAs back a data “schema” of the kind they used to understand and enjoy with older RDBMS data, so that they can define constraints and invariants about their data items formally and specifically.

In our company we have an ADR that all externalised (i.e. wire or stored) data use JSON-convention camel case keys, with translation to language conventions as required (Clojure, Java, JS), so I guess we chose 2 and lost the ability to grep the codebase in one go for each item. Still, for me the nesting thing is more of a killer, as data items are constantly being rehashed into different names and structures for convenience in every given context anyway.

wilkerlucio · May 31, 2018, 12:08pm

I like idea 2, especially because in Clojure we already use the - pattern everywhere. But I really like the idea of defining a standard about how to externalize it. It doesn’t have to be that pretty, just expressive enough to still have the name uniqueness. I think it’s safe to say we can rely on what JSON supports (which in fact supports full strings, so we can put the entire keyword as-is there if we want; then it’s just a matter of how each client wants to deserialize it).

An extra advantage of keeping the full names in JSON (instead of throwing away the namespace) is that you could have a similar “spec db” on the client side, which could be used to transform that data into whatever format the reader wants. I think it’s bad that we lose information when we drop the namespace.

ericnormand · July 5, 2018, 11:58pm

Hey @vvvvalvalval,

That’s an interesting problem. I’ve encountered it in specific circumstances and found workarounds, but never had to address it generally. These are just some random thoughts.

One solution I’ve used is to just use fully qualified keywords as strings (pr-str them and drop the “:”) for JSON keys and Postgres column names. In Postgres, you have to use the quoted identifier, which allows anything except the null character. Then you can output them and read them back in with a simple call to keyword. This should also work generally with languages that use hashmaps.
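As a small sketch of that round trip (the function names kw->column and column->kw are made up for illustration, and str is used here in place of pr-str since both print a keyword the same way):

```clojure
;; Keyword -> external string: print it and drop the leading ":".
(defn kw->column [k]
  (subs (str k) 1))

;; External string -> keyword: `keyword` splits on "/" into namespace and name.
(defn column->kw [s]
  (keyword s))

(kw->column :org.clojureverse.user/first-name)
;=> "org.clojureverse.user/first-name"

(column->kw "org.clojureverse.user/first-name")
;=> :org.clojureverse.user/first-name
```

In Postgres the resulting string would have to be written as a quoted identifier, as noted above.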

I’m not a fan of #1. There are too many collisions.

For approach #2, this is how Clojure manages to use hyphens, question marks, and bangs in symbol names, yet still use those to generate valid Java classnames. There’s a function called clojure.core/munge that does this systematically. It’s a one-way function, though, which suits the Clojure compiler’s purpose.

There could be something like munge, with similar conventions. One you might try is to convert . to _ in the namespace, and - to _ in the name. The / could become __ (double underscore).

It’s not perfect, but it corresponds to other conventions I’ve seen. The real issues are what to do with hyphens in the namespace and dots in the name. They’re less common, but that would only make the times they do occur harder to debug.
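A minimal sketch of that convention, assuming a hypothetical helper named kw->external that only handles namespaced keywords. It also shows why the mapping is not reversible in general: two different keywords can produce the same external name.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: "." -> "_" in the namespace, "-" -> "_" in the name,
;; and "/" -> "__". Assumes k has a namespace; (namespace k) is nil otherwise.
(defn kw->external [k]
  (str (str/replace (namespace k) "." "_")
       "__"
       (str/replace (name k) "-" "_")))

(kw->external :org.clojureverse.user/first-name)
;=> "org_clojureverse_user__first_name"

;; A keyword that already contains an underscore collides with the one above,
;; so the external name alone cannot tell them apart.
(= (kw->external :org.clojureverse.user/first-name)
   (kw->external :org.clojureverse.user/first_name))
;=> true
```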

A hybrid with #3 is probably the sanest. There’s no reason the namespaces of Clojure keywords need to correspond to Clojure’s lib namespaces (typical code files with an ns declaration at the top). I think with the profusion of Spec examples that use the :: syntax, we often get confused about that. :: is just a lazy way of adding a namespace. We should be thinking more about what the public, global, permanent name of a thing should be than whatever file it happens to be in when we write the code.

So, there’s no reason you can’t do this as a namespaced key:

org_clojureverse_user__first_name, which converts to :org_clojureverse_user/first_name.

That said, I’d carefully evaluate the systems that will need to use these names. I’d rather use JavaScript’s object["org.clojureverse.user/first-name"] syntax than have to come up with a translation system that can go both ways. It’s not like object.org_clojureverse_user__first_name is any less ugly.

Eric

mvarela · November 16, 2018, 8:42pm

Actually, the output of munge, when applied to keywords, seems quite reversible:

user> (munge :foo.bar/baz)
: => "_COLON_foo.bar_SLASH_baz"
user> (munge ::foo)
: => "_COLON_user_SLASH_foo"

Probably not nice to deal with on the JS side, but for programmatic access, it might be ok.

danielcompton · November 18, 2018, 7:32pm

I think the issue with reversibility is that information is lost in the transformation. The result of munging the two keywords below is the same, but the source is different.

(munge :foo.bar/baz)
=> "_COLON_foo.bar_SLASH_baz"
(munge :foo.bar_SLASH_baz)
=> "_COLON_foo.bar_SLASH_baz"

Given "_COLON_foo.bar_SLASH_baz" it’s not possible to reliably map that back to the original keyword. You can make a pretty good assumption, but it’s not strictly reversible.

mvarela · November 18, 2018, 11:12pm

Indeed, though in practice, and if we’re writing idiomatically, you won’t be likely to find keys like this.

greybird · November 29, 2018, 10:05pm

I think this is a worthwhile discussion, but I’m confused by the fact that the examples being used are :mc.cust/first-name and :mc.contact/email. My understanding is that the reason for namespaced keywords in specs is to avoid conflicts when libraries are used by more than one group/company. But these examples are putting the entity type (cust and contact) into the namespace, which I think is not intended.

I think the idea in spec is to have an mc/email spec and keyword that would be used for email attributes of multiple entities within mc, and that would distinguish it from email keywords used outside of mc. Within mc, you wouldn’t want two specs for customer email and admin email properties. Maybe email is a bad example because it could be standardized globally, but hopefully you see what I mean. My point is that, from reading the spec doc, I don’t believe the namespace is intended to contain the entity type.

If the spec/keyword namespace is simply an org or product name of some kind, then it should not be included in SQL column names or JS/JSON field names, since these are not namespaced identifiers – their namespace is their container’s name or is implicit. The fact that keywords can be namespaced in Clojure is pretty unique – it’s not an option for most other types of property identifiers.

Is there an assumption in this discussion that keywords should contain entity types (either in the namespace or the keyword name itself)? Am I missing something, and is this something commonly done in Clojure? (I’m new to Clojure.)

vvvvalvalval · November 29, 2018, 11:26pm

My understanding of Spec disagrees with yours. You would typically want to have a dedicated :mc.customer/email spec, both so that customer maps are self-descriptive, and also to have a level of indirection allowing for changing your mind about the contract which should be fulfilled by a customer email. You could, of course, define the :mc.customer/email spec in terms of a more generic email spec:

(s/def :mc.customer/email :mc.specs/email)

This is not about entity types - namespaces are here to prevent conflicts by placing the name in the context of a domain. That domain could be the entity type, it could be something else - the point is, there has to be a namespace, for the benefits outlined above. You just don’t want your code to contain keys whose intended context (and therefore meaning) is not visible, such as :id, :name or :type.

I simply disagree with that, from experience using both approaches. I know it’s not common practice to put namespaces in SQL column names or JSON keys, but I believe many projects would benefit a lot from adopting this convention. More effort writing the code, less effort reading it - we’re often too lazy to do that, and as a result most people don’t do it, but it’s worth it.

greybird · November 30, 2018, 1:07am

The map could be self-describing using a :type property of some kind. I think you’re saying that the individual properties should also be self-describing. I see the advantages of that, but there are also some drawbacks.

vvvvalvalval:

… and also to have a level of indirection allowing for changing your mind about the contract which should be fulfilled by a customer email. You could, of course, define the :mc.customer/email spec in terms of a more generic email spec:
(s/def :mc.customer/email :mc.specs/email)

I do see your point, but I doubt that the benefits are worth the cost. Using plain old :id, :first-name and :last-name (or for a shared definition, :mc/id, :mc/first-name and :mc/last-name) makes it easy to treat all entities generically. That way you can have common functions for different types of entities that have attributes in common. Or were you thinking such common functions would ignore the namespace?

I know entity types are just one example of a “domain” but they’re the only concrete thing I’ve heard discussed so far.

To me whether this is worthwhile depends on exactly what the “domain” is. I think the negatives outweigh the positives when domain is entity type.

Imagine using this approach with entity types in SQL/JS. The SQL/JS code would be much more verbose than the Clojure code, because in Clojure at least we have namespace aliases to reduce verbosity. Maybe this is a worst-case example, but imagine that all SQL columns in the employee table are prefixed with employee_. This would be completely redundant, since the table is always a qualifier (implicitly or explicitly) for the column name.

Do you have any examples of domains other than entity type where you would use this approach? Maybe the drawbacks are not so large in other cases.

vvvvalvalval · November 30, 2018, 6:31am

Self-describing, namespaced keys have been the idiomatic approach in Clojure projects for years now, and emerged from experience with the alternative you suggested - so if you’re refuting the value of this approach, you probably want to take this to a broader discussion, and be ready to contradict years of empirical evidence in the Clojure ecosystem. Note that the self-descriptive nature of keys is an explicit intention of spec.

If you’re new to Clojure, I understand how this can be unintuitive, especially if you’re coming from nominally-typed languages such as Java, Scala etc.

Using a single :id key instead of :mc/customer-id and :mc/product-id does not usually give you more benefits in genericity, it just makes your code ambiguous and full of implicit assumptions. You already get the genericity by treating keys as first-class values, and when doing that the genericity is explicit in your code.
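A sketch of what “keys as first-class values” can look like in practice. The function index-by and the sample data are hypothetical; the point is only that the generic function receives the key as an argument instead of assuming a shared :id.

```clojure
;; Generic over the identifying key: the caller says which key to index on.
(defn index-by [id-key entities]
  (into {} (map (juxt id-key identity)) entities))

(def customers [{:mc/customer-id 1 :mc.customer/name "Ann"}
                {:mc/customer-id 2 :mc.customer/name "Bob"}])

(def products [{:mc/product-id "p-1" :mc.product/label "Widget"}])

;; The same function works for both entity kinds, and the call site
;; states explicitly which key is meant.
(index-by :mc/customer-id customers)
(index-by :mc/product-id products)
```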

Again, that’s not just a theoretical argument: there’s a lot of empirical evidence of this in the Clojure community. People have moved from unnamespaced to namespaced keys.

Well, I don’t need to imagine, since I have used this approach with entity types in SQL and JS. I can testify it’s slightly more verbose, not much more. The code is actually more readable, because you don’t need to run a type inference algorithm in your head to understand the meaning of a key. Namespace aliases are just the icing on the cake - I usually don’t use them in Clojure either. And as I mentioned in the original post, I’m happy to give up on Clojure’s namespace sugar if that enables me to use the same namespacing syntax across the whole stack.

That’s simply not a big deal. Actually, this can help make it obvious which columns are specific to this table (e.g. employee_salary) and which columns are more general than that (e.g. entity_last_modified or person_name).

greybird · November 30, 2018, 2:36pm

You’re right that using namespaces is important when using spec and I shouldn’t have given the examples with no namespace. It’s specifically using entity type in the namespace that doesn’t seem best to me, and I was convinced of that by reading about spec and datomic. I’ll spend some time this weekend or next week to try to find the parts of the doc that gave me this impression and post more about it.

You’re also right that this is not the topic of your post (it is about the syntax of the namespaces), so I probably should start a new topic. If you happen to know of previous discussions on putting the entity type in the namespace, please let me know, but please don’t spend a lot of time searching. I can also accept that some people may put entity types in their namespaces and others may not, if that turns out to be the conclusion, but I need to decide for myself what approach I’d like to take.

Thanks for the discussion.