Should we really use Clojure's syntax for namespaced keys?

Namespaced keys are awesome. Like many others, I’ve found that using keys/attributes/fields/etc. whose name uniquely identifies one type of information, with no further context needed, is a very efficient strategy for making programs clearer and more regular all across the stack.

What I’m questioning here is the use of Clojure’s syntax/convention for namespacing keys, i.e. the use of dots . and slashes /, e.g. :org.foo.user/first-name instead of, say, org_foo_user_first_name.

Clojure’s convention is arguably more visually pleasing, but it can get unwieldy or even impossible to use outside of Clojure code, e.g. in a JavaScript client, as PostgreSQL column names, in an ElasticSearch index… This seems to defeat the purpose of namespaced keywords, which is to make the flow of information straightforward to follow in a very pervasive way (more pervasive than a type system, for instance). It also goes against an excellent piece of advice I heard from Stuart Halloway: don’t put your language semantics over the wire.

One can imagine several strategies to mitigate this problem:

  1. Ditch the namespace part outside of Clojure code, e.g. :org.foo.user/id -> "id".
  2. Have some form of systematic translation from one namespacing convention to another, e.g. :org.foo.user/first-name -> orgFooUser_firstName (see the sketch after this list).
  3. Stop using Clojure’s namespacing convention for keys that will travel outside of your Clojure app’s boundaries - which is probably most keys representing information.
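For concreteness, here is a minimal sketch of what approach 2 could look like (the function names and the camelCase target convention are purely illustrative):

```clojure
(require '[clojure.string :as str])

(defn- camelize [s]
  ;; "org.foo.user" -> "orgFooUser", "first-name" -> "firstName"
  (let [[head & tail] (str/split s #"[.-]")]
    (apply str head (map str/capitalize tail))))

(defn kw->wire-name
  "Hypothetical approach-2 translation of a namespaced keyword into a wire-friendly string."
  [k]
  (str (camelize (namespace k)) "_" (camelize (name k))))

(kw->wire-name :org.foo.user/first-name)
;; => "orgFooUser_firstName"
```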

Approach 1 is what I first tried when building BandSquare, and I really regretted it - I think it eliminates many benefits of namespacing keys. Approach 2 seems hacky at best, and you get much less leverage from your tooling. So I’m starting to think approach 3 is the sanest default.

This question is even more important now that Spec exists, because Spec encourages Clojure’s syntax for keywords and associates specific semantics with it.

What do you think?

9 Likes

(3) can be leaky. I have a bunch of data in my database, I query it, and it comes back like {:my_column "foo"} … no big deal, right? That data gets passed through the system, and somewhere I have a function that destructures {:keys [my-column]}. The query followed the underscore convention and the function followed the Lisp hyphenated convention, so this is a bug and my-column is always nil.

The fix is to destructure it as my_column and all is well. Then I call that function in another place and pass in {:my-column "foo"}, because I constructed the map instead of reading it from a database. And now the reverse bug pops up, because I should have used {:my_column "foo"}.
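To make the mismatch concrete, here’s a minimal sketch (reusing the hypothetical my_column key from above):

```clojure
(require '[clojure.string :as str])

;; Data read from the database keeps the SQL underscore style:
(def row {:my_column "foo"})

;; A function written in the usual hyphenated Clojure style silently misses it:
(defn shout [{:keys [my-column]}]
  (some-> my-column str/upper-case))

(shout row)                 ;; => nil   (my-column never gets bound)
(shout {:my-column "foo"})  ;; => "FOO" (but now the DB-shaped map breaks)
```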

The underscores slowly spread through my code further and further as more functionality gets added… and that increases the chances of an _ vs - bug creeping in. (3) ends up permeating the code. For better or worse, the hyphen style of Clojure is the de facto standard - libraries etc. follow that convention - so (3) leads to having a mix of styles in code.

I like (2)… I used to think it was unnecessary and just hair-splitting, but for a large project I prefer it to dealing with bugs due to style mismatches. Having said that, it’s only a preference based on my experience, and other approaches work well too! :slight_smile:

3 Likes

@timothypratley Well, your example doesn’t really fall within the scope of this discussion IMHO, because in that case (:my_column) you’re not using namespacing at all!

Regarding the use of underscores: I really don’t think these little inconsistencies / typos are significant compared to the ability to find, with a simple text search, all the places where a particular piece of information is used. (Granted, this may be because I’m dealing with a 4-year-old, 100k-LoC codebase mixing a lot of different technologies.)

I will say, however, that I don’t feel sticking to the hyphen convention really makes for fewer typos (it doesn’t prevent me from writing :customer/fisrt-name, for instance; what does is tooling like Cursive’s completion, and tests), and that mixing hyphens and underscores can actually improve readability by adding contrast: you can immediately tell whether a given key is “internal” or meant to travel across the surrounding system.

But again, even assuming that Approach 3 does cause more typos and a disturbing mix of styles, I’m totally willing to pay that price.

Cool; I can’t think of any disadvantages, so I say go for it! :smile:

1 Like

Interesting. Most of us can’t avoid using external software systems that don’t have an equivalent of namespaced keywords.
I think we can set up a GitHub repo providing both a convention (like the clojure-style-guide) and an implementation for converting between namespaced keywords and the plain versions. As an open-source project, we could figure out the best strategy for each software system, as well as common pitfalls.

3 Likes

I’m in favor of approach 2. With 1, you just lose the context of the key name. With 3, you’re replacing the issue of forcing Clojure’s namespace syntax everywhere with the reverse: you’re now forcing your PostgreSQL namespace syntax everywhere.

#2, I feel, respects boundaries, in that the data is modeled to the liking of the component using it. The data model and the component using it are always coupled, but consumers of the component aren’t coupled to it, because you have a data-model translation between them.

4 Likes

I’m already using #2, and I agree with people favoring it too. It’s a simple translation function for us (a custom version of clojure.walk/keywordize-keys) and that’s it.
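For illustration, a minimal sketch of that kind of translation function (the function name and the "customer" qualifier are made up):

```clojure
(require '[clojure.string :as str]
         '[clojure.walk :as walk])

(defn wire->clj-keys
  "Turns string keys like \"first_name\" into namespaced, hyphenated keywords."
  [ns-str m]
  (walk/postwalk
    (fn [x]
      (if (map? x)
        (into {}
              (map (fn [[k v]]
                     [(keyword ns-str (str/replace (name k) "_" "-")) v]))
              x)
        x))
    m))

(wire->clj-keys "customer" {"first_name" "Ada" "id" 42})
;; => {:customer/first-name "Ada", :customer/id 42}
```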

In fairness, most external systems don’t have support for keywords as keywords, let alone namespaced keywords. In the past I’ve had some success stringifying them and turning them into map keys, à la { 'user/name': 'Joe User', 'user/zip': 10013 }, but I think that once you accept that you are going to be translating between strings and keywords anyway at your language boundaries, option 2 begins to seem less hacky.

3 Likes

most external systems don’t have support for keywords as keywords, let alone namespaced keywords

Even in Clojure, there’s no objective notion of a “namespaced keyword” - a keyword is namespaced if it makes it clear in what context it should be used, and :customer/first-name or :customer_first_name both achieve that goal.

Granted, Clojure provides some conveniences for its own keyword convention - but IMO that’s a minor win compared to the pervasive use of one all-encompassing convention.

you’re now forcing your PostgreSQL namespace syntax everywhere.

I just don’t see how that’s an issue - forcing this ‘underscore-based’ convention everywhere is not equivalent to forcing Clojure’s convention everywhere, because it’s a “lowest common denominator” approach, which will be supported in any system / language (at least I can’t think of one that won’t support it).

This discussion is getting way too theoretical even for me, so let me give some empirical evidence.

I’m refactoring a Clojure + Datomic + JavaScript system which used to have Clojure-style namespaced keywords on the Clojure side and non-namespaced keys on the JavaScript side, with translation layers that added/stripped namespaces between Clojure and JavaScript. Now the JavaScript side uses Clojure-style keys as well; it’s ugly and impractical to write, but I’m still winning massively in terms of ease of maintenance and simplicity - which shows that respecting JavaScript’s convention for keys is less important than having the same convention everywhere. By far, in my experience, the ability to search for the uses of a key across all languages outweighs the niceties of any given notation for the key. If this is true for JavaScript, it should be true for Clojure as well.

Now the problem with my current state of things is: while I can get away with Clojure’s convention in JavaScript, I can’t push it to my ElasticSearch and PostgreSQL materialized views, nor to potential external API consumers.

Even in Clojure, there’s no objective notion of a “namespaced keyword” - a keyword is namespaced if it makes it clear in what context it should be used, and :customer/first-name

Clojure keywords have a concrete namespace part and a name part. These can be programmatically queried unambiguously. Your convention cannot, because you can’t parse it and know where the namespace ends and the name begins.

Practically, namespaces for keywords serve only one purpose, to avoid key clashes. When you insert them into Postgres or transfer them over the network to JS, you don’t have a keyword anymore, but a string. You therefore need to choose a serialization strategy. EDN is the default one, pr-str will output EDN, and in EDN keywords are represented as :namespace/name. EDN is of type String, so why can’t Postgres or ElasticSearch work with a string? I find it strange that they would restrict the character set of a string to less than ASCII.
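To make that concrete (using the example keywords from earlier in the thread):

```clojure
(require '[clojure.edn :as edn])

(namespace :org.foo.user/first-name)   ;; => "org.foo.user"
(name :org.foo.user/first-name)        ;; => "first-name"

;; The single-segment convention is opaque to these functions:
(namespace :org_foo_user_first_name)   ;; => nil
(name :org_foo_user_first_name)        ;; => "org_foo_user_first_name"

;; And EDN round-trips the qualified keyword as-is:
(edn/read-string (pr-str :org.foo.user/first-name))
;; => :org.foo.user/first-name
```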

3 Likes

Clojure keywords have a concrete namespace part and a name part. These can be programmatically queried unambiguously. Your convention cannot, because you can’t parse it and know where the namespace ends and the name begins.

Yes, and this is what I call “some conveniences for [Clojure’s] own keyword convention”.

Your convention cannot, because you can’t parse it and know where the namespace ends and the name begins.

Not really important IMO - this “namespace + name” decomposition really only exists so that Clojure can provide the above-mentioned ‘concision-oriented’ conveniences.

Practically, namespaces for keywords serve only one purpose, to avoid key clashes.

Yes, exactly! So we can achieve that with any convention, don’t you agree?

why can’t Postgres or ElasticSearch work with a string?

AFAIK you can’t use just any string as a column name in Postgres. Nor as a GraphQL field name. Nor as a class member name in Java or Scala or Ruby or whatever language some part of your system may choose to use; and if they choose to use classes to represent data, well, they have every right to do so because it’s idiomatic in their language, and we should not add hurdles to that (back to the principle of making data as language-agnostic as possible).

1 Like

AFAIK you can’t use just any string as a column name in Postgres. Nor as a GraphQL field name. Nor as a class member name in Java or Scala or Ruby or whatever language some part of your system may choose to use

Oh okay, I see what you meant now.

What’s the problem you’re trying to address? I ask because the way I see it, a Postgres table, a Java class, an ElasticSearch schema - these things already have their own mechanism to avoid key collisions. A SQL table has a name, so a customer table has a first_name column. A Java class already has a namespace, so the field names don’t have to have namespaces. So I would find it strange to have a customer table with a customer_first_name column. That seems redundant.

Yes, and this is what I call “some conveniences for [Clojure’s] own keyword convention”.

Before spec, there was very little utility in namespacing Clojure keys, because key clashes even within a map are very rare. So I’d say Spec is the biggest reason for namespacing keys in Clojure. Spec will not work with your convention. So if it doesn’t work with Spec, why use namespaced keys at all?
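For context, this is how spec ties map specs to qualified keywords (the :customer specs below are only illustrative):

```clojure
(require '[clojure.spec.alpha :as s])

(s/def :customer/first-name string?)
(s/def :customer/id int?)

;; :req looks for the fully qualified keys in the map:
(s/def :customer/entity (s/keys :req [:customer/first-name :customer/id]))

(s/valid? :customer/entity {:customer/first-name "Ada" :customer/id 42})
;; => true
(s/valid? :customer/entity {:customer_first_name "Ada" :customer_id 42})
;; => false  (spec simply doesn't see these as the required keys)
```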

If you want to store a Clojure map in a SQL table, you have a kind of data-structure-to-relational mapping problem. Maps are sparse matrices: they have no schema, no name, and no namespace. You’ll have to have a mapping function to go from one to the other no matter what. You can’t avoid having a mapping function from Map keys to columns and vice versa, and because Keyword is not a supported column-name type, you’ll need to transform it to a string - so why not underscore it then?

What’s the problem you’re trying to address?

The problem I’m trying to address is really code clarity / ease of reasoning about code, including the tooling that may be involved in that.

I ask because the way I see it, a Postgres table, a Java class, an ElasticSearch schema - these things already have their own mechanism to avoid key collisions.

They do, absolutely! However, their client code doesn’t, if you know what I mean.

because key clashes even within a map are very rare.

This may be where our disagreement stems from.

If I had a dollar for every time I found something like obj.id or ent.type in my JavaScript code and then had to do type inference by hand to understand what the type of obj / ent was… Fundamentally, every time I struggle with this, I’m asking myself the question: ‘by id, does it mean customer/id, or confirmation-email/id, or blog.post/id… ?’

So I’d say Spec is the biggest reason for namespacing keys in Clojure.

To me the biggest benefit of namespaced keys, by far, is not that they play nicely with spec - it is that they identify a type of information without any more context needed, and make that easily searchable across the codebase of the entire system. Yes, Java and Postgres etc. have type systems and associated tooling to make the context easier to keep track of, but these tools stop helping you once you cross language boundaries - which happens pretty often, especially for debugging and maintenance.

You can’t avoid having a mapping function from Map keys to columns and vice versa

I believe you can - that is the point I’m trying to make. You can, by choosing to write customer_first_name everywhere, instead of customer/first-name in your Clojure code and customer_first_name in your SQL and JavaScript and GraphQL code. That is what I mean by “another convention for namespacing”.

Granted, customer_first_name does not mechanically make a namespace+name decomposition apparent - I believe this does not matter, because the goal of preventing name collisions is achieved. I will even go so far as to call it an antipattern to rely programmatically on that decomposition in Clojure code - a key is a key, it’s a scalar, and you should not try to treat it like a composite.

2 Likes

I’m not sure I follow you there… customer_first_name and customer_id have a semantic coupling anyway. I feel like having a way to reify this connection, and to mechanically exploit it, is a good thing.

That is not to say I don’t understand the value of using names that are self-sufficient AND can survive going over the wire to a foreign land and back. But if we deprive ourselves of every semantic that does not exist in the places our data is going to be manipulated, the “lowest common denominator” effect is going to hit us pretty badly… Like, in a typical web setup, should we not use sets because our JS client and the JSON on the wire between them do not understand them? (Yes, I know about transit and ES2015 sets :slight_smile: let’s pretend it’s an old js client…)

My preference would go to option 2) systematic translation. Maybe it wouldn’t feel so hacky if we could share a community agreement (or even tooling?) about the translation.

2 Likes

I’m not sure I follow you there… customer_first_name and customer_id have a semantic coupling anyway. I feel like having a way to reify this connection, and to mechanically exploit it, is a good thing.

And then, how do you mechanically exploit the semantic coupling between :person/first-name and :contact/email-address?

I strongly recommend against this sort of automatic reliance on keyword decomposition, because it creates complexity by conflating concerns. The sole concern of namespaces is to avoid name clashes: using them to define ‘entity types’ is forcing them to address two concerns at once. If you’re going to go down that road, you might as well use a statically-typed, class-based language (although this is a Clojure forum, I’m not saying this to be provocative: I really believe this is something statically-typed, class-based languages do very well, although I also believe it creates accidental complexity).

Like, in a typical web setup, should we not use sets because our JS client and the JSON on the wire between them do not understand them? (Yes, I know about transit and ES2015 sets :slight_smile: let’s pretend it’s an old js client…)

This is a different situation, because a list and a set don’t have the same API, whereas all keys / keywords have the same API. We’re really only talking about naming conventions and their practical consequences here, not programming semantics. Changing your Clojure program to use :customer_first_name instead of :customer/first-name doesn’t affect the expressiveness of your Clojure code.

My preference would go to option 2) systematic translation. Maybe it wouldn’t feel so hacky if we could share a community agreement

Well, while I’m very enthusiastic about the Clojure community in general, since I’m advocating an approach that is designed to go beyond our “Clojure bubble” (i.e. striving to program in the Language of the System, as Rich Hickey calls it), I’m a bit skeptical of a convention from the Clojure community for this particular problem.

(or even tooling?) about the translation.

Come on @chpill, you come from the JavaScript world, you know this sort of tooling reeks of incidental complexity! :slight_smile:

2 Likes

they identify a type of information without any more context needed, and make that easily searchable across the codebase of the entire system

Ya, I understand what you want, not convinced a naming convention really gives it to you. It seems like it would devolve over time, just like comments always end up outdated. That’s why I’d still rather use the more powerful tools available within each system, which give me better guarantees and more usability than a naming convention, and adapt the data across boundaries from one to the other.

It would be sweet for multiple systems, languages and frameworks to all somehow adopt a common data modeling mechanism, but I think that’s a pipe dream. A Clojure/ClojureScript/Datomic workflow gets pretty close, with EDN and Spec, but even that isn’t perfect. I’ve also heard of cross-system-boundary types in experimental Haskell, which sounded pretty promising.

Ya, I understand what you want, not convinced a naming convention really gives it to you.

Well, a naming convention certainly can’t give you everything :slight_smile: but I do think it’s a significant step forward.

Besides, can you name a concrete example of a significant downside to the syntax I propose? One concrete example where it would create a real incompatibility or inconvenience? I can’t think of one.

That’s why I’d still rather use the more powerful tools available within each system, which give me better guarantees and more usability

They do… but only inside one particular language; they fall short when used system-wide.

It would be sweet for multiple systems, languages and frameworks to all somehow adopt a common data modeling mechanism, but I think that’s a pipe dream.

Consider this: isn’t it an even more remote pipe dream that we can all happily enjoy our language bubble and hope that tooling will eventually bridge the impedance mismatches between the specific idioms of all languages? It seems to me this has been proven wrong by history many times.

Come on @chpill, you come from the JavaScript world, you know this sort of tooling reeks of incidental complexity! :slight_smile:

Okay, I have to admit, I don’t want to go back to that world :sweat_smile:

We’re really only talking about naming conventions and their practical consequences here, not programming semantics. Changing your Clojure program to use :customer_first_name instead of :customer/first-name doesn’t affect the expressiveness of your Clojure code.

Well, it has a semantic effect in the sense that spec will not validate those pieces of information as they flow through your system the way it would with the classical namespaced keywords (which is, in my opinion, one of the core values that sets it apart from, say, plumatic.schema)… But maybe this is just a tooling issue. As those option 3) all-terrain keywords are also designed to be unambiguous identifiers, asking spec to treat them the same as the classical namespaced keywords would respect the original spirit. I don’t know much about the inner workings of spec though; I’d be interested to know if it could be extended to allow for such a use case.

The more I think about it, the more I like your idea. Since the big thing here is making a bridge to other environments and languages, I wonder how this style of using long unambiguous identifiers would be received on the other side… As we both know the dangerous appetite of JS devs for syntactic sugar, aren’t you afraid they would just refuse to work directly with something like my_company_customer_first_name and insist on a mapping on their side in the end? Are there other communities pushing for this style of representing information?

Thank you for bringing attention to this subject by the way. If we want Clojure to find its place in more of the systems out there, we’ll have to play nice with the other participants.

2 Likes

all-terrain keywords are also designed to be unambiguous identifiers, asking spec to treat them the same as the classical namespaced keywords would respect the original spirit.

My knowledge of Spec is still a bit superficial, but I think this is what :req-un and :opt-un are for?
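A quick sketch of what I mean (the :acme qualifier is made up - it’s only there so the spec can be registered):

```clojure
(require '[clojure.spec.alpha :as s])

;; The spec name still needs a qualifier to be registered...
(s/def :acme/customer_first_name string?)

;; ...but :req-un matches the *unqualified* key in the map:
(s/def :acme/customer (s/keys :req-un [:acme/customer_first_name]))

(s/valid? :acme/customer {:customer_first_name "Ada"})
;; => true
```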

As we both know the dangerous appetite of JS devs for syntactic sugar, aren’t you afraid they would just refuse to work directly with something like my_company_customer_first_name and insist on a mapping on their side in the end?

TBH, :my_company_customer_first_name feels a bit long even for me… :slight_smile: However, I could imagine everyone being fine with :mc_cust_first_name. It’s a bit less approachable to the uninitiated, but I think people would get used to it quickly.

I don’t think you can convince the whole JS ecosystem (or that of other languages) to adopt this sort of Clojure-inspired best practice, but I could totally imagine doing it for a company, or department, or team - especially at the beginning of a project, when this sort of decision is made.

can you name a concrete example of a significant downside to the syntax I propose? One concrete example where it would create a real incompatibility or inconvenience?

I’m sure there’s some system out there that doesn’t support the colon character or underscore. So you’d have to truly find the least common denominator across your given systems.

I’m trying to think how beneficial this would be. If every dev touching the code, over time, in all systems, somehow managed to always use the same name for the same information, and never let two different pieces of information share the same name, across the board - yeah, that would be practical. You could easily trace information through the systems and code.

Somehow though, I really don’t believe it’ll work out that way in practice. I think devs will forget, will not know to do it, will overlook that there was already a name for information X, will mistype it, etc.

In a degenerate case like that, I’m not sure it’s still beneficial. Then I’d rather have my normal keywords when in Clojure, using Clojure idioms. If I have well-defined mappings between boundaries, I can still trace information through; it’s just one extra hop: I look up what keyword X in Clojure gets mapped to in my Postgres table, for example. It’s not that much more effort, but it means I can leverage spec fully in Clojure, Java classes fully in Java, etc.

So I’m making compromises at the boundaries, but not within a single component, and I feel that might end up being better than compromising within components to have uniformity across boundaries.