Optional keys in (JSON-)schema

I’m defining JSON Schemas for a customer
and I’m not sure how I should handle optional keys.

In Clojure, I would just mark the key as optional
and all is good, but this schema will not be used
with Clojure.
In SQL I’m in the “avoid NULLable fields” camp.
So an optional string column should not be nullable
and should default to an empty string instead.
IMHO it just makes later use easier.

So let’s say I have a map for a user with first,
last and optional middle name.

Would you (and why):
(malli syntax)

a) Mark it as optional and also allow nil

 [:map
  [:first string?]
  [:middle {:optional true} [:or nil? string?]]
  [:last string?]]

b) Mark as optional and not allow nil

 [:map
  [:first string?]
  [:middle {:optional true} string?]
  [:last string?]]

c) Must be present but with nil possible

 [:map
  [:first string?]
  [:middle [:or nil? string?]]
  [:last string?]]

d) Must be present but always non-nil with a default

 [:map
  [:first string?]
  [:middle {:default ""} string?]
  [:last string?]]
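For anyone more at home with clojure.spec than malli, here is a rough translation of the four options. This swaps in a different library, so it is not how malli itself evaluates the schemas. The accept/reject pattern is only intended to mirror them. The sketch checks each option against three inputs: key absent, key nil, and key blank. The `:person`, `:nilable`, `:strict` and `:option` namespaces are made up for the example. spec has no `:default` property, so option (d) is approximated by a hypothetical `with-middle-default` step that runs before validation.

```clojure
(require '[clojure.spec.alpha :as s])

;; Leaf specs. With :req-un/:opt-un only the name part becomes the map key,
;; so both :nilable/middle and :strict/middle describe an unqualified :middle.
(s/def :person/first string?)
(s/def :person/last string?)
(s/def :nilable/middle (s/nilable string?))
(s/def :strict/middle string?)

;; a) optional key, nil allowed
(s/def :option/a (s/keys :req-un [:person/first :person/last]
                         :opt-un [:nilable/middle]))
;; b) optional key, nil not allowed
(s/def :option/b (s/keys :req-un [:person/first :person/last]
                         :opt-un [:strict/middle]))
;; c) key required, nil allowed
(s/def :option/c (s/keys :req-un [:person/first :person/last :nilable/middle]))
;; d) key required, always a string
(s/def :option/d (s/keys :req-un [:person/first :person/last :strict/middle]))

(defn with-middle-default
  "Hypothetical helper approximating malli's {:default \"\"}:
  fills in an empty string when :middle is absent or nil."
  [m]
  (update m :middle #(or % "")))

(def samples
  {:absent {:first "Martin" :last "King"}
   :nil    {:first "Martin" :middle nil :last "King"}
   :blank  {:first "Martin" :middle "" :last "King"}})

(defn accepted-inputs
  "Returns the set of sample names that satisfy spec k."
  [k]
  (set (for [[sample-name m] samples
             :when (s/valid? k m)]
         sample-name)))

(accepted-inputs :option/a) ;; => #{:absent :nil :blank}
(accepted-inputs :option/b) ;; => #{:absent :blank}
(accepted-inputs :option/c) ;; => #{:nil :blank}
(accepted-inputs :option/d) ;; => #{:blank}
(s/valid? :option/d (with-middle-default (:absent samples))) ;; => true
```

Only (c) and (d) reject the "old" map without a `:middle` key. Only (a) and (c) let nil through as a value of its own.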


I lean towards (c). Or, instead of nil being allowed, something like: must be present and not nil, but can be :not-provided.

My reasoning is that I think about the form presented to the user: did we ask for a middle name on it or not? If the form didn’t have a middle-name input box, then I’d leave the key out. The form didn’t ask for a middle name, so there is no middle-name key. If the form does have a middle-name input box, then I want to know that, and I also want to know that the user didn’t enter anything in it. So if they don’t provide a middle name, I would set it either to nil, which my system interprets as :not-provided, or to a keyword or special type that indicates it was not provided. That marker should not be a string, so it can’t conflict with a user whose middle name is literally “not-provided”.
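A minimal sketch of that form-handling rule follows. It assumes the form submits a map of raw string values and that the caller knows which fields the form displayed. `form->user` and the `:not-provided` marker are illustrative names, not an existing API.

```clojure
(require '[clojure.string :as str])

(defn form->user
  "Builds the user map from a submitted form (hypothetical helper).
  displayed-fields: set of keys the form actually showed to the user.
  params: map of raw submitted values (strings, possibly blank or missing).
  Fields the form never showed are left out entirely. Fields that were shown
  but left blank get the :not-provided marker instead of an empty string."
  [displayed-fields params]
  (into {}
        (for [k displayed-fields
              :let [v (get params k)]]
          ;; str/blank? is true for nil, "" and whitespace-only strings
          [k (if (str/blank? v) :not-provided v)])))

;; Old form without a middle-name box: no :middle key at all.
(form->user #{:first :last} {:first "Martin" :last "King"})
;; => {:first "Martin", :last "King"} (key order may differ when printed)

;; New form, middle-name box left empty.
(form->user #{:first :middle :last} {:first "Martin" :middle "" :last "King"})
;; => {:first "Martin", :middle :not-provided, :last "King"}
```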

I find that knowing the difference between data that was looked for but not found, and data that was asked for but not provided, comes in handy sometimes, so I like to model it explicitly.

Another case is querying something. If you tried to get a middle name from your DB or elsewhere and it was not found, then I’d rather get {:middle-name :not-found}, or even {:middle-name nil}, than {}. If there is no middle-name key, I’ll assume no attempt was made to acquire it. If the key is there but nil, or holds a keyword explaining why there is no value, I know an attempt was made but yielded nothing. The keyword is even better, because it can tell you why it yielded nothing (:not-found, :not-provided, etc.) and is less likely than nil to be the result of a bug.
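The same idea applies to a lookup. In this sketch an in-memory map stands in for the database, and `fetch-attrs` is a hypothetical function. Every attribute we attempted to fetch gets a key. Those with no stored value get `:not-found`, and attributes nobody asked for stay absent.

```clojure
(def db
  ;; Hypothetical stand-in for a real data store: user-id -> row.
  {1 {:first-name "Martin" :last-name "King"}})

(defn fetch-attrs
  "Returns one entry per attribute in attrs. Attributes missing from the row
  (or a missing row) are marked :not-found instead of being omitted, so the
  caller can tell 'looked for, nothing there' apart from 'never fetched'."
  [db user-id attrs]
  (let [row (get db user-id)]
    (into {} (for [a attrs]
               [a (get row a :not-found)]))))

(fetch-attrs db 1 [:first-name :middle-name])
;; => {:first-name "Martin", :middle-name :not-found}

(fetch-attrs db 2 [:first-name])
;; => {:first-name :not-found}
```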

This also has the advantage that, with spec for example, s/conform can tell your different schemas apart.

For example, imagine that a year ago the form didn’t have a middle-name field. The data from back then would look like:

{:first "Martin"
 :last "King"}

Eventually a middle-name field was added, and now the data looks like:

{:first "Martin"
 :middle nil
 :last "King"}

You could then spec it like:

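(require '[clojure.spec.alpha :as s])
(s/def ::first string?)
(s/def ::last string?)
(s/def ::middle (s/nilable string?))

(s/def ::first-last-form
  (s/keys :req-un [::first ::last]))

(s/def ::first-last-middle-form
  (s/keys :req-un [::first ::last ::middle]))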

Given some form data, s/conform can then tell you which of these two specs it belongs to, without you needing nominal types.
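There is one caveat when doing this with spec: `s/keys` maps are open, so a map with `:middle` also satisfies the first-last spec. One way to still get the right tag is to wrap both specs in `s/or` and list the more specific spec first, because `s/or` returns the first branch that matches. The `:form/...` spec names and `form-kind` are made up for this sketch.

```clojure
(require '[clojure.spec.alpha :as s])

(s/def :form/first string?)
(s/def :form/last string?)
(s/def :form/middle (s/nilable string?)) ; key present, value may be nil

(s/def :form/first-last
  (s/keys :req-un [:form/first :form/last]))
(s/def :form/first-last-middle
  (s/keys :req-un [:form/first :form/last :form/middle]))

;; s/keys accepts extra keys, so every first-last-middle map is also a valid
;; first-last map. s/or tries branches in order, so the more specific goes first.
(s/def :form/any
  (s/or :first-last-middle :form/first-last-middle
        :first-last        :form/first-last))

(defn form-kind
  "Returns the tag of the first matching s/or branch, or :unknown."
  [data]
  (let [conformed (s/conform :form/any data)]
    (if (s/invalid? conformed)
      :unknown
      (first conformed)))) ; s/or conforms to a [tag value] pair

(form-kind {:first "Martin" :last "King"})             ;; => :first-last
(form-kind {:first "Martin" :middle nil :last "King"}) ;; => :first-last-middle
(form-kind {:first "Martin"})                          ;; => :unknown
```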

If you didn’t do it this way, you couldn’t tell which type of form the data came from, unless you had a nominal typing scheme like:

{:type :first-last-middle
 :first "Martin"
 :last "King"}

This tells you that, even though there is no :middle key, this was a :first-last-middle form. So you likewise know that the user was asked for a middle name but didn’t enter anything.
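For comparison, spec's built-in support for that nominal style is `s/multi-spec`, which picks the spec to apply from a tag carried by the data, such as `:type`. This sketch assumes the `:type` values from the example above. The `:tagged/...` spec names and the `form-by-type` multimethod are invented for illustration.

```clojure
(require '[clojure.spec.alpha :as s])

(s/def :tagged/first string?)
(s/def :tagged/last string?)
(s/def :tagged/middle string?)

;; Dispatch on the :type tag carried by the data itself.
(defmulti form-by-type :type)

(defmethod form-by-type :first-last [_]
  (s/keys :req-un [:tagged/first :tagged/last]))

;; The middle name was asked for, but the key may be missing
;; when the user left the box empty.
(defmethod form-by-type :first-last-middle [_]
  (s/keys :req-un [:tagged/first :tagged/last]
          :opt-un [:tagged/middle]))

(s/def :tagged/form (s/multi-spec form-by-type :type))

(s/valid? :tagged/form {:type :first-last-middle :first "Martin" :last "King"}) ;; => true
(s/valid? :tagged/form {:type :first-last :first "Martin"})                     ;; => false
(s/valid? :tagged/form {:type :other :first "Martin" :last "King"})             ;; => false
```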

In summary, I like my keys to tell me what fields my data has. Keys map to expectations of what data can be there. If the map represents user input, every key is something the user was asked to provide. If it represents a DB result, every key is a column in my table. If the user was not asked for it, or the table doesn’t have the column, then the key isn’t on the map.

Similarly, I like my values to tell me what values my data has. I consider the absence of a value to be a value in itself, which I’m normally okay using nil for. It means that, for some reason, the value isn’t there for this entry.
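In Clojure, reading keys and values this way needs `contains?` rather than `get`. The reason is that `(get m :middle)` returns nil both when the key is absent and when it is present with a nil value. This is a small sketch: `field-state` and its return keywords are hypothetical. It also assumes keywords only ever appear as reason markers, never as real field values.

```clojure
(defn field-state
  "Classifies field k of map m (hypothetical helper):
  :not-asked - the key is absent (the form or table never had it)
  :no-value  - the key is present but nil
  a keyword  - a reason marker such as :not-provided or :not-found, returned as-is
  :present   - any other value"
  [m k]
  (let [v (get m k)]
    (cond
      (not (contains? m k)) :not-asked
      (nil? v)              :no-value
      (keyword? v)          v
      :else                 :present)))

(field-state {:first "Martin" :last "King"} :middle)                       ;; => :not-asked
(field-state {:first "Martin" :middle nil :last "King"} :middle)           ;; => :no-value
(field-state {:first "Martin" :middle :not-provided :last "King"} :middle) ;; => :not-provided
(field-state {:first "Martin" :middle "Luther" :last "King"} :middle)      ;; => :present
```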

OK, one last thing: I just want to point out that this is not how OO usually works. In OO the opposite normally happens. You end up with a class that is the union of all keys that ever existed, and then fields are null but you don’t know why. Since an OO object cannot represent “key not applicable”, you’re forced to have the field and put null in it. You can do better in OO as well, by using subtyping to represent every schema variant, but it’s a lot more work and people get lazy. In practice it doesn’t happen.

The same is true of anything that cannot mix different schemas. SQL databases suffer from this, for example, since old rows can’t lack a column that was added later. So again, you’re forced to make up some value for all those old rows. You then have to pretend that the user didn’t enter anything, or entered some weird default (which is really weird if the column type is a number).
