Why are specs not extensible?

One pattern stands out in many of my projects: my code tries to recognize data by its shape using clojure.spec.alpha/conform. Once annotated, the conformed output goes into a recursive function that traverses and transforms it. With deeply nested data this results in confusing dispatches and destructuring on the tags added by specs such as s/or.

However, my only goal is to transform the data, not to annotate it. Since conform knows what the matching spec is (assuming there is one), the transformation could happen right then and there. Yes, part of this could be done with conformers, but that’s not general enough, as we’ll see below. Also, I remember reading that coercion is not in scope for spec. OK, please bear with me; we’ll get to that.

An Example

Specs and data we want to transform:

(s/def ::id integer?)
(s/def ::ref (s/tuple #{:ref} ::id))
(s/def ::entity (s/keys :req-un [::id] :opt-un [::child]))
(s/def ::child (s/or
                :ref ::ref
                :ent ::entity))

(s/conform ::entity {:id 2
                     :child {:id 3
                             :child [:ref 4]}})
; => {:id 2, :child [:ent {:id 3, :child [:ref [:ref 4]]}]}

Transformation function normalize:

Given a nested map of entities that point to other entities or refs, we want to form a normalized graph keyed by refs. Each ref points to an entity, and entities only ever point to refs of other entities. In other words, we want to go from:

{:id    2
 :child {:id    3
         :child [:ref 4]}}

to:

{[:ref 2] {:id 2
           :child [:ref 3]}
 [:ref 3] {:id 3
           :child [:ref 4]}}
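
For comparison, here is a rough sketch of the conform-then-dispatch approach described at the top, applied to this example. The names normalize-conformed and normalize are hypothetical helpers of mine, not part of spec, and the sketch only handles the four specs defined above:

```clojure
(require '[clojure.spec.alpha :as s])

(s/def ::id integer?)
(s/def ::ref (s/tuple #{:ref} ::id))
(s/def ::entity (s/keys :req-un [::id] :opt-un [::child]))
(s/def ::child (s/or :ref ::ref :ent ::entity))

(defn normalize-conformed
  "Hypothetical helper. Takes the graph built so far and a conformed
  ::entity, and returns [updated-graph ref-to-this-entity]."
  [graph {:keys [id child] :as ent}]
  (let [ref [:ref id]]
    ;; A conformed ::child is a [tag value] pair produced by s/or.
    (if-let [[tag v] child]
      (case tag
        ;; Already a ref: drop the tag and keep the ref itself.
        :ref [(assoc graph ref (assoc ent :child v)) ref]
        ;; Nested entity: normalize it first, then point at its ref.
        :ent (let [[graph' child-ref] (normalize-conformed graph v)]
               [(assoc graph' ref (assoc ent :child child-ref)) ref]))
      [(assoc graph ref ent) ref])))

(defn normalize
  "Conforms x against ::entity, then rewrites the tagged output."
  [x]
  (let [conformed (s/conform ::entity x)]
    (if (s/invalid? conformed)
      conformed
      (first (normalize-conformed {} conformed)))))

(normalize {:id 2 :child {:id 3 :child [:ref 4]}})
```

Even for one s/or this needs a case on the tag and manual re-assembly of the entity. With more tagged specs per level, that dispatch is what gets confusing.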

Let’s imagine we have (defn spec-walk [f spec x] ...), which is similar to clojure.walk/postwalk but differs in two ways: a) it returns ::s/invalid if the data can’t be validated by the spec, and b) instead of only providing f with the data at a given node, it also passes the key of the matching spec:

(spec-walk (fn [spec-key x]
            (println spec-key x)
            [:transformed x])
          ::entity
          {:id    2
           :child {:id    3
                   :child [:ref 4]}})

; prints:
; ::ref [:ref 4]                           ; first and deepest spec call
; ::entity {:id    3                       ; next up
;          :child [:transformed [:ref 4]]}
; ::entity {:id    2                       ; root entity
;          :child [:transformed {:id    3
;                                :child [:transformed [:ref 4]]}]}

; notice the `s/or` keys are ignored.

; returns
;=> [:transformed {:id    2
;                 :child [:transformed {:id    3
;                                       :child [:transformed [:ref 4]]}]}]

Good, spec-walk can walk our data and tell us which specs succeed along the way. It’s going to be a little ugly but we can imperatively achieve our normalization goal:

(let [out (volatile! {})]
 (spec-walk (fn [spec-key x]
             (case spec-key
               ::ref x
               ::entity (let [ref [:ref (:id x)]] ; x is the entity, children already replaced by refs
                          (vswap! out assoc ref x)
                          ref)))
           ::entity
           {:id    2
            :child {:id    3
                    :child [:ref 4]}})
 @out)

; should return:
;=> {[:ref 2] {:id    2
;              :child [:ref 3]}
;    [:ref 3] {:id    3
;              :child [:ref 4]}}


We could imagine another function like spec-reduce-replace that passes a context to f and expects it to return the optionally updated context and a replacement value for x:

; imagined signature only, not implemented
(defn spec-reduce-replace [f init spec x] ...)

(spec-reduce-replace (fn [ctx spec-key x]
                       (case spec-key
                         ::ref [ctx x]
                         ::entity (let [ref [:ref (:id x)]]
                                    [(assoc ctx ref x) ref])))
                     {}
                     ::entity
                     {:id    2
                      :child {:id    3
                              :child [:ref 4]}})
; should also return
;=> {[:ref 2] {:id    2
;              :child [:ref 3]}
;    [:ref 3] {:id    3
;              :child [:ref 4]}}

OK, time to stop dreaming! Yes, spec-walk and spec-reduce-replace are probably not what spec was originally designed for. However, are specs, like Clojure’s data structures, extensible so that functions can be added for each type of spec? Let’s see:

Here’s the type of our ::id spec from before

(type (s/spec ::id))
;=> clojure.spec.alpha$spec_impl$reify__2069

from clojure.spec.alpha:

(defn ^:skip-wiki spec-impl
  "Do not call this directly, use 'spec'"
  ([form pred gfn cpred?] (spec-impl form pred gfn cpred? nil))
  ([form pred gfn cpred? unc]
   (cond
     (spec? pred) (cond-> pred gfn (with-gen gfn))
     (regex? pred) (regex-spec-impl pred gfn)
     (ident? pred) (cond-> (the-spec pred) gfn (with-gen gfn))
     :else
     (reify
       Spec
       (conform* [_ x] (let [ret (pred x)]
                         (if cpred?
                           ret
                           (if ret x ::invalid))))
       (unform* [_ x] (if cpred?
                        (if unc
                          (unc x)
                          (throw (IllegalStateException. "no unform fn for conformer")))
                        x))
       ; ... elided
       ))))

Every spec-impl, and there are many, uses reify, creating objects implementing the Spec protocol on the fly. In order to extend these objects with another protocol (e.g. SpecTraverse with our methods spec-walk and spec-reduce-replace), each impl would have to be wrapped in a foreign namespace with another implementation containing another reify that defines methods for SpecTraverse.

How does spec not support Clojure’s essential polymorphism?

AFAIU something like this is impossible with spec and spec2:

(defprotocol LookMagic
  (magic! [this]))

(extend clojure.lang.IPersistentVector
  LookMagic
  {:magic! (fn [this] (conj this "Not exactly magic, but extensible it is!"))})

(magic! [1 2 3])
; => [1 2 3 "Not exactly magic, but extensible it is!"]

This seems incredibly backwards to me and I hope very much that someone will prove me wrong. As for the argument that coercion is not in scope for spec, that’s fine. The entire library ecosystem outside clojure.core was not in scope for Clojure either, but it exists because Clojure can be composed and extended like hardly any other language.

Why is this not true for spec?

On to why you can’t extend Spec types: that’s a good question, and maybe we should be able to. I don’t know the pros and cons.

I think your message gets lost, though, by also talking about why there’s no built-in spec-based coercion. Conform tells you which spec a structure conforms to, that’s all.

What you describe at the top is afaict coercion, not conforming. Conforming is about telling you how a value matched a spec, and is not a generic data transformation tool. While conforming is not coercion, I think there is an interesting opportunity to leverage specs for coercion and the closest thing to what I would like that I’m aware of is https://github.com/wilkerlucio/spec-coerce.

As an aside, if you’re having issues with s/or specifically, the (undocumented) s/nonconforming can be a useful thing to wrap around an s/or if you do not want its value to be tagged with the or branch. This is undocumented in spec 1 as we were not sure whether it was generally useful. My thought at this point is that untagged s/or’s are frequently useful and we should either keep s/nonconforming or add a flag to s/or or add a new conforming flag - tbd which way we’ll go on that in spec 2.

Re extensibility, spec 1 intentionally does not try to commit to spec extensibility because we were not comfortable in locking down how specs are implemented (and in fact there are two styles - protocol impl and record impl for regex ops). Spec 2 will open this up and essentially has standardized on the protocol version. But, there are several kinds of extensibility baked into spec 2 already and they cover parameterized ops and many other common needs without dropping to that level (see https://github.com/clojure/spec-alpha2/wiki/Differences-from-spec.alpha#implementing-custom-specs).

I, for one, would not mind coercion added to specs.

Kind of like:

    (s/def ::id (s/coercible-or string? int?))

where coercible-or is like s/or but will cast the second (or third) spec to the first one. You supply a multimethod that gets the spec it has to convert to, the spec it comes from, and the value.
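
To make the idea concrete, here is a minimal sketch of how something like this can be approximated today with s/conformer. Everything in it is my own assumption: coercible-or is a plain function rather than part of spec, it takes tag/predicate pairs (the tags give the multimethod something to dispatch on), and coerce is a hypothetical multimethod receiving the target tag, the source tag, and the value:

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical multimethod: dispatches on [target-tag source-tag].
(defmulti coerce
  (fn [to from _value] [to from]))

(defmethod coerce [:string :int]
  [_to _from value]
  (str value))

(defn coercible-or
  "Hypothetical helper, not part of spec. The first tag/predicate pair is
  the target form; a value matching a later pair is coerced to the target."
  [& tag-pred-pairs]
  (let [[[to to-pred] & others] (partition 2 tag-pred-pairs)]
    (s/conformer
     (fn [value]
       (if (to-pred value)
         value
         ;; Find the first alternative form that the value matches.
         (if-let [[from _pred] (first (filter (fn [[_tag pred]] (pred value))
                                              others))]
           (coerce to from value)
           :clojure.spec.alpha/invalid))))))

(s/def ::id (coercible-or :string string? :int int?))

(s/conform ::id 42)   ; the int is coerced to the string "42"
(s/conform ::id "42") ; already a string, returned unchanged
```

This only works through s/conform, and it carries the usual caveat that conformers are not meant as a general coercion mechanism.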

I think it would be very handy, because very often you have a value that can be this or that, but that is way easier to handle in code if I’m sure it has one and only one form.

What you describe at the top is afaict coercion, not conforming.

Yes, I just used conform since it can be abused for coercion via conformers.

the (undocumented) s/nonconforming can be a useful thing to wrap around an s/or if you do not want its value to be tagged with the or branch.

I’ve tried that, however using s/nonconforming around s/or does not just omit the tags (keys in s/or) but also does not conform the values:

(s/def ::child (s/nonconforming 
                (s/or
                 :ref (s/and vector? (s/conformer (fn [x] :CONFORMED))) ; <= conformer!
                 :ent ::entity)))

(s/conform ::entity {:id    2
                     :child {:id    3
                             :child [:ref 4]}})

; returns
; => {:id 2, :child {:id 3, :child [:ref 4]}}
; should be
; => {:id 2, :child {:id 3, :child :CONFORMED}}

Re extensibility, spec 1 intentionally does not try to commit to spec extensibility because we were not comfortable in locking down how specs are implemented

Wow, ok.

Spec 2 will open this up and essentially has standardized on the protocol version.

Is this available already in Spec 2? If so, could you point to or briefly explain how I could implement my own walk/coercion implementations on top of the default spec-impl, e.g. s/or, s/alt etc.? I couldn’t reason about it myself from looking at the source.

Thanks Alex!

If so, could you point to or briefly explain how I could implement my own walk/coercion implementations on top of the default spec-impl, e.g. s/or, s/alt etc.?

Currently, it’s not generic enough to do this, but the internal map form is a big step in that direction and having support for this kind of thing is still on our list of things to work on for spec 2.

You could do that with https://github.com/exoscale/coax.
It’s basically a fork of spec-coerce that makes it more open to custom transformations. That means you can teach it how to transform spec forms (like s/or & co) or any spec ident, on a per call basis.
But it’s a separate pass from coercion or validation, so you’d have to account for that, rely less on destructuring tags and more on spec identifiers/forms.

I suppose malli & spec-tools also support the same kind of thing.

There is also spec-tools, which has coercion and tools to transform specs themselves.

I believe all three libs use the spec1 s/form parsing, which is ok for simple things, but walking over regex specs would basically require implementing a full regex parser in these libs. Because of this, all libs are currently incomplete and best-effort. Having new protocol methods in spec core would be much better, but as long as runtime transformations are out of scope for spec, I’m not expecting that to happen.
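
To illustrate what form-based walking involves, here is a minimal sketch of the spec-walk from the first post, built only on s/form. It is not how any of these libs are implemented. It assumes the four specs from the original example, handles only s/keys (plain :req-un/:opt-un keywords), s/tuple and s/or, treats everything else as a leaf, and uses eval to test inline s/or branches. The rule that f is called only for registered s/keys and s/tuple specs is my assumption, chosen to reproduce the trace shown in the first post:

```clojure
(require '[clojure.spec.alpha :as s])

(s/def ::id integer?)
(s/def ::ref (s/tuple #{:ref} ::id))
(s/def ::entity (s/keys :req-un [::id] :opt-un [::child]))
(s/def ::child (s/or :ref ::ref :ent ::entity))

(declare walk)

(defn- valid-branch?
  "True if x is valid for an s/or branch given as a spec key or a form."
  [spec-or-form x]
  (s/valid? (if (keyword? spec-or-form) spec-or-form (eval spec-or-form)) x))

(defn- walk-form
  "Walks x according to a spec form; unknown forms are treated as leaves."
  [f form x]
  (if-not (seq? form)
    x
    (let [[op & args] form]
      (case op
        clojure.spec.alpha/keys
        (let [{:keys [req-un opt-un]} (apply hash-map args)]
          (reduce (fn [m k]
                    (let [uk (keyword (name k))]
                      (if (contains? m uk)
                        (update m uk #(walk f k %))
                        m)))
                  x
                  (concat req-un opt-un)))

        clojure.spec.alpha/tuple
        (mapv (fn [spec v] (walk f spec v)) args x)

        clojure.spec.alpha/or
        (let [branch (->> (partition 2 args)
                          (map second)
                          (filter #(valid-branch? % x))
                          first)]
          (walk f branch x))

        x))))

(defn- walk
  "Post-order walk: children first, then f on registered composite specs."
  [f spec x]
  (if-not (qualified-keyword? spec)
    (walk-form f spec x)
    (let [form (s/form spec)
          x'   (walk-form f form x)]
      (if (and (seq? form)
               (not= 'clojure.spec.alpha/or (first form)))
        (f spec x')
        x'))))

(defn spec-walk
  "Validates x once up front, then walks it bottom-up."
  [f spec x]
  (if (s/valid? spec x)
    (walk f spec x)
    :clojure.spec.alpha/invalid))
```

Even this toy version re-validates data at every s/or and ignores s/cat, s/alt, s/and, s/merge, multi-specs and so on, which is where the incompleteness comes from.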

The new map syntax in spec2 should make spec parsing simpler, removing a lot of existing code from the utility libs, but I’m not sure how it solves the completeness problem.

Both Schema and Malli have baked-in coercion.

I guess I just don’t have the use cases that require coercion, I’ve never needed it.

I feel most people really just want to parse user input from text into some higher type. Do people ever attempt to coerce anything more? Seems other coercion would be prone to error.

And so, for just parsing user input, I prefer to do that as close to the user input itself, since I often feel the context of the UX matters to know how to properly interpret the user input into the correct type.

Curious to hear what are some of people’s use cases for coercion?

Well, YMMV, but we use Clojure extensively in a data pipeline where a number of external entities (that may be pieces of software, or external organizations) feed us a set of events. Those events evolve across time, and the format that we receive them in may change, so at a specific point in time there might be multiple valid representations of the very same thing.

Spec gave us a vocabulary and a format to express ideas like “this is a valid connection event”, and it is very useful. But a lot of times we might define an attribute that contains multiple strings as:

  • a list of strings
  • a list of strings and integers (that will be converted to strings)
  • one single string (that will be cast to a list of one element)
  • one single integer (that will be converted to a list of one string element)

In handling this, it would be way easier to only handle the first case in code. Yes, we could use a function that knows it, but if we normalize the format, it’s way easier not to make mistakes. ATM we write conversion functions, but:

  • rewriting parts of inner structures is not fun (unless you use specter or something similar)
  • I always felt the problem domain is so similar to validation - once I know something is a valid B and I know there is a way to go from B to A, I could just ask for an A.
  • Spec is already walking my data to validate it, so it would be nice to convert - “while you are at it…”
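
As a small illustration of the kind of hand-written conversion function I mean, here is a sketch for the four representations listed above. The spec names and ->strings are made up for this post, not taken from our codebase:

```clojure
(require '[clojure.spec.alpha :as s])

;; A single element may arrive as a string or as an integer.
(s/def ::stringish (s/or :string string? :int integer?))

;; The attribute may arrive as a collection of elements or as one element.
(s/def ::strings-input (s/or :many (s/coll-of ::stringish)
                             :one  ::stringish))

(defn ->strings
  "Normalizes any accepted representation to a vector of strings.
  Returns :clojure.spec.alpha/invalid for anything else."
  [x]
  (cond
    (not (s/valid? ::strings-input x)) :clojure.spec.alpha/invalid
    (coll? x)                          (mapv str x)
    :else                              [(str x)]))

(->strings ["a" 1]) ; a mixed list becomes a vector of strings
(->strings 7)       ; a single integer becomes a one-element vector
```

Note that the spec and the function both encode the same four cases. That duplication is what spec-driven coercion would remove.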

Just my 0.02 chf…

I believe the most common case for coercion is reading external input into Clojure/EDN and back. Input could be user input / forms, data from JSON (http apis, config files, databases), query & path-parameters in routing. Given enough context (input & output formats) coercion utilities can derive all the needed transformations for free.

Just validating EDN data:

(require '[clojure.spec.alpha :as s])

(s/def ::type keyword?)
(s/def ::name string?)
(s/def ::tags (s/coll-of keyword? :into #{}))
(s/def ::street string?)
(s/def ::city string?)
(s/def ::address (s/keys :req-un [::street ::city]))
(s/def ::entity (s/keys :req-un [::type ::name ::tags ::address]))

(s/valid?
  ::entity
  {:type :restaurant
   :name "Mustafas Gemuse Kebab"
   :tags #{:kebab :yoghurt}
   :address {:street "Mehringdam 32"
             :city "Berlin"}})
; => true

Validating external JSON data:

(def json {:type "restaurant"
           :DARK "ORKO"
           :name "Mustafas Gemuse Kebab"
           :tags ["kebab" "yoghurt"]
           :address {:street "Mehringdam 32"
                     :EXTRA "KEY"
                     :city "Berlin"}})

(s/valid? ::entity json)
; => false

Manually fixing the data (really?):

(-> json
    (update :type keyword)
    (update :tags (comp set (partial map keyword)))
    (select-keys [:type :name :tags :address])
    (update :address select-keys [:street :city])
    (->> (s/valid? ::entity)))
; => true

Fixing the data using spec-derived coercion:

(require '[spec-tools.core :as st])

(->> (st/coerce ::entity json (st/type-transformer
                                st/strip-extra-keys-transformer
                                st/json-transformer))
     (s/valid? ::entity))
; => true

More complex transformations are possible, but I’m not sure how much complexity should be pushed there: splitting & joining strings, sure; domain A<->B translation, it depends.

I keep encountering the scenario of recursively annotating data through conform in a data-aware manner and then walking the result with a recursive function that dispatches on the annotations and transforms the data. That latter function can get quite hairy with many spec annotations on a nested data structure. At the time of conform’s traversal it is already known which specs match the data, so that seems to me a logical place to provide transformation functions if all I want to do is transform the data in place.

Could spec-tools work for my normalization example above?

Sorry, there is no elegant solution for your case with the current lib. Walking is currently done only top-down, so it would require several walks over the data with a local atom collecting the refs. Also, you would need to add decoding meta-data to the specs, but as there is no way to attach meta-data to spec values, you would need to use more of spec-tools’ own utilities for that, making it kinda tied to spec-tools.

It might be interesting to match meander with spec/schema libs for these kinds of things.