Why aren't specs pure data?

Disclaimer: I don’t mean for this to seem controversial or offensive and am genuinely curious. I have no idea what I’m doing. I’m probably wrong, and will keep it short.

tldr: What if “Specs as data” was the primary design constraint?

The operations could be functions that take data and return data. The additions in alpha2 like resolve-spec are appreciated, but I’m still exposed to objects (what is this, Java?) and macros, which severely reduce expressivity and seem to increase complexity in spec itself (e.g. defop). Clojure taught me to think like this, and I’m confused about why something so core is taking such a drastic departure from the foundational “everything is data” tenet. Is it technically necessary?

If specs aren’t pure data, how do we send them across the wire and save them in our dbs (among all the other benefits that pure data gives us)?

:v:

There is a lib, https://github.com/metosin/malli, that is trying to build a spec-like library with that exact design constraint.

As to why spec1 or spec2 won’t strictly adhere to that constraint, I can’t say. One thing I do know is that spec seems very concerned with the speccing of functions and macros, and their subsequent generative testing and instrumentation, whereas I feel people wanting specs as data are more concerned with speccing data to be exchanged or persisted, like speccing APIs or your database’s data. I do not know if making specs fully data would be a big hurdle in terms of speccing functions and macros. One thing that malli seems to do (I’m not quite sure) is disallow arbitrary predicates (any piece of code that returns true/false) in your schemas. That is more or less required: if you make specs data, you have to assume the set of predicates is always going to be available, so no custom predicates. Maybe that’s a reason why spec1 and 2 don’t fully embrace specs as data?

P.S.: Might be a good question to ask on: https://ask.clojure.org/ as well.

I do have to say, though, that spec2 has made great strides in being more data-driven.

You have symbolic specs, which are pure data, and they can be interpreted into real spec objects with s/resolve-spec. Spec objects can also be turned back into pure-data symbolic specs using s/form.

Now, these symbolic specs do assume an environment. That will always be true, since specs represent code which you can’t bundle in. I think that’s why they are called symbolic specs: the spec cannot run on its own, because it only holds references to predicate functions or other specs. So if you saved them to your database and read them back later, you’d need to make sure you resolve them within the same environment, one where the symbols point to the same predicates and the spec names to the same specs. It does seem Malli does something cool here, leveraging Sci to somehow allow functions to be serialized? I’m not sure I understand it all, though.

I don’t think having spec objects is that bad. It just means the specs probably provide their own functionality, whereas the alternative would have been for each function, like valid?, conform, etc., to re-interpret the spec and execute the appropriate functionality for it. I’m guessing that by having spec objects, they can maybe get better dispatch performance. And being protocol-based probably allows it to be user-extended? I’m speculating a bit here as well.

Further reading here: https://github.com/clojure/spec-alpha2/wiki/Differences-from-spec.alpha

About malli: the current code in master allows any custom function predicate to be used, but the feature might end up behind an option as those can’t be (de-)serialized properly as @didibus described. About sci: it allows one to use code-as-data instead and works with all of JVM, GraalVM & ClojureScript runtimes:

(require '[malli.core :as m])

(-> [:and {:name "A map with x & y"}  
     [:map
      [:x int?]
      [:y int?]]
     [:fn {:error/message "x should be greater than y"}
      '(fn [{:keys [x y]}] (> x y))]]
    (m/serialize)
    (doto prn) ; "[:and {:name "A map with x & y"} [:map [:x int?] [:y int?]] [:fn {:error/message "x should be greater than y"} (fn [{:keys [x y]}] (> x y))]]"
    (m/deserialize)
    (m/validate
      {:x 2
       :y 1}))
; => true

Also, looking forward to seeing how spec2 evolves. The libraries have different goals, but who knows if they could converge in the future.

Spec forms are data. They are lists of symbols, keywords, etc. Most of the api ops take spec names or spec forms and return data (conform returns data, explain-data returns data, etc).
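To make the point above concrete, here is a minimal sketch using clojure.spec.alpha (spec 1, since that is what ships with Clojure today). The :example/... spec names are made up for illustration and are not from this thread.

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical specs, named only for this example.
(s/def :example/x int?)
(s/def :example/y int?)
(s/def :example/point (s/keys :req-un [:example/x :example/y]))

;; s/form returns the spec form: plain symbols, keywords, lists and vectors.
(def point-form (s/form :example/point))

;; conform returns data (here, the conformed map itself).
(def conformed (s/conform :example/point {:x 1 :y 2}))

;; explain-data returns a map describing the problems, or nil when valid.
(def problems (s/explain-data :example/point {:x 1 :y "2"}))
```

Printing point-form at a REPL should show an ordinary list that can be walked or serialized like any other Clojure data.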

Spec 2 is a bit more rigorous about enforcing the symbolic nature of predicate references, so it is even better in that regard. It also has a (still in progress) map form of all specs and conversions between them.

The macros (in spec 2) are helpful expanders to more fully resolved symbolic specs, but you can use spec 2 without touching any macros at all if you desire.

So, in short I disagree with the premise of your question. Spec forms are data, and can be sent across the wire. Spec maps are a second format, more amenable to transformation.

To the question of why there are spec objects under the hood, the answer is performance. The spec objects compile in predicates and other things to make validation faster.

When sending Specs over the wire the following things need to be done:

Sender

  • write specs as data and collect all referenced specs as key->spec and send them too. This can be done even with Spec1 (write s/form, walk the specs with spec-tools to collect all registry references). Would be great if Spec2 had utilities for walking over specs, maybe even baked into the Spec protocol.

Receiver

  • have a utility that can take a spec form/map/data definition and construct Spec objects out of it. If the target is ClojureScript or GraalVM, this needs to work without eval. Spec2 solves this with the upcoming map->spec converters?
  • (re-)register the referenced qualified specs into local (global) registry and ensure that the names don’t conflict with existing (different) specs.
  • figure out how to clean up the local registry when the specs are not needed (memory is cheap, don’t worry about it?)
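The sender and receiver steps above can be sketched roughly as follows with spec 1 on the JVM. This is only an illustration under some loud assumptions: the :wire/... names and the register-all! helper are hypothetical, the list of specs to ship is written by hand rather than collected by walking the specs, and the receiver uses eval, which is exactly what would not be available on ClojureScript or GraalVM. Name-conflict checks and registry cleanup are left out.

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.edn :as edn])

;; Hypothetical specs that the sender wants to ship.
(s/def :wire/x int?)
(s/def :wire/point (s/keys :req-un [:wire/x]))

;; Sender: pair each spec name with its form and print the result as EDN.
;; The names are listed by hand here (an assumption for brevity).
(def payload
  (pr-str (mapv (fn [k] [k (s/form k)]) [:wire/x :wire/point])))

;; Receiver: read the EDN back and (re-)register every spec under its name.
;; JVM-only sketch, because it relies on eval.
(defn register-all! [edn-string]
  (doseq [[k form] (edn/read-string edn-string)]
    (eval `(s/def ~k ~form))))

(register-all! payload)
```

In this single-process demo the receiver simply re-registers the same names; a real receiver would start from a registry that does not contain them yet.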

Thank you for the thoughtful reply!

I wasn’t precise. Maybe I should have said EDN instead of “data” (but I’m not sure that precisely captures my intent either). After all, bytecode is data too, but I’m not a big fan of dynamically manipulating bytecode at runtime on a regular basis :sweat_smile:

What I’m after is the usual benefits of fluently reading/transforming/manipulating data (maps, lists, vectors) that we’ve all come to love and expect in Clojure. Random examples that come to mind:

  1. Use reduce to dynamically and iteratively construct a spec
  2. Build an aggregate spec dynamically from a data structure containing many specs
  3. API that takes user input and dynamically constructs a spec from those inputs
  4. Inline declare a spec for a deeply nested data structure in a single expression
  5. Painlessly parse and analyze a spec data structure
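As a rough illustration of the first item, here is one way it can be approximated today with spec 1 by building the spec form as a plain list and then evaluating it. This is a sketch, not a recommendation: the predicate list is made up, and the final step relies on eval, so it only works where eval is available (the JVM).

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical input: predicate symbols chosen at runtime.
(def predicate-syms
  '[clojure.core/int? clojure.core/pos? clojure.core/even?])

;; Use reduce to grow an s/and form one predicate at a time.
;; The result is an ordinary sequence of symbols.
(def and-form
  (reduce (fn [form pred] (concat form (list pred)))
          (list `s/and)
          predicate-syms))

;; Turn the data into a spec object (JVM-only, uses eval).
(def positive-even (eval and-form))

(s/valid? positive-even 4) ; => true
(s/valid? positive-even 3) ; => false
```

The need for eval at the end is the friction being described: the form is easy to build as data, but turning it into something that can validate requires going back through the macro.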