What is the 2021 recommendation for Specs?

MrMM · April 14, 2021, 12:48pm

Hello,

I’m reading about Specs in the book Getting Clojure, published in 2018, and Mr Olsen writes:

Since spec-based argument checking can slow things down, it’s most useful during development and testing.

I’m thinking that Specs probably evolved since 2018 since in 2018 their namespace was clojure.spec.alpha. I googled some more current guides but they all use clojure.spec.alpha.

Could you please tell me whether it has evolved since the first alpha?
What is the current recommendation for using Specs? Use it in production?
Or could you please point me to an article that is up-to-date?

Thank you.

Matys

andy.fingerhut · April 14, 2021, 2:41pm

There is a Spec version 2 in development, but it is not ready for wide use as there are parts of it still under design as of 2021. There is code for it available, but it is anti-recommended for anyone that doesn’t want to experience bleeding-edge code issues, e.g. known bugs.

There are people who do use clojure.spec.alpha in production code bases, both at development / test time, and also for checking data received between large subsystems of code. Any uses of it in production are typically with full knowledge of the run-time cost, and so it is used sparingly and selectively in the places where it is judged to give the most benefit in catching miscommunications between different subsystems.

andy.fingerhut · April 14, 2021, 3:02pm

Sean Corfield actively uses Clojure and spec for code for his company, and writes frequently on various Clojure-related topics, including this one on how his team uses clojure.spec in their code base: An Architect's View: How do you use clojure.spec

Sean might very well reply to this article with an updated article, if he has written one, but I know that he frequently answers questions on Clojurians Slack with pointers to articles he has written on the topic, as well as answering questions about it.

seancorfield · April 14, 2021, 11:23pm

As Andy notes, I wrote an article about our various uses of Clojure Spec – and that’s all still true today: we are still heavy users of Spec 1 in production, testing, and development.

Instrumentation – automatic checking of function arguments – is intended for development/testing, as is (generative) function behavior checking. Neither is recommended for production.

Validation – explicit checking of data against specs – is good for production code (albeit with some caveats around the complexity of your specs: Spec is not a type system so try to avoid over-specifying things).
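To make the distinction between instrumentation and validation concrete, here is a minimal sketch. The spec `::age` and the function `birthday` are made up for illustration; only the `clojure.spec.alpha` and `clojure.spec.test.alpha` calls are real.

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.test.alpha :as stest])

;; Hypothetical spec and function, used only for illustration.
(s/def ::age (s/and int? #(<= 0 % 150)))

(defn birthday
  "Returns the age after one more birthday."
  [age]
  (inc age))

(s/fdef birthday
  :args (s/cat :age ::age)
  :ret int?)

;; Validation: an explicit check that you call yourself.
(s/valid? ::age 42)       ; true
(s/valid? ::age -1)       ; false
(s/explain-str ::age -1)  ; human-readable description of the failure

;; Instrumentation: wraps the var so that every call checks :args.
;; Intended for development and testing only.
(stest/instrument `birthday)

(def instrumented-failure
  (try
    (birthday -1)
    :no-error
    (catch clojure.lang.ExceptionInfo _
      :args-rejected)))

;; Remove the wrapper again; birthday goes back to accepting anything.
(stest/unstrument `birthday)
```

Validation costs something only where you call it, whereas instrumentation adds a check to every call of the instrumented function, which is why it is usually kept out of production.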

I believe clojure.spec.alpha is likely to stay at Alpha indefinitely. Spec 2 will eventually become the non-alpha approach for Specs and it should be possible to adopt it piecemeal and migrate away from Spec 1 but, as Andy also noted, Spec 2 is very much pre-alpha state right now and likely to change substantially in at least one area as Rich continues design work on it.

mvarela · April 15, 2021, 5:52am

Depending on your needs, you may want to look at alternative libraries, such as Malli, which provides a significant overlap in functionality, with a data-driven approach.

MrMM · April 15, 2021, 11:12am

Very helpful thank you.

I was unsure about Specs since I saw in different articles that you should use “higher” language features like Records, Protocols and Specs only when you really need to.

I checked these “higher” features and I quickly decided that I’m going to use Records and Protocols a lot. I see no reason not to – Records can make it easier to create new structure (thx to autogenerated ->RecordName) and Protocols have shorter syntax than multi-methods so it doesn’t make sense to use multi-methods if all you need are different functions based on type.

But Specs are different. I’m not going to use them for production code unless I really need them.

Yeah, I think I would end up creating statically typed Clojure ;-).

Cool, might be useful when getting JSONs from remote systems. Thx.

seancorfield · April 15, 2021, 2:57pm

Even the author of Clojure Applied – a book that leans heavily on records – has said that he would downplay records in favor of plain hash maps if he writes another edition of it.

If you’re coming from an OOP background, records and protocols look attractive and familiar but hash maps are much more idiomatic and much more widespread in use.

Protocols make sense in some situations but, again, their use in Clojure is specific and narrow – and if you’re using hash maps instead of records, you’ll be less tempted to sprinkle protocols all over your code.

MrMM · April 16, 2021, 12:04pm

Thank you.

I get the point with Records, they’re class-like.

I haven’t written enough code in Clojure to say for sure but creating a dispatch function every time I need a polymorphic function doesn’t seem right. But maybe polymorphic functions are not so common in real code and people rather use different functions (different names) or different arity.
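For what it's worth, the dispatch function is often trivial when the data is a plain map, because a keyword is already a function of a map. A small sketch, with hypothetical shape maps and a made-up `area` multimethod:

```clojure
;; Hypothetical example: shapes are plain maps tagged with a :shape key.
;; The keyword :shape itself serves as the dispatch function.
(defmulti area :shape)

(defmethod area :circle [{:keys [radius]}]
  (* Math/PI radius radius))

(defmethod area :rectangle [{:keys [width height]}]
  (* width height))

;; Fallback for maps whose :shape has no method.
(defmethod area :default [shape]
  (throw (ex-info "Unknown shape" {:shape shape})))

(area {:shape :rectangle :width 2 :height 3}) ; => 6
```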

seancorfield · April 16, 2021, 4:05pm

You get a lot of generic function behavior by using abstractions (like “sequence”) and “just hash maps”.

Here are some statistics about our codebase at work:

Clojure build/config 20 files 233 total loc
Clojure source 359 files 89703 total loc,
    3597 fns, 904 of which are private,
    575 vars, 30 macros, 92 atoms,
    26 protocols, 67 records,
    858 specs, 33 function specs.
Clojure tests 383 files 23615 total loc,
    4 specs, 1 function specs.

Nearly all of the records are for (Stuart Sierra’s) Component library – but a lot of those could be done with hash maps and metadata now (Component has been updated so its Lifecycle protocol has :extend-via-metadata true but our codebase stretches back over a decade).
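As a rough sketch of what "hash maps and metadata" means here: since Clojure 1.10, a protocol declared with `:extend-via-metadata true` can be implemented by attaching functions to a value's metadata. The `Service` protocol below is hypothetical and only loosely modeled on a start/stop lifecycle; it is not Component's actual API.

```clojure
;; Hypothetical protocol for illustration (requires Clojure 1.10+).
(defprotocol Service
  :extend-via-metadata true
  (start [this])
  (stop [this]))

;; A plain hash map that implements the protocol through metadata.
;; The metadata keys are the fully qualified protocol function symbols.
(def cache
  (with-meta {:name "cache" :running? false}
    {`start (fn [this] (assoc this :running? true))
     `stop  (fn [this] (assoc this :running? false))}))

(:running? (start cache))        ; => true
(:running? (stop (start cache))) ; => false, assoc keeps the metadata
```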

Most of our protocols exist to adapt Java types to new behavior – similar to what I do here in next.jdbc to make a bunch of Java JDBC types “datafiable” by Clojure tools: next-jdbc/datafy.clj at develop · seancorfield/next-jdbc (github.com).
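The general shape of that kind of adaptation looks roughly like the following. This is an illustrative sketch using `java.net.URI`, not code taken from next.jdbc; the choice of class and of result keys is an assumption made for the example.

```clojure
(require '[clojure.core.protocols :as p]
         '[clojure.datafy :as d])

;; Illustrative only: teach datafy how to turn a java.net.URI into data.
(extend-protocol p/Datafiable
  java.net.URI
  (datafy [uri]
    {:scheme (.getScheme uri)
     :host   (.getHost uri)
     :path   (.getPath uri)}))

(d/datafy (java.net.URI. "https://example.com/docs"))
;; => {:scheme "https", :host "example.com", :path "/docs"}
```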

Nearly all of the Vars are constants or “lookups” I think and nearly all of the atoms are caches of some sort. We have ~130 agents as well, which are nearly all associated with metrics we report to New Relic.

mars0i · April 16, 2021, 4:31pm

This is puzzling to me. Not from an OOP background, but from a Clojure background. I don’t write as much Clojure code as a lot of people–certainly not as much as you do. But what’s wrong with records? They specify the normal fields for a data structure, you can work with them as if they were maps, and they naturally convert to maps when that’s useful. Best of all worlds. (As an added cool benefit, records can function as Java classes for interop, yet retain all of their nifty Clojure conveniences–but that’s not the use case here.)

I use records whenever I know in advance what fields I want to see in a data structure. To me, that makes the code easier to understand, since defrecord partially documents the fields. I often use maps, too, of course, for more ad hoc and changing associations, or if I have many data items (maybe records) that I want to look up quickly–but then all of the documentation must be independent of the data.
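A quick sketch of the map-like behavior described above, using a hypothetical `Point` record. The last line shows one place where records and maps differ.

```clojure
;; Hypothetical record for illustration.
(defrecord Point [x y])

(def p (->Point 1 2))

(:x p)                     ; keyword lookup works as with a map => 1
(assoc p :label "origin")  ; extra keys are allowed; the result is still a Point
(into {} p)                ; plain map with the same entries => {:x 1, :y 2}
(= p {:x 1 :y 2})          ; => false, a record is never equal to a plain map
```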

seancorfield · April 16, 2021, 5:31pm

There’s nothing “wrong” with records but they’re just not needed most of the time. The recommendation from the core Clojure folks always seems to be “use maps” first and foremost and only switch to records if you need to later.

This is still good advice: Flowchart for choosing the right Clojure type definition form - Chas Emerick (cemerick.com)

rudolfvesely · April 16, 2021, 5:40pm

Did you mean you use records when you know the fields in advance?

mars0i · April 16, 2021, 7:05pm

Yes! Sorry about that. I’m going to edit the original post to fix that. Thanks.

mars0i · April 16, 2021, 7:12pm

This makes sense to me if one is exploring the ideas/data/algorithms, etc. My reasoning still makes sense to me otherwise, and at early stages of a project, it might not matter, since it’s easy to change definitions. It’s so easy to move back and forth between maps and records, that maybe it should be considered a matter of personal preference.

I’m thinking that maybe the reason for the advice to use maps is that people coming from Java will overuse fixed types such as records, and think that everything has to be done that way. So it’s good advice for them to start with maps and then use records as needed. I can see that. That was never my orientation with Clojure, though. (I was a Java programmer a long time ago, but came to Clojure by way of Common Lisp.)

Irrelevant to this discussion, but fwiw I spent a lot of time studying that flowchart at one time, and it is not always right, in my experience, for decisions about interop structures. It presents good rules of thumb for many cases. I doubt any flowchart could capture all of the factors that could matter for Clojure interop data structure decisions. (I definitely have less overall experience with Clojure than many people, but I think I may have gotten deeper into interop at one time than most Clojure programmers. It wasn’t fun. Well, OK, some of it was fun. And now I have the problems worked out to my satisfaction.)

rudolfvesely · April 16, 2021, 10:40pm

No worries, I just wanted to ask about that. Thank you for the correction ;-).

Could you please tell me what you do if you know the keys but do not have values for some of them? Do you later change them or add them?

Would you for example use nil?

(defrecord Flight [flight aircraft departed arrived])

(map->Flight {:flight "BA5", :aircraft "Boeing 747", :departed "2021-05-01 15:30:00"})

#user.Flight{:flight "BA5", :aircraft "Boeing 747", :departed "2021-05-01 15:30:00", :arrived nil}

or something else

(map->Flight {:flight "BA5", :aircraft "Boeing 747", :departed "2021-05-01 15:30:00", :arrived :has-not-landed})

or rather not to include it at all

(defrecord Flight [flight aircraft departed])

and later add?

(assoc myflight :arrived "2021-05-01 19:30:00")

mars0i · April 17, 2021, 3:34am

I would probably use nil in most cases. But more deeply experienced people may have a better idea. You do have to be careful in that case to make sure that getting an unexpected nil doesn’t cause a bug, but that’s a normal thing to have to watch out for. For your example, :has-not-landed seems like a good option, though, and avoids an accidental nil-pun, for example. I wouldn’t leave the field out of the definition, though.

If there are often many unfilled fields, i.e. keys without values, maybe that would be a case where maps are better–I don’t know.

seancorfield · April 17, 2021, 4:20am

“Optional” fields can be tricky to handle with records because if you accidentally dissoc a declared field out of a record, it quietly becomes a hash map and it won’t become a record again. And then there’s Rich’s whole thing about nil being a bad thing in a hash – see the Maybe Not talk – because so much code assumes nil == “not there” / “no value”, so having nil being a deliberate value can easily trip you up.
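Both pitfalls are easy to see at the REPL. The sketch below reuses the `Flight` record from the earlier post; the `:gate` key is a made-up extra key for illustration.

```clojure
(defrecord Flight [flight aircraft departed arrived])

(def ba5 (map->Flight {:flight "BA5" :aircraft "Boeing 747"}))

;; Removing a declared field silently yields a plain map.
(record? ba5)                   ; => true
(record? (dissoc ba5 :arrived)) ; => false

;; Removing a key that is not a declared field keeps the record type.
(record? (dissoc (assoc ba5 :gate "A1") :gate)) ; => true

;; nil versus "not there": lookup cannot tell them apart, contains? can.
(:arrived {:flight "BA5" :arrived nil})           ; => nil
(:arrived {:flight "BA5"})                        ; => nil
(contains? {:flight "BA5" :arrived nil} :arrived) ; => true
(contains? {:flight "BA5"} :arrived)              ; => false
```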

You get this problem when dealing with SQL/JDBC because NULL is a perfectly reasonable value in a database (although it doesn’t just have “regular value” semantics). You need nil in your hash map for INSERT / UPDATE operations and the main Clojure JDBC libraries will give you hash maps back with nil for NULL. next.jdbc.optional provides alternative builders that omit nil values that align with NULL values in the database. I don’t know how widely used it is. All I can say is that nearly all of the JDBC-related code I’ve ever written assumes nil-punning and therefore treats nil and “not there” as identical rather than trying to treat nil as an actual value.

rudolfvesely · April 17, 2021, 6:40am

That’s my thinking, since :has-not-landed is information. In Elixir there is an unwritten rule to use :unfetched so you don’t have to think about naming.

didibus · April 17, 2021, 9:51am

The downside of records is that they are no longer pure data, so serialization is a problem, which makes information modeled with them harder to move to other processes, or store/retrieve them.

Most formats that have a schema suffer from this: you need to have the schema definition of the correct version of the serialized data and know implicitly which one it maps to, whereas schemaless formats evolve better over time as they are more flexible.

That’s why I say start with maps, use records if you need the performance boost and/or want to actually create a type to use with protocols for type polymorphism, though now you can do so with maps as well.

That’s also where I’d recommend the use of Specs over records. Specs are much better at describing data than records, and much more flexible in how they can evolve along the data.

Just to give an example, if you have a map, you would model type as data (if you cared about type):

{:type :dog
 :name "Bib"}

{:type :cat
 :name "Kitty"}

But when using records, the type is implicit and it isn’t part of the data it models, instead it’s tracked by the runtime alongside the language instances of your data.

By having the type as data, your type info will serialize itself automatically. It is also more flexible and can evolve to be more refined or less as need be. The downside is polymorphic dispatch won’t be as performant.
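A minimal sketch of the serialization point, assuming EDN as the wire format and a hypothetical `Dog` record for comparison. With the default `clojure.edn` reader, the map round-trips as-is while the record's printed form carries a class tag that the reader does not know.

```clojure
(require '[clojure.edn :as edn])

;; Hypothetical record, for comparison with the tagged map.
(defrecord Dog [name])

(def dog-map {:type :dog :name "Bib"})
(def dog-rec (->Dog "Bib"))

;; The map prints as plain EDN and reads back unchanged, :type included.
(= dog-map (edn/read-string (pr-str dog-map))) ; => true

;; The record prints with a tag naming its class, e.g. #user.Dog{:name "Bib"}.
;; clojure.edn has no reader for that tag unless you supply one.
(def record-read
  (try
    (edn/read-string (pr-str dog-rec))
    (catch RuntimeException _
      :no-reader-for-tag)))
```

The receiving process would need to know about the `Dog` class, or be given a reader for its tag, before it could read the record back.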

And now if you want a schema to help you know what the data invariants for a certain entity are you can use spec instead of a record, which is even more precise.

So I feel maps + spec are just superior to records, unless like I said, you have some very special performance consideration.

You can absolutely use it in production; we have at my work since it launched, with great success. The code works, and does what it does well. The reason it is alpha is that it isn’t yet settled whether that’s what the final ergonomics and feature set for it will be for the language forever. They wanted to see how people would use it, if it would deliver on all they wanted, get feedback, etc, before committing to spec fully for the language. And that’s where Spec 2 comes in: they’re reworking some aspects based on what they learned from the alpha.

It isn’t alpha because it is buggy or anything like that, so it is safe to use in production.

As for best practices, I’d say you can spec your domain model and then validate explicitly using s/valid? or s/conform (not instrumentation) at specific places in your app. My recommendation is to have the producer of the data validate, and the reader conform, and to do so at the boundary. For example, before sending a payload, validate that it meets the spec, and as soon as you receive a payload, conform it. Or before writing data to the DB, validate it, and after reading data from the DB, conform it.
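A rough sketch of that boundary pattern follows. The payload spec and the function names `send-payload!` and `receive-payload` are hypothetical, and `transmit!` stands in for whatever actually sends the data.

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical payload spec.
(s/def ::id pos-int?)
(s/def ::contact (s/or :email (s/and string? #(re-find #"@" %))
                       :phone (s/and string? #(re-matches #"\+?\d+" %))))
(s/def ::payload (s/keys :req-un [::id ::contact]))

(defn send-payload!
  "Producer side: refuse to send data that does not meet the spec.
  `transmit!` is a hypothetical function that does the actual sending."
  [transmit! payload]
  (if (s/valid? ::payload payload)
    (transmit! payload)
    (throw (ex-info "Invalid payload" (s/explain-data ::payload payload)))))

(defn receive-payload
  "Consumer side: conform incoming data; returns nil if it does not match."
  [payload]
  (let [conformed (s/conform ::payload payload)]
    (when-not (s/invalid? conformed)
      conformed)))

(receive-payload {:id 1 :contact "a@example.com"})
;; => {:id 1, :contact [:email "a@example.com"]}
```

Conforming on the reading side tells the consumer which branch of `s/or` matched, so it does not have to re-inspect the raw value.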

On top of that, it’s good to spec pure functions you want to thoroughly test, and then set up a generative test for them.
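As a sketch of what that can look like, here is a function spec for a hypothetical pure function `clamp`. The generative check sits in a `comment` form because it needs `org.clojure/test.check` on the classpath, which is an assumption about your project setup.

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.test.alpha :as stest])

(defn clamp
  "Hypothetical pure function: restricts n to the range [lo, hi]."
  [lo hi n]
  (max lo (min hi n)))

(s/fdef clamp
  ;; Arguments are three ints, with lo not greater than hi.
  :args (s/and (s/cat :lo int? :hi int? :n int?)
               #(<= (:lo %) (:hi %)))
  :ret int?
  ;; Property relating arguments and result: the result lies within the range.
  :fn (fn [{:keys [args ret]}]
        (<= (:lo args) ret (:hi args))))

(comment
  ;; Run at the REPL or from a test.
  ;; Needs org.clojure/test.check on the classpath.
  (stest/summarize-results (stest/check `clamp)))
```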

Finally you can spec a few other functions as documentation for what entity they take as input/output, when it helps readability, and set up instrumentation at the REPL and when your tests run for it. But don’t use instrument in prod.

mars0i · April 17, 2021, 8:12pm

Wow. Incredibly helpful, @didibus.