'Clojure devil in the details' article


#1

I’ve recently read over this article linked in a recent newsletter and spent some time pondering:

Most of the comments there are over a week old, so I thought it might be worth continuing the discussion here where it might get more exposure.

I have to say, this article has distilled some of the same issues I’ve run into working with Clojure. There is one in particular that I have not yet found a solution for: when looking at a function buried deep in a project, there’s no easy way to know the shape of the data coming in or going out. And there’s no agreed-upon way of conveying that information. Tests? Doc strings? Commented code examples? Commented snippets of example data?

For example, with a statically typed system, I can see the structure of a type with a click or a keystroke, and that immediately sets the context for the function in question. Good names help with this, but there’s no substitute for seeing the parts of a data structure defined next to where it is used, such as in a pop-up window in an IDE.

There are two suggestions I have on how to deal with this issue, one old, one new. Spec is the new player on the team, and we will have to see how tools evolve to use spec to show type-like information where relevant. But I also think more use of Clojure records would be helpful. If a record is defined and used rather than simply a map, and functions are well named (another good suggestion from the article), this helps the reader develop the mental context needed to understand a given function.
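To make the record suggestion concrete, here is a minimal sketch. The order domain and every name in it (`LineItem`, `Order`, `order-total`) are made up for illustration; the point is only that the fields are declared once, in a place a reader can jump to.

```clojure
;; Hypothetical domain: a customer order. All names are illustrative only.
(defrecord LineItem [sku quantity unit-price])
(defrecord Order [id customer-email line-items])

(defn order-total
  "Returns the total price of an Order as a number."
  [order]
  (reduce + (map (fn [{:keys [quantity unit-price]}]
                   (* quantity unit-price))
                 (:line-items order))))

(def sample-order
  (->Order 42 "ada@example.com"
           [(->LineItem "A-1" 2 10)
            (->LineItem "B-7" 1 5)]))

(order-total sample-order) ;; => 25
```

Records still behave like maps for lookup and destructuring, so existing map-oriented code keeps working.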

What other suggestions might be useful for this issue? And what other comments do you have on the article overall?


#2

For this issue in particular I’ve become a huge fan of scope-capture. Wrap the inside of the function in sc.api/spy, run some tests that are likely to hit that code path, then look at the values that went in and out. (For that last bit, CIDER’s C-c C-p eval+pretty-print is great.)

I recently contributed a patch to clojure.tools.cli. Some of the internal functions are far from obvious, but with this technique I quickly had a sense of the kind of values going around, and managed to add the feature with just a few lines of well placed code. I’m convinced that without scope-capture it would have taken me a lot longer, and I would have come up with a clunky, more verbose solution.


#3

The official answer is: use spec to describe the shape of that data.
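As a rough sketch of what that can look like, assuming a made-up user map (the `:user/...` keys are not from the article):

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical shape for a user map; the keys are assumptions for illustration.
(s/def :user/id pos-int?)
(s/def :user/email (s/and string? #(re-find #"@" %)))
(s/def :user/roles (s/coll-of keyword? :kind set?))
(s/def :user/user (s/keys :req [:user/id :user/email]
                          :opt [:user/roles]))

(s/valid? :user/user {:user/id 1 :user/email "ada@example.com"}) ;; => true
(s/valid? :user/user {:user/id 1})                                ;; => false

;; Returns a human-readable explanation of why the value does not conform.
(s/explain-str :user/user {:user/id 1})
```

Calling `(s/form :user/user)` at the REPL gives back the definition, which is about as close as plain spec gets to "show me the type".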


#4

I’ve always been surprised that records didn’t catch on for this specific use. And I think we have to ask why?

My impression is that, as much as this is definitely a problem, in practice people don’t find it a big enough problem that they’re willing to spend extra time defining the shapes of functions’ inputs and outputs.

Core.typed was seen as too much overhead for the gains. Records were barely used. No doc-string convention ever manifested itself.

Now we have spec. I think it’s still too early to tell if people will find its overhead worthwhile. Personally, I’ve found I don’t use it that much, only at the boundaries and for some functions.


#5

If you think in terms of pure-function pipelines and dataflow first (PurefunctionPipeline&Dataflow), everything is simple. Simply put: use threading macros more, and keep the data specification of a pipeline unified in a single map.
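One possible reading of that advice, as a sketch: every step takes the whole map and returns it with one more key, so the map itself documents what the pipeline has produced so far. The step names and the flat 10% tax rate are assumptions for the example.

```clojure
;; Each step receives the whole order map and adds exactly one key to it.
(defn add-subtotal [{:keys [items] :as order}]
  (assoc order :subtotal (reduce + (map #(* (:qty %) (:price %)) items))))

(defn add-tax [{:keys [subtotal] :as order}]
  (assoc order :tax (quot subtotal 10)))   ; assumed flat 10% rate, integer amounts

(defn add-total [{:keys [subtotal tax] :as order}]
  (assoc order :total (+ subtotal tax)))

(defn price-order [order]
  (-> order add-subtotal add-tax add-total))

(price-order {:items [{:qty 2 :price 10}]})
;; => {:items [{:qty 2, :price 10}], :subtotal 20, :tax 2, :total 22}
```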


If you know Chinese, you can refer to https://github.com/linpengcheng/PurefunctionPipelineDataflow


#6

@alexmiller I agree spec is an answer…I think I made it clear in the article that all of this depends on developer discipline (as do most things, of course). The open question I have is how we create a ‘process’ that provides enough benefit to drive the use of spec (preferably in the form of some quick feedback/gratification a la REPL based dev).

I certainly don’t have the answers, but the post was written to share my personal frustrations. I’ve not been put off using Clojure (although I’m struggling to find another contract using it) and still think its ‘opinions’ are generally those I agree with.

Chris


#7

@didibus I have a question for you: do you work on codebases that get touched by more than 7-8 developers? A lot of my pain has come at scale (in terms of devs, not processing). None of the issues I highlighted are unique to Clojure, of course.


#8

I’m lucky enough that our code base is handled by at most 11 developers. I can see how more than that would start to cause problems. But I think it’s also important to distinguish whether it’s, say, 20 devs who are at least intermediate in their familiarity with Clojure, or 20 who have only recently been exposed to it. In my experience the latter often compounds the challenge.

For me, the best tool has been code review and mentorship. We have a good balance of expertise in Clojure, dynamic typing and FP though, so that helps a lot with promoting good Clojure code practices.

Something else I think is often skipped is proper data modeling. Spec is actually great for that. Identifying the major entities, values and their relationships, and then defining structure for them in spec helps a lot. On our team we spec everything that gets serialized. So for any data exchanged over IO, be it network between client and server, or to a database, file, etc., we make sure to have a shared spec, so we can validate and assert contracts. That also allows the team to work in parallel: you can work on the consuming code even before the producing code is producing any real data, and as long as both sides adhere to the spec it will work. The generated samples that spec gives you also help with this, since you can get sample data to test very early on.
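A minimal sketch of validating at an IO boundary, assuming a made-up message format (the `:msg/...` keys and `validate-incoming` are hypothetical, not our actual code):

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical wire format for a message exchanged between client and server.
(s/def :msg/type #{:order/created :order/cancelled})
(s/def :msg/order-id pos-int?)
(s/def :msg/message (s/keys :req [:msg/type :msg/order-id]))

(defn validate-incoming
  "Returns msg unchanged when it satisfies :msg/message; otherwise throws
  an ex-info whose data is the spec explanation."
  [msg]
  (if (s/valid? :msg/message msg)
    msg
    (throw (ex-info "Invalid message"
                    (s/explain-data :msg/message msg)))))

(validate-incoming {:msg/type :order/created :msg/order-id 7})
;; => {:msg/type :order/created, :msg/order-id 7}
```

With org.clojure/test.check on the classpath, `(s/exercise :msg/message)` should produce sample messages from the same spec, which is what makes it possible to write the consumer before the producer exists.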

Internal data, we don’t spec. But we try to keep the execution flat, use good parameter names, and always destructure input collections, so that every access to a value inside a collection is explicitly declared in the function argument vector.
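To illustrate the destructuring point, here is the same function written both ways. The order map and the function names are invented for the example.

```clojure
;; Without destructuring: the reader has to scan the body to learn the shape.
(defn shipping-label-opaque [order]
  (str (get-in order [:customer :name]) ", " (get-in order [:address :city])))

;; With destructuring: the argument vector lists every key the function reads.
(defn shipping-label
  [{{customer-name :name} :customer
    {:keys [city]}        :address}]
  (str customer-name ", " city))

(shipping-label {:customer {:name "Ada"} :address {:city "London"}})
;; => "Ada, London"
```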

Most of the time, you’ll find your code actually mainly operates on the data you have specced, because the data you end up passing around remotely or persisting is what matters to the business. So just having that spec for it, and then naming your variables using the unqualified spec name, is often enough for people to know what data the function expects. Of course, you could add a spec to your function also, and turn on instrumentation to make it a guarantee, but we personally don’t bother.

And when nothing else works, once you develop good familiarity with Clojure, you can often quickly reverse engineer the input parameters. Look at how they are accessed inside the function. If you see a (keyword param) or a get, assoc, update, or conj, you can easily figure out the shape that matters for the function. Adding a print and running it in the REPL also helps.

Finally, we do remind everyone to write good documentation for functions, especially utility ones. Anything that takes or returns a higher-order function should say so clearly and document the contract of that function.

So, I’m afraid there’s nothing systematic, but good culture, and best practices, and trying to share that with mentorship and enforce it through code review. That’s what has worked best for us.


#9

@didibus I agree wholeheartedly with everything you say and I think you’ve made my point that it’s down to developer discipline. I also use the techniques you suggest and I suspect I would find your codebase reasonably easy to parse. However, I have seen poor discipline even in a 4 dev team.

Thanks for sharing some good advice.


#10

Yes, but spec is very verbose and does not play well with (fn []) or #(...).
Now, think about Java: the way you define a function is actually more concise and clearer to reason about than the equivalent (defn x ....) / (s/fdef x ...) - it’s more compact, and everything is in one place.
Of course spec is turbocharged and rocks, but the Java ergonomics of the simple case “this function gets two strings in and returns an integer” are simpler, and come rain or shine, it will adhere to that contract.
So I love spec, but I also think that we could do better in terms of ergonomics.


#11

You can use s/fspec for (fn []) or #(...), so it plays fine.
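A small sketch of what I mean; `count-matching` and the `:example/int-pred` name are hypothetical:

```clojure
(require '[clojure.spec.alpha :as s])

;; A spec for "a function from an int to a boolean".
(s/def :example/int-pred
  (s/fspec :args (s/cat :x int?)
           :ret boolean?))

;; A hypothetical higher-order function whose second argument is such a function.
(defn count-matching [xs pred]
  (count (filter pred xs)))

(s/fdef count-matching
  :args (s/cat :xs (s/coll-of int?) :pred :example/int-pred)
  :ret nat-int?)

;; Anonymous functions are passed as usual.
(count-matching [1 2 3 4] #(even? %)) ;; => 2
```

One caveat: validating a value against an `s/fspec` (for example under instrumentation) calls the function with generated arguments, so it needs org.clojure/test.check on the classpath and should only be used with side-effect-free functions.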

Java is more colocated, but being type-oriented it is also a) mandatory and b) less expressive. Separating the spec from the function allows you to decide to not load specs at all at production time, or to spec only some functions, or even to spec functions in a library you don’t control. Being optional, partial, and separate is a huge advantage in my opinion.

Regarding expressivity, Clojure’s reliance on predicates instead of types is way more expressive (arbitrarily so). Java can just say int foo(String a, String b). Clojure can actually put arbitrarily precise domain predicates around each of those AND define a :fn spec that relates the output to the input. And then generate arg examples, automatically property test it, etc. Gimme that, please, over Java.
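For the "two strings in, an integer out" case, a sketch with a `:fn` spec might look like this. `common-prefix-length` and `fn-spec-holds?` are made-up names; the helper only exists to show what the `:fn` predicate receives.

```clojure
(require '[clojure.spec.alpha :as s])

(defn common-prefix-length
  "Number of leading characters that a and b share."
  [a b]
  (count (take-while true? (map = a b))))

(s/fdef common-prefix-length
  :args (s/cat :a string? :b string?)
  :ret  nat-int?
  ;; :fn sees the conformed args and the return value together,
  ;; so it can relate the output to the input.
  :fn   (fn [{:keys [args ret]}]
          (<= ret (min (count (:a args)) (count (:b args))))))

(defn fn-spec-holds?
  "Checks one call by hand against the :args, :ret and :fn specs."
  [a b]
  (let [spec (s/get-spec `common-prefix-length)
        ret  (common-prefix-length a b)]
    (and (s/valid? (:args spec) [a b])
         (s/valid? (:ret spec) ret)
         (s/valid? (:fn spec) {:args (s/conform (:args spec) [a b])
                               :ret  ret}))))

(common-prefix-length "clojure" "closure") ;; => 3
(fn-spec-holds? "clojure" "closure")       ;; => true
```

In practice you would let `stest/check` do this over generated arguments rather than checking calls by hand.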


#12

Java might say

int daysBetween(String dayOfWeek1, String dayOfWeek2) { ... }

Where Clojure spec could do:

(require '[clojure.spec.alpha :as s])

;; useful stuff already in your code
(def days #{"Mon" "Tue" "Wed" "Thu" "Fri" "Sat" "Sun"})
(defn week-day [day-of-week] ...)

;; you choose the right time and whether to add this.
;; note that the args and ret here are far more precise
;; and you can specify constraints BETWEEN the args
(s/fdef days-between
  :args (s/& (s/cat ::d1 days ::d2 days)
             #(<= (week-day (::d1 %)) (week-day (::d2 %))))
  ;; s/int-in takes an inclusive start and an exclusive end, so this allows 0 to 6
  :ret  (s/int-in 0 7))

(defn days-between [day-of-week-1 day-of-week-2] ...)

Then stest/instrument at dev time to ensure everyone is calling it right and stest/check to get free property tests. I didn’t add a :fn spec but that can be done to add constraints between args and ret too.
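Here is one runnable way to fill in the elided bodies and try the instrumentation. The `week-day` and `days-between` implementations below are assumptions (Monday counted as 0), not the only sensible ones.

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.test.alpha :as stest])

(def day-order ["Mon" "Tue" "Wed" "Thu" "Fri" "Sat" "Sun"])
(def days (set day-order))

(defn week-day
  "Assumed implementation: zero-based index of the day, Mon = 0."
  [day-of-week]
  (.indexOf day-order day-of-week))

(defn days-between [day-of-week-1 day-of-week-2]
  (- (week-day day-of-week-2) (week-day day-of-week-1)))

(s/fdef days-between
  :args (s/& (s/cat ::d1 days ::d2 days)
             #(<= (week-day (::d1 %)) (week-day (::d2 %))))
  :ret  (s/int-in 0 7))

;; From here on, every call to days-between has its args checked.
(stest/instrument `days-between)

(days-between "Mon" "Thu")        ;; => 3
;; (days-between "Thu" "Mon")     ;; throws: violates the ordering constraint
;; (days-between "Mon" "Funday")  ;; throws: "Funday" is not in days
```

Note that instrument checks only `:args`; `:ret` and `:fn` are exercised by `stest/check`, which needs org.clojure/test.check on the classpath.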


#13

Just to be clear: I am not saying that the Java approach is better than Clojure - spec is really useful, nice and powerful. Otherwise I’d be writing on a Java forum :slight_smile:
What I am saying is that I find the separate spec instrumenting and definition harder to use and reason about than the corresponding Java signature, as it is in a separate place and often quite verbose. I personally like the defn-spec approach that Orchestra uses https://github.com/jeaye/orchestra but unfortunately it does not seem to play nice with Cursive.


#14

@l3nz:

How is this different from Orchestra?
Orchestra provides a macro called defn-spec. That macro (and Orchestra itself) aims to extend Spec’s instrumentation functionality, not take the assertion approach as this library does. Further, Orchestra’s defn-spec macro specifies a new DSL for specifying a function’s :args and :ret specs, which does not follow Clojure’s defn format. IDEs (like Cursive) do not have support for this new DSL.


#15

Cool, I still personally prefer Orchestra’s DSL over this, but this might be a good compromise if it does play better with IDEs.


#16

I find the observation that Clojure has problems scaling to large teams interesting – I’ve only used Clojure with small teams (up to three developers). I would imagine that a diverse team, in terms of approach, could lead to a codebase that is inconsistent and mixes very different styles. Clojure isn’t really opinionated, to be honest, not compared to, say, Rails, and I’ve seen almost as many styles of Clojure as I’ve seen developers.


#17

We have been working with both small and large (10+) teams with Clojure, mostly with people who have done projects before in Java/Scala/Javascript/Typescript. I agree about the spec ergonomics. Specs are easy to reuse in flat s/keys, but changing a detail in a nested map requires all the s/keys in the path to be redefined and re-registered. Fdefs are also powerful, but quite noisy to define.

I also like the noise-free Schema syntax for def, fn and defn. We (the community) could host a 1:1 version of those for spec; Cursive already does static analysis for it. Related discussion: https://github.com/plumatic/schema/issues/366
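A small sketch of the nested-map point, with invented `:base/...` and `:strict/...` names. Tightening only `:zip` means registering a new spec for every map on the way down to it:

```clojure
(require '[clojure.spec.alpha :as s])

;; Base model: a customer with a nested address (hypothetical names).
(s/def :base/zip string?)
(s/def :base/city string?)
(s/def :base/address (s/keys :req-un [:base/zip :base/city]))
(s/def :base/name string?)
(s/def :base/customer (s/keys :req-un [:base/name :base/address]))

;; Variant that only tightens :zip. Every s/keys on the path from the
;; root down to :zip has to be defined again under a new name.
(s/def :strict/zip (s/and string? #(re-matches #"\d{5}" %)))
(s/def :strict/address (s/keys :req-un [:strict/zip :base/city]))
(s/def :strict/customer (s/keys :req-un [:base/name :strict/address]))

(def customer {:name "Ada" :address {:zip "SW1A" :city "London"}})

(s/valid? :base/customer customer)   ;; => true
(s/valid? :strict/customer customer) ;; => false
```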