Dealing with data-driven mistypes

Webdev_Tory · November 7, 2019, 4:42pm

Our teams have a setup including clj-Kondo for linting, Re-frame, and Reitit for data-driven development. There are many other places in our work that follow the same pattern, building maps that guide our program activity. Recently we again dealt with a common experience: sadly lengthy trouble-shooting that boiled down to a mistyped keyword in one of our data maps, resulting in silent and mysterious failures when the program ran. In this case, it was a keyword for re-frame consumption (resulting in the reframe things operating with a nil input because the key they were looking for was mistyped). Kondo couldn’t find it because it’s just a new cljs map we’re passing to the parser.

This seems like a price paid for the dynamic nature of Clojure, and I’m curious how other experiences have approached this. Do you:

Blame it on experience; the problems of mistypes are on the developer, and will be reduced with experience.
Spec or validate everything, constraining input. But this is harder when using 3rd-party libraries, and it’s cumbersome.

Or something else?

mvarela · November 7, 2019, 7:26pm

I’ve found that using Cider and company mode helps, since you get auto-completion… It doesn’t solve the issue, but it does reduce its incidence.

seancorfield · November 7, 2019, 7:40pm

If you have data structures that have required fields, Spec can be very useful, and then you can instrument certain functions for dev/test which would find errors like this – for required fields (which this sounds like, since the field was expected to be present).

For optional fields, you should already be writing code that relies on conditionals on the field being non-nil / present (and Spec will still help here in terms of catching badly formed values for optional fields).

Spec 2 brings optional closed spec checking so that could help highlight misspelled optional field names (since they’d be a disallowed key) – but Spec 2 isn’t ready for prime time yet and, of course, Clojure and Spec are designed to be open-by-default so making the choice to close a spec in certain situations should be deliberate and with awareness of the trade offs.

We’ve had this very same bug bite us (and fairly recently) where a misspelled keyword got past dev, automated tests, and PR review, and caused incorrect behavior in production in certain situations. It was for an optional field so Spec 1 would not necessarily have helped us much here.

ikitommi · November 7, 2019, 7:59pm

For spec1, there is also GitHub - bhauman/spell-spec: clojure.spec.alpha helpers that check for misspelled map keys, with expound integration, which does both closed specs and errors on misspelled keys. It’s a macro, but has a functional wrapper in spec-tools, used in libs like reitit to give creation time erros like this:

Problem is, like @Webdev_Tory described, that it’s just one library doing that. If Spec2 turns to be good, I hope we’ll see more functions and configurations being specced so the validation could span over library boundaries to yield good developer experience in general. Lot of things need to happen first, both in spec and in the library space. Looking forward on Ben’s work on Expound under Clojurists Together.

didibus · November 7, 2019, 10:55pm

I wrote my own closed spec, which we use in certain places. Most of these issues for mandatory keys should be caught early by REPL testing, integ tests, QA, etc. Since if you only tested the happy path you’d catch it. If you never tested the happy path, I’d say that’s an even bigger source of concern. Who knows what other bugs lurks if you didn’t even test that. So I’d look into having a process that covers at least one happy path for each of your required user behavior.

But, in the case of optional keys, or keys that are mandatory only in certain circumstances and contexts, they can easily slide through the cracks. Which is where I like to use closed specs as others mentioned to validate such maps in key places, so at least I can fail fast when it happens. Ideally, where the map is constructed, so that we fail prior to any side effects, such as persisting the data or running some transaction, which would result in corrupted state. Also, if you validate where the map is constructed, you do not suffer from the backwards incompatibility that closed validation can introduce. That is, the functions that use the map will happily work even if it has more keys than they need, so you don’t have to update them if you’re evolving your map over time and growing its number of keys. But the function which creates the map will validate it didn’t add any new key which it wasn’t supposed too, and fail otherwise.

lilactown · November 8, 2019, 12:35am

This is one of the reasons why I think that it’s better to use vars than raw data most of the time. The Clojure(Script) compiler will warn you if mis-type a var; it won’t if you mis-type a keyword.

It could be as simple as:

(ns my-app.messages
  (:require [re-frame.core :as rf]))

;; ,,,

(defn new-message-handler
  [db _]
  (::new-message db))

(def new-message
  (do (rf/reg-sub ::new-message-sub new-message-handler)
      ::new-message-sub))

;; elsewhere

(let [new-message @(rf/subscribe messages/new-message)]
  ,,,)

I think it’s unfortunate that re-frame (and many other libs) force us to write code as data, rather than treating our code as data in the implementation (via macros).

didibus · November 8, 2019, 6:09pm

Ya, actually a strategy I thought of, but never really tried, is to put all your keywords behind symbols.

So instead of doing:

(defn perform [actions-map]
  (do-something (:user actions-map)))

(perform {:user (get-user)})

You could do:

(def user :user)

(defn perform [actions-map]
  (do-something (user actions-map)))

(perform {user (get-user)})

That way you can be sure all use of the user key will be in sync. And if you mistyped the symbol it’s a compile time error since it won’t resolve.

In other languages, if I see someone using a string key somewhere, I would call it out as a magic string and ask that they put it behind a constant. So it would kind of make sense to me as well in Clojure to do the same for keywords.

The only downside in Clojure is how prevalent the use of keywords is. Wherever you’d do x.a x.b x.c in another language, in Clojure you do (:a x) (:b x) and (:c x). So putting all these behind a var seems like a lot of extra work.

There may be ways to automate this though. I could imagine leveraging spec so that keys spec auto define symbols of the same name and namespace for their keys. That would be pretty neat actually.

Another possibly cool avenue is if a linter could grab all keywords from all loaded libs, and do a fuzzy clustering on them, and then warn you if it finds any two keywords that are close match, in case they are typos.

Or, the linter could grab all keys from all specs, and similarly check if any keywords is a close match to those and warn, in case it’s a misspelling. A little what spell-spec does but for the whole program and as a linter.

lilactown · November 8, 2019, 6:36pm

I think that anytime you are assigning semantics to a name, you should def it as a var. This allows you to take advantage of all of the power of the analyzer. You can assign metadata like docstrings to it, too. Most libs end up basically re-implementing vars themselves in some way. Sometimes this is worthwhile (see: CLJS libs, since CLJS doesn’t use vars) when you want richer reification of the semantics and context they were derived in. But IMO it should also interop with the var/def system for ease of tooling for exactly the reasons we’re talking about.

For instance, your example could be refactored with a small macro to:

(defaction user)

(def my-context {user (get-user)})

(defn perform [context]
  (do-something-with-user (get context user)))

(perform my-context)

The defaction macro can end up returning any kind of data: a symbol, keyword, etc. It can also do side-effects, like interoping with some other reified runtime like registering a re-frame handler or whatever.

Another refactoring, of my re-frame example above, where we imagine a defsub macro in re-frame:

(ns my-app.messages
  (:require [re-frame.core :as rf]))

;; ,,,

(defn new-message-handler
  [db _]
  (::new-message db))

(rf/defsub new-message new-message-handler)
;; => (def new-message
;;      (do (rf/reg-sub ::subs.new-message new-message-handler)
;;          ::subs.new-message)

;; elsewhere

(let [new-message @(rf/subscribe messages/new-message)]
  ,,,)

We can still use keywords under-the-hood, which gives us the flexibility and dynamic behavior we often need from our framework, without forcing users to code with data and have to deal without the benefits of the analyzer, or have to build it ourselves.

didibus · November 8, 2019, 8:15pm

Hum… I’m not sure I’m fully onboard with that. I’m worried it would end up creating too many unnecessary abstractions. Maybe in certain situations, like if you did need to perform a lot of scaffolding, registration, side-effects, etc. (which might be true in the context of react). But otherwise, it seems pretty heavy handed. The only issue related to keywords keys that I’ve had is just mistyping them, or forgetting their possible set of options. I only need something that addresses that.

The thing is, by going with a macro, you have created a closed for extension system. Say I needed to define an action, but just this one time I don’t need it to register with something, or I’d also need it to do something else? Like, isn’t it because re-frame used data that you can also go and build that macro yourself if you wanted?

But, to be honest, I don’t know reframe, so I’m not super following your code example. Specifically, I don’t understand why you can’t just make that a function?

lilactown · November 8, 2019, 8:22pm

You still use data under the hood, and can provide a public data API for when people want to hack it. But also providing a code-centric, easy-to-use API for “every day things” is IMO the best of both worlds.

didibus · November 8, 2019, 8:27pm

Ah I see. I wouldn’t disagree with that. At least for user friendliness (something Clojure is often blasted for lacking), it does help to do so.

Macro or functions building data can be quite convenient. That’s the direction Spec2 is taking.

Alan_Thompson · November 15, 2019, 2:16pm

In order to avoid this problem, I always use the grab function from the Tupelo lib

Using it is easy-peasy. Instead of typing:

(:name user)

just grab it instead:

(grab :name user)

If you have a typo:

(grab :nome user)  ;=> throws an exception

so you detect the misspelling right away in the first unit test.

Note that this also detects keys that are missing from a map, even though they are spelled correctly.

Webdev_Tory · November 15, 2019, 5:35pm

This is a good idea. I’ve had Tupelo in my dependencies before, but haven’t used it for a while. grab sounds good

Anthony_Leonard · November 18, 2019, 8:23pm

But I guess this is not always what you want right? Some - most? - times you want to try and get the information from a map, and then keep going if there’s no information there to be had.

I am really hoping spec2 will solidify soon, partly as then the tool makers can finally embrace its power. IIUC spec2 will allow consumers of information to specify what information is required from a map in a given context via “select” - which will then fail fast as grab is doing above I think?

DaveWM · November 20, 2019, 4:03pm

All the suggestions so far seem to concentrate on how to prevent typos, but you also need to make sure you can easily debug them. No matter how strict you make your specs (or other forms of validation), some typos will always “slip through the net”, so I think it’s very important to have good debugging tools. I’ve found spyscope and scope-capture invaluable, and I’ve also dabbled with clojure.tools.trace. A good repl-driven workflow helps massively.

system · May 21, 2020, 4:03am

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.