What is the idiomatic way to process data through a number of functions, checking for errors after each function?

I’m new to Clojure. My background is mostly Python. In Python, it’s common to write code like:

data = some_data_generator()

processed_data = process_data_1(data)
if processed_data['error']:
  return {'error': True, 'msg': 'Error msg'}

processed_data = process_data_2(processed_data['data'])
if processed_data['error']:
  return {'error': True, 'msg': 'Error msg'}

# And so on for however many processing functions there are.

As per my current understanding, I can do something similar with Clojure:

(-> data
     process_data_1
     process_data_2
     ; more process data functions
     )

However, I don’t know of a good way to check for errors between each of these invocations. I can also think of using let bindings:

(let [data (process_data_1 data)]
    (if (:error data)
        {:error true :msg "Error msg"}
        (let [data (process_data_2 data)]
            (if (:error data)
                {:error true :msg "Error msg"}
                data))))

But this gets unwieldy quickly. Is there a better way of expressing operations like this? Essentially, I want to check for an error between each processing step and stop executing if an error is encountered.

some-> or some->> might do the trick, if you can make your functions return nil to represent needing an early return.

https://clojure.org/guides/threading_macros
https://clojure.github.io/clojure/clojure.core-api.html#clojure.core/some->
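
As a minimal sketch of the some-> suggestion above: the step functions below (parse-age, require-adult, add-greeting) are hypothetical names made up for illustration, each returning nil to signal that processing should stop.

```clojure
;; Hypothetical steps: each returns nil to signal failure.
(defn parse-age [m]
  (when (integer? (:age m)) m))

(defn require-adult [m]
  (when (>= (:age m) 18) m))

(defn add-greeting [m]
  (assoc m :greeting (str "Hello, " (:name m))))

(defn process [m]
  ;; some-> stops threading as soon as a step returns nil
  (some-> m parse-age require-adult add-greeting))

(process {:name "Ann" :age 30})
;; => {:name "Ann", :age 30, :greeting "Hello, Ann"}

(process {:name "Bob" :age 12})
;; => nil
```

The trade-off is that nil carries no information about which step failed or why.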

If one of the threading macros doesn’t do the trick, you might try an either monad. Dan McKinley wrote a bit about the same frustration here:

Search for “Nesting Sucks”.


In instances where I want to mess with the control flow and define custom processing, I have used macros like the following (although this is a toy/thought piece without the customary testing). It will act like the -> threading macro, except we introduce two new functions invalid and the predicate invalid? to detect invalid results. It also takes an initial pair of args: a predicate to detect/mark invalid intermediate results, and a handler function to do something in the face of invalidity (either repair it or return something wrapped in invalid which yields an Invalid wrapper object that is dereffable like an atom).

So the general idea is to have these sequential intermediate computations, and if we detect invalidity (per the predicate arg), we invoke the handler. If any result (even after passing the handler) yields an Invalid instance (as constructed by the invalid function), we stop computing (akin to some->) and yield the invalid result. Otherwise, the pipeline behaves as -> would, threading the prior result as the first arg of the successive form evaluation.

(deftype Invalid [obj]
  clojure.lang.IDeref
  (deref [this] obj))

(defn invalid? [x] (instance? Invalid x))
(defn invalid [x] (Invalid. x))

(defmacro valid-> [[pred handler] & forms]
  (let [res   (gensym "res")
        p     (gensym)
        h     (gensym)
        steps (for [form (rest forms)]
                `(let [this# (if (invalid? ~res)
                               ~res
                               (-> ~res ~form))
                       res#  (if (~p this#)
                               (~h this#)
                               this#)]
                   res#))]
    `(let [~p  ~pred
           ~h  ~handler
           ~res ~(first forms)
           ~res (if (~p ~res)
                  (~h ~res)
                  ~res)
           ~@(interleave (repeat res) (butlast steps))]
       ~(if (empty? steps)
          res
          (last steps)))))

If we apply it to an example similar to the Python code, we can transform a map and assoc values to it. We detect invalidity with the predicate :error, e.g. (:error some-map), which is equivalent to (get some-map :error) since keywords act as functions that look themselves up in a map. The handler function will print that there is an error, and yield an invalid result with the current value:

user> (valid-> [:error
                (fn [d]
                  (println ["error!" d])
                  (invalid d))]
    {:a "hello"}
    (assoc :b "world")
    (assoc :error "oh no!")
    (assoc :d "shouldn't get here!"))
[error! {:a hello, :b world, :error oh no!}]
#<Invalid@4d9d86ab: {:a "hello", :b "world", :error "oh no!"}>

We can hook in anything for the pred/handler functions, and yield some exception info (or even throw if we want to):

user> (valid-> [:error
               (fn [d]
                   (invalid (ex-info "bad-input!" {:data d})))]
         {:a "hello"}
         (assoc :b "world")
         (assoc :error "oh no!")
         (assoc :d "shouldn't get here!"))
#<Invalid@2016241e: #error {
 :cause "bad-input!"
 :data {:data {:a "hello", :b "world", :error "oh no!"}}
 :via
 [{:type clojure.lang.ExceptionInfo
   :message "bad-input!"
   :data {:data {:a "hello", :b "world", :error "oh no!"}}
   :at [user$eval14906$G__14905__14907 invoke "form-init5309770136055223389.clj" 402]}]
 :trace
 [[user$eval14906$G__14905__14907 invoke "form-init5309770136055223389.clj" 402]
  [user$eval14906 invokeStatic "form-init5309770136055223389.clj" 400]
  [user$eval14906 invoke "form-init5309770136055223389.clj" 400]
  [clojure.lang.Compiler eval "Compiler.java" 7194]
  [clojure.lang.Compiler eval "Compiler.java" 7149]
...elided]}>

This is mostly lifted from the implementation of clojure.core/some->:

user> (use 'clojure.repl)
nil
user> (source some->)
(defmacro some->
  "When expr is not nil, threads it into the first form (via ->),
  and when that result is not nil, through the next etc"
  {:added "1.5"}
  [expr & forms]
  (let [g (gensym)
        steps (map (fn [step] `(if (nil? ~g) nil (-> ~g ~step)))
                   forms)]
    `(let [~g ~expr
           ~@(interleave (repeat g) (butlast steps))]
       ~(if (empty? steps)
          g
          (last steps)))))
nil

You could accomplish something simpler where flow-control is determined by the presence of exceptions (or use the built-in clojure.core/reduced to indicate termination, or use metadata).
You could implement a similar processing pipeline with functions and custom reduce too (if there is a validation error, just indicate it and return a reduced result to terminate processing), although it’s a bit more paperwork to express the pipeline.
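
A rough sketch of the function-based reduce variant mentioned above, assuming (as in the original question) that a step signals failure by returning a map with a truthy :error key. The names run-pipeline, add-b, fail-step and add-d are hypothetical and only serve the illustration.

```clojure
(defn run-pipeline
  "Threads data through steps in order. Stops at the first result
  with a truthy :error and returns that result."
  [data steps]
  (reduce (fn [acc step]
            (let [result (step acc)]
              (if (:error result)
                (reduced result) ; terminate the reduction early
                result)))
          data
          steps))

;; Hypothetical steps
(defn add-b [m] (assoc m :b "world"))
(defn fail-step [m] {:error true :msg "oh no!" :data m})
(defn add-d [m] (assoc m :d "shouldn't get here!"))

(run-pipeline {:a "hello"} [add-b add-d])
;; => {:a "hello", :b "world", :d "shouldn't get here!"}

(run-pipeline {:a "hello"} [add-b fail-step add-d])
;; => {:error true, :msg "oh no!", :data {:a "hello", :b "world"}}
```

Because the steps are passed as a plain vector of functions, the pipeline can be built or extended at runtime, at the cost of losing the threading-macro syntax for extra arguments.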


You don’t have to do anything, this is the default behavior.

This will simply stop processing on any error thrown.


A great discussion on the topic here:

We also had this problem, and in our case (and I believe also in general) the logic was not linear, so a pipeline doesn’t always fit, and using one would result in limited or awkward code reuse.

After several other attempts, we took the idea of cognitect.anomalies, i.e. error conditions that are just data, and came up with de.otto/nom (GitHub - otto-de/nom: Tools for ignoring anomalies), which we now use throughout our team’s code base. I also talked about it at London Clojurians (How I learned to stop worrying and ignore anomalies (by Svante von Erichsen) - YouTube).

Most use cases are handled with let-nom>, which is like a let, but just short-circuits as soon as a binding value is an anomaly. We also have things like nom (checking arguments before calling a function), nom-> and nom->> (threading with nom) and others. For actually handling the returned anomalies, I like to use clojure.core.match.

Our experience with this is positive so far. It seems that code using it gets better than without it, as you not only think about possible sources of anomalies but also have a toolbox to conveniently pass them through to the outside. At the same time, the code doesn’t look much different from what it would look like without it, so it feels like the cognitive overhead is quite low.

Thanks. I didn’t know about the some-> macros. They do seem very useful in contexts similar to mine.

However, the issue is that I’d like to return details of the error, instead of just nil. There’s a couple of other answers in this post that provide the exact solution I was looking for, especially the detailed implementation by @joinr.

Another library I stumbled upon after reading these answers is GitHub - adambard/failjure: Monadic error utilities for general use in Clojure(script) projects which provides a solution similar to @joinr.

I think the behavior you mentioned is default in 1 of 2 cases:

  • The process_data_N functions return nil if the input is nil
  • The threading macro is replaced by the some->/some->> variants

Thank you for this detailed walkthrough. It’s exactly what I was looking for and it’s very helpful to see how something like this can be implemented.

I had 1 question about your code which I’m hoping you can help with. Why did we use new gensyms for the pred and handler? Could we not just use the passed values directly?

It allows you to supply vars or forms. If we have a form, like (fn [blah] true), we don’t want to eval it or effectively inline the form everywhere the function is meant to be invoked. So we have a lexically scoped gensym for pred (or handler) and bind that to the spliced pred/handler argument in the macro expansion. Then we just use the gensym throughout when we want to apply the function in subsequent expressions.

If you decided not to allow arbitrary forms (which should eval to functions), then yes, you could restrict the macro to using only symbols that must resolve to a function at evaluation time. Keywords could work as well. These kinds of choices come up in macro writing (e.g. what to restrict the input to, i.e. defining the semantics of your little language extension). I opted to allow users to pass in arbitrary forms (as in my example).
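
To make the difference concrete, here is a small sketch with two hypothetical macros (call-twice-inlined and call-twice-bound are not part of the valid-> code above). An atom counts how often the function form passed to the macro gets evaluated.

```clojure
(def evaluations (atom 0))

;; Splices the function form at each call site, so it is evaluated twice.
(defmacro call-twice-inlined [f x]
  `[(~f ~x) (~f ~x)])

;; Binds the function form once to an auto-gensym'd local, then reuses it.
(defmacro call-twice-bound [f x]
  `(let [f# ~f]
     [(f# ~x) (f# ~x)]))

(reset! evaluations 0)
(call-twice-inlined (do (swap! evaluations inc) inc) 1)
;; => [2 2]
@evaluations
;; => 2

(reset! evaluations 0)
(call-twice-bound (do (swap! evaluations inc) inc) 1)
;; => [2 2]
@evaluations
;; => 1
```

Both macros return the same value, but only the bound version evaluates the supplied form exactly once, which matters when the form is expensive or has side effects.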

No, it is the default for any thrown error:

(defn process_data_1
  [data]
  (throw (ex-info "msg" {})))

(-> data
     process_data_1
     process_data_2
     ; more process data functions
     )

This will short-circuit when the error is thrown. The short-circuiting will bubble up the entire call stack all the way to the top. This is the default error mechanism: any thrown error short-circuits the entire call stack.

You can then choose where in the stack is most appropriate for you to handle the error and use a try/catch to intercept the short-circuiting and stop it from bubbling further up.

For example:

(defn process_data_1
  [data]
  (throw (ex-info "msg" {})))

(defn process-data
  [data]
  (-> data
       process_data_1
       process_data_2
       ; more process data functions
       ))

(try (process-data data)
  (catch Exception e
    (println "An error occurred while processing the data")))

I think there is a disconnect or overloading of “error” to “exception”. OP’s Python code just returns a map and doesn’t throw/raise. So there exists a universe of data processing functions where data can be considered invalid (or erroneous) by its form, and where no exception is thrown, but somehow the invalidity is captured in the intermediate results to preclude later processing. You could model this with try/catch and have all of your processing functions throw to signal invalidity and to stop processing (potentially stop everything). That is a bit different from the proffered example, although they aren’t far apart.


Unless OP has no access to the code for the data processing functions, I would suggest that they simply refactor them to throw on error and rely on the default exception short-circuiting mechanisms of Clojure for the short-circuiting on error behavior they were asking for.

There are some circumstances where the exception mechanism of Clojure doesn’t allow for certain specific handling, which warrants returning errors instead of throwing, but, as I understood, OP didn’t mention anything that requires it.

In those cases I personally recommend sticking to the standard Clojure error handling mechanism for simplicity and compatibility.
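
As a sketch of that refactoring, assuming Clojure 1.10+ for ex-message: the steps validate-x and double-x are hypothetical, and the try/catch at the boundary converts the exception back into the kind of error map OP’s original code returned, with the details carried in ex-data.

```clojure
;; Hypothetical step that throws instead of returning an error map
(defn validate-x [m]
  (if (contains? m :x)
    m
    (throw (ex-info "Missing :x" {:step :validate-x :input m}))))

;; Hypothetical step that only runs if validation passed
(defn double-x [m]
  (update m :x * 2))

(defn process-data [m]
  (try
    (-> m validate-x double-x)
    (catch clojure.lang.ExceptionInfo e
      ;; Turn the exception back into data at the boundary
      {:error true
       :msg   (ex-message e) ; Clojure 1.10+
       :data  (ex-data e)})))

(process-data {:x 21})
;; => {:x 42}

(process-data {:y 1})
;; => {:error true, :msg "Missing :x", :data {:step :validate-x, :input {:y 1}}}
```

Catching only ExceptionInfo here is a design choice for the sketch: unexpected exceptions of other types still bubble up rather than being reported as validation errors.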


Yeah, there are lots of ways to skin this cat. It’s common to make a simple utility fn for threading things through:

(defn check [v checker]
  (if (checker v)
    v
    (throw (ex-info "Error msg" {:error true}))))

(-> {:x 1 :y 2}
    (check #(keys %))
    (check #(contains? % :x))
    (check #(contains? % :y)))
;=> {:x 1, :y 2}

Edit: yeah, looks similar to didibus’s solution


Is this something that failjure could help you with? Maybe ok->?

Monadic error handling: what you want sounds like what is called Railway Oriented Programming in F#.

You pipe your data through a series of functions and fail out into an error handler at any step. I’m sure you can do this with failjure above.

You can use this kind of pipelining, but only if it’s linear. And if you find that you are creating some aggregate just to pipe it through, be very suspicious, and when it gets a life of its own, abort, abort.

Maybe take a look at: GitHub - kumarshantanu/promenade: Take program design oddities in stride with Clojure/ClojureScript