Adding steps to a pipeline with a macro

clojure

#1

Hi everyone!

I'm new to macros and have run into a few issues with a problem I'm trying to solve: I have a pipeline of computations, something like

(->> data
     (map #(parse-row % refs))
     (filter right-id?)
     ...
     (map #(select-keys % (:important-fields refs))))

Now I would like to add an arbitrary number of filters and/or transformations in place of the ..., driven by a configuration data structure. What I have so far is:

(defn parse-steps
  [steps]
  (let [s (keep-or-discard steps)] ; this parses the config 
    (doall
      (for [step s]
        (let [f (:function step)
              d (dissoc step :function)]
          (cond
            (= :keep f) `(filter #(= (~@(keys d) %) ~@(vals d)))
            (= :discard f) `(filter #(not= (~@(keys d) %) ~@(vals d)))))))))

(defmacro add-steps
  [refs steps]
  `(->> data
      (map #(parse-row % ~refs))
      (filter right-id?)
      ~@(parse-steps steps)
      (map #(select-keys % (:important-fields ~refs)))))

But this isn't working. I'm aware that my problem is that I'm trying to evaluate arguments at compile time, but I can't seem to find a working solution :frowning:
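To make the compile-time issue concrete, here is a minimal sketch. The macro show-arg is hypothetical and only exists for this illustration; it reports what a macro actually receives when it is expanded. When a var name is passed, the macro sees the symbol, not the value the var holds, which is why calling something like parse-steps on that argument at expansion time cannot see the configuration.

```clojure
;; Hypothetical helper macro, for illustration only:
;; it inspects its argument at macro-expansion time.
(defmacro show-arg [x]
  (if (symbol? x)
    :got-a-symbol
    :got-a-literal))

;; Example config, assumed shape: :function plus one field/value pair.
(def steps [{:function :keep :type "a"}])

(show-arg steps)
;; => :got-a-symbol   (the macro received the symbol `steps`, not the vector)

(show-arg [{:function :keep :type "a"}])
;; => :got-a-literal  (a literal in the call is visible at expansion time)
```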


#2

In my opinion, the ->> macro is mainly a syntactic convenience that reduces clutter when you pass a value through many functions. For your use case, where the pipeline has to be built from a data specification, macros are not a very good choice. Clojure is a functional language, and functions can be combined in many ways to achieve what you want. In your case, I believe transducers may be a good idea. Here is a draft to get you started:

(defn parse-steps
  [steps]
  (let [s (keep-or-discard steps)]      ; this parses the config
    (for [step s]
      (let [f     (:function step)
            ;; assumes each step holds exactly one field/value pair besides :function
            [k v] (first (dissoc step :function))]
        (cond
          (= :keep f)    (filter #(= (get % k) v))
          (= :discard f) (filter #(not= (get % k) v)))))))

(defn do-steps
  [data refs steps]
  (let [transducer (comp (map #(parse-row % refs))
                         (filter right-id?)
                         (apply comp (parse-steps steps))
                         (map #(select-keys % (:important-fields refs))))]
    (transduce
     transducer
     conj
     []
     data)))

I replaced your add-steps macro with the function do-steps, and removed the doall in parse-steps as well as the backquotes on (filter, so that parse-steps returns a sequence of filtering transducers. I haven't tested it, though.
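For reference, here is a small self-contained sketch of the same idea that can be pasted into a REPL. It leaves out keep-or-discard, parse-row and refs, and it assumes a config shape where each step is a map with :function plus exactly one field/value pair; step->xf and run-steps are hypothetical names chosen for this example.

```clojure
;; Assumed config shape: :function plus exactly one field/value pair.
(def steps [{:function :keep    :type "a"}
            {:function :discard :status :deleted}])

(defn step->xf
  "Turns one config step into a filtering transducer."
  [step]
  (let [[k v] (first (dissoc step :function))]
    (case (:function step)
      :keep    (filter #(= (get % k) v))
      :discard (remove #(= (get % k) v)))))

(defn run-steps
  "Composes all configured steps into one transducer and applies it to data."
  [steps data]
  (into [] (apply comp (map step->xf steps)) data))

(run-steps steps [{:type "a" :status :ok}
                  {:type "b" :status :ok}
                  {:type "a" :status :deleted}])
;; => [{:type "a", :status :ok}]
```

Because the steps are ordinary runtime data here, the configuration can come from a file or a var without any macro involvement. An empty steps vector also works, since (comp) with no arguments returns identity.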

I suggest you read up on how transducers work (https://clojure.org/reference/transducers), and then try to get my code working.


#3

Hey, thanks for the answer! I'm aware of transducers and am currently using them; that was my plan B :smile:

The point was to learn more about macros. Since I'm working with largish data, I'm currently using tesser to parallelize the computations, so the code here is just an example; what I'm really doing is (tesser/filter #(= (~@(keys d) %) ~@(vals d))) etc.

But I can always try to parallelize the transducer with core.async, though I'm not an expert in it.
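On the macro side of the question, one possible sketch is below. It only works when the steps are written as a literal in the macro call, because that is the only case where the configuration is visible at expansion time. The names step->form and with-steps are hypothetical, the config shape (:function plus one field/value pair) is an assumption, and plain filter/remove stand in for the tesser calls.

```clojure
(defn step->form
  "Ordinary function, called at macro-expansion time:
   turns one config step into a form suitable for ->>."
  [step]
  (let [[k v] (first (dissoc step :function))]
    (case (:function step)
      :keep    `(filter (fn [row#] (= (get row# ~k) ~v)))
      :discard `(remove (fn [row#] (= (get row# ~k) ~v))))))

(defmacro with-steps
  "Threads data through one generated form per step.
   `steps` must be a literal vector of maps in the call."
  [data steps]
  `(->> ~data
        ~@(map step->form steps)
        (into [])))

(with-steps [{:type "a" :status :ok}
             {:type "b" :status :ok}]
  [{:function :keep :type "a"}])
;; => [{:type "a", :status :ok}]
```

Passing a var name instead of a literal fails at expansion time, since the macro would then try to map over a symbol. For configuration that is only known at runtime, plain functions remain the simpler route.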