How to transform data by reference to a Mapping configuration? (ETL in Clojure)

Hello! New to Clojure and am hoping for some advice on how to solve a problem.

A desired feature of the application I am writing is to allow users to set up rules to transform data records from different sources (by this point turned into edn) into a normalized data structure.

So for example the user might source equity prices from Bloomberg and Reuters, which will obviously come in with slightly different record schemas, and I would want users to be able to set up two mappings which transform each of them to a pre-defined ‘price’ target object (with, say, a date, price, security name). There might be a few circumstances where the logic is slightly more complex than ‘the value of key s_price in the source maps to the value of price in the target’, like adding two source fields together to get to the target field - but I wouldn’t expect much more complexity than that.

Obviously it’s fairly trivial to write a function ‘transform-bloomberg-object’ which accepts the bloomberg data structure and converts it to the target object schema. However I don’t want the concrete mapping logic to be part of the source code, i.e. I don’t want to redeploy every time a new mapping rule is needed by the business. I would like mapping logic to be loaded at runtime from a database or file (and ultimately generated and saved by business users through some kind of UI).

As that would suggest, I think I need a Mapping object: some sort of data structure with a declarative description of how to map from source fields to a target field, something like

{:name "Bloomberg Price Transform Map"
 :target-field1 :source-fieldX
 :target-field2 (+ source-fieldY source-fieldZ)}

I would then want to have a (single) function which accepts both a source record and a Mapping object, applies the operations described in the Mapping object, and returns a target object.
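
A minimal sketch of that idea in plain Clojure, assuming the Mapping is stored as EDN in which each target field is either a source keyword (copy the value) or a vector of `[op & source-keys]`. The names `ops`, `resolve-rule`, and `apply-mapping`, the `:fields` key, and the sample field names are all hypothetical, chosen only for illustration.

```clojure
(require '[clojure.edn :as edn])

;; Whitelist of operations a mapping may name (hypothetical; extend as needed).
(def ops {:+ + :- - :* *})

(defn resolve-rule
  "Computes one target value from `record` according to `rule`.
  A keyword rule copies a source field; a vector rule [op k1 k2 ...]
  applies the whitelisted op to the named source fields."
  [record rule]
  (cond
    (keyword? rule) (get record rule)
    (vector? rule)  (let [[op & ks] rule
                          f (or (ops op)
                                (throw (ex-info "Unknown op" {:op op})))]
                      (apply f (map #(get record %) ks)))
    :else           (throw (ex-info "Unsupported rule" {:rule rule}))))

(defn apply-mapping
  "Builds a target record by resolving every rule under :fields."
  [mapping record]
  (into {}
        (map (fn [[target rule]] [target (resolve-rule record rule)]))
        (:fields mapping)))

;; The mapping is plain data, so it can be read from a file or database at runtime.
(def bloomberg-mapping
  (edn/read-string
   "{:name \"Bloomberg Price Transform Map\"
     :fields {:security :s_name
              :price    [:+ :s_price :s_adjustment]}}"))

(apply-mapping bloomberg-mapping
               {:s_name "ACME" :s_price 100 :s_adjustment 5})
;; => {:security "ACME", :price 105}
```

Because operations are looked up in a fixed table rather than evaluated as code, a mapping saved by a business user can only do what the table allows.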

This feels like something Clojure would be suited for (‘it’s just data’ yada yada), but I’ve not done anything like this before and am unsure if this is a sensible approach. It also seems like something that is a very common business problem, and I don’t want to reinvent the wheel. It’s just the T part of ETL after all, and I’ve seen it solved with graphical ETL tools.

So my questions are:

  • is what I’ve described a reasonable approach?
  • are there any difficulties you would see with this approach?
  • am I trying to reinvent the wheel, and are there libraries that solve or partially solve this problem?
  • is there something obvious I’m missing (in particular it feels like spec might have a part to play here, but I’m too new to Clojure to see how or where)?

I’ve googled around and found some answers which seem to touch on the subject, but not really get to the core of the issue.

Sounds like you might want to have a look at specter, meander, and/or odin; any of them might be a good basis for implementing what you’re describing.

Thank you! After playing with meander for a while it looks like it might be just what I need.

The only part I can’t find any examples for, and am struggling to solve, is using patterns and expressions passed in as data structures. To take the simple example of

(require '[meander.epsilon :as m])

(m/search {:a 1 :b 2 :c 3}
          {:a ?a :b ?b :c ?c}
          {:result-a ?a :result-b ?b :result-c ?c})
;; => ({:result-a 1, :result-b 2, :result-c 3})

What I’d like to do is something like

(def pattern {:a '?a :b '?b :c '?c})
pattern
;; => {:a ?a, :b ?b, :c ?c}

(m/search {:a 1 :b 2 :c 3}
          pattern
          {:resulta ?a :resultb ?b :resultc ?c})

Since pattern, when I send it to the REPL, prints exactly the same as the second argument in the initial block, I would expect the result to be the same, but this returns nil (or, for m/match, a non-exhaustive pattern match error).

Is m/search a macro? If so, it probably doesn’t resolve Vars passed to it: a macro receives its arguments unevaluated, so it would be trying to use the symbol pattern itself as the pattern instead of what the Var contains.

It is a macro, yes.
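
To illustrate why that matters: a macro receives its arguments as unevaluated forms, so it sees the symbol `pattern`, not the map stored in the Var. A minimal sketch in plain Clojure, with no Meander involved (the `show-form` macro is hypothetical and exists purely for demonstration):

```clojure
(def pattern {:a '?a :b '?b :c '?c})

;; A macro is handed the code it was called with, before any evaluation.
;; This one simply returns that code as data.
(defmacro show-form [x]
  `(quote ~x))

(show-form pattern)
;; => pattern

(show-form {:a ?a :b ?b :c ?c})
;; => {:a ?a, :b ?b, :c ?c}
```

So in the failing call, the macro is given the symbol `pattern` where it expects pattern syntax, and never looks at the map the Var holds.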

This is probably not nice, but it does the job:

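(def pattern {:a '?a :b '?b :c '?c})
;; Assumes meander.epsilon is required with the alias m.
;; Builds the m/search form at runtime from the evaluated arguments and evals
;; it, so the macro sees the value of `pattern` rather than the symbol.
(defmacro search* [src pat target]
  `(eval (list 'm/search ~src ~pat ~target)))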

(search* {:a 1 :b 2 :c 3} pattern {:resulta '?a :resultb '?b :resultc '?c})
;;=> ({:resulta 1, :resultb 2, :resultc 3})
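
The same trick can be sketched as a plain function, which may be more convenient when the pattern and target arrive as EDN text from a file or database. This assumes Meander is on the classpath; `search-data` and the sample field names are hypothetical. Note that `eval` runs whatever code the forms contain, so the sketch also assumes the mapping source is trusted.

```clojure
(require '[meander.epsilon :as m]
         '[clojure.edn :as edn])

(defn search-data
  "Runs m/search with a pattern and target supplied as data.
  The source record is quoted so it is treated as a literal value."
  [src pat target]
  (eval `(m/search '~src ~pat ~target)))

;; Pattern and target read from strings, as they might be when loaded at runtime.
(def pat    (edn/read-string "{:s_name ?n :s_price ?p}"))
(def target (edn/read-string "{:security ?n :price ?p}"))

(search-data {:s_name "ACME" :s_price 100} pat target)
```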

That works great, thanks!

Glad to hear!