Externalising processing transformations


#1

Hi, I’m looking for guidance on better ways of accomplishing something. I’m not exactly sure what “better” means in this instance; less brittle, perhaps.

Background
I currently have a CLI tool that accepts as input a filename and the path to a “transformation.edn” file.

The file is “processed” by applying some transformations that are common to all files (and are therefore embedded in the main program). The transformations that aren’t common I have externalised into “transformation.edn”.

transformation.edn contains a map (hash) keyed by the name of the field to transform; the value for each key is the function to apply to that field in each row of the input file.
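For anyone unfamiliar with this setup, here is a minimal sketch of what it might look like. It assumes the function values are written as `(fn ...)` forms, read as plain data with `clojure.edn/read-string` and then turned into real functions with `eval`; the field names, the inline EDN string (standing in for `(slurp path)`), and the helpers `load-transformations` and `transform-row` are all hypothetical.

```clojure
(require '[clojure.edn :as edn])

;; Hypothetical contents of transformation.edn: field name -> (fn ...) form.
;; The EDN reader returns each (fn ...) form as a plain list, not a function.
(def transformation-edn
  "{:name  (fn [v] (clojure.string/upper-case v))
    :price (fn [v] (* 100 v))}")

;; clojure.string must be loaded so the eval'd form can resolve upper-case.
(require 'clojure.string)

(defn load-transformations
  "Reads the EDN text and evals each fn form into a real function."
  [edn-text]
  (into {} (for [[field form] (edn/read-string edn-text)]
             [field (eval form)])))

(defn transform-row
  "Applies the matching transformation to each field present in the row;
  fields without a transformation pass through unchanged."
  [transforms row]
  (reduce-kv (fn [acc field f]
               (if (contains? acc field)
                 (update acc field f)
                 acc))
             row
             transforms))

(def transforms (load-transformations transformation-edn))

(transform-row transforms {:name "widget" :price 3 :sku "A1"})
;; => {:name "WIDGET", :price 300, :sku "A1"}
```

Because `eval` will run whatever form the file contains, the config is effectively code, which is probably part of why it feels code-like.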

This works, but for some reason it just doesn’t feel right having functions defined and used like this. I’m looking for ideas or libraries that would make transformation.edn less code-like and more configuration-like, if that makes sense.

One alternative I thought about was to put the functions in a separate namespace and just reference them, but that approach seems less flexible.

Any ideas would be gratefully received.

Many thanks

Peter


#2

I’ve used multimethods dispatching on keywords for this kind of thing before. This way, the “function namespace” is decoupled from the “transformation name”, while namespacing is still available through namespaced keywords.

One example is a web app configuration that lets you define routes and middleware in data (I don’t know from your description whether this is even applicable, but it reminded me of the technique):

[{:id :web/index
  :methods #{:get}
  :path "/"}
 {:id :web/backoffice.index
  :methods #{:get}
  :path "/admin"
  :middleware [[:defaults/site]
               [:auth.role/one-of #{:admin :manager}]]}
 ;; ...
 ]
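As a rough sketch of how such middleware vectors could be interpreted: a multimethod dispatches on the keyword at the head of each vector, and the remaining elements are its parameters. This assumes a handler is simply a function from a request map to a response map (Ring-style, but with no Ring dependency); the `:user/role` request key, the header, and the `wrap-all` helper are hypothetical.

```clojure
;; Dispatch on the keyword at the head of a middleware vector,
;; e.g. [:auth.role/one-of #{:admin :manager}] -> :auth.role/one-of
(defmulti middleware (fn [[id & _args] _handler] id))

(defmethod middleware :defaults/site [_ handler]
  ;; Hypothetical default: add a header to every response.
  (fn [request]
    (assoc-in (handler request) [:headers "X-Frame-Options"] "DENY")))

(defmethod middleware :auth.role/one-of [[_ roles] handler]
  ;; Only call the handler if the (hypothetical) :user/role is in the set.
  (fn [request]
    (if (contains? roles (:user/role request))
      (handler request)
      {:status 403 :body "Forbidden"})))

(defn wrap-all
  "Wraps handler with each middleware spec; the first spec ends up outermost."
  [handler specs]
  (reduce (fn [h spec] (middleware spec h)) handler (reverse specs)))

(def admin-handler
  (wrap-all (fn [_request] {:status 200 :body "admin"})
            [[:defaults/site]
             [:auth.role/one-of #{:admin :manager}]]))

(admin-handler {:user/role :admin})
;; => {:status 200, :body "admin", :headers {"X-Frame-Options" "DENY"}}
(admin-handler {:user/role :guest})
;; => {:status 403, :body "Forbidden", :headers {"X-Frame-Options" "DENY"}}
```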

Another upside is that multimethods are easily extensible “downstream”. I have a core web library that defines multimethods for route ids (handlers) and middleware, and also ships a slew of default middleware. In consumer code I can easily add new ones or even override existing ones.
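A minimal sketch of that downstream extension, using hypothetical namespaces `web.core` (the library) and `my.app` (the consumer). In a real project they would live in separate files; they share one file here only for brevity. Redefining a method for an existing dispatch value replaces the library’s version.

```clojure
(ns web.core)

(defmulti handler
  "Returns a handler fn (request map -> response map) for a route."
  (fn [route] (:id route)))

(defmethod handler :web/index [_]
  (fn [_request] {:status 200 :body "index"}))

(defmethod handler :default [route]
  (fn [_request] {:status 404 :body (str "No handler for " (:id route))}))

(ns my.app)

;; Consumer code: add a handler the library knows nothing about...
(defmethod web.core/handler :web/backoffice.index [_]
  (fn [_request] {:status 200 :body "admin"}))

;; ...or override one the library already ships.
(defmethod web.core/handler :web/index [_]
  (fn [_request] {:status 200 :body "custom index"}))

((web.core/handler {:id :web/index}) {})
;; => {:status 200, :body "custom index"}
```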


#3

Thanks, that’s a good tip.

The downside of referencing named functions in the config, as opposed to defining them there, is that the approach becomes less flexible. The transformations could be anything, so I’d have to define a new function each time, either in the config or in a separate namespace. If I need to do that, it’s easier to modify an external .edn file than to recompile a .jar.

I suppose the referencing approach might work, however, if the set of transformations were relatively static.

Thanks for the ideas.


#4

Yes, that’s definitely a downside! Possible remedies:

  • you could take a hybrid approach to make it more flexible: allow “data”-based transformations via multimethods, as well as “inline functions” defined in the .edn file
  • you can achieve more flexibility by making the multimethod-based transformations very generic (where possible) and parameterizing them (like the :auth.role/one-of middleware in the example above)
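Applied to the transformation.edn case, both suggestions together might look something like this sketch: vector specs such as `[:number/scale 100]` are generic, parameterized multimethod transformations, while list specs are inline `(fn ...)` forms that get `eval`ed. The dispatch keywords, field names, and helper names (`data-transform`, `->transform-fn`, `transform-row`) are hypothetical.

```clojure
(require '[clojure.edn :as edn])

;; Data-based transformations: [:id & params] -> function of one value.
(defmulti data-transform (fn [[id & _params]] id))

(defmethod data-transform :string/truncate [[_ n]]
  (fn [s] (subs s 0 (min n (count s)))))

(defmethod data-transform :number/scale [[_ factor]]
  (fn [x] (* factor x)))

(defmethod data-transform :value/lookup [[_ table]]
  ;; Falls back to the original value when it is not in the table.
  (fn [v] (get table v v)))

(defn ->transform-fn
  "Vectors are data transformations; lists are inline (fn ...) forms to eval."
  [spec]
  (cond
    (vector? spec) (data-transform spec)
    (seq? spec)    (eval spec)
    :else (throw (ex-info "Unknown transformation spec" {:spec spec}))))

;; Hypothetical transformation.edn mixing both styles.
(def config
  (edn/read-string
   "{:name    [:string/truncate 5]
     :price   [:number/scale 100]
     :country [:value/lookup {\"GB\" \"United Kingdom\"}]
     :sku     (fn [v] (str \"SKU-\" v))}"))

(def transforms
  (into {} (for [[field spec] config] [field (->transform-fn spec)])))

(defn transform-row [row]
  (reduce-kv (fn [acc field f]
               (if (contains? acc field) (update acc field f) acc))
             row
             transforms))

(transform-row {:name "widgets" :price 3 :country "GB" :sku "A1"})
;; => {:name "widge", :price 300, :country "United Kingdom", :sku "SKU-A1"}
```

Most rows would then only need the data-style specs, with inline functions kept as an escape hatch for one-off cases.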