How to replace DI in Clojure?

Hello,
How do you deal with functions that depend on impure functions? A common example: functions that contain business logic but rely on data from a database. Our business logic can be pure, but database access cannot.
In OOP we’d use Dependency Injection to inject services into our model and mock data while testing. In FP the obvious solution to me seems to be higher-order functions: we pass the service as an argument so our business logic stays pure and we can mock data while testing. We can create three namespaces: store, core and public. Store contains database access, core our business logic, and public composes the two.
But sometimes we have to depend on more than one service. Then composing our functions becomes harder: we have to remember the order, we have to decide whether to pass only services or our impure public interface, and our code becomes harder to modify.
I know that in functional programming there are no patterns, just functions, but do you have any tips, articles or open-source projects where I can see how this is solved? Or maybe my approach is correct and I’m just nitpicking (I’m just looking for a good technique; I haven’t tried it yet)? Or maybe it’s even something I shouldn’t care about in functional programming, and I should just use with-redefs while testing?

Pseudo code from a social-media pseudo-app where we have 4 functionalities:

  • find user by id,
  • find friends of user,
  • find last post of user,
  • find a user by id, then all their friends and the last post of each.
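A minimal sketch of the three-namespace layout described above (all names and stubbed bodies are hypothetical):

```clojure
;; store: impure database access (stubbed here; real code would query the db)
(ns app.store)
(defn find-user      [db id] {:id id :name "Ann"})
(defn find-friends   [db id] [{:id 2 :name "Bo"}])
(defn find-last-post [db id] {:author id :text "hi"})

;; core: pure business logic, depends only on data
(ns app.core)
(defn with-friends-and-posts [user friends posts]
  (assoc user :friends friends :last-posts posts))

;; public: composes the impure fetches with the pure logic
(ns app.public
  (:require [app.store :as store]
            [app.core  :as core]))
(defn user-profile [db id]
  (let [user    (store/find-user db id)
        friends (store/find-friends db id)
        posts   (map #(store/find-last-post db (:id %)) friends)]
    (core/with-friends-and-posts user friends posts)))
```

In tests, only `app.core` needs direct coverage with plain data; `app.store` can be exercised against a test database.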

There is a pattern exemplified by Stuart Sierra’s “Component” - https://github.com/stuartsierra/component

Stuart talks about Component in https://vimeo.com/46163090 and https://www.infoq.com/presentations/Clojure-Large-scale-patterns-techniques/

We use Component at work and in some of our tests we swap in a mock version of a component to be able to stub out or simulate a subsystem’s APIs.

In general, though, I recommend avoiding functions that depend on other side-effecting functions. Instead, separate out the pure and impure functions, and have a layer of orchestration that calls the various side-effecting functions to get data, passes it through a pipeline of pure functions, and then uses the result of that pipeline to call other side-effecting functions to produce changes in the “world”. Where possible.
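That shape might be sketched like this (all names are hypothetical, and the impure edges are stubbed):

```clojure
;; Impure edges (stubbed; in real code these would hit the database/SMTP):
(defn get-user    [db id] {:id id :email "ann@example.com"})
(defn get-friends [db id] [{:name "Bo" :birthday-today? true}])
(defn send-email! [mailer msg] (println "sending" msg))

;; Pure core: a pipeline of plain data transformations, easy to test.
(defn decide-notifications [{:keys [user friends]}]
  (for [f friends
        :when (:birthday-today? f)]
    {:to (:email user) :about (:name f)}))

;; Orchestration: fetch -> pure pipeline -> effects.
(defn orchestrate! [db mailer user-id]
  (->> {:user    (get-user db user-id)
        :friends (get-friends db user-id)}
       decide-notifications
       (run! #(send-email! mailer %))))
```

Only `orchestrate!` and the edge functions touch the world; `decide-notifications` can be tested with plain maps.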

Pragmatically, you’ll find it hard to mock out an entire database, so handling that via regular test fixtures is another possibility: especially if you can use an in-memory database that you set up at the start of your run, let your system mess with as much as you want, and tear down at the end of your run. There’s an embedded version of PostgreSQL that you can do that with (if PG is your database-of-choice). There are other embedded DBs that you might be able to swap in, depending on how exotic your SQL is. Or just use a throwaway DB instance running in a local Docker container (which is what we do at work, since we’re on Percona).


How so?

Your business logic is pure you say, so “swapping” a service doesn’t require any change to your pure code. Neither does testing it.

If your pure code takes a User map which is supposed to have 5 keys, for example, then you can test it by just generating random valid User maps with those 5 keys and making sure it returns what you expect.

Now you have another namespace whose job is to get the data needed to build the 5-key User map that your pure code expects. Those functions will all be impure, since their whole job is to get data from other services and restructure it into the shape your pure code needs, i.e. a User map of 5 keys. Sometimes I recommend breaking this out further, so that the function that gets the data only gets data, and a separate pure function takes the data in the shape returned by the service and returns it restructured into the shape your pure code needs. That way you can test even this function in a pure way.
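For example, separating the fetch from the (pure) reshaping might look like this (the service response and all key names are hypothetical):

```clojure
;; Impure: only fetches raw data (stubbed here; real code calls the service).
(defn fetch-raw-user [client id]
  {:userId id :fullName "Ann" :emailAddress "ann@example.com"
   :address {:city "Oslo"} :tier "premium"})

;; Pure: reshapes the service response into the 5-key User map that the
;; business logic expects -- testable without any service at all.
(defn ->user [raw]
  {:id       (:userId raw)
   :name     (:fullName raw)
   :email    (:emailAddress raw)
   :city     (get-in raw [:address :city])
   :premium? (= "premium" (:tier raw))})
```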

At this point, the only functions that need a service client or database client are the functions that fetch data from those services, or the ones that send data to them. Generally there are so few of those that you can just inject the clients directly as arguments.

Now, I’m not sure what you mean by sometimes depending on more than one service. Can you provide a more concrete example scenario?


+1. Sorta like the Unix philosophy of decoupling different stages using pipelines. The first and last stages could read from and write to databases. All the intermediate stages could be made pure and thus much easier to write and test.

But in some real-world projects you sometimes really do have to hook in a real database; then you could use Component, or other similar libraries like https://github.com/weavejester/integrant or https://github.com/tolitius/mount


I’m going to lobby strongly against mount. It is easy (not “simple”) and it uses global state (in namespaces). I don’t much like Integrant either, but that’s more about the multimethods than anything else: at least it doesn’t use global state (beyond multimethods).


Ok, I think the problem was in my mindset. I tried to use dependency injection where it doesn’t fit well in functional programming, even though it is standard in OOP. Instead of basing my business-logic functions on data I wanted to base them on effects, and that caused all my problems.
I’ve noticed it’s a common pitfall for people who come to functional programming with an OOP background, and I have a long way to go to unlearn most of the “good practices” I know.
Thank you


The “Functional Core, Imperative Shell” presentation by Gary Bernhardt is very nice and right on the topic:
https://www.destroyallsoftware.com/screencasts/catalog/functional-core-imperative-shell

(related: https://www.infoq.com/news/2014/10/ddd-onion-architecture/)


@seancorfield, could you elaborate on why you don’t like Integrant, or why do you feel Component is better? (I haven’t used Component myself, so my only familiarity with it comes from seeing Stuart’s presentation and reading a bit about it)

So apart from the architectural story mentioned above, where one would often use DI in OOP or use e.g. Component, you can also apply this reasoning on a smaller scale, i.e. at the function level.

For instance, if I have an HTTP endpoint handler that gets stuff from databases and then writes a file somewhere, and it depends on multiple things, this handler could be implemented by passing in the side-effecting functions (that get data from databases, or write to files) as an argument map.

Then you can still have a pure business logic base. You lose some easy traceability just as with normal DI (as in, following code from your IDE doesn’t point you to the side-effecting function, since it’s now a local passed in argument), but you gain something in pureness / pushing effects to the outside.
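A sketch of that, with hypothetical names and the dependencies stubbed out as a test would:

```clojure
;; Pure business logic:
(defn summarize [orders]
  {:count (count orders) :total (reduce + (map :total orders))})

;; The handler depends only on the functions passed in as a map,
;; not on concrete db/file code:
(defn report-handler [{:keys [fetch-orders write-file!]} request]
  (let [orders (fetch-orders (:customer-id request))
        report (summarize orders)]
    (write-file! "report.edn" report)
    {:status 200 :body report}))

;; In tests the "dependencies" are just stub functions:
(report-handler {:fetch-orders (constantly [{:total 10} {:total 5}])
                 :write-file!  (fn [_path _data] nil)}
                {:customer-id 1})
;=> {:status 200, :body {:count 2, :total 15}}
```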

Also, recently I read this article, which is kinda interesting, about Railway Oriented Programming. This is a good way to avoid lots of large let-bindings that gather all the dependent data, etc., and it has a story for how to handle exceptions. (In a more elaborate version you could pass along a context map in the chain, storing partial results, etc.)
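A minimal hand-rolled railway chain (not a library; each step returns either an `:ok` or an `:error` map):

```clojure
;; Each step returns either {:ok value} or {:error reason}.
;; bind short-circuits: once a step fails, later steps are skipped.
(defn bind [result f]
  (if (contains? result :error) result (f (:ok result))))

(defn parse-id [s]
  (if (re-matches #"\d+" s)
    {:ok (Long/parseLong s)}
    {:error (str "not a number: " s)}))

(defn positive [n]
  (if (pos? n) {:ok n} {:error "must be positive"}))

(-> (parse-id "42")
    (bind positive))
;=> {:ok 42}

(-> (parse-id "oops")
    (bind positive))
;=> {:error "not a number: oops"}
```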

This might widen your perspective a little, I hope.


Also, recently I read this article, which is kinda interesting, about Railway Oriented Programming.

This looks like what failjure implements. I’ve begun using failjure in everything, and for me it’s much cleaner and easier to reason about than having try-catch blocks everywhere.
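For example (a sketch from memory of failjure’s API; `lookup-user` and the handler are hypothetical):

```clojure
(require '[failjure.core :as f])

(defn lookup-user [id]
  (if (= id 1)
    {:id 1 :name "Ann"}
    (f/fail "no such user: %s" id)))   ; returns a Failure, doesn't throw

;; attempt-all short-circuits on the first failed binding:
(defn handler [id]
  (f/attempt-all [user (lookup-user id)
                  name (:name user)]
    {:status 200 :body name}
    (f/when-failed [e]
      {:status 404 :body (f/message e)})))

(handler 1)   ;=> {:status 200, :body "Ann"}
(handler 2)   ;=> {:status 404, :body "no such user: 2"}
```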


From the article

The problem is in using catch a lot: it is expensive, so if the code is throwing a lot of exceptions and using the try-catch for flow control, then performance is an issue.

I’ve never seen this argument against try/catch before. I don’t want to dismiss it altogether, but I’m very skeptical of this claim. I’d be surprised if you could measure anything above 1ms of additional slowdown. I suspect it’s mostly nanoseconds.

Exception handling has a long history across multiple languages and the emphasis has generally been: you shouldn’t pay for what you do not use. As a result, try/catch generally has near-zero performance cost in the “happy case”, i.e., when no exception is thrown.

However, creating an exception is fairly expensive because the stacktrace has to be collected. If you run timings on just creating a simple object, such as an Integer or String, you can construct 10,000 of those in under a millisecond. If you run timings on creating a basic Exception with a small string (message) and no cause, it takes 30-60ms to create 10,000 of those.

If you wrap that object creation in try, you won’t see much difference. If you also add (catch Throwable _) to that try you still won’t see much difference. Even if you throw the newly-constructed exception, you won’t see much difference (compared to just constructing the exception, without throwing it). So a local catch won’t impact you much: still the most expensive thing is creating the exception itself.
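You can get a rough feel for this at the REPL (unscientific timings; absolute numbers vary by JVM and hardware):

```clojure
;; Plain object creation: typically well under a millisecond.
(time (dotimes [_ 10000] (Integer/valueOf 42)))

;; Exception creation: typically tens of milliseconds --
;; collecting the stack trace dominates the cost.
(time (dotimes [_ 10000] (Exception. "boom")))

;; Throwing and locally catching adds little on top of construction:
(time (dotimes [_ 10000]
        (try (throw (Exception. "boom"))
             (catch Exception _ nil))))
```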

If you add a call chain into the equation and throw from a nested call and catch somewhere up the chain, you’ll see another performance hit, as the code has to unwind the stack, checking for exception handlers that match. Both the stack unwinding and the type checking on handlers add time to execution, which increases with the “depth” (between the throw and an appropriate catch) as well as with the number of catch clauses to check (since they are checked in linear order to find one that satisfies instance?, in Clojure terms).

This is why there are so many recommendations – across multiple languages – to avoid using exceptions for (expected) control flow (and, instead, only use them to handle unexpected, i.e., “exceptional”, situations).

Now, all that said, there are definitely places where Java library functions throw exceptions for conditions that we absolutely expect to happen (just look at all the known direct subclasses of https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/io/IOException.html for examples of IO conditions we expect to occur in normal operation).


“better” is fairly subjective. Component is simpler. It has a simple lifecycle – just start and stop – and some dependency graph logic to figure out the order to start/stop the components and what needs to be added in to each component along the way. Since the 0.4.0 release, you can extend that lifecycle protocol via metadata, which means you don’t even need to create records – see how next.jdbc implements the Component lifecycle on a plain hash map and a function to provide a simple connection pool for use with Component (next.jdbc doesn’t even need to depend on Component to implement this!).
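For reference, the metadata-based lifecycle looks roughly like this (the pool functions are hypothetical stand-ins):

```clojure
;; Hypothetical stand-ins for a real connection pool:
(defn make-pool  [config] (atom config))
(defn close-pool [pool]   nil)

;; A "component" that is just a hash map, with the lifecycle attached
;; as metadata -- no record needed (Component 0.4.0+):
(def pool-component
  (with-meta {:config {:pool-size 10}}
    {'com.stuartsierra.component/start
     (fn [this] (assoc this :pool (make-pool (:config this))))
     'com.stuartsierra.component/stop
     (fn [this] (close-pool (:pool this)) (dissoc this :pool))}))
```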

Integrant overlays a data DSL and extra lifecycle hooks on those concepts. It has seven multimethods, representing the various lifecycle points (compared to just two in Component) and it’s about twice as much code as Component.

Where Component almost necessitates keeping the start and stop implementations close together, Integrant lets you spread the lifecycle across up to seven defmethods that can be “anywhere”. This can make it much harder to piece together what the complete lifecycle is for any given “component”.

I can appreciate Integrant’s flexibility, but I just don’t find I need that level of complexity.

Finally, most of Integrant’s rationale points apply only to the original version of Component: now that the lifecycle can be extended via metadata, you no longer need records – the lifecycle can be attached via metadata to any object that supports metadata. That was always true for Component’s dependency annotations (and you were always able to use plain hash maps, instead of records, for components that had no lifecycle functions):

  • Component supports dependencies for anything that is associative (this is still more restrictive than Integrant – but it is more than “just records”),
  • Component supports the start/stop lifecycle on anything that can carry metadata (which includes functions, although those can’t have dependencies).

Thanks for the thorough reply, Sean!
I’m currently prototyping some stuff at work and using Integrant for it, I will have a go at Component, and see how it compares for my needs.

This doesn’t really have anything to do with OOP vs FP. Dependency injection works equally well with both.

Yep, that’s basically how you do it. With dynamic languages like Clojure, JS, Python, etc., I use a pattern similar to what the Component library does. Create a map with an entry for each dependency and pass that around. The code higher up the call chain gets the dependencies out of the map. Code further down that only needs a single dependency can just take that dependency instead of messing with the map; the caller pulls it out and passes it in.
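A sketch of that pattern (all names hypothetical; the dependencies are stubbed):

```clojure
;; A map of dependencies, built once at startup; real code would close
;; over a db connection and an SMTP client here:
(defn make-deps []
  {:get-user  (fn [id] {:id id :email "ann@example.com"})
   :send-mail (fn [to msg] (println "mail to" to))})

;; Higher up the call chain, pull what you need out of the map:
(defn welcome! [{:keys [get-user send-mail]} id]
  (let [user (get-user id)]
    (send-mail (:email user) "Welcome!")))

;; In tests, duck typing means stubs are just functions:
(welcome! {:get-user  (constantly {:email "test@example.com"})
           :send-mail (fn [to msg] [to msg])}
          1)
;=> ["test@example.com" "Welcome!"]
```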

Testing is much easier than with strict languages like Java because things are duck-typed. Your tests don’t need to pass in an entire database mock; they only need to provide the functions that the tested code uses. As long as your interfaces are well defined, you can test all your code in isolation. You only need to stand up a temporary database when testing the db-specific code. Your business logic tests don’t need it.


I recently did some research about the performance impact of exceptions in Java/Clojure.

It is worth noting that the biggest performance impact of exceptions is when you create an exception, and that is because of the cost of creating the stack trace.

However, you can create an exception without collecting the stack trace, by using a constructor overload of the Throwable class.

See here for a good explanation and benchmarks: https://www.baeldung.com/java-exceptions-performance
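In Clojure you can reach that constructor via proxy, since the generated proxy class is a subclass of RuntimeException (a sketch):

```clojure
;; RuntimeException's protected 4-arg constructor (Java 7+) takes
;; writableStackTrace as its last argument; passing false means no
;; stack walk happens when the exception is created:
(defn fast-exception [msg]
  (proxy [RuntimeException] [msg nil false false]))

(count (.getStackTrace (fast-exception "expected failure")))
;=> 0
```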

I’m not sure if this adds anything to the discussion, but it is something that I found interesting about the cost of exceptions.

Cool. That confirms my quick benchmarks too. I’m a bit puzzled by your comment about “using a constructor overload for the Throwable class” since the article/benchmarks seem to rely instead on a JVM option to disable adding stacktraces to all exceptions?

I see the overload of Throwable in the docs, but it seems like you need to construct a whole hierarchy of exceptions outside of the normal tree of them?

Yes, it is just for your own implementations of Throwable. It could be useful if you wanted to implement your own flow control system using try-catch.

I was mainly interested in the performance cost of stack traces because I went the other way, adding stack traces to all log statements.

Try https://github.com/redstarssystems/context

  1. a pure Clojure atom, no framework-style constraints
  2. no global state tied to a particular ns
  3. dependency management
  4. async components support
  5. multi-tenant support
  6. declarative system description
  7. minimalistic

When I use the Context I write code like this:

(defn get-from-db [ctx request] ...)

ctx contains all the necessary components (like the db) that I need.