Architecture of big applications

pfernandez · December 12, 2022, 11:03pm

I’m in the process of refactoring a microservice that’s starting to become not-so-micro with 35 endpoints built by several developers at different times. At that size, it’s already starting to feel like a mess and it’s time to clean it up.

Below are the basic techniques I plan to use that I believe will allow our code to grow indefinitely. I realize this isn’t an example I can point you to, but in reality large codebases can’t really follow templates anyway. Templates are just a starting point. I developed a lot of these ideas while wrestling with 18-year legacy code at Tumblr, eventually realized that they’re just basic functional programming techniques, then quit to become a Clojure developer.

Lift side effects like API calls to the entrypoints of the app. Remove all possible logic from this section of code so that writing tests around it won’t be necessary.
Gather the data you need into a map (often called a “context” map) that can be passed down through pure functions.
Store example data in EDN files, and write your tests around these files. After a while you’ll find that instead of adding tests, you’ll often simply be adding more test data.
Write tests to cover the behaviors of only the top-level primary (pure) functions.
Use pure functions for everything except required side effects.
Always be refactoring into a tree structure that mirrors the natural shape of nested functions.
Break up logic into services, directories, and functions that arise naturally from your flow of logic, data, and use cases.
Wait to destructure data until it’s really necessary, near the leaves of your code tree.
Move shared code into utility files/directories as needed. Bubble these up the tree as needed throughout more of the codebase.
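The first few points above can be sketched as follows. This is a minimal illustration, not code from the service being discussed: `greeting`, `handler`, `fetch-user` and the shape of the context map are all hypothetical, and the side effect is passed in as a function so the example runs without a network.

```clojure
(require '[clojure.edn :as edn])

;; Pure: builds the response from the context map only.
(defn greeting [{:keys [user request]}]
  {:status 200
   :body   (str "Hello, " (:name user) " (" (:locale request) ")")})

;; Impure shell: the only place a side effect happens. It gathers data
;; into the context map and hands it to the pure core, nothing more.
;; `fetch-user` stands in for an API call (assumption).
(defn handler [fetch-user request]
  (let [context {:request request
                 :user    (fetch-user (:user-id request))}]
    (greeting context)))

;; Example data as EDN. In a real project this would live in a file
;; and be read with slurp; it is inlined here to stay self-contained.
(def example-context
  (edn/read-string
   "{:request {:user-id 1 :locale \"en\"}
     :user    {:name \"Ada\"}}"))

(greeting example-context)
;; => {:status 200, :body "Hello, Ada (en)"}
```

Covering a new case then means adding another EDN map rather than another test function, and `handler` stays thin enough that it arguably needs no unit test of its own.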

Things to avoid

Shared state. Passing data through functions instead preserves purity, making reasoning and testing much easier. If your code is becoming too deep, think about how you can flatten it naturally with pure functions.
Abstractions meant to reduce the number of lines of code at the expense of increased cognitive overhead. Think about the poor soul who’ll come in two years later and just needs to fix a bug. They should be able to drop into any function at any point in the app and understand everything they need to based solely on the function’s inputs and output.
defprotocol. Programmers trained in object-oriented design tend to go for this, resulting in a lot of needless abstraction that breaks the natural flow of data through an app, making it hard to trace and test.
Fancy techniques like currying, chaining, partial application, and the like, which have their places, but tend to make code confusing for developers new to the application.

I hope the pattern here is clear. Pass data through a tree of pure functions, refactor as it grows, and push all side effects to the root and leaves. This is basically what I consider to be true functional programming. And even though it may not be possible to refactor a huge app into a functional structure all at once, having the overall vision in mind can give you something to work toward.

Harleqin · December 12, 2022, 11:42pm

I think it’s good to lay out such goals, but be aware that this can only be a guideline, not commandments.

One point in particular stands out to me: context maps. In my view/experience, these tend to become giant wool balls, coupling everything to everything through the used keys. They become especially cumbersome when you try to shoehorn the flow of the program through a linear threading macro. Program structure is not linear in general.

Instead, I believe it is more useful to remember the top-down-bottom-up dance: identify top-down what you need, then build the language you need bottom-up, then use it in the upper part. Make sure that one function only talks at one abstraction level. Most function composition should just be function application in function bodies.

pfernandez · December 13, 2022, 12:55am

it’s good to lay out such goals, but be aware that this can only be a guideline, not commandments.

Amen. Remember though that lambda calculus has shown that all logic can (in theory) be written purely in the form a(b(c(x))). Everything else is basically a shortcut. My argument is that leaving the world of pure function composition is what makes code hard to test and understand from a “local” perspective, i.e. for the person debugging the code months and years later.

context maps… tend to become giant wool balls… Program structure is not linear in general.

They do, you’re right. But I’ve never seen a way to pass shared data around that doesn’t involve either a context map passed as an argument, or else something more opaque and/or side-effecty like a global state object.

I did mention one way to mitigate the “wool ball” problem: Think about how you can flatten your code by refactoring using only pure functions. You can take advantage of that nonlinear nature of code, and use multiple context maps depending on the entrypoint. An API, for example, isn’t really a tree but multiple interwoven trees. So strive to make each tree as shallow as possible and pass each only the context it needs.

didibus · December 13, 2022, 7:57pm

I support @Harleqin’s statement. While the shared immutable context map pattern is better than the shared mutable one, I believe it still creates data coupling, and it also fails to make your smaller units modular because they depend on the top-level structure.

What I recommend doing instead is to have the modules design an input structure that they want, with only the data that they need, and have that be injected into them on every call.

That means it’s up to the top layer to fetch the data the module wants, and to transform the data in the representation the module wants, prior to calling the module.

I say “module” here, which is ambiguous, but this is because there is a continuum that exists here. At the smallest, you would do what I’m saying for every single function. But sometimes there are a set of functions that all work together tightly to deliver some more application relevant chunk of behavior. I call these a module. How coarse or granular your modules are is up for you to decide what’s best in your case.

What you can do is use the “context map” pattern inside those modules, but it isn’t a global shared context map anymore, shared throughout every function of your application. Instead it is shared only within one module, meaning only across a limited set of functions that logically form a module.

I would still recommend leaning toward keeping modules small, and when you start, I’d even suggest you apply this to every function, because it’s easier to refactor a set of functions into a module with shared immutable input than the other way around.

The most important trick here is to keep the call stack shallow.

Most people will be tempted to do something like:

A -> B -> C -> D

Now if they need something in D, they pass a context map to A which is passed down all the way to D, and they just keep adding to that map as D needs more data.

Instead you can do:

A -> B
A -> C
A -> D

If D needs output from B, it is A which will take it and give it to D. B, C and D can therefore each have their own input/output, with only the data they need as input and only what they produce as output. They no longer worry about what comes before or after, or where to get the data from.

A will have to find the data that D needs, for example by calling B to get it. It will then need to transform the return value of B into the input of D. D might need data from C as well, and some other data from A’s own input; A can combine all of that into the input structure of D.

Now if you realize that B, C and D are never used anywhere else but inside A, you’ve identified a logical “module” of your application. It means that your application benefits from the more coarse behavior of A, and doesn’t need to use the more granular behavior of B, C and D.

Once you’ve identified that, you can refactor A, B, C and D to all use the same input structure, and have them return their output as additional data on their input.

It still is going to be:

A -> B
A -> C
A -> D

But if you look at the input/output of B it would have changed from:

(defn B [arg1 arg2]
  b-output-value)

To:

(defn B [{:keys [arg1 arg2] :as a-map}]
  (assoc a-map :b-output-value b-output-value))

As you see, when we do this, we’ve now coupled A, B, C and D in favor of convenience, because A can now just thread through B, C and D and doesn’t need to transform the data in/out.

But if you look at D, it’s now coupled to B, because if B changes the name of the key, the shape of the value, or where on the map it puts its output, D is broken.

Whereas before, only A would be broken, the breakage wouldn’t cascade, any change to B, C or D only would require fixing the direct caller A.

This is the problem you have with this pattern, so imagine using it across your entire app with just one giant global shared map.

Here we’re limiting it to modules we have identified: smaller independent sections of code where you are okay trading coupling for convenience, because the functions are all inherently meant to work together very tightly anyway, with knowledge of each other. In that case it’s okay to do this.

pfernandez · December 13, 2022, 10:45pm

@didibus Great reply, thanks! I was just sharing this thread in a meeting with the team when your post appeared.

If I understand you, I think the solution to the “same input structure” problem is simply to use the shallow call stack:

A -> B
A -> C
...

but have the second-tier functions require only a subset of the context. They would be coupled, but only in the sense that the same data must have the same shape. In our case A would be an endpoint handler whose job is to act as a kind of “data bus”:

(defn A [{:keys [param] :as request}]
  (let [context        {:request request}
        external-data  @(post "data.com" {:param param})
        b-input-value  (assoc context :data external-data)
        b-output-value (B b-input-value)
        c-input-value  (assoc context :b-data b-output-value)]
    (C c-input-value)))

B and C contain all the business logic and get unit tested, while A does not. You could pass B and C the full context if it’s convenient (it’s just a reference) and you can even spec the context with :opt-un to help ensure that a consistent pattern is followed throughout the app. Most of your unit tests can leverage a single context.edn file, which really just follows the shape of your request, responses from other services, config, and commonly used internal data.
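A minimal sketch of what such a context spec could look like with `clojure.spec.alpha`. The key names mirror the handler above, but the predicates chosen for each key are assumptions made for illustration.

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical per-key specs; a real app would be more specific.
(s/def ::request map?)
(s/def ::data map?)
(s/def ::b-data some?)

;; :request is always present; the other keys are filled in as the
;; handler gathers more data, so they are optional.
(s/def ::context
  (s/keys :req-un [::request]
          :opt-un [::data ::b-data]))

(s/valid? ::context {:request {:param 1}})                 ; => true
(s/valid? ::context {:request {:param 1} :data {:x 1}})    ; => true
(s/valid? ::context {:data {:x 1}})                        ; => false, :request is missing
(s/valid? ::context {:request {:param 1} :data "oops"})    ; => false, optional keys are still checked when present
```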

There are a lot more app-specific details to unpack of course, like parallelization, deciding what common API call sequences can be moved into helpers, what should be moved into middleware, etc., but the general idea is still a shallow function tree with side effects at the root.

didibus · December 13, 2022, 11:20pm

Ya, exactly. Lots of benefits derive from the shallow stack.

And you can start to grow these reusable “workflows” as well where say one shallow orchestrating workflow method uses another when a big chunk of it can be used in multiple routes.

This grows the call stack a bit, but benefits reuse, and it still keeps it shallower:

A -> B
A -> C  -> D
        C  -> E
        C  -> F
A -> G

K -> L
K -> C ;; C is reused here, like a child workflow 
K -> M

But within each of these “workflow”, you design it as if you had no knowledge of anything outside of them. This includes even the input/output.

So “C” isn’t passed the same input A is using. If it’s all maps, you can merge them obviously and ignore things, but I actually prefer to not have more than necessary, because you can become easily lazy and just start using stuff that weren’t explicitly passed to C inside C just because they’re there. select-keys is pretty good for that.
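A small sketch of the `select-keys` idea, with hypothetical key names. The caller narrows its own data down to exactly what the child workflow declares, so nothing else is available to reach for inside C.

```clojure
;; C is a hypothetical child workflow that declares exactly what it needs.
(defn C [{:keys [order-id amount]}]
  {:receipt (str "order " order-id ": " amount)})

(defn A [{:keys [request]}]
  ;; Pass C only the keys it asked for, not A's whole input.
  (let [c-input (select-keys request [:order-id :amount])]
    (C c-input)))

(A {:request {:order-id 7 :amount 30 :session "secret"}})
;; => {:receipt "order 7: 30"}
```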

Another benefit is it’s trivial now to create X:

X -> B
X -> E
X -> M
X -> D
X -> L

You get so much more reuse here than if you’d had:

A -> B -> C -> D  -> E  -> F -> G

How do you create K or X out of this?

By extracting the control and data flow to a parent supervisor (or whatever you want to call it, parent workflow, parent orchestrator, wtv), you’ve gained a lot of reuse and limited the breakage at a distance.

That supervisor can also query/extract/transform and apply side effect, in-between each steps. It can act as an adapter between every step, so if one step changes what it returns, the supervisor can just adapt it back to what the next step expects. Or if a later step now needs one more piece of input, the supervisor can just go get it before calling the step, the other steps don’t care.

Edit:

And on the shape of data. What I do normally is I have a strong domain model, basically all entities that make sense in my app, that I operate on, things like User, Player, Car, Balance, Contact, Transaction, Damage, thing that tend to have meaning even to stakeholders and users of the app.

These shapes are well specced. You could even use a Record if you wanted, or have a constructor function that returns a map and does spec validation.
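A minimal sketch of the constructor-function variant, assuming a hypothetical `User` entity with three fields. The specs and the choice to throw on invalid data are illustrative, not a prescribed approach.

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical shape of the User entity.
(s/def ::id pos-int?)
(s/def ::name string?)
(s/def ::balance (s/and int? #(>= % 0)))
(s/def ::user (s/keys :req-un [::id ::name ::balance]))

(defn make-user
  "Returns a user map, or throws ex-info carrying the spec explanation."
  [id name balance]
  (let [user {:id id :name name :balance balance}]
    (if (s/valid? ::user user)
      user
      (throw (ex-info "Invalid user" (s/explain-data ::user user))))))

(make-user 1 "Ada" 10)
;; => {:id 1, :name "Ada", :balance 10}
```

Because every User goes through `make-user`, a change to the shape shows up in one place first.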

I think more about those, and I commit to their shape, so if I break the shape I’m willing to refactor all functions that operated on them. Because of that, I also try to limit how often I’d make breaking changes to them, and I spend a bit more time upfront thinking about what shape they should have.

I also then consistently use those names everywhere to refer exclusively to these.

And it helps to club all functions operating on those shapes together in the same namespace, so if you change the shape everything you need to refactor is in the same namespace.

But for every other kind of input/output, I limit the shape to just the function and its direct callers. So I’d expect the caller to transform the shape it has into whatever the function’s input is, and the output back into whatever the next step wants.

didibus · December 14, 2022, 12:16am

So maybe with all that said I would apply the following refactor:

(defn A [{:keys [request], {:keys [param]} :param, :as input}]
  (log input) ;; this is the only reason I have `:as input`, because otherwise I explicitly state what keys from input A actually uses
  (let [external-data @(post "data.com" param)
        b-output-value (B {:data external-data
                           :some-value (:some-value request)})
        c-output-value (C {:data b-output-value,
                           :some-other-value (:some-other-value request)})]
    {:a-output c-output-value}))

The difference here is I’ve also decoupled the shape between all steps. The name of the keys for each step’s input is managed by A directly, and not implicitly sharing the name that A’s input was using. Also in theory the shape of the input is also managed by A, so maybe C doesn’t take a map, no problem:

(defn A [{:keys [request], {:keys [param]} :param, :as input}]
  (let [external-data @(post "data.com" param)
        b-output-value (B {:data external-data
                           :some-value (:some-value request)})
        c-output-value (C b-output-value (:some-other-value request))]
    {:a-output c-output-value}))

The benefit is that you can implement C independent of A or B, and then A can still use C and give C the data it wants in the shape it wants it.

If you reuse C in other places as well, they might not all have the same exact shape of map that C would have magically also supported, so this adapts C easily to all those places.

And the best benefit in my opinion is if B returns a map, C doesn’t need to care:

(defn B [{:keys [b]}]
  {:b (inc b)})

(defn A [{:keys [request], {:keys [param]} :param, :as input}]
  (let [external-data @(post "data.com" param)
        b-output-value (B {:data external-data
                           :some-value (:some-value request)})
        c-output-value (C {:data (:b b-output-value)
                           :some-other-value (:some-other-value request)})]
    {:a-output c-output-value}))

In my opinion, it’s very little extra effort, but helps reduce breakage at a distance.

Now what I was saying was, if A is a very useful piece of functionality that is also all tightly coupled conceptually, you could instead choose to do:

(defn A [{:keys [request], {:keys [param]} :param, :as input}]
  ;; This is our context map with the initial data all steps will need from A's input
  (-> {:external-data @(post "data.com" param)
       :some-value (:some-value request)
       :some-other-value (:some-other-value request)}
      (B) ; Will grab what it needs from the context, and assoc its result to it
      (C) ; Will grab what it needs from the context, including what it needed from B and assoc its result to it
       ;; Finally we return C's result in this case, or whatever else we'd want
      (:c-result)))

This makes B and C harder to reuse, and C depends on B assoc’ing the right key in the right shape. It’s easy on the eyes, and you can go faster implementing A this way, but you have to keep all the steps and what they do together in your head to be sure the previous steps put the right keys/values in place for the following steps. When you know you won’t need to reuse C or B, and the semantic logic itself is pretty tight between all these, I think it’s ok, but doing the whole app this way is too much.

Harleqin · December 14, 2022, 10:35am

I don’t think that throwing things together just because they appear together is useful. I would say that the single responsibility principle should also be held up for data structures.

If you have to do multi-level destructuring, or if you can’t find a better name for your new thing than »data« or »context«, or if you can never use trace for debugging because its output is clogged by »context«, then this should maybe be a hint that you’re going astray somewhere.

The abstract call pattern that you write as:

A -> B
A -> C
A -> D

should in most cases translate to something like

(defn A [foo bar baz]
  (let [b (B foo bar)
        c (C b baz)]
    (D foo b c)))

This way, only actual dependencies create coupling.

I would even go so far to say that creating an aggregate data structure just for the purpose of being able to use a threading arrow (or other point-free-style composition) is an anti-pattern.

Anthony_Leonard · December 28, 2022, 1:37am

This very recent post - Structuring Clojure Applications - describes (I think) a great way of approaching complex apps:

it clearly separates side effects from pure functions using clean architecture ideas
it allows functionality to be clearly extended, new actions need only define new multimethods that do not affect existing code

The OP asked for rich codebase examples using “best practices” and I don’t know of any using the above - but perhaps the author @Yogthos could point to any available?

I have been looking for a mini-framework such as the above for some time (see below *). It is a framework (the extension points are multimethods and protocols), and frameworks are not fashionable, I think, in Clojure circles, which prefer libraries for good reasons we all know. Web development in other languages has always been dominated by its prescriptive frameworks, which eventually bloat and frustrate developers attempting even the simplest things, particularly those wizened, experienced devs in small code shops that get drawn to Clojure in the first place for the freedom it offers. Alternatively your world may be like mine, in that those around you are in large, high-churn, enterprise-y teams of devs who are not Clojure enthusiasts, or not even particularly experienced, and who just want to be able to maintain and extend an already huge app without understanding the rest, which is how our business thinks too. In that case I think a mini framework limiting options and guiding naming conventions and code structure really would make us more productive. I can’t be sure; it’s just that I think I see the converse every day, where the lack of a consistent code structure or framework in our Clojure services means devs have to know everything at once, which hurts productivity and makes persuading others about the wonders of Clojure harder, not easier.

I also wonder, if Clojure is to spread to more “boring” large workplaces with less skilled teams like this, perhaps it needs more of these prescriptive mini frameworks to emerge, to give a helping hand to those curious about Clojure but ultimately give a better deal to their own paymasters. That folks are still posting novel new ways of approaching what are common problems for all of us surely shows that the “best practices” are far from being fully established.

*FWIW the version in my head centred around a recursive central “runner” loop, where a pure function would take a command and any gathered information (including existing events/state) and return a new event or “missing information”. The loop would run multimethods to try and resolve the missing information if any, and if found rerun the whole (pure) business function now using the extra added information returning more events etc that the runner would know how to store and publish etc.
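A rough sketch of how such a runner loop might look. Everything here is hypothetical (the `withdraw` command, the `:missing` convention, the stubbed balance lookup); it only illustrates the shape of the idea, with a pure business function rerun until no information is missing.

```clojure
;; Resolvers are the impure extension points, one per kind of information.
(defmulti resolve-info (fn [need _command] need))

;; Stubbed lookup; a real resolver would query a store or service.
(defmethod resolve-info :balance [_ {:keys [account]}]
  (get {"acc-1" 100} account 0))

;; Pure business function: returns an event, or says what it is missing.
(defn withdraw [{:keys [amount]} info]
  (cond
    (not (contains? info :balance)) {:missing :balance}
    (<= amount (:balance info))     {:event :withdrawn :amount amount}
    :else                           {:event :rejected :reason :insufficient-funds}))

;; Runner: resolve whatever is missing, then rerun the pure function.
(defn run [business-fn command]
  (loop [info {}]
    (let [result (business-fn command info)]
      (if-let [need (:missing result)]
        (recur (assoc info need (resolve-info need command)))
        result))))

(run withdraw {:account "acc-1" :amount 30})
;; => {:event :withdrawn, :amount 30}
```

Storing and publishing the returned events would be the runner’s job too; that part is left out here.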

Yogthos · December 28, 2022, 3:04am

I don’t have any public projects to point to unfortunately. I completely agree that having some standard practices for doing things encapsulated using a micro framework is extremely valuable for beginners. Currently, the bar for using Clojure effectively is a lot higher than it needs to be. Unless you know somebody who’s already an experienced Clojure dev then it can be pretty tricky to figure out how to structure your app effectively.

geokon-gh · December 30, 2022, 7:21am

Thanks for writing the article @Yogthos ! It’s well written and deserves a few more read throughs from me

I had to work on a rather icky, complicated GUI a couple of years ago (with lots and lots of statefulness), and I found one thing that was a life saver was memoization (and avoiding “map management”). It strangely seems to never come up in these conversations, so I just wanted to know your thoughts on it.

My own thoughts on this are a bit half baked, but just working off your example I’ll try to illustrate what I mean.

So in your example, instead of having users, funds, emails, etc., your state would remain simply the list of transactions/events. These are much harder to mess up. You then create memoized functions/interfaces that compute things like the list of users, the amount of funds a user has, the email of a user, etc.

All state-change then boils down to appending a transaction/event - and that’s it! As long as you don’t push a broken transaction then you should be fine. (and if you do, it’s easy to catch)

This has two primary benefits. First, your code becomes even more decoupled and stateless, and it’s harder to end up with a broken state. Second, it becomes significantly easier to refactor, as things get centralized in the memoized state-getter functions. So the system ends up scaling much better.

So for instance, say you have a new requirement where you want to track how many transactions each user has done. If you have a managed state, then you need to find each location where you’ve done a transaction and add something to increment some counter (and introduce a new thing in your map that needs to be managed). With a memoized derived state, you’d just introduce a new memoized function that computes that for you directly from the transaction record.
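A minimal sketch of that idea using plain `clojure.core/memoize`. The transaction shape and function names are hypothetical, and this is not how CLJFX does it.

```clojure
;; The only state: an append-only vector of transactions.
(def transactions
  (atom [{:user "ann" :amount 50}
         {:user "bob" :amount 20}
         {:user "ann" :amount -15}]))

;; Derived views are memoized pure functions of the transaction vector,
;; so the cache key is the immutable vector itself.
(def balances
  (memoize
   (fn [txs]
     (reduce (fn [acc {:keys [user amount]}]
               (update acc user (fnil + 0) amount))
             {}
             txs))))

;; The new requirement: transactions per user. No existing code
;; changes; it is just one more derived function.
(def transaction-counts
  (memoize (fn [txs] (frequencies (map :user txs)))))

(balances @transactions)            ; => {"ann" 35, "bob" 20}
(transaction-counts @transactions)  ; => {"ann" 2, "bob" 1}

;; A state change is only ever an append.
(swap! transactions conj {:user "bob" :amount 5})
(balances @transactions)            ; => {"ann" 35, "bob" 25}
```

One caveat with this naive version: `memoize` never evicts, so every past version of the vector stays in the cache. That is the part where you need to be careful.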

You need to be a bit careful with the memoization cache - but I don’t think this is an intractable problem. @vlaaad has a nice state management system in CLJFX where these memoized functions can call other memoized functions and cleverly reuse results without recomputing things.

Anyways, I was wondering if you’ve looked into this side of things. I think for large applications where there is tons of state to maintain, this approach reduces the surface area of potential issues. It’s not very black/white and you do need to settle on what exactly the “state” is that you’re tracking. In a GUI, for example, you might not want to go crazy and keep a list of all the UI interactions that are then virtually replayed to generate the current state. But you try to boil the state down to just what’s necessary, preferably in a way where you can’t make it invalid, and then have the state be accessed through what is essentially a memoized interface, decoupling the internal representation from the state interface.

Yogthos · December 30, 2022, 2:36pm

I find that for large GUI apps my strategy has been to try and minimize global context, and treat each page as an independent app. This might not always work depending on the nature of the app, but in general I’ve found that you just need a bit of context such as user details. I tend to use re-frame on the frontend, and then namespace the subscriptions and dispatches to the particular page they’re associated with.

didibus · December 30, 2022, 9:18pm

Personally I don’t know if all that genericity is necessary, the use of protocols, multimethods, a map of various states, all seems a bit overly complex.

If you look at didibus/clj-ddd-example on GitHub (an example implementation of Domain Driven Design in Clojure), you’ve got the same example of an application transferring money between two users, but it only uses normal functions and maps. It similarly follows a clean architecture, and in addition domain driven design, and it has the same properties of being easy to test and having good modularity.

I would have to try the alternative to know for sure, but my first impression is that it seems to bring a kind of more generic state machine framework that I’m not sure gives you much?

NoahTheDuke · January 4, 2023, 4:07pm

All of these replies are great. Thanks to everyone who’s participated so far.

One thing that’s been on my mind as I’ve read your thoughts is that different applications require fundamentally different approaches to this problem. If I’m writing a video game, I need a game loop of some kind and that informs/enforces certain decisions, whereas a web app with “normal” REST endpoints informs/enforces other decisions. I wasn’t explicit in my original post because I hadn’t thought it through enough, which I’m seeing now from the variety of approaches to this problem.

At my current workplace and codebase, it’s a traditional web app, so except for our long-running aggregation and processing tasks we run on set intervals, the majority of the code is geared towards “receive an input from a web request, process input data, convert input data into database queries, generate output data, and return some result to the requester.”

This means that a complex state machine (like the one described in @Yogthos’s blog) isn’t necessarily the best move for us. However, the hodgepodge we have now is also not working, as seen by my posts here.

Maybe a rough example would be helpful. In our app, we can generate reports. When a report is edited, we regenerate the data. This happens in the call stack starting in the (PATCH "/reports/:id" ...) compojure route. The call graph looks something like this:

Validation
Format input data
Upsert data into relevant database tables
Track actions in audit log
Generate output data
Render and return output data

On its face, this is a pretty simple workflow and each part is neatly wrapped in a top-level function. However, each one hides a lot of side-effects: there are 25 steps of validation, with ~5 of them peppered throughout making db queries; the patching/upserting itself makes queries; the audit log makes queries; and gathering the output makes queries.

Additionally, the total used code is split among 50+ different functions (in the full call graph) across 15+ namespaces. Some functions are reused, some are single-use. Tracing it all requires closely paying attention, knowing or intuiting the shape of the inputs and outputs of each function (as we don’t have any internal schema definitions).

I realize this is veering into “how to refactor 101” territory, so my apologies. But it’s a lot of code and most of it is “actions”, to use the Grokking Simplicity terminology. It feels like there should be cleaner ways of separating action from calculation structurally/architecturally than “put some functions in a namespace, require and use them”, which is where we’re at now.

How do y’all handle situations like this? Have you encountered it before?

Yogthos · January 4, 2023, 7:54pm

There are lots of ways to approach the problem. I personally like having the entire state in a single map because it makes it easy to observe and serialize the state. Meanwhile, multimethods make it easy to add new functionality without having to change existing code. In my experience, this facilitates writing code in a largely additive style.

Protocols are definitely not required there, I just wanted to show how you could formalize access to resources and facilitate mocking using them.

Finally, the value of expressing the application as a state machine at top level is that it decouples the flow from state transformation. I find that makes it easier to see what the workflow is doing without worrying about the implementation details at each step. And having each step being a function or a multimethod that simply takes a state map and returns a different map makes it easy to create composable components.
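A small sketch of that shape, to make it concrete. This is not the code from the blog post; the `:state` key, the step names and the transfer-like example are assumptions made for illustration.

```clojure
;; Each step takes the state map and returns a new state map.
;; Dispatch is on the current :state key (hypothetical convention).
(defmulti step :state)

(defmethod step :validate [{:keys [amount] :as state}]
  (if (pos? amount)
    (assoc state :state :apply)
    (assoc state :state :failed :error "amount must be positive")))

(defmethod step :apply [{:keys [balance amount] :as state}]
  (assoc state :balance (+ balance amount) :state :done))

(def terminal? #{:done :failed})

;; The runner knows the flow but nothing about what each step does.
(defn run [state]
  (if (terminal? (:state state))
    state
    (recur (step state))))

(run {:state :validate :balance 10 :amount 5})
;; => {:state :done, :balance 15, :amount 5}
```

Adding a new step is then a new `defmethod`, without touching `run` or the existing steps, and the whole state can be printed or serialized at any point since it is one map.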

Harleqin · January 5, 2023, 4:00pm

I encounter this all the time, independently of any particular architectural style or framework used.

I think the main point here is that details matter. I can’t give you actual advice without having seen actual code and actually reading it (which is /work/).

It seems that you already have some points you’d like to work on:

better separation of pure and side-effect-y code, e. g. pulling out the queries from the validation steps
separating the creation of actions (which could be pure) from their execution/logging
taking a closer look at the re-use patterns: are the re-used functions towards the leaves, and do they form a useful language? Or are a bunch of flags passed around to do endoscopic configuration?
is there a way to more clearly define a domain model? Hopefully without adding heaps of ceremony?
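As one possible reading of the second point, here is a sketch that separates creating actions from executing them. The report/audit names echo the example in the question, but the action shapes and the `effects` map are hypothetical.

```clojure
;; Pure: decides what should happen and returns it as data.
(defn plan-report-update [report patch]
  (let [updated (merge report patch)]
    [{:action :db/upsert :table :reports :row updated}
     {:action :audit/log :event :report-edited :report-id (:id report)}]))

;; Impure: the only place that touches the outside world. `effects` is a
;; map from action type to a side-effecting function (assumption).
(defn execute! [effects actions]
  (doseq [{:keys [action] :as a} actions]
    ((get effects action) a)))

(plan-report-update {:id 1 :title "old"} {:title "new"})
;; => [{:action :db/upsert, :table :reports, :row {:id 1, :title "new"}}
;;     {:action :audit/log, :event :report-edited, :report-id 1}]
```

The planning function can be tested by comparing data, and in tests `execute!` can be given functions that just record what they were asked to do.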

I don’t think that you can avoid this work.

iainctduncan · January 6, 2023, 11:40pm

I am, when not coding for my grad school funtimes, a technical due diligence assessor for companies getting purchased. Let me just say as loudly as possible - this stuff matters. I have personally seen many companies in dire straits (to the tune of millions off their deals or selling underwater) because they didn’t want to “over-engineer” at the beginning and now have crippling tech debt because of erroneous early assumptions and over-coupled code around those.

My suggestion is to read both inside and outside of Clojure circles to get an idea of various approaches, pitfalls, and yardsticks for success. And make sure the entire app is built so you can change your mind later on anything with big ramifications. This almost always involves some kind of component lookup system and some kind of “clean”, “onion”, or “hexagonal” architecture. I’ve seen lots of ways this is done, but man, it’s worth doing now. There’s some good stuff in the Bob Martin “Clean Architecture” book (and a lot of fluff admittedly), and I love the old Patterns of Enterprise Application Architecture, Enterprise Integration Patterns, and Domain Driven Design books. You can find “Clojury” ways of solving those issues, it doesn’t have to be traditional DI (though that works), but you gotta have some clean way of growing past your initial assumptions on data store, throughput, boundaries, where i/o comes from, etc. The Kleppmann “Designing Data-Intensive Applications” book is great for finding out what you might be worrying about later.

These are very literally the million dollar questions! HTH.

NoahTheDuke · January 7, 2023, 12:22am

These are very literally the million dollar questions! HTH

Thanks for the reply! I definitely agree, which is why I’ve come here. There are plenty of good books for OOP languages on these matters, but adapting them for a pragmatically functional language like Clojure is tough. I wish I could share code, but alas. I’m hoping that folks like yourselves can provide actionable advice, since most of the existing advice is more philosophical.

iainctduncan · January 7, 2023, 1:12am

ok cool, in the spirit of giving back, here is the one piece of advice I would give to avoid startups screwing themselves. I have seen this cost millions and millions in years 5-10.

Never assume that the incoming request will come from the web, or that your data store will be the same for all things (or stay the same over time for the same thing).

Every rapid MVC framework (MEAN, Django, Rails, etc.) encourages you to make those assumptions by calling business logic operations that are coupled to the data persistence scheme (i.e. ORM calls) directly in the web controllers, which end up being your business logic units. Once you have hundreds or thousands of those, they are very, very hard to change. A good architecture looks like this:

The web controller calls your component management system to get a resource manager thing for dealing with certain domains, but it has no idea what this resource manager thing is, just what it does.
It calls operations in your code (not library methods of 3rd-party code!) on said thing to do stuff. It gets back some opaque thing, uses it for a business operation, and gets back some opaque response thing.
Under the hood somewhere, there is an implementation file that knows this is SQL or whatever, but it only gets depended on and looked up by abstract interface - no client code knows how it works or exactly what it is on the inside.
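
One possible “Clojury” reading of those three points, as a rough sketch only. The protocol `UserStore`, the record `InMemoryUserStore`, and the functions `rename-user!` and `rename-handler` are made-up names, and the in-memory atom stands in for whatever SQL-backed implementation would live in its own file.

```clojure
;; The abstract interface: what the resource manager thing does,
;; not what it is.
(defprotocol UserStore
  (find-user [store id])
  (save-user! [store user]))

;; One implementation, backed by an atom. A SQL-backed record would
;; implement the same protocol in its own namespace.
(defrecord InMemoryUserStore [db]
  UserStore
  (find-user [_ id] (get @db id))
  (save-user! [_ user]
    (swap! db assoc (:id user) user)
    user))

;; Domain operation in your own code: it only knows the protocol.
(defn rename-user!
  [store id new-name]
  (when-let [user (find-user store id)]
    (save-user! store (assoc user :name new-name))))

;; Web controller: looks the store up by key and calls the domain operation.
;; It never sees SQL, an ORM, or the concrete record type.
(defn rename-handler
  [{:keys [components params]}]
  (if-let [user (rename-user! (:user-store components)
                              (:id params)
                              (:name params))]
    {:status 200 :body user}
    {:status 404 :body {:error "user not found"}}))
```

The request map carrying `:components` is just one way to do the lookup; the point is that the handler asks for `:user-store` and gets back something it can only use through the protocol.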

To quote our beloved shaggy haired leader, components should say “I don’t know, I don’t care”. This is the core of onion/hexagonal/clean/ports & adapters in a nutshell.

Doing the above everywhere, all the time, requires a lot of discipline. You need to make a component manager that allows outer layers (web controllers, message bus endpoints, etc) to ask for the thing that gets them things while knowing nothing about how any of those are implemented. And you need to put operations on this for business actions (this is your “domain service layer” in DDD parlance). So to the early stage coder, this looks like heinous Java-esque overengineering - all these extra abstractions to write! But they are YOUR abstractions. Swapping out deps is trivial in your own code. Once you’ve done it, you literally change ONE FILE (the wiring in your component manager) to swap out the persistence scheme for a certain domain entity. And adding input to your system that comes from messages instead of web requests (async instead of sync) is now trivial. And everything is beautifully easy to test because changing components to mocks is easy peasy.
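
To make the “ONE FILE” claim concrete, here is a hedged sketch of what such wiring could look like. All names (`OrderStore`, `SqlOrderStore`, `StubOrderStore`, `build-system`, `order-total`, and the two handlers) are hypothetical, the SQL adapter is deliberately left unimplemented, and libraries such as Component or Integrant handle this kind of wiring far more thoroughly than a plain map does.

```clojure
(defprotocol OrderStore
  (fetch-order [store id]))

;; Production adapter. A real one would query `datasource`; that part is
;; omitted here, so this method just throws.
(defrecord SqlOrderStore [datasource]
  OrderStore
  (fetch-order [_ id]
    (throw (ex-info "not implemented in this sketch" {:id id}))))

;; Stub adapter for tests: orders come from a plain map.
(defrecord StubOrderStore [orders]
  OrderStore
  (fetch-order [_ id] (get orders id)))

;; The wiring: the single place that names concrete implementations.
;; Swapping the persistence scheme means editing only this function.
(defn build-system
  [{:keys [env datasource]}]
  {:order-store (if (= env :test)
                  (->StubOrderStore {42 {:id 42 :total 10}})
                  (->SqlOrderStore datasource))})

;; Domain operation, usable from any entry point.
(defn order-total
  [system id]
  (:total (fetch-order (:order-store system) id)))

;; Two outer layers sharing the same domain operation:
;; a synchronous web request and an asynchronous message.
(defn http-handler
  [system request]
  {:status 200
   :body {:total (order-total system (get-in request [:params :id]))}})

(defn message-handler
  [system message]
  (order-total system (:order-id message)))
```

Both handlers stay ignorant of where orders live, and a test builds the system with `{:env :test}` instead of mocking a database.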

If your business succeeds, most problem domains will eventually need this (both changes to how you assumed you could get away with storing and querying and the assumption that all input would be synchronous http calls). To write it at the beginning, the overhead is someone writing like I dunno, a bunch of pages of extra code at the beginning and then handfuls more when they make a new domain resource? An extra two call layers when they use it? It’s just not that big. But if you don’t, it can be thousands and thousands of files that can’t be fixed while running the day to day operations. I got hired once to try and fix such a mess, and it couldn’t be done - the lunatics had run the asylum for too long, we would have needed more firepower than the business was worth. Company got sold at a loss. I have done diligences on multiple companies in the same boat, mostly from a Ruby on Rails app that just grew. “You’re not going to need it” kills companies. It kills me when I hear that crap, just kills me. It is not pleasant talking to nice devs about their work while helping them realize the house is on fire and they should dust off the resume.

HTH!
