Architecture of big applications

I don’t have any public projects to point to unfortunately. I completely agree that having some standard practices encapsulated in a micro framework is extremely valuable for beginners. Currently, the bar for using Clojure effectively is a lot higher than it needs to be. Unless you know somebody who’s already an experienced Clojure dev, it can be pretty tricky to figure out how to structure your app effectively.

4 Likes

Thanks for writing the article @Yogthos! It’s well written and deserves a few more read-throughs from me :slight_smile:

I had to work on a rather icky, complicated GUI a couple of years ago (with lots and lots of statefulness), and one thing that was a lifesaver was memoization (and avoiding “map management”). It strangely never seems to come up in these conversations, so I just wanted to know your thoughts on it.

My own thoughts on this are a bit half-baked, but working off your example I’ll try to illustrate what I mean.

So in your example, instead of having users, funds, emails, etc., your state would remain simply the list of transactions/events. These are much harder to mess up. You then create memoized functions/interfaces that compute things like the list of users, the amount of funds a user has, the email of a user, etc.

All state change then boils down to appending a transaction/event - and that’s it! As long as you don’t push a broken transaction you should be fine (and if you do, it’s easy to catch).
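To make it concrete, here’s a minimal sketch of the idea with a toy transfer domain (all the names here are mine, not from the article): the only mutable thing is the event log, and everything else is a memoized view over it.

```clojure
;; a minimal sketch, assuming a simple transfer domain; names are illustrative
(def transactions (atom []))   ; the *only* state: an append-only event log

(defn transfer! [from to amount]
  ;; all state change boils down to appending an event
  (swap! transactions conj {:from from :to to :amount amount}))

;; derived "state" is computed, and memoized so repeated reads are cheap.
;; memoize keys on the whole log, so each append naturally misses the cache.
(def user-funds
  (memoize
   (fn [txs user]
     (reduce (fn [total {:keys [from to amount]}]
               (cond-> total
                 (= to user)   (+ amount)
                 (= from user) (- amount)))
             0
             txs))))

(def users
  (memoize
   (fn [txs]
     (into #{} (mapcat (juxt :from :to)) txs))))

;; usage:
;; (transfer! :alice :bob 100)
;; (user-funds @transactions :bob)   ;=> 100
```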

This has two primary benefits. First, your code becomes even more decoupled and stateless, and it’s harder to end up with a broken state. Second, it becomes significantly easier to refactor, as things get centralized in the memoized state-getter functions. So the system ends up scaling much better.

So for instance, say you have a new requirement where you want to track how many transactions each user has done. If you have managed state, then you need to find each location where you’ve done a transaction and add something to increment some counter (and introduce a new thing in your map that needs to be managed). With memoized derived state, you’d just introduce a new memoized function that computes it for you directly from the transaction record.
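Continuing the sketch above, that new requirement is just one more derived view over the same log; nothing existing changes:

```clojure
;; hypothetical addition for the new requirement: no existing code changes,
;; and no new counter to keep in sync
(def user-transaction-count
  (memoize
   (fn [txs user]
     (count (filter #(or (= (:from %) user)
                         (= (:to %) user))
                    txs)))))

;; (user-transaction-count @transactions :alice)
```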

You need to be a bit careful with the memoization cache - but I don’t think this is an intractable problem. @vlaaad has a nice state management system in cljfx where these memoized functions can call other memoized functions and cleverly reuse results without recomputing things.

Anyways, I was wondering if you’ve looked into this side of things. I think for large applications where there is tons of state to maintain, this approach reduces the surface area for potential issues. It’s not very black/white, and you do need to settle on what the exact “state” is that you’re tracking. In a GUI you might not want to go crazy and keep a list of all the UI interactions that are then virtually replayed to generate the current state… but you try to boil the state down to just what’s necessary, preferably in a way where you can’t make it invalid - and then have the state be accessed through what is essentially a memoized interface, decoupling the internal representation from the state interface.

1 Like

I find that for large GUI apps my strategy has been to try and minimize global context, and treat each page as an independent app. This might not always work depending on the nature of the app, but in general I’ve found that you just need a bit of context such as user details. I tend to use re-frame on the frontend, and then namespace the subscriptions and dispatches to the particular page they’re associated with.
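As a concrete (and hypothetical) sketch of what that namespacing looks like - the page and key names here are made up - each page owns its own slice of the app-db and its own namespaced events/subs:

```clojure
(ns myapp.pages.reports
  (:require [re-frame.core :as rf]))

;; ::double-colon keywords scope these events/subs to this page's namespace
(rf/reg-event-db
 ::set-filter
 (fn [db [_ filter-text]]
   (assoc-in db [:pages :reports :filter] filter-text)))

(rf/reg-sub
 ::filter
 (fn [db _]
   (get-in db [:pages :reports :filter])))

;; views elsewhere use the fully-qualified keyword:
;; @(rf/subscribe [:myapp.pages.reports/filter])
```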

Personally, I don’t know if all that genericity is necessary; the use of protocols, multimethods, and a map of various states all seems a bit overly complex.

If you look here: GitHub - didibus/clj-ddd-example (an example implementation of Domain-Driven Design in Clojure), you’ve got the same example of an application transferring money between two users, but it only uses normal functions and maps. It similarly follows a clean architecture, plus domain-driven design, and has the same properties of being easy to test and having good modularity.
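To give a flavour of that style (my own simplified sketch here, not actual code from the repo), the domain logic is just pure functions over plain maps:

```clojure
;; a simplified illustration of the plain-functions-and-maps style;
;; not actual code from clj-ddd-example
(defn transfer
  "Pure domain logic: takes account maps, returns updated account maps."
  [from-account to-account amount]
  (when (< (:balance from-account) amount)
    (throw (ex-info "insufficient funds" {:account (:id from-account)})))
  {:from (update from-account :balance - amount)
   :to   (update to-account :balance + amount)})

;; (transfer {:id :a :balance 500} {:id :b :balance 0} 100)
;; => {:from {:id :a :balance 400}, :to {:id :b :balance 100}}
```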

I would have to try the alternative to know for sure, but my first impression is that it brings in a kind of more generic state-machine framework, and I’m not sure it gives you much?

2 Likes

All of these replies are great. Thanks to everyone who’s participated so far.

One thing that’s been on my mind as I’ve read your thoughts is that different applications require fundamentally different approaches to this problem. If I’m writing a video game, I need a game loop of some kind and that informs/enforces certain decisions, whereas a web app with “normal” REST endpoints informs/enforces other decisions. I wasn’t explicit in my original post because I hadn’t thought it through enough, which I’m seeing now from the variety of approaches to this problem.

At my current workplace, the codebase is a traditional web app, so except for the long-running aggregation and processing tasks we run on set intervals, the majority of the code is geared towards “receive an input from a web request, process input data, convert input data into database queries, generate output data, and return some result to the requester.”

This means that a complex state machine (like the one described in @Yogthos’s blog) isn’t necessarily the best move for us. However, the hodgepodge we have now is also not working, as seen by my posts here.

Maybe a rough example would be helpful. In our app, we can generate reports. When a report is edited, we regenerate the data. This happens in the call stack starting in the (PATCH "/reports/:id" ...) compojure route. The call graph looks something like this:

  • Validation
  • Format input data
  • Upsert data into relevant database tables
  • Track actions in audit log
  • Generate output data
  • Render and return output data

On its face, this is a pretty simple workflow, and each part is neatly wrapped in a top-level function. However, each one hides a lot of side effects: there are 25 steps of validation, with ~5 of them peppered throughout making db queries; the patching/upserting itself makes queries; the audit log makes queries; and gathering the output makes queries.

Additionally, the code involved is split among 50+ different functions (in the full call graph) across 15+ namespaces. Some functions are reused, some are single-use. Tracing it all requires paying close attention and knowing or intuiting the shape of the inputs and outputs of each function (as we don’t have any internal schema definitions).

I realize this is veering into “how to refactor 101” territory, so my apologies. But it’s a lot of code and most of it is “actions”, to use the Grokking Simplicity terminology. It feels like there should be cleaner ways of separating action from calculation structurally/architecturally than “put some functions in a namespace, require and use them”, which is where we’re at now.

How do y’all handle situations like this? Have you encountered it before?

1 Like

There are lots of ways to approach the problem. I personally like having the entire state in a single map because it makes it easy to observe and serialize the state. Meanwhile, multimethods make it easy to add new functionality without having to change existing code. In my experience, this facilitates writing code in a largely additive style.

Protocols are definitely not required there; I just wanted to show how you could formalize access to resources and facilitate mocking with them.

Finally, the value of expressing the application as a state machine at the top level is that it decouples the flow from the state transformation. I find that makes it easier to see what the workflow is doing without worrying about the implementation details at each step. And having each step be a function or a multimethod that simply takes a state map and returns a different map makes it easy to create composable components.
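Roughly, a minimal sketch of that shape (stage names and keys here are just for illustration, not from the blog post):

```clojure
;; each step takes the state map and returns the next state map;
;; dispatching on :stage decouples the flow from the transformations
(defmulti step :stage)

(defmethod step :validate [state]
  (if (get-in state [:input :report-id])
    (assoc state :stage :format)
    (assoc state :stage :error :error "missing report id")))

(defmethod step :format [state]
  (-> state
      (update-in [:input :amount] bigdec)
      (assoc :stage :done)))

;; the top-level flow knows nothing about what each step does
(defn run-workflow [state]
  (if (#{:done :error} (:stage state))
    state
    (recur (step state))))

;; (run-workflow {:stage :validate :input {:report-id 1 :amount "9.99"}})
```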

1 Like

I encounter this all the time, independently of any particular architectural style or framework used.

I think the main point here is that details matter. I can’t give you actual advice without having seen actual code and actually reading it (which is /work/).

It seems that you already have some points you’d like to work on:

  • better separation of pure and side-effecty code, e.g. pulling the queries out of the validation steps
  • separating the creation of actions (which could be pure) from their execution/logging (see the sketch after this list)
  • taking a closer look at the re-use patterns: are the re-used functions towards the leaves, and do they form a useful language? Or are there a bunch of flags passed around to do endoscopic configuration?
  • is there a way to more clearly define a domain model, hopefully without adding heaps of ceremony?
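On the second point, a hedged sketch of what that separation could look like (all names here are hypothetical): a pure function decides *what* should happen, and a separate executor performs the effects.

```clojure
;; pure: returns data describing the effects, runs no queries
(defn plan-report-update [report changes]
  [{:effect :upsert-report :data (merge report changes)}
   {:effect :audit-log     :data {:report-id (:id report)
                                  :changes   changes}}])

;; impure: one executor per effect type
(defmulti execute! :effect)

(defmethod execute! :upsert-report [{:keys [data]}]
  (println "would run the upsert query for" data))  ; stand-in for the db call

(defmethod execute! :audit-log [{:keys [data]}]
  (println "would insert audit row" data))          ; stand-in for the db call

;; the plan is trivially testable; only this line touches the world:
;; (run! execute! (plan-report-update {:id 1 :title "Q2"} {:title "Q3"}))
```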

I don’t think that you can avoid this work.

2 Likes

I am, when not coding for my grad school funtimes, a technical due diligence assessor for companies getting purchased. Let me just say as loudly as possible: this stuff matters. I have personally seen many companies in dire straits (to the tune of millions off their deals, or selling underwater) because they didn’t want to “over-engineer” at the beginning and now have crippling tech debt from erroneous early assumptions and over-coupled code around those.

My suggestion is to read both inside and outside of Clojure circles to get an idea of various approaches, pitfalls, and yardsticks for success. And make sure the entire app is built so you can change your mind later on anything with big ramifications. This almost always involves some kind of component lookup system and some kind of “clean”, “onion”, or “hexagonal” architecture. I’ve seen lots of ways this is done, but man, it’s worth doing now. There’s some good stuff in the Bob Martin “Clean Architecture” book (and a lot of fluff, admittedly), and I love the old Patterns of Enterprise Application Architecture, Enterprise Integration Patterns, and Domain-Driven Design books. You can find “Clojury” ways of solving those issues - it doesn’t have to be traditional DI (though that works) - but you gotta have some clean way of growing past your initial assumptions on data store, throughput, boundaries, where i/o comes from, etc. The Kleppmann “Designing Data-Intensive Applications” book is great for finding out what you might be worrying about later.

These are very literally the million dollar questions! HTH. :slight_smile:

3 Likes

These are very literally the million dollar questions! HTH

Thanks for the reply! I definitely agree, which is why I’ve come here. There are plenty of good books for OOP languages on these matters, but adapting them for a pragmatically functional language like Clojure is tough. I wish I could share code, but alas. I’m hoping that folks like yourselves can provide actionable advice, since most of the existing advice is more philosophical.

Ok cool, in the spirit of giving back, here is the one piece of advice I would give to keep startups from screwing themselves. I have seen this cost millions and millions in years 5-10.

Never assume that the incoming request will come from the web, or that your data store will be the same for all things (or stay the same over time for the same thing).

Every rapid MVC framework (MEAN, Django, Rails, etc.) encourages you to make those assumptions by putting business-logic operations coupled to the data persistence scheme (i.e. ORM calls) directly in the web controllers, which become your business logic units. Once you have hundreds or thousands of those, they are very, very hard to change. A good architecture looks like this:

  • the web controller calls your component management system to get a resource-manager thing for dealing with a certain domain, but it has no idea what this resource-manager thing is, just what it does.
  • it calls operations in your code (not library methods of 3rd-party code!) on said thing to do stuff; it hands over some opaque thing for a business operation and gets back some opaque response thing.
  • under the hood somewhere, there is an implementation file that knows this is SQL (or whatever), but it only gets depended on and looked up by abstract interface - no client code knows how it works or exactly what it is on the inside.
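In Clojure terms, a hedged sketch of that layering might look like this (all names are illustrative, and the SQL implementation is stubbed out):

```clojure
;; the abstract interface: all any client code ever sees
(defprotocol ReportStore
  (fetch-report [this id])
  (save-report! [this report]))

;; the implementation file that "knows this is SQL"; no client depends on it
(defrecord SqlReportStore [datasource]
  ReportStore
  (fetch-report [_ id]
    ;; stand-in for a real query against datasource
    {:id id :title "from sql"})
  (save-report! [_ report]
    report))

;; the component manager: the one place interfaces are wired to impls
(def system
  {:report-store (->SqlReportStore nil)})

;; web controller: asks the system for the thing, knows only the protocol
(defn patch-report-handler [system id changes]
  (let [store  (:report-store system)
        report (fetch-report store id)]
    (save-report! store (merge report changes))))
```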

To quote our beloved shaggy-haired leader, components should say “I don’t know, I don’t care”. This is the core of onion/hexagonal/clean/ports & adapters in a nutshell.

Doing the above everywhere, all the time, requires a lot of discipline. You need a component manager that allows outer layers (web controllers, message-bus endpoints, etc.) to ask for the thing that gets them things, while knowing nothing about how any of those are implemented. And you need to put operations on it for business actions (this is your “domain service layer” in DDD parlance). To the early-stage coder this looks like heinous Java-esque overengineering - all these extra abstractions to write! But they are YOUR abstractions. Swapping out deps is trivial in your own code. Once you’ve done it, you literally change ONE FILE (the wiring in your component manager) to swap out the persistence scheme for a certain domain entity. Adding input to your system that comes from messages instead of web requests (async instead of sync) becomes trivial. And everything is beautifully easy to test, because swapping components for mocks is easy peasy.
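Continuing the hypothetical sketch above, both the “change ONE FILE” swap and the easy mocking fall out of the wiring function:

```clojure
;; swapping persistence (or mocking for tests) touches only this wiring,
;; never the controllers; continues the ReportStore sketch above
(defrecord InMemoryReportStore [db]
  ReportStore
  (fetch-report [_ id] (get @db id))
  (save-report! [_ report]
    (swap! db assoc (:id report) report)
    report))

(defn make-system [{:keys [store-impl datasource]}]
  {:report-store (case store-impl
                   :sql       (->SqlReportStore datasource)
                   :in-memory (->InMemoryReportStore (atom {})))})

;; tests: (make-system {:store-impl :in-memory})
;; prod:  (make-system {:store-impl :sql :datasource ds})
```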

If your business succeeds, most problem domains will eventually need this (both changes to how you assumed you could get away with storing and querying data, and to the assumption that all input would be synchronous HTTP calls). To write it at the beginning, the overhead is, I dunno, a bunch of pages of extra code up front, handfuls more when you make a new domain resource, and an extra two call layers when you use it? It’s just not that big. But if you don’t, it can be thousands and thousands of files that can’t be fixed while running day-to-day operations. I got hired once to try and fix such a mess, and it couldn’t be done - the lunatics had run the asylum for too long, and we would have needed more firepower than the business was worth. The company got sold at a loss. I have done diligence reviews on multiple companies in the same boat, mostly from a Ruby on Rails app that just grew. “You’re not going to need it” kills companies. It kills me when I hear that crap, just kills me. It is not pleasant talking to nice devs about their work while helping them realize the house is on fire and they should dust off the resume.

HTH!

10 Likes
