Organizing Clojure code - A real problem?

mvarela · April 25, 2021, 7:02am

I’ve had a slightly different approach to this… Typically, I will start with a dev.clj namespace, outside of src, which is a sort of (mostly) append-only scratch pad. I use this to interact with the REPL, and iterate over my code. Stuff that has been “finished”, I archive inside a Rich comment form (so it’s not eval'ed if I reload the namespace). Once I’m happy with that, I will send it off to its own namespace. For one-off stuff, this could be a single file, but in general, I will split the functions and spec definitions over several namespaces that carry some semantic meaning. For example, I’m currently developing an API with several endpoints grouped by functionality (say, projects, analysis, and diagnostics).

Each one of those groups gets its own little namespace (com.blah.api.projects, etc.) with route definitions and handlers, and I compose those into a single api.core ns that builds the actual Reitit routes table from the separate components. I have a similar approach to my Malli specs (which is arguably more useful than for the API components, since they can get fairly verbose, and I prefer working on smaller files).
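A minimal sketch of the per-group layout described above, assuming plain Reitit-style route vectors. The namespace names follow the post; the handlers and paths are hypothetical, and the sketch does not call Reitit itself. All namespaces sit in one file for brevity; in a project each would live in its own file under src/.

```clojure
;; Each feature group owns its routes as plain data.
(ns com.blah.api.projects)

(defn list-projects
  "Hypothetical handler: Ring-style request map in, response map out."
  [_request]
  {:status 200 :body [{:id 1 :name "demo"}]})

(def routes
  ["/projects" {:get {:handler list-projects}}])

(ns com.blah.api.diagnostics)

(defn health
  "Hypothetical health-check handler."
  [_request]
  {:status 200 :body {:ok true}})

(def routes
  ["/diagnostics" {:get {:handler health}}])

;; The composing namespace only knows about the groups' route data.
(ns com.blah.api.core
  (:require [com.blah.api.projects :as projects]
            [com.blah.api.diagnostics :as diagnostics]))

(def route-table
  "Composed, nested route table. In the real project this vector would be
  handed to the router (e.g. reitit.ring/router); not done here to keep the
  sketch dependency-free."
  ["/api" projects/routes diagnostics/routes])

(comment
  ;; Rich comment block: read but never evaluated when the ns is (re)loaded,
  ;; so finished REPL experiments can be archived here safely.
  (count route-table))
```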

For the data layer, I have a com.blah.db namespace that deals with all DB matters, a com.blah.util ns with general helpers, and a com.blah.db-utils ns with small DB-specific helpers. I find that working with several smaller namespaces makes my code feel less cluttered. In this particular project, I have 17 separate namespaces (plus the dev scratchpad). It may sound like quite a lot, but it is actually quite easy to navigate them, and more importantly, they make sense to me.

This last point is probably a caveat, since I’m currently the only person doing Clojure in my team, so if there were more hands in the pot, some sort of formalization on how to group code might be necessary.

Regarding the “append-only” nature of the scratch pad, I sometimes trim stuff that’s become obsolete, to make it easier to navigate the file, but I tend to keep a good amount of “history” there, as well as small helpers I keep coming back to when working on the REPL.

I think I should also have a look at Polylith, it looks like it may be a good way to enforce a bit more structure on my approach.

witek · April 25, 2021, 10:31am

I was building Java applications for over 20 years and I am switching to Clojure now. I have a hard time fighting my OO habits. In my current pure ClojureScript project I have drifted back to an MVC-like structure.

Under the domain namespace I have a namespace for each entity (x.domain.user, .product, .order, …). All pure functions, most of them getters for the keys in the entity map, plus some computations and combinations of getters. In the controller namespace I have calls to the database to update the entities. Most of the application is React view components which render the entities and forward events to the functions in the controller namespace. There is also a namespace for some cloud functions.

This is why:

I have a namespace with getters for the fields of each entity so I don’t have to use keywords in the view, because I cannot remember the keywords. Instead I use the getters to get auto-completion and compiler errors. I also often start with getters like x.domain.order/status, which starts as a real field in the db and later becomes a computed result. Having getters from the start, I don’t have to refactor the view from using keywords to calling computation functions when this happens.
Since the UI components are by far the most code, I have them split into multiple namespaces, mostly one for each “page” in the application. Since multiple pages access the same entities, it seems I need these entity namespaces for the shared getters and computations.
Functions in the controller namespace also use getters and computations from the domain layer to make some decisions for their updates. I have this separate controller namespace because, besides the view components, it is used by the cloud functions. Instead of user click events, it is triggered by cloud function events.
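For concreteness, a minimal sketch of the getter approach described above (which later replies argue against). The keys :order/id, :order/shipped-at and :order/cancelled-at and the x.view.order-page namespace are hypothetical.

```clojure
(ns x.domain.order)

(defn id [order] (:order/id order))

;; Started life as (defn status [order] (:order/status order)). Once status
;; became derived, only this body changed; callers kept calling `status`.
(defn status
  [order]
  (cond
    (:order/cancelled-at order) :cancelled
    (:order/shipped-at order)   :shipped
    :else                       :open))

(ns x.view.order-page
  (:require [x.domain.order :as order]))

(defn order-row
  "Hiccup-style data for one table row; uses getters instead of raw keywords."
  [o]
  [:tr [:td (order/id o)] [:td (name (order/status o))]])
```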

I see this is the same reasoning I had in my OO code. But what could be a better approach in a serverless (Firebase) application?

mvarela · April 25, 2021, 1:28pm

Using getters and setters is typically seen as non-idiomatic in Clojure. What IDE are you using? I use Cider and LSP, and keywords are auto-completed (if the namespaces have been loaded) for the most part…

BadChicken · April 25, 2021, 1:40pm

You might be interested in a presentation called Solving Problems the Clojure Way.

didibus · April 25, 2021, 7:44pm

Ya that doesn’t sound right to me in Clojure, sorry to say.

You’ll get used to it; just trust yourself, start learning your data model, and you’ll begin to remember what it contains. Your editor should also auto-complete keywords, though not in a way that tells you which entity has which keywords. You’ll also learn to quickly refer to the definition to help you out.

This sounds wrong to me; your model should be data-based. Your view will read status and present it; there should be a function in your model that computes the status and sets its new value on your model, which then updates the view with it. It shouldn’t be that the view calls a function to retrieve the status. Think push from model to view instead of pull.

This seems like the same issue. Your UI shouldn’t be using your model in MVC: the arrows go from View to Controller and from Controller to View. Your UI components shouldn’t depend on your Model at all. Instead, your model should push a map to the view, and your View should use that data to render itself. User events on the View are then sent to your controller, which leverages the Model functions to modify the model data; once the data is updated, the view is called with the new model data map and renders itself anew.

So your View can be separated into as many namespaces as you want, but think of those namespaces as your presentation logic. You could put it all in one giant namespace at first. In that namespace you’d have functions to format the data from the model in a user-specific way, for example to show a date in a human-friendly format or a currency in the user’s locale. And you’d have some render functions, for example render-page-x, which takes a map of the data from the model it has to render, and options for how the user wants it rendered. The whole View layer should be pure.
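One way to read this advice as code: a minimal sketch assuming a single order as the model, with hypothetical names (my-app.model, my-app.view, my-app.controller, :paid?). The model derives :status and stores it, the controller routes events, and render-page-x stays a pure function of the pushed data and rendering options. A plain clojure.core atom stands in for whatever reactive state a real UI would use.

```clojure
;; Model: pure functions over plain data.
(ns my-app.model)

(defn recompute-status
  "Derives :status and stores it in the model data (push, not pull)."
  [order]
  (assoc order :status (if (:paid? order) :paid :awaiting-payment)))

(defn mark-paid [order]
  (recompute-status (assoc order :paid? true)))

;; View: pure presentation logic; depends on nothing else in the app.
(ns my-app.view)

(defn format-status [status]
  ({:paid "Paid" :awaiting-payment "Awaiting payment"} status "Unknown"))

(defn render-page-x
  "Takes the data map pushed from the model plus rendering options."
  [data {:keys [show-id?]}]
  [:div.order
   (when show-id? [:span.id (:id data)])
   [:span.status (format-status (:status data))]])

;; Controller: receives events, updates the model, pushes data to the view.
(ns my-app.controller
  (:require [my-app.model :as model]
            [my-app.view :as view]))

(defonce app-state (atom (model/recompute-status {:id 42 :paid? false})))

(defn handle-event!
  [event]
  (case event
    :pay (swap! app-state model/mark-paid)
    nil)
  (view/render-page-x @app-state {:show-id? true}))
```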

That means how you break your View layer into namespaces is more about your own ability to make sense of it, and about which parts of it are shared between different views.

That seems fine, but I feel there’s something about namespaces you may misunderstand. You could easily have both these controllers (the one used by the cloud functions and the one used by the browsers) in the same namespace. So having “another controller” isn’t a good reason to have two namespaces. It’s also not a reason not to, but you have to understand those are orthogonal concerns. A namespace is not like a class; it is more like a package.

witek · April 26, 2021, 10:15am

Thank you for your suggestions.

Regarding getters in the model and pushing precomputed maps to the view: I started with this approach (which is also the re-frame approach), but I abandoned it for the following reason:

When the view is completely stateless, all the intermediate UI state and its management have to move somewhere else, and the reuse of UI components gets divided into two parts that have to be kept in sync.

An example: Given a page which displays a person (just the name) in an expandable material ui card. When the card is expanded, details for the person get displayed. For the details a separate call to the server/database is required.

In my current architecture the view has its own state (card collapsed/expanded) and subscribes to the database query by itself instead of getting data pushed. That way there is a person-details component which only gets rendered when its parent (the card) is expanded. You get lazy loading of the details for free and don’t have to manage this state yourself. You get a reusable expandable person component. If you want a second person card on the page, you just put a second view component there with a second person-id parameter.

In the push-all-data-to-the-view architecture, such components seem a lot harder, because you cannot package all the functionality in one view function. You have to have some code which manages the view state (expanded/collapsed) and then the render function which uses this state. When you need a second person card on your page, you have to put it there and then refactor some other code, which previously was a simple boolean person-expanded?, into a map of expanded persons, and then write more code which distributes these values to the view components. And then you need even more code which subscribes to the db based on the expanded states and passes the results to the view. Following this architecture, you get a tree of view components plus a corresponding tree of state somewhere else which roughly mirrors the view tree. And since a React application already has state (the component tree) where the corresponding data/models can be put along with the view components, managing another tree of state by hand seems a waste of time and introduces more complexity.
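To make the comparison concrete, here is a hedged sketch of the push-style state for this example, with hypothetical names. Keeping a set of expanded ids from the start (rather than a single boolean) means a second card needs no refactor, and `details-to-fetch` derives which detail queries to issue. It also shows the extra state-management code the post is objecting to.

```clojure
(ns my-app.people-page)

(def initial-state {:expanded #{} :details {}})

(defn toggle-expanded
  "Works for any number of cards: expanded state is a set of person ids."
  [state person-id]
  (update state :expanded
          (fn [ids]
            (if (contains? ids person-id)
              (disj ids person-id)
              (conj ids person-id)))))

(defn details-to-fetch
  "Expanded cards whose details have not been loaded yet (lazy loading)."
  [{:keys [expanded details]}]
  (remove #(contains? details %) expanded))

(defn receive-details
  "Stores details once the (hypothetical) db query returns."
  [state person-id d]
  (assoc-in state [:details person-id] d))

(defn view-model
  "The data each card's pure render function receives."
  [state person-ids]
  (for [id person-ids]
    {:id        id
     :expanded? (contains? (:expanded state) id)
     :details   (get-in state [:details id])}))
```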

I see, you get pure view functions from the stateless view. But I don’t see how this could be worth it. We’ll see when the app grows…

jogo3000 · April 27, 2021, 1:18pm

You could have stateful components, yes. In some ways it is simpler, yes.

The problems usually arise when you want components to share some information with each other. Solutions like Flux, Redux and the Elm architecture were created for this problem. I think Redux makes a good case for when a solution like this might be needed.

In the ClojureScript realm we have at least Om, re-frame and tuck (tatut/tuck on GitHub: “a micro framework for Reagent apps”), but the basic Elm architecture is pretty easy to implement with Reagent too.
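A minimal sketch of the Elm architecture in plain Clojure: a model, a pure `update-model` from (model, message) to new model, and a pure `view` from model to hiccup. A clojure.core atom stands in for a Reagent atom here, so this illustrates the shape only; all names are hypothetical.

```clojure
(ns my-app.elm)

(def init {:count 0})

(defn update-model
  "Pure: (model, msg) -> new model. Messages are vectors like [:add 5]."
  [model [tag & args]]
  (case tag
    :increment (update model :count inc)
    :add       (update model :count + (first args))
    model))

(defn view
  "Pure: model -> hiccup. `dispatch` turns UI events into messages."
  [model dispatch]
  [:div
   [:span (:count model)]
   [:button {:on-click #(dispatch [:increment])} "+"]])

;; The single place where state changes; with Reagent this would be a
;; reagent atom so the view re-renders on change.
(defonce app (atom init))

(defn dispatch! [msg]
  (swap! app update-model msg))
```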

jeff.terrell · April 29, 2021, 7:37pm

Excellent question! I spent a semester trying to teach exactly this to students at UNC, so I’ve spent some time struggling with how to do so effectively. (Which is not to say that I’ve succeeded!)

I think a big reason why this is difficult is that, whether you call this problem one of style, design, or engineering, it’s a qualitative, right-brained thing that’s difficult (though not impossible) to convey in words. It’s also difficult to convey in the abstract, without embedding it in a concrete context of actual code—and since the problem doesn’t manifest at small scale, that’s difficult too. So, @ericnormand, you struggling to convey it, despite your extensive teaching experience, isn’t surprising.

One example of effective teaching about style (of written English in this case) is the famous Elements of Style by Strunk & White. (I wonder if, with some effort, we could come up with a similar set of rules for Clojure style.) One of Strunk & White’s rules is to keep related words together. This rule has served me very well in both written English and in code; in fact, @didibus said the same thing here:

Don’t group functions by anything, simply make sure that functions that use others are relatively close to each other as much as possible.

I’d propose another rule, summarizing what many have said above: start with one namespace, and split when it feels unwieldy. The word “feels” is obviously subjective, as it should be. Hopefully, programmers can refine their sensibilities through their own experience as well as by inviting those with more experience to weigh in on matters of style. I use this concept in English as well: when an email (or forum post?) gets too long, it’s time to introduce section headings.

Another rule might be: use names that convey accurate meaning. For example, the name of a namespace should capture the idea of what everything in the namespace is about. If something in the namespace doesn’t fit, it’s worth asking why, so that you can refine the names and/or the boundaries of the namespaces. The name should not attempt to educate the uninitiated—you can do that with a namespace docstring if necessary—but to serve as a pointer to some range of meaning in the language part of your brain, the same way human language words do (e.g. “Australian Shepherd” or “oak” or “Calculus”). I have a lecture on this topic from the aforementioned class if you’re interested in a long-winded exposition (video, notes).
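For example, a short name can point at the concept while the namespace docstring does the explaining. The namespace and function below are hypothetical; the docstring ends up as :doc metadata on the namespace, which `(clojure.repl/doc my-app.album)` prints.

```clojure
(ns my-app.album
  "Albums as a domain concept: what an album is and the rules about it.
  Fetching artwork and persistence live in other namespaces.")

(defn track-count
  "Number of tracks on an album."
  [album]
  (count (:album/tracks album)))
```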

None of this is easy to teach. Most will be bored, either because the problem or solution is too abstract, or because the concrete grounding is too bulky. But the really interested can and will learn a lot by wrestling with this. Best of luck!

seancorfield · April 29, 2021, 8:14pm

Such as Elements of Clojure perhaps?

jeff.terrell · April 29, 2021, 8:28pm

Hey! Lookie there. Added to my reading list.

schaueho · April 30, 2021, 8:30am

I am convinced that the topic is not at all Clojure specific. You run into the exact same questions organizing your code base in Common Lisp, Perl or Python (and probably many other languages, too).

Carola Lilienthal has a great book on sustainable software architecture that discusses the problems and patterns of software systems. One key takeaway for me is that if you want to build a system that is intended to be maintainable for a long time (which might not be a goal for all systems out there), then your code base should clearly reflect core architectural ideas, and your team needs to agree on a common set of rules and constraints to go along with them. The point is that programmers will have a much easier time learning and understanding the concepts if they are used consistently and are obvious in the code base. That helps with navigating the code base, improves decisions about where and how to make changes, and enables discussions about when and where to change the structure and the rules. If you can check adherence to these rules (ArchUnit was already mentioned), all the better. While Lilienthal mostly analyzes OO systems, many of the discussed principles of modularity are, in my opinion, also applicable to non-OO systems.

With this high-level aspect out of the way, I’ve changed my way of organizing code quite a bit over the years. Like many, I tend to start out with a single file. Usually, what is becoming obvious first is a kind of “physical” or technical structure, e.g. frontend vs. backend or web-related parts vs. database access. Typically, there would also be some informal ideas for rules in the back of my head, e.g. the code in the database layer should not use code from the web layer or some such.

More recently, I have instead tried to get more clarity on the major domain aspects and to structure my code base along those lines. Previously I might have created a “data” and a “views” namespace (with various sub-namespaces like “errorview” and “authorview” below). Nowadays, I would much rather have e.g. an “author” namespace, with “db” and “views” namespaces beneath it. This can lead to a lot of consistency across different concepts (Lilienthal would probably call this a system-specific pattern language); e.g. a “books” namespace might have very similar “db” and “views” sub-namespaces. Unfortunately, having several files with the same name is mostly a pain to navigate in editors. Things become a bit more difficult when you need to find a place for code that involves multiple domain concepts (something like DDD aggregates), but in the end this just boils down to the central naming problem of any kind of modeling.
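A minimal sketch of the domain-first layout, with hypothetical names and an in-memory map standing in for the database. The comments show the conventional file paths (hyphens in namespace names become underscores in file names).

```clojure
;; src/my_app/author/db.clj
(ns my-app.author.db)

(defn find-author
  "Hypothetical lookup; `db` is a plain map standing in for a real database."
  [db id]
  (get-in db [:authors id]))

;; src/my_app/author/views.clj
(ns my-app.author.views
  (:require [my-app.author.db :as author-db]))

(defn author-card
  [db id]
  (let [author (author-db/find-author db id)]
    [:div.author
     [:h2 (:author/name author)]
     [:p (str (count (:author/books author)) " books")]]))

;; A "book" concept would repeat the pattern:
;; src/my_app/book/db.clj, src/my_app/book/views.clj
```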

One important aspect that might be more Clojure-specific is how to manage state. If your state is managed globally (i.e. a global state map/atom), you’ll probably want to put the definition of this state map outside of your more specialized domain namespaces. However, if you find that you always need the same access path to some part of that state, and that path maps nicely to some domain concept (e.g. “current user”), then of course you can define a dedicated access function, which you can also place in the “user” namespace.
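A sketch of that arrangement, assuming a single global atom and a hypothetical data shape: the atom lives in a neutral state namespace, and the user namespace owns the one access path for “current user”. The single-arity version takes plain data, which keeps it testable without the atom.

```clojure
;; Global state defined outside the domain namespaces.
(ns my-app.state)

(defonce app-state
  (atom {:session  {:user-id 7}
         :entities {:users {7 {:user/name "Ada"}}}}))

;; The domain namespace owns the access path for its concept.
(ns my-app.user
  (:require [my-app.state :as state]))

(defn current-user
  "The one place that knows where the current user lives in global state."
  ([] (current-user @state/app-state))
  ([s] (get-in s [:entities :users (get-in s [:session :user-id])])))
```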

Andy_Wootton · May 5, 2021, 5:39pm

I started reading your book ‘Grokking Simplicity’ for barely functional beginners today, so it feels wrong to be offering an answer to this question, but:
When people say they want anything to be “organized”, I find they usually mean ‘hierarchically structured’, often by ‘related idea’ (see John Locke) on a tree drawn on an imaginary 2-dimensional surface - because we haven’t moved far beyond the thought patterns associated with organizing physical artifacts on shelves.
OOP code provides a natural object tree structure (in languages without multiple inheritance) but FP generally doesn’t. Because in a Lisp “code is data”, it is more obvious that this is the same problem the ‘Knowledge Management’ community has been grappling with for years. In 1974, Ted Nelson said, “EVERYTHING IS DEEPLY INTERTWINGLED. In an important sense there are no subjects at all; there is only all knowledge, since the cross-connections among the myriad topics of this world simply cannot be divided up neatly” and has been working on his solution, Project Xanadu, since 1960. It’s a very real but “non-trivial” problem.

vvvvalvalval · May 6, 2021, 2:33pm

Thanks for raising the subject @ericnormand - these are still open questions to me as well. Here are some thoughts.

Problem statement: what are the objectives of code organization?

You could in theory have a well-architected codebase in just one file - “well-architected” in the sense that it consists of sensible abstractions and decoupled components. The problem is that this “well-architected” nature would not be obvious to readers.

This leads me to think that the main goals of code organization are:

Making separation obvious: when 2 components are decoupled, it should show in the code organization.
Making cohesion hard to miss: when two pieces of code share assumptions, they should be close - ideally, you should see one from the other. I think Zach Tellman phrased this as “assumptions that are made together should be placed together”.
Making the codebase discoverable: when wondering where one piece of logic is implemented, it should be straightforward to find it. Predictability and searchability both help here. Hopefully, as searchability improves, predictability is less needed.

From there, several leads become apparent.

Organize code primarily by domain, not by mechanics

Two pieces of code should be colocated not because they do the same type of job, but because they operate around the same domain ideas, that is, because they are logically related. If working on one of them always makes you cross your entire codebase to the other and back again in quick succession, then the code structure is likely suboptimal.

Re-frame example: in a re-frame codebase, I recommend against having separate myapp.events, myapp.subscriptions and myapp.view namespaces. In the majority of cases, a given event or subscription is very much coupled to some portion of the view, and it’s more sensible to put them in the same file.
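A minimal sketch of the colocated style, with hypothetical names. To stay dependency-free, the handler, subscription and view are plain functions with the argument shapes re-frame expects; in a real app they would be registered with re-frame.core/reg-event-db and re-frame.core/reg-sub, and the view would read the subscription.

```clojure
(ns myapp.cart
  "Everything about the cart in one place: event handler, subscription, view.")

;; Event handler: (db, event-vector) -> new db.
(defn add-item [db [_ item]]
  (update-in db [:cart :items] (fnil conj []) item))

;; Subscription computation: (db, query-vector) -> derived value.
(defn item-count [db _]
  (count (get-in db [:cart :items])))

;; View: renders the subscribed value.
(defn cart-badge [n]
  [:span.badge n])
```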

In that light, I would argue that Clojure’s lack of imposed structure is an improvement that feels like a regression. When writing code in other languages, it can feel comforting that the language or framework guides you to a particular code organization: the model code goes into the Model class, the controller code goes into the Controller class, it’s a no-brainer. But actually, it’s not a good sign that it’s a no-brainer: arguably, if you’re not thinking, it means that you are not working on the essentials. The problem is that these frameworks tend to impose structure that does nothing to achieve the above-mentioned objectives, and that is usually motivated not by architectural principles but by constraints and the lack of programmability of the underlying code constructs, such as “each class needs its own file”, “CSS and JavaScript go to different files”, etc. In summary, mechanical guidance towards a particular code organization yields a comforting illusion that you’re doing the right thing, especially when writing code rather than reading it, when in fact you’re being forced into a suboptimal structure.

Optimal organization depends on extrinsic factors

Our ever-evolving tools, editors etc. let us navigate and visualize our codebases in many other ways than reading files linearly. This seems superficial, but it’s a game-changer, because it changes what it means for two pieces of code to be “close”: the specific ergonomics of our code-viewing tools effectively change the topology of our code, in ways that are not visible in the directory layout (I mean “topology” in the classical mathematical sense, i.e. “who is a neighbour of whom”).

As an example, consider this Cursive screenshot:

Look at all the different views we have of the same code:

The code buffer in the middle. It’s not just raw text: it’s syntax-highlighted, some code is collapsed, you can navigate to other parts of the code in one keypress, etc.
A full file path on top
A birds-eye view of the directory layout
A table of contents of the definitions in this file.
Usages of a particular programmatic name at the bottom
The REPL on the right also lets you call documentation functions that you can use to navigate, e.g. (source ...), (apropos ...) etc.
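Several of these views are also available from a plain REPL through clojure.repl. A short session, using clojure.string only as a convenient example namespace:

```clojure
(require '[clojure.repl :refer [doc source apropos dir]]
         '[clojure.string])

;; Find vars by name fragment across all loaded namespaces;
;; returns namespace-qualified symbols such as clojure.string/blank?
(apropos "blank")

;; List a namespace's public vars, like a table of contents.
(dir clojure.string)

;; Print a var's docstring and its source.
(doc clojure.string/blank?)
(source clojure.string/blank?)
```

`source` only works when the source file is on the classpath; for Clojure's own namespaces it is, since the sources ship in the Clojure jar.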

If I didn’t have all of those things, I suspect I would strike different tradeoffs in code organization.

In fact, I would hazard the following conjectures:

The code organization preferred by a programmer is correlated to the editor she uses
Developing custom, special-purpose “materialized views” of our codebases might be a fruitful direction in Programming Language R&D. We’re already doing it to an extent in the Clojure community, what with things like Reveal etc. But we might want to do it more consciously and purposefully.

To make matters worse, it’s not just about the tools, there are other factors:

Brain skills: Some programmers are very good at remembering file names. I’m not, and so I’m rewarded by code organization that doesn’t require me to remember many file names.
Project needs: a company that has a high churn of developers might particularly want to optimize the discoverability aspect, potentially at the expense of cohesion.

This leads us to the disappointing observation that, as is the case with abstraction, the merits of a particular code organization depend not just on the code itself, but on the environment in which it evolves (which involves tools and humans in all their irregular and incompatible glory).

Some concrete guidelines

I agree with your guideline that “when a namespace grows so large that it’s become unwieldy, split it.”
Exception to the above: when some function is obviously much more general and applicable than the specific domain of your current namespace, factor it out to another namespace, right away.
Runtime state should have no influence on code organization. (I’m looking at you, Mount!) It’s hard enough without adding this sort of constraints.
When domain-close code components cannot be in the same file (because it would be too big, or because it would make the logic less portable across runtimes), it makes sense to have a ‘star graph’ structure: one namespace expressing the core logic in a very portable way, and other namespaces, for particular applications of this domain, depending on it.
Namespaced keywords can do wonders for searchability.
Consider putting your front-end and back-end code in the same source directory. They won’t depend on the same libraries and cannot run on the same runtime, but it doesn’t matter if code is loaded in the right way (separating .clj from .cljs is a problem for the compiler, not for humans).
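A minimal sketch of the star-graph guideline combined with namespaced keywords, using hypothetical names. The hub holds the portable logic; in a real project it might be a .cljc file so both JVM and JavaScript builds can load it. Writing the qualified keywords out in full (rather than via `::` aliases) keeps them greppable: searching for `:pricing/qty` finds every producer and consumer.

```clojure
;; Hub: portable domain logic with no platform-specific dependencies.
(ns my-app.pricing)

(defn line-total [line]
  (* (:pricing/unit-price line) (:pricing/qty line)))

(defn order-total [lines]
  (reduce + 0 (map line-total lines)))

;; Spoke: a server-side application of the domain.
(ns my-app.pricing.invoice
  (:require [my-app.pricing :as pricing]))

(defn invoice [lines]
  {:invoice/lines lines
   :invoice/total (pricing/order-total lines)})

;; Spoke: a UI application of the same logic.
(ns my-app.pricing.ui
  (:require [my-app.pricing :as pricing]))

(defn total-label [lines]
  [:strong (str "Total: " (pricing/order-total lines))])
```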

I confess I’ve never fully lived up to the high principles I’m laying out here. Thanks for the question, it’s thought-provoking.

ericnormand · May 11, 2021, 3:44pm

Hello!

It’s been more than two weeks since I asked this question and I’ve read and thought about all of the replies. Thanks so much for helping me sort through this.

Here is my synthesis of the discussion. Some caveats: This is entirely done for my purposes of having an answer when I’m asked the question and potentially making a course on the topic. When an expert says “oh, don’t worry about it, just structure it however you want”, it is usually hiding a vast wealth of expertise on the topic. I want to put that expertise into a bundle that someone can open up and learn.

Also, I’m not including answers like “just use X” (such as Polylith, suggested by a couple of people). While Polylith helps organize your code, it requires a deep understanding of modular design, and that’s exactly what people asking me this question are lacking.

Finally, all of these are filtered through my own lens. This isn’t an objective summary. It’s stuff I agree with.

Let’s get to the summary. I’m trying to keep it short and concise. If you would like me to clarify anything, I would love to hear it.

Organization is YAGNI

Start with one namespace. Split it up only when it becomes clear that a namespace would help your goals.

If you are anxious about the organization, it’s the anxiety of freedom. Relax. You can easily re-organize it later.

Many expert Clojurists wind up with this practice after many years of experience.

Don’t organize a complex solution

A lot of organization exists to manage unnecessarily complicated code. Simplify your code before you worry about organizing.

Anything besides data structures, functions, and namespaces needs a good reason to exist.

Be clear on the goals of organizing

Why are you organizing? Be concrete.

Are you trying to make things easier to find? For whom?
Would you like to make it easier to read for a beginner?

Be careful: If your goal is “I want to understand my code again”, see “Don’t organize a complex solution” again.

Take a pragmatic view, not an idealistic one.

Understand before organizing

Organization reflects understanding.

If it’s unclear how to organize it, improve your understanding of the system: domain, external dependencies (APIs, etc.), and implementation.

Experiment! Even smart, experienced people try lots of things and find dead ends.

Rework the dependency graph

Organization means aligning a) code location (e.g., which file a fn is in) with b) the dependency graph (of functions and namespaces).

Understand how namespaces work and how to refactor to adjust the dependency graph.

Dependencies are structure. Organize so that the structure between namespaces makes sense.

Let’s say a namespace called my-app.album requires a namespace called my-app.album.art-api. Read that as "my-app.album needs to know about my-app.album.art-api". Does it make sense semantically? Opinions may differ. But I think no, it doesn’t make sense. album is a domain concept. It should not know about a technical detail like how to get the artwork from an API.

To fix it, you have to open up the namespace and find the functions from album that depend on functions from album.art-api. Refactor them to remove the dependency from album to album.art-api.

The exact refactorings depend on the code. Some possible ones that might apply:

move functions from album to art-api
move functions from art-api to a third namespace
reimplement functions in album
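A sketch of one possible end state, assuming the art lookup is the only coupling (names, keys and URL are hypothetical). It combines two of the refactorings listed: the pure part stays in album as a reimplemented function that takes the URL as an argument, and the orchestration moves to album.art-api, so the dependency now points from the technical detail to the domain.

```clojure
;; After the refactor: the domain namespace requires nothing.
(ns my-app.album)

(defn needs-artwork? [album]
  (nil? (:album/artwork-url album)))

(defn with-artwork
  "Pure domain fn: attach artwork, however it was obtained."
  [album artwork-url]
  (assoc album :album/artwork-url artwork-url))

;; Technical detail; now depends on the domain, not the other way around.
(ns my-app.album.art-api
  (:require [my-app.album :as album]))

(defn fetch-artwork-url
  "Hypothetical stand-in for an HTTP call to an artwork service."
  [a]
  (str "https://art.example.com/" (:album/id a) ".jpg"))

(defn ensure-artwork
  "Moved here from album: orchestration that needs the API."
  [a]
  (if (album/needs-artwork? a)
    (album/with-artwork a (fetch-artwork-url a))
    a))
```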

Name your functions and namespaces well to reflect what they should do.

Hierarchical structure (directories) of namespaces is not as important

Apply modularity

Create modules to encapsulate volatility. Use namespaces as module boundaries.

Tension between:

Keep things that change together close to each other.
Decouple things so they don’t need to change together.
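A minimal sketch of a namespace as a module boundary, with hypothetical names and made-up rates. Callers depend only on `convert`; the rate source, the volatile part, is private and can change (say, to an API call) without touching callers. Ratios are used to keep the arithmetic exact.

```clojure
(ns my-app.fx
  "Module boundary: callers use `convert`. Where rates come from is
  hidden here because it is the part most likely to change.")

(def ^:private rates
  ;; Hypothetical fixed rates relative to USD, as exact ratios.
  {:usd 1 :eur 1/2 :gbp 1/4})

(defn- rate [currency]
  (or (get rates currency)
      (throw (ex-info "Unknown currency" {:currency currency}))))

(defn convert
  "Public API: convert `amount` from one currency to another."
  [amount from to]
  (* amount (/ (rate to) (rate from))))
```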

This is a big topic. Probably where the most gains are had.

Likely area for the Clojure Effect to happen.

Apply big boundaries

When in doubt, remember the large boundaries that exist in most applications.

Technology/language
Domain
Business rules
External APIs

The way they are listed above, dependencies should point up, not down.
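One way to make “dependencies point up” checkable is to assign namespaces to layers and flag any edge that points to a lower layer. This is a hypothetical, data-driven sketch; in a real project the dependency map might be extracted from the ns forms, for example with the tools.namespace library. Layer order follows the list above; album and art-api come from the earlier example.

```clojure
(ns my-app.arch-check
  "Hypothetical check that namespace dependencies only point up the list:
  technology/language, domain, business rules, external APIs.")

(def layers [:tech :domain :rules :external])

(def layer-index (zipmap layers (range)))

(def ns->layer
  ;; Hypothetical assignment of namespaces to layers.
  '{my-app.util          :tech
    my-app.album         :domain
    my-app.pricing       :rules
    my-app.album.art-api :external})

(defn violations
  "Given {ns [required-ns ...]}, returns [from to] edges where `from`
  depends on a namespace listed further down (a lower layer)."
  [deps]
  (for [[from tos] deps
        to tos
        :when (> (layer-index (ns->layer to))
                 (layer-index (ns->layer from)))]
    [from to]))
```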

primer · May 11, 2021, 4:47pm

Likely area for the Clojure Effect to happen.

Clojure newbie here. What do you consider to be the “Clojure Effect”? Is there a consensus definition of this term as part of Clojure culture?

ericnormand · May 11, 2021, 4:59pm

Hi @primer,

Thanks for the question.

No. I just made it up.

What I mean by the “Clojure Effect” is that people often say “Clojure changes the way I think about programming—in a good way.” I believe it’s because Clojure exposes the underlying system to you and you are forced to work directly in it. That helps you build a better understanding of the system than you would if there were complicated layers of indirection between you and the system.

Two examples come to mind:

Ring does very little for you. It merely translates HTTP requests into a map and a map into HTTP responses. It has no opinion on HTTP methods, status codes, or body contents. Learning to use Ring requires learning HTTP deeply.
core.async is queues and processes. Learning to use it means learning to think about ordering, repetition, data flow, and timing—all important things for parallel and distributed programming.
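To illustrate how thin that layer is: a Ring handler is just a function from a request map to a response map, so it can be written and tested with no library at all. The routes below are hypothetical; the map keys (:request-method, :uri, :status, :headers, :body) are the ones in Ring's specification. Choosing 405 plus an Allow header is exactly the kind of HTTP decision Ring leaves to you.

```clojure
(ns my-app.http
  "A Ring-style handler: request map in, response map out.")

(defn handler
  [{:keys [request-method uri]}]
  (cond
    (and (= request-method :get) (= uri "/health"))
    {:status 200 :headers {"Content-Type" "text/plain"} :body "ok"}

    ;; Right path, wrong method: HTTP says 405 and list allowed methods.
    (= uri "/health")
    {:status 405 :headers {"Allow" "GET"} :body ""}

    :else
    {:status 404 :headers {} :body "not found"}))
```

Serving it is then a separate concern, for example via an adapter such as ring.adapter.jetty/run-jetty.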

In the context above, namespaces and functions are very thin constructs. They expose the complexities of modularity to you directly. According to the Clojure Effect, that would mean you could learn a lot about modularity in general by organizing Clojure code.

Rock on!
Eric

seancorfield · May 11, 2021, 5:27pm

This, so much! Our ten-year-old codebase shows this in several places. We’re constantly reorganizing older code as we come back to it with wiser eyes and new requirements.

John_Conti · May 11, 2021, 6:25pm

Honestly, despite the fine dialog above, this is dead simple. The object hierarchy which organizes OOP services is exactly emulated by the result of dependency injection libraries like Stuart Sierra’s Component and Weavejester’s Integrant. The central hierarchy or system object represents the static classes and the methods on those classes that wrap the state of the application and the stateful, side-effecting interfaces to the outside world. The rest of the app can then be pure functions of these objects, made concurrent and immutable by the STM.
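A library-free sketch of the idea, assuming hypothetical names: one system map is built at startup and owns the stateful parts, pure functions do the deciding, and the few side-effecting functions receive the system explicitly. Libraries such as Component and Integrant formalize the start/stop lifecycle and dependency ordering, which this sketch omits; the atom and fixed clock stand in for real resources.

```clojure
(ns my-app.system
  "One system map owns the stateful parts; the rest is pure.")

(defn start-system
  "Hypothetical: builds the stateful pieces once, at startup."
  [config]
  {:config config
   :db     (atom {})          ; stands in for a connection pool
   :clock  (fn [] 1000)})     ; stands in for a real clock

;; Pure: decides what to store, no side effects.
(defn new-note [text now]
  {:note/text text :note/created-at now})

;; Edge: the only function that touches state, given the system explicitly.
(defn save-note! [{:keys [db clock]} text]
  (let [note (new-note text (clock))]
    (swap! db (fn [m] (assoc m (count m) note)))
    note))
```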

kumarshantanu · May 14, 2021, 12:46pm

Thanks to everybody for sharing their understanding - many apt and valuable suggestions in this thread.

I had several ideas on this topic, so I spent some time collecting my current thinking into a blog post: Organizing Clojure code with Functional Core, Imperative Shell
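For readers new to the pattern, a minimal sketch of functional core, imperative shell in general (not code from the post), with hypothetical names: the core decides which reminders are due and what to send; the shell loads data, calls the core and performs the effects. Passing the effectful functions in as arguments keeps the shell easy to exercise in tests.

```clojure
(ns my-app.reminders)

;; Functional core: pure decision logic, trivially testable.
(defn due-reminders [reminders now]
  (filter #(<= (:reminder/at %) now) reminders))

(defn reminder->message [r]
  {:to (:reminder/email r) :subject "Reminder" :body (:reminder/text r)})

;; Imperative shell: gathers inputs, calls the core, performs effects.
(defn run-once!
  [load-reminders send-message! now]
  (let [msgs (map reminder->message (due-reminders (load-reminders) now))]
    (doseq [m msgs] (send-message! m))
    (count msgs)))
```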

Please feel free to comment and share feedback.

Shantanu

ericnormand · May 14, 2021, 1:58pm

Great article, @kumarshantanu