One of the biggest advantages of static typing is how IDEs can leverage it to manage large codebases. And I’ve heard it said frequently that Clojure is best suited to small teams.
How can large teams and/or large codebases in Clojure be managed?
I’d submit that the perspective of the Clojure community is that there’s an important difference between large codebases and complex codebases. Static typing and IDEs offer tools that help with some aspects of complexity, but Clojure offers mechanisms to avoid complexity, even in a larger codebase.
To put a finer point on that, a codebase becomes complex when it has lots of interwoven components mutating over time. Clojure encourages us to unweave these components where possible, use immutable data so things don’t change out from under us, use standard data structures so everything is inspectable, and invoke simple, consistent concurrency semantics when things truly must change. It doesn’t matter so much that a codebase has 300 types of records when you have immutable data and referential transparency—you need to understand the records that are in lexical scope, and you can forget about the others because they aren’t involved.
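A minimal sketch of what that looks like in practice. The `order` map, the `apply-discount` function, and the `inventory` atom are made-up names for illustration, not taken from any real codebase.

```clojure
;; Hypothetical data and function names, for illustration only.
(def order {:id 1 :subtotal 100})

(defn apply-discount
  "Pure function: returns a new map and never modifies its argument."
  [order amount]
  (update order :subtotal - amount))

(def discounted (apply-discount order 10))

;; `order` is unchanged, so code that holds it can ignore `apply-discount`.
order       ;=> {:id 1, :subtotal 100}
discounted  ;=> {:id 1, :subtotal 90}

;; When something truly must change, the change goes through one explicit
;; reference type with well-defined semantics (here, an atom).
(def inventory (atom {:widgets 3}))
(swap! inventory update :widgets dec)
@inventory  ;=> {:widgets 2}
```

Only `inventory` can ever look different from one read to the next; everything else is a plain value you can inspect at the REPL.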
In terms of pragmatic suggestions, I think it’s mostly pretty standard stuff: code reviews, automated testing, continuous integration, and well-defined interfaces (microservices, spec, Schema, good namespacing, queues, etc.).
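As one possible illustration of a well-defined interface, here is a small clojure.spec sketch (Clojure 1.9 or later). The `::user` shape and the `valid-user?` helper are assumptions made up for this example.

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical specs describing the data that crosses a namespace boundary.
(s/def ::id pos-int?)
(s/def ::email (s/and string? #(re-find #"@" %)))
(s/def ::user (s/keys :req-un [::id ::email]))

(defn valid-user?
  "True when m conforms to the ::user spec."
  [m]
  (s/valid? ::user m))

(valid-user? {:id 42 :email "a@example.com"})  ;=> true
(valid-user? {:id -1 :email "nope"})           ;=> false
```

When a check fails, `s/explain` on the same spec and value reports which key was at fault, which is useful at the boundary between teams or services.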
EDIT: a couple others that are a bit more Clojure-specific:
In no particular order (all are important):
I agree with people here. As long as you limit side effects to the boundaries of your program and use immutable data structures everywhere possible (which is mostly everywhere), immutability will protect you. Then it’s a matter of good software engineering practices like @didibus said: good code naming/organization, separation of concerns, etc.
And the REPL (with the immutable data structures) really is the secret weapon. On my team I’m unfortunately the only one who uses a REPL-connected editor and knows some tooling to inspect/debug code, and I really feel I’m way faster than my workmates when it comes to debugging. Hopefully this will change, as I’ll be teaching them some REPL-fu soon. (I (over)use @vvvvalvalval’s scope-capture and sometimes datascope; their combination is just… futuristic for the rest of the world.)
Note that the good engineering practices are the same as in any other language; the REPL just puts Clojure above most of them.
I don’t have much to add beyond what people here have noted, but I wrote a blog post on some of the issues I’ve come across in large codebases. TL;DR: the developer disciplines mentioned here (REPL-based dev, SRP, judicious use of Spec/Schema, etc.) are very important. http://devcycle.co.uk/clojure-is-the-devil/
Best practices from the largest personal Clojure project (Lin Pengcheng Financial Analyser):
1. IDE: Notepad++ (ClojureBoxNpp)
2. Version control: 7z.exe
3. Programming ideas (PurefunctionPipelineDataflow), which imitate the following:
   - Imaginative programming: everything is an algorithm, at your fingertips.
   - The most valuable chapter of “Code Complete”: Chapter 2, Metaphors for a Richer Understanding of Software Development
   - Business management thinking
   - Pipeline technology for large industrial production
   - Business process reengineering
   - Enterprise organization, system, and process design thinking
   - Accounting
   - Integrated circuit diagrams
   - Urban water networks
   - Boeing aircraft pulse production line technology
   - The confluence of rivers from the source to the sea
4. Be data-centric and dataflow-oriented: design a data model that is simple and fluent to manipulate. A straight line is the shortest path between two points, so manipulate the data directly from its initial state to its final state.
5. Pure Clojure.
6. Don’t use OO, FP, AOP. They are overly complex hand-workshop-level technologies.
7. Don’t write middleware, macros, or loops. They are hard to read and difficult to debug and observe.
8. REPL-driven development.
9. Try to design each function as a pure function (pipe function) that takes a single hash-map parameter.
10. Minimize front-end code.
11. Side effects can only appear at the end of the pipe.
12. Try to use threading macros.
13. Code linearization, schematicization, simplification. What You See Is What You Get.
14. Use namespaces to achieve good code structure.
16. Data verification only appears at the beginning of the pipeline.
18. Use and design “simple DSLs”, like hiccup, honeysql, etc. Using a DSL is code conversion, and a data-style representation is better than a function-style representation. A series of pipeline functions is chained together to form a compiler that converts the DSL data into target code and then evaluates it.
19. The best abstraction is this: data and logic are strictly separated, the data flow is the current, a function is a chip, a threading macro (->>, -> etc.) is a wire, and the entire system is an integrated circuit that is energized.
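To make the "simple DSL as data" point a bit more concrete, here is a toy sketch of a renderer that converts hiccup-style vectors into an HTML string. It is not hiccup's implementation or API; the `render` function is a hypothetical name, and it deliberately skips escaping and other details a real library handles.

```clojure
(require '[clojure.string :as string])

(defn render
  "Converts a hiccup-like vector [tag attrs? & children] into an HTML string.
  Toy code for illustration: no escaping, no void elements."
  [node]
  (cond
    (string? node) node
    (vector? node)
    (let [[tag & more] node
          attrs    (when (map? (first more)) (first more))
          children (if attrs (rest more) more)
          attr-str (string/join (for [[k v] attrs]
                                  (str " " (name k) "=\"" v "\"")))]
      (str "<" (name tag) attr-str ">"
           (string/join (map render children))
           "</" (name tag) ">"))
    :else (str node)))

(render [:ul {:class "todo"} [:li "write"] [:li "test"]])
;=> "<ul class=\"todo\"><li>write</li><li>test</li></ul>"
```

Because the page is an ordinary vector, it can be built, inspected, and transformed with the usual sequence functions before it is ever turned into target code.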
Another yes to everything that @camdez and @didibus said and I’ll particularly call out good namespace naming and organization (something that we weren’t very good about when we started in 2011 but are increasingly getting better at now). That latter area is where we could all do with a lot more guidance and written articles, I think. Many of the other bullet points mentioned are much more straightforward.
I guess there’s also the question of what is a “large” codebase in Clojure. There was a talk at Conj last year (I think) about a “large” codebase that was in the 30-40K range. Here’s the stats on our codebase (we run this every week and track the output so we can see code growth – or shrinkage – over time):
Clojure build/config 47 files 2532 total loc
Clojure source 260 files 61555 total loc,
    3278 fns, 673 of which are private,
    383 vars, 42 macros, 60 atoms,
    468 specs, 19 function specs.
Clojure tests 147 files 19176 total loc,
    23 specs, 1 function specs.
The build/config total includes both our (large) build.boot file and all our EDN files (both for configuration and for dependencies; we manage those external to Boot in a precursor to deps.edn).
What did you use to produce those numbers? It doesn’t look like cloc.
It’s just a shell script that finds certain types of files and uses wc and fgrep. I knocked it up originally to track aspects of our legacy codebase nearly ten years ago and added tracking for Clojure as we started to use that as well.
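For anyone who wants similar numbers without a shell script, the same idea can be roughly sketched in Clojure. This is not the script described above; `count-occurrences`, `source-stats`, and `dir-stats` are hypothetical names, and the counting is plain text matching on `(defn` and `(defn-`, so it is only approximate.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as string])

(defn count-occurrences
  "Counts non-overlapping occurrences of the literal needle in text."
  [text needle]
  (count (re-seq (re-pattern (java.util.regex.Pattern/quote needle)) text)))

(defn source-stats
  "Summarises a collection of source strings.
  :fns counts every \"(defn\", which includes the private \"(defn-\" forms."
  [sources]
  {:files       (count sources)
   :loc         (reduce + (map #(count (string/split-lines %)) sources))
   :fns         (reduce + (map #(count-occurrences % "(defn") sources))
   :private-fns (reduce + (map #(count-occurrences % "(defn-") sources))})

(defn dir-stats
  "Applies source-stats to every .clj file found under dir."
  [dir]
  (->> (file-seq (io/file dir))
       (filter #(and (.isFile %) (string/ends-with? (.getName %) ".clj")))
       (map slurp)
       source-stats))

(source-stats ["(defn a [] 1)\n(defn- b [] 2)\n" "(def x 1)\n"])
;=> {:files 2, :loc 3, :fns 2, :private-fns 1}
```

Running something like `(dir-stats "src")` on a schedule and keeping the output gives the same kind of growth tracking over time.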
“Largest” according to what metric, out of curiosity? LoC / number of contributors / scale of deployment / … ?
“Largest” according to LoC. I have only written one formal project; this one is a personal amateur project.
It is not based on OO or FP. It uses its own pure function pipeline dataflow programming technique, which is based on ideas from large-scale industry. I think it is a better technique than OO and FP, which are just hand-workshop-level technologies.
Version in development (rewrite, based on Luminus, pure Clojure(Script)):
clojure: 34k+ lines (includes a bit of code for REPL testing)
clojurescript: 5k+ lines (includes a bit of code for REPL testing)
Last version (pure clojure-clr, .NET WinForms app):
clojure-clr: 25k+ lines (doesn’t include test code)
There will be a talk about a large codebase in Clojure at the ClojuTRE conference.
Now, after 6 years, 120 000 end-users, 10 000 permits applied monthly, well over 40k commits made by 25 programmers over the time resulting over 125k lines of Clojure code, it’s interesting to take a look how this controversial and risky language has served us.
ClojuTRE & SmallFP Goes Helsinki. Will be recorded.
My project is now a 7-year-old Clojure project. I have too little spare time for development, so for now it can only be regarded as the largest personal project.
In the future, perhaps it can strive to be the project with the most end users.
Can you elaborate on that, please?
Of course. It’s a little hard to explain, but I’ll try my best.
What I mean by this is that you want to split up your logical operations and their control flow.
Say A, B, and C are the operations you need performed. Let’s assume none of them needs any input data and each simply prints its name.
(defn A [] (println "A"))
(defn B [] (println "B"))
(defn C [] (println "C"))
Now say you want to print ABC? It’s often tempting to do:
;; Forward declarations, so A can refer to B (and B to C) before they are defined.
(declare B C)
(defn A [] (println "A") (B))
(defn B [] (println "B") (C))
(defn C [] (println "C"))
(A)
Creating a deep/nested call stack, where you’ve coupled your operations and their flow together.
Instead favor a shallow/flat call stack:
(defn A [] (println "A"))
(defn B [] (println "B"))
(defn C [] (println "C"))
(do (A) (B) (C))
With top level orchestration.
Hope that clarifies it.
Keep the pipeline as flat as possible.
Let the reader see the flow of the code at a glance, like a company’s business process.
Data verification only appears at the beginning of the pipeline.
Side effects can only appear at the end of the pipeline.
Normalize data.
(some->> data-map
         opt-valid-pure-fn
         pipe-normalize-data-pure-fn
         (map pipe-transform-data-pure-fn1 ,)
         pipe-transform-data-pure-fn2
         (reduce pipe-transform-data-pure-fn3 {} ,)
         opt-side-effects-fn)
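A concrete, runnable instance of that template might look like the sketch below. Every function name here is hypothetical and only gives the placeholder steps something to do. It uses `some->` rather than `some->>` because each step takes the single map as its only argument, so the two macros behave the same.

```clojure
(require '[clojure.string :as string])

(defn validate
  "Returns the map when it is usable, otherwise nil so some-> stops early."
  [m]
  (when (and (string? (:name m)) (seq (:prices m))) m))

(defn normalize [m] (update m :name string/trim))            ; pure
(defn add-total [m] (assoc m :total (reduce + (:prices m)))) ; pure

(defn report!
  "The only side effect: prints a summary, then returns the map."
  [m]
  (println (:name m) "owes" (:total m))
  m)

(defn process [m]
  (some-> m
          validate    ; data verification only at the beginning
          normalize
          add-total
          report!))   ; side effects only at the end

(process {:name " Ann " :prices [10 20]})  ; prints "Ann owes 30"
(process {:name nil :prices []})           ;=> nil, nothing is printed
```

Each step takes one hash-map and returns one hash-map, so any prefix of the pipeline can be evaluated at the REPL to inspect the intermediate value.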
(defn f [{:keys [x y z] :as m}]
  (-> (> x 1)
      (and , (< y 10))
      (and , (> z 99))
      (if , :t :f)))
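;; The code below assumes clojure.string is aliased as `string`.
(require '[clojure.string :as string])

(defn path-combine [s1 s2]
  (cond
    ;; an absolute s2 replaces s1 entirely
    (string/starts-with? s2 "/")
    s2
    ;; s1 does not end in a slash: drop its last segment and retry
    (not (string/ends-with? s1 "/"))
    (-> (string/split s1 #"[\\/]")
        butlast
        (#(string/join "/" %))
        (str , "/")
        (path-combine , s2))
    ;; s1 is a directory: join, then collapse repeated separators
    :else
    (-> (string/join "/" [s1 s2])
        (string/replace , #"[\\/]+" "/"))))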