Why Clojure over Python?

In order from most to least important: immutable data structures, improved performance, better concurrency, dialects for both client and server, and a more sophisticated REPL.

My top gripe about Python: Python’s dictionaries only permit hashable (in practice, immutable) data as keys (which is smart), but Python has very limited ways of constructing immutable data (which is incredibly constraining). This severely restricts the utility of dictionaries and the ability to nest data.
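To make the contrast concrete, here is a small sketch with made-up data. Clojure’s collections are immutable and hash by value, so vectors, maps, and sets can be used directly as map keys, at any nesting depth, without first converting them into a special frozen form.

```clojure
;; Hypothetical data: immutable collections hash by value,
;; so vectors, maps, and sets all work directly as map keys.
(def distances
  {[0 0] 0.0
   [3 4] 5.0})

(def prices
  {{:city "Lisbon" :year 2020} 100
   {:city "Lisbon" :year 2021} 110})

;; Lookups work with any structurally equal value, not just the original instance.
(get distances [3 4])                    ;; => 5.0
(get prices {:year 2021 :city "Lisbon"}) ;; => 110

;; Keys can themselves contain nested collections.
(def nested {#{:a :b} {[1 [2 3]] :found}})
(get-in nested [#{:b :a} [1 [2 3]]])     ;; => :found
```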

The main things I miss about Python:

  1. Highly readable syntax
  2. Great interop with some high-quality C-based numerics packages

Overall, though, I’m far more productive in Clojure. It’s especially strong for any use case that one might describe as “lightweight data modeling” requiring a “data-oriented language”.

Thanks for explaining @mars0i. I’m not as familiar with static FP languages as I am with dynamic ones, so it’s great to learn more about the differences :slight_smile:

As someone who would like to use Clojure more often, it pains me to say it’s often hard to make the case.

In particular, Python’s performance for most use cases is very good: if you use NumPy, you’re effectively doing computations in C. You can parallelize a lot of problems simply with the multiprocessing module. Worse, a naive Clojure implementation can sometimes be slower than pure Python.

Also, compared to Python, the I/O part is still painful. Wes McKinney, the creator of pandas, has himself said that a large part of the package’s popularity comes from the ease of use of the read_csv / to_csv functions. I’m sure that’s a barrier to entry in Clojure for many.

I think Clojure’s advantage is in live environments where you receive a lot of data asynchronously (tracking a production line, trading on financial markets): that is where the multithreading abilities and the safety of immutable data structures shine.
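As a rough illustration of that point (a sketch, not taken from the thread), several threads can record incoming events into shared state held in an atom. `swap!` applies a pure update function to an immutable value and retries on contention, so no update is lost and any reader sees a consistent snapshot. The event shape and the `record-event` name are hypothetical; plain Java threads are used so the example exits promptly, but futures would behave the same way.

```clojure
;; Hypothetical shared state: per-station event counts plus a running total.
(def state (atom {}))

(defn record-event
  "Pure update function: bump the counter for station s and the total."
  [m s]
  (-> m
      (update s (fnil inc 0))
      (update :total (fnil inc 0))))

;; Four producer threads, each delivering 1000 events into the same atom.
(def threads
  (mapv (fn [s]
          (doto (Thread. ^Runnable (fn [] (dotimes [_ 1000]
                                            (swap! state record-event s))))
            (.start)))
        [:a :b :c :d]))

;; Wait for every producer to finish.
(run! #(.join ^Thread %) threads)

@state ;; => {:a 1000, :b 1000, :c 1000, :d 1000, :total 4000} (key order may vary)
```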

It sounds like you’re specifically talking about data science experimentation use cases?

I think that would fall under the “already available mature libraries” criterion, which I count as contributing to faster “speed of delivery” and “speed of change”. Depending on the problem at hand, I feel Clojure would sometimes win there, and other times Python would.

That said, you are 100% correct. Python has way more available and mature libraries for data science. I don’t think you should even make a case for using Clojure over Python for non-production data science, unless your team is very strong on the software engineering side and willing to pioneer the tech on its own.

Agreed, I was mostly talking about data research / modelling.

You are making some good points and I had to quote your entire post, because I will be following along now:

First, the performance of highly optimized matrix libraries is indeed excellent, and NumPy provides a nice interface. That said, I found that, at least in the kinds of data analysis problems I encountered, nearly 80% of the effort was spent on pre-processing, cleaning up the data, extracting features, interactive analysis, and similar tasks. The actual computation took so little time (measured as a percentage of total wall-clock time on a client project) that it really didn’t matter. Still, I did sometimes use direct interfaces to matrix libraries.

As to I/O: I agree it isn’t easy, although these days I find that a simple (drop 1 (csv/read-csv (io/reader (BOMInputStream. (io/input-stream filename)) :encoding "UTF-8"))) gets me very far. It also produces a lazy sequence, which means you don’t have to hold all the data in memory (as long as you consume it while the reader is still open).

It’s interesting that you mentioned receiving data asynchronously. I had a BIG advantage when I worked as a data science consultant/freelancer. When I wrote a solution, I would start with data files from the client (logs, data dumps, etc). I would then build a solution using transducers and core.async (pipeline for quick&easy parallel processing). This would produce data for a visualization dashboard, written in ClojureScript and shipped as an Electron app.

So, what was the BIG advantage? Well, I shipped the app and told the client “now, if you’d like to give me a real-time feed of your data, in whatever form, the entire real-time data-processing pipeline is already there, so all you’re seeing could be turned into a production app/tool and updated in real-time”. That was a real eye-opener!
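The reuse described here comes from transducers being independent of where their input comes from. A minimal sketch, with a hypothetical "machine-id,temperature" record format and threshold: the same `xform` is applied to an in-memory batch, and the same value could be handed unchanged to `clojure.core.async/pipeline`, which takes a parallelism level, an output channel, a transducer, and an input channel, to process a live feed.

```clojure
(require '[clojure.string :as str])

;; Hypothetical log lines of the form "machine-id,temperature".
(defn parse-line [line]
  (let [[id temp] (str/split line #",")]
    {:id id :temp (Double/parseDouble temp)}))

;; One transducer describing the whole processing chain,
;; with no reference to where the data comes from.
(def xform
  (comp (remove str/blank?)
        (map parse-line)
        (filter #(> (:temp %) 80.0))
        (map :id)))

;; Batch mode: data dumps from the client, loaded as a collection.
(def sample ["m1,75.5" "" "m2,91.0" "m3,82.3"])

(into [] xform sample) ;; => ["m2" "m3"]

;; Real-time mode (sketch only; needs core.async on the classpath,
;; and out-chan / in-chan are hypothetical channels):
;; (require '[clojure.core.async :as a])
;; (a/pipeline 4 out-chan xform in-chan)
```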

I think most discussions of “programming language X vs Y” are superficial, focusing mostly on syntax, “ease of starting”, and available libraries. Not to mention the ridiculous, time-wasting discussions of “static vs dynamic typing” or “startup time”. The differences run deep, and most individual advantages are small, but they make a huge difference when taken together, especially when you are on a schedule and a budget and need to ship real apps.

BTW, I mean no offense to the original poster, but — I am looking at the “comparison points” mentioned, and I think most of them are irrelevant from my point of view. I don’t care about “easiness” or “familiarity” (as Rich said: “instruments are made for people who can play them”), I don’t care that much about performance, I don’t do any generative testing. I care about building reliable and maintainable solutions on a schedule and a budget, and Clojure is just the right tool for that.

In my case, I use Clojure not because of any single advantage, but because of dozens of advantages. Transducers (very under-appreciated), core.async, tesser, lazy sequences, ClojureScript, EDN, spec, single data format for both server- and client-side, immutable data structures, sequence library — all of this plays together to create an environment which simply can’t be reproduced in Python.

I just parse CSVs and similar delimited files myself now, using line-seq. I found that to be much more reliable for some reason.

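(require '[clojure.java.io :as io]
         '[clojure.string :as str])
;; mapv realizes every row before with-open closes the reader;
;; a lazy map would throw "Stream closed" when the rows are consumed later.
(with-open [rdr (io/reader "my/file.csv")]
  (->> (line-seq rdr)
       (mapv #(str/split % #","))))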

If the file has issues, I can just be smarter in the split and handle them in whatever way works for that file. After that, I can also add my own field parsing if I need to convert a string into something else.

It’s pretty short and quick, and I feel it can easily accommodate any kind of broken file set I’d need to handle.
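The “own field parsing” step mentioned in this post might look like the following sketch. It assumes a hypothetical file whose columns are a name, an integer quantity, and a decimal price; rows are split the same way as in the snippet above, then each field goes through its column’s parser.

```clojure
(require '[clojure.string :as str])

;; Hypothetical column layout: name, quantity, price.
(def column-parsers
  [identity
   #(Long/parseLong (str/trim %))
   #(bigdec (str/trim %))])

(defn parse-row
  "Apply each column's parser to the corresponding field."
  [fields]
  (mapv (fn [parse field] (parse field)) column-parsers fields))

(defn parse-lines [lines]
  (->> lines
       (map #(str/split % #","))
       (mapv parse-row)))

(parse-lines ["widget, 3, 9.99" "gadget,10,0.50"])
;; => [["widget" 3 9.99M] ["gadget" 10 0.50M]]
```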

I stopped trying to split the CSVs myself when I noticed that there are all kinds of corner cases — as an example, the code above will not handle quoted commas (inside quoted fields) correctly. I’d much rather use clojure.data.csv which gets most things right and lets me set the separator and the quote character.
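To show the quoted-comma problem concretely, the sketch below splits a line naively and then with a toy quote-aware splitter. `split-quoted` is a hypothetical helper written for illustration, not clojure.data.csv’s implementation: it handles commas inside double quotes and doubled quotes (`""`) within a quoted field, but not fields spanning multiple lines or a configurable separator.

```clojure
(require '[clojure.string :as str])

(def line "id,\"Smith, John\",42")

;; Naive splitting breaks the quoted field apart:
(str/split line #",")
;; => ["id" "\"Smith" " John\"" "42"]

(defn split-quoted
  "Toy single-line CSV splitter: commas inside double quotes are kept,
  and a doubled quote (\"\") inside a quoted field becomes one quote."
  [^String s]
  (loop [i 0, field (StringBuilder.), fields [], in-quotes? false]
    (if (= i (count s))
      (conj fields (str field))
      (let [c (.charAt s i)]
        (cond
          ;; Escaped quote inside a quoted field.
          (and in-quotes? (= c \") (< (inc i) (count s)) (= \" (.charAt s (inc i))))
          (recur (+ i 2) (.append field \") fields true)

          ;; Opening or closing quote.
          (= c \")
          (recur (inc i) field fields (not in-quotes?))

          ;; Separator outside quotes ends the current field.
          (and (= c \,) (not in-quotes?))
          (recur (inc i) (StringBuilder.) (conj fields (str field)) false)

          :else
          (recur (inc i) (.append field c) fields in-quotes?))))))

(split-quoted line)
;; => ["id" "Smith, John" "42"]
```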

The BOMInputStream (which comes from org.apache.commons.io.input) is another thing learned from years of dealing with CSV in the wild. Some files will contain a BOM (byte-order mark) at the beginning and you’ll need to handle that.
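If commons-io isn’t on the classpath, one alternative (a sketch under the assumption that the file is UTF-8, and not how BOMInputStream works internally) is to decode the file normally and drop a leading U+FEFF character, which is what a UTF-8 BOM becomes after decoding; Java’s UTF-8 decoder does not remove it for you. The helper names are hypothetical.

```clojure
(require '[clojure.java.io :as io])

(defn strip-bom
  "Remove a leading U+FEFF (a UTF-8 byte-order mark once decoded), if present."
  [^String s]
  (if (and (seq s) (= \uFEFF (.charAt s 0)))
    (subs s 1)
    s))

(defn read-lines-utf8
  "Read all lines of a UTF-8 text file eagerly, ignoring a BOM on the first line."
  [path]
  (with-open [rdr (io/reader path :encoding "UTF-8")]
    (let [[first-line & more] (line-seq rdr)]
      (if first-line
        (into [(strip-bom first-line)] more)
        []))))
```

Because the lines are collected into a vector inside `with-open`, the result stays usable after the reader is closed.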

Most OO systems are just simulations of surface-level, real-world phenomena, and the resulting systems are a mess. I don’t think simulating the real world is a good way to do OO; instead, a system should be designed correctly, using an abstract, refined data model as its prototype. R’s ggplot2 is an example: the system is clear, with a well-designed data model as its prototype. So a good OO system tends toward a data-flow system, and I think ggplot2 would probably have been designed as a data-driven plotting system if OO hadn’t been in vogue at the time. :slight_smile:

So I think Python’s OO approach only encourages casual design; it is not suitable for building large, serious, formal systems.

Few people build systems purely as pipelines, so I’m glad that you also like pipeline systems. So do I: my pipeline method rests on the pillars of “pure functions, pipelines, data flow, and relational theory”. The article is a bit long, but you can find it here: The Pure Function Pipeline Data Flow

I’m not disagreeing with the remarks about Python. I want to point out that OO was supposed to be a good way to model the world, but the world doesn’t only include properties that attach to individual things; it also includes relations that can depend on the things that are related. This means that in, e.g., Java, people have had to do awkward things in which a method depends not just on the class it’s attached to, but also on its arguments. (Some less popular object systems, such as the Common Lisp Object System, don’t have this problem, and Clojure multimethods and protocols provide some similar functionality.)
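For readers who haven’t seen behavior that depends on more than one argument, here is a minimal sketch of a Clojure multimethod whose dispatch function looks at both arguments. The `collide` name and the entity kinds are hypothetical, chosen only to show double dispatch.

```clojure
;; Dispatch on the :kind of *both* arguments, not just the first one.
(defmulti collide (fn [a b] [(:kind a) (:kind b)]))

(defmethod collide [:asteroid :ship]     [_ _] :ship-destroyed)
(defmethod collide [:ship :asteroid]     [_ _] :ship-destroyed)
(defmethod collide [:asteroid :asteroid] [_ _] :asteroids-merge)
(defmethod collide :default              [_ _] :bounce)

(collide {:kind :ship} {:kind :asteroid})     ;; => :ship-destroyed
(collide {:kind :asteroid} {:kind :asteroid}) ;; => :asteroids-merge
(collide {:kind :ship} {:kind :ship})         ;; => :bounce
```

No method belongs to either argument’s “class”; the relation between the two values selects the implementation.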

I think different people have different ways of thinking, so they prefer different methods.

The various design patterns for OO and FP are too complex and unnecessary for me. I don’t want to see complex associations between objects, and I don’t want to use them.

To me, Clojure multimethods are just an elegant conditional-branching construct, and I haven’t used Clojure’s various OO features, including protocols.

I tend to construct systems with the simplest concepts and the most basic techniques, syntax, and functions. For implementing my ideas, the linear structure of a pipeline is the simplest: each pipe is an independent pure function, and as long as its input and output conform to the agreed data-interface standards, you can connect it and not worry about anything else. If a bug occurs, it is as obvious as a leak in a water pipe. For me, this is the easiest way in the world; I just want to go straight from the starting point to the end. I don’t want to use traditional OO and FP anymore.
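The pipeline style described here can be sketched as a chain of small pure functions, each taking and returning plain data, threaded together with `->>`. The stage names, fields, and data are hypothetical; the point is that each stage can be tested on its own, and a faulty stage is easy to locate by inspecting what goes into and comes out of it.

```clojure
(require '[clojure.string :as str])

;; Each stage is a pure function from a sequence of maps to new data.
(defn normalize-names [rows]
  (map #(update % :name (comp str/lower-case str/trim)) rows))

(defn drop-invalid [rows]
  (filter #(pos? (:qty %)) rows))

(defn add-totals [rows]
  (map #(assoc % :total (* (:qty %) (:price %))) rows))

(defn summarize [rows]
  {:count   (count rows)
   :revenue (reduce + 0 (map :total rows))})

;; The whole system is a straight line from input to output.
(defn run-pipeline [rows]
  (->> rows
       normalize-names
       drop-invalid
       add-totals
       summarize))

(run-pipeline [{:name " Apple " :qty 2 :price 3}
               {:name "PEAR"    :qty 0 :price 5}
               {:name "fig"     :qty 1 :price 4}])
;; => {:count 2, :revenue 10}
```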

There was a great poet in China, Bai Juyi, whose poetry even illiterate people could understand and appreciate. I hope that my code can be understood by junior programmers, even in the most complicated systems.

For me, it’s the opposite. I didn’t mean that my example was my new generic CSV parser. I meant that clojure.data.csv and others have always failed me at some point, because they all assume correct comma-separated semantics. What about when I have a file with two different types of escape characters? Or mixed encodings? Or uneven columns? Or decimals with currency characters in front? Etc. Sometimes there are even missing newlines!

That’s where I found that, by doing it myself, I can very quickly and easily handle all the corner cases specific to the file I’m parsing.

Clojure makes writing a custom parser super simple and quick.
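As a sketch of that kind of file-specific handling (the `;` separator, the currency symbols, the column count, and all helper names are assumptions for illustration), here is a parser that strips a leading currency symbol from amounts and pads short rows so that every row yields the same number of columns.

```clojure
(require '[clojure.string :as str])

(defn parse-amount
  "Parse strings like \"$12.50\" or \"€ 3.10\" into BigDecimals;
  returns nil for blank fields. Assumes '.' as the decimal separator."
  [s]
  (let [digits (str/replace (str/trim s) #"^[$€£]\s*" "")]
    (when-not (str/blank? digits)
      (bigdec digits))))

(defn pad-row
  "Pad a row with empty strings up to n columns (longer rows are left as-is)."
  [n row]
  (into row (repeat (- n (count row)) "")))

(defn parse-messy [lines n-cols]
  (->> lines
       (map #(str/split % #";" -1))      ; -1 keeps trailing empty fields
       (map (partial pad-row n-cols))
       ;; Only the first two columns are kept; extra columns are ignored.
       (mapv (fn [[item amount]] [item (parse-amount amount)]))))

(parse-messy ["coffee;$3.50" "tea;€ 2.10;extra" "water"] 2)
;; => [["coffee" 3.50M] ["tea" 2.10M] ["water" nil]]
```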

Seriously, have you looked at data.csv’s implementation? It’s in data.csv/src/main/clojure/clojure/data/csv.clj on the master branch of clojure/data.csv on GitHub.

It’s 146 lines of code, half of which are comments, and half of the remaining code is for writing CSVs. So only about a quarter of it is for reading and parsing CSV files.

How awesome is that!

The protocol is amazing too (if you’re used to a class/interface tree with 50+ nodes and 5+ levels from Java or something):

(defprotocol Read-CSV-From
  (read-csv-from [input sep quote]))
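For those who haven’t used protocols, here is a minimal sketch of how a protocol like this one is implemented for several existing Java types with `extend-protocol`. The `LinesFrom` protocol and its single method are hypothetical and much simpler than data.csv’s `Read-CSV-From`; they only show the mechanism: one small interface, extended to existing classes without any class hierarchy.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

(defprotocol LinesFrom
  (lines-from [input] "Return the lines of input as a vector of strings."))

(extend-protocol LinesFrom
  String
  (lines-from [s] (str/split-lines s))

  java.io.Reader
  (lines-from [r] (vec (line-seq (io/reader r))))

  java.io.File
  (lines-from [f]
    ;; io/reader returns a BufferedReader, which dispatches to the Reader case.
    (with-open [r (io/reader f)]
      (lines-from r))))

(lines-from "a,b\nc,d")                     ;; => ["a,b" "c,d"]
(lines-from (java.io.StringReader. "x\ny")) ;; => ["x" "y"]
```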

As expected in a Clojure channel, I’ve noticed the balance skewing towards Clojure.
I think the question is too broad. To properly answer “Clojure or Python?”, we need to narrow the context a bit more.

Factors that are relevant:

  • what is the composition of your team? What is their background?
    Python might be easier to teach. Clojure has better functional support.

  • what problem are you solving? How much does it depend on an ecosystem of libraries?
    Python was designed for easy embedding (like Lua) or re-using C libraries.
    Clojure plays nicer in the JVM realm. Both have great support on the CLR/.NET.
    Python is a de facto standard for data science; Clojure is taking its first steps.
    Clojure is better at leveraging multithreading; Python is much more lightweight.

  • what is your deadline?
    Clojure needs more craft. Python has a faster development cycle.

If you (the team) master the tool (the programming language and its ecosystem), almost anything can be accomplished. I recommend always investigating the problem first.
Both languages are worth having in your tool box.

Your last point is curious to me; in what sense is the Python development cycle faster, given the REPL-driven development possible in Clojure?

Both languages are (or can be) REPL-driven. Python was also inspired by Lisp and has always had a REPL. Recently my team’s composition changed, and I had to port a Clojure API to a Python API just because there would be no time to train the team in functional programming and Clojure.
There was nothing wrong with the Clojure implementation; it was just about the people who had to maintain it.

I can give you a quick metaphor, comparing languages to weapons.
Clojure is like nunchucks (think Bruce Lee), elegant, powerful, versatile, but it takes longer to master, and while learning you can get hurt easily because it is not so easy in the beginning.

Python is like a rod or a bat: even a kid knows how to swing it, and it is much friendlier in the beginning. Even if you never master it, you will probably be able to defend yourself with it, with varying degrees of elegance.

Both can be equally lethal and elegant in master’s hands.

Maybe you were asking, in more detail, what I mean by faster.
Given that both languages have REPLs, interactivity is not an advantage when we compare Clojure against Python, so we need to look elsewhere.

I think Python (at least for the time being) provides a faster development cycle because:

  • it has a more mature library ecosystem (for Clojure you can often fall back on a Java library when there is no native Clojure counterpart, but Python wins simply because of its longevity and community size).
    Being a Lisp, Clojure should have the upper hand, but the Lisp community is scattered, and Clojure is ~16 years younger than Python.

(Concrete example: try to find a native Clojure natural-sorting library; this is my first Google result:
https://medium.com/@wilkerlucio/natural-sorting-in-clojure-script-123749bf3ba)
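For reference, a basic natural-sort comparator is short to write in plain Clojure. This is a minimal sketch, not the approach from the linked article: it splits each string into runs of digits and non-digits, compares digit runs numerically and everything else lexicographically, and ignores case folding, leading-zero tie-breaks, and non-ASCII digits. The function names are hypothetical.

```clojure
(defn- chunks
  "Split s into runs of digits and non-digits; digit runs become
  BigInts so arbitrarily long numbers compare correctly."
  [s]
  (map (fn [run] (if (re-matches #"\d+" run) (bigint run) run))
       (re-seq #"\d+|\D+" s)))

(defn- chunk-compare [a b]
  (cond
    (and (number? a) (number? b)) (compare a b)
    (number? a) -1 ; digit runs sort before text runs
    (number? b) 1
    :else (compare a b)))

(defn natural-compare
  "Comparator usable with sort and sort-by."
  [x y]
  (loop [xs (chunks x), ys (chunks y)]
    (cond
      (and (empty? xs) (empty? ys)) 0
      (empty? xs) -1
      (empty? ys) 1
      :else (let [c (chunk-compare (first xs) (first ys))]
              (if (zero? c)
                (recur (rest xs) (rest ys))
                c)))))

(sort natural-compare ["file10.txt" "file2.txt" "file1.txt"])
;; => ("file1.txt" "file2.txt" "file10.txt")
```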

  • Better debugging (think stack traces) and tooling (I love Emacs and Cursive) for Python.
    This is not just about longevity: the glue to the JVM, the underlying multithreading, and macros do not make debugging easier. Here, the very thing that makes Python “slower” (the GIL, or Global Interpreter Lock, which prevents multithreaded code from reaching JVM-level performance) is what makes it super easy to debug in comparison to Clojure. And for true horizontal scalability, Python’s multiprocessing support gets around the limits on threading.

  • Having “no macros” is also a huge benefit when it comes to debugging.
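On the macro-debugging point, one tool Clojure does offer is `macroexpand-1` (and `macroexpand`), which returns the code a macro generates as plain data, without evaluating it. A small sketch using a hypothetical `unless` macro:

```clojure
;; A toy macro (hypothetical): evaluate body only when test is falsey.
(defmacro unless [test & body]
  `(if ~test nil (do ~@body)))

;; Inspect the generated code instead of guessing at it.
(macroexpand-1 '(unless (empty? xs) (println "non-empty")))
;; => (if (empty? xs) nil (do (println "non-empty")))
```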

I could go deeper, but I think these arguments should suffice.

At the master level, the shortcomings cease to be blockers, because we avoid the pitfalls before we fall into them. And Clojure has advantages in code composition that can boost productivity.

This is just my very humble opinion. Perhaps it just reflects that I need to learn more Clojure.

PREFACE: I just want to say up front that, ultimately, there is no right answer. This is an opinion piece, and I might change my mind about it tomorrow. At the end of the day, the choice of language comes down to whatever works for you, your team, and the business.

I think you are making the “appeal to the lowest common denominator” argument, one that managers love.

You’re right, of course. Short term, Python will allow you to hire cheaper talent more quickly. But Python will never be your last language, unless your projects themselves are trivial. I think you will also eventually lose good talent. Great coders won’t stick with Python for very long before yearning for something more. Either they’ll wander off to typed languages, explore lower-level languages, venture into actor and other distributed models, or fall in love with the meta-universe of Lisps, Prologs, etc. Or else you need to give them really interesting projects.

Obviously, we’re making a people argument. So it’s not about the language, but about the people who use it.

So, assuming you only hire junior beginner programmers, you think Python will be most appropriate, with quick onboarding and good output. And I say that’s probably true in the short term.

What if you hired experienced senior devs? Or even intermediate devs? Maybe they already know 2 or 3 languages, or have already moved beyond Python: they know C#, Java, Scala, C++, etc.

Or what about when your junior devs have been working on the team for 2 years?

I don’t have any data on this, and I’m not fully sure I’ve made up my mind. But I’ll propose that choosing the path of least resistance will strangle a team and its projects in the long term.

If you use a more interesting language, with more nuance, more power, and more expressivity, which gives devs room for creativity and continuous learning, I think you will attract better talent, quickly find out who on your team isn’t as good (and be able to find someone better to take their place), and unlock the full potential of your devs. And you’ll much more quickly end up in a dream-team scenario, the kind of mythical 10x team that is so sought after.

If your projects don’t need such a team and are more focused on quantity of output, this might not apply as much. In that case, you’ll probably be on a constant rotation from junior dev to junior dev, which makes sense in such a scenario. Not everything needs seniority, or exceptional talent, or even people with full-on CS degrees. If you’re putting Django websites together on contract for various clients who run small mom-and-pop stores, for example, that’s a very different situation.

I know I’m walking on treacherous territory here. Don’t get me wrong, I’m not making any moral judgement of worth. I highly respect all individuals and their accomplishments and discipline. I wouldn’t be able to thrive and be successful making Django websites. It requires a different set of skills and motivations, which I don’t have. And I admire people who are successful at it.

You might think I should get off my high horse (I should and I will in an instant :stuck_out_tongue_winking_eye: ). What I’m trying to say is: “Stop treating beginners, junior devs, and all devs in general like they can’t learn hard things and learn to ride a horse, even a really high one :yum:”. Instead, you should offer them a safe environment with the appropriate mentoring and support they need to be successful, while continuously pushing and challenging them, and making sure they know it is okay to stumble and fail sometimes.

Also, you get better by doing hard things. If the argument is “Use Python because it’s easy, quick and painless”, you should ask yourself: “How am I going to get better at computer science and software engineering?” “How am I going to quickly maximize my team’s growth potential if I decide not to challenge them?”

They say Clojure makes you a better programmer. Never heard people say that about Python. Wouldn’t you want to use a language that makes you a better programmer? Makes your teammates better programmers? Yes it’s harder, but nobody gets stronger by lifting small weights.

I’ll finish with a quote whose source I don’t remember, and which I’ll butcher a bit: “Do you want to have 10 years of experience, or 10 years of experiencing the same thing?”

P.S.: There’s a bit of hyperbole in my post, because my point is kind of subtle. I love Python, and still use it sometimes for small scripts, or as a plugin language when embedded. And it can be a great stepping stone, especially for people new to programming.

P.S.2: I don’t think the Python interpreter compares to the Clojure REPL in any way. But that’s a debate for another time.

Your point resonates highly with me, and reminds me of something I saw on Twitter:

A CEO and a CTO talk. The CEO says:

  • What if we train our devs and they leave?
  • What if we don’t train them and they stay?

I would love to explore that topic more deeply. I think there is opportunity for us to learn a lot about both languages.
