Why Clojure over Python?

jwr · March 8, 2019, 5:40pm

alex314159:

As someone who would like to use Clojure more often, it pains me to say it’s often hard to make the case.

In particular, Python’s performance for most use cases is very good: if you use numpy, you’re effectively doing computations in C. You can parallelize a lot of problems simply with the multiprocessing module. Worse, sometimes a naive Clojure implementation can be slower than pure Python.

Also, compared to Python, the I/O part is still painful. Wes McKinney, the creator of pandas, has said himself that a large part of the package’s popularity comes from the ease of use of the read_csv / to_csv functions. I’m sure that’s a barrier to entry in Clojure for many.

I think Clojure’s advantage is in live environments where you receive a lot of data asynchronously (tracking a production line, trading financial markets): that is where the multithreading abilities and the safety of immutable data structures shine.

You are making some good points, and I had to quote your entire post because I will be responding to it point by point:

First, the performance of highly optimized matrix libraries is indeed excellent, and numpy provides a nice interface. That said, I found that, at least in the kinds of data analysis problems I encountered, nearly 80% of the effort was spent on pre-processing, cleaning up the data, extracting features, interactive analysis, and similar tasks. The actual computation took so little time (measured as a percentage of the total wall-clock time of a client project) that it really didn’t matter. Still, I did sometimes use direct interfaces to matrix libraries.
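When the computation does matter, a frequent reason a naive Clojure version ends up slower than expected (as the quoted post notes) is reflection or boxed arithmetic on the JVM. Below is a minimal sketch with hypothetical function names: the same sum of squares written once with lazy sequence functions over boxed numbers, and once as a primitive `double` loop over a type-hinted `double-array`. It only illustrates the technique; any actual speedup depends on data size and the JVM and should be measured.

```clojure
;; Report reflective Java interop at compile time; reflection is a
;; common, silent cause of slow Clojure code.
(set! *warn-on-reflection* true)

(defn sum-squares-naive
  "Idiomatic but boxed: each element is a java.lang.Number object and
  map builds an intermediate lazy sequence."
  [xs]
  (reduce + (map #(* % %) xs)))

(defn sum-squares-primitive
  "Loops over a primitive double array using unboxed double arithmetic.
  The ^doubles hint avoids reflection on alength/aget."
  ^double [^doubles arr]
  (let [n (alength arr)]
    (loop [i   0
           acc 0.0]
      (if (< i n)
        (let [x (aget arr i)]
          (recur (inc i) (+ acc (* x x))))
        acc))))
```

Both return `14.0` for the input `1.0 2.0 3.0`; the second needs the data in a primitive array, which is the usual trade-off.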

As to I/O: I agree it isn’t easy, although I find that these days a simple (drop 1 (csv/read-csv (io/reader (BOMInputStream. (io/input-stream filename)) :encoding "UTF-8"))) gets me very far (csv being clojure.data.csv and BOMInputStream coming from Apache Commons IO). Also, it produces a lazy sequence, so you don’t have to hold all the data in memory at once.
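For readers who want to try the idea without extra dependencies, here is a minimal sketch using only clojure.java.io and clojure.string, assuming a simple file with no quoted fields. It strips a leading UTF-8 byte-order mark by hand (the job `BOMInputStream` does above), splits lines on commas, and sums one integer column looked up by its header name. The helper names are hypothetical, and the naive split is not a substitute for `clojure.data.csv`, which handles quoting. The work is done inside `with-open`: a lazy sequence must be consumed before its reader is closed, otherwise realizing it later fails with a "Stream closed" error.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

(defn strip-bom
  "Removes a leading byte-order mark (U+FEFF). Java's UTF-8 decoder keeps
  it, so without this the first header cell would not match its name."
  [^String s]
  (if (str/starts-with? s "\uFEFF") (subs s 1) s))

(defn split-line
  "Hypothetical naive CSV split: no support for quoted fields or embedded
  commas (use clojure.data.csv for real files)."
  [^String line]
  (str/split line #"," -1))

(defn sum-column
  "Reads filename lazily, line by line, and sums the integer column whose
  header is col-name. Every line is consumed inside with-open, so the
  reader is closed only after the lazy sequence has been fully realized."
  [filename col-name]
  (with-open [rdr (io/reader filename :encoding "UTF-8")]
    (let [[header & rows] (line-seq rdr)
          idx (.indexOf ^java.util.List (split-line (strip-bom header)) col-name)]
      (when (neg? idx)
        (throw (ex-info "Column not found" {:column col-name})))
      (transduce (map #(Long/parseLong (nth (split-line %) idx)))
                 + 0 rows))))
```

Usage, with a hypothetical file name: `(sum-column "orders.csv" "qty")`.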

It’s interesting that you mentioned receiving data asynchronously. I had a BIG advantage when I worked as a data science consultant/freelancer. When I wrote a solution, I would start with data files from the client (logs, data dumps, etc.). I would then build a solution using transducers and core.async (its pipeline function for quick and easy parallel processing). This would produce data for a visualization dashboard, written in ClojureScript and shipped as an Electron app.
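The appeal of transducers in such a setup is that a processing step is defined once, independently of how it is run. A minimal sketch, assuming a hypothetical input of `"sensor,reading"` text lines: the transducer `total-xf` is run sequentially with `transduce` and in parallel with `clojure.core.reducers/fold`, which ships with Clojure. core.async’s `pipeline` also takes a transducer, but it lives in a separate library, so reducers are swapped in here for the parallel part.

```clojure
(require '[clojure.core.reducers :as r]
         '[clojure.string :as str])

(def clean-xf
  "Hypothetical cleaning step for \"sensor,reading\" lines."
  (comp (map str/trim)
        (remove str/blank?)                           ; drop empty lines
        (map #(str/split % #","))
        (filter #(and (= 2 (count %))                 ; drop malformed rows
                      (re-matches #"\d+" (second %))))
        (map (fn [[sensor reading]]
               {:sensor sensor :reading (Long/parseLong reading)}))))

(def total-xf
  "Extends the cleaning step to extract only the numeric reading."
  (comp clean-xf (map :reading)))

(defn total-sequential [lines]
  (transduce total-xf + 0 lines))

(defn total-parallel
  "Same transducer, run in parallel: fold splits the vector into chunks,
  reduces each chunk with (total-xf +) and combines the partial sums with +.
  Only safe for stateless transducers such as map, filter and remove."
  [lines]
  (r/fold + (total-xf +) (vec lines)))
```

The design choice is that cleaning logic never mentions its data source or execution strategy, so it can be tested on a small vector and then reused unchanged.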

So, what was the BIG advantage? Well, I shipped the app and told the client “now, if you’d like to give me a real-time feed of your data, in whatever form, the entire real-time data-processing pipeline is already there, so all you’re seeing could be turned into a production app/tool and updated in real-time”. That was a real eye-opener!
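The reason the real-time step could come almost for free is that a transducer does not care whether its input is a finished data dump or a live feed. A minimal sketch with hypothetical names: the same `readings-xf` processes a whole batch with `into`, and a live feed one item at a time through its reducing function. In core.async the live-feed role would be played by a channel created with a transducer, `(chan n readings-xf)`; the hand-rolled sink below is only a dependency-free stand-in.

```clojure
(def readings-xf
  "Hypothetical processing step: parse numeric strings, drop negatives."
  (comp (map #(Long/parseLong %))
        (remove neg?)))

;; Batch: the whole input is already in hand (e.g. a client's data dump).
(defn process-batch [lines]
  (into [] readings-xf lines))

(defn stream-sink
  "Live feed: returns {:push! f, :results g}. push! runs one incoming item
  through the same transducer; results returns the output so far.
  Assumes a single producer thread and ignores early termination (reduced),
  which map and remove never signal."
  []
  (let [rf  (readings-xf conj)
        acc (volatile! [])]
    {:push!   (fn [item] (vswap! acc rf item))
     :results (fn [] @acc)}))
```

Pushing `"5"`, `"-1"` and `"7"` one at a time yields the same `[5 7]` as processing them as a batch.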

I think most discussions of “programming language X vs Y” are superficial, focusing mostly on syntax, “ease of starting”, and available libraries. Not to mention the ridiculous time-wasting discussions of “static vs dynamic typing” or “startup time”. The differences run deep: most individual advantages are small, but taken together they make a huge difference, especially when you are on a schedule and a budget and need to ship real apps.

BTW, I mean no offense to the original poster, but looking at the “comparison points” mentioned, I think most of them are irrelevant from my point of view. I don’t care about “easiness” or “familiarity” (as Rich said: “instruments are made for people who can play them”), I don’t care that much about performance, and I don’t do any generative testing. I care about building reliable and maintainable solutions on a schedule and a budget, and Clojure is just the right tool for that.

In my case, I use Clojure not because of any single advantage, but because of dozens of advantages. Transducers (very under-appreciated), core.async, tesser, lazy sequences, ClojureScript, EDN, spec, a single data format for both server and client side, immutable data structures, the sequence library: all of this plays together to create an environment which simply can’t be reproduced in Python.
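As a small illustration of the “single data format” point, here is a minimal sketch with a hypothetical message shape: Clojure data is written as an EDN string with `pr-str` and read back with `clojure.edn/read-string`, which reads data only and never evaluates code. On the ClojureScript side, an EDN reader such as `cljs.reader/read-string` would read the same string, so server and dashboard share one representation without a mapping layer.

```clojure
(require '[clojure.edn :as edn])

;; Hypothetical message a server might send to a ClojureScript dashboard.
(def message
  {:sensor/id  "line-3"
   :readings   [12.5 13.1]
   :tags       #{:ok :calibrated}
   :sampled-at #inst "2019-03-08T17:40:00.000-00:00"})

(defn ->wire
  "Serializes Clojure data to an EDN string."
  [data]
  (pr-str data))

(defn <-wire
  "Parses an EDN string back into Clojure data. Unlike clojure.core/read-string,
  clojure.edn/read-string does not evaluate code, so it suits external input."
  [s]
  (edn/read-string s))
```

Keywords, sets, vectors and tagged literals such as `#inst` survive the round trip, which is what JSON would lose without extra conventions.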