You are making some good points, and I had to quote your entire post because I will be following along from now on:
First, the performance of highly optimized matrix libraries is indeed excellent, and numpy provides a nice interface. That said, I found that, at least in the kinds of data analysis problems I encountered, nearly 80% of the effort was spent on pre-processing, cleaning up the data, extracting features, interactive analysis, and similar tasks. The actual computation took so little time (as a percentage of total client project wall-clock time) that it really didn’t matter. I did use direct interfaces to matrix libraries sometimes, though.
As to I/O: I agree it isn’t easy, although I find that these days a simple
(drop 1 (csv/read-csv (io/reader (BOMInputStream. (io/input-stream filename)) :encoding "UTF-8")))
gets me very far. Also, it produces a lazy sequence, which means you don’t have to hold all the data in memory, as long as you consume the rows while the reader is still open.
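To make the laziness point concrete, here is a minimal sketch, not my actual client code. It assumes clojure.data.csv and Apache Commons IO (for BOMInputStream) are on the classpath, and the function name reduce-rows is made up for this example. The rows are consumed inside with-open, so the file is streamed and the reader is closed afterwards:

```clojure
(require '[clojure.data.csv :as csv]
         '[clojure.java.io :as io])
(import '(org.apache.commons.io.input BOMInputStream))

;; Hypothetical helper: fold over the data rows of a CSV file.
;; The header row is skipped with (drop 1 ...), as in the one-liner above.
(defn reduce-rows
  [filename f init]
  (with-open [rdr (io/reader (BOMInputStream. (io/input-stream filename))
                             :encoding "UTF-8")]
    ;; reduce runs while the reader is open, so the lazy sequence
    ;; is fully consumed before with-open closes the file.
    (reduce f init (drop 1 (csv/read-csv rdr)))))

;; Example use: count the data rows without keeping them in memory.
(defn count-rows [filename]
  (reduce-rows filename (fn [n _row] (inc n)) 0))
```

Returning the lazy sequence itself out of with-open would fail once the reader is closed, which is why the sketch takes a reducing function instead.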
It’s interesting that you mentioned receiving data asynchronously. I had a BIG advantage when I worked as a data science consultant/freelancer. When I wrote a solution, I would start with data files from the client (logs, data dumps, etc.). I would then build a solution using transducers and core.async (pipeline for quick and easy parallel processing). This would produce data for a visualization dashboard, written in ClojureScript and shipped as an Electron app.
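Here is a minimal sketch of why transducers make this work, under some assumptions: the log format ("timestamp,level,latency-ms") and the names parse-line, error-latencies and sample-lines are invented for illustration. The transformation is defined once, with no knowledge of where the data comes from:

```clojure
(require '[clojure.string :as str])

;; Hypothetical log line format: "timestamp,level,latency-ms"
(defn parse-line [line]
  (let [[ts level ms] (str/split line #",")]
    {:ts ts :level level :ms (Long/parseLong ms)}))

;; The processing step as a transducer: parse, keep errors, extract latency.
(def error-latencies
  (comp (map parse-line)
        (filter #(= "ERROR" (:level %)))
        (map :ms)))

;; Batch mode: apply it to lines that came from a client's data file.
(def sample-lines ["t1,INFO,12" "t2,ERROR,250" "t3,ERROR,90"])

(into [] error-latencies sample-lines)

;; Real-time mode (assumes core.async is available and in-ch / out-ch
;; are channels you created): the same transducer is reused unchanged.
;; (clojure.core.async/pipeline 4 out-ch error-latencies in-ch)
```

Because the transducer is independent of its source and sink, switching from files to a live feed mostly means swapping the collection for a channel.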
So, what was the BIG advantage? Well, I shipped the app and told the client “now, if you’d like to give me a real-time feed of your data, in whatever form, the entire real-time data-processing pipeline is already there, so all you’re seeing could be turned into a production app/tool and updated in real-time”. That was a real eye-opener!
I think most discussions of “programming language X vs Y” are superficial, focusing mostly on syntax, “ease of starting”, available libraries. Not to mention the ridiculous timewaster discussions of “static vs dynamic typing” or “startup time”. The differences run deep and most advantages are small, but they make a huge difference when taken together, especially when you are on a schedule and a budget and need to ship real apps.
BTW, I mean no offense to the original poster, but I am looking at the “comparison points” mentioned, and I think most of them are irrelevant from my point of view. I don’t care about “easiness” or “familiarity” (as Rich said: “instruments are made for people who can play them”), I don’t care that much about performance, and I don’t do any generative testing. I care about building reliable and maintainable solutions on a schedule and a budget, and Clojure is just the right tool for that.
In my case, I use Clojure not because of any single advantage, but because of dozens of advantages. Transducers (very under-appreciated), core.async, tesser, lazy sequences, ClojureScript, EDN, spec, single data format for both server- and client-side, immutable data structures, sequence library — all of this plays together to create an environment which simply can’t be reproduced in Python.