agreed, was talking about data research / modelling mostly
You are making some good points and I had to quote your entire post, because I will be following along now:
First, the performance of highly-optimized matrix libraries is indeed excellent, and numpy provides a nice interface. That said, I found that at least in the kinds of data analysis problems I encountered, nearly 80% of the effort was spent on pre-processing, cleaning up the data, extracting features, interactive analysis, and similar tasks. The actual computation took so little time (as a percentage of total client project wall clock time) that it really didn’t matter. That said, I did use direct interfaces to matrix libraries sometimes.
As to I/O: I agree it isn’t easy, although I find that these days a simple
(drop 1 (csv/read-csv (io/reader (BOMInputStream. (io/input-stream filename)) :encoding "UTF-8")))
gets me very far. Also, it produces a lazy sequence, which means you don’t try to hold all data in memory.
It’s interesting that you mentioned receiving data asynchronously. I had a BIG advantage when I worked as a data science consultant/freelancer. When I wrote a solution, I would start with data files from the client (logs, data dumps, etc). I would then build a solution using transducers and core.async (pipeline for quick and easy parallel processing). This would produce data for a visualization dashboard, written in ClojureScript and shipped as an Electron app.
So, what was the BIG advantage? Well, I shipped the app and told the client “now, if you’d like to give me a real-time feed of your data, in whatever form, the entire real-time data-processing pipeline is already there, so all you’re seeing could be turned into a production app/tool and updated in real-time”. That was a real eye-opener!
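To make the "pipeline is already there" point concrete, here is a minimal sketch using only clojure.core transducers. The log format, the sample lines, and the name error-messages are made up for illustration. The transducer is defined once and does not know where its input comes from; as far as I know the same value can also be handed to a core.async channel or to pipeline, which is not shown here because it needs the core.async dependency.

```clojure
(require '[clojure.string :as str])

;; Hypothetical log format: "<LEVEL> <message>"
(def error-messages
  (comp (map #(str/split % #" " 2))                    ; "ERROR x y" -> ["ERROR" "x y"]
        (filter (fn [[level _]] (= level "ERROR")))   ; keep only error lines
        (map second)))                                 ; keep only the message

;; Hypothetical sample input, standing in for lines read from a client file.
(def sample-lines ["INFO started" "ERROR disk full" "ERROR timeout"])

;; Batch use: pour the transformed lines into a vector.
(into [] error-messages sample-lines)
;; => ["disk full" "timeout"]

;; The same transducer applied lazily to the same source.
(sequence error-messages sample-lines)
;; => ("disk full" "timeout")
```

Only the source and sink change between the batch run and a live feed; the transformation itself stays as it is.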
I think most discussions of “programming language X vs Y” are superficial, focusing mostly on syntax, “ease of starting”, available libraries. Not to mention the ridiculous timewaster discussions of “static vs dynamic typing” or “startup time”. The differences run deep and most advantages are small, but they make a huge difference when taken together, especially when you are on a schedule and a budget and need to ship real apps.
BTW, I mean no offense to the original poster, but — I am looking at the “comparison points” mentioned, and I think most of them are irrelevant from my point of view. I don’t care about “easiness” or “familiarity” (as Rich said: “instruments are made for people who can play them”), I don’t care that much about performance, I don’t do any generative testing. I care about building reliable and maintainable solutions on a schedule and a budget, and Clojure is just the right tool for that.
In my case, I use Clojure not because of any single advantage, but because of dozens of advantages. Transducers (very under-appreciated), core.async, tesser, lazy sequences, ClojureScript, EDN, spec, single data format for both server- and client-side, immutable data structures, sequence library — all of this plays together to create an environment which simply can’t be reproduced in Python.
I just parse <wtv>SVs myself now, using line-seq. I found that to be much more reliable for some reason.
(with-open [rdr (reader "my/file.csv")]
  ;; mapv (not lazy map) so rows are read before the reader closes
  (->> rdr (line-seq) (mapv #(split % #","))))
That way, if the file has issues, I can just be smarter in the split and handle them however works for the given file. And after that I can also add my own field parsing if I need to convert anything from a string into something else.
It’s pretty short and quick, and I feel it can easily accommodate any kind of broken fileset I’d need.
I stopped trying to split the CSVs myself when I noticed that there are all kinds of corner cases. As an example, the code above will not handle commas inside quoted fields correctly. I’d much rather use clojure.data.csv, which gets most things right and lets me set the separator and the quote character.
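A small sketch of the quoted-comma corner case, using only clojure.string. The sample line is made up. The regex at the end is only there to show how quickly a hand-rolled splitter grows; it is an assumption-laden toy (it ignores empty fields and escaped quotes) and not what clojure.data.csv does internally.

```clojure
(require '[clojure.string :as str])

;; Hypothetical row with a comma inside a quoted field.
(def line "1,\"Smith, John\",42")

;; Naive split: the quoted field is cut in two, giving 4 fields instead of 3.
(str/split line #",")
;; => ["1" "\"Smith" " John\"" "42"]

;; A slightly smarter toy: match either a quoted run or a run of non-commas.
;; Still wrong for empty fields and for escaped quotes inside a field.
(re-seq #"\"[^\"]*\"|[^,]+" line)
;; => ("1" "\"Smith, John\"" "42")
```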
BOMInputStream (which comes from org.apache.commons.io.input) is another thing learned from years of dealing with CSV in the wild. Some files will contain a BOM (byte-order mark) at the beginning and you’ll need to handle that.
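To show why the BOM matters, here is a sketch that needs no extra dependency. It assumes the file was already decoded as UTF-8, in which case the BOM shows up as a leading U+FEFF character on the first line. The helper strip-bom and the sample header are hypothetical; BOMInputStream does the equivalent job at the byte level.

```clojure
(require '[clojure.string :as str])

(defn strip-bom
  "Removes a leading U+FEFF (the BOM after UTF-8 decoding), if present."
  [s]
  (if (str/starts-with? s "\uFEFF") (subs s 1) s))

;; Hypothetical header row as it arrives from a file that starts with a BOM.
(def first-line "\uFEFFid,name")

;; Without stripping, the first column name silently fails to match.
(= "id" (first (str/split first-line #",")))
;; => false

(= "id" (first (str/split (strip-bom first-line) #",")))
;; => true
```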
Most OO systems are just simulations of real-world surface phenomena, and the whole system ends up a mess. I don’t think simulating the real world is a good OO method; it is better to design the system around an abstract, refined data model as its prototype. For example, ggplot2 in the R language is a clear system built on an excellent data model. So a good OO system leans toward being a data flow system, and I think ggplot2 would more likely have been a data-driven plotting system if OO had not been in vogue at the time.
So I think Python’s OO approach only encourages casual design; it is not suitable for building large, serious, formal systems.
Few people build systems as pure pipelines, so I am glad that you also like pipeline systems. So do I: my pipeline method has "pure functions, pipelines, data flow, and relational theory" as its pillars. The article is a bit long, but you can read it here: The Pure Function Pipeline Data Flow
I’m not disagreeing with the remarks about Python. I want to point out that OO was supposed to be a good way to model the world, but the world doesn’t only include properties that attach to individual things; it also includes relations that can depend on the things that are related. This means that in e.g. Java, people have had to do awkward things in which a method depends not just on the class it’s attached to, but also on its arguments. (Some less popular object systems such as the Common Lisp Object System don’t have this problem, and Clojure multimethods and protocols provide some similar functionality.)
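As a sketch of the point about relations, here is a Clojure multimethod that dispatches on both arguments at once. The domain (:cat, :mouse) and the name encounter are invented for illustration; only defmulti and defmethod are real.

```clojure
;; Dispatch on the kinds of BOTH arguments, not on a single receiver.
(defmulti encounter
  (fn [a b] [(:kind a) (:kind b)]))

(defmethod encounter [:cat :mouse] [_ _] :chase)
(defmethod encounter [:mouse :cat] [_ _] :flee)
(defmethod encounter :default [_ _] :ignore)

(encounter {:kind :cat} {:kind :mouse})   ;; => :chase
(encounter {:kind :mouse} {:kind :cat})   ;; => :flee
(encounter {:kind :cat} {:kind :cat})     ;; => :ignore
```

The behaviour belongs to the pair, so neither "class" has to own the method.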
I think different people have different ways of thinking, so they prefer different methods.
The various design patterns for OO and FP are too complex and unnecessary for me, and I don’t want to see complex associations between objects, and I don’t want to use them.
Clojure multimethods are just an elegant conditional branching statement, and I haven’t used the various OO features of the Clojure language, including protocols.
I tend to construct systems with the simplest concepts and the most basic techniques, syntax, and functions. For the way I think, the linear structure of a pipeline is the simplest: each pure function in the pipe is independent, and as long as its input and output conform to the data interface standard of the series, you can connect it and use it without caring about anything else. If a bug occurs, it is as obvious as a leaking water pipe. For me, this is the easiest way in the world; I just want to go straight from the starting point to the end. I don’t want to use traditional OO and FP anymore.
There was a great poet in China, Bai Juyi; even illiterate people could understand and appreciate his poetry. I hope that my code can be understood by a junior programmer even in the most complicated system.
For me, it’s the opposite. I didn’t mean that my example was my new generic CSV parser. I meant that clojure.data.csv and others have always failed me at some point, because they all assume correct comma-separated semantics. What about when I have a file with two different types of escape characters? Or mixed encodings? Or uneven columns? Or decimals that have currency characters in front? Etc. Sometimes there are even missing newlines!
That’s where I found that, by just doing it myself, I can very quickly and easily handle all corner cases specific to the file I’m parsing.
Clojure makes writing a custom parser super simple and quick.
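A rough sketch of what file-specific handling can look like for two of the cases listed above (currency characters in front of decimals, uneven columns). The semicolon separator, the three-column layout, and the helper names parse-amount, pad-row, and parse-line are all assumptions made for the example.

```clojure
(require '[clojure.string :as str])

(defn parse-amount
  "Drops everything except digits, minus sign, and decimal point, then parses."
  [s]
  (Double/parseDouble (str/replace s #"[^0-9.\-]" "")))

(defn pad-row
  "Returns exactly n columns: pads short rows with nil, truncates long ones."
  [n row]
  (vec (take n (concat row (repeat nil)))))

(defn parse-line
  "Parses one line of a hypothetical id;name;amount file."
  [line]
  (let [[id name amount] (pad-row 3 (str/split line #";"))]
    {:id id :name name :amount (some-> amount parse-amount)}))

(parse-line "7;Widget;$19.99")
;; => {:id "7", :name "Widget", :amount 19.99}

(parse-line "8;Gadget")
;; => {:id "8", :name "Gadget", :amount nil}
```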
Seriously, have you looked at data.csv’s implementation? https://github.com/clojure/data.csv/blob/master/src/main/clojure/clojure/data/csv.clj
It’s 146 lines of code, half of which are comments, and half of that half is for writing CSVs. So only 1/4 of that is for reading and parsing CSV files.
How awesome is that!
The protocol is amazing too (if you’re used to a 50+ nodes/5+ levels class/interface tree from Java or something):
(defprotocol Read-CSV-From (read-csv-from [input sep quote]))
As expected in a Clojure channel, I’ve noticed the balance skew towards Clojure.
I think the question is too broad. In order to properly answer “Clojure or Python?” we need to restrict the context a bit more.
Factors that are relevant:
- What is the composition of your team? What is their background?
  Python might be easier to teach. Clojure has better functional support.
- What problem are you solving? How much does it depend on an ecosystem of libraries?
  Python was designed for easy embedding (like Lua) or re-using C libraries.
  Clojure plays nicer in the JVM realm. Both have great support on the CLR/.NET.
  Python is a de facto standard for data science; Clojure is taking its first steps.
  Clojure is better at leveraging multi-threading; Python is much more lightweight.
- What is your deadline?
  Clojure needs more craft. Python has a faster development cycle.
If you (the team) master the tool (the programming language and its ecosystem), almost anything can be accomplished. I recommend always investigating the problem first.
Both languages are worth having in your tool box.
Your last point is curious to me; in what sense is the Python development cycle faster, given the REPL-driven development possible in Clojure?
Both languages are (or can be) REPL driven. Python was also inspired by Lisp and always had a REPL. Recently I had a change in my team composition, and I had to port a Clojure API to a Python API just because there would be no time to train the team in functional programming and Clojure.
There was nothing wrong with the Clojure implementation; it was just about the people who had to maintain it.
I can give you a quick metaphor, comparing languages to weapons.
Clojure is like nunchucks (think Bruce Lee): elegant, powerful, versatile, but it takes longer to master, and while learning you can get hurt easily because it is not so easy in the beginning.
Python is like a rod or a bat: even a kid knows how to swing it, and it is much friendlier in the beginning. Even if you never master it, you will probably be able to defend yourself with it, with varying degrees of elegance.
Both can be equally lethal and elegant in a master’s hands.
Maybe you were concerned with more detailed aspects of what I mean by faster.
Given that both languages have REPLs, interactivity is not an advantage when we compare Clojure against Python, so we need to look elsewhere.
I think Python (at least for the time being) provides a faster development cycle because:
- It has a more mature library ecosystem (you can find things in Java for Clojure when there is no native Clojure counterpart, but Python beats Clojure just because of longevity and community size).
  Being a Lisp, Clojure should have the upper hand, but the Lisp community is scattered, and Clojure is ~16 years younger than Python.
(Concrete example: try to find a native Clojure natural sorting library, this is my first Google result
- Better debugging (think stack traces) and tooling (I love Emacs and Cursive) for Python.
  This is not just about longevity: the glue to the JVM, the underlying multithreading, and macros do not help make debugging easier. In fact, the very thing that makes Python “slower” (the GIL, the Global Interpreter Lock, which prevents leveraging multi-threading to the performance level of the JVM) is the thing that makes it super easy to debug (in comparison to Clojure). And for true horizontal scalability, multi-processing support in Python gets around the limits on threading.
  Having “no macros” is also a huge benefit when it comes to debugging.
I could go deeper, but I think these arguments should suffice.
At the master level, the shortcomings cease to be blockers, because we avoid the pitfalls before they are created. And Clojure has advantages in terms of code composition that can boost productivity.
This is just my very humble opinion. Perhaps it just reflects that I need to learn more Clojure.
PREFACE: I just want to preface this by saying that, ultimately, there is no right answer. This is an opinion piece, and an opinion I might change my mind about tomorrow. At the end of the day, the choice of language is based on whatever works for you, your team, and the business.
I think you are making the “appeal to the lowest common denominator” argument. One that managers love.
You’re right, of course. Short term, Python will allow you to hire cheaper talent quicker. But Python will never be your last language, unless your projects themselves are trivial. I think you will also eventually lose good talent. The great coders won’t stick with Python very long before yearning for something more. Either they’ll wander off to typed languages, explore lower-level raw languages, venture into Actor and other distributed models, or become enamored with the meta universe of Lisps, Prologs, etc. Or you need to give them really interesting projects.
Obviously, we’re making a people argument. So it’s not about the language, but about the people who use the language.
So, assuming you only hire junior beginner programmers, you think Python will be most appropriate, with quick onboarding and good output. And I say that’s probably true in the short term.
What if you hired experienced senior devs? Or even intermediate devs? Maybe they already know 2 or 3 languages, or already moved beyond Python, they know C#, Java, Scala, C++, etc.
Or what about when your junior devs have been working on the team for 2 years?
I don’t have any data on this, and I’m not fully sure I’ve made up my mind. But I’ll make a proposition that choosing the path of least resistance will strangle a team and projects long term.
If you use a more interesting language, with more nuances, more power, and more expressivity, one that allows devs creativity and continuous learning opportunities, I think you will attract better talent, quickly find who in your team isn’t as good (and be able to find someone better to take their place), while also unlocking the full potential of your devs. And you’ll much more quickly end up in a dream team scenario, the kind of mythical 10x team that is so sought after.
If your projects don’t need such a team, and are more focused on quantity of output, this might not apply as much. In that case, you’ll probably be on a constant rotation from junior dev to junior dev. Which makes sense in such a scenario. Not everything needs seniority, or exceptional talent, or even people with full-on CS degrees. If you’re putting Django websites together on contract, for various clients which run small mom-and-pop stores for example, that’s way different.
I’m walking a treacherous territory here I know. Don’t get me wrong, I’m not making any moral judgement of worth. I highly respect all individuals and their accomplishments and discipline. I wouldn’t be able to thrive and be successful making Django websites. It requires a different set of skills and motivations which I don’t have. And I admire people who are successful at it.
You might think I should get off my high horse (I should and I will in an instant). What I’m trying to say is: “Stop treating beginners, junior devs, and all devs in general like they can’t learn hard things and learn to ride a horse, even a really high one”. Instead, you should offer them a safe environment with the appropriate mentoring and support they need to be successful, while continuously pushing and challenging them, and making sure they know it is okay to stumble and fail sometimes.
Also, you get better by doing hard things. If the argument is “Use Python because it’s easy, quick and painless”, you should ask yourself: “How am I going to get better at computer science and software engineering?” “How am I going to maximize my team’s growth potential quickly if I make the decision not to challenge them?”
They say Clojure makes you a better programmer. Never heard people say that about Python. Wouldn’t you want to use a language that makes you a better programmer? Makes your teammates better programmers? Yes it’s harder, but nobody gets stronger by lifting small weights.
I’ll finish with some quote I don’t remember the source of and will butcher a bit: “Do you want to have 10 years of experience, or 10 years of experiencing the same thing?”
P.S.: There’s a bit of hyperbole in my post, because my point is kind of subtle. I love Python, still use it sometimes for small scripts, or as a plugin language when embedded. And it can be a great stepping stone, especially for people new to programming.
P.S.2: I don’t think the Python interpreter compares to the Clojure REPL in any way. But that’s another debate I’ll keep for another time.
Your point resonates highly with me, and reminds me of something I saw on Twitter:
A CEO and a CTO talk. The CEO says:
- What if we train our devs and they leave?
- What if we don’t train them and they stay?
I would love to explore that topic more deeply. I think there is opportunity for us to learn a lot about both languages.
For me, programming is the process of designing a data model that is simple and fluent to manipulate. More than 80% of the functions in my project are ->> threading macro code blocks: each step is simple, verifiable, replaceable, testable, pluggable, extensible, and easy to run multithreaded. The Clojure threading macro provides language-level support for PurefunctionPipeline&Dataflow.
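For readers who have not seen one, this is roughly what such a ->> block looks like. The order data and the function name revenue-by-customer are made up for the example; each step is an ordinary pure function that takes the previous step’s output as its last argument.

```clojure
;; Hypothetical input data.
(def orders
  [{:customer "a" :total 10 :paid? true}
   {:customer "b" :total 25 :paid? false}
   {:customer "a" :total 5  :paid? true}])

(defn revenue-by-customer
  "Sums the totals of paid orders per customer."
  [orders]
  (->> orders
       (filter :paid?)                                  ; keep paid orders only
       (group-by :customer)                             ; {"a" [order order]}
       (map (fn [[customer os]]
              [customer (reduce + (map :total os))]))   ; ["a" 15]
       (into {})))                                      ; back to a map

(revenue-by-customer orders)
;; => {"a" 15}
```

Any single step can be evaluated on its own at the REPL, or swapped out, without touching the others.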
I completely agree! I feel like our job as engineers and developers is not to make complex tools to tackle complex problems but make complex problems more simple. If you can break down a problem into pure function pipelines, it’s a lot easier to debug, model in your head, discuss, and test as you mentioned in your article.
Your counter-point about OO also makes a lot of sense; I don’t think OOP is bad. I know it has value in certain contexts and when leveraged well it can be quite nice to work with. Ggplot2 seems like a great abstraction that doesn’t invent new vocabulary or API shapes and allows a lot of intuitive composability. It seems to follow a lot of functional programming conventions, providing light wrappers around common data types rather than anticipating users creating subclasses of these data types and reading a book’s worth of docs on all the different APIs. As you suggest though, this level of quality is more difficult to achieve in OO, whereas it’s the path of least resistance in a functional-programming-focused language like Clojure.
I feel one point that’s getting overlooked in this discussion is the turnover rate in today’s job market. Developers don’t stay long in one place, for one reason or another. Using a language like Python ensures the next wave coming in will be able to get up to speed and contribute quickly.
The bottom line is companies have to worry about the bottom line. They’re not in business to crank out pretty code. They’re in business to solve customer problems and deliver a product that works. Nobody really cares what’s under the covers.
I believe that’s one of the reasons modern languages don’t include macros. Every article I’ve read explaining how great Lisp macros are always ends with the same reveal: “Aha! See! You can make Lisp look like whatever language you want!” However, that’s precisely the problem with it. Java code always looks like Java code. Python always looks like Python. The next wave of programmers coming in will know what they’re looking at and hit the ground running.
If you assume developers make, on average, around $6k per month and you have a team of 20 developers, you’re talking about $120k per month spent on salaries. They need to see a return on that investment in the form of products that customers will buy. It’s a hard sell to convince them to spend that so people can learn a new language that isn’t guaranteed to result in higher sales.
I’m enjoying Clojure, but it’s a hard sell. Especially when the counter is, “OK, so what happens when the current people leave and we need to replace them? Spend another $120k teaching a new batch of programmers a new language?”
The way to get Clojure used more is to do non-essential projects in your spare time. Write little tools that help solve some internal company problems. Grow it from there. It’s like the old adage, “it’s easier to ask for forgiveness than permission.” They’re more likely to let you use Clojure if you already have a useful tool written in it.
I admit, I’m probably way behind on the latest Python interpreters. Some of the things I felt put Python behind were:
- The language is not expression based, which I found makes things more clunky at the interpreter, and forces more code into a multi-line structure.
- Dependencies are not isolated. You can use virtualenv, but then it requires a folder structure to first be created.
- It doesn’t (didn’t?) have network-based interpreters and direct editor integration with them.
- The whitespace-matters syntax can be annoying at the interpreter; again, managing multi-line input gets tedious in those cases.
- Reliance on mutability couples a lot of things to state, which gets trickier when doing heavy reloading and trying to maintain a working program state in your session.
- Simple things, like having to put commas between elements in list and tuple literals, and colons in dictionaries. Oh, and between arguments to functions and in the parameters list.
- Modules are tied to files. So in the interpreter, you can’t create modules, switch between them, etc.
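- Modules can’t easily be reloaded within the same session. After you have imported one, if you change something in the module, you won’t see the changes reflected unless you explicitly reload it (importlib.reload), and objects you already hold keep the old definitions.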
- The whole declaring if your variable is global or local is confusing in an interactive session.
- Methods cannot be re-defined on their own; the whole class has to be redefined instead, and existing instances won’t get the new behavior.
- Not everything is inspectable, for example: help(+) doesn’t work. And you can’t get the source of a function. Well, you can if it is defined in a file using the inspect module, but not for things defined in the session.
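For contrast, a small sketch of how a few of the points above (expressions, redefinition, inspecting +) look at a Clojure REPL. The names label, greet, and welcome are made up; doc comes from clojure.repl. This assumes default compilation settings, i.e. no direct linking, so calls go through vars.

```clojure
(require '[clojure.repl :refer [doc]])

;; Everything is an expression, so `if` yields a value directly.
(def label (if (pos? 3) "positive" "non-positive"))

;; Redefining a function: existing callers pick up the new definition,
;; because the call to greet is resolved through its var at call time.
(defn greet [name] (str "Hello, " name))
(defn welcome [name] (greet name))
(welcome "Ada")   ;; => "Hello, Ada"

(defn greet [name] (str "Hi, " name))
(welcome "Ada")   ;; => "Hi, Ada"

;; Operators are ordinary functions with docstrings; this prints the doc for +.
(doc +)
```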
I think this is overblown personally. You can read my blog post here for details of my experience regarding this: https://www.rubberducking.com/2018/03/my-observations-from-evangelizing.html
Basically, my experience is that, learning the domain, the architecture, the tools, the libraries and the existing code base far outweighs learning the language. But, you need to offer good support of the language, so, without an existing expert available to do that, I can’t say if the result would be the same.
Also, the reason I stick with my team is partly because of Clojure. My team is an outlier in using Clojure, and we are also an outlier in how long people stay, with very low churn. Maybe that’s also because of Clojure, but I can’t speak for others, so I’m just speculating. It also has been a great recruiting tool.
So, I’m with you: I think companies and their managers share that opinion. Or at least, it is a source of perceived risk. And it’s possible it is justified. Not every niche language would necessarily avoid being a major overhead, or help to retain and attract talent.
But I think sometimes it can actually be a differentiator to attract top talent, who might end up staying way longer because of their passion for the language. Can’t say for sure, so I don’t blame anyone who thinks it’s not worth the risk. For my team, it paid off though; that’s all I can say, from a small anecdotal sample.