Hi all,
Another thread brought up the topic of what is the concept behind Clojure’s map
function and what are Clojure’s fundamental primitives? So I thought I’d offer some perspective.
By the way, some of this is intermediate to advanced knowledge, so expect that if you’re just starting out, most of it will go over your head for now. And I recommend reading this, but not worrying if you don’t get it. Go work on some 4clojure.org problems, implement small Clojure programs, and then come re-read this again after, and it might start to make more sense.
First of all, to truly understand Clojure’s map
function, and its primitive, you need to go read about the Untyped Lambda Calculus. Unlike Java, C and Python, which are modeled over the Turing Model of computation. Clojure is mostly modeled over the Lambda Calculus as its computational foundation. Basic understanding of it is thus required to understand some of the Clojure primitives.
What makes it confusing is that, there is no machine for the Lambda Calculus. So it is easy to confuse the abstract computational model of functional programming (FP), for the translation layer to encode it over a Turing machine.
Most algorithm and data-structures you learn are designed for practical computing, which can run over existing hardware, and the complexity is measured in term of the hardware itself. These will be imperative in nature, because the machines are imperative, and the Turing model is imperative.
All problems can also be computed with the Lambda Calculus (there’s a proof for it). The algorithms will be different, because the entire model is different. map
being named map
is actually very appropriate, even though most people think it is confusing. I’m going to get to that.
In Lambda Calculus, there are no loops, only term re-writing. Evaluation is the process of recursively rewriting terms in a giant nesting of anonymous functions. This process is called a reduction. There are no variables, no memory, no jumps, no instructions, there arn’t even any numbers or operators like addition and multiplication. In that sense, everything is immutable. And this is why in Clojure variables don’t get assigned, but bound. A let
binds a value to a term, where the term will get re-written on evaluation for its value.
In that way, map
is thus a function which defines a mapping between two sets of values, where the rule of the relationship is the mapping function. It is not a loop, and no mutation takes place (in the abstract, on real hardware, it could be translated into an imperative model, which will loop and make use of mutation). Map simply re-writes terms of terms of terms recursively, and in a mind blowing manner, this reduces to the same thing that a for-loop over the data would, without ever needing a single variable or jump.
You can think of map
in terms of relational algebra, where it defines a one to one relation between inputs and outputs. Where as reduce
defines a one to many or many to one relation. So it is often said to be declarative in that way. When you use map
, you are saying: there exist a mapping
between inputs
of this domain to outputs of that domain, where for each input and output pair, mapping function
is the relationship rule. For example: (map inc [1 2 3])
. There exist a mapping between 1,2,3 and 2,3,4, where for each of 1,2 and 3; 2,3 and 4 are their increment. This relational view is how Pig, Hive, and the likes implement SQL over Map-Reduce and can scale to such high loads.
Yet, map
is not a real primitive. The real primitives are anonymous functions and eval. As I said, map
is simply defined as a big composition of the mapping function
over a recursion
over terms being re-written
, which is in itself a reduction
, and why map
can be implemented using reduce
.
Now, the Untyped Lambda Calculus says that with just eval and anonymous functions, you can implement all possible Turing complete programs. Which is kind of amazing and beautiful. It is also too clever, and operates at a level which is unnatural to people. Imagine if the dictionary only had two words, and to express any complex idea just required you to combine these two words in longer and longer combinations. It would be tedious, and inefficient.
People’s brain work at much higher levels of abstraction, and we prefer having many more words, then having fewer words but longer sentences. There’s probably a balance to be had here. We also wouldn’t want to have to memorize one word for all possible ideas. Thus English has in practice around 10 000 commonly used words, out of a total of ~171000. Similarly, Clojure adds many more words, which form common ideas, to reach a good balance between too many words and not enough. And just like the dictionary defines words using a longer combination of other words, the definition there of, Clojure defines its words in terms of combination of other words. Where as, in the start, you really just need eval and anonymous functions (lambda) as the words you have to magically know and understand to start defining all others.
Now, as I said before, there are no machine (as of yet) that can run eval and lambda directly on hardware. So for performance, Clojure words are often implemented using Java’s words, which are themselves defined in Turing Model words, not Lambda Calculus. But the Clojure words are inspired from the ones that make most sense to add to the Lambda Calculus, and not the ones that do for Turing Models. This is the fundamental difference between FP and imperative. They take their inspirations for their vocabulary from different roots, and end up being vastly different languages in that way. True as well of natural languages.
This idea of languages that have too few vocabulary and abstractions, yet are Turing complete is known as the Turing tarpit. Lambda calculus is a Turing tarpit the same way that machine code is one for Turing model languages. Even though once you understand them, you could implement any logic you want with just that, it won’t be practical.
So just like Python does not expose the true primitives of the Turing model, but some slightly higher level ones, Clojure does as well. Lets get into them.
For Lisps, normally the primitives are going to be:
- first
- rest
- cons
- read
- apply
- eval
- fn
- def
- cond
- quote
- let
In Clojure, because the JVM lacks tail call optimization, I would also add:
- loop/recur
It can be a fun exercise (if you are masochist ) to try and implement things with just these. Recursion will be your best friend here.
Now, in proper Clojure, you can actually easily see Clojure’s true primitives here: https://clojure.org/reference/special_forms. Special forms are those things that are not implemented in term of other Clojure functions/macros, thus they are the true primitives. Everything that is not a special form is implemented in Clojure itself. Only special forms are implemented in Java. Now, there’s a slight cheat here which Clojure (as opposed to other Lisps) takes. One of the special form allows interop with Java code. That makes a lot of other things simpler to implement, because you can just leverage all of Java already for them. It also minimizes the translation required from Clojure to JVM byte-code for compilation. Making compilation quicker and the resulting code possibly faster. Another aspect of this is that, Clojure doesn’t give you its own words for things which Java already has almost the same ones, instead it tells you to just use interop to access them.
So again, in theory, you can implement every other function with just these special forms. In practice, you’ll want many more of the most common functions to already be implemented using the above. That way, you can start with a much larger vocabulary, enabling higher level of abstractions.
The most common ones you’ll want to learn are probably:
- All the sequence ones: map, reduce, for, filter, remove, take, keep, keep-indexed, concat, mapcat, flatten, etc.
- All the data-structure ones: +, -, *, inc, dec, min, max, =, >, <, str, join, split, count, seq, vector, get, assoc, dissoc, conj, etc.
- And the ones for atoms: reset!, swap!, @, deref, etc.
For a more complete set, I really really recommend going through the official Clojure Reference. In my opinion, it covers just the right amount of the most common functions/macros which together form the vocabulary that 95% of Clojure code uses.
Just like English tends to use the same common 10 000, even though it has ~171000, Clojure tends to use the same common ones found in the Reference. For the other ones, just like for English, you look them up only when you need them
You’ll find that I made a lot of similarity with natural languages, and I believe Clojure has a lot of similarity to them. Learning Clojure is actually more similar to learning a natural language than other programming language in my opinion. This is because it has a simple grammar of words that compose together in sentences and paragraphs to create more complicated ideas:
(<function or macro> arg1
(<nested func or macro> argA
argB
argM
...)
argN
...)
For functions, arguments are evaluated from left to right first, and then their result is passed to the outer function. They can be nested as an argument to other functions, and will evaluate bottom up. That is, the innermost function will evaluate first, and the outermost function will evaluate last. Functions always return a result. When evaluated, they are replaced by their result.
For macros, arguments are not evaluated, they are just passed to the macro directly, only the return of the macro is evaluated. They can be nested as arguments to other macros, and will evaluate top down. That is, the outermost macro will evaluate first, and the innermost macro will evaluate last. Macros always return a result. When evaluated, they are replaced by their result, but their result will further get evaluated. Macros are like language exceptions. Just like all natural languages have exceptions to their rules, well that is macros for ya. They can choose to evaluate their arguments in any order, if at all.
I hope this can help demystify some more parts of Clojure for yall.
Have fun everyone!