Is the “#” here still a dispatch macro, or just sugar?
If you are the clojure reader, and you have to read a string that begins with #, then you look at the next character to determine what to do. If it matches any known dispatch rule, then that determines how to read the rest of the input.
We can look at the Clojure reader implementation in Java or, even easier, at the reader that is implemented in Clojure under tools.reader.
If we start from the higher entry point, read*, we can see the basic reader syntax rules play out. We loop through the unread character stream (in this case denoted by the arg reader, which is probably named so because it is implemented on top of a java.io.Reader) looking for
- whitespace,
- nil (possibly an EOF error)
- an expected return character set by an earlier read
- number literals
- dispatch characters for reader macros.
For the last item in that list, there is a simple function, macros, that matches a character to a reader macro:
(defn- macros [ch]
  (case ch
    \" read-string*
    \: read-keyword
    \; read-comment
    \' (wrapping-reader 'quote)
    \@ (wrapping-reader 'clojure.core/deref)
    \^ read-meta
    \` read-syntax-quote ;;(wrapping-reader 'syntax-quote)
    \~ read-unquote
    \( read-list
    \) read-unmatched-delimiter
    \[ read-vector
    \] read-unmatched-delimiter
    \{ read-map
    \} read-unmatched-delimiter
    \\ read-char*
    \% read-arg
    \# read-dispatch
    nil))
In this case a reader macro is just a function that operates on the character stream prior to evaluation (i.e. it is applied at “read time”).
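To make "read time" concrete, here is a small sketch using clojure.core/read-string (the Java reader, not tools.reader; I am assuming the two agree for these simple forms). Nothing is evaluated: the reader only turns characters into data, and the rules for ' and @ simply wrap the next form.

```clojure
;; read-string only reads; the symbols x and a are never resolved or evaluated.
(def quoted  (read-string "'x"))   ; the \' rule wraps the next form in quote
(def derefed (read-string "@a"))   ; the \@ rule wraps it in clojure.core/deref
(def kw      (read-string ":k"))   ; the \: rule produces a keyword

(println quoted)   ; prints (quote x)
(println derefed)  ; prints (clojure.core/deref a)
(println kw)       ; prints :k
```

The results are plain lists and keywords, which is why you can inspect what a reader macro produced before anything gets evaluated.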
We see the existing rules that the clojure.org docs mentioned, e.g. " uses read-string*, : uses read-keyword, ( uses read-list, etc. These are the basic rules of parsing Clojure forms. The rule that matches # is a function called read-dispatch. If we look at its definition, we see that it gets passed the current stream that’s being read from, looks at the next character, and uses that to determine what to do (i.e. dispatch to another reader macro). This is like a compound rule, since we are effectively determining how to parse based on both # and the next character.
So read-dispatch looks to see if there is a match using the dispatch-macros function. We see a similar function (like the earlier macros function) that looks up another reader macro based on a character (in this case, the character following #).
(defn- dispatch-macros [ch]
  (case ch
    \^ read-meta ;deprecated
    \' (wrapping-reader 'var)
    \( read-fn
    \= read-eval
    \{ read-set
    \< (throwing-reader "Unreadable form")
    \" read-regex
    \! read-comment
    \_ read-discard
    \? read-cond
    \: read-namespaced-map
    \# read-symbolic-value
    nil))
There are similar characters from the baseline rules in the macros lookup, except they are mapped to different things now. ( [or #(] maps to read-fn, { [or #{] maps to read-set, etc. So the original # indicated a possible dispatch macro, which may alter the behavior of the reader going forward if a rule exists for the next character. This allows a localized changing of how the reader works, and enables the common syntax sugar that we see e.g. for hash-sets, anonymous functions, and other dedicated literals (like the var syntax). We can see the specific case you asked about: : [or #:] now dispatches to the macro read-namespaced-map. This corresponds with the behavior I linked earlier, and that Sean demonstrated.
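You can watch the same character mean different things with and without the leading #. This is a quick sketch using clojure.core/read-string, assuming Clojure 1.9 or later so that the #: namespaced map syntax is available.

```clojure
;; Same following character, different rule once # is in front of it.
(def as-map  (read-string "{:x 1}"))     ; { alone  -> map
(def as-set  (read-string "#{:x 1}"))    ; #{       -> set
(def as-list (read-string "(inc 1)"))    ; ( alone  -> list
(def as-fn   (read-string "#(inc %)"))   ; #(       -> (fn* [...] ...) form
(def as-var  (read-string "#'inc"))      ; #'       -> (var inc) form
(def as-ns-map (read-string "#:blah{:x 1 :y 2}")) ; #: -> keys get the blah namespace

(println (map? as-map) (set? as-set))    ; prints true true
(println (first as-fn))                  ; prints fn*
(println as-var)                         ; prints (var inc)
(println (= as-ns-map {:blah/x 1 :blah/y 2})) ; prints true
```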
If no dispatch macro is found, then the reader defaults to read-tagged, which exposes us to the notion of potentially user-defined edn readers for tagged literals. That is beyond the scope of this already massive reply, but at least the path is there if you really want to follow it…
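If you do want a first step down that path, here is a minimal sketch using clojure.edn rather than tools.reader. The tags my/point and other/thing and the helper read-point are made up for illustration; :readers and :default are the standard options accepted by clojure.edn/read-string.

```clojure
(require '[clojure.edn :as edn])

;; Hypothetical handler for a made-up #my/point tag.
(defn read-point [[x y]]
  {:x x :y y})

;; The reader reads the form after the tag, then hands it to our function.
(def point
  (edn/read-string {:readers {'my/point read-point}}
                   "#my/point [1 2]"))

;; :default is called with the tag and the value when no reader is registered.
(def unknown
  (edn/read-string {:default (fn [tag value] [:unknown tag value])}
                   "#other/thing 42"))

(println point)    ; prints {:x 1, :y 2}
(println unknown)  ; prints [:unknown other/thing 42]
```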
The idea is the same in the Java implementation: there is a table of dispatch macros, the reader looks up any reader macro associated with the current character, etc.
So…with all of that on the table…if you are the Clojure reader, and I ask you to read a stream of characters, then
- you will try to match the current character with a known rule (e.g. whitespace, nil, pre-set return character, number, or reader macro)
- If the current character corresponds to a reader macro, you will look up the reader macro (a function that operates on the character stream), and apply that function to the unread stream.
  - For # that function will be read-dispatch, which has another lookup table based on the next character.
    - If you find a function from dispatch-macros that matches the next character in the stream, then you will apply that function to the rest of the stream as a reader macro (a dispatched reader macro in this case, where # indicated the need to dispatch).
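The steps above can be sketched as a toy two-level lookup. To be clear, this is not how tools.reader is written: every name here (toy-macros, toy-dispatch-macros, toy-read-dispatch, toy-classify) is hypothetical, the tables are cut down to a few entries, and the handlers only label the input instead of parsing it.

```clojure
;; Toy model only: handlers return [rule unread-input] instead of parsing.

(defn- toy-dispatch-macros
  "Second-level table, keyed on the character that follows #."
  [ch]
  (case ch
    \{ :set
    \( :anonymous-fn
    \: :namespaced-map
    nil))

(defn- toy-read-dispatch
  "Handler for #: look at the next character and pick a second-level rule."
  [more]
  (if-let [rule (toy-dispatch-macros (first more))]
    [rule (subs more 1)]
    [:tagged-literal more]))   ; no rule found, fall back to tagged literals

(defn- toy-macros
  "First-level table, keyed on the current character."
  [ch]
  (case ch
    \( (fn [more] [:list more])
    \{ (fn [more] [:map more])
    \# toy-read-dispatch
    nil))

(defn toy-classify
  "Returns [rule unread-input] for the start of the string s."
  [s]
  (if-let [macro (toy-macros (first s))]
    (macro (subs s 1))
    [:other s]))

(println (toy-classify "{:x 1}"))        ; prints [:map :x 1}]
(println (toy-classify "#{1 2}"))        ; prints [:set 1 2}]
(println (toy-classify "#:blah{:x 1}"))  ; prints [:namespaced-map blah{:x 1}]
```

The point of the sketch is only the shape: one table picks a function from the current character, and the function for # consults a second table using the next character.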
So this would “loosely” flow like the following (I intentionally ignore the extra arguments, and represent the character stream as a string here; the reality is more involved but this conveys the causality):
clojure.tools.reader=> (read-string "#:blah{:x 1 :y 2}") ;;->
(read "#:blah{:x 1 :y 2}" ...) ;;->
(read* "#:blah{:x 1 :y 2}" ...) ;;->
((macros \#) ":blah{:x 1 :y 2}") ;;->
(read-dispatch ":blah{:x 1 :y 2}" ...) ;;->
((dispatch-macros \:) "blah{:x 1 :y 2}") ;;->
(read-namespaced-map "blah{:x 1 :y 2}" ...)
#:blah{:x 1, :y 2}
If you used the tools.reader version of read-string and traced the function calls you should see a similar trace.
In practice, we use the reader shipped with Clojure (implemented in Java), which is exposed via clojure.core/read, clojure.core/read-string, and (preferably) clojure.edn/read, clojure.edn/read-string. Why prefer clojure.edn/read? If you look back at the dispatch macros, one of them is
\= read-eval
This lets the reader eval arbitrary code at read time (which has some use cases). We may not like that behavior if we are reading input from untrusted sources…
user=> (read-string "^{:blah #=(println :you-got-pwned)} [:hello]")
:you-got-pwned
[:hello]
;;clojure.edn uses a limited dispatch macro table and reads the input as inert data, so read-eval can't be invoked
user=> (clojure.edn/read-string "^{:blah #=(println :you-got-pwned)} [:hello]")
Execution error at user/eval153 (REPL:1).
No dispatch macro for: =
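If you are reading untrusted input in a program rather than at the REPL, one way to handle that failure is to wrap the edn read and return the error as data. This is just a sketch; read-untrusted is a hypothetical helper name, not something from clojure.edn.

```clojure
(require '[clojure.edn :as edn])

(def untrusted "^{:blah #=(println :you-got-pwned)} [:hello]")

(defn read-untrusted
  "Hypothetical helper: reads s as inert edn data and reports failures as data."
  [s]
  (try
    {:ok (edn/read-string s)}
    (catch RuntimeException e
      {:error (.getMessage e)})))

(def rejected (read-untrusted untrusted))
(def accepted (read-untrusted "{:user \"bob\" :roles #{:admin}}"))

(println (contains? rejected :error)) ; prints true, and nothing was evaluated
(println (:ok accepted))
```

Ordinary dispatch forms such as the #{...} set literal still read fine through clojure.edn; only the ones that are not part of edn, like #=, are rejected.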