Why is "(type #:examples.e15-task-tracker{:a 1 :b 2})" a map?

ninesolaries · August 25, 2023, 3:48am

Please tell me how to understand “#:examples.e15-task-tracker{:a 1 :b 2}”?

What is the function of the “#” symbol here?

I know that “#{” is the prefix for a set:
(type #{:a 1 :b 2})
=> clojure.lang.PersistentHashSet

But what is the meaning of the “:” and the namespace?
(type #:examples.e15-task-tracker{:a 1 :b 2})
=> clojure.lang.PersistentArrayMap

Thanks

joinr · August 25, 2023, 4:05am

“#” is actually significant to the clojure reader and indicates special dispatch rules. There are several, including anonymous functions, hash sets, and tagged literals (user-defined reader extensions; a somewhat simplistic kind of reader macro with much less flexibility).

reader dispatch

This ties in with a specific reader literal for maps that have namespace-qualified keys. It provides a shorthand for writing (and reading) them.

specifically map namespace syntax

ninesolaries · August 25, 2023, 4:19am

Thank you. I understood it from the second reference (specifically the map namespace syntax).
But I still can’t understand one thing:
The form in this topic’s title is not a Set, Regex pattern, Var-quote, Anonymous function, or Ignore next form, which are the cases mentioned in the first reference link (reader dispatch). Is the “#” here still a dispatch macro, or just sugar? Or is it one of them and I just failed to match it?

seancorfield · August 25, 2023, 5:30am

@joinr linked you to the map namespace syntax and that’s what #: is.

#:examples.e15-task-tracker{:a 1 :b 2}
;; is the same as:
{:examples.e15-task-tracker/a 1 :examples.e15-task-tracker/b 2}
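
A quick way to check this at the REPL. This is a minimal sketch, assuming Clojure 1.9 or later (where the namespaced map syntax was introduced); the :person, :other and :blah namespaces are made up for illustration:

```clojure
;; Both literals are read into the same map value.
(= #:examples.e15-task-tracker{:a 1 :b 2}
   {:examples.e15-task-tracker/a 1 :examples.e15-task-tracker/b 2})
;; => true

;; Keys that already carry a namespace are left alone, and the
;; special namespace _ marks a key that should stay unqualified.
(def person #:person{:name "Ada" :other/k 2 :_/id 7})
(= person {:person/name "Ada" :other/k 2 :id 7})
;; => true

;; The printer can emit the same shorthand, which is usually where
;; people first run into it: the REPL binds this var to true.
(binding [*print-namespace-maps* true]
  (pr-str {:blah/x 1 :blah/y 2}))
;; => "#:blah{:x 1, :y 2}"
```

So the value is an ordinary map either way; only the way it is written and printed changes.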

joinr · August 25, 2023, 11:58am

Is the “#” here still a dispatch macro, or just sugar?

If you are the clojure reader, and you have to read a string that begins with #, then you look at the next character to determine what to do. If it matches any known dispatch rule, then that determines how to read the rest of the input.

We can look at the clojure reader implementation in java, or even easier - we can look at the reader that is implemented in clojure under tools.reader.

If we start from the higher entry point, read*, we can see the basic reader syntax rules play out. We loop through the unread character stream (in this case denoted by the arg reader, which is probably named so because it is implemented on top of a java.io.Reader) looking for:

1. whitespace,
2. nil (possibly an EOF error),
3. an expected return character set by an earlier read,
4. number literals,
5. dispatch characters for reader macros.

We see for step 5, there is a simple function that matches a character to a reader macro:

macros

(defn- macros [ch]
  (case ch
    \" read-string*
    \: read-keyword
    \; read-comment
    \' (wrapping-reader 'quote)
    \@ (wrapping-reader 'clojure.core/deref)
    \^ read-meta
    \` read-syntax-quote ;;(wrapping-reader 'syntax-quote)
    \~ read-unquote
    \( read-list
    \) read-unmatched-delimiter
    \[ read-vector
    \] read-unmatched-delimiter
    \{ read-map
    \} read-unmatched-delimiter
    \\ read-char*
    \% read-arg
    \# read-dispatch
    nil))

In this case a reader macro is just a function that operates on the character stream prior to evaluation (i.e. it is applied at “read time”).

We see the existing rules that the clojure.org docs mentioned, e.g. " uses read-string*, : uses read-keyword, ( uses read-list, etc. These are the basic rules of parsing clojure forms. The rule that matches # is a function called read-dispatch. If we look at its definition we see it gets passed the current stream that’s being read from, looks at the next character, and uses that to determine what to do (i.e. dispatch to another reader macro). This is like a compound rule, since we are effectively determining how to parse based on both # and the next character.

So read-dispatch looks to see if there is a match using the dispatch-macros function. We see a similar function (like the earlier macros function) that looks up another reader macro based on a character (in this case, it is the character following #).

(defn- dispatch-macros [ch]
  (case ch
    \^ read-meta                ;deprecated
    \' (wrapping-reader 'var)
    \( read-fn
    \= read-eval
    \{ read-set
    \< (throwing-reader "Unreadable form")
    \" read-regex
    \! read-comment
    \_ read-discard
    \? read-cond
    \: read-namespaced-map
    \# read-symbolic-value
    nil))

There are similar characters from the baseline rules in the macros lookup, except they are mapped to different things now. ( [or #(] maps to read-fn, { [or #{] maps to read-set, etc. So the original # indicated a possible dispatch macro, which may alter the behavior of the reader going forward if a rule exists for the next character. This allows a localized changing of how the reader works, and enables the common syntax sugar that we see e.g. for hash-sets, anonymous functions, and other dedicated literals (like the var syntax). We can see the specific case you asked about: : [or #:] now dispatches to the macro read-namespaced-map. This corresponds with the behavior I linked earlier, and that Sean demonstrated.
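
To see a few of these dispatch rules in action, you can hand small strings to the reader yourself. This is only an illustrative sketch using clojure.core/read-string (the Java reader, which follows the same idea); the arrows in the comments refer to the dispatch-macros table above, and the #: line assumes Clojure 1.9 or later:

```clojure
;; Each string starts with #, so the reader dispatches on the second character.
(read-string "#{1 2}")            ; \{ -> a set: #{1 2}
(read-string "#'map")             ; \' -> the list (var map)
(first (read-string "#(inc %)"))  ; \( -> a list whose first element is fn*
(class (read-string "#\"a+\""))   ; \" -> java.util.regex.Pattern
(read-string "#_skipped 42")      ; \_ -> discards one form, then reads 42
(read-string "#:blah{:x 1}")      ; \: -> the map {:blah/x 1}
```

Note that nothing is evaluated here; read-string only turns text into data structures.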

If no dispatch macro is found, then the reader defaults to read-tagged, which exposes us to the notion of potentially user-defined edn readers for tagged literals. That is beyond the scope of this already massive reply, but at least the path is there if you really want to follow it…

read-dispatch
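
For the curious, here is roughly what the tagged-literal path looks like from the user's side. This is a small sketch using clojure.edn; the tag my/point and its reader function are invented for the example:

```clojure
(require '[clojure.edn :as edn])

;; #inst is a tag that ships with Clojure; by default it reads as a java.util.Date.
(class (edn/read-string "#inst \"2023-08-25T00:00:00Z\""))
;; => java.util.Date

;; my/point is a made-up tag. The :readers option maps the tag symbol to a
;; function that receives the already-read form that follows the tag.
(edn/read-string {:readers {'my/point (fn [[x y]] {:x x :y y})}}
                 "#my/point [1 2]")
;; => {:x 1, :y 2}
```

The tag is not one of the single-character dispatch rules, so it falls through to the tagged-literal handling and is looked up by name instead.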

The idea is the same in the java implementation:

dispatch macros

lookup any reader macro associated with current character

etc.

So…with all of that on the table…if you are the Clojure reader, and I ask you to read a stream of characters, then

1. You will try to match the current character with a known rule (e.g. whitespace, nil, pre-set return character, number, or reader macro).
2. If the current character corresponds to a reader macro, you will look up the reader macro (a function that operates on the character stream), and apply that function to the unread stream.
   - For # that function will be read-dispatch, which has another lookup table based on the next character.
     - If you find a function from dispatch-macros that matches the next character in the stream, then you will apply that function to the rest of the stream as a reader macro (a dispatched reader macro in this case, where # indicated the need to dispatch).
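
The two-level lookup described above can be sketched as a toy. This is not the tools.reader code: toy-macros, toy-dispatch-macros and which-reader are hypothetical names, the tables are cut down, and they hold keywords naming the reader functions rather than the functions themselves:

```clojure
;; First-level table: what to do for the current character.
(def toy-macros
  {\" :read-string*
   \: :read-keyword
   \( :read-list
   \[ :read-vector
   \{ :read-map
   \# :read-dispatch})

;; Second-level table: consulted only after a # was seen.
(def toy-dispatch-macros
  {\( :read-fn
   \{ :read-set
   \" :read-regex
   \' :read-var
   \_ :read-discard
   \: :read-namespaced-map})

(defn which-reader
  "Returns a keyword naming the reader function that would handle string s,
  looking only at its first one or two characters."
  [s]
  (let [[c1 c2] s
        m (toy-macros c1)]
    (if (= m :read-dispatch)
      ;; # seen: dispatch on the next character, falling back to tagged literals
      (get toy-dispatch-macros c2 :read-tagged)
      m)))

(which-reader "{:x 1 :y 2}")        ; => :read-map
(which-reader "#{:x 1 :y 2}")       ; => :read-set
(which-reader "#:blah{:x 1 :y 2}")  ; => :read-namespaced-map
(which-reader "#inst \"2023\"")     ; => :read-tagged
```

The same { character ends up at three different readers depending on what came before it, which is the whole point of the dispatch step.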

So this would “loosely” flow like the following (I intentionally ignore the extra arguments, and represent the character stream as a string here; the reality is more involved but this conveys the causality):

clojure.tools.reader=> (read-string "#:blah{:x 1 :y 2}") ;;-> 

(read "#:blah{:x 1 :y 2}" ...);;->
  (read* "#:blah{:x 1 :y 2}" ...) ;;->
     ((macros \#) ":blah{:x 1 :y 2}") ;;->
     (read-dispatch  ":blah{:x 1 :y 2}" ...) ;;->
       ((dispatch-macros \:) "blah{:x 1 :y 2}") ;;->
       (read-namespaced-map "blah{:x 1 :y 2}" ...)

#:blah{:x 1, :y 2}

If you used the tools.reader version of read-string and traced the function calls you should see a similar trace.

In practice, we use the reader shipped with clojure (implemented in java), which is exposed via clojure.core/read, clojure.core/read-string, and (preferably) clojure.edn/read and clojure.edn/read-string. Why prefer clojure.edn/read? If you look back at the dispatch macros, one of them is

\= read-eval

This lets the reader eval arbitrary code at read time (which has some use cases). We may not like that behavior if we are reading input from untrusted sources…

user=> (read-string "^{:blah #=(println :you-got-pwned)} [:hello]")
:you-got-pwned
[:hello]

;;clojure.edn uses a limited dispatch macro table and reads the input as inert data, so read-eval can't be invoked

user=> (clojure.edn/read-string "^{:blah #=(println :you-got-pwned)} [:hello]")
Execution error at user/eval153 (REPL:1).
No dispatch macro for: =
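
If you want to wrap that up for untrusted input, one possible shape is below. This is a minimal sketch; safe-read is a hypothetical helper name, not something that ships with Clojure:

```clojure
(require '[clojure.edn :as edn])

(defn safe-read
  "Hypothetical helper: reads one form from s as inert data.
  Returns {:error msg} instead of throwing when the input is rejected."
  [s]
  (try
    (edn/read-string s)
    (catch RuntimeException e
      {:error (.getMessage e)})))

(safe-read "{:a 1 :b [2 3]}")    ; => {:a 1, :b [2 3]}
(safe-read "#=(println :nope)")  ; => a map with an :error key; nothing is printed

;; With clojure.core/read-string, read-time eval can also be switched off
;; by binding *read-eval* to false, in which case #= makes the reader throw.
(try
  (binding [*read-eval* false]
    (read-string "#=(+ 1 2)"))
  (catch RuntimeException e
    :rejected))
;; => :rejected
```

Even with *read-eval* turned off, clojure.edn is still the better fit for data you do not control, since its reader is restricted to data by design.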

ninesolaries · August 26, 2023, 8:46am

An excellent answer, exactly what I was hoping for! Thank you.

But how can I find this information? Is it just by reading the source code of Clojure itself, as you demonstrated in your answer?

The documentation and source for the symbol “#” can’t be retrieved with (doc …) or (source …) in clojure.repl. I did not get the idea from the read function before, and read* is private, so I did not know about it at all before exploring.

Is reading source code how you came to know these things?

joinr · August 26, 2023, 10:19am

The documentation and source for the symbol “#” can’t be retrieved with (doc …) or (source …) in clojure.repl.

So for any lisp implementation (like clojure), there are a few things that are pre-defined as part of the environment. Those include things like special forms. If you look at the source for let (e.g. using (source let)) you would see it uses let*. If you try the same with let* then you get no source… You will find this for several of the foundational forms, which are macros built on top of a built-in special form (let → let*, fn → fn*, etc.). These forms with * on the end have no source available from the repl; they are part of the implementation. You cannot inspect them… they just exist. You have to look at the implementation to see how they are built (like LetExpr in the java compiler…). That is probably more information than necessary (unless you are interested in learning about the implementation, maybe to alter it or make your own Clojure implementation or a lisp).
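
You can poke at that boundary from the REPL without opening the Java source. A small sketch using only clojure.core functions:

```clojure
;; let is an ordinary macro that expands into the special form let*.
(macroexpand-1 '(let [x 1] x))
;; => (let* [x 1] x)

;; fn expands the same way, into a form headed by fn*.
(first (macroexpand-1 '(fn [x] x)))
;; => fn*

;; special-symbol? tells you whether a symbol names a special form.
(special-symbol? 'let*)  ; => true
(special-symbol? 'let)   ; => false
(special-symbol? 'fn*)   ; => true
```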

The reader and therefore the syntax of the language - encoded in the default read table [the reader macros, and the dispatch macros] - are similarly pre-defined outside of the language itself. So the symbol # has no meaning outside of the context of the reader, and no documentation about the reader semantics is provided in the source.

There are in-depth discussions of the clojure reader, in particular the docs. In fact, I think the current docs cover it all pretty well (at least the semantics).

They specifically discuss macro characters and the dispatch macro character.

I just used the source of the implementation in tools.reader to walk through how things are implemented and to show causality (since it is also implemented in clojure). There is no fundamental magic; the implementation is pretty straightforward (parsing a character stream, using functions that map to characters to form syntax rules [these become reader macros in practice]).

I learned most of this stuff before the official docs were in their current state though, and in the early days I had to go through the source (in addition to docs, blog posts, books) to get an idea of how it was implemented. I don’t think people have to do that these days.

I also spent some time in Common Lisp, which has the idea of reader macros as a language feature available to the user. This helped me understand clojure’s implementation greatly. Notably, Clojure intentionally avoids user-defined reader macros (outside of the subset provided by tagged literals and user-defined data readers). There are some libraries that can hack around this, but in practice reader macros (for me) are more of a gimmick or party trick.

ninesolaries · August 28, 2023, 3:06am

Yes, I have realized that it’s a trick. I did not follow the “right” document, and that caused my confusion.
Thank you, I have more docs to read.

joinr · August 28, 2023, 1:08pm

I was specifically referring to the idea of “user defined” reader macros as a party trick or gimmick (in other words not particularly useful in general, but some people really hold them in high value); not the implementation of the reader or how the language syntax is implemented.
