Tagged literal for `clojure.core.Vec`: not possible in Clojure?

I’m struggling with building a reader function for a tagged literal that will yield a clojure.core.Vec, which, when initialized to hold :byte, is a pretty close stand-in for a Java native byte array but with the nice properties of being immutable and following Clojure equality semantics.

Regardless of my reader function, the resulting value is always a clojure.lang.PersistentVector, which is not at all the same thing.

Here’s a quick reader function you can try that looks like it should work:

(defn read-bytes [bs] (apply vector-of :byte (map byte bs)))

Then add an appropriate data_readers.clj entry:

{cch/bytes cch.bytes/read-bytes}

Then, from the REPL, type in

(class #cch/bytes [-1 0 1])

The result is clojure.lang.PersistentVector, which surprises me, especially since this yields the expected clojure.core.Vec:

(read-string "#cch/bytes [1 0 -1]")

It’s as though the Clojure reader cycles through the print/read process a second time after the reader function has done its job. But even defining print-dup for clojure.core.Vec doesn’t seem to resolve the problem. Here’s my attempt:

(defmethod print-dup clojure.core.Vec
  [^clojure.core.Vec v ^java.io.Writer w]
  (.write w "#cch/bytes ")
  (.write w (str v)))

In fact, it doesn’t seem to matter what I use for the print-dup implementation: Clojure always prints a clojure.core.Vec as a clojure.lang.PersistentVector.

So I guess there are really two questions: 1. Why can’t I supply a print-dup for clojure.core.Vec? and 2. Why is my tagged literal reader function not sufficient to yield a clojure.core.Vec?

I had a similar double-printing phenomenon where the read-bytes function was invoked 2x. I’m guessing that there’s some special processing happening in the reader logic that’s maybe getting confused by using Vecs which derive from vectors…

Out of curiosity (just avoiding any problems that might happen if you are working on something that inherits from vector), if you define your own typed container that wraps a byte array:

(deftype bytevector [^bytes bs]
  clojure.lang.Counted
  (count [this] (alength bs))
  clojure.lang.Indexed
  (nth [this idx] (aget bs idx))
  (nth [this idx not-found]
    (if (and (<= 0 idx) ; index 0 is valid, so test for non-negative rather than pos?
             (< idx (alength bs)))
      (aget bs idx)
      not-found))
  clojure.lang.Seqable
  (seq [this] (seq bs)))

(defn read-bytes [xs]
  (bytevector. (byte-array xs)))

(defmethod print-method bytevector [v ^java.io.Writer w]
  (.write w (str "#user/bytevector"(vec (seq v)))))

(defmethod print-dup bytevector [o w]
  (print-ctor o (fn [o w] (.write w  "(byte-array ")
                  (print-dup  (vec (seq o)) w)
                  (.write w  ")")) w))

(defn tst []
  (let [res (binding [*data-readers* {'user/bytevector user/read-bytes}]
              (read-string "#user/bytevector[0 0 1]"))]
    (println [(type res)  res])
    res))

It seems to work okay:

;; user=> (tst)
;; [user.bytevector #user/bytevector[0 0 1]]
;; #user/bytevector[0 0 1]

Keep in mind that if you are using something like CIDER, it overrides the default print method and uses pprint when printing to your REPL in Emacs, so it may be necessary to implement a pprint method as well. (This caused me some headaches when custom printers for record types never seemed to work; in fact you have to define both a print-method and a pprint method for CIDER to pick it up, lol.)
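As a rough sketch of that last point, one way to give a record both printers is to add a method to clojure.pprint/simple-dispatch alongside print-method. The record name Tagged and the tag #user/tagged are made up for illustration, and whether CIDER actually goes through simple-dispatch depends on how its printer is configured, so treat this as an assumption to verify rather than a recipe:

```clojure
(require '[clojure.pprint :as pp])

;; Hypothetical record used only for this illustration.
(defrecord Tagged [xs])

;; Custom printer used by pr, prn, println, and the plain REPL.
(defmethod print-method Tagged [t ^java.io.Writer w]
  (.write w (str "#user/tagged" (vec (:xs t)))))

;; Without this, pprint treats the record as a map and ignores print-method.
(defmethod pp/simple-dispatch Tagged [t]
  (print-method t *out*))

(println (pr-str (->Tagged [0 0 1])))
(pp/pprint (->Tagged [0 0 1]))
```

Delegating the pprint method to print-method keeps the two representations from drifting apart.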

edit:

Funny enough, this seems to work fine!

(defn other-tst []
  (let [res (binding [*data-readers* {'user/bytevector (fn [xs] (into (vector-of :byte) (map byte xs)))}]
              (read-string "#user/bytevector[0 0 1]"))]
    (println [(type res)  res])
    res))

;; user=> (def res (other-tst))
;; [clojure.core.Vec [0 0 1]]
;; #'user/res

I wonder whether your call to apply is doing something unexpected.

My experience is that read-string works where direct input into the REPL fails. I do not know why.

After setting my data_readers.clj to reference a var with the implementation @joinr supplied (including the apply) the result of any REPL evaluation is a clojure.lang.PersistentVector.

Funny enough, if you map type over the resulting persistent vector, they are all Byte types.

That is really weird. It’s probably an artifact resulting from the mechanism that transmutes the vector as a whole into a clojure.lang.PersistentVector.

I think I figured it out (partly). I remember from implementing clclojure that vectors have different evaluation semantics (as do other persistent structures). When you “read” the tagged literal, you may compute a byte vector at read time (hence the byte coercions here), but the REPL then evals that Vec result. Because (I think) the Vec implements IPersistentVector, the analyzer treats it as a vector form and produces a clojure.lang.Compiler$VectorExpr; when eval is invoked on the VectorExpr, it implicitly returns a persistent vector with boxed elements. That’s the problem, and it’s why read-string works (the resulting Vec is never eval’d). For example:

user=> (def the-vec (apply vector-of :byte [0 1 0]))
#'user/the-vec
user=> the-vec
[0 1 0]
user=> (type the-vec)
clojure.core.Vec
user=> (type (eval the-vec))
clojure.lang.PersistentVector
user=> (type the-vec)
clojure.core.Vec

user=> (defn read-bytes [bs] `(apply vector-of :byte (map byte ~bs)))
#'user/read-bytes
user=> #user/bytevector[0 1 0]
[0 1 0]
user=> (def res #user/bytevector[0 1 0])
#'user/res
user=> (type res)
clojure.core.Vec
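The same observations can be collected into one script, as a small sketch rather than anything definitive. The helper name bytes-form is hypothetical; it mirrors the form-returning read-bytes above without needing a data_readers.clj entry:

```clojure
;; A Vec built at runtime keeps its type.
(def the-vec (vector-of :byte 0 1 0))

;; Passing the Vec itself to eval treats it like a vector literal.
(def evaled (eval the-vec))

;; Hypothetical helper: return a form that builds the Vec when evaluated,
;; instead of returning the Vec directly.
(defn bytes-form [bs]
  `(apply vector-of :byte (map byte ~bs)))

(def rebuilt (eval (bytes-form [0 1 0])))

(println (class the-vec))
(println (class evaled))
(println (map class evaled))
(println (class rebuilt))
```

Going by the transcript above, the second line printed should be clojure.lang.PersistentVector while the first and last are clojure.core.Vec; the elements of the eval’d vector remain Byte, as noted earlier in the thread.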

Ugh, getting print-dup working with a deftype that wraps either a primitive vector or a byte array is a pain. That seems to be the only way to work around the built-in parsing of IPersistentVectors at the moment…

That’s impressive analysis. I wonder why eval is called on the Compiler$VectorExpr and whether the same collapsing of types happens for associative types as well. Either way, it seems like an intractable problem that will limit my ability to use clojure.core.Vec as a more Clojure-idiomatic byte array.

It does happen with the other interfaces for persistent structures, since they are a sort of pseudo form. I would not say it’s intractable to use Vec, just not directly, since it will get caught by the analyzer in this way. The approach I am looking at, which almost works, is to wrap the Vec in a type that implements the interfaces of a vector, but not IPersistentVector. This allows the wrapped Vec to participate in common functions, while being left unevaluated after read (like a constant). If I can figure out the gripes from print-dup, it seems viable.

Did you find a way to get print-dup working with a deftype that implements clojure.lang.ISeq or similar? I cannot…

I’ve added some details here: https://ask.clojure.org/index.php/9567/possible-create-tagged-literal-reader-for-clojure-core-vec and here: https://github.com/jafingerhut/vec-data-reader

It seems that doing what you are hoping for requires either a separate implementation of primitive vectors in Clojure that is designed with this use case in mind, or changes to the Java code that implements Clojure.