I’ve been using Clojure for ten years and would like to think I’m familiar enough with the language that I would not miss an obvious solution to this problem:
I want to read, write and operate on sequences of bytes. This is a common problem in fields as diverse as image and audio processing, cryptography and low-level I/O.
The underlying JVM platform provides an efficient type for operating on bytes: the native Java `byte[]` array (class name `[B` in Clojure). Clojure also supports a more idiomatic, immutable byte vector via the `(vector-of :byte ...)` construct, backed by `clojure.core.Vec`. But neither of these types has a literal representation.
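A quick REPL sketch of the two types and how they print. The array prints as an opaque object, and the byte vector prints as an ordinary vector literal that reads back as a vector of longs, so neither round-trips as a byte-typed value.

```clojure
;; A native Java byte[]: efficient and mutable, but it prints as an opaque object.
(def arr (byte-array [0 -1 -85 -51]))
(.getName (class arr))   ; => "[B"
(pr-str arr)             ; an #object[...] form that the reader cannot read back

;; A persistent vector of primitive bytes.
(def v (vector-of :byte 0 -1 -85 -51))
(class v)                ; => clojure.core.Vec
(pr-str v)               ; => "[0 -1 -85 -51]"

;; Reading the printed form back yields an ordinary persistent vector, not a byte vector.
(class (read-string (pr-str v)))   ; => clojure.lang.PersistentVector
```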
`BigInt` would be an efficient choice with a compact literal representation, were it not that it drops leading zeros both in storage and in printing.
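To make the leading-zero problem concrete, here is a small interop sketch using `java.math.BigInteger`, chosen for illustration because it exposes a byte-array constructor. Any integer representation behaves the same way, and the sign bit adds a second mismatch.

```clojure
;; Three bytes, the first two of which are leading zeros.
(def bs (byte-array [0 0 1]))

;; BigInteger(byte[]) interprets the bytes as a big-endian two's-complement integer.
(def n (BigInteger. ^bytes bs))

(count (.toByteArray n))   ; => 1, the two leading zero bytes are gone
(.toString n 16)           ; => "1"

;; A leading byte with the high bit set is read as a sign bit.
(.signum (BigInteger. (byte-array [-1])))   ; => -1
```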
Overall, the lack of an intuitive literal representation for byte sequences seems to be a gap in Clojure. I have stumbled upon rumblings of native support for representing sequences of bytes as hex strings, but nothing has come of it as far as I can tell. Hex strings are a common literal representation, and I think Clojure would be well served by a variant of the hex string as the literal representation of a sequence, vector or array of bytes. Reader support for a literal like `#x"00ffabcd"` producing a `clojure.core.Vec` of bytes would do very nicely.
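Something close to this can be approximated today with a tagged literal, as a sketch. Reader tags without a namespace are reserved for Clojure itself, so `#x` is not available to user code; the tag `my/hex` and the helpers `parse-hex` and `hex-str` below are hypothetical names, and reading goes through `clojure.edn` with an explicit `:readers` map.

```clojure
(require '[clojure.edn :as edn])

(defn parse-hex
  "Hypothetical helper: parses an even-length hex string into a (vector-of :byte ...)."
  [^String s]
  (when (odd? (count s))
    (throw (IllegalArgumentException. "hex string must have an even length")))
  (into (vector-of :byte)
        ;; each pair of hex digits becomes one signed byte (0xff => -1)
        (map (fn [[hi lo]] (unchecked-byte (Integer/parseInt (str hi lo) 16))))
        (partition 2 s)))

(defn hex-str
  "Hypothetical helper: renders bytes as lowercase hex, two digits per byte."
  [bs]
  (apply str (map #(format "%02x" (bit-and % 0xff)) bs)))

(def v (edn/read-string {:readers {'my/hex parse-hex}} "#my/hex \"00ffabcd\""))
(class v)     ; => clojure.core.Vec
(vec v)       ; => [0 -1 -85 -51]
(hex-str v)   ; => "00ffabcd"
```

Reading this way keeps the byte-typed `clojure.core.Vec`; the printing side still has to be done explicitly with something like `hex-str`.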
I’ve hacked around with other approaches, the most promising of which was a `deftype` backed by a `clojure.core.Vec` of `:byte`. But even with a custom `print-dup` and tagged literal, Clojure does not seem to tolerate reading a tagged literal into a `deftype` backed by a type without its own literal representation (a limitation I can’t quite understand).
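For reference, a minimal sketch of the `deftype` approach as I understand it: a hypothetical `HexBytes` type wrapping a `vector-of :byte`, a `print-method` that emits a hypothetical `#my/hex` tagged literal, and reading back through `clojure.edn` with an explicit reader map. It only covers print and read through `clojure.edn`, not embedding the literal in compiled code.

```clojure
(require '[clojure.edn :as edn])

;; Hypothetical wrapper type around a primitive byte vector.
(deftype HexBytes [v]
  Object
  (equals [_ o] (and (instance? HexBytes o) (= v (.-v ^HexBytes o))))
  (hashCode [_] (hash v)))

(defn- hex-str [bs]
  (apply str (map #(format "%02x" (bit-and % 0xff)) bs)))

(defn parse-hex [^String s]
  (->HexBytes (into (vector-of :byte)
                    (map (fn [[hi lo]] (unchecked-byte (Integer/parseInt (str hi lo) 16))))
                    (partition 2 s))))

;; Print HexBytes values as a tagged literal.
(defmethod print-method HexBytes [^HexBytes x ^java.io.Writer w]
  (.write w (str "#my/hex \"" (hex-str (.-v x)) "\"")))

(def x (parse-hex "00ffabcd"))
(pr-str x)   ; => "#my/hex \"00ffabcd\""
(= x (edn/read-string {:readers {'my/hex parse-hex}} (pr-str x)))   ; => true
```

If the failure you hit was instead with such a literal written directly in source code, one possible cause (an assumption, not verified against your setup) is that the compiler must embed the returned object as a constant. A common workaround is to have the data reader return a form such as `(my.ns/parse-hex "00ff")` rather than the object itself.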
I rule out a plain Clojure persistent vector of bytes. Clojure neatly handles its (de)serialization, and operating on the byte elements with the built-in bitwise functions is reasonable, but Clojure vectors are heterogeneous, which is a severe semantic mismatch. They also have no compact literal representation when filled with bytes (compare the `print-dup` output of a vector of bytes with a hex string…)
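A sketch of the plain-vector approach that illustrates both objections. The bitwise functions return `long`, so results have to be narrowed back with `unchecked-byte`, and nothing stops a non-byte from being added. `hex-str` is a hypothetical helper used only to compare printed sizes.

```clojure
(defn hex-str
  "Hypothetical helper: lowercase hex, two digits per byte."
  [bs]
  (apply str (map #(format "%02x" (bit-and % 0xff)) bs)))

;; Plain persistent vectors holding boxed java.lang.Byte values.
(def a (mapv unchecked-byte [0x00 0xff 0xab 0xcd]))
(def k (mapv unchecked-byte [0xff 0xff 0x00 0x0f]))

;; bit-xor returns a long, so each result is narrowed back to a byte.
(def x (mapv (comp unchecked-byte bit-xor) a k))
x                                         ; => [-1 0 -85 -62]

;; Nothing enforces the element type.
(conj a "not a byte")                     ; => [0 -1 -85 -51 "not a byte"]

;; Compare the printed forms.
(pr-str a)                                ; => "[0 -1 -85 -51]"
(binding [*print-dup* true] (pr-str a))   ; considerably longer than the hex form
(hex-str a)                               ; => "00ffabcd"
```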
So, cutting to the chase, how do you print, read and operate on byte sequences?