Clojure vectors are so much more than a convenient way to encode sequential data to the repl without having to quote lists. Still it’s not aparent how to use a vector for storing your application data. Clojure doesn’t even have an apparent “look for this specific single object in this sequence or vector” built in.
Also shuffling around data in a vector is not very easy using otherwise conventient sequence operations - constructions like (vec (concat (take ...) new-item (drop ...)))
are obviously… not what the creator of Clojure had in mind.
Let’s see what vectors have speaking for them:
- We don’t have to invent unique keys for storing things in the vector.
- The vector as has a reproducible ordering
- Anything can reference to a specific position in the vector storing only an integer index (a 64-bit long).
Things vectors are not suitable for:
- inserting things anywhere in the collection
- using the position indexing for domain-modelling
I will try to convince you that it is often better to use a “bare” vector than to have a map with various collections of items. I will also show you how to pick various specific collections out of the vector without having to use sequence functions like filter
with various custom predicate functions.
Vectors as functions
Vectors behave similar to maps in some aspects, the position (index) of the item works as the key. Vectors can work as functions like hash-maps, but throws IndexOutOfBoundsException
when the argument is not in the index (like the speedy nth
do). Using get
, nil
is returned when the key is not present. Both behaviours have their use cases.
([1 2] 0) => 1
(nth [1 2] 3) => throws!
(nth [1 2] 3 :default) => :default
(get [1 2] 3 :default) => :default
(get [1 2] 3) => nil
A vector like [:a :b]
is in some ways like map {0 :a 1 :b}
.
There are some caveats:
assoc
only works for either replacing something in the vector:
(assoc [:a :b] 0 :A)
[:A :b]
or add an item the very end of the vector.
(assoc [:a :b] 2 :c)
;=> [:a :b :c]
This conj
-like assoc
was new to me, but it’s reasonable and convenient in many use cases.
Trying to assoc
something beyond the end of the vector throws an IndexOutOfBoundsException:
(assoc [:a :b] 44 :c)
;=> Execution error (IndexOutOfBoundsException)
Destructuring
Vectors can be destructured as
(let [[a b _ d :as the-vector] [1 2 3 4 5]] [a b d the-vector])
;=> [1 2 4 [1 2 3 4 5]]
It is also possible to destructure a vector with the syntax for hash-maps, :as
returns the original vector, :or
(defaults) works as well:
(let [{a 0 b 1 c 2 d 3
:or {c :c}
:as the-vec} [:a :b]] [a b c d the-vec])
[:a :b :c nil [:a :b]]
I would not use this type of destructuring in regular code, but it’s certainly possible.
reduce-kv
is also highly usable with vectors. The key is the vector-index, and the value is the value at the position. A contrived example using the assoc
conjy behaviour:
(reduce-kv
(fn push [agg idx value]
(assoc agg (inc idx) value))
[:hello] ;; aggregate
[5 4 3 1])
[:hello 5 4 3 1]
Using collections of indexes to work on vectors
Vectors of items (or pointers in the form of integers) can also be used as the to copy/move things around in various ways.
An example: we want to move the item in the third position to the first position in the vector, and the item from the first position to the the third position:
(mapv [:llama :camel :snake :capybara] [2 1 0 3])
[:snake :camel :llama :capybara]
The vector [2 1 0 3]
is used as a sequence in this case.
It’s of course also possible to express a partial select of the vector, let’s say we only want the llama and the capybara:
(mapv [:llama :camel :snake :capybara] [0 3])
[:llama :capybara]
Of course we can return copies of objects in any order as well:
(mapv [:llama :camel :snake :capybara] [0 3 3 0])
[:llama :capybara :capybara :llama]
If you have a complex data structure for your application code you can index this data structure to be able to access the data in various ways. The clojure.data.int-map-library gives you sorted compact int-sets which seems like a match made in heaven for indexing vectorized data.
Lets say we have this small data set with two cats, a dog and a snake in various colors.
(def animals
[{:animal :cat :color :brown :name "Bob"}
{:animal :dog :color :brown :name "Dido"}
{:animal :snake :color :yellow :name "Alex"}
{:animal :cat :color :yellow :name "Garfield"}])
It’s possible to create an index like this:
(def index
{:animal {:cat #{0 3}
:dog #{1}
:snake #{2}}
:color {:brown #{0 1}
:yellow #{2 3}}})
If you want to get a vector of all cats, do:
(mapv animals (get-in index [:animal :cat]) =>
[{:animal :cat :color :brown :name "Bob"}
{:animal :cat :color :yellow :name "Garfield"}]
Adding a back-reference
We really want to be able to have some pointer back to the original data to know where it came from. The keep-indexed
is perfect for this use-case:
(def selected #{0 3})
(vec (keep-indexed (fn [idx v] (if (contains? selected idx) (assoc v :idx idx))) animals))
[{:animal :cat :color :brown :name "Bob" :idx 0}
{:animal :cat :color :yellow :name "Garfield" :idx 3}]
More specific selections using set operations on indexes
If you want to select (get the index to) all yellow cats, you can get the set intersection of the sets representing animal type and color using either clojure.set/intersection
(for regular sets), or the clojure.data.int-map
:s own super fast i/intersection
:
(i/intersection (get-in index [:animal :cat]) (get-index [:color :yellow])) ;;=>
(i/intersection #{0 3} #{2 3})
;;=> #{3}
Which can be selected just as before:
(def selected #{3})
(vec (keep-indexed (fn [idx v] (if (contains? selected idx) (assoc v :idx idx))) animals))
=>[{:animal :cat :color :yellow :name "Garfield" :idx 3}]
You just made yourself a indexed database out of a vector!
It is possible to add references (as a single index integer or sets of integer index values) between various entities in the vector, and with more indexing work there can be relational queries and walks in the database. That’s for next time.