How do you deal with big data structures and spec?


#1

I’ve been working on an application that uses org-mode files for data storage.

As part of this, I parse and load several big org-mode files into one map {filename [items]} - let’s call this the db. Most of my functions then take the db as their first argument and perform some action on it.

Now, I would like to spec some of my functions to ensure that they get the right data in and out.

However, what happens now is that any time there is a spec problem in my db, my whole application, the REPL and Emacs crash, because the exception that gets printed contains several megabytes of text. This is super frustrating.

I’ve set up a custom caught handler for CIDER, which helps somewhat. However, I also use yada, which seems to use clojure.tools.logging to print errors, and that insists on printing the entire massive data structure.

Are there any common solutions to this problem? Is there a way to specify a custom to-string function for a data structure? It is just a map, not a record, though I could probably change it to a record.

I was also wondering whether the new datafy protocols would help with this in any way?
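
One common first step for the printing problem described above is to cap how much of a collection the printer emits, using the core dynamic vars *print-length* and *print-level*. A minimal sketch, assuming a hypothetical big-db that stands in for the real db and a hypothetical helper short-str:

```clojure
;; Hypothetical stand-in for the db: {filename [items]} with many items.
(def big-db
  {"notes.org" (vec (range 100000))
   "todo.org"  (vec (range 100000))})

(defn short-str
  "Render x with print limits so that large collections are elided."
  [x]
  (binding [*print-length* 5   ; at most 5 elements per collection
            *print-level*  3]  ; at most 3 levels of nesting
    (pr-str x)))

;; Prints a short summary instead of 200,000 numbers.
(println (short-str big-db))
```

These vars only affect Clojure's own printer (pr, prn, println and friends). Whether a given logger or error handler honours them depends on how and on which thread it renders the value, so this may not be enough on its own.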


#2

I’ve had a similar problem where our component system was printed, which wasn’t very nice to look at. In combination with expound, I solved it like this:

(ns ...
  (:require [clojure.spec.alpha :as s]
            [clojure.spec.test.alpha :as st]
            [clojure.string :as str]
            [expound.alpha :as expound]))
(declare my-expound-value-str)

(def expound-opts
  {:show-valid-values? false
   :value-str-fn #'my-expound-value-str
   :print-specs? true})

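(defn my-expound-value-str
  "Prints only keys if system map, else defaults to expound."
  [spec-name form path value]
  ;; A "system map" here is a map whose keys are all keywords
  ;; in a namespace starting with "dre.app".
  (if (and (map? value)
           (every? (fn [ns]
                     (when ns
                       (str/starts-with? ns "dre.app")))
                   (map (fn [k]
                          (and (keyword? k)
                               (namespace k)))
                        (keys value))))
    ;; System map: print the keys only, never the values.
    (str (keys value))
    ;; Anything else: fall back to expound's (private) default rendering.
    (#'expound/value-in-context expound-opts spec-name form path value)))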

(defn my-explain-out
  [explain-data]
  ((expound/custom-printer
    expound-opts) explain-data))

(defn activate-specs
  "Check spec asserts and fdefs"
  []
  (alter-var-root #'s/*explain-out*
                  (constantly #'my-explain-out))
  (s/check-asserts true)
  (st/instrument))

#3

I’m afraid I cannot give any help, but I got really curious as to what you’re using for handling org-mode files in Clojure… Are there any good libraries out there, or did you roll your own? I haven’t seen any such libraries on the org-mode site…


#4

You could wrap your data in one (or many nested) defrecords and define custom print-methods for them. You can even make the behavior dynamic based on a flag in the record or whatever. I’ve used this to shorten prints of large isolated component systems (using Stuart Sierra’s component library).
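
A minimal sketch of the record idea, assuming a hypothetical Db record that wraps the {filename [items]} map and a hypothetical db-summary helper. The pretty printer (clojure.pprint) dispatches separately from print-method and treats records as maps by default, so the sketch registers the same summary with clojure.pprint/simple-dispatch as well:

```clojure
(require '[clojure.pprint :as pprint])

;; Hypothetical wrapper record around the {filename [items]} map.
(defrecord Db [files])

(defn db-summary
  "Short, human-readable description of a Db."
  [db]
  (str "#Db{" (count (:files db)) " files, "
       (reduce + (map count (vals (:files db)))) " items}"))

;; Used by pr, prn, println, pr-str and the plain REPL printer.
(defmethod print-method Db [db ^java.io.Writer w]
  (.write w ^String (db-summary db)))

;; Used by clojure.pprint/pprint with its default dispatch.
(defmethod pprint/simple-dispatch Db [db]
  (print (db-summary db)))

(def db (->Db {"a.org" [1 2 3] "b.org" [4 5]}))

(prn db)            ; prints the summary, not the contents
(pprint/pprint db)  ; same summary via the pretty printer
```

The record still behaves like a map for lookups such as (:files db), so only functions that rely on the db being a plain map of filenames would need to change.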


#5

Thanks for the suggestions and help.

I’ve tried the custom print methods, but unfortunately they do not work with pretty printing. I also tried the expound method, but for some reason I can’t seem to get it working consistently - probably because I don’t understand alter-var-root and set!, and their interaction with nREPL, well enough.
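
One likely cause of that inconsistency, offered as an assumption rather than a diagnosis: REPL sessions usually establish a thread-local binding for s/*explain-out*. alter-var-root only changes the root value, which the session binding shadows, while set! changes the thread binding but throws when there is none. A sketch that handles both cases, with install-explain-out! and terse-explain-out as hypothetical names:

```clojure
(require '[clojure.spec.alpha :as s])

(defn install-explain-out!
  "Install printer as spec's explain printer in the root binding and,
  if one exists, in the current thread's binding (typical in a REPL)."
  [printer]
  ;; Root value: seen by threads that have no binding of their own.
  (alter-var-root #'s/*explain-out* (constantly printer))
  ;; Thread binding: set! is only legal when the var is thread-bound.
  (when (thread-bound? #'s/*explain-out*)
    (set! s/*explain-out* printer)))

;; Hypothetical printer that reports a count instead of the failing value.
(defn terse-explain-out [explain-data]
  (println "Spec failed with"
           (count (:clojure.spec.alpha/problems explain-data))
           "problem(s)"))

(install-explain-out! terse-explain-out)
(s/explain int? "x")
```

Calling install-explain-out! from the REPL covers the current session; other sessions or threads that already hold their own binding would still need their own set!.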

For now I’ve replaced CIDER’s caught-fn with a custom function that just prints “ERROR” and sets a global last-err atom, and then I use cider-inspect to inspect the last-err map. Not ideal, but it seems to be working fine for now.
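
A rough sketch of what such a handler could look like. The name quiet-caught is hypothetical, and how the function gets registered with CIDER is not shown in this thread; for a plain clojure.main/repl it can be passed as the :caught option.

```clojure
;; Holds the most recent error as plain data for later inspection.
(defonce last-err (atom nil))

(defn quiet-caught
  "Store the throwable as a map and print only a short notice."
  [^Throwable e]
  ;; Throwable->map keeps the message, ex-data and stack trace as data.
  (reset! last-err (Throwable->map e))
  (binding [*out* *err*]
    (println "ERROR:" (.getName (class e)) "(details in @last-err)")))

;; Example: the ex-data is stored, not printed.
(quiet-caught (ex-info "spec failure" {:db "imagine megabytes here"}))
(:cause @last-err)
```

Because the full data stays in the atom, it can be explored piece by piece in an inspector instead of being dumped to the REPL buffer.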


#6

I ended up writing my own org-mode parser because I couldn’t find any that fit my requirements well enough. I plan to publish it once it is somewhat stable.


#7

Alright… I found a couple of older ones on GitHub since seeing your thread, and got the impression that they were half-baked. I look forward to yours!