Expected syntax for destructuring-bind

Jim_Newton · September 24, 2020, 4:36pm

This was question here earlier. I’m copying here for hopefully a more in-depth discussion.

In Common Lisp I can write a lambda form like the following:

(lambda (a b c d)
   (declare (type number a c)
            (type integer  c)
            (type string d))
   ...)

And depending on which compiler I’m using, the compiler is allowed to optimize the code in specific ways.
I can also write a macro to implement my own version of lambda such as

(my-lambda (a b c d)
   (declare (type number a c)
            (type integer c)
            (type string d))
   ...)

and in the macro, examine the declarations (as they are just raw s-expressions given to the macro as input). The advantage being that every CL user automatically knows how to use my-lambda because I’m promising that it has the same syntax as lambda.

QUESTION: is there any sort of type optional type declaration which I can use in my macro which corresponds to something the language already supports? I don’t want to invent my own if there is already one.

Here is what I have so far. I wonder if what syntax a clojure programmer would expect for the variable type correspondences.

(destructuring-case (some-function-call)
  [a [b c] & d]
  (a Boolean b String d Boolean)
  (println [:first a b c d])

  [a b]
  (a Boolean b (or String Boolean))
  (println [:second a b]))

The meaning is that if (some-function-call) returns a sequence matching the template [a [b c] & d] where a is a Boolean and b is a String and d (which is guaranteed to be a sequence) is a sequence of zero or more Boolean, then (println [:first a b c]) with the appropriate elements bound. Otherwise if the sequence matches [a b] where a is Boolean and b is String or a Boolean then (println [:second a b]).

seancorfield · September 24, 2020, 5:10pm

Have you looked at clojure.spec? That has s/fdef to provide a specification of function arguments for dev/test validation (via “instrumentation”) and of function results/behavior for generative testing.

I’d also suggest reading my blog post about Spec: How do you use clojure.spec?

Jim_Newton · September 24, 2020, 5:11pm

are you suggesting that a user of destructuring-bind would expect a spec-like syntax?

seancorfield · September 24, 2020, 5:17pm

Your destructuring syntax looks very un-Clojure-like to me – but clojure.spec is built-in so I would expect Clojure users to be more comfortable with something standardized and well-documented.

I understand, from comments Alex has made, that Spec 2 may provide some sort of integrated syntax for declaring function specifications (rather than Spec 1’s requirement that you write both defn and s/fdef).

Jim_Newton · September 24, 2020, 5:18pm

I took a look at the blog you suggested. But I’d like to hear more about your idea. I know of the existence of spec, and I’ve seen several videos on youtube about spec, but I haven’t used it.

Jim_Newton · September 24, 2020, 5:28pm

Thanks for the comments. destructuring-bind is not intended to be a way to document function, but rather an efficient pattern matcher which selects the correct code-path according to whether the data fits one of several given patterns. Patterns may be specified by shape or type or combinations thereof. This is also intended to avert the limitations of clojure multi-arglist-functions which do not include structure checking like the following.

((fn ([a] 1) ([a [b]] 2)) 10 20)
Execution error (UnsupportedOperationException) at clojure-rte.core/eval17671$fn (form-init6569138664565782700.clj:9204).
nth not supported on this type: Long

It will also guarantee that complex type checks are as efficiently arranged as possible to avoid calling the same type predicate on the same data more than once, especially if a type check involves calling user given code (such as a type defined by spec).

BTW, when I say type I do not mean java type, but rather a designated possibly infinite set of values.

seancorfield · September 24, 2020, 5:36pm

It sounds like you are thinking about predicate-based dispatch, for which Spec could be used but it perhaps wouldn’t be the fastest approach (although it may well be “fast enough” for a lot of cases, as part of your pattern matching).

http://tgk.github.io/2013/06/generic-operator-predicate-dispatch-semantics-in-clojure.html might be good reading and I’m trying to remember the GitHub username of someone who has already written a predicate-based dispatch system for Clojure… Ah, here we are: https://github.com/johnmn3/dispacio

Jim_Newton · September 24, 2020, 5:38pm

w.r.t. unfamiliar syntax, that’s the purpose of the post.
clojure has a let/case/cond like syntax where the programmer specifies a series of forms which symantically are viewed as pairs, to avoid the programming needing to delimit with parens:

(case x
  a 100
  b 200
  c 300)

and

(cond
  (test-1) (consequent-1)
  (test-2) (consequent-2)
  (test-3) (do (side-effect) (consequent-3)))

and

(let [a value-of-a
      b (value-of-b-using a)
      c (value-of-c-usung a b)]
  ...)

However, in other cases the clojure syntax demands that the user delimit these pairs using extra parens, such as letfn and multi-arglist-functions

(letfn [(f1 [a b] ...)
        (f2 [a b c] ...)]
 ...)

and

(fn ([a] (compute-with-a))
    ([a b] (compute-with a b)))

In the case of my code, what would be the clojure way of grouping the operands of destructuring-bind into these three pieces of data, the lambda-list, the type restrictions, and the code body.

Another question would be how might the user like to associate variables with type restrictions.

Jim_Newton · September 24, 2020, 5:44pm

The reference to my work can be found here: typecase optimization and condition variable binding.

The relation to predicate dispatch, in my approach is delegated to the type system itself. If the type system allows user-defined-types which call user-code to determine type-membership, then yes it supports predicate dispatch. However, because of the halting problem it is therefore impossible to detect at macro-expansion-time whether an intersection of two types (one of which contains a user predicate) is vacuous. Apart from user predicates, it is very often possible to either 1) predict at macro-expansion time that certain code paths will never be reached because of vacuous type intersection or 2) generate code which will avoid redundant expensive type checks at runtime.

Jim_Newton · September 24, 2020, 5:49pm

That’s an important point my approach succeeds in avoiding. The pattern matching algorithm can guarantee that no backtracking occurs. If the pattern matching including backtracking, then there are easy to construct cases which have exponential complexity, whereas the non-backtracking approach guarantees linear complexity.

seancorfield · September 24, 2020, 5:51pm

Since Clojure is not a statically-typed system, I think you’re going to inherently produce something that is non-idiomatic – in which case you probably shouldn’t be too concerned about syntax, I guess.

clojure.spec is all about predicates. Rich and others are deliberately avoiding “type systems” and instead working toward more predicate-based approaches. Rich has also explicitly rejected pattern-matching as a feature in Clojure (he’s talked about why on the old Clojure mailing list from time to time and I think it has cropped up in some of his talks and interviews as well).

Jim_Newton · September 24, 2020, 6:04pm

Don’t confuse type-system with static-type-system. When we say type, dont think “Java Type”, but rather think of a set of values which is possible to designate. A type system (especially a dynamic type system) allows the programmer to describe sets of values at runtime and also at compile time, and reason about unions, intersections, and complements of types. For example, if it is discovered that x is an element of both type X and also Y, but it can be proven that X and Y are disjoint, then we can sometimes know that certain code is unreachable. In the case of destructuring-bind that applies to figuring out which if any of the clauses are completely subsumed into earlier ones. Or figuring out whether the cases are exhaustive.

Jim_Newton · September 24, 2020, 6:06pm

It would be nice to understand whether and why he’s opposed to pattern matching in general. Or whether it is pattern matching simply based on static types.

Jim_Newton · September 24, 2020, 7:04pm

I thought about implementing the syntax like this, to emphasize that the type constraints argument is a map from variable to type.

(destructuring-case (some-function-call)
  [a [b c] & d]  {a Boolean b String d Boolean}
  (println [:first a b c d])

  [a b]          {a Boolean b (or String Boolean)}
  (println [:second a b]))

It could also be done with the reverse map, which would allow multiple variables to be declared the same type.

(destructuring-case (some-function-call)
  [a [b c] & d]  {Boolean [a d] String b}
  (println [:first a b c d])

  [a b]          {Boolean a (or String Boolean) b}
  (println [:second a b]))

but this risks the problem that the syntax {Boolean a Boolean b String c} gets a syntax error during parsing: Syntax error reading source at (REPL:9214:49). Duplicate key: Boolean.

The syntax could also be made to look more like multi-arglist function definition, making it easier to have multi-expression function definitions, avoiding mandatory calls to (do ...). However, this violates the cond no-unnecessary parens principle.

(destructuring-case (some-function-call)
  ([a [b c] & d]  {Boolean [a d] String b}
   (println [:first])
   (println [a b c d]))

  ([a b]          {Boolean a (or String Boolean) b}
   (println [:second])
   (println [a b])))

joinr · September 24, 2020, 10:23pm

It’s idiomatic (at least in core clojure) to push type hinting into metadata. I wrote a little library called structural that extends this idea and allows efficient typed destructuring based on user-supplied type hints to emit faster code than the very general polymorphic clojure defaults.

Accordingly, I would think something like destructuring-case could be written as

(destructuring-case (some-function-call)
  [^Boolean a [^String b c] & ^Boolean d]
  (println [:first a b c d])

  [^Boolean a ^{:type (or String Boolean)} b]
  (println [:second a b]))

(destructuring-case (some-function-call)
  [^Boolean a [^String b c] & ^Boolean d]
  (println [:first a b c d])

  [^Boolean a ^{:type [String Boolean]} b] ;;maybe vectors denote or, sets denote and
  (println [:second a b]))

During macro expansion time, you can then scrape the tags or other relevant typing information from the meta using &form,

(defmacro example [hinted-form & body]
    (let [[hinted-form & body] &form] ;;preserves meta
       ...some-processing))

orchestra’s defn-spec has one variant of including type/spec information (for matching arg vectors).
gradual does more complicated type signatures.
typespec similar idea.
speck another flavor.

dust is an interesting meld of pattern matching and spec.

This is probably considered open research at this point; at least there is no obvious popular choice or default language construct.

Jim_Newton · September 25, 2020, 4:26am

What kind of information can I put in metadata? For example, [^(or Boolean (and Number (not Long))) a], this is the genre of type designators supported by my underlying system to which I’m trying to define a programmer-facing api.

Jim_Newton · September 25, 2020, 4:32am

Given such a vector, [^Boolean a [^String b c] & ^Boolean d] how can I extract the meta data associated with a and with d?

Jim_Newton · September 25, 2020, 4:38am

Yes, my implementation in clojure is part of a larger research project applying similar principles to various languages… I’m actively looking for students interested in working on the project.

Jim_Newton · September 25, 2020, 6:40am

Thanks for pointing out these references. They’ll be useful for summarizing previous work. BTW I’d be especially interested in peer-reviewed academic papers related to these works, if you know of any.

joinr · September 25, 2020, 7:10am

In clojure, metadata is idiomatically a persistent map. Further, anything can implement the IMeta interface, or the more common IObj interface (there are similar protocols in cljs), and it will now have metadata capabilities. When using the ^ reader macro that denotes metadata, the default inference is that ^some-type will associate metadata of {:tag some-type} with the following object (clojure literals like vectors, maps, sets, lists, symbols all support metadata). If you want more explicit control, you can pass in a map literal to be the actual metadata, e.g. ^{:type long}x .

This would be no different than constructing a symbol, e.g. via gensym, and giving it metadata…

(use 'clojure.pprint)
(binding [*print-meta* true]
   (pprint  (let [x (with-meta (gensym "x") {:tag 'long})]
                   `[~x] )))
;;[^long x21037]

Per the docs, metadata must be a symbol, keyword, string, or map. Maps are inferred to be the literal metadata, keywords are inferred as metadata of {:the-keyword true}, and strings/symbols are inferred as {:tag the-string|the-symbol} metadata for type hinting. You rarely see strings in practice, although there are cases (as with java’s weird primitive array type) where they show up. Since you can supply a map literal, you can put pretty much anything in there; it’s just the form that the reader expects (e.g., the syntax sugar) infers the aforementioned behavior from a limited set of types. metadata is commonly used to pack supplemental information; you just throw stuff into a map basically. This is particularly useful for metaprogramming, since you can pack annotations about the forms and expressions in the metadata and access it as needed. The only hurdle is that the default defmacro does not include the metadata from the input form by default, hence my original example to use &form to get that information back (or write your own defmacro that does this by default).