Determining subtype at runtime

I took at look at this article about types in clojure but I didn’t understand how one can manipulate types at run time. In particular how can I ask the question at runtime whether one type is a subtype of another? And in which cases can clojure answer this question correctly.

For example in Scala I can use A <:< B to ask whether one type is a subtype of another, but it is documented to return false negatives. I.e., if it returns true then A is indeed a subtype of B, but if it returns false, then it is not guaranteed that A is not a subtype of B.

Thanks for any help.

You can use the function isa? to do it.

It will work for both Java type inheritance, and for derive hierarchies, see: http://clojuredocs.org/clojure.core/isa_q

You might also find the derive page on Clojuredocs relevant.

Addendum: this might not be novel to you, but Typed Clojure has had a turbulent history. Since Typed Clojure came out, a recent development has been that of Clojure Spec. Their use cases overlap somewhat. Clojure Spec seems to be attracting more usage and development effort.

According to the document you linked, each spec specifies a set of values, i.e. a type. A type is not more than just a set of values. So how does this relate to isa? Isa can’t solve the halting problem, so how does it distinguished between subtype, not-subtype, and dont-know?

Can anyone point me to academic papers about subtypeness, typed clojure, or spec, which I can cite in my article?

Oh sorry, I just saw that you linked to a core.typed article.

Core.typed types don’t exist at run-time. They are compile only.

The actual runtime uses JVM types and a custom tag in certain scenarios to establish derive hierarchies.

Here’s the thesis for core.typed:

And here’s the author doing a practice defense of it:

And this is as much as you’ll find for spec (since its not an academic backed project):

Also Specs are not types, they are predicates, and they define contracts. They implement a system called Design By Contracts popularized by the Eiffel programming language, and the Clojure one pull a lot of inspiration from the Racket one and the RDF spec. You can read more here: https://wiki.c2.com/?DesignByContract

Hum, as far as I know, isa? always knows. There are never ambiguous cases. So it’ll just return true or false.

@didibus, here is a description of the important case. A type (in my understanding) is a set of values. If I’m wrong, if a type in clojure is something other than a set of values, can someone please tell me what a type is?

And the question of whether A is a subtype of B is the same as asking whether the set A is a subset of set B. A spec defines a set of values. Therefore, IT IS A TYPE. Since I’m not a closure expert, I cannot say whether there is a syntax in the language to represent such a type for use with isa. I.e. is closure capable of denoting the type designated by a spec?

The problematic case is when you have two arbitrary functions, F and G. With spec, you can designate the set of values for which F returns true, call this Tf and the set of values for which G returns true, call this Tg. However, by the halting problem, it is impossible for isa to decide whether Tf is a subset/subtype of Tg.

It is a bit of a subtle difference, but a type isn’t exactly the same as a set of values. A type is a symbolic tag associated with some data, with implicit rules related to it as defined by the given type system that manages it.

For example, these are types in Clojure:

  • java.lang.Long
  • java.lang.String
  • clojure.lang.PersistentVector

You can think of a type as a label, and typed data as a piece of data with an accompanying type label.

What happens next is that, the type of some data (aka data type) will be tracked by the type system. And its specific type will define implicit rules the type system can rely on, for example, it can define the operations that are permitted to manipulate the data, the semantic meaning of the data, the memory encoding for the data, relations to other types in the type system, etc.

Now here is where things get a bit tricky. A type can be used to restrict the set of values from which a variable, function, form, etc. may take its value, but a type is not a set of values and it doesn’t need to be.

Let me try to simplify this. Here’s a piece of data using Clojure’s EDN notation:

{:name "John"
 :age 12}

You can consider this to be an untyped piece of data. Now, say we wanted to add types to it, here’s one way to do it:

{:name "John"
 :age 12
 :type :person}

This is now typed data, and :person is now a type. That’s all a type is. Now, the type system can choose to use this type in any way it wants. So you can imagine for example, it could validate that only data of a certain type is provided to a given function, for example:

(defn kill-person [person]
  {:pre [(= :person (:type person))]
   :post [(= :person (:type %))]}
  (assoc person :dead? true))

Now in a real type system, the type is at a much more low level. Untyped data are bits in memory. And the type system for any known section of bits will track the type of those bits. Thus that will specify how the bits are to even be encoded/decoded and stored. And then things grow out of that.

The relation to a set is thus dependent on the type system in use, and it is implicit. The type doesn’t actually list out the possible values, and it doesn’t even necessarily have a predicate that it can use to validate if the data actually fits the implicit assumptions of a particular type, though most type system will, at least for primitive types.

So, if you followed along, you can see that a type doesn’t necessarily define a set. So checking if a type is a subtype of another doesn’t need to perform a subset operation. It can just check the type information provided by the type system. For example:

{:value "c"
 :type :char}

{:value "cool"
 :type :string}

Now the type system might define the following subtyping relation:

(def subtype
  {:char :string})

And now if you asked (isa? :char :string) the type system can just look in the subtype map to see if it is true or false. No set operations are involved.

So now in Clojure (JVM), the first type system involved is the Java type system. This type system is used to define encodings for the memory bits in use by the program, so you have all the primitive types like:

  • int
  • long
  • char
  • double
  • array
  • etc.

Those types define the memory layout and the decoding scheme (like big endian). They are also used to validate and specify the possible values they represent. That’s where you could say that there exist a set of values that maps to the specific type, but that set existing, does not mean that the type itself is that set. It is more like a symbolic representation of that set. And it doesn’t even mean there are subtyping relations if their respective sets were to be subset of each other. For example, a long is not necessarily an int. You can’t use an int in place to something that needs a long without converting one to the other. That’s because their encodings aren’t compatible, an int is not just the subset of longs from -2,147,483,648 to 2,147,483,647. So even though the set of value they represent have a subset relation, the types are not compatible.

Types in Clojure are thus closely related to types in Java. Each type being either a primitive type, or a class/interface type. Like in Java, classes/interfaces can extend/implement others, thus defining subtyping relations. You can create new JVM types using deftype and defprotocol. The rules of the Java type system will apply to them. Protocols extend that beyond what the Java type system allows, by making type relations definable dynamically at runtime as well, and so that they can be declared outside the type definition itself. Java determines if a given type isa subtype of another purely through these relationship definition. In real Java, you’d also have generic types, which have more complex rules around determining subtyping relations, those don’t exist in Clojure.

Clojure also has another type system which is the derive based one. This type system is a lot like the one I just demonstrated in my examples above. You can kind of use it to build your own custom type systems on top of it. It integrates with the Java one, because Java types can be defined as subtypes, so you can for example:

(derive java.lang.LinkedList ::collection)

Nothing uses this by default. Though isa?, parents, ancestors, descendants, etc., all integrate with it. The most common use case is probably to use it in combination with multi-methods for polymorphic dispatch.

No it can’t. And like I said, types won’t necessarily map back and forth. That said, it isn’t impossible, there can exist a relation from predicates to types and back. This is what spectrum tries to do: https://github.com/arohner/spectrum

Predicates, sets and types?

Alright, so now, all of these concepts can overlap in some ways and have a lot of intersect. Yet, none of them have universal formal definitions as well. So each one must be taken within the context of the theory or concrete system they are being used in.

So if:

  1. A set SX is a distinct collection of values.
  2. A predicate PX can be written to return true if X is an element of the set SX.
  3. A type TX can be created whose definition is that it restricts data to only those which are in the set SX or for which PX returns true.

You can see how there is some possibility of a correspondence here. That said, depending which of those three things you actually have, you can’t do the same things, and you can’t necessarily easily derive the others.

What about core.spec?

Ok, I’ll finish by talking about core.spec. This system exposes predicates, and lets the user define predicates. I hope I managed to explain how predicates are not types, and neither are they sets. Given a predicate, it isn’t trivial (or maybe even possible not sure), to figure out if any of the Java types would correspond to it or be a subtype of it if it were to have a corresponding type.

You could define corresponding types yourself, and corresponding relations if you wanted too. But I don’t know if one can derive those programmatically or even mathematically. Like:

(s/def ::foo (s/int-in 20 50))
;; Now let us also assume the type ::foo, which is the type of all values
;; that validate when tested against the spec ::foo
(s/def ::bar (s/int-in 30 40))
;; Now let us also assume the type ::bar, which is the type of all values
;; that validate when tested against the spec ::bar

;; Now, how would we know what relations exist between ::foo and ::bar?
;; Could we infer them? Maybe, but more easily you could just specify them:

(derive ::bar ::foo)

(isa? ::bar ::foo)
;;=> true

And if you wanted to grow a type system over that, for every function whose spec say that the return or any of the arguments should be a valid ::foo or ::bar spec, you can also try to type check their corresponding types.

If you wanted to infer or derive complex relationship from the specs to their types and between them, you most likely would face the halting problem, as I’m pretty sure it isn’t feasible (unless you restrict yourself to only certain predicate or use some non-turing complete predicate system), but with some level of provided annotation it might be doable.

2 Likes

OK, in the type systems I’m used to, a type is a set of values, and the thing that you call a type, I call a type specifier. In Common Lisp the SUBTYPEP function does not take types as arguments but rather type specifiers, and answers questions about the types which its arguments specified.

@didibus, is your description of the type system of clojure (which you’ve summarized here) actually documented in a source I can make a citation to? Otherwise, I can cite this forum posting as a last resort.Admittedly, I’ve scanned but not yet read the thesis of Ambrose Bonnaire-Sergeant

Hum, not that I know of, this is my own understanding for now, and I would consider myself to still be an apprentice in all of this, so I wouldn’t be surprised if I got a few things wrong.

Because mostly the types in Clojure are those of Java, your best bet is to search for sources covering Java. I did find this official doc from Oracle: https://docs.oracle.com/javase/specs/jls/se8/html/jls-4.html#jls-4.10 it lists out the subtyping relation rules.

For protocols there is the Clojure doc: https://clojure.org/reference/protocols

I also want to say, nothing I said pertained to core.typed. core.typed is a separate type system that one can use for type checking Clojure. I don’t know enough about it to speak. It has its own concept of types and type relations. The Clojure runtime does not use it though.

I found the following paper: https://arxiv.org/pdf/1605.05274 which seems to prove that subtyping S <: T in Java is undecidable. That said, it seems to rely heavily on wild cards and generics, and in the case of Clojure, it doesn’t make use of those with its use of Java’s type system, so I don’t think the proof can be extended to Clojure.

Edit: Looks like there is a revised version of the paper as well here https://arxiv.org/pdf/1605.05274v2

Yes, I know this paper or similar ones. It was part of my initial reason for wondering what isa does on undecidable cases, or how it avoids them.

Isn’t there a difference in decidability between static analysis type checking and runtime type checking? I mean, Clojure’s isa? is a runtime construct, and as far as I can tell <:< in Scala is a compile time construct; or is it usable at runtime as well in Scala?

So, I have no formal proof, but my instinct is that isa? and instanceof are both decidable, because there is no generics or wildcard involved in their resolution algorithm.

Since generics and wildcards in Java are erased at run-time, and exist only in the compile time language of type, which Clojure does not make use of, then Clojure’s usage of subtyping would be decidable.

Similarly, Clojure adds a few extra things to the resolution algorithm. I think those are also all decidable, because it has no parameterization or any kind of complicated rules, it’s all basically an ordered lookup.

First look if there is a metadata extension for the type, then look if there is an extend-type extension, finally call Java’s resolution which at run-time is itself decidable given that generics are erased.

If this happened to be undecidable, I don’t think anything would be done about it, your program would just hang. I don’t think there is any timeout or other such thing where it would return unknown. So at least, while there might not be any proof, I think the assumption Clojure make is that it is decidable.

When Clojure compiles and specifies compile time types (such as when inferred or using type hints), it always uses Java raw types. Those types are never generic and never use wildcards. So similarly, I think these will always be decidable too.

Frankly, I don’t understand the semantic difference between a compile time and runtime type system. Why would the decidability question be different. I’m not claiming they’re the same. Rather the distinction is something I’ve never grasped.

In common lisp, types are sets of objects. And this is the same at compile time. So it’s difficult for me to understand a system where this fails to be the case.

Hum, I think type inference could be one aspect. At run-time, that should be easier, since you’ve got an actual value which you can just get the type of. So I think sometimes type inference is what can be undecidable, and not the type checking itself. So if you are missing type annotations at compile time that you’d need to infer so you can type check, that could be one scenario. Where are runtime, you might not need to infer, since you can just inspect the value for its type.

In Java though, it’s very specific to the fact that the compile time type system has more features than the runtime type system, and from my understanding, it is due to those additional features that the Java compile time type system is turing complete and undecidable. Since those features are not present in the runtime type system though, that one is not turing complete and decidable (though I couldn’t find a proof actually proving so).

What happens is that generic types and wild cards are erased at run-time. So a List<T> is just a List of anything. And a List<? extends Number> is just a List of anything. If you think in terms of set of values, you could say the set of values allowed at run-time is always wider. The type at compile time can often be more restricted than the one at run-time. Like a List<Integer> at compile time only allows lists of Integers. At run-time, the list elements can be anything, even a mix and it will type check.

Those generics I think can be used to create recursive types, like types of types of types. And I think that’s one of the reason you can then abuse them to make the type system turing complete, which also result on this being undecidable. Without them though, you can’t do that, and it would be decidable again.

At least the paper I linked depends on the presence of generics to form its proof. So I guess without them, there are no longer any proof that the remaining type system of Java (as the one used at run-time) is undecidable. And I take that lack of proof as probably indicating that it is in fact decidable.

This is a design decision made by the developers of the type system. JVM Clojure has one notion of types; core.typed has built its own.

Some type systems completely erase themselves at runtime, especially type systems built on top of dynamic languages such as dialyzer (Elixir), TypeScript (JavaScript), Flow (JavaScript), BuckleScript (OCaml -> JS). Java erases generics, ensuring that a call always realizes to a concrete class at runtime.

Often these overlaid type systems offer ways to integrate with runtime checking in some way, such as TypeScripts type guards, however this needs to be done explicitly.

It’s not clear to me whether core.typed types get erased at runtime or not, but I do not think it would be completely outlandish if it did not persist its types past the static type checking step.