Need help to resolve symbols programmatically

Jim_Newton · October 4, 2020, 11:26am

I’m using resolve in my program in several places, and I’m not sure I’m doing it correctly. I expect the caller (someone using my library) to provide a quoted hierarchical list whose semantics are explained in the library documentation. In that quoted list, the caller is allowed to name classes such as Boolean, java.lang.Comparable or clojure.lang.Symbol.

Such a list might look like the following in the client code.

(deftest t-canonicalize-pattern-subtypes
  (testing "canonicalize-pattern with subtypes"
    (is (= 'Number (canonicalize-pattern '(:or Integer Number))) 
        "Number")
    (is (= '(:* :sigma) (canonicalize-pattern '(:or Number (:not Number))))
        "sigma")
    (is (= 'Integer (canonicalize-pattern '(:and Integer Number))) 
        "Integer")))

(deftest t-canonicalize-pattern-14
    (testing "canonicalize-pattern 14"
      (is (= '(:or (:and
                    clojure.lang.IMeta
                    clojure.lang.IReduceInit
                    java.io.Serializable)
                   (:and
                    clojure.lang.IMeta
                    clojure.lang.IReduceInit
                    java.lang.Comparable))
             (canonicalize-pattern '(:and (:or java.io.Serializable java.lang.Comparable)
                                          clojure.lang.IMeta clojure.lang.IReduceInit)))
          "and-distribute")))

My code examines the given symbol to determine whether it really names a class. This is done with code similar to the following:

(and (symbol? type-designator)
     (resolve type-designator)
     (class? (resolve type-designator)))
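
The check above could be wrapped in a predicate that makes the namespace an explicit, optional argument, so the library can decide later which namespace to use. This is only a sketch: class-designator? is a hypothetical name, and its one-argument arity keeps the current behavior of looking in *ns*, which is what plain resolve does.

```clojure
(defn class-designator?
  "Hypothetical helper: true if sym is a symbol that resolves to a Class in
  namespace ns. The one-argument arity uses the current value of *ns*,
  exactly like plain resolve."
  ([sym] (class-designator? *ns* sym))
  ([ns sym]
   (and (symbol? sym)
        ;; ns-resolve returns nil for unknown symbols, and (class? nil) is false
        (class? (ns-resolve ns sym)))))

(class-designator? 'java.lang.Comparable) ; true
(class-designator? 'map)                  ; false: resolves to a var, not a class
```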

The problem is that resolve is documented to look up the symbol in the current namespace, *ns*. However, when the user runs my program, he might have set his REPL namespace to anything, so relying on the current namespace is certainly not correct.
On the other hand, if I use (ns-resolve (find-ns 'clojure.core) type-designator), or even the namespace my program is written in, (ns-resolve (find-ns 'clojure-rte.core) type-designator), then the call to ns-resolve won’t notice classes the user has defined, nor variables the user has defined such as (def Symbol clojure.lang.Symbol).
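
A small experiment shows both halves of the problem: resolve is just ns-resolve against whatever *ns* holds at call time, and a definition such as (def Symbol clojure.lang.Symbol) is visible only from the namespace that made it. The namespace names client.app and library.impl, and the helper resolve-from, are made up for this sketch.

```clojure
;; Stand-in for the user's namespace, containing (def Symbol clojure.lang.Symbol).
(create-ns 'client.app)
(intern 'client.app 'Symbol clojure.lang.Symbol)

;; Stand-in for the library's own namespace; it has no mapping for Symbol.
(create-ns 'library.impl)

(defn resolve-from
  "Hypothetical: calls plain resolve with *ns* temporarily bound to the
  namespace named ns-sym, as if the caller's REPL were in that namespace."
  [ns-sym sym]
  (binding [*ns* (the-ns ns-sym)]
    (resolve sym)))

(resolve-from 'client.app 'Symbol)   ; the var #'client.app/Symbol
(resolve-from 'library.impl 'Symbol) ; nil
```

Note that for a def'd alias like this, resolve returns the var rather than the class, so (class? (resolve 'Symbol)) is false even in the user's namespace; the value has to be dereferenced with var-get (or @) first.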

QUESTION: How can I figure out how to resolve the symbol the user has given me?

My original implementation of this program was in Common Lisp, where this kind of problem does not occur: in CL the reader resolves the symbols to the correct namespace. As I understand it, in Clojure this namespace resolution must be done at run time, when there doesn’t seem to be enough information to do so.

Jim_Newton · October 4, 2020, 11:28am

I think I need to somehow resolve the symbols in the namespace the reader was in when the symbol was read, right? But how can I do that?
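
One standard Clojure mechanism does resolve symbols in the namespace the reader was in: syntax-quote (backquote). The reader qualifies each symbol at read time, using the imports and refers of the namespace being read. A sketch, assuming the pattern only needs to mention classes and functions:

```clojure
;; Plain quote: the symbols stay bare, to be resolved later by someone else.
(def quoted '(:or Integer (:and Number odd?)))
;; quoted => (:or Integer (:and Number odd?))

;; Syntax-quote: the reader qualifies each symbol immediately, using the
;; imports and refers of the namespace that is being read.
(def syntax-quoted `(:or Integer (:and Number odd?)))
;; syntax-quoted => (:or java.lang.Integer (:and java.lang.Number clojure.core/odd?))
```

The catch is that every symbol gets qualified, including pure DSL vocabulary: a symbol that maps to nothing is qualified with the reading namespace (a bare foo read in user becomes user/foo), so the library would have to accept qualified names throughout.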

Jim_Newton · October 4, 2020, 1:56pm

Another reason this problem doesn’t happen in the CL implementation is that in CL classes don’t appear as values of variables. To find a class by name we use the function find-class, as in (find-class 'x); the symbol x is not bound to the class the way it is in Clojure.

DrLjotsson · October 10, 2020, 12:34pm

Maybe you need a macro?

didibus · October 10, 2020, 6:53pm

Hum, resolve should find the class even if it is not imported in the namespace, as long as the class is fully qualified.

If the class isn’t fully qualified, then it must look in some namespace to figure out where this class is supposed to be coming from, so you need a namespace to give it the package of the class to look up.

This is true of all symbols, actually. If your symbol isn’t fully qualified, then resolve needs some context in which to look for it, which is the namespace you give ns-resolve. But if the symbol is fully qualified, resolve will be able to find it (unless the symbol hasn’t been defined yet, but you can use requiring-resolve for that: https://clojuredocs.org/clojure.core/requiring-resolve).
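
For example, a namespace-qualified var symbol resolves the same way from any namespace, and requiring-resolve (Clojure 1.10 and later) additionally loads the namespace first if needed. clojure.set/union and the namespace some.other.ns are just convenient stand-ins for this sketch.

```clojure
;; requiring-resolve loads clojure.set if it isn't loaded yet,
;; then resolves the qualified symbol to its var.
(def union-var (requiring-resolve 'clojure.set/union))

;; Once loaded, the same qualified symbol resolves identically from any
;; namespace, even a brand-new one with no aliases or refers.
(def union-var-elsewhere
  (binding [*ns* (create-ns 'some.other.ns)]
    (resolve 'clojure.set/union)))

(union-var #{1 2} #{2 3}) ; calling the var calls clojure.set/union
```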

Edit:

I don’t know Common Lisp and was a bit curious. From what I read, if I understand correctly, Common Lisp has a global namespace. That’s why it always works in Common Lisp. Clojure also has a global namespace, but that global namespace requires symbols to be qualified. That said, most users don’t want to use qualified symbols in their code, because they are longer to type. That’s why Clojure says: if you use only the name portion of a symbol (and not the full namespace/name), Clojure will look in the current namespace’s symbol mappings to resolve it. And when you require and import, you are adding to that current namespace’s symbol map.

It looks like Common Lisp has packages as well, with use-package, defpackage, in-package and export. But that actually seems quite different from how it works in Clojure, because the user will be working in their own package, and they will be using symbols from other packages, which will intern them inside their own package. Now, to be fair, I’m not sure your Common Lisp solution would work with Common Lisp packages either.

Jim_Newton · October 11, 2020, 9:30am

That’s not really a correct summary of how Common Lisp works. The CL equivalent of the Clojure namespace concept is the package. In CL each package contains a list of symbols, and they are classified as exported or not.

When you say that CL has a global namespace, that’s a bit misleading. All packages are global, i.e., CL does not support local, private, or hierarchical packages. Given a name, find-package will find it.

CL uses the word namespace as well, but it is not a formally defined concept. There is a namespace for functions, one for variables and one for classes. Thus you can have a function, a variable, and a class with exactly the same name, in exactly the same package, and the meaning of the name is not confused. But this kind of namespace is not an object; it is just part of the definition of how names are resolved to objects (functions, classes, variables, etc.).

Jim_Newton · October 11, 2020, 9:36am

The thing that I really don’t understand (although my code seems to work) is that resolve uses *ns* when executing code that I wrote and, in principle, have shipped to a customer. When the customer runs the program, his value of *ns* might be different from my value when I developed and tested the program. So what must I do to get the same behavior regardless of the user’s personal value of *ns*?

My proposed solution was never to call resolve directly, but rather to call ns-resolve. But then which namespace should I provide to it?

My experimentation seems to show that classes are resolved independently of the namespace. If someone has a counterexample, I’d love to see it, to help me understand what’s really going on.
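
Here is a counterexample of the kind asked for. The apparent namespace independence comes from two rules: a dotted, fully qualified class name is loaded by name, and every namespace starts out with the java.lang classes (such as Integer, Number and Comparable) already imported. A class outside java.lang referred to by its short name does depend on the namespace. scratch.ns is a throwaway name for this sketch.

```clojure
;; A brand-new namespace: no refers and no explicit imports,
;; only the default java.lang imports every namespace starts with.
(def scratch (create-ns 'scratch.ns))

(ns-resolve scratch 'Integer)        ; java.lang.Integer (default import)
(ns-resolve scratch 'Comparable)     ; java.lang.Comparable (default import)
(ns-resolve scratch 'java.util.UUID) ; java.util.UUID (fully qualified)
(ns-resolve scratch 'UUID)           ; nil: java.util is not imported here
```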

Jim_Newton · October 11, 2020, 11:24am

Not sure what you mean. How would a macro help? Perhaps a macro which expands to a call to ns-resolve, inserting the namespace; but which namespace would it insert into the call?

DrLjotsson · October 11, 2020, 12:44pm

I’m a Clojure newbie and I don’t think I fully understand your question, but “macro” popped into my head when I read it. My thinking was that since the macro is expanded in place (where the user hands you the symbol), it should be able to figure out which namespace the symbol belongs to, and then use that to call resolve.
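
One way to read this suggestion, sketched under assumptions: a macro runs while its caller’s file is being compiled, so *ns* at expansion time is the caller’s namespace. The macro can record that namespace next to the unevaluated pattern, and the library can later call ns-resolve against it instead of against whatever *ns* happens to be at run time. pattern-with-ns and resolve-in are hypothetical names.

```clojure
(defmacro pattern-with-ns
  "Hypothetical: returns the pattern exactly as written (unevaluated), paired
  with the name of the namespace in which this macro call was compiled."
  [pattern]
  `{:ns '~(ns-name *ns*) ; computed at macro-expansion time
    :pattern '~pattern})

(defn resolve-in
  "Hypothetical: resolves sym in the namespace recorded by pattern-with-ns."
  [{ns-sym :ns} sym]
  (ns-resolve ns-sym sym))

;; In the client's namespace:
(def p (pattern-with-ns (:and Integer (:not (satisfies odd?)))))
(:pattern p)         ; the raw data, symbols untouched
(resolve-in p 'odd?) ; #'clojure.core/odd?, resolved in the caller's namespace
```

The pattern itself is kept exactly as the user wrote it; the only addition is a namespace name stored alongside it.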

didibus · October 11, 2020, 5:07pm

You can imagine that Clojure symbols are interned as such:

{<namespace> {<name> <var>}}

So if you have:

(ns my.program.core)
(def hello 10)
(ns my.program.other)
(def hello 20)

You’d get:

{"my.program.core" {"hello" #Var 10}
 "my.program.other" {"hello" #Var 20}}

So when you call resolve, it’s basically just doing a lookup in that map. Since it’s a nested map, you need two keys; basically you need:

(get-in interned-symbols [namespace name])
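
The nested map is only a mental model, but Clojure exposes something close to its inner level: ns-interns returns a namespace’s own name-to-var map. A sketch that rebuilds the example above with create-ns and intern instead of ns forms, so it can run in a single file; lookup is a hypothetical helper.

```clojure
;; Equivalent of the two (ns ...)/(def hello ...) pairs above.
(create-ns 'my.program.core)
(create-ns 'my.program.other)
(intern 'my.program.core 'hello 10)
(intern 'my.program.other 'hello 20)

;; (get-in interned-symbols [namespace name]), spelled with the real API.
(defn lookup [ns-sym name-sym]
  (get (ns-interns ns-sym) name-sym))

@(lookup 'my.program.core 'hello)  ; 10
@(lookup 'my.program.other 'hello) ; 20
```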

But symbols in Clojure are a data-structure of two elements:

;; Imagine they are akin to:
(defrecord Symbol [namespace name])
;; Where namespace is optional, and name is required

So when you call resolve, if you give it a symbol that has a namespace, it doesn’t care about *ns*, it has everything it needs to lookup the binding. But if you resolve a symbol that only has a name, it doesn’t know in what namespace to look it up, so it will use the value of *ns* for that.

So in effect it will be:

(ns user.program.core)
(require '[my.program.core])

(def hello 30)

@(resolve 'my.program.core/hello)
;;=> 10

@(resolve 'my.program.other/hello)
;;=> 20

@(resolve 'hello)
;;=> 30

So when you provided a symbol that had a namespace to resolve, it just looked up the binding for it. But when you provided a symbol without a namespace, it used the current namespace which is executing the call to resolve.

Now Java classes work similarly, but Java classes don’t have namespaces, they have packages. The package name is everything before the last . in the symbol’s name, while the class name is the part after it.

{<package> {<class-name> <Class>}}

When the symbol doesn’t have a package, instead of using the namespace of *ns* as package, resolve will use the package that maps to the symbol name from the import declaration found in *ns*.

So for classes that means:

(ns my.program.core)
(deftype Hello [])

(ns my.program.other)
(deftype Hello [])

(ns user.program.core)
(import '[my.program.other Hello])

(resolve 'my.program.core.Hello)
;;=> #Class my.program.core.Hello

(resolve 'my.program.other.Hello)
;;=> #Class my.program.other.Hello

;; And now a symbol without a package
(resolve 'Hello)
;;=> #Class my.program.other.Hello

So again, when given a symbol with a package, resolve doesn’t care about *ns*, because it has everything it needs to look up the class. But when given a symbol without a package, it will look for one in the current namespace import declaration.
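
The import table described here can be inspected with ns-imports. A sketch using a JDK class so it runs without deftype; it assumes the current namespace has not already imported ArrayList.

```clojure
;; Before the import, the short name is unknown in this namespace
;; (java.util is not among the default imports).
(resolve 'ArrayList)               ; nil

(import 'java.util.ArrayList)      ; adds ArrayList -> java.util.ArrayList to *ns*

(get (ns-imports *ns*) 'ArrayList) ; java.util.ArrayList
(resolve 'ArrayList)               ; java.util.ArrayList, via that import
(resolve 'java.util.ArrayList)     ; java.util.ArrayList, no import needed
```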

didibus · October 11, 2020, 5:19pm

You most likely don’t want to get the same behavior regardless of their value of *ns*. Because I assume you want your lib to behave from the perspective of the user program.

Like, the user gives you symbols that point to classes, but what if they give you an ambiguous symbol, i.e., a symbol without a package? Then what?

The question for your library is: what do you want to default to when they give you such a symbol?

So say they gave you 'Account as a symbol. Now most likely you do not have an 'Account class in your library. So I doubt you want to default to some package inside your library, because most likely you won’t have any of the classes the user is giving you. Those are going to be classes from the user’s program.

But if you did, then you’d use ns-resolve and give it a namespace from your library in which you’ve imported all of those classes.

But as I said, normally that’s not what people want. So if you expect the user to tell you where to find their Account class, then you want resolve to use the current namespace to find it. The current namespace will be the user’s namespace because *ns* is a dynamic var. So your library code will be able to look up the package of 'Account from their import declaration instead.

Jim_Newton · October 11, 2020, 10:55pm

I still don’t see how that can work. The problem is that the user provides a quoted data structure such as a quoted list or vector or a list of lists of lists of lists… with a symbol such as Account in it somewhere. When my code parses this list it notices Account and interprets it as the value of (resolve 'Account) which depends on the user’s value of *ns*. The reason the user thinks this works is because the symbol appears in a source code file which defines the actual namespace.

However, for such a function to work later, the code which interprets the data structure must remember that the code was defined when the namespace was a certain thing, NOT that the function was executed when the namespace was a certain thing.

This seems like a really difficult problem for defining application specific DSLs. I know how this is solved using the Common Lisp mindset, but I don’t know how to solve it using the Clojure mindset, and the problem is indeed subtle.

Jim_Newton · October 11, 2020, 10:59pm

Perhaps, as @DrLjotsson suggested, you cannot use quoted data in Clojure; rather, you have to require the user to wrap the data in a macro call. The macro will be expanded at compile time, when *ns* is actually the namespace the file defines. The macro can then walk the list and replace all such problematic symbols with the value of (resolve 'that-symbol). But that causes serialization/deserialization problems later.
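
A sketch of the macro approach described here, assuming the only symbols worth replacing are those that name classes in the caller’s namespace; resolved-pattern is a hypothetical name. clojure.walk/postwalk does the tree walk, and resolve sees the caller’s namespace because it runs at macro-expansion time.

```clojure
(require '[clojure.walk :as walk])

(defmacro resolved-pattern
  "Hypothetical: walks the pattern at macro-expansion time, replacing every
  symbol that names a class in the calling namespace by the Class itself.
  Other symbols (function names, DSL words) are left as they are."
  [pattern]
  (let [resolve-class (fn [x]
                        (if (symbol? x)
                          (let [r (resolve x)]
                            (if (class? r) r x))
                          x))]
    ;; Quote the result so the walked list is returned as data, not called.
    (list 'quote (walk/postwalk resolve-class pattern))))

(resolved-pattern (:and Integer (:not (satisfies odd?))))
;; => (:and java.lang.Integer (:not (satisfies odd?)))
;;    where java.lang.Integer is now a Class object, not a symbol
```

This shows the drawback raised in the post: the Class objects print as java.lang.Integer, but reading that text back yields a symbol again, so the stored structure no longer round-trips as the user wrote it.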

This is really ugly as it forces the person using the API to understand how the implementation works, so it is an ugly, leaky abstraction.

joinr · October 12, 2020, 3:31am

You are delegating symbol resolution to runtime…so without qualified symbols that explicitly point to a concrete class, resolution will have to start somewhere. You can, of course, go about looking through all possible classes for things, and hope you don’t find duplicate/ambiguous matches from different packages. This seems non-ideal.

The better solution would be to sit atop a registry, as clojure spec does (and to an extent, it appears CL does so as well). There’s nothing stopping you from defining your own cache of symbols that are registered to known classes, and using that for symbol resolution. Users can opt in to this behavior (as one would in CL via defclass) and define their own symbols as rules. This expands the options to keywords as well, which clojure spec leverages quite a bit (e.g. the :: double colon reader macro provides a ns-qualified keyword, like ::blah would be :current-ns/blah; it’s idiomatic to use ns-qualified keywords for spec references).

The other option is to prevent users from projecting their own symbols onto the ones you have defined, absent a fully qualified class (e.g. one that exists as.a.package.with.a.class.in.java or.a.namespace.with.a.type.or.class.in.clojure). The resolution is quite simple then; if the symbol maps to a known class, use the class (onerous on the caller but clean). Otherwise, look up the symbol in your class/rule registry. If nothing exists, delegate to ns-local resolution. I think that would cover most cases and preserve portability.

If users want new symbols to act as patterns, they can define them via your API, or use explicit classes, or rely on the graces of the implicit mappings from import. Or maybe you eliminate the import route if you don’t want to leave that possible ambiguity open.
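
The resolution order suggested above (known fully qualified class, then the library’s registry, then namespace-local resolution) could be sketched as follows. Everything here is hypothetical: type-registry, register-type! and lookup-type are illustrative names, and the registry only maps symbols to classes.

```clojure
(require '[clojure.string :as str])

;; Hypothetical registry: short symbol -> Class, filled in by users via the API.
(defonce type-registry (atom {}))

(defn register-type!
  "Hypothetical: lets a user declare what a symbol means to the library."
  [sym cls]
  (swap! type-registry assoc sym cls))

(defn- as-class [x] (when (class? x) x))

(defn lookup-type
  "Hypothetical resolver:
  1. a fully qualified class name (a dotted symbol) is used as is;
  2. otherwise the registry is consulted;
  3. otherwise resolution falls back to namespace ns (default *ns*).
  Returns a Class or nil."
  ([sym] (lookup-type *ns* sym))
  ([ns sym]
   (cond
     (not (symbol? sym))
     nil

     (and (nil? (namespace sym)) (str/includes? (name sym) "."))
     (as-class (ns-resolve ns sym)) ; ns plays no role for dotted names

     :else
     (or (get @type-registry sym)
         (as-class (ns-resolve ns sym))))))

(register-type! 'Text CharSequence)
(lookup-type 'Text)                 ; java.lang.CharSequence (registry)
(lookup-type 'java.io.Serializable) ; java.io.Serializable (qualified)
(lookup-type 'Integer)              ; java.lang.Integer (namespace fallback)
```

A registered name takes precedence over anything with the same short name in the caller’s namespace, which keeps the library’s own vocabulary stable.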

joinr · October 12, 2020, 3:39am

You can register the library-specific meaning of your quoted data somewhere, and allow the users to extend this meaning via some means, as CLOS does. Macros do not seem explicitly necessary, although they could be useful in eliminating the need to quote stuff.

Jim_Newton · October 12, 2020, 9:03am

That’s not a bad idea actually. That way the user is responsible for resolving symbols that I don’t know about.

didibus · October 13, 2020, 3:45am

Why don’t you just resolve the symbols at the point they are given to you?

Alternatively, just validate that they aren’t giving you unqualified symbols and force them to provide fully qualified ones.

You could also just force them to give you the class itself, instead of taking the symbol.

And finally, like joinr said, you can also choose to search the classpath, but that risks ambiguous results if there are several classes with that name.
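
The validation option could be as small as collecting every symbol in the pattern that carries neither a namespace nor a package. unqualified-symbols is a hypothetical helper, and it assumes the DSL’s own vocabulary is written as keywords (:or, :and, :not), as in the tests at the top of the thread.

```clojure
(require '[clojure.string :as str])

(defn unqualified-symbols
  "Hypothetical: returns, in order of appearance, the symbols in pattern that
  are neither namespace-qualified (clojure.core/odd?) nor package-qualified
  (java.lang.Integer)."
  [pattern]
  (->> (tree-seq sequential? seq pattern)
       (filter symbol?)
       (remove #(or (namespace %) (str/includes? (name %) ".")))
       vec))

(unqualified-symbols '(:or java.lang.Number (:and Integer java.io.Serializable)))
;; => [Integer]
```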

Jim_Newton · October 13, 2020, 8:41am

This is of course a possibility, but it causes serialization problems. I’d rather store the raw data structure the way the user gave it to me. If I replace the symbols with what they resolve to, the data will no longer look like what the user provided. It will also cause problems in testing, because I’ll need much smarter functions to test for identity. Basically I’ll have to add the resolver into the testing flow.
This could be done of course but I’d like to avoid it.

Jim_Newton · October 13, 2020, 8:45am

Forcing the user to qualify all symbols seems like a very hostile user interface. I’d much rather allow the user to type

'(or (satisfies list?) 
     (and Integer (not (satisfies odd?))))

than to force him to type

'(or (satisfies clojure.core/list?) 
     (and java.lang.Integer (not (satisfies clojure.core/odd?))))

Jim_Newton · October 13, 2020, 8:48am

Not sure how that would work as a user interface. How would the user do that for a list like the following?

'(or (satisfies list?) 
     (and Integer (not (satisfies odd?))))

It would have to be something like the following, which would be a very hostile user interface.

`(~'or (~'satisfies ~list?) 
       (~'and ~Integer (~'not (~'satisfies ~odd?))))

or perhaps

(list 'or (list 'satisfies list?) 
          (list 'and Integer (list 'not (list 'satisfies odd?))))

But I’m not even sure how that would be possible when the user is invoking a macro which auto-quotes his pattern:

(rte-case some-sequence
  (:* Integer) 
  100

  (:+ (:or (:and Integer (not (satisfies odd?)))
           Double))
  200)