Dynamic generation of predicate functions

I want to dynamically generate predicate functions (can use and or and not, the example shows just a basic case) based on some data. The following code works as expected but I do feel im missing some essential point here how this should be done. Can anyone show the ideomatic way of doing what the following code does?

(def my-pred?
  (eval
   (into `(fn [~'arg]
            (and ~@(reduce (fn [acc x]
                             (conj acc `(or (= ~x ~'arg)))) [] ["a" "b" "c"]))))))
(my-pred? "a")

The into call is not needed since there’s only one argument.
Also, if you evaluate that syntax-quoted form, you’ll see that it’s

(fn [arg]
  (and
    (or (= "a" arg))
    (or (= "b" arg))
    (or (= "c" arg))))

which doesn’t make a lot of sense and always returns false, regardless of the value of arg.

I do feel im missing some essential point here how this should be done

Depends on the actual goal you want to achieve. Being able to generate functions at run time is a tool, not a goal.

The most likely solution is “don’t do it”. :slight_smile: There are almost always better alternatives.

The given expression is just an example, it dont have to make sense. The use case I have is that I parse some logical expressions into an AST in runtime and then I want to turn this AST into an executable Clojure function. So I do think I have a valid usecase.

what does your actual ast look like? Can you provide an example of the input you would like to transform?

The given expression is just an example, it dont have to make sense

But when an example makes sense, it certainly helps.

So I do think I have a valid usecase.

I’m not saying that the use case is invalid, I’m saying that the approach doesn’t always have to involve dynamic code evaluation.

Depending on the AST and the requirements, it might be possible to easily compose primitive predefined functions. Or maybe your whole AST is known at macro expansion time and then you can rely on macros for at least some of the parts. Or the approach can be a combined one.
It’s also fine to fully rely on eval if you completely trust the data. In that case, you only need to convert your AST into Clojure’s AST, and that’s pretty much what your code in the OP is doing.

This is 3 examples of the ASTs @joinr . I would like to join a number of such logical expressions under a common and clause. The expressions I parse are read from a database so each query will generate a different set of expressions.

[:not [:or [:term "a" 3] [:term "a" 4]]]

[:term "a" 8]

[:and [:term "a" 44] [:term "a" 32]] ; bad example but just to show i also have and

What I would like is to get:

(fn [x] 
(and
(not (or (= x 3) (= x 4)))
(= x 8)
(and (= x 44) (= x 32))
))

FWIW, we have a DSL at work that we parse to an AST via Instaparse and then build predicates out of without using eval. You can build arbitrary predicates without resorting to that.

;;this one can actually match 8 and 10....
(def goodexpr
  [:and
   [:not [:or [:term "a" 3] [:term "a" 4]]]
   [:or [:term "a" 8] [:term "a" 10]]
   [:not [:term "a" 44]]])

(defn <and> [l r]
  (or (and l r) (reduced false)))

(defn <or> [l r]
  (or (and (or l r)
           (reduced true))
    false))

;;or/and have [op clause1 clause2 ...]
;;term [op val]
;;not [op [op ...]]

(defn interp [expr x]
  (case (expr 0)
    :term (= (expr 2) x)
    :not  (not (interp (expr 1) x))
    :and  (reduce <and> true  (map #(interp % x) (rest expr)))
    :or   (reduce <or>  false (map #(interp % x) (rest expr)))
    (throw (ex-info "unknown op!" {:in expr :x x}))))

(defn matcher [expr]
  (fn [x] (interp expr x)))

;;what if you "need" to compile this?

(defn key->sym [k] (-> k name symbol))

(defmacro compile-match [arg expr]
  (let [op (expr 0)]
    (case op
      :term `(= ~(expr 2) ~arg)
      :not  `(not (compile-match ~arg  ~(expr 1)))
      (:and :or)
      (let [children (for [child (rest expr)]
                       `(compile-match ~arg ~child))]
        `(~(key->sym op) ~@children))
      (throw (ex-info "unknown op!" {:in expr :arg arg})))))

(defn matcher2 [expr]
  (eval `(fn [x#] (compile-match x# ~expr))))

usage:

(let [f1 (matcher goodexpr)
      f2 (matcher2 goodexpr)]
  (for [i (range 20)] [i (f1 i) (f2 i)]))

([0 false false] [1 false false] [2 false false] [3 false false] [4 false false] [5 false false] [6 false false] [7 false false] [8 true true] [9 false false] [10 true true] [11 false false] [12 false false] [13 false false] [14 false false] [15 false false] [16 false false] [17 false false] [18 false false] [19 false false])

Using runtime eval is idiomatic, but there are tradeoffs. You get to compile your predicate function and enjoy jit’d performance (does that matter?) compared to what hotspot can try to optimize with the interpreted variant. You also operate under an open-world assumption, since you now eval and load the resulting classes at runtime. This prevents you from running your code on things like (default) cljs, or packing a native-image graal to provide a small, fast-starting executable. If those don’t matter though, and you just want to leverage the runtime evaluation that e.g. the repl provides, then eval away :slight_smile: [Include security disclaimer about evaluating arbitrary/untrusted forms, although the little compiler setup here would probably distance you quite a bit from getting pwned).

3 Likes

This topic reminds me of this :blush:

Probs not what you’re looking for, but some-fn and every-pred are cool functions in this space.

Hey @Henrik_Larsson ,

This is quite an interesting topic! You’re basically writing a little compiler.

The thing I find counterintuitive, yet it’s true, is that an interpreter and a compiler can have a very similar structure. Both are usually recursive functions with a big conditional, one branch for each kind of expression you have. The difference is that interpreters return the answer, while compilers return the code that will calculate the answer.

I like to take advantage of that and develop my language in an interpreter first, because it’s typically less tricky. I can write test cases for it and everything. Then, when I’ve got it where I want it, I can copy the code and convert it to the compiler version. The other cool thing is that you can then compare the outputs of both. They should be the same.

I’ll start with the interpreter for your language. Here’s the example AST you provided:

[:not [:or [:term "a" 3] [:term "a" 4]]]

Here’s an interpreter for this and and. I’m assuming :term means to compare for equality:

(defn interpret [[op & [a b]]]
  (case op
    :term (= a b)
    :not (not (interpret a))
    :or (or (interpret a) (interpret b))
    :and (and (interpret a) (interpret b))))

I’ll leave the case of more than two arguments to you. But now I can convert it into a compiler. I’ll make it return Clojure code:

(defn compile [[op & [a b]]]
  (case op
    :term `(= ~a ~b)
    :not `(not ~(compile a))
    :or `(or ~(compile a) ~(compile b))
    :and `(and ~(compile a) ~(compile b))))

That returns Clojure code. You can convert it to a function like this:

(defn make-function [expr]
  (eval `(fn [] ~(compile expr))))

This compiler returns Clojure code, so it needs an eval to run it. Since it’s wrapped in a fn form, the eval simply compiles it into a function. But we can take that and make the compiler work directly on functions. That way, we don’t need to work with code and eval. It’s still a compiler, so it has the same structure:

(defn compile2fn [[op & [a b]]]
  (case op
    :term #(= a b)
    :not (let [f (compile2fn a)]
           #(not (f)))
    :or (let [fa (compile2fn a) 
              fb (compile2fn b)]
          #(or (fa) (fb)))
    :and (let [fa (compile2fn a) 
               fb (compile2fn b)]
           #(and (fa) (fb)))))

It’s still quite fast, but it doesn’t work with code. Code writing code can get hairy.

I think I prefer the compiler to functions. I like to reserve returning code for macros. And I reserve macros for stuff that has to run at compile time.

Rock on!
Eric

4 Likes

Needed an 1-arg vs the original 0-arg fn for the predicate to work (unless I missed something). Also needed to condition the input to conform to binary op assumptions (from OP, I think and could be multiple arity but meh).

For grins, it looks like

(defn compile2fn [[op & [a b]]]
  (case op
    :term (fn [x] (= x b))
    :not (let [f (compile2fn a)]
           (fn [x] (not (f x))))
    :or (let [fa (compile2fn a)
              fb (compile2fn b)]
          (fn [x] (or (fa x) (fb x))))
    :and (let [fa (compile2fn a)
               fb (compile2fn b)]
           (fn [x] (and (fa x) (fb x))))))

I get the eval’d version ~114x faster than interpreted, function transform is 31x faster than interpreted. function transform is a nice compromise. seems like something that could be automated a bit since the patterns seem to repeat regardless of op.

[edit for completeness and personal curiosity]

If you add a basic unifying lifting operation to catch any expression that is inferred as a literal, then you get an automatic base case. From there we can define a declarative op mapping that a compiler can ingest and then build without needing runtime eval per compile2fn style. I infer a single-arg for the predicate fn (implied to be x, but one could define arbitrarily complex predicate function args in theory).

(require '[clojure.walk :as w])

(defn lift-body [fname [params body]]
  (let [known  (set params)
        used   (->> body flatten (filter known))
        lifted (for [p used]
                 [p [(symbol (str "lift-" (name p))) `(~fname ~p)]])
        binds (vec (mapcat second lifted))]
    [binds
     `(~'fn [~'x]
       ~(w/postwalk-replace (into {} (for [[p [lp _]] lifted] [p `(~lp ~'x)])) body))]))

(defmacro compiler [op-map]
  (let [expr (gensym "expr")
        fname (gensym "compiler")]
    `(fn ~fname [~expr]
       (if-not (vector? ~expr)
         (fn [~'x] ~expr)
         (let [op# (~expr 0)]
           (case op#
             ~@(apply concat
                      (for [[k params-body] op-map]
                        (let [[binds body] (lift-body fname params-body)]
                          [k `(let [~(params-body 0) (subvec ~expr 1)
                                    ~@binds] ~body)])))))))))

yielding

(pprint (macroexpand-1 '(compiler {:term [[a b] (= x b)]
                                   :not  [[a]   (not a)]
                                   :or   [[a b] (or  a b)]
                                   :and  [[a b] (and a b)]})))

(clojure.core/fn
 compiler1622
 [expr1621]
 (clojure.core/if-not
  (clojure.core/vector? expr1621)
  (clojure.core/fn [x] expr1621)
  (clojure.core/let
   [op__1489__auto__ (expr1621 0)]
   (clojure.core/case
    op__1489__auto__
    :term
    (clojure.core/let
     [[a b] (clojure.core/subvec expr1621 1) lift-b (compiler1622 b)]
     (fn [x] (= x (lift-b x))))
    :not
    (clojure.core/let
     [[a] (clojure.core/subvec expr1621 1) lift-a (compiler1622 a)]
     (fn [x] (not (lift-a x))))
    :or
    (clojure.core/let
     [[a b]
      (clojure.core/subvec expr1621 1)
      lift-a
      (compiler1622 a)
      lift-b
      (compiler1622 b)]
     (fn [x] (or (lift-a x) (lift-b x))))
    :and
    (clojure.core/let
     [[a b]
      (clojure.core/subvec expr1621 1)
      lift-a
      (compiler1622 a)
      lift-b
      (compiler1622 b)]
     (fn [x] (and (lift-a x) (lift-b x))))))))
(def my-compiler (compiler  {:term [[a b] (= x b)]
                             :not  [[a]   (not a)]
                             :or   [[a b] (or  a b)]
                             :and  [[a b] (and a b)]}))

(def matcher (my-compiler goodexpr))

(for [i (range 20)] (matcher i))
;(false false false false false false false false true false true false false false false false false false false false)
1 Like

Amazing feedback, thanks all! It will take me some time to digest and understand all the info in here.

If you want, you can also compile to JVM bytecode . I could swear I wrote a Notebook on that a few years ago in a repository that could be hosted on mybinder but I cannot find it :frowning: .
All I can find for now is my org-mode notes and code snippets in case anyone is interested.
I would not use the same library now because I found insn since.

* Parsing

#+BEGIN_SRC clojure
(def arith
  (insta/parser
   "prog = expr*
    <expr> = assig | addsub
    assig = varname space? <'='> space? expr
    <addsub> = multdiv | add | sub
    add = addsub space* <'+'> space* multdiv
    sub = addsub space* <'-'> space* multdiv
    <multdiv> = factor | mult |div
    mult = multdiv space* <'*'> space* factor
    div = multdiv space* <'/'> space* factor
    <factor> = number | <'('> space* expr space* <')'> | varget |assig
    <space> = <#'[ ]'>
    number = #'[0-9]+'
    varget = varname
    varname = #'[a-zA-Z]\\w*'"))

#+END_SRC
prog *intrs

* Interpret
#+BEGIN_SRC clojure
(defn interpret-instr [env ast]
  (insta/transform {:prog identity
                    :assig (fn[{varname :res_ :as env1} {value :res_ :as env2}]
                             (assoc (merge env1 env2) varname value :res_ value))
                    :add (fn[{v1 :res_ :as env1} {v2 :res_ :as env2}]
                           (assoc (merge env1 env2) :res_ (+ v1 v2)))
                    :sub (fn[{v1 :res_ :as env1} {v2 :res_ :as env2}]
                           (assoc (merge env1 env2) :res_ (- v1 v2)))
                    :mult (fn[{v1 :res_ :as env1} {v2 :res_ :as env2}]
                            (assoc (merge env1 env2) :res_ (* v1 v2)))
                    :div (fn[{v1 :res_ :as env1} {v2 :res_ :as env2}]
                           (assoc (merge env1 env2) :res_ (quot v1 v2)))
                    :number #(assoc env :res_ (Integer/parseInt %)) 
                    :varname #(assoc env :res_ (keyword %)) 
                    :varget (fn [{varname :res_ :as env1}]
                              (assoc env1 :res_ (varname env1)))}
                   ast))
#+END_SRC

* AST → instr seq

:prog (fn[&instrs] (conj [:reti] (map into [[:args-to-stack] [:loadi 0] ] instrs))))

* instr seq → 
#+BEGIN_SRC clojure
(ns interpret.core
  (:require [instaparse.core :as insta])
  (:import (clojure.asm         Opcodes Type ClassWriter)
           (clojure.asm.commons Method GeneratorAdapter)) )

(def parser-arith
  (insta/parser
   "<expr> = spaces add-sub spaces
    <add-sub> = mul-div | add | sub
    add = add-sub spaces <'+'> spaces mul-div
    sub = add-sub spaces <'-'> spaces mul-div
    <mul-div> = term | mul | div
    mul = mul-div spaces <'*'> spaces term
    div = mul-div spaces <'/'> spaces term
    <term> = number | <'('> expr <')'>
    <spaces> = <#'[ ]'*>
    number = #'[0-9]+'"))

(def arith-interpret {:number #(Long/parseLong %)
                      :mul *
                      :div /
                      :add +
                      :sub -})

(insta/transform arith-interpret (parser-arith "(2 + 1) * 3"))

(defn generate-instr [mv instr]
  "Generate the method call to an  org.objectweb.asm.MethodVisitor for a given instruction."
  (do (print instr)
  (condp = (first instr)
    :load (.visitVarInsn mv Opcodes/ILOAD (int (second instr)))
    :store (.visitVarInsn mv Opcodes/ISTORE (int (second instr)))
    :loadi (.visitLdcInsn mv (int (second instr)))
    :addi (.visitInsn mv Opcodes/IADD)
    :subi (.visitInsn mv Opcodes/ISUB)
    :multi (.visitInsn mv Opcodes/IMUL)
    :divi (.visitInsn mv Opcodes/IDIV)
    :reti (.visitInsn mv Opcodes/IRETURN)
    ))
  mv)

(defn compile-ast [name ast]
"Generate a class of given name with a run method implementing the given ast."
  (let [op-fun (fn[op]
                 (fn[instrs-v0 instrs-v1]
                   (conj (into instrs-v0 instrs-v1) [op])))
        prog (insta/transform {:prog (fn[ & instrs] (conj (reduce into [[:loadi 0]] instrs) [:reti]))
                               :add (op-fun :addi)
                               :sub (op-fun :subi)
                               :mul (op-fun :multi)
                               :div (op-fun :divi)
                               :number #(vector [:loadi (Integer/parseInt %)])}
                              ast)
        generate-prog #(reduce generate-instr % prog)
        cw (ClassWriter. (+ ClassWriter/COMPUTE_FRAMES ClassWriter/COMPUTE_MAXS ))
        init (Method/getMethod "void <init>()")
        meth-name "run"
        meth-sig "()I"]
      (.visit cw Opcodes/V1_6 Opcodes/ACC_PUBLIC (.replace name \. \/) nil "java/lang/Object" nil)
      (doto (GeneratorAdapter. Opcodes/ACC_PUBLIC init nil nil cw)
        (.visitCode)
        (.loadThis)
        (.invokeConstructor (Type/getType Object) init)
        (.returnValue)
        (.endMethod))
      (doto (.visitMethod cw (+ Opcodes/ACC_PUBLIC Opcodes/ACC_STATIC) meth-name meth-sig nil nil )
        (.visitCode)
        (generate-prog)
        (.visitMaxs 0 0 )
        (.visitEnd))
      (.visitEnd cw)
      (let [b (.toByteArray cw)
            cl (clojure.lang.DynamicClassLoader.)]
        (.defineClass cl name b nil))
      (fn [] (clojure.lang.Reflector/invokeStaticMethod name meth-name (into-array [])))))


(def compiled (compile-ast "test" (conj [:prog ] (first (parser-arith "(1 + 2) * 3")))))
(compiled)
#+END_SRC

1 Like

I actually found the repository and notebook from 6 years ago, if anyone is interested is interpreting and/or compiling code from Clojure :
https://mybinder.org/v2/gh/bhugueney/binder-test/HEAD?labpath=ProjectModulo.ipynb

I wanted to do the same with Clojurescript, but I could not find a suitable WASM library to compile in the browser. As anything happened in that space in the last 6 years ?

I think the holdup with wasm is the lack of gc (Andy Wingo from guile lang blogged about it, and how they are working one into the standard). spritely / Guile Hoot · GitLab is their current thing. Apparently gc is coming to wasm now (already in nightly previews on major browsers?), so the having a wasm compile target might be viable going forward for clojure.

1 Like

Thank you for your reply, and indeed. It seems that WASM is not feature complete yet to be a compiler target for Clojure(script?), for lack of gc.
However, my ‘need’ were much less ambitious. I wished to be able to compile not a program in Clojure, but a program from Clojurescript while running in the browser (and being able to call the result from a running Clojurescript program).
I envisioned the ability to use a DSL for instance to perform vectorized operations on arrays of primitive types and compile those on the fly and call them from the Clojurescript program.

I thought it would have been nice when the Clojure(script) thi.ng stopped, if I remember correctly because Karsten Schmidt switched to Typescript because he hit a performance wall with Clojurescript for low level arithmetic operations.

As Java can be Clojure’s assembler (as Mark Engelberg said is a conference talk presenting some optimization libraries like tarantella ) I thought Clojurescript needed the same and WASM should be able to provide it.
All the better if code could also be generated and compile on the fly.

hmmm wasm.clj ? Seems kind of like insn with some convenience layers for decompilation/compilation.