Writing a new parser for the prolin library using instaparse

https://github.com/levand/prolin is a linear programming (LP) library. I’m currently learning LP, and this library is a good match for my learning. The fork of pez was the one on clojars.
A linear program takes a linear polynomial on the form of
a_0*x_0 + … + a_n*x_n = 0
called the objective function, and some constraints over this and maximizes / minimizes the expression. Its all in the library. I find it cool that the library works nicely after 8 years with no maintenance.

This polynomial is written in in a string format. And as it is said in the readme for prolin, the parsing of string-equations is limited.

I was just a bit frustrated by its parsing of polynomials. It cannot do things like “x_0 + x_1”, it has to be “a + b”… but my book uses x_1, x_2 and everything!

So I spent most of last night writing code. Just for my own satisfaction.

Using https://github.com/Engelberg/instaparse as a parser I wrote some attempts on a grammar, until I was happy enough with it. Then some postwalk for cleaning up the tree.
Lastly I needed to move variables to the left hand side of the expression, because math has rules.

It is my first time using instaparse for something that I will actually use, so there is probably many things I could do better. I think my parsing code is a bit… esoteric, maybe even squirrel-y. But it works now.

I guess that I’ll look back at this in a year and be like :face_vomiting:, but hey, that is what learning is about.

I’m just putting this out here, in case anyone is interested.

(ns lp.parsepoly
      (:require [instaparse.core :as insta]
                [clojure.walk :refer [postwalk]]))

(def polyparser
  (insta/parser
   "<S> = qq
    <qq> = opnum opnum* (comp opnum+)?
    opnum = <ws?> sign? <ws?> (product/value)
    product = (value <ws?> <'*'> <ws?> value)|(number value)
    variable = #'[A-z]' ('_'? number)?
    comp = <ws?> #'<=|>=|=' <ws?>
    <value> = variable/number
    <number> = #'[0-9]+(.[0-9]+)?'
    sign = #'[-|+]'
    <ws> = #' '"))

(defn x->int [x]
  (try
    (if (int? x)
      x
      (read-string x))
    (catch Exception ex
      (println "error," x "is not num-able")
      :not-num-able)))

(defn if-vect-with-key [x key f]
  (if (and
       (vector? x)
       (= key (first x)))
    (f x)
    x))
(defn merge-varname [x]
  (if-vect-with-key
   x :variable
   (fn [x] [:variable (apply str (rest x))])))
(defn reduce-plain-var [x]
  (if-vect-with-key
   x :variable
   (fn [x] [(keyword (second x)) 1])))
(defn reduce-product [x]
  (if-vect-with-key
   x :product
   (fn [x] (let [r (flatten (rest x))
                 k (first (filter keyword? r))
                 factor (transduce (comp
                                    (filter #(or (string? %) (int? %)))
                                    (map x->int)) * x)]
             [k factor]))))
(defn fix-sign [x]
  (if (and
       (vector? x)
       (= :opnum (first x))
       (= 3 (count x)))
    (let [[_ [skw sign] const-or-var] x]
      (if (= :sign skw)
        (let [sfactor (if (= "+" sign) 1 -1)]
          (if (string? const-or-var)
            [:opnum (* sfactor (x->int const-or-var))]
            [:opnum [(first const-or-var) (* sfactor (second const-or-var))]]))
        x))
    x))

(defn remove-opnum-tag [x]
  (if-vect-with-key
   x :opnum
   (fn [x] (if (= (count x) 2)
             (let [[_ key-val] x]
               (if (or (string? key-val) (int? key-val))
                 [:constant (x->int key-val)]
                 key-val))
             x))))

(defn format-for-prolin-rf
  "reducing function for pass over a parsed tree."
  ([] {:constant 0
       :variables {}})
  ([result [k v]]
   (case k
     :constant (merge result {:constant v})
     :comp (merge result {:relation (symbol v)})
     (merge result
            {:variables
             (merge (:variables result) [k v])}))))

(defn move-constant-to-lhs [x]
  (transduce
   (comp
    (filter (fn [[a b]] (= a :constant)))
    (map second))
   (fn
     ([] 0)
     ([a] a)
     ([a b] (- a b)))
   x))

(defn merge-constants [lhs rhs]
  (let [lhs-add-term (merge (into {} rhs) {:constant (move-constant-to-lhs rhs)})]
    (vec (merge-with +
          (into {} lhs) lhs-add-term))))

(defn poly-to-prolin-dict 
  "use to convert a string polynomial to a dict suitable for the prolin library
{:constant (0|c)
:variables {:v0 x0 :v1 x1 ...}
[:relation (=|>=|<=)] }"
  [poly]
  (let [xf (comp remove-opnum-tag fix-sign reduce-product reduce-plain-var merge-varname)
        pr (postwalk xf (polyparser poly))
        [lhs rhs] (split-with (fn [[a b]] (not= a :comp)) pr)
        pr-corr-sign-c (merge-constants lhs rhs)]
    (reduce format-for-prolin-rf (format-for-prolin-rf) pr-corr-sign-c)))

tests:

(ns lp.parsepoly-test
  (:require
   [lp.parsepoly :refer :all]
   [clojure.test :refer [deftest is testing run-tests]]))


(def poly1 "10a + 1b -2c")
(def poly2 "x1 + x2 - x3")
(def poly3 "x1 + (x2 - x3)")
(def poly4 "x_1 + 5")
(def poly5 "x_0 + 1 + z")
(def poly6 "0.1x_0 + 0.2x_1")
(def poly7 "y - 5 <= 0")
(def var0 "x")

(deftest alltests
  (is (= (poly-to-prolin-dict var0) {:constant 0 :variables {:x 1}}))
  (is (= (poly-to-prolin-dict poly1) {:constant 0 :variables {:a 10 :b 1 :c -2}}))
  (is (= (poly-to-prolin-dict poly2) {:constant 0 :variables {:x1 1 :x2 1 :x3 -1}}))
  ;(is (= (poly-to-prolin-dict poly3) {:constant 0 :variables {:x1 1 :x2 1 :x3 -1}}))
  (is (= (poly-to-prolin-dict poly4) {:constant 5 :variables {:x_1 1}}))
  (is (= (poly-to-prolin-dict poly5) {:constant 1 :variables {:x_0 1 :z 1}}))
  )

(deftest regressions0
  (is (= (poly-to-prolin-dict poly6) {:constant 0 :variables {:x_0 0.1 :x_1 0.2}}))
  (is (= (poly-to-prolin-dict "x <= 1") {:constant -1 :variables {:x 1} :relation '<=}))
  (is (= (poly-to-prolin-dict poly7) {:constant -5 :variables {:y 1} :relation '<=}))
  )

usage with prolin:

(ns lp.notes
  (:require
   [lp.parsepoly :refer [poly-to-prolin-dict]]
   [prolin :as p]
   [prolin
    [protocols :as pp]
    [commons-math :as cm]]))
;; https://github.com/PEZ/prolin

(defn poly
 "make poly with custom parser"
  [string]
  (let [pd (poly-to-prolin-dict string)]
    (pp/linear-polynomial (:constant pd) (:variables pd))))
(defn build-constraint
 "helper for constraint"
 [item]
  (let [p (poly-to-prolin-dict item)]
    (pp/constraint
     (:relation p)
     (pp/linear-polynomial (:constant p) (:variables p)))))
(defn constraint 
 "make constraint(s) with custom parser"
 [items]
  (if (set? items)
    (set (map build-constraint
              items))
    (build-constraint items)))


(pp/constraint '= (pp/linear-polynomial 0 {:x 2 :y -1}))
;; => {:r =, :p {:c 0, :v {:x 2, :y -1}}}
(constraint "2x - y = 0")
;; => {:r =, :p {:c 0, :v {:x 2, :y -1}}}

Instaparse is heaps of fun!

P.S.: when cleaning up the result, please don’t use Clojure core’s read-string. As its docstring says, “read-string can execute code… For data structure interop use clojure.edn/read-string.”

Mathematical programming (LP, IP, MIP, QP, etc.) is pretty cool and common in operations research (under the field of Optimization). Glad to see it bubble up again. Last time I looked at Prolin, comparing it to something like Pulp, pulp seemed to have a better eDSL. Pulp makes it pretty easy to programmatically generate large collections of equations using python’s flow control (e.g. comprehensions/iterators). Other commercial systems (like GAMS) have a similar facilitary in their own scripting language that allows set definition, and control flow for generating equations.

Instaparse could be an interesting middle ground to allow general mathematical script to be translated fairly directly into formulations. Note: you can go further (more portably) if you parse from the formulation into mps which can be portably fed to any decent solver. I also think prolin rests on apache.commons.math primal simplex solver, and doesn’t have a branch-and-bound solver for MIP problems (that would be an interesting exercise). If you’re only doing LP though, it’s probably plenty (although I don’t know what the upper bounds on tractable problem instances are for it).

I think @Linus_Ericsson might have written a solver that doesn’t rely on the apache stuff. Not sure if it’s in a sharable state, but as I remember it, it needed a parser.

You mean a linear programming solver from the ground up? E.g. simplex (primal/dual) or interior point? Very curious to see if so.

I found this post from Paul Khuong on implementing a solver in CL to be pretty informative, for what it’s worth. Thought about doing something similar on top of neanderthal.

Yes, from the ground up. Simplex. I don’t know what primal/dual means here, nor interior point, so couldn’t tell you. Let’s hope Linus sees this and can enlighten us.

Our use case: We needed something that worked both for Clojure and ClojureScript for our isomorphic web app. We were initially using https://github.com/cljsjs/packages/tree/master/simplex-solver (which I put together) for the frontend. And then also for the backend, through Nashorn (iirc). Using cljc was of course much simpler/less complex.

If it’s this you mean https://github.com/claj/prolin , then it uses the Apache simplex solver.

No, we couldn’t use that because we needed it for ClojureScript as well. But also yes, because Linus ported relevant parts of the Apache solver to Clojure. I’m hoping he will find the time to make it available for public consumption.