Need a bit of help with an instaparse grammar

Hey guys,

I am playing with instaparse and I have a problem constructing a grammar.

Here is what I am going for:

(insta/defparser ex7
  "
  doc = (text | tag)*
  text = #'[^@]*'
  tag = '@' #'[a-z]*' inner-text*
  inner-text = '{' #'[^}]*' '}'
  ")

(ex7 "some text @toto{inner text}")

The problem is that, when parsing the tag rule, the parser does not try the inner-text rule. It falls back to the text rule instead, which gives me this parse:

[:doc [:text "some text "] [:tag "@" "toto"] [:text "{inner text}"]]

instead of the desired:

[:doc [:text "some text "] [:tag "@" "toto" [:inner-text "{" "inner text" "}"]]]

Any idea how I can modify the grammar so that the parser tries the inner-text rule before falling back to the text rule?

ordered choice operator

Instead of alternation |:

  doc = (text | tag)*

try the ordered choice operator /:

  doc = (tag / text)*

debugging

As you may know, insta/parses helps debug ambiguous grammars by showing all possible parses.

(->> (insta/parses ex7 "some text @toto{inner text}")
     clojure.pprint/pprint)
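
Since insta/parses returns a lazy sequence, it can be safer to look at only the first few parses while debugging. Here is a minimal sketch of that, assuming the grammar from the question is rebuilt with insta/parser (the `take 3` limit is an arbitrary choice for illustration):

```clojure
(require '[instaparse.core :as insta]
         '[clojure.pprint :refer [pprint]])

;; Same grammar as in the question, built with insta/parser
;; instead of insta/defparser.
(def ex7
  (insta/parser
    "doc = (text | tag)*
     text = #'[^@]*'
     tag = '@' #'[a-z]*' inner-text*
     inner-text = '{' #'[^}]*' '}'"))

;; insta/parses is lazy, so only the first few parses are realized here.
;; The limit of 3 is arbitrary.
(def first-parses
  (take 3 (insta/parses ex7 "some text @toto{inner text}")))

;; Print each candidate parse tree on its own.
(doseq [p first-parses]
  (pprint p))
```

If more than one tree is printed, the grammar is ambiguous for that input, and comparing the trees shows where the alternatives diverge.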

ensuring inner-text, even if empty

I’d tend to ensure an :inner-text, even if empty, just to keep things regular:

  tag = '@' #'[a-z]*' inner-text
  inner-text = ( '{' #'[^}]*' '}' )*
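
Put together with the ordered choice in the doc rule, the whole grammar could look like the sketch below. The name `ex7-regular` is made up for this example, and I have not listed the resulting trees, so it is worth checking them at the REPL:

```clojure
(require '[instaparse.core :as insta])

;; Hypothetical name; combines the ordered choice in doc with a tag rule
;; that always contains exactly one inner-text node.
(def ex7-regular
  (insta/parser
    "doc = (tag / text)*
     text = #'[^@]*'
     tag = '@' #'[a-z]*' inner-text
     inner-text = ( '{' #'[^}]*' '}' )*"))

;; A tag followed by a braced block.
(def with-block (ex7-regular "some text @toto{inner text}"))

;; A tag with no braced block; inner-text is allowed to match nothing.
(def without-block (ex7-regular "some text @toto"))

;; insta/failure? tells a successful parse apart from a failure object.
(println (insta/failure? with-block) (insta/failure? without-block))
```

The idea is that code walking the tree can always expect an :inner-text child under :tag, rather than having to handle tags with and without one.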

Hi @JeremS, if this answer solved your problem then you can mark it as “resolved” by checking the little checkmark underneath @floop’s post! That way others know that your question has been answered and you don’t need more help right now.


Hey guys, thank you for the reply. The ordered choice operator works!

To conclude this thread, here is some code using @floop’s solution, and another solution suggested to me by Alex Engelberg on the instaparse channel of the Clojure Slack.

Here is @floop’s solution using PEG’s ordered choice operator in the doc rule:

(insta/defparser ex-with-choice
  "
  doc        = (tag / text)*
  text       = #'[^@]*'
  tag        = <'@'> #'[a-z]\\w*'  text-block*
  text-block = <'{'> inner-text <'}'>
  <inner-text> =  #'[^}]*'
  ")

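Alex’s solution uses PEG’s negative lookahead operator (!). The tag rule can only end at a point where no further text-block could start, which forces text-block* to consume every block: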

(insta/defparser ex-with-lookahead
  "
  doc        = (text | tag)*
  text       = #'[^@]*'
  tag        = <'@'> #'[a-z]\\w*'  text-block* !text-block
  text-block = <'{'> inner-text <'}'>
  <inner-text> =  #'[^}]*'
  ")

The two solutions seem to yield the same result:


(= (ex-with-choice "some text @tag1{inner text}{inner text 2} some text 2 @tag2 other text")
   (ex-with-lookahead "some text @tag1{inner text}{inner text 2} some text 2 @tag2 other text"))
;=> true

Which is:

(ex-with-choice "some text @tag1{inner text}{inner text 2} some text 2 @tag2 other text")

;=>
[:doc
 [:text "some text "]
 [:tag "tag1" [:text-block "inner text"] [:text-block "inner text 2"]]
 [:text " some text 2 "]
 [:tag "tag2"]
 [:text " other text"]]

The PEG extensions in instaparse are very useful indeed.

Thx again. Cheers,
