Parsing

Example: Propositional Logic

  • Variable: P, Q, R, …
  • Constants: T/F, 0/1, etc.
  • Binary operations: \(\land, \lor, \implies, \iff\)
  • Unary operations: \(\lnot\)
  • Parentheses
E := ( E B E ) | ( U E ) | C | V
B := V | ^ | => | <=>
U := ~
C := 0 | 1
V := P | Q | R | ... | Z

Let’s parse: \((((P \lor Q) \land R) \lor (Q \land (\lnot P)))\)

_images/parsing1.png

A simple parser algorithm (not really robust, or working for other grammars):

  • Init stack with \(\bot\)
  • Scan left to right
  • If open paren, push
  • If operator, push
  • If constant, push pointer to it
  • If variable, lookup in symbol table, push pointer to it
  • If close paren, reduce:
    • Allocate storage for node in expression tree
    • Pop object (ptr to right operand, could be const, var, or another node)
    • Pop object (should be the operator), save in node
    • If operator is binary:
      • Pop object (ptr to left operand)
    • Pop object (should be left paren)
    • Push pointer to new node
_images/parsing2.png _images/parsing3.png _images/parsing4.png _images/parsing5.png

Ambiguity

The above example uses parens everywhere so we don’t have to worry about order of ops. What if we don’t?

E := EBE | UE | C | V | (E)
B := V | ^ | => | <=>
U := 0 | 1
V := P | Q | R | ... | Z
_images/parsing6.png

We need to give precedence to some operators. Let’s look at a math grammar and apply OoO:

E := E+E | E-E | E*E | E/E | -E | C | V | (E)
C := 0 | 1
V := a | b | c
_images/parsing7.png

There’s a lot of ambiguity without precedence:

_images/parsing8.png

Solution 1: Design

Design the grammar so that there is no ambiguity.

E := E+F | E-F | F
F := F*G | F/G | G
G := -G | H
H := C | V | (E)
C := 0 | 1
V := a | b | c

This gives the following precedence:

  1. unary -
  2. *, /
  3. +, -
  4. (
  5. \(\bot\)
_images/parsing9.png

This changes our parsing algorithm to check the operator under the operand on the stack when it reaches an operator - reduce if it has higher or equal precedence, push if not:

_images/parsing10.png

Ok, this screenshot kind of failed, but same thing: push * then c:

_images/parsing11.png

Now, when we parse the next operator, we see that the operator below (*) has higher precedence: so we need to reduce!

_images/parsing12.png

Repeat, now the operand underneath is +, which has equal precedence - so we reduce again (because we’re parsing left to right)

_images/parsing13.png

And check again, the bottom of the stack has super low precedence so we’re fine. Push.

_images/parsing14.png

Now we’re at the end of the string, so reduce as much as possible (in this case 1 reduce):

_images/parsing15.png