Regular Expressions =================== Regular expressions over alphabet :math:`\Sigma`: .. image:: _static/regex1.png :width: 350 .. image:: _static/regex2.png :width: 350 Ex. Begin with 0, end with 11: ``0(0+1)*11`` Ex. Contains at least 2 1s: ``(0+1)* 1 (0+1)* 1 (0+1)*`` or ``0* 1 0* 1 (0+1)*`` .. note:: ``a+`` is often used as an abbreviation for ``aa*`` Ex. Contains the substring ``111``: ``(0+1)* 111 (0+1)*`` Ex. Even length: ``((0+1)(0+1))*`` Ex. Odd length: ``(0+1)((0+1)(0+1))*`` Ex. Strings that don't end with ``01``: ``e + 1 + (0+1)*0 + (0+1)*11`` - it's hard to exclude things! Ex. Every 0 is followed by at least one 1: ``1* (011*)*`` Ex. 3rd symbol from right is 1: ``(0+1)* 1 (0+1)(0+1)`` Ex. Contains both 01 and 10. ``((0+1)* 01 (0+1)* 10 (0+1)*) + ((0+1)* 10 (0+1)* 01 (0+1)*)`` Ex. Not containing 00. ``(1+01)*(0+e)`` Ex. At most 1 00. Consider: ``(1+01)*`` has no 00 and does not end w/ 0, ``(1+10)*`` has no 00 and does not start w/ 0. So exactly 1 occurance of 00: ``(1+01)* 00 (1+10)*``, and up to 1 is just the combination: ``(1+01)*(0+e) + (1+01)* 00 (1+10)*``. Ex. Not containing 110. ``0* (100*)* 111*`` Kleene's Thm ------------ For every regular expression, there is an equivalent :math:`\epsilon`-NFA. Use strong induction to prove: 1. NFAs for regexes of length 1: - let the building blocks have a unique accept state .. image:: _static/regex3.png :width: 350 2. All regexes longer than length 1 are: - two smaller regexes with addition - two smaller regexes with concatenation - one smaller regex starred 3. IS: prove S(M) is true given that S(i) is true for all i < M. - Case: :math:`r = r_1 + r_2` - since :math:`r_1, r_2` are smaller than *r*, there exists an :math:`\epsilon`-NFA with a single unique accept state for each. (IH) - Let these be :math:`M_1` and :math:`M_2`. - By adding a new start state, e-steps from that start state to the start states of of :math:`M_1` and :math:`M_2`, and e-steps from the accept states to a new unique accept state, it is possible to construct an :math:`\epsilon`-NFA to accept :math:`r_1 + r_2`. .. image:: _static/regex4.png :width: 350 - Case: :math:`r = r_1 \cdot r_2` (concatenate) - since :math:`r_1, r_2` are smaller than *r*, there exists an :math:`\epsilon`-NFA with a single unique accept state for each. (IH) - Let these be :math:`M_1` and :math:`M_2`. - By adding an e-step from the accept state of M1 to the start state of M2, it is possible to construct an :math:`\epsilon`-NFA to accept :math:`r_1 + r_2`. .. image:: _static/regex5.png :width: 350 - Case: :math:`r = r_1*` - since :math:`r_1` is smaller than *r*, there exists an :math:`\epsilon`-NFA with a single unique accept state. (IH) - Let this be :math:`M_1`. - By adding an e-step from the old accept state to the old start state, from the old accept state to a new accept state, and the new start state to a new accept state. - QED. .. image:: _static/regex6.png :width: 350 The Other Way Around ^^^^^^^^^^^^^^^^^^^^ **Thm**. Given a DFA, NFA, or e-NFA, there exists a regex that accepts the language of that FA. .. image:: _static/regex7.png :width: 500 Define :math:`L_{ij}^k` as the set of strings that will move the DFA from :math:`q_i` to :math:`q_j`, where all intermediate states' indices are < k. .. image:: _static/regex8.png :width: 500 Now we can define L(M) as the union of all ways to get from :math:`q_0` to an accept state for each accept state: .. image:: _static/regex9.png :width: 500 .. image:: _static/regex10.png :width: 500 .. image:: _static/regex11.png :width: 500 (Above: let's call this regex for :math:`L_{ij}^k` :math:`r_{ij}^k`) The induction: - base case. Since k = 0, this means only direct transitions: .. image:: _static/regex12.png :width: 800 .. image:: _static/regex13.png :width: 800 this produces two cases: if the path does not encounter k, or if it does: .. image:: _static/regex14.png :width: 800 And if it does, it must have a first time it enters *k* and the last time it exits *k*, and anything in between must be *k* to *k*, where all intermediate nodes are < k: .. image:: _static/regex15.png :width: 800 .. image:: _static/regex16.png :width: 800 Now, we can expand this regex to: :math:`\sum_{q_j \in F} r^k_{0j}`. Example ^^^^^^^ (left axis: i, j; top axis: k) .. image:: _static/regex17.png :width: 500 See published notes for full work.