Regular Expressions¶

Regular expressions over alphabet \(\Sigma\):

Ex. Begin with 0, end with 11: 0(0+1)*11

Ex. Contains at least 2 1s: (0+1)* 1 (0+1)* 1 (0+1)* or 0* 1 0* 1 (0+1)*

Note

a+ is often used as an abbreviation for aa*

Ex. Contains the substring 111: (0+1)* 111 (0+1)*

Ex. Even length: ((0+1)(0+1))*

Ex. Odd length: (0+1)((0+1)(0+1))*

Ex. Strings that don’t end with 01: e + 1 + (0+1)*0 + (0+1)*11 - it’s hard to exclude things!

Ex. Every 0 is followed by at least one 1: 1* (011*)*

Ex. 3rd symbol from right is 1: (0+1)* 1 (0+1)(0+1)

Ex. Contains both 01 and 10. ((0+1)* 01 (0+1)* 10 (0+1)*) + ((0+1)* 10 (0+1)* 01 (0+1)*)

Ex. Not containing 00. (1+01)*(0+e)

Ex. At most 1 00. Consider: (1+01)* has no 00 and does not end w/ 0, (1+10)* has no 00 and does not start w/ 0. So exactly 1 occurance of 00: (1+01)* 00 (1+10)*, and up to 1 is just the combination: (1+01)*(0+e) + (1+01)* 00 (1+10)*.

Ex. Not containing 110. 0* (100*)* 111*

Kleene’s Thm¶

For every regular expression, there is an equivalent \(\epsilon\)-NFA.

Use strong induction to prove:

NFAs for regexes of length 1:
- let the building blocks have a unique accept state

All regexes longer than length 1 are:
- two smaller regexes with addition
- two smaller regexes with concatenation
- one smaller regex starred
IS: prove S(M) is true given that S(i) is true for all i < M.

Case: \(r = r_1 + r_2\)
- since \(r_1, r_2\) are smaller than r, there exists an \(\epsilon\)-NFA with a single unique accept state for each. (IH)
- Let these be \(M_1\) and \(M_2\).
- By adding a new start state, e-steps from that start state to the start states of of \(M_1\) and \(M_2\), and e-steps from the accept states to a new unique accept state, it is possible to construct an \(\epsilon\)-NFA to accept \(r_1 + r_2\).

Case: \(r = r_1 \cdot r_2\) (concatenate)
- since \(r_1, r_2\) are smaller than r, there exists an \(\epsilon\)-NFA with a single unique accept state for each. (IH)
- Let these be \(M_1\) and \(M_2\).
- By adding an e-step from the accept state of M1 to the start state of M2, it is possible to construct an \(\epsilon\)-NFA to accept \(r_1 + r_2\).

Case: \(r = r_1*\)
- since \(r_1\) is smaller than r, there exists an \(\epsilon\)-NFA with a single unique accept state. (IH)
- Let this be \(M_1\).
- By adding an e-step from the old accept state to the old start state, from the old accept state to a new accept state, and the new start state to a new accept state.
- QED.

The Other Way Around¶

Thm. Given a DFA, NFA, or e-NFA, there exists a regex that accepts the language of that FA.

Define \(L_{ij}^k\) as the set of strings that will move the DFA from \(q_i\) to \(q_j\), where all intermediate states’ indices are < k.

Now we can define L(M) as the union of all ways to get from \(q_0\) to an accept state for each accept state:

(Above: let’s call this regex for \(L_{ij}^k\) \(r_{ij}^k\))

The induction:

base case. Since k = 0, this means only direct transitions:

this produces two cases: if the path does not encounter k, or if it does:

And if it does, it must have a first time it enters k and the last time it exits k, and anything in between must be k to k, where all intermediate nodes are < k:

Now, we can expand this regex to:

\(\sum_{q_j \in F} r^k_{0j}\).

Example¶

(left axis: i, j; top axis: k)

See published notes for full work.

Regular Expressions¶

Kleene’s Thm¶

The Other Way Around¶

Example¶

cse103-notes

Navigation

Related Topics