How LR parsing works

Consider the grammar $G$ defined by the production rules below (we use $ to represent the end of input):

$S\rightarrow E$$
$E\rightarrow E+T$
$E\rightarrow T$
$T\rightarrow 0$
$T\rightarrow 1$

We want to generate a table we can reference to parse text in linear time. LR parsing gives us such a table.

While parsing an input, for example “1+1”, we make use of two stacks: state stack and symbol stack.

Using the parse table Link to heading

The corresponding parse table for our grammar (we’ll see how it’s generated later) is as follows:

State	0	1	+	$	E	T
0	$s_3$	$s_4$			1	2
1			$s_5$	A
2	$r_3$	$r_3$	$r_3$
3	$r_4$	$r_4$	$r_4$
4	$r_5$	$r_5$	$r_5$
5	$s_3$	$s_4$				6
6	$r_2$	$r_2$	$r_2$

For each state, we assign an action for the next input (0,1,+,$). The last two columns indicate which state to jump to when reducing.

LR uses a bottom-up, shift-reduce parsing method. There are 3 valid actions per step:

Shift ($s_n$): advances the input stream creating a new node. Pushes the terminal to the symbol stack and state $n$ to the state stack.
Reduce ($r_n$): we use grammar rule $n : A\rightarrow\gamma$ to pop off the top $|\gamma|$ entries of the symbol and state stack joining together nodes created while shifting. Finally we push $A$ onto the symbol stack and a new state onto the state stack determined by column $A$ of the top state in the stack.
Accept (A): accept the text as a member of $G$’s language.

The following is an example of parsing “1+1” with the above parse table:

State Stack	Symbol Stack	Input	Next Action
0	$	1+1$	Shift 4
0 4	$1	+1$	Reduce by $T\rightarrow 1$
0 2	$T	+1$	Reduce by $E\rightarrow T$
0 1	$E	+1$	Shift 5
0 1 5	$E+	1$	Shift 4
0 1 5 4	$E+1	$	Reduce by $T\rightarrow 1$
0 1 5 6	$E+T	$	Reduce by $E\rightarrow E+T$
0 1	$E	$	Accept

Generating the states Link to heading

Each state $I_n$ in the parse table represents a set of LR(0) items. Each item is a production rule paired with an index. We usually represent them by inserting a dot into the rule at the index, e.g., $E\rightarrow E\cdot+T$.

Each state is generated by two functions, $\textup{closure}$ and $\textup{goto}$. The first state $I_0$ is generated by the closure of the starting rule, i.e., $I_0=\textup{closure}(S\rightarrow\cdot E$$ $)$.

The closure of an item $A\rightarrow \alpha\cdot B\beta$ is defined to be the smallest set containing

the item $A\rightarrow \alpha\cdot B\beta$ itself; and
every item in $\textup{closure}(B\rightarrow\cdot\gamma)$ for production $B\rightarrow\gamma\in G$.

For example, for our grammar $G$ the initial state is then

$I_0={S\rightarrow\cdot E$$ $, E\rightarrow\cdot E+T, E\rightarrow\cdot T, T\rightarrow\cdot 0, T\rightarrow\cdot 1}.$

To generate the remaining states, we use $\textup{goto}$. $\textup{goto}(I, X)$ is defined to be the union of $\textup{closure}(A\rightarrow\alpha X\cdot\beta)$ for all items $A\rightarrow\alpha\cdot X\beta\in I$. For example,

$\textup{goto}(I_0, E)=\textup{closure}(S\rightarrow E\cdot$$ $)\cup\textup{closure}(E\rightarrow E\cdot+T).$

The LR(0) item sets (states) is then the smallest collection of items sets $C$ such that

$\textup{closure}(S\rightarrow\cdot E)\in C$; and
$\textup{goto}(I, X)\in C$ for all $I\in C$ and symbols $X\in G$.

Our example grammar has 7 item sets:


$I_0=\textup{closure}(S\rightarrow\cdot E)$	$S\rightarrow\cdot E$$ $\ E\rightarrow\cdot E+T\ E\rightarrow\cdot T\ T\rightarrow\cdot 0\ T\rightarrow\cdot 1$
$I_1=\textup{goto}(I_0,E)$	$S\rightarrow E\cdot$$ $\E\rightarrow E\cdot+T$
$I_2=\textup{goto}(I_0,T)$	$E\rightarrow T\cdot$
$I_3=\textup{goto}(I_0,0)$	$T\rightarrow 0\cdot$
$I_4=\textup{goto}(I_0,1)$	$T\rightarrow 1\cdot$
$I_5=\textup{goto}(I_1,+)$	$E\rightarrow E+\cdot T\ T\rightarrow\cdot0\ T\rightarrow\cdot1$
$I_6=\textup{goto}(I_5,T)$	$E\rightarrow E+T\cdot$

Generating the parse table Link to heading

With our 7 states generated, we need to create the edges connecting them and create the parse table. Luckily $\textup{goto}$ tells us which state goes where. For example, if we are on state $I_0$ and the next symbol on the input is 1, we want to move to state $\textup{goto}(I_0, 1)=I_4$. Since 1 is a terminal in our grammar, we more specifically want to shift then push state $I_4$ to the stack.

Thus the first row of our parse table is as follows:

State	0	1	+	$	E	T
0	$s_3$	$s_4$			1	2

Now consider state $I_2={E\rightarrow T\cdot}$. First of all, $\textup{goto}(I_2, X)=\emptyset$ for all $X\in G$ meaning no possible shift actions are available. However, the only item in the set is at the end of a production rule. Thus the proper action is to reduce by $E\rightarrow T$ no matter what the next input symbol is. Giving the following row in our table:

State	0	1	+	$	E	T
2	$r_3$	$r_3$	$r_3$	$r_3$

What about state $I_1={S\rightarrow E\cdot$$ $, E\rightarrow E\cdot+T}$. Since $\textup{goto}(I_1,+)=I_5$, we want to shift 5 on +. But $S\rightarrow E$$ is our starting state, thus as long as we are at the end of input, we can accept.

State	0	1	+	$	E	T
1			$s_5$	A

Generating the rest of the table can be done in the same manner.

Final Thoughts Link to heading

All of the above has been for LR(0) parsing, LR(1) is more popularily used. The 1 in LR(1) indicates our parser can peek 1 symbol ahead. This allows for a larger collection of parsable grammars. In LR(1) items are of the form $(A\rightarrow \alpha\cdot\beta, x)$ where $x\in G$. This requires slightly different definitions for $\textup{closure}$ and $\textup{goto}$.

For more details, such as conflict errors, I suggest the following slides: https://www3.cs.stonybrook.edu/~cram/cse504/Spring16/Lectures/Spring15/lrparser-handout.pdf