2. Overview
sid takes the input grammar and applies several transformations to it, before it produces the parser. These transformations allow the language descriptions to be written in a slightly more natural form than would otherwise be necessary.
2.1. Left recursion elimination
The first transformation is to eliminate any left recursive productions, replacing them with right recursive ones. This transforms a set of rules of the form:
Ai = Aj Bji, Ci
into:
Ai = Cj Xji Xji = Bjk Xki, Yji
where Yji
is the identity function if i
equals j
, and an error otherwise. In order for this transformation to work, the following conditions must hold:
-
The parameter type for all
Ai
must be the same. -
The result type for all
Ai
must be the same. -
The exception handlers for all
Ai
must be the same. -
The parameters to each left recursive call to some
Aj
must be exactly the formal parameters to the calling ruleAi
. -
Any non-local name referenced by any rule must be in scope for all rules.
-
A rule may not define non-local variables unless it is the only entry point into the cycle (i.e. there is only one named rule in the cycle).
sid will report an error if these conditions are not met.
2.2. Factoring
The second major transformation is to factor the grammar. It is possible that this phase could go on for ever, so there is a limit to the number of rules that can be generated. This limit may be changed (see the invocation section). In practice it is rare for this to happen.
The factoring phase tries to increase the number of language specifications that sid can produce a parser for, by factoring out common initial items in alternatives, e.g.:
X = { a ; b ; || a ; c ; } ;
would be transformed into something like:
X = { a ; X1 ; } ; X1 = { b ; || c ; } ;
It will also expand calls to rules at the beginning of alternatives if this is necessary to allow the parser to be produced (this is the phase that the limit is needed for). The expansions are done in the following cases:
-
If the rule is see-through (i.e. there is an expansion of the rule that does not contain any terminals or predicates) and its first set contains terminals in the first set of the rest of the alternative.
-
If the rule has a predicate in its first set.
-
If the rule has terminals in its first set that are also in the first set of another alternative that does not begin with the same rule.
If the rule being expanded contains an exception handler, and it is not identical to the exception handler in the enclosing rule, then an error will occur. Similarly if the rule being expanded defines any non-local variables then an error will occur.
2.3. Optimisations
After these two transformations, sid performs some optimisations on the grammar, such as reducing multiple copies of identical rules, removing tail recursion, inlining all basic rules, inlining single alternative rules, inlining rules used only once, and inlining everything that can be inlined. All of the inlining is optional.
After these optimisations, sid checks the language description to ensure that it is possible to produce a parser for it, and if so it outputs the parser.