Production Rule
A production rule defines a syntax element that is composed of a series of in-order references to tokens and other productions. Unlike tokens however, they can contain recursion into other productions that might potentially reference themselves. Because of this, the fully expanded form of a DFA, used by tokens is not possible. The reason for this limitation is due to the finite nature of computers and the relative infinite memory requirement necessary to fully expand a self-recursive production.

The method used to alleviate this limitation is fairly straight forward: a representation of the DFA which expands its states only as needed, which results in an automation that is a bit slower to initialize, but finite enough to be represented on today's machines. The state discovery only needs to occur once per unique path, while the same file might not be parsed twice, two portions of a file might use the same states and therefore the state discovery is skipped in those instances.

So far, the automation handles normal recursion, and the three known types of left recursion: direct, indirect and hidden.

Literal strings within production rules are replaced with token references upon compilation (a process I call deliteralization). The compiler scans for an element of an enumerator-type token which matches the source literal. If none exists, one is created in a token named __EXTRACTIONS. Since its name is unfriendly, it's best to define the literal somewhere else so when the productions are deliteralized, it is easier to refer to.

Last edited Aug 5, 2009 at 4:33 PM by AlexanderMorou, version 4


No comments yet.