LL(1) parser contribution check-ins

Topics: Suggestions
Aug 19, 2010 at 11:18 AM

Hi! I have a contribution:

http://expressionscompiler.codeplex.com/

Can you test it and comment on it?

Thanks!

 

Coordinator
Aug 20, 2010 at 11:51 PM
Edited Aug 22, 2010 at 2:36 PM

I'm not sure this would be considered a contribution given the different focuses of our projects.

A PEG parser works from a series of nested rules and terminals and takes an iterative approach to discerning which rule or terminal matches at any given point.  The project I'm writing instead generates a program from the source grammar, unfolding much of the logic into a flat, deterministic (rather than nondeterministic) form.  The former is interpreted during the parse, whereas the latter is defined explicitly by the code the software generates.  That is, assuming I understand your code correctly.

As an example, in your situation, I'm not entirely sure how you'd write a parser for the C# identifier.  The character set it covers is fairly large: the first character of any identifier can be, if I'm right, any of 47,727 different characters, because it covers the following character classes:

UppercaseLetter
LowercaseLetter
OtherLetter
TitlecaseLetter
ModifierLetter
LetterNumber
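
For anyone who wants to check that figure on their own machine, here is a rough sketch that counts the UTF-16 code units falling into the six categories listed above. The class and method names are made up for illustration, and the total depends on the Unicode tables shipped with the runtime, so it may not match the number quoted above.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of either project.
public static class IdentifierStartCount
{
    // True when the category is one of the six letter classes listed above.
    public static bool IsLetterCategory(UnicodeCategory category)
    {
        switch (category)
        {
            case UnicodeCategory.UppercaseLetter:
            case UnicodeCategory.LowercaseLetter:
            case UnicodeCategory.OtherLetter:
            case UnicodeCategory.TitlecaseLetter:
            case UnicodeCategory.ModifierLetter:
            case UnicodeCategory.LetterNumber:
                return true;
            default:
                return false;
        }
    }

    // Counts the code units U+0000..U+FFFF whose category is a letter class.
    // Characters outside the BMP are not considered (assumption of this sketch).
    public static int CountLetterCharacters()
    {
        int count = 0;
        for (int code = char.MinValue; code <= char.MaxValue; code++)
        {
            if (IsLetterCategory(char.GetUnicodeCategory((char)code)))
                count++;
        }
        return count;
    }

    public static void Main()
    {
        Console.WriteLine(CountLetterCharacters());
    }
}
```

The '_' character and Unicode escape sequences are not counted here, since they are separate alternatives in the identifier rule rather than members of these categories.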

In the system I'm writing, adding this requirement is as simple as the following:

Identifier := 
    '@'? IdentifierOrKeyword;

UnicodeEscapeSequence :=
    "\\u" HexChar{4};

IdentifierOrKeyword :=
    IdentifierStartCharacter IdentifierPartCharacter*;

IdentifierStartCharacter :=
    LetterCharacter                         |
    '_'                                     |
    UnicodeEscapeSequence                   ;

IdentifierPartCharacter :=
    LetterCharacter                         |
    CombiningCharacter                      |
    DecimalDigitCharacter                   |
    ConnectingCharacter                     |
    FormattingCharacter                     |
    UnicodeEscapeSequence                   ;

LetterCharacter :=
    [:Lu::Ll::Lt::Lm::Lo::Nl:];

CombiningCharacter :=
    [:Mn::Mc:];

DecimalDigitCharacter :=
    [:Nd:];

ConnectingCharacter :=
    [:Pc:];

FormattingCharacter :=
    [:Cf:];
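
To make the rules above concrete, here is a hand-written sketch of a check that follows the same grammar directly. It is not how OILexer does it, and the names (IdentifierCheck, IsIdentifier, EscapeLength) are hypothetical. Like the grammar as written, it accepts any \uXXXX escape without checking which character the escape stands for.

```csharp
using System;
using System.Globalization;

// Hypothetical hand-written checker mirroring the Identifier grammar.
public static class IdentifierCheck
{
    static bool IsHex(char c)
    {
        return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'F') || (c >= 'a' && c <= 'f');
    }

    // Length of a \uXXXX escape starting at index, or 0 when there is none.
    static int EscapeLength(string text, int index)
    {
        if (index + 6 > text.Length || text[index] != '\\' || text[index + 1] != 'u')
            return 0;
        for (int i = index + 2; i < index + 6; i++)
            if (!IsHex(text[i]))
                return 0;
        return 6;
    }

    // LetterCharacter := [:Lu::Ll::Lt::Lm::Lo::Nl:]
    static bool IsLetter(char c)
    {
        switch (char.GetUnicodeCategory(c))
        {
            case UnicodeCategory.UppercaseLetter:
            case UnicodeCategory.LowercaseLetter:
            case UnicodeCategory.TitlecaseLetter:
            case UnicodeCategory.ModifierLetter:
            case UnicodeCategory.OtherLetter:
            case UnicodeCategory.LetterNumber:
                return true;
            default:
                return false;
        }
    }

    // Combining (Mn, Mc), decimal digit (Nd), connecting (Pc), formatting (Cf).
    static bool IsPartOnly(char c)
    {
        switch (char.GetUnicodeCategory(c))
        {
            case UnicodeCategory.NonSpacingMark:
            case UnicodeCategory.SpacingCombiningMark:
            case UnicodeCategory.DecimalDigitNumber:
            case UnicodeCategory.ConnectorPunctuation:
            case UnicodeCategory.Format:
                return true;
            default:
                return false;
        }
    }

    public static bool IsIdentifier(string text)
    {
        int pos = 0;
        if (pos < text.Length && text[pos] == '@')      // '@'?
            pos++;
        if (pos >= text.Length)
            return false;
        int esc = EscapeLength(text, pos);              // IdentifierStartCharacter
        if (esc > 0)
            pos += esc;
        else if (text[pos] == '_' || IsLetter(text[pos]))
            pos++;
        else
            return false;
        while (pos < text.Length)                       // IdentifierPartCharacter*
        {
            esc = EscapeLength(text, pos);
            if (esc > 0)
                pos += esc;
            else if (IsLetter(text[pos]) || IsPartOnly(text[pos]))
                pos++;
            else
                return false;
        }
        return true;
    }

    public static void Main()
    {
        Console.WriteLine(IsIdentifier("@class"));  // True
        Console.WriteLine(IsIdentifier("1abc"));    // False
    }
}
```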

Your project interests me because of how it goes about what it does.  Would similar functionality be easy to add to your project?

 

Coordinator
Aug 21, 2010 at 1:21 PM
Edited Aug 22, 2010 at 2:21 PM

Further, here's an example of the code OILexer currently generates for the Identifier above:

 /* -----------------------------------------------------------\
 |  This code was generated by Oilexer.                        |
 |  Version: 1.0.0.24882                                       |
 |-------------------------------------------------------------|
 |  To ensure the code works properly,                         |
 |  please do not make any changes to the file.                |
 |-------------------------------------------------------------|
 |  The specific language is C# (Runtime version: v4.0.30319)  |
 |  Sub-tool Name: Oilexer.CSharpCodeTranslator                |
 |  Sub-tool Version: 1.0.0.24882                              |
 \----------------------------------------------------------- */
using System;
using System.Globalization;
using Languages.CSharp;

namespace Languages.CSharp
{
    // Module: Lexer
    internal class IdentifierStateMachine :
        CharStream
    {
        #region IdentifierStateMachine data members
        /// <summary>
        /// The state machine's current state, determining the logic path to follow for the next character.
        /// </summary>
        private int state = 0;

        private int exitLength;
        #endregion // IdentifierStateMachine data members
        #region IdentifierStateMachine methods
        /// <summary>
        /// Moves the state machine into its next state with the <paramref name="currentChar"/>.
        /// </summary>
        /// <param name="currentChar">The next character used as the condition for state-&gt;state transitions.</param>
        public bool Next(char currentChar)
        {
            switch (this.state)
            {
                case 0:
                    if (currentChar == '@')
                        goto MoveToState_1;
                    else if (currentChar == '_')
                        goto MoveToState_2;
                    else if (currentChar == '\\')
                        goto MoveToState_3;
                    else
                        goto UnicodeGraph_1;
                    break;
                case 1:
                    if (currentChar == '_')
                        goto MoveToState_2;
                    else if (currentChar == '\\')
                        goto MoveToState_3;
                    else
                        goto UnicodeGraph_1;
                    break;
                case 2:
                    if (currentChar == '\\')
                        goto MoveToState_3;
                    else
                        goto UnicodeGraph_2;
                    break;
                case 3:
                    if (currentChar == 'u')
                        goto MoveToState_4;
                    break;
                case 4:
                    if ((((currentChar >= '0') && (currentChar <= '9')) || ((currentChar >= 'A') && (currentChar <= 'F'))) || ((currentChar >= 'a') && (currentChar <= 'f')))
                        goto MoveToState_5;
                    break;
                case 5:
                    if ((((currentChar >= '0') && (currentChar <= '9')) || ((currentChar >= 'A') && (currentChar <= 'F'))) || ((currentChar >= 'a') && (currentChar <= 'f')))
                        goto MoveToState_6;
                    break;
                case 6:
                    if ((((currentChar >= '0') && (currentChar <= '9')) || ((currentChar >= 'A') && (currentChar <= 'F'))) || ((currentChar >= 'a') && (currentChar <= 'f')))
                        goto MoveToState_7;
                    break;
                case 7:
                    if ((((currentChar >= '0') && (currentChar <= '9')) || ((currentChar >= 'A') && (currentChar <= 'F'))) || ((currentChar >= 'a') && (currentChar <= 'f')))
                        goto MoveToState_2;
                    break;
            }
            return false;
        CommonMove:
            this.Push(currentChar);
            return true;
        MoveToState_1:
            this.state = 1;
            goto CommonMove;
        NominalExit:
            this.Push(currentChar);
            this.exitLength = this.actualSize;
            return true;
        MoveToState_2:
            this.state = 2;
            goto NominalExit;
        MoveToState_3:
            this.state = 3;
            goto CommonMove;
        UnicodeGraph_1:
            switch (char.GetUnicodeCategory(currentChar))
            {
                case UnicodeCategory.UppercaseLetter:
                case UnicodeCategory.LowercaseLetter:
                case UnicodeCategory.OtherLetter:
                case UnicodeCategory.TitlecaseLetter:
                case UnicodeCategory.ModifierLetter:
                case UnicodeCategory.LetterNumber:
                    goto MoveToState_2;
                    break;
            }
            return false;
        UnicodeGraph_2:
            switch (char.GetUnicodeCategory(currentChar))
            {
                case UnicodeCategory.DecimalDigitNumber:
                case UnicodeCategory.UppercaseLetter:
                case UnicodeCategory.ConnectorPunctuation:
                case UnicodeCategory.LowercaseLetter:
                case UnicodeCategory.OtherLetter:
                case UnicodeCategory.TitlecaseLetter:
                case UnicodeCategory.ModifierLetter:
                case UnicodeCategory.NonSpacingMark:
                case UnicodeCategory.Format:
                case UnicodeCategory.SpacingCombiningMark:
                case UnicodeCategory.LetterNumber:
                    goto NominalExit;
                    break;
            }
            return false;
        MoveToState_4:
            this.state = 4;
            goto CommonMove;
        MoveToState_5:
            this.state = 5;
            goto CommonMove;
        MoveToState_6:
            this.state = 6;
            goto CommonMove;
        MoveToState_7:
            this.state = 7;
            goto CommonMove;
        }
        #endregion // IdentifierStateMachine methods
    }
}
 /* ----------------------------------------------\
 |  This file took 00:00:00.0025744 to generate.  |
 |  Date generated: 8/6/2010 2:49:30 PM           |
 |  There were 5 types used by this file          |
 |  System.Int32, System.Char, System.Boolean,    |
 |  UnicodeCategory, CharStream                   |
 |------------------------------------------------|
 |  There were 1 assemblies referenced:           |
 |  mscorlib                                      |
 \---------------------------------------------- */
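
As a rough illustration of how a machine like this might be driven, here is a small sketch of a longest-match loop. The generated code does not show the CharStream base class or how exitLength is read back, so everything here is an assumption: CharStreamSketch is a hypothetical stand-in for CharStream, the ExitLength property is invented, and AsciiIdentifierMachine is a cut-down ASCII-only machine written by hand, not OILexer output.

```csharp
using System;
using System.Text;

// Hypothetical stand-in for the CharStream base class, which is not shown above.
public abstract class CharStreamSketch
{
    private readonly StringBuilder buffer = new StringBuilder();

    // Number of characters pushed so far (named after the member the generated code uses).
    protected int actualSize { get { return buffer.Length; } }

    protected void Push(char c) { buffer.Append(c); }

    // Length of the longest accepted prefix seen so far (invented accessor).
    public int ExitLength { get; protected set; }

    public abstract bool Next(char currentChar);
}

// Reduced ASCII-only machine: state 0 = start, state 1 = inside an identifier (accepting).
public sealed class AsciiIdentifierMachine : CharStreamSketch
{
    private int state = 0;

    public override bool Next(char c)
    {
        bool letter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
        bool digit = c >= '0' && c <= '9';
        if (letter || (state == 1 && digit))
        {
            state = 1;
            Push(c);
            ExitLength = actualSize;   // every character accepted here ends in an accepting state
            return true;
        }
        return false;                  // no transition for this character
    }
}

public static class ScanDemo
{
    // Feeds characters until the machine rejects one, then returns the longest accepted prefix.
    public static string Scan(CharStreamSketch machine, string input, int start)
    {
        int i = start;
        while (i < input.Length && machine.Next(input[i]))
            i++;
        return input.Substring(start, machine.ExitLength);
    }

    public static void Main()
    {
        Console.WriteLine(Scan(new AsciiIdentifierMachine(), "count1 = 5", 0)); // prints "count1"
    }
}
```

Tracking the exit length separately from the number of characters consumed matters for the real machine: after a backslash it keeps consuming characters through states 3 to 7 without accepting, so a driver would fall back to the last recorded exit length if the escape turns out to be malformed.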