Any help with PLY?
Paul McGuire
ptmcg at austin.rr._bogus_.com
Thu Nov 17 14:52:59 EST 2005
More information about the Python-list mailing list
<mark.green at reading.ac.uk> wrote in message
news:1132253408.676406.179100 at g43g2000cwa.googlegroups.com...
> Hi folks,
>
> I've been trying to write a PLY parser and have run into a bit of
> bother.
>
> At the moment, I have a RESERVEDWORD token which matches all reserved
> words and then alters the token type to match the reserved word that
> was detected.  I also have an IDENTIFIER token which matches
> identifiers that are not reserved words.
>
> The problem is, if I put RESERVEDWORD before IDENTIFIER, then
> identifiers that happen to begin with reserved words are wrongly lexed
> as the reserved word followed by an identifier.  For example, because
> "if" is a RESERVEDWORD, the string "ifollowyou" is wrongly lexed as the
> RESERVEDWORD "if" followed by IDENTIFIER "ollowyou", rather than just
> as the IDENTIFIER "ifollowyou".
>
> If I put IDENTIFIER first, though, every single reserved word in the
> input is lexed as an IDENTIFIER.
>
> Is there any way I can tell PLY that it should only return a
> RESERVEDWORD in the correct circumstances?  If PLY can't do this, can
> any of the other Python parser generators?  (It seems that Lex can..)
>
> Thanks!
>

Pyparsing uses the Keyword class for just this purpose.  Before Keyword was
added to pyparsing, one had to solve this problem using the Or operator,
which performs a longest string or "greedy" match, as in:

    any_ = Literal("any")
    boolean_ = Literal("boolean")
    char_ = Literal("char")
    double_ = Literal("double")
    ...
    identifier = Word( alphas, alphanums + "_" ).setName("identifier")
    real = Combine( Word(nums+"+-", nums) + dot + Optional( Word(nums) )
                    + Optional( CaselessLiteral("E") + Word(nums+"+-",nums) ) )
    integer = ( Combine( CaselessLiteral("0x") + Word( nums+"abcdefABCDEF" ) ) |
                Word( nums+"+-", nums ) ).setName("int")
    udTypeName = delimitedList( identifier, "::", combine=True ).setName("udType")
    # have to use longest match for type, in case a user-defined
    # type name starts with a keyword type, like "stringSeq" or "longArray"
    typeName = ( any_ ^ boolean_ ^ char_ ^ double_ ^ fixed_ ^
                 float_ ^ long_ ^ octet_ ^ short_ ^ string_ ^
                 wchar_ ^ wstring_ ^ udTypeName )

This way, if a user-defined type was named "stringSequence" the longest
matching expression would be returned.

Pyparsing also has a MatchFirst alternative matcher, using the '|' operator,
which returns the first matching expression regardless of length.
Predictably, MatchFirst is faster at parsing, since it does not need to
evaluate every path - it can just return the first matching expression.

Now with Keyword, I can define:

    any_ = Keyword("any")
    boolean_ = Keyword("boolean")
    char_ = Keyword("char")
    double_ = Keyword("double")
    ...
    typeName = ( any_ | boolean_ | char_ | double_ | fixed_ |
                 float_ | long_ | octet_ | short_ | string_ |
                 wchar_ | wstring_ | udTypeName )

Does PLY support greedy matching?

-- Paul
(Download pyparsing at http://pyparsing.sourceforge.net .)
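
For anyone who wants to see the three strategies from the reply side by side without installing anything, here is a rough sketch using only the standard `re` module. It is not pyparsing or PLY code. The function names, the shortened keyword list, and the choice of identifier characters are assumptions made up for this illustration, and it only mimics the behaviour described above (first match on literals, longest match, first match on keywords).

```python
import re

# Hypothetical, shortened keyword list for illustration only.
KEYWORDS = ["any", "boolean", "char", "double", "long", "string"]

# Assumed identifier shape: a letter, then letters, digits or underscores.
IDENT = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def _identifier(text):
    """Return ("identifier", name) if text starts with an identifier."""
    m = IDENT.match(text)
    return ("identifier", m.group()) if m else None


def match_first_literal(text):
    """First alternative that matches wins, keywords tried as plain
    literals (roughly what Literal | Literal | identifier would do)."""
    for kw in KEYWORDS:
        if text.startswith(kw):
            return ("keyword", kw)
    return _identifier(text)


def match_longest(text):
    """Try every alternative and keep the longest match (roughly what
    Literal ^ Literal ^ identifier would do). On a tie the keyword wins
    because it is listed first."""
    candidates = [("keyword", kw) for kw in KEYWORDS if text.startswith(kw)]
    ident = _identifier(text)
    if ident:
        candidates.append(ident)
    return max(candidates, key=lambda c: len(c[1])) if candidates else None


def match_first_keyword(text):
    """First alternative wins, but a keyword only matches when it is not
    immediately followed by another identifier character."""
    for kw in KEYWORDS:
        if re.match(re.escape(kw) + r"(?![A-Za-z0-9_])", text):
            return ("keyword", kw)
    return _identifier(text)


if __name__ == "__main__":
    for s in ("stringSeq", "longArray", "string x"):
        print(s, match_first_literal(s), match_longest(s), match_first_keyword(s))
```

The longest-match version has to evaluate every alternative before it can answer, while the keyword version can stop at the first hit, which is the speed difference mentioned in the reply.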