Stackless & String-processing
Fernando Pereira
pereira at research.att.com
Tue Jul 20 00:35:53 EDT 1999
More information about the Python-list mailing list
In article <7mvika$cpj$1 at cronkite.cc.uga.edu>, Graham Matthews
<graham at sloth.math.uga.edu> wrote:
> <graham at sloth.math.uga.edu> wrote:
> > expression parsers as primitive parsers". There is no need for the
> > compiler to "solve the halting problem". The programmer is free to use
> > regular expression/automaton techniques where he/she wants. Such
> > parsers take strings and return ASTs. They can then be combined with
> > other parsers as if they were primitive parsers.
> Fernando Pereira (pereira at research.att.com) wrote:
> : One last try. I understood your point from the beginning. However, you
> : did not understand *mine*. If you push the complex
> : pattern-matching/parsing machinery into black boxes, there's not much
> : for the monadic glue to do, is there?
>
> What your combinators do depends on your application. If all you are
> doing is parsing using primitive automaton-based parsers, then your
> combinators won't have much to do. But if you are doing more than
> that, then they will have more to do, and hence be more complex. For
> example, you can use the same combinators you define for parsing to do
> type checking, model state, etc. -- things I would not want to do with
> an automaton. That is why I said that combinators are really sequencing
> constructs.

*What* I've been saying is that there is a generality-efficiency
tradeoff, and that the monadic approach goes for generality at the
expense of efficiency, by precluding optimizations in those very
important cases in which combinators are used for parsing. After all,
the papers cited earlier in this discussion were all about parsing, and
the thread was originally about string processing, coroutines and Icon
generators.

Here's an interesting question.
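[For concreteness, a minimal sketch of the monadic glue under discussion.
This is my illustration, not code from the thread and not any particular
library's API: a parser maps an input string to a list of
(result, remaining-input) pairs, `bind` is the sequencing glue, and `alt`
is the "A or B" glue, which simply concatenates the result lists of both
branches -- the exhaustive search referred to above.]

```python
def unit(value):
    """Succeed without consuming input (monadic return)."""
    return lambda s: [(value, s)]

def bind(p, f):
    """Sequencing glue: run p, feed each of its results to f."""
    return lambda s: [pair for (v, rest) in p(s) for pair in f(v)(rest)]

def alt(p, q):
    """Alternation glue, "p or q": try both branches and keep every
    surviving parse -- exhaustive search, even if p and q are each
    deterministic."""
    return lambda s: p(s) + q(s)

def char(c):
    """A primitive parser for a single character."""
    return lambda s: [(c, s[1:])] if s[:1] == c else []

# The string "ab" parsed as a sequence of two primitives via bind:
ab = bind(char('a'), lambda a: bind(char('b'), lambda b: unit(a + b)))
```

Nothing in `bind` or `alt` knows anything about the grammatical
structure being expressed; that opacity is exactly what the tradeoff
below is about.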
A context-free grammar can be seen as a set of simultaneous recursive
definitions in which the RHSs are polynomials (where + is alternation
and * is concatenation; more generally, this can be expressed in terms
of semirings and formal power series, with some very interesting
advantages). A regexp pattern matcher is just a special case (modulo
complications with non-regular extensions like matching against pattern
variables set by earlier \( ... \) ). Now, could the mappings of
(special cases of) such grammars to deterministic automata be extended
to interesting cases of monadic expressions? I have some vague ideas on
how this might go, based on recent work on weighted transducer
determinization by my colleague Mehryar Mohri -- basically, one defines
an algebraic structure over "outputs" so that it makes sense to talk
about the greatest common factor of alternative outputs generated from
the same input prefix, and one can give sufficient conditions for
determinizability.

> Fernando Pereira (pereira at research.att.com) wrote:
> : And even then, some optimizations, those involving the various black
> : boxes, will be blocked. Here's a simple example. Assume that there are
> : two black boxes A and B, and that you use monadic glue to say "A or
> : B". A and B may be deterministic, but the result will not be, and
> : exhaustive search will be required in the glue. Sure, I can recognize
> : such cases by hand and create a new black box "A or B" which is
> : optimized, but that is an error-prone hand optimization. And one may
> : not notice subtler inefficiencies.
>
> Your point appears to be that if you were programming only using
> automata for parsing you wouldn't miss the above optimisation, but that
> somehow using combinators means you would miss the above optimisation.
> You and I must program differently then.
> If I were using automata, even in the context of also using
> combinators, I would be aware of the potential for such optimisations
> and look for them.

You might be, by looking at the code and changing it by hand. But a
program would not, because in general the optimizable skeleton of
sequence, alternation and Kleene closure combinators is interspersed
with arbitrary functional code. The point of automata-based compilation
of regexps and CFGs is that it is *automatic*. Clearly, a monadic
program that is a direct transcription of a regexp or a CFG could be
similarly compiled. But from my readings (those mentioned in the thread
and others), as well as your comments, the whole point of monadic
combinators for parsing is to interweave the grammatical structure with
other computations, thus making the underlying grammatical structure
less accessible to optimization. As I suggested above, it would be
interesting to define classes of monadic programs that still allow (an
appropriate extension of) those optimizations. But AFAIK that has not
been done.

-- F
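[The "A or B" example can be made concrete. The sketch below is mine,
with two invented toy languages (ab)* and (ac)* standing in for the
black boxes A and B: glue-level alternation runs each deterministic
recognizer in turn, potentially re-scanning the input, while a
hand-merged deterministic automaton for the union decides in a single
left-to-right pass -- precisely the error-prone hand optimization
described above, the kind an automatic regexp compiler would perform.]

```python
def match_ab_star(s):
    """Deterministic recognizer for the toy language (ab)*."""
    i = 0
    while s[i:i+2] == "ab":
        i += 2
    return i == len(s)

def match_ac_star(s):
    """Deterministic recognizer for the toy language (ac)*."""
    i = 0
    while s[i:i+2] == "ac":
        i += 2
    return i == len(s)

def union_glue(s):
    """Glue-level "A or B": each branch is tried separately, so the
    input may be scanned twice."""
    return match_ab_star(s) or match_ac_star(s)

# Hand-merged DFA for (ab)* | (ac)*: after the first 'a', the next
# character commits us to one branch, so no branch is ever re-tried.
DELTA = {
    (0, 'a'): 1,               # branch not yet decided
    (1, 'b'): 2, (1, 'c'): 3,  # second character commits the branch
    (2, 'a'): 4, (4, 'b'): 2,  # loop in (ab)*
    (3, 'a'): 5, (5, 'c'): 3,  # loop in (ac)*
}
ACCEPT = {0, 2, 3}

def union_dfa(s):
    """One deterministic pass over the input."""
    state = 0
    for ch in s:
        state = DELTA.get((state, ch))
        if state is None:
            return False
    return state in ACCEPT
```

Both functions accept exactly the same strings; the difference is only
in how much work the alternation costs, which is the optimization that
opaque glue code blocks a compiler from finding.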