Stackless & String-processing

Fernando Pereira pereira at research.att.com
Tue Jul 20 00:35:53 EDT 1999
In article <7mvika$cpj$1 at cronkite.cc.uga.edu>, Graham Matthews
<graham at sloth.math.uga.edu> wrote:

> Graham Matthews <graham at sloth.math.uga.edu> wrote:
> > expression parsers as primitive parsers". There is no need for the compiler
> > to "solve the halting problem". The programmer is free to use regular
> > expression/automaton techniques where he/she wants. Such parsers take
> > strings and return ASTs. They can then be combined with other parsers
> > as if they were primitive parsers.
> Fernando Pereira (pereira at research.att.com) wrote:
> : One last try. I understood your point from the beginning. However, you
> : did not understand *mine*. If you push the complex
> : pattern-matching/parsing machinery into black boxes, there's not much
> : for the monadic glue to do, is there? 
> 
> What your combinators do depends on your application. If all you are
> doing is parsing using primitive automaton-based parsers then your
> combinators won't have much to do. But if you are doing more than
> that then they will have more to do, and hence be more complex. For
> example you can use the same combinators you define for parsing to do
> type checking, model state, etc -- things I would not want to do with
> an automaton. That is why I said that combinators are really sequencing
> constructs.
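The sequencing point above can be sketched concretely (a minimal illustration, not code from the thread; all names are made up): a monadic parser can be a function from an input position and a piece of user state to a list of results, and the same `bind` that sequences parsing steps also threads that state, which a bare automaton does not do.

```python
# Minimal state-threading parser combinators (illustrative sketch).
# A parser maps (input, position, state) -> list of (value, position, state).

def item(s, i, st):
    """Consume one character; also update user state (here, a char count)."""
    if i < len(s):
        return [(s[i], i + 1, st + 1)]
    return []

def bind(p, f):
    """Monadic sequencing: run p, feed each result's value to f."""
    def q(s, i, st):
        return [r for (v, j, st2) in p(s, i, st) for r in f(v)(s, j, st2)]
    return q

def result(v):
    """Succeed with value v, consuming nothing and leaving state alone."""
    return lambda s, i, st: [(v, i, st)]

# Parse two characters and return them as a pair; state is threaded through.
two = bind(item, lambda a: bind(item, lambda b: result((a, b))))
print(two('xyz', 0, 0))  # [(('x', 'y'), 2, 2)]
```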
*What* I've been saying is that there is a generality-efficiency
tradeoff, and that the monadic approach goes for generality at the
expense of efficiency, by precluding optimizations in those very
important cases in which combinators are used for parsing. After all,
the papers cited earlier in this discussion were all about parsing, and
the thread was originally about string processing, coroutines and Icon
generators.

Here's an interesting question. A context-free grammar can
be seen as a set of simultaneous recursive definitions in which the
RHSs are polynomials (where + is alternation and * is concatenation;
more generally, this can be expressed in terms of semirings and formal
power series, with some very interesting advantages). A regexp pattern
matcher is just a special case (modulo complications with non-regular
extensions like matching against pattern variables set by earlier \(
... \) ). Now, could the mappings of (special cases of) such grammars
to deterministic automata be extended to interesting cases of monadic
expressions? I have some vague ideas on how this might go, based on
recent work on weighted transducer determinization by my colleague
Mehryar Mohri -- basically, one defines an algebraic structure over
"outputs" so that it makes sense to talk about the greatest common
factor of alternative outputs generated from the same input prefix, and
one can give sufficient conditions for determinizability.
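The grammar-as-recursive-definitions view can be sketched in a few lines (a hedged illustration under one concrete choice, not anyone's actual code): interpret each parser over the Boolean semiring as a function from a start position to the set of end positions, with alternation as set union (+) and concatenation as composition (*).

```python
# A grammar as simultaneous recursive definitions over the Boolean semiring:
# a parser maps (string, position) -> set of positions reachable after a match.

def lit(c):
    """Match a single literal character."""
    def p(s, i):
        return {i + 1} if i < len(s) and s[i] == c else set()
    return p

def alt(p, q):          # "+": union of result sets
    return lambda s, i: p(s, i) | q(s, i)

def seq(p, q):          # "*": feed q every end position produced by p
    return lambda s, i: {k for j in p(s, i) for k in q(s, j)}

def eps(s, i):          # the multiplicative identity (matches nothing)
    return {i}

# S -> 'a' S 'b' | epsilon   (a classic non-regular language)
def S(s, i):
    return alt(seq(lit('a'), seq(S, lit('b'))), eps)(s, i)

def accepts(s):
    return len(s) in S(s, 0)

print(accepts('aabb'))  # True
print(accepts('aab'))   # False
```

(Left-recursive definitions would loop under this naive interpretation; the point is only to show + and * playing their polynomial roles.)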
> 
> Fernando Pereira (pereira at research.att.com) wrote:
> : And even then, some
> : optimizations, those involving the various black boxes, will be
> : blocked. Here's a simple example. Assume that there are two black boxes
> : A and B, and that you use monadic glue to say "A or B". A and B may be
> : deterministic, but the result will not be, and exhaustive search will
> : be required in the glue. Sure, I can recognize such cases by hand and
> : create a new black box "A or B" which is optimized, but that is an
> : error-prone hand optimization. And one may not notice subtler
> : inefficiencies.
> 
> Your point appears to be that if you were programming only using automata
> for parsing you wouldn't miss the above optimisation, but that somehow
> using combinators means you would miss the above optimisation. You
> and I must program differently then. If I were using automata, even
> in the context of also using combinators, I would be aware of the
> potential for such optimisations and look for them.
You might be, by looking at the code and changing it by hand. But a
program would not, because in general the optimizable skeleton of
sequence, alternation and Kleene closure combinators is interspersed
with arbitrary functional code. The point of automata-based compilation
of regexps and CFGs is that it is *automatic*. Clearly, a monadic
program that is a direct transcription of a regexp or a CFG could be
similarly compiled. But from my readings (those mentioned in the thread
and others) as well as your comments, the whole point of monadic
combinators for parsing is to interweave the grammatical structure with
other computations, thus making the underlying grammatical structure
less accessible to optimization. As I suggested above, it would be
interesting to define classes of monadic programs that still allow (an
appropriate extension of) those optimizations. But AFAIK that has not
been done.
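The "A or B" point can be made concrete with the standard subset construction (a minimal sketch; the automata and names are made up for illustration): when A and B are exposed as automata rather than opaque parsers, their union can be re-determinized mechanically, which is exactly the optimization an opaque monadic `alt` blocks.

```python
def determinize(alphabet, delta, starts, finals):
    """Subset construction: an NFA (delta: (state, sym) -> set of states)
    becomes an equivalent DFA whose states are frozensets of NFA states."""
    start = frozenset(starts)
    dstates, ddelta, work = {start}, {}, [start]
    while work:
        S = work.pop()
        for a in alphabet:
            T = frozenset(t for s in S for t in delta.get((s, a), ()))
            ddelta[(S, a)] = T
            if T not in dstates:
                dstates.add(T)
                work.append(T)
    dfinals = {S for S in dstates if S & finals}
    return ddelta, start, dfinals

def run(ddelta, start, finals, s):
    """Run the determinized machine: one transition per input symbol."""
    S = start
    for c in s:
        S = ddelta.get((S, c), frozenset())
    return S in finals

# A recognizes "ab", B recognizes "ac"; each is deterministic on its own,
# but "A or B" naively requires trying both. Determinizing their union
# yields a machine that reads the shared 'a' only once.
delta = {('a0', 'a'): {'a1'}, ('a1', 'b'): {'a2'},
         ('b0', 'a'): {'b1'}, ('b1', 'c'): {'b2'}}
ddelta, start, finals = determinize('abc', delta,
                                    starts={'a0', 'b0'},
                                    finals={'a2', 'b2'})
print(run(ddelta, start, finals, 'ab'))  # True
print(run(ddelta, start, finals, 'ac'))  # True
print(run(ddelta, start, finals, 'aa'))  # False
```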

-- F



