Most efficient method to search text?
Tim Peters
tim.one at comcast.net
Fri Oct 18 15:59:23 EDT 2002
More information about the Python-list mailing list
Fri Oct 18 15:59:23 EDT 2002
- Previous message (by thread): Most efficient method to search text?
- Next message (by thread): Most efficient method to search text?
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
[Michael Hudson, searching for words] > ... > Indeed, my code canes re when the wordlist gets long. Here's some > numbers ... > ... > So roughly linear behaviour from re, n-independent behaviour from my > code. Isn't it nice when things do what you expect? When you're young, it can seem that way. At my age, it mostly makes one nervous, waiting for the other and always larger and much smellier foot to fall <wink>. > On the other hand, the "compiler" code I posted yesterday consumes > vast gobs of memory -- trying to compile a 10000 long list of random > words got the OOM killer into action. A version using lists seems > slightly better in this respect, though slower in execution. Which is, of course, the traditional tradeoff between NFA and DFA approaches -- in bad cases you can pay in memory and preprocessing time due to exponential state explosion, or you can pay at runtime. I see Bengt discovered how to use dicts for this, and that's probably what you'll use in the future too <wink>. For the more general case of lexing, you're searching for patterns (like r'[A-Za-z_]\w*'), and twisting a dict gets at best obscure. There's a large literature now on compromise approaches, building only the parts of the DFA that the target string actually needs to explore, in a dynamic way. This sounds horribly expensive at first glance (don't think about that mixing eyes with ears doesn't make sense), but in some practical cases there are exceedingly clever ways to "fake it" with a handful of machine-level instructions per character, using integers as bit vectors representing the *set* of NFA nodes currently active. Gonzo speeding of all arguably common cases of pattern-matching is a lifetime job. Although there's nothing unique about pattern-matching in that ... learning-to-live-with-fast-enough-leaves-more-time-for-life-ly y'rs - tim
- Previous message (by thread): Most efficient method to search text?
- Next message (by thread): Most efficient method to search text?
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the Python-list mailing list