Most efficient method to search text?
Michael Hudson
mwh at python.net
Wed Oct 16 11:19:58 EDT 2002
More information about the Python-list mailing list
Wed Oct 16 11:19:58 EDT 2002
- Previous message (by thread): calling a Python method from C
- Next message (by thread): Most efficient method to search text?
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Jeff Epler <jepler at unpythonic.net> writes: > On Wed, Oct 16, 2002 at 01:35:09PM +0000, Michael Hudson wrote: > > Here's a way to quickly (I hope! Haven't done any benchmarks) tell if > > one of a bunch of words is contained in a chunk of text, assuming the > > words are known beforehand [...] > > Is there any reason to suppose that this is more efficient than using > re.compile("|".join(re.escape(words))).match? No, and it's not (once you fiddle what you said to re.compile("|".join( ['\\b'+re.escape(word)+'\\b' for word in words])).search so it does roughly the same thing). > I haven't looked at the implementation of sre, but it should be able > to generate a simple DFA for this RE and execute something very much > like your 'match' function, but at C speeds. Yup. Dunno if it does either, but it's quick. > Having one dict lookup per character scanned seems like a pretty huge > chunk of overhead. The original reason I started writing code like this was that I needed to feed in the string a character at a time and there was enough else going on to easily swamp one measly dict lookup. Another difference is that it's easy to fiddle my scheme into telling you which word matched (this isn't exactly hard with re either). I can't think of many ways of speeding it up in Python -- perhaps running the text through map(ord, text) and using lists? Doesn't seem to make much difference. psyco seems to make more difference, and I guess playing for psyco a bit could get more (these are all eyeball tests, no numbers I'm afraid). Anyway, what I thought was a neat bit of code was the point... Cheers, M. -- Whaaat? That is the most retarded thing I have seen since, oh, yesterday -- Kaz Kylheku, comp.lang.lisp
- Previous message (by thread): calling a Python method from C
- Next message (by thread): Most efficient method to search text?
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the Python-list mailing list