Message 334958 - Python tracker

Message334958

Author tim.peters
Recipients benspiller, ezio.melotti, mrabarnett, tim.peters
Date 2019-02-06.17:15:25
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1549473325.7.0.318910987921.issue35915@roundup.psfhosted.org>
In-reply-to
Content
Yes, it's quadratic time.  If the string being searched has N characters, first it fails to find "x" in all N of 'em, then `.*` advances by one and it fails to find "x" in the trailing N-1 characters, then again in the trailing N-2, and so on.  N + N-1 + N-2 + ... + 1 is quadratic in N.

That's how this kind of regexp engine works.  And it's mild, as such things go:  you can also create poor regexps that take time _exponential_ in N that fail to match certain strings.

It's unlikely this will change without replacing Python's regexp implementation entirely.  For why, see Jeffrey Friedl's book "Mastering Regular Expressions" (published by O'Reilly).  That book also spells out techniques for crafting regexps that don't suck ;-)  It's not a small topic, alas.
History
Date User Action Args
2019-02-06 17:15:26tim.peterssetrecipients: + tim.peters, ezio.melotti, mrabarnett, benspiller
2019-02-06 17:15:25tim.peterssetmessageid: <1549473325.7.0.318910987921.issue35915@roundup.psfhosted.org>
2019-02-06 17:15:25tim.peterslinkissue35915 messages
2019-02-06 17:15:25tim.peterscreate