[Python-Dev] re performance
Armin Rigo
armin.rigo at gmail.com
Sat Jan 28 06:44:37 EST 2017
Hi Sven,

On 26 January 2017 at 22:13, Sven R. Kunze <srkunze at mail.de> wrote:
> I recently refreshed regular expressions theoretical basics *indulging in
> reminiscences* So, I read https://swtch.com/~rsc/regexp/regexp1.html

Theoretical regular expressions and what Python/Perl/etc. call regular
expressions are a bit different. You can read more about it at
https://en.wikipedia.org/wiki/Regular_expression#Implementations_and_running_times .

Discussions about why they are different often focus on backreferences,
which is a rare feature. Let me add two other points.

The theoretical kind of regexp is about giving a "yes/no" answer, whereas
the concrete "re" or "regexp" modules give a match object, which lets you
ask for the subgroups' locations, for example. Strange as it may seem, I am
not aware of a way to do that using the linear-time approach of the theory:
if it answers "yes", then you have no way of knowing *where* the subgroups
matched.

Another issue is that the theoretical engine has no notion of
greedy/non-greedy matching. Basically, you walk over the source characters
and it answers "yes" or "no" after each of them. This is different from a
typical backtracking implementation. In Python:

    >>> re.match(r'a*', 'aaa')
    >>> re.match(r'a*?', 'aaa')

In Python, the first call matches three characters and the second matches
zero. The two versions are however indistinguishable for the theoretical
engine.

A bientôt,

Armin.
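The two points above can be checked with a short sketch, assuming the standard `re` module. The pattern `(a+)(b+)` and the helper `accepts_a_star` are illustrative assumptions, not part of the original message; the helper only mimics a yes/no recogniser for this one language and is not a general linear-time engine.

```python
import re

# Greedy vs. non-greedy: both patterns describe the same language,
# but the backtracking engine reports different match extents.
greedy = re.match(r'a*', 'aaa')
lazy = re.match(r'a*?', 'aaa')
print(greedy.span())  # (0, 3)
print(lazy.span())    # (0, 0)

# A match object also reports where each subgroup matched.
m = re.match(r'(a+)(b+)', 'aaabb')
print(m.span(1), m.span(2))  # (0, 3) (3, 5)


def accepts_a_star(text):
    """Hypothetical recogniser for the language of a* (and of a*?).

    Walks over the characters and records a yes/no answer after each
    one. It has no notion of greediness and no subgroup positions.
    """
    answers = []
    ok = True  # the empty prefix is in the language
    for ch in text:
        ok = ok and ch == 'a'
        answers.append(ok)
    return answers


print(accepts_a_star('aaa'))  # [True, True, True]
print(accepts_a_star('aab'))  # [True, True, False]
```

The recogniser gives the same answers whether the pattern is read as `a*` or `a*?`, which is the sense in which the two are indistinguishable for a yes/no engine.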