regexp search on infinite string?
Paddy
paddy3118 at googlemail.com
Sat Sep 15 11:58:59 EDT 2007
More information about the Python-list mailing list
- Previous message (by thread): regexp search on infinite string?
- Next message (by thread): How to avoid overflow errors
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Sep 15, 2:07 pm, John Machin <sjmac... at lexicon.net> wrote:
> On Sep 15, 10:56 pm, Paddy <paddy3... at googlemail.com> wrote:
>
> > On Sep 14, 9:49 pm, Paddy <paddy3... at googlemail.com> wrote:
>
> > > Lets say i have a generator running that generates successive
> > > characters of a 'string'
> > > From what I know, if I want to do a regexp search for a pattern of
> > > characters then I would have to 'freeze' the generator and pass the
> > > characters so far to re.search.
> > > It is expensive to create successive characters, but caching could be
> > > used for past characters. is it possible to wrap the generator in a
> > > class, possibly inheriting from string, that would allow the regexp
> > > searching of the string but without terminating the generator? In
> > > other words duck typing for the usual string object needed by
> > > re.search?
>
> > > - Paddy.
>
> > There seems to be no way of breaking into the re library accessing
> > characters from the string:
>
> > >>> class S(str):
> > ...   def __getitem__(self, *a):
> > ...     print "getitem:",a
> > ...     return str.__getitem__(self, *a)
> > ...   def __get__(self, *a):
> > ...     print "get:",a
> > ...     return str.__get__(self, *a)
> > ...
> > >>> s = S('sdasd')
> > >>> m = re.search('as', s); m.span()
> > (2, 4)
> > >>> m = sre.search('as', s); m.span()
> > (2, 4)
> > >>> class A(array.array):
> > ...   def __getitem__(self, *a):
> > ...     print "getitem:",a
> > ...     return str.__getitem__(self, *a)
> > ...   def __get__(self, *a):
> > ...     print "get:",a
> > ...     return str.__get__(self, *a)
> > ...
> > >>> s = A('c','sdasd')
> > >>> m = re.search('as', s); m.span()
> > (2, 4)
> > >>> m = sre.search('as', s); m.span()
> > (2, 4)
>
> > - Paddy.
>
> That would no doubt be because it either copies the input [we hope
> not] or more likely because it hands off the grunt work to a C module
> (_sre).

Yes, it seems to need a buffer/string, so it probably accesses a
contiguous area of memory from C.

> Why do you want to "break into" it, anyway?

A simulation generates a stream of data that could be gigabytes long,
from which I'd like to find interesting bits by doing a regexp search.
I could use megabyte-length sliding buffers, and probably will have to.

- Paddy.
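The sliding-buffer idea mentioned above could look roughly like the sketch below. It is written for Python 3 and is not something posted in the thread: the function name `search_stream` and the parameters `chunk_size` and `overlap` are hypothetical. It assumes that no match is longer than `overlap` characters and that the pattern does not rely on lookbehind, `^`, or `\b` at the point where the buffer is trimmed; a pattern that breaks either assumption could be missed or reported differently.

```python
import re
from itertools import islice


def search_stream(pattern, chars, chunk_size=1 << 20, overlap=1024):
    """Yield (start, end, text) for each match in a stream of characters.

    Hypothetical helper: assumes every match is at most `overlap`
    characters long, so a match cut off at the end of the buffer can be
    deferred until the next chunk has been read.
    """
    regex = re.compile(pattern)
    it = iter(chars)
    buf = ""
    base = 0  # absolute stream offset of buf[0]
    while True:
        chunk = "".join(islice(it, chunk_size))
        at_end = len(chunk) < chunk_size  # generator ran dry
        buf += chunk
        # Matches starting in the last `overlap` characters might be
        # incomplete, so they wait for more data unless the stream ended.
        limit = len(buf) if at_end else max(len(buf) - overlap, 0)
        pos = 0
        for m in regex.finditer(buf):
            if m.start() >= limit:
                break
            yield base + m.start(), base + m.end(), m.group()
            pos = m.end()
        if at_end:
            return
        # Discard everything already searched; keep only the tail.
        keep_from = max(pos, limit)
        base += keep_from
        buf = buf[keep_from:]
```

Because `search_stream` is itself a generator, it only pulls characters from the simulation as matches are requested, so it can be driven with `itertools.islice` or a plain `for` loop with a `break` on a stream that never ends. Memory use stays near `chunk_size + overlap` characters.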