simple string parsing ?
Alex Martelli
aleaxit at yahoo.com
Fri Sep 10 05:40:41 EDT 2004
TAG <tonino.greco at gmail.com> wrote:

> > ((Of course, you ARE restricted to what Python considers 'tokens' so you
> > may need some postprocessing if you need a slightly different notion of
> > tokens))
>
> luckily they should all be - but in the case that they are not - how
> can I check it ?

With a little post-processing.  Say for example that you need := and :+
to be seen as single tokens; here's a Python 2.4 approach...:

    import cStringIO
    import tokenize

    # first character of a merged token -> characters that may follow it
    mergers = {':' : set('=+'), }

    def tokens_of(x):
        # keep only the token string (item 1 of each token tuple)
        it = peekahead_iterator(toktuple[1] for toktuple in
            tokenize.generate_tokens(cStringIO.StringIO(x).readline)
            )
        for tok in it:
            if it.preview in mergers.get(tok, ()):
                # merge this token with the next one, then skip the next one
                yield tok+it.preview
                it.next()
            else:
                yield tok

    x = 'fup(z:=97, y:+45):zap'

    print list(tokens_of(x))

result is:

    ['fup', '(', 'z', ':=', '97', ',', 'y', ':+', '45', ')', ':', 'zap', '']

Of course, you do need the handy 'peekahead_iterator', say something
like:

    class peekahead_iterator(object):
        # sentinel meaning "no further item"
        class nothing: pass
        def __init__(self, it):
            self._nit = iter(it).next
            self.preview = None
            self._step()
        def __iter__(self):
            return self
        def next(self):
            result = self._step()
            if result is self.nothing: raise StopIteration
            else: return result
        def _step(self):
            # return the current preview and advance the preview by one item
            result = self.preview
            try: self.preview = self._nit()
            except StopIteration: self.preview = self.nothing
            return result

Splitting one token into several is easier (no peeking ahead is needed).
But both splitting and merging are fine, as long as the deviations
between what you want to see as tokens and what Python considers tokens
are minor.  If you have BIG divergences -- e.g., you do not want to
support triple-quoted strings as single tokens -- then you may be better
off with a completely different approach, as others have suggested.


Alex
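
The splitting case mentioned above is not spelled out in the message, so here is a minimal sketch of one way it might look. It assumes Python 3 (io.StringIO and the .string attribute of token tuples) rather than the 2.4 of the post. The SPLITS table and the name split_tokens_of are hypothetical, chosen only for illustration, and unlike the merging example this sketch drops the empty end-of-input tokens.

```python
import io
import tokenize

# Hypothetical table: tokens that Python yields whole, mapped to the
# pieces we would rather see. Not part of the original post.
SPLITS = {'**': ('*', '*'), '<<': ('<', '<')}

def split_tokens_of(text):
    """Yield token strings of text, splitting any token listed in SPLITS."""
    readline = io.StringIO(text).readline
    for tok in tokenize.generate_tokens(readline):
        # Skip newline and end-marker tokens, whose strings are empty
        # or whitespace only (a choice made for this sketch).
        if not tok.string.strip():
            continue
        # No peeking ahead: each token is handled entirely on its own.
        for piece in SPLITS.get(tok.string, (tok.string,)):
            yield piece

print(list(split_tokens_of('a**b << c')))
```

Since each token is looked up in isolation, a plain generator is enough and no helper such as peekahead_iterator is needed.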