Count and replacing strings a texfile
Alex Martelli
aleaxit at yahoo.com
Wed Jan 24 05:28:12 EST 2001
More information about the Python-list mailing list
"Greg Jorgensen" <gregj at pobox.com> wrote in message news:94llph$8dl$1 at nnrp1.deja.com... > Try this: > > ---- > # read all text from the file > f = open(filename, "r") > s = f.read() > f.close() > > # split text into a list at every occurence of %id% > t = s.split("%id%") > n = len(t) # number of list elements > result = s[0] # start with first element in list > # iterate over list, appending counter and next text chunk > for i in range(1,n): > result += str(i) + s[i] > > print "%s occurences replaced" % (n-1) > print result I like this general approach better than the RE-based ones; however, building up the 'result' string by successive concatenations is apt to be pretty slow -- remember the typical start string was said to be over 120k and to contain 'several' occurrences of '%id%'. In general, building a big string by successive + or += of many small pieces is O(N squared). A potentially better variation, roughly O(N)...: def numberIDstring(input_string): input_pieces = input_string.split('%id%') pieces_number = len(input_pieces) output_pieces = ['']*(pieces_number*2-1) output_pieces[0] = input_pieces[0] for i in range(1,pieces_number): output_pieces[i+i] = input_pieces[i] output_pieces[i+i-1] = str(i) return ''.join(output_pieces) It takes some reflection (and, even better, some testing of the boundary cases!!!) to check this works for input-strings containing %id% at start, at end, or two or three of them right next to each other, of course -- but then, one MUST, of course, ALWAYS test what one writes (and MOST particularly test boundary/anomalous cases!!!). Which leads me right into a digression about unit-testing, a subject which is discussed FAR too rarely in proportion to its importance...! 
A decent unit-test for numberIDstring might be something like
(assuming one has no unit-testing framework in use -- it WOULD be
much better to use one!!!):

def testNumberIDstring():
    # reportTestFailure, reportSuccess and reportFailures are
    # reporting helpers assumed to be defined elsewhere
    testData = (
        ('', ''),                  # no input -> no output
        ('xy','xy'),               # no IDs -> no change
        ('%id%','1'),              # just an ID
        ('%id%%id%', '12'),        # just two IDs
        ('xy%id%', 'xy1'),
        ('%id%xy', '1xy'),
        ('xy%id%zt', 'xy1zt'),
        ('%id%xy%id%', '1xy2'),
        ('ax%id%by%id%cz', 'ax1by2cz'),
    )
    errors = 0
    tests = 0
    for input, expected in testData:
        tests += 1
        output = numberIDstring(input)
        if output!=expected:
            errors += 1
            reportTestFailure(tests, errors, input, expected,
                output, "numberIDstring")
    if errors==0:
        reportSuccess(tests, "numberIDstring")
    else:
        reportFailures(tests, errors, "numberIDstring")

Yep, there *IS* a lot of code in such a special purpose test -- which
is why using a unit testing framework is SO useful: by greatly
reducing the repetitious work of writing test code, it correspondingly
motivates you to do more and better unit testing!

Personally, I find that Tim Peters' deliciously simple "doctest.py"
framework matches A LOT of my typical unit-testing needs, and really
minimizes my work in constructing unit-test suites. But tastes and
needs vary, and one might be well advised to look around at other
Python unit test frameworks -- each has some strong point and might
be just the ticket for YOUR own use!-)

Alex
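As a rough illustration of the doctest approach mentioned above, some of the same boundary cases can be written as interactive examples in the function's docstring. This is a sketch in present-day Python 3, not code from the original post: `numberIDstring` is copied from the message, while `run_doctests` is a hypothetical helper added here so the examples can be run and counted programmatically.

```python
import doctest


def numberIDstring(input_string):
    """Replace each '%id%' with 1, 2, 3, ... in order of appearance.

    >>> numberIDstring('')
    ''
    >>> numberIDstring('xy')
    'xy'
    >>> numberIDstring('%id%')
    '1'
    >>> numberIDstring('%id%%id%')
    '12'
    >>> numberIDstring('%id%xy%id%')
    '1xy2'
    >>> numberIDstring('ax%id%by%id%cz')
    'ax1by2cz'
    """
    input_pieces = input_string.split('%id%')
    pieces_number = len(input_pieces)
    output_pieces = [''] * (pieces_number * 2 - 1)
    output_pieces[0] = input_pieces[0]
    for i in range(1, pieces_number):
        output_pieces[i + i] = input_pieces[i]
        output_pieces[i + i - 1] = str(i)
    return ''.join(output_pieces)


def run_doctests():
    """Run the docstring examples; return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    failed = attempted = 0
    tests = finder.find(numberIDstring, "numberIDstring",
                        globs={"numberIDstring": numberIDstring})
    for test in tests:
        result = runner.run(test)
        failed += result.failed
        attempted += result.attempted
    return failed, attempted


if __name__ == "__main__":
    failed, attempted = run_doctests()
    print("%d of %d examples failed" % (failed, attempted))
```

In a module that is run as a script, calling `doctest.testmod()` is the more usual way to run every docstring example at once; the explicit finder and runner are used here only to keep the sketch self-contained.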