Comparing strings from the back?
Dwight Hutto
dwightdhutto at gmail.com
Mon Sep 3 22:06:44 EDT 2012
More information about the Python-list mailing list
Mon Sep 3 22:06:44 EDT 2012
- Previous message (by thread): Comparing strings from the back?
- Next message (by thread): Comparing strings from the back?
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Mon, Sep 3, 2012 at 9:54 PM, Roy Smith <roy at panix.com> wrote: > There's been a bunch of threads lately about string implementations, and > that got me t > > On > hinking (which is often a dangerous thing). > > Let's assume you're testing two strings for equality. You've already > done the obvious quick tests (i.e they're the same length), and you're > down to the O(n) part of comparing every character. > > I'm wondering if it might be faster to start at the ends of the strings > instead of at the beginning? If the strings are indeed equal, it's the > same amount of work starting from either end. But, if it turns out that > for real-life situations, the ends of strings have more entropy than the > beginnings, the odds are you'll discover that they're unequal quicker by > starting at the end. > > It doesn't seem un-plausible that this is the case. For example, most > of the filenames I work with begin with "/home/roy/". Most of the > strings I use as memcache keys have one of a small number of prefixes. > Most of the strings I use as logger names have common leading > substrings. Things like credit card and telephone numbers tend to have > much more entropy in the trailing digits. > > On the other hand, hostnames (and thus email addresses) exhibit the > opposite pattern. > > Anyway, it's just a thought. Has anybody studied this for real-life > usage patterns? > > I'm also not sure how this work with all the possible UCS/UTF encodings. > With some of them, you may get the encoding semantics wrong if you don't > start from the front. > -- > http://mail.python.org/mailman/listinfo/python-list > First include len(string)/2, in order to include starting at the center of the string, and threading/weaving by 2 processes out. import timeit do the the rest, and see which has the fastest time. -- Best Regards, David Hutto *CEO:* *http://www.hitwebdevelopment.com* -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/python-list/attachments/20120903/1f82f562/attachment.html>
- Previous message (by thread): Comparing strings from the back?
- Next message (by thread): Comparing strings from the back?
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the Python-list mailing list