"in" operator for strings
Alex Martelli
aleaxit at yahoo.com
Sat Feb 3 04:12:20 EST 2001
More information about the Python-list mailing list
Sat Feb 3 04:12:20 EST 2001
- Previous message (by thread): "in" operator for strings
- Next message (by thread): "in" operator for strings
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
"Magnus Lie Hetland" <mlh at idi.ntnu.no> wrote in message news:95ftl4$o6b$1 at tyfon.itea.ntnu.no... [snip] > > > only allow contiguous subsequences, or any subsequence? > > > If we ever get built-in sets contiguity would be meningless, > > > as would it be if we get "in" tests for dictionaries.) > > > > Dictionaries aren't sequences, nor will sets be, so the > > issue doesn't apply to either. > > Sure it does. It will then become the question of whether to > allow the same operator do check membership and subset-ness. Subsetness is one thing, subsequencehood another (actually two others, given contiguity may be required or not); the difference is ordering -- 'ab' is a subset but not a subsequence of 'obak'. The whole issue of subsequence is meaningless where no sequence (ordering) exists. > > with bisection (binary search) over the > > 'places' sequences, O(M log N) appears like an easy > > upperbound, and maybe it can be cut down further. > > Well... If you could store an array of size N for each > letter you could easily do it in O(1) time... But then > you'd need O(M*N) time to initialize the stuff :-) You don't know M at initialization time -- it's the size of the subsequence you're going to match (in the normal notation I've seen used), while N is the size of the big sequence in which you're looking for matches. Alphabet size is generally thought of as fixed in most published literature, though I suspect that O(N) is more realistic for generic sequences. You still can't do a match in O(1) time anyway, even with the auxiliary structure you suggest; O(M) seems the lower bound, except by precomputing ALL the possible subsequences and indexing them into some totally hypothetical O(1)-lookup-time structure (but in fact computing the hash IS still O(M), so there is no big-O advantage here). > If you add a requirement that the matched letters must > be relatively close to each other (within a constant > distance limit) you can get a speedup. (A reasonable > requirement in some cases.) Actually, I know of some > harwdare that does this very quickly. I'm curious about the algorithm and its auxiliary data structures here. The O(M) match still seems pretty elusive to me. > To me the module doesn't even have to be small. Just think > about all the other function and class names you define in > a module... Why should "split" specifically have to have > a module prefix? If I choose to import it that may be as > thought through as defining a function myself. Only if I I like the clarity that comes from use of explicit prefixes -- no wondering 'WHAT split function is this one' any more (string or re or ...?). Calling a method of an object, as in 'foo bar'.split(), is clearest -- totally local context, zero wondering-factor. Calling a prefixed function is second best. Calling a non-prefixed function sends me looking for it throughout the non-necessarily-small module -- and I won't find the '^def split' either (so I have to look at the from statements and the assignments -- sigh). > (OK... So I know you dislike the word "convenient". Well... > I find it prettier too ;-) The (modest) convenience of the code author becomes the (rather larger) IN-convenience of the maintainer. When one has spent a few years doing maintenance on large modules written by others, one's priorities tend to change -- as does one's outlook on "prettiness" when it interferes with clarity (the typical C++ scope obscurity due to implicit-this, vs. Python's clarity with explicit self, being a good example). > Why not write somesequence.extend([1]) or > somesequence.join(" ") (not both, of course) so that they would > be consistent with each other? Oh, well. I'm sure it's just > that I don't see the Greatness of it all. I've discussed the "what object should join be a method of" in a couple of lengthy postings 2 or 3 months ago -- it boils down to "where do I get the most leverage for polymorphism", and the current architecture wins hands down. extend is similar -- which object needs the polymorphism most, the one being modified or the sequence just being 'read'? And of course the answer is the same. So, the architecture DOES turn out to be fully consistent here, because exactly the same forces are being resolved! In each case, the subsequence that is just being read (for the purpose of building a string out of it, or for that of extending another object) is the argument -- all the polymorphism we need on it is that it obeys the sequence-protocol, after all. The 'joiner' (aka separator) object, and the object being modified (extended), are in each case the target of the method and get full benefit of polymorphism. > The " ".join([1,2,3]) doesn't seem like an accessor to me. > It seems like a plain function with a weird syntax. > > But, hey, since this is the way it *is*, I'm the one with > a problem here :-) And we're here to help you come to terms with it!-) What I'm doing is asking a 'joiner object' "please read this sequence of strings and join it up into one big string, will you". Function syntax does not give you polymorphism directly in Python - it can at best be syntax sugar for a method call that does do the poly thing: def myJoin(joiner, sequence): return joiner.join(sequence) which is basically what string.join now does. We do not have multimethods in Python (one generic function dispatched directly to the appropriate method), so the main Python approach to polymorphism is calling a method on an object (sometimes with weird syntax sugar, such as a+b for a.__add__(b) or maybe for b.__radd__(a), but that's another issue:-). It only remains to determine which object gets real benefit from a given case of polymorphism. Alex
- Previous message (by thread): "in" operator for strings
- Next message (by thread): "in" operator for strings
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the Python-list mailing list