Performance of list comprehensions vs. map
Chris Barker
chrishbarker at home.net
Wed Sep 5 14:12:44 EDT 2001
Hi all,

I just took another look at:

http://musi-cal.mojam.com/~skip/python/fastpython.html

specifically, the loops section:

"""
Loops

Python supports a couple of looping constructs. The for statement is most
commonly used. It loops over the elements of a sequence, assigning each to
the loop variable. If the body of your loop is simple, the interpreter
overhead of the for loop itself can be a substantial amount of the overhead.
This is where the map function is handy. You can think of map as a for loop
moved into C code. The only restriction is that the "loop body" of map must
be a function call. Here's a straightforward example. Instead of looping
over a list of words and converting them to upper case:

    newlist = []
    for word in list:
        newlist.append(word.upper())

you can use map to push the loop from the interpreter into compiled C code:

    import string
    newlist = map(string.upper, list)

List comprehensions were added to Python in version 2.0 as well. They
provide a syntactically more compact way of writing the above for loop:

    newlist = [s.upper() for s in list]

It's not really any faster than the for loop version, however.
"""

My question is: why not? I like list comprehensions a lot, and they would
seem to offer a way to get the same kind of optimization over a for loop
that map does, but apparently they don't. If map can do the loop in C, why
can't a list comprehension?

In general, I like list comprehensions because they make Python a more
"sequence oriented" language: when programming, it is very common to store a
sequence of data of some type and, more often than not, to do something with
that whole sequence rather than with its individual elements. All that
looping just to do the same thing to every element really clutters up the
logic of the code. map and list comprehensions really help (as will
element-wise operators, if we ever get them).
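For what it's worth, the claim is easy to test informally. Here is a small timing sketch (function names are mine, and the numbers will vary by machine); note that in modern Python 3 the string module's upper is gone, str.upper is used with map instead, and map returns a lazy iterator, so list() is needed to materialize the result:

```python
import timeit

words = ["spam", "eggs", "ham"] * 1000

def with_for_loop():
    # The plain for loop from the quoted page.
    newlist = []
    for word in words:
        newlist.append(word.upper())
    return newlist

def with_map():
    # map pushes the loop into C; list() materializes the iterator (Python 3).
    return list(map(str.upper, words))

def with_comprehension():
    # The list comprehension version.
    return [w.upper() for w in words]

# All three produce the same result.
assert with_for_loop() == with_map() == with_comprehension()

# Time each version; relative ordering matters more than absolute numbers.
for fn in (with_for_loop, with_map, with_comprehension):
    print(f"{fn.__name__}: {timeit.timeit(fn, number=200):.3f}s")
```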
However, it seems to me that sequence-oriented structures could provide a
real performance benefit as well, but Python has not gotten there yet. I
think a big stumbling block is that Python has no concept of a "homogeneous"
sequence. There are modules that provide such a thing (Numeric, array), and
indeed the string type is a homogeneous sequence, but Python itself does not
understand the concept, so if a map or list comprehension is run over a
homogeneous list, it still has to check the type of every element as it goes
through the list. For example:

    [2*a for a in list]

If the interpreter could check just once what the type of all the elements
of list was, it would only have to figure out what 2*a meant once, and it
could whip through the list very quickly. Indeed, if you do:

    [2*a for a in a_string]

the string is a homogeneous sequence by definition (once you have checked
that it is a string), but the interpreter has no understanding of this
concept.

NOTE: I fully understand that adding a concept like this merely allows for a
whole set of optimizations that COULD be done; someone would still have to
get around to doing it, which would be a whole lot of work and might never
happen. That's not a reason not to do it. First of all, the framework has to
be in place for this kind of optimization before anyone can even start
writing the code, and even if no optimizations are ever built into the
interpreter, the concept of a homogeneous sequence would be very useful for
extension writers. A number of times I have written an extension that needed
a list in which all the elements were the same type. I had to put a type
check inside my loop, which clutters up the code and slows things down some.
If I'm working with numbers I use NumPy, but that's not always possible.

As for what might get added: I'm imagining that it should be possible to
automatically maintain a "homogeneous" flag on a sequence. For an immutable
sequence, it would get set when the sequence was built.
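To sketch the idea in pure Python (the class name and details are hypothetical; a real implementation would live in C inside the interpreter): a list subclass could compute the flag once at construction time, and a mutable sequence would clear it as soon as an element of a different type is added.

```python
class HomogeneousList(list):
    """A list that tracks whether all its elements share one type."""

    def __init__(self, items=()):
        super().__init__(items)
        # One pass at construction time sets the flag.
        types = {type(item) for item in self}
        self.homogeneous = len(types) <= 1

    def append(self, item):
        # One cheap type check per append; once the flag is off it stays
        # off (re-checking on removal probably isn't worth it).
        if self.homogeneous and self and type(item) is not type(self[0]):
            self.homogeneous = False
        super().append(item)

nums = HomogeneousList([1, 2, 3])
print(nums.homogeneous)   # True
nums.append(4)
print(nums.homogeneous)   # still True
nums.append("five")
print(nums.homogeneous)   # False: a str broke the homogeneity
```

A consumer (an extension, say, or a hypothetical optimized comprehension) could then test the flag once instead of type-checking every element in its loop.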
For a mutable sequence, each time an element is added, a type check could be
done and the homogeneous flag turned off if the type didn't match. (Turning
it back on when that item was removed would probably not be worth it.)
Ideally, there would be a way for the programmer to define what was meant by
homogeneous (i.e. a NumPy array of doubles, or any NumPy array). This would
be trickier, but when in doubt the homogeneous flag would just be turned
off, and we'd be back where we started. Even if no changes were made to the
list type, the concept could be put in place, and then existing homogeneous
sequences could be used more optimally.

I've proposed similar ideas in the past and not gotten much response. I
imagine my ideas are full of holes, but no one has taken the time to point
them out to me yet. I have come to the conclusion that the really sharp
minds on this list (and the folks who might be qualified to actually
implement such a thing) are not really interested in a performance-only
improvement like this. Am I right? Or is the idea just so stupid that it's
not worth commenting on?

-Chris

--
Christopher Barker, Ph.D.
ChrisHBarker at home.net
http://members.home.net/barkerlohmann
Oil Spill Modeling
Water Resources Engineering
Coastal and Fluvial Hydrodynamics