Why is Python so slow ?- revisited.
Thomas Wouters
thomas at xs4all.net
Tue Jun 20 08:32:56 EDT 2000
More information about the Python-list mailing list
On Mon, Jun 19, 2000 at 08:47:09PM -0400, Bijan Parsia wrote:
> Thomas Wouters <thomas at xs4all.net> wrote:

[ Python's lack of an instruction/function cache ]

> > Indeed. Such a cache would have to be invalidated whenever the function
> > object, or one of the objects used to look up the function, was altered
> > (or rather, 'touched'.) The dynamic nature of Python makes this quite
> > complicated, and probably quite expensive.

> I'd love to hear more about why this would be the case. (Mere
> "dynamicness" can't be it: Smalltalk is as dynamic, and most
> implementations make quite effective use of various sorts of cache. So,
> if your contention is correct, there must be some *special* aspect of
> Python's dynamic nature that makes it *especially* resistant. Note that
> I am *very* skeptical about the existence of such a special aspect, but
> I am quite willing to be enlightened! Merely being "more dynamic" or "so
> very dynamic", which is what I usually read, doesn't cut any mustard.)

I know next to nothing about Smalltalk, let alone how it optimizes, but I
can tell you why it would be complicated in Python ;) Let's take a simple
example:

    def spam(x, y):
        if x in range(1, y):
            return 1
        if x in range(y, 1):
            return 1
        return 0

Might not seem very useful, but it's just an example ;) This can easily be
optimized, right? We look up 'range', see it's a builtin, store the
address, and call it two times. Well, no. We don't *know* 'range' is a
builtin; it could be a global that points elsewhere. If the 'spam' function
contained a 'from module import *' or 'exec' statement, it could even be a
local. So, we have to check whether 'range' is still the same function,
which means we have to look it up and compare identity -- and we've already
lost any advantage the cache could give us. Or we could place a marker on
the __builtin__ module, and invalidate the cached entry of 'range' whenever
'range' is changed.
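To make the danger concrete, here is a small sketch (names hypothetical) of a naive "'range' is the builtin" assumption being broken at run time by a plain global assignment:

```python
def spam(x, y):
    # 'range' is looked up at call time: local, then global, then builtin.
    if x in range(1, y):
        return 1
    return 0

assert spam(3, 10) == 1       # builtin range: 3 is in 1..9
assert spam(50, 10) == 0      # builtin range: 50 is not in 1..9

def range(a, b):              # rebind the *global* name 'range'
    return [50]               # hypothetical replacement, for illustration

assert spam(50, 10) == 1      # the same call now gives a different answer
```

A cached pointer to the builtin would keep returning the old, now-wrong result after the rebinding.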
But then we also have to add a marker to the global namespace, because
something can *add* a 'range' there. And don't think it can't happen:
'range()' could be a function that modifies the global or builtin
namespace. Or it could be a function that deletes references to a few
objects (not that uncommon an operation, really.) Those objects could then
be destroyed (no more references), their destruction could trigger the
destruction of other objects, and one of those objects could call a
function on some module or other that modifies the original global or
builtin namespace.

Another example, the simple case of:

    def spam(egg):
        egg.spam()
        egg.spam()

If egg or a parent class of egg does not define a spam() method but does
define a __getattr__, these two calls could return two entirely different
things. So, in caching egg.spam, we need to make sure it's been located
'statically'. (This can be read as 'without __getattr__', or as 'having
analyzed __getattr__ and determined that each call with the same arguments
would return the same object'.) Also, we need a way to invalidate the
cached entry of egg.spam if egg, or any of its parents up to the one
defining 'spam', has something done to its 'spam' attribute. And that means
that whenever you change an attribute on a class or instance, you need to
traverse the entire tree of subclasses and instances (up to one that
'masks' the change) looking for cached entries and invalidating them. And
of course, if you do something like:

    egg.ham.viking.spam.plate.drop()

you have to consider *each lookup* cached ;-P

I don't think lookups are *that* expensive. I'm not sure what the penalty
for a failed lookup is; I seem to recall a successful lookup being
relatively cheap compared to a failed one, and a 2nd lookup is not likely
to fail ;)

> Indeed, as my bet is that the majority of the time, in the majority of
> code, things are quite "hands off" (i.e., no touching :)), I would be
> surprised that this would prove a problem.
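The __getattr__ point can be demonstrated directly. A hypothetical class where 'spam' is never defined, so every egg.spam access falls through to __getattr__ and yields a different object each time:

```python
class Egg:
    # Hypothetical class for illustration: 'spam' is not defined, so
    # every egg.spam lookup falls through to __getattr__.
    def __init__(self):
        self.calls = 0

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name != 'spam':
            raise AttributeError(name)
        self.calls += 1
        n = self.calls            # capture the current call count
        return lambda: n          # a fresh callable on every lookup

egg = Egg()
first = egg.spam()
second = egg.spam()
assert (first, second) == (1, 2)  # same expression, different results
assert egg.spam is not egg.spam   # the lookups don't even return the same object
```

A cache keyed on 'egg.spam' would wrongly collapse these lookups into one.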
> In inner loops, indeed, it could be quite effective, and would be no
> different than *manually* copying the function to the local -- a move, I
> hope you agree, that can provide a substantial speedup.

I've measured about a 10% speed increase (according to pybench) in method
calls, by storing each method in the instance dict after it was created. It
also seemed to have the same effect on the TryExcept test, but that's
probably because it uses a lot of methods. However, at the same time, some
other tests decreased in performance by about as much, ending up actually
0.76% slower than before ;) The speedup could probably be a lot more, if
you did (compile-time) analysis of how many times a given method is called,
and cached only those. Or, of course, if you *manually* cache the function
in the instance dict or the local namespace (even better -- one lookup
less).

> I confess to not being an expert in these matters, but I don't know that
> I understand why the *whole* cache would have to be invalidated if one
> function object were "touched". Er.. in mere other words... More please!

I was a bit careless (again ;) in my wording. I meant to say that the
individual cached entry has to be invalidated.

> > What's more, currently 'bound methods' (methods which have a 'self' to
> > which they apply) are created on demand -- the unbound method exists on
> > the class, but it has to be 'bound' to an instance before it can be
> > used. This is done each time you access the method, though CPython is
> > being smart and you have to try hard to actually see that ;-)

> Hmm. Ok. I think. Maybe. Why it has to be done each time one accesses the
> *instance*'s method I find a tad confusing. I mean, ok, you change the
> class def, you'll want to update the instances. But surely it's ok to
> make class updates more expensive than method access?
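The manual caching described above looks like this in practice (class and function names hypothetical). Both the local-variable hoist and the instance-dict trick are shown; actual speedups will vary:

```python
class Jar:
    def spam(self):
        return 1

def uncached(jar, n):
    total = 0
    for _ in range(n):
        total += jar.spam()       # attribute lookup + method binding each pass
    return total

def cached(jar, n):
    spam = jar.spam               # one lookup; bound method kept in a local
    total = 0
    for _ in range(n):
        total += spam()           # plain fast-local load each pass
    return total

jar = Jar()
assert uncached(jar, 1000) == cached(jar, 1000) == 1000

# Caching into the instance dict also works, and masks the class attribute
# on subsequent lookups (functions are non-data descriptors):
jar.spam = jar.spam
assert jar.spam() == 1
```

The local-variable version saves the most, since it replaces an attribute lookup with a fast local load inside the loop.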
Yes, but this works, and caching & invalidation doesn't :-) It would be no
small amount of code to add it, even if you discard the ideas about
invalidation (say, if you add the new code under a -OOO option, and tell
people not to f*** around with base classes after instances have been
created -- which I doubt Guido or any of the other core developers would
allow.)

> Ewww! :) But *why* do this? walk hasn't changed. It's surely
> determinable that it hasn't changed over the course of x,y=(cleese.walk,
> cleese.walk), yes?

Yes, by comparing identity of the retrieved objects, but then it's already
too late :-) "a,b=c.walk,c.walk" gets compiled roughly into this:

              6 LOAD_FAST                0 (c)
              9 LOAD_ATTR                1 (walk)
             12 LOAD_FAST                0 (c)
             15 LOAD_ATTR                1 (walk)
             18 BUILD_TUPLE              2
             21 UNPACK_TUPLE             2
             24 STORE_FAST               1 (a)
             27 STORE_FAST               2 (b)

As you see, different bytecodes. The bytecodes have no easy way of seeing
what previous bytecodes were doing, so you'd have to either recognize this
in the compiler and generate alternative bytecodes, or rewrite the bytecode
interpreter to do caching on *all* lookups (of a certain type.) And see
above about that problem ;)

So, to the casual reader it might seem obvious that Cleese's walk doesn't
change, but the compiler doesn't know *anything* about Cleese, and though
the interpreter does know everything about Cleese, it doesn't know that
Cleese has already been queried about his walk method. And adding the
knowledge to either (or better (for optimization), both) is a major
project. Hopefully, now that a couple of the Big Brains can devote their
energy to the furtherance of Python full-time, they can devote some of that
in this direction ;) But I can imagine there are more important things than
this potential 10% (or so) speedup.

So, now for the DISCLAIMER: I'm just a hobbyist. I don't know *that* much
about Python (yet ;), I've only been mucking around in its internals for a
couple of weeks.
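Both points are easy to check for yourself with the dis module (class and function names hypothetical; exact opcode names and offsets vary across Python versions, so the dump may not match the one above line for line):

```python
import dis

class Cleese:
    def walk(self):
        return 'silly'

def f(c):
    a, b = c.walk, c.walk         # two separate attribute lookups
    return a, b

a, b = f(Cleese())
assert a() == b() == 'silly'      # same behaviour...
assert a is not b                 # ...but two distinct bound-method objects (CPython)

dis.dis(f)  # shows a LOAD_FAST/LOAD_ATTR pair per lookup
```

The identity check only succeeds *after* both lookups have already been paid for, which is exactly the "already too late" problem above.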
I also have no experience with compilers, cache mechanisms or optimizations
other than a few good books on the subject (parts of the Dragon book,
O'Reilly's High Performance Computing, and books about the internals of
computers/compilers in general, like those of Tanenbaum and v/d Linden.)
So, please, do not take my words for unbendable truth, for even if they are
true, they're likely flexible beyond belief ;)

lawyer-like-ly yr's,
-- 
Thomas Wouters <thomas at xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!