[Python-Dev] C-level duck typing
Dag Sverre Seljebotn
d.s.seljebotn at astro.uio.no
Wed May 16 16:59:16 CEST 2012
More information about the Python-Dev mailing list
On 05/16/2012 02:47 PM, Mark Shannon wrote:
> Stefan Behnel wrote:
>> Dag Sverre Seljebotn, 16.05.2012 12:48:
>>> On 05/16/2012 11:50 AM, "Martin v. Löwis" wrote:
>>>>> Agreed in general, but in this case, it's really not that easy. A C
>>>>> function call involves a certain overhead all by itself, so calling
>>>>> into the C-API multiple times may be substantially more costly than,
>>>>> say, calling through a function pointer once and then running over a
>>>>> returned C array comparing numbers. And definitely way more costly
>>>>> than running over an array that the type struct points to directly.
>>>>> We are not talking about hundreds of entries here, just a few. A
>>>>> linear scan in 64-bit steps over something like a hundred bytes in
>>>>> the L1 cache should hardly be measurable.
>>>> I give up, then. I fail to understand the problem. Apparently, you want
>>>> to do something with the value you get from this lookup operation, but
>>>> that something won't involve function calls (or else the function call
>>>> overhead for the lookup wouldn't be relevant).
>>> In our specific case the value would be an offset added to the
>>> PyObject*, and there we would find a pointer to a C function (together
>>> with a 64-bit signature), and calling that C function (after checking
>>> the 64-bit signature) is our final objective.
>>
>> I think the use case hasn't been communicated all that clearly yet. Let's
>> give it another try.
>>
>> Imagine we have two sides, one that provides a callable and the other
>> side that wants to call it. Both sides are implemented in C, so the
>> callee has a C signature and the caller has the arguments available as C
>> data types. The signature may or may not match the argument types exactly
>> (float vs. double, int vs. long, ...), because the caller and the callee
>> know nothing about each other initially; they just happen to appear in
>> the same program at runtime.
>> All they know is that they could call each other through Python space,
>> but that would require data conversion, tuple packing, calling, tuple
>> unpacking, data unpacking, and then potentially the same thing on the
>> way back. They want to avoid that overhead.
>>
>> Now, the caller needs to figure out if the callee has a compatible
>> signature. The callee may provide more than one signature (i.e. more
>> than one C call entry point), perhaps because it is implemented to deal
>> with different input data types efficiently, or perhaps because it can
>> efficiently convert them to its expected input. So, there is a signature
>> on the caller side given by the argument types it holds, and a couple of
>> signatures on the callee side that can accept different C data input.
>> Then the caller needs to find out which signatures there are and match
>> them against what it can efficiently call. It may even be a JIT compiler
>> that can generate an efficient call signature on the fly, given a
>> suitable signature on the callee side.
>>
>> An example for this is an algorithm that evaluates a user-provided
>> function on a large NumPy array. The caller knows what array type it is
>> operating on, and the user-provided function may be designed to
>> efficiently operate on arrays of int, float and double entries.
>
> Given that use case, can I suggest the following:
>
> Separate the discovery of the function from its use.
> By this I mean first look up the function (outside of the loop),
> then use the function (inside the loop).

We would obviously do that when we can. But Cython is a compiler/code
translator, and we don't control use cases. You can easily make up use
cases (= Cython code people write) where you can't easily separate the
two. For instance, the Sage project has hundreds of thousands of lines of
object-oriented Cython code (NOT just array-oriented, but also graphs and
trees and such), which is all based on Cython's own fast vtable dispatches
a la C++.
They might want to clean up their code and use more generic callback
objects in some places. Other users currently pass around C pointers for
callback functions, and we'd like to tell them "pass around these nicer
Python callables instead; honestly, the penalty is only 2 ns per call".
(*Regardless* of how you use them, that is, without making sure you use
them in a loop where we can statically pull out the function pointer
acquisition. Saying "this is only non-sluggish if you do x, y, z" puts
users off.)

I'm not asking you to consider the details of all that. Just to allow some
kind of high-performance extensibility of PyTypeObject, so that we can
*stop* bothering python-dev with specific requirements from our parallel
universe of nearly-all-Cython-and-Fortran-and-C++ codebases :-)

Dag