> sys.getsizeof(3.14) is 24. And it becomes 32 byte in 16byte aligned pymalloc. (+33%)
I've been doing some reading and trying to understand this issue. My understanding is that malloc() needs to return pointers that are 16-byte aligned on AMD64 but, in general, pointers don't have the be aligned that way. If you have a structure that contains a "long double" then that member also has to be 16-bit aligned.
It seems to me that we don't need to have the PyObject structure containing a Python float to be 16-byte aligned. If so, could we introduce a new obmalloc API that returns memory with 8-byte alignment, for use by objects that know they don't require 16-byte alignment? floatobject.c could use this API to avoid the 33% overhead.
The new obmalloc API could initially be internal use only until we can come up with a design we know we can live with long term. |