Application Binary Interface - D Programming Language

A D implementation that conforms to the D ABI (Application Binary Interface) will be able to generate libraries, DLLs, etc., that can interoperate with D binaries built by other implementations.

C ABI

The C ABI referred to in this specification means the C Application Binary Interface of the target system. C and D code should be freely linkable together; in particular, D code shall have access to the entire C ABI runtime library.

Endianness

The endianness (byte order) of the layout of the data will conform to the endianness of the target machine. The Intel x86 CPUs are little endian, meaning that the value 0x0A0B0C0D is stored in memory as: 0D 0C 0B 0A.
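
The byte order described above can be observed directly. The following is a minimal sketch, not part of the specification; the helper name memoryBytes is made up for illustration, and LittleEndian is one of D's predefined version identifiers.

```d
import std.stdio : writefln;

/// Returns the bytes of `value` as they sit in memory, lowest address first.
/// (Hypothetical helper, for illustration only.)
ubyte[4] memoryBytes(uint value)
{
    return *cast(ubyte[4]*) &value;
}

void main()
{
    immutable ubyte[4] little = [0x0D, 0x0C, 0x0B, 0x0A];
    immutable ubyte[4] big    = [0x0A, 0x0B, 0x0C, 0x0D];

    const bytes = memoryBytes(0x0A0B0C0D);

    // The layout follows the target machine, so pick the expectation
    // with the predefined version identifier.
    version (LittleEndian)
        assert(bytes == little);
    else
        assert(bytes == big);

    // Prints the four bytes in memory order, as two-digit hex values.
    writefln("%(%02X %)", bytes[]);
}
```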

Basic Types

bool	8 bit byte with the values 0 for false and 1 for true
byte	8 bit signed value
ubyte	8 bit unsigned value
short	16 bit signed value
ushort	16 bit unsigned value
int	32 bit signed value
uint	32 bit unsigned value
long	64 bit signed value
ulong	64 bit unsigned value
cent	128 bit signed value
ucent	128 bit unsigned value
float	32 bit IEEE 754 floating point value
double	64 bit IEEE 754 floating point value
real	implementation defined floating point value; for x86 it is the 80 bit IEEE 754 extended real
char	8 bit unsigned value
wchar	16 bit unsigned value
dchar	32 bit unsigned value
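
As an illustrative check (a sketch, not part of the specification), most rows of this table can be confirmed at compile time with static assert. real is implementation defined, so only a lower bound is tested for it, and cent/ucent are left out.

```d
// Sizes are in bytes; each line mirrors rows of the table above.
static assert(bool.sizeof == 1);
static assert(byte.sizeof == 1 && ubyte.sizeof == 1);
static assert(short.sizeof == 2 && ushort.sizeof == 2);
static assert(int.sizeof == 4 && uint.sizeof == 4);
static assert(long.sizeof == 8 && ulong.sizeof == 8);
static assert(float.sizeof == 4 && double.sizeof == 8);
static assert(char.sizeof == 1 && wchar.sizeof == 2 && dchar.sizeof == 4);

// Signedness of the 8 bit integer types.
static assert(byte.min == -128 && ubyte.min == 0 && ubyte.max == 255);

// `real` is implementation defined, so only a lower bound is checked.
static assert(real.sizeof >= double.sizeof);

void main()
{
    // bool is stored as a byte holding 0 for false and 1 for true.
    bool t = true, f = false;
    assert(*cast(ubyte*) &t == 1);
    assert(*cast(ubyte*) &f == 0);
}
```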

Delegates

Delegates are fat pointers with two parts:

Delegate Layout
offset	property	contents
0	.ptr	context pointer
ptrsize	.funcptr	pointer to function

The context pointer can be a class this reference, a struct this pointer, a pointer to a closure (nested functions) or a pointer to an enclosing function's stack frame (nested functions).
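
The two-word layout can be inspected by reinterpreting a delegate as a pair of pointers. This is a sketch under the assumption that the implementation follows the table above; Counter, Step and rawWords are names invented for the example.

```d
struct Counter
{
    int count;
    int next() { return ++count; }
}

alias Step = int delegate();

// Two pointer-sized words: context pointer, then function pointer.
static assert(Step.sizeof == 2 * (void*).sizeof);

/// Reinterprets a delegate as its two raw machine words.
/// (Hypothetical helper, for illustration only.)
void*[2] rawWords(Step dg)
{
    return *cast(void*[2]*) &dg;
}

void main()
{
    Counter c;
    Step dg = &c.next;

    const words = rawWords(dg);
    assert(words[0] is dg.ptr);                   // offset 0: .ptr
    assert(words[1] is cast(void*) dg.funcptr);   // offset ptrsize: .funcptr

    // Here the context pointer is the struct `this` pointer.
    assert(dg.ptr is cast(void*) &c);
    assert(dg() == 1 && c.count == 1);
}
```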

Structs and Unions

These conform to the target's C ABI struct layout, except:

An extern(D) struct or union with no fields has size 1, not 0. See Struct Layout.
Unions are name-mangled as if they were structs.
The rest of this document treats unions extrinsically the same as structs.
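
A few of these layout rules can be checked at compile time. The types below are invented for the example; the sketch only asserts properties that follow from the text above and from C-style field alignment.

```d
// An extern(D) struct with no fields still occupies one byte.
struct Empty {}
static assert(Empty.sizeof == 1);

// Field layout follows the target's C rules: `b` is aligned to
// int.alignof, so padding is inserted after the single byte `a`.
struct Mixed
{
    ubyte a;
    int   b;
}
static assert(Mixed.a.offsetof == 0);
static assert(Mixed.b.offsetof == int.alignof);

// Every member of a union starts at offset 0.
union Overlay
{
    uint     word;
    ubyte[4] bytes;
}
static assert(Overlay.word.offsetof == 0 && Overlay.bytes.offsetof == 0);
static assert(Overlay.sizeof == 4);

void main() {}
```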

Classes

An object consists of:

Class Object Layout
size	property	contents
ptrsize	.__vptr	pointer to vtable
ptrsize	.__monitor	monitor
ptrsize...		vptrs for any interfaces implemented by this class in left to right, most to least derived, order
...	...	super's non-static fields and super's interface vptrs, from least to most derived
...	named fields	non-static fields
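
The header described by this table can be seen from D code. The following is a sketch that assumes a class deriving directly from Object (which contributes no fields of its own); Point and ptrsize are names chosen for the example.

```d
class Point
{
    int x;
    int y;
}

enum ptrsize = (void*).sizeof;

// __vptr and __monitor occupy the first two pointer-sized slots, so the
// first named field starts right after them.
static assert(Point.x.offsetof == 2 * ptrsize);
static assert(Point.y.offsetof == 2 * ptrsize + int.sizeof);
static assert(__traits(classInstanceSize, Point) >= 2 * ptrsize + 2 * int.sizeof);

void main()
{
    auto p = new Point;
    auto base = cast(void**) cast(void*) p;

    assert(base[0] is cast(void*) p.__vptr);   // slot 0: pointer to vtable
    assert(base[1] is p.__monitor);            // slot 1: monitor
}
```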

The vtable consists of:

Virtual Function Pointer Table Layout
size	contents
ptrsize	pointer to instance of TypeInfo
ptrsize...	pointers to virtual member functions
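
As a hedged illustration of this table, the vtable of an object's dynamic type can be compared against the vtbl array that druntime exposes through TypeInfo_Class. Animal and Dog are invented for the example.

```d
class Animal
{
    string name() { return "animal"; }
}

class Dog : Animal
{
    override string name() { return "dog"; }
}

void main()
{
    Animal a = new Dog;

    // The hidden __vptr points at the vtable of the dynamic type.
    assert(cast(void*) a.__vptr is cast(void*) typeid(Dog).vtbl.ptr);

    // The first slot refers to the class's TypeInfo instance.
    assert(cast(void*) a.__vptr[0] is cast(void*) typeid(Dog));

    // Overriding a method gives Dog its own table with the same number of slots.
    assert(typeid(Dog).vtbl.ptr !is typeid(Animal).vtbl.ptr);
    assert(typeid(Dog).vtbl.length == typeid(Animal).vtbl.length);

    assert(a.name() == "dog");
}
```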

Casting a class object to an interface consists of adding the offset of the interface's corresponding vptr to the address of the base of the object. Casting an interface pointer back to the class type it came from involves subtracting an offset from it; the correct offset is obtained from the object.Interface entry at vtbl[0]. Adjustor thunks are created and pointers to them stored in the method entries in the vtbl[] in order to set the this pointer to the start of the object instance corresponding to the implementing method.

An adjustor thunk looks like:

  ADD EAX,offset
  JMP method

The leftmost side of the inheritance graph of the interfaces all share their vptrs; this is the single inheritance model. Every time the inheritance graph forks (for multiple inheritance) a new vptr is created and stored in the class' instance. Every time a virtual method is overridden, a new vtbl[] must be created with the updated method pointers in it.
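
The pointer adjustment performed by a class-to-interface cast can be observed by comparing raw addresses. This is a sketch; Shape, Square and interfaceOffset are invented for the example, and the exact offset is deliberately not hard-coded.

```d
interface Shape
{
    double area();
}

class Square : Shape
{
    double side = 2;
    double area() { return side * side; }
}

/// Byte distance from the start of the object to the interface pointer.
/// (Hypothetical helper, for illustration only.)
size_t interfaceOffset(Square sq)
{
    Shape sh = sq;   // implicit class-to-interface conversion
    const ifaceAddr = cast(size_t) cast(void*) sh;
    const objAddr   = cast(size_t) cast(void*) sq;
    return ifaceAddr - objAddr;
}

void main()
{
    auto sq = new Square;
    const off = interfaceOffset(sq);

    // The interface reference points inside the instance, not at its base.
    assert(off != 0 && off < __traits(classInstanceSize, Square));

    // druntime records the same offset in the class's object.Interface entry.
    assert(typeid(Square).interfaces[0].offset == off);

    // Casting back to the class recovers the original reference.
    Shape sh = sq;
    assert(cast(Square) sh is sq);

    // The call goes through the interface's vtbl entry, which must
    // adjust `this` before Square.area runs.
    assert(sh.area() == 4.0);
}
```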

The class definition:

class XXXX
{
    ....
};

Generates the following:

An instance of Class called ClassXXXX.
A type called StaticClassXXXX which defines all the static members.
An instance of StaticClassXXXX called StaticXXXX for the static members.

Interfaces

An interface is a pointer to a pointer to a vtbl[]. The vtbl[0] entry is a pointer to the corresponding instance of the object.Interface class. The rest of the vtbl[1..$] entries are pointers to the virtual functions implemented by that interface, in the order that they were declared.

A COM interface differs from a regular interface in that there is no object.Interface entry in vtbl[0]; the entries vtbl[0..$] are all the virtual function pointers, in the order that they were declared. This matches the COM object layout used by Windows.

A C++ interface differs from a regular interface in that it matches the layout of a C++ class using single inheritance on the target machine.

Arrays

A dynamic array consists of:

Dynamic Array Layout
offset	property	contents
0	.length	array dimension
size_t.sizeof	.ptr	pointer to array data

A dynamic array is declared as:

type[] array;

whereas a static array is declared as:

type[dimension] array;

Thus, a static array always has the dimension statically available as part of the type, and so it is implemented as in C. Static arrays and dynamic arrays can be easily converted back and forth to each other.
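
Both the dynamic array layout and the conversions between the two array kinds can be demonstrated briefly. This is a sketch; Ints and rawWords are names invented for the example.

```d
alias Ints = int[];

// A dynamic array is a length followed by a pointer.
static assert(Ints.sizeof == 2 * size_t.sizeof);
// A static array is just its elements, as in C.
static assert((int[3]).sizeof == 3 * int.sizeof);

/// Reinterprets a dynamic array as its two raw machine words.
/// (Hypothetical helper, for illustration only.)
size_t[2] rawWords(int[] a)
{
    return *cast(size_t[2]*) &a;
}

void main()
{
    int[3] fixed = [10, 20, 30];

    // Static to dynamic: slicing yields a view of the same memory.
    int[] view = fixed[];
    const words = rawWords(view);
    assert(words[0] == view.length);             // offset 0: .length
    assert(words[1] == cast(size_t) view.ptr);   // next word: .ptr
    assert(view.ptr is fixed.ptr);

    // Dynamic to static: the elements are copied; the lengths must match.
    int[3] copy = view[0 .. 3];
    assert(copy == fixed);
}
```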

Associative Arrays

Associative arrays consist of a pointer to an opaque, implementation defined type.

The current implementation is contained in and defined by rt/aaA.d.

Reference Types

D has reference types, but they are implicit. For example, classes are always referred to by reference; this means that class instances can never reside on the stack or be passed as function parameters.

Name Mangling

D accomplishes typesafe linking by mangling a D identifier to include scope and type information.

MangledName:
    _D QualifiedName Type
    _D QualifiedName Z        // Internal

The Type above is the type of a variable or the return type of a function. This is never a TypeFunction, as the latter can only be bound to a value via a pointer to a function or a delegate.

QualifiedName:
    SymbolFunctionName
    SymbolFunctionName QualifiedName

SymbolFunctionName:
    SymbolName
    SymbolName TypeFunctionNoReturn
    SymbolName M TypeModifiers_opt TypeFunctionNoReturn

The M means that the symbol is a function that requires a this pointer. Class or struct fields are mangled without M. To disambiguate M from being a Parameter with modifier scope, the following type needs to be checked for being a TypeFunction.

SymbolName:
    LName
    TemplateInstanceName
    IdentifierBackRef
    0                         // anonymous symbols

Template Instance Names have the types and values of their parameters encoded into them:

TemplateInstanceName:
    TemplateID LName TemplateArgs Z

TemplateID:
    __T
    __U        // for symbols declared inside template constraint

TemplateArgs:
    TemplateArg
    TemplateArg TemplateArgs

TemplateArg:
    TemplateArgX
    H TemplateArgX

If a template argument matches a specialized template parameter, the argument is mangled with prefix H.

TemplateArgX:
    T Type
    V Type Value
    S QualifiedName
    X Number ExternallyMangledName

ExternallyMangledName can be any series of characters allowed on the current platform, e.g. generated by functions with C++ linkage or annotated with pragma(mangle,...).

Values:
    Value
    Value Values

Value:
    n
    i Number
    N Number
    e HexFloat
    c HexFloat c HexFloat
    CharWidth Number _ HexDigits
    A Number Values
    S Number Values
    f MangledName

HexFloat:
    NAN
    INF
    NINF
    N HexDigits P Exponent
    HexDigits P Exponent

Exponent:
    N Number
    Number

HexDigits:
    HexDigit
    HexDigit HexDigits

HexDigit:
    Digit
    A
    B
    C
    D
    E
    F

CharWidth:
    a
    w
    d

n: is for null arguments.
i Number: is for positive numeric literals (including character literals).
N Number: is for negative numeric literals.
e HexFloat: is for real and imaginary floating point literals.
c HexFloat c HexFloat: is for complex floating point literals.
CharWidth Number _ HexDigits: CharWidth is whether the characters are 1 byte (a), 2 bytes (w) or 4 bytes (d) in size. Number is the number of characters in the string. The HexDigits are the hex data for the string.
A Number Values: An array or associative array literal. Number is the length of the array. Value is repeated Number times for a normal array, and 2 * Number times for an associative array.
S Number Values: A struct literal. Value is repeated Number times.

Name:
    Namestart
    Namestart Namechars

Namestart:
    _
    Alpha

Namechar:
    Namestart
    Digit

Namechars:
    Namechar
    Namechar Namechars

A Name is a standard D identifier.

LName:
    Number Name
    Number __S Number    // function-local parent symbols

Number:
    Digit
    Digit Number

Digit:
    0
    1
    2
    3
    4
    5
    6
    7
    8
    9

An LName is a name preceded by a Number giving the number of characters in the Name.
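
To make the grammar concrete, here is a minimal sketch of an encoder for the simplest cases: plain identifiers, no templates and no back references. The helper names (lname, qualifiedName, mangleVariable, mangleFunction) and the demo module path are invented for the example; this is not the compiler's implementation.

```d
import std.algorithm.iteration : map, splitter;
import std.array : join;
import std.conv : to;

/// LName: the identifier prefixed by its length in decimal.
string lname(string name)
{
    return name.length.to!string ~ name;
}

/// QualifiedName for a dotted path made only of plain identifiers.
string qualifiedName(string dotted)
{
    return dotted.splitter('.').map!lname.join;
}

/// MangledName of a variable: _D QualifiedName Type.
string mangleVariable(string dotted, string typeCode)
{
    return "_D" ~ qualifiedName(dotted) ~ typeCode;
}

/// MangledName of a free extern(D) function: the TypeFunctionNoReturn part
/// (here `F`, the parameters and the ParamClose) follows the function's
/// LName, and the return type comes last.
string mangleFunction(string dotted, string paramsAndClose, string returnCode)
{
    return "_D" ~ qualifiedName(dotted) ~ "F" ~ paramsAndClose ~ returnCode;
}

void main()
{
    assert(lname("foo") == "3foo");
    assert(qualifiedName("demo.counter") == "4demo7counter");

    // int demo.counter; `i` is the code for int.
    assert(mangleVariable("demo.counter", "i") == "_D4demo7counteri");

    // int demo.square(int); one int parameter, not variadic (Z), returns int.
    assert(mangleFunction("demo.square", "iZ", "i") == "_D4demo6squareFiZi");
}
```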

Back references

Any LName or non-basic Type (i.e. any type that does not encode as a fixed one or two character sequence) that has been emitted to the mangled symbol before will not be emitted again, but is referenced by a special sequence encoding the relative position of the original occurrence in the mangled symbol name.

Numbers in back references are encoded in base 26, using upper case letters A - Z for the higher digits and lower case letters a - z for the last digit.

TypeBackRef:
    Q NumberBackRef

IdentifierBackRef:
    Q NumberBackRef

NumberBackRef:
    lower-case-letter
    upper-case-letter NumberBackRef

To determine which kind of back reference is meant, the back-referenced character must be looked up: an identifier back reference always points to a digit 0 to 9, while a type back reference always points to a letter.
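
The base 26 number encoding can be sketched as a pair of functions. encodeBackRef and decodeBackRef are hypothetical helpers written for this example; they handle only the NumberBackRef part, not the leading Q or the position look-up.

```d
/// Encodes a number as a NumberBackRef: base 26, upper case letters for
/// the leading digits and a lower case letter for the final digit.
string encodeBackRef(size_t n)
{
    string s;
    s ~= cast(char)('a' + n % 26);
    for (n /= 26; n != 0; n /= 26)
        s = cast(char)('A' + n % 26) ~ s;
    return s;
}

/// Decodes a NumberBackRef found at the start of `s`.
/// Throws if no terminating lower case letter is found.
size_t decodeBackRef(string s)
{
    size_t n = 0;
    foreach (char ch; s)
    {
        if (ch >= 'A' && ch <= 'Z')
            n = n * 26 + (ch - 'A');        // a higher digit
        else if (ch >= 'a' && ch <= 'z')
            return n * 26 + (ch - 'a');     // the last digit ends the number
        else
            break;
    }
    throw new Exception("malformed back reference");
}

void main()
{
    assert(encodeBackRef(0) == "a");
    assert(encodeBackRef(25) == "z");
    assert(encodeBackRef(26) == "Ba");
    assert(encodeBackRef(705) == "BBd");   // 705 = (1 * 26 + 1) * 26 + 3

    assert(decodeBackRef("Ba") == 26);
    assert(decodeBackRef("BBd") == 705);
}
```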

Type Mangling

Types are mangled using a simple linear scheme:

Type:
    TypeModifiers_opt TypeX
    TypeBackRef

TypeX:
    TypeArray
    TypeStaticArray
    TypeAssocArray
    TypePointer
    TypeFunction
    TypeIdent
    TypeClass
    TypeStruct
    TypeEnum
    TypeTypedef
    TypeDelegate
    TypeVoid
    TypeByte
    TypeUbyte
    TypeShort
    TypeUshort
    TypeInt
    TypeUint
    TypeLong
    TypeUlong
    TypeCent
    TypeUcent
    TypeFloat
    TypeDouble
    TypeReal
    TypeIfloat
    TypeIdouble
    TypeIreal
    TypeCfloat
    TypeCdouble
    TypeCreal
    TypeBool
    TypeChar
    TypeWchar
    TypeDchar
    TypeNoreturn
    TypeNull
    TypeTuple
    TypeVector

TypeModifiers:
    Const
    Wild
    Wild Const
    Shared
    Shared Const
    Shared Wild
    Shared Wild Const
    Immutable

Shared:
    O

Const:
    x

Immutable:
    y

Wild:
    Ng

TypeArray:
    A Type

TypeStaticArray:
    G Number Type

TypeAssocArray:
    H Type Type

TypePointer:
    P Type

TypeVector:
    Nh Type

TypeFunction:
    TypeFunctionNoReturn Type

TypeFunctionNoReturn:
    CallConvention FuncAttrs_opt Parameters_opt ParamClose

CallConvention:
    F       // D
    U       // C
    W       // Windows
    R       // C++
    Y       // Objective-C

FuncAttrs:
    FuncAttr
    FuncAttr FuncAttrs

FuncAttr:
    FuncAttrPure
    FuncAttrNothrow
    FuncAttrRef
    FuncAttrProperty
    FuncAttrNogc
    FuncAttrReturn
    FuncAttrScope
    FuncAttrTrusted
    FuncAttrSafe
    FuncAttrLive

Function attributes are emitted in the order as listed above, with the exception of return and scope. return comes before scope when this is a return scope parameter, and after scope when this is a scope and return ref parameter.

FuncAttrPure:
    Na

FuncAttrNogc:
    Ni

FuncAttrNothrow:
    Nb

FuncAttrProperty:
    Nd

FuncAttrRef:
    Nc

FuncAttrReturn:
    Nj

FuncAttrScope:
    Nl

FuncAttrTrusted:
    Ne

FuncAttrSafe:
    Nf

FuncAttrLive:
    Nm

Parameters:
    Parameter
    Parameter Parameters

Parameter:
    Parameter2
    M Parameter2     // scope
    Nk Parameter2    // return

Parameter2:
    Type
    I Type     // in
    J Type     // out
    K Type     // ref
    L Type     // lazy

ParamClose:
    X     // variadic T t...) style
    Y     // variadic T t,...) style
    Z     // not variadic

TypeIdent:
    I QualifiedName

TypeClass:
    C QualifiedName

TypeStruct:
    S QualifiedName

TypeEnum:
    E QualifiedName

TypeTypedef:
    T QualifiedName

TypeDelegate:
    D TypeModifiers_opt TypeFunction

TypeVoid:
    v

TypeByte:
    g

TypeUbyte:
    h

TypeShort:
    s

TypeUshort:
    t

TypeInt:
    i

TypeUint:
    k

TypeLong:
    l

TypeUlong:
    m

TypeCent:
    zi

TypeUcent:
    zk

TypeFloat:
    f

TypeDouble:
    d

TypeReal:
    e

TypeIfloat:
    o

TypeIdouble:
    p

TypeIreal:
    j

TypeCfloat:
    q

TypeCdouble:
    r

TypeCreal:
    c

TypeBool:
    b

TypeChar:
    a

TypeWchar:
    u

TypeDchar:
    w

TypeNoreturn:
    Nn

TypeNull:
    n

TypeTuple:
    B Parameters Z

Function Calling Conventions

The extern (C) and extern (D) calling conventions match the C calling convention used by the supported C compiler on the host system, except for the extern (D) calling convention on Windows x86, which is described here.

Register Conventions

EAX, ECX, EDX are scratch registers and can be destroyed by a function.
EBX, ESI, EDI, EBP must be preserved across function calls.
EFLAGS is assumed destroyed across function calls, except for the direction flag which must be forward.
The FPU stack must be empty when calling a function.
The FPU control word must be preserved across function calls.
Floating point return values are returned on the FPU stack. These must be cleaned off by the caller, even if they are not used.

Return Value

The types bool, byte, ubyte, short, ushort, int, uint, pointer, Object, and interfaces are returned in EAX.
long and ulong are returned in EDX,EAX, where EDX gets the most significant half.
float, double, real, ifloat, idouble, ireal are returned in ST0.
cfloat, cdouble, creal are returned in ST1,ST0 where ST1 is the real part and ST0 is the imaginary part.
Dynamic arrays are returned with the pointer in EDX and the length in EAX.
Associative arrays are returned in EAX.
References are returned as pointers in EAX.
Delegates are returned with the pointer to the function in EDX and the context pointer in EAX.
1, 2 and 4 byte structs and static arrays are returned in EAX.
8 byte structs and static arrays are returned in EDX,EAX, where EDX gets the most significant half.
For other sized structs and static arrays, the return value is stored through a hidden pointer passed as an argument to the function.
Constructors return the this pointer in EAX.
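
The size-based rules for structs and static arrays can be restated as a small lookup. This is only a transcription of the list above for Windows x86; ReturnIn and aggregateReturn are invented names, and nothing here queries the compiler.

```d
/// Where a struct or static array return value travels (Windows x86, extern (D)).
enum ReturnIn { eax, edxEax, hiddenPointer }

/// Classifies an aggregate by its size in bytes, following the rules above.
ReturnIn aggregateReturn(size_t size) pure nothrow @nogc @safe
{
    switch (size)
    {
        case 1, 2, 4: return ReturnIn.eax;
        case 8:       return ReturnIn.edxEax;
        default:      return ReturnIn.hiddenPointer;
    }
}

struct Rgb  { ubyte r, g, b; }   // 3 bytes
struct Pair { int a, b; }        // 8 bytes

void main()
{
    assert(aggregateReturn((ubyte[4]).sizeof) == ReturnIn.eax);
    assert(aggregateReturn(Pair.sizeof) == ReturnIn.edxEax);
    // A 3 byte struct is none of 1, 2, 4 or 8 bytes, so it goes through
    // the hidden pointer.
    assert(aggregateReturn(Rgb.sizeof) == ReturnIn.hiddenPointer);
}
```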

Parameters

The parameters to the non-variadic function:

foo(a1, a2, ..., an);

are passed as follows:

a1
a2
...
an
hidden
this

where hidden is present if needed to return a struct value, and this is present if needed as the this pointer for a member function or the context pointer for a nested function.

The last parameter is passed in EAX rather than being pushed on the stack if the following conditions are met:

It fits in EAX.
It is not a 3 byte struct.
It is not a floating point type.

Parameters are always pushed as multiples of 4 bytes, rounding upwards, so the stack is always aligned on 4 byte boundaries. They are pushed most significant first. out and ref are passed as pointers. Static arrays are passed as pointers to their first element. On Windows, a real is pushed as a 10 byte quantity, a creal is pushed as a 20 byte quantity. On Linux, a real is pushed as a 12 byte quantity, a creal is pushed as two 12 byte quantities. The extra two bytes of pad occupy the ‘most significant’ position.

The callee cleans the stack.

The parameters to the variadic function:

void foo(int p1, int p2, int[] p3...)
foo(a1, a2, ..., an);

are passed as follows:

The variadic part is converted to a dynamic array and the rest is the same as for non-variadic functions.

The parameters to the variadic function:

void foo(int p1, int p2, ...)
foo(a1, a2, a3, ..., an);

are passed as follows:

...
_arguments
hidden
this

The caller is expected to clean the stack. _argptr is not passed; it is computed by the callee.

Exception Handling

Windows 32 bit

Conforms to the Microsoft Windows Structured Exception Handling conventions.

Linux, FreeBSD and OS X

Conforms to the DWARF (debugging with attributed record formats) Exception Handling conventions.

Windows 64 bit

Uses static address range/handler tables. It is not compatible with the MSVC x64 exception handling tables. The stack is walked assuming it uses the EBP/RBP stack frame convention. The EBP/RBP convention must be used for every function that has an associated EH (Exception Handler) table.

For each function that has exception handlers, an EH table entry is generated.

EH Table Entry
field	description
void*	pointer to start of function
DHandlerTable*	pointer to corresponding EH data
uint	size in bytes of the function
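
As a rough sketch of how such an address range table might be searched, the entry can be modelled as a struct and scanned for the function containing a given address. The field names and the findEntry helper are assumptions made for this example, not the names used by the runtime.

```d
struct DHandlerTable;   // opaque for the purposes of this sketch

/// One EH table entry, mirroring the three fields in the table above.
/// (Field names are hypothetical.)
struct EhTableEntry
{
    void*          fptr;           // pointer to start of function
    DHandlerTable* handlerTable;   // pointer to corresponding EH data
    uint           fsize;          // size in bytes of the function
}

/// Returns the entry whose function body contains `address`, or null.
const(EhTableEntry)* findEntry(const(EhTableEntry)[] table, const(void)* address)
{
    const addr = cast(size_t) address;
    foreach (ref entry; table)
    {
        const start = cast(size_t) entry.fptr;
        if (addr >= start && addr - start < entry.fsize)
            return &entry;
    }
    return null;
}

void main()
{
    // Made-up addresses, used only to exercise the range check.
    EhTableEntry[2] table = [
        EhTableEntry(cast(void*) 0x1000, null, 0x40),
        EhTableEntry(cast(void*) 0x2000, null, 0x10),
    ];

    assert(findEntry(table[], cast(void*) 0x1010) is &table[0]);
    assert(findEntry(table[], cast(void*) 0x2000) is &table[1]);
    assert(findEntry(table[], cast(void*) 0x2010) is null);   // one past the end
}
```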

The EH table entries are placed into the following special segments, which are concatenated by the linker.

EH Table Segment
Operating System	Segment Name
Win32	FI
Win64	._deh$B
Linux	.deh_eh
FreeBSD	.deh_eh
OS X	__deh_eh, __DATA

The rest of the EH data can be placed anywhere; it is immutable.

DHandlerTable
field	description
void*	pointer to start of function
uint	offset of ESP/RSP from EBP/RBP
uint	offset from start of function to return code
uint	number of entries in DHandlerInfo[]
DHandlerInfo[]	array of handler information

DHandlerInfo
field	description
uint	offset from function address to start of guarded section
uint	offset of end of guarded section
int	previous table index
uint	if != 0 offset to DCatchInfo data from start of table
void*	if not null, pointer to finally code to execute

DCatchInfo
field	description
uint	number of entries in DCatchBlock[]
DCatchBlock[]	array of catch information

DCatchBlock
field	description
ClassInfo	catch type
uint	offset from EBP/RBP to catch variable
void*	catch handler code

Garbage Collection

The interface to this is found in Druntime's core/gc/gcinterface.d.

ModuleInfo Instance

An instance of ModuleInfo is generated by the compiler and inserted into the object file for every module. ModuleInfo contains information about the module that is useful to the D runtime library:

If the module has a static constructor, static destructor, shared static constructor, or shared static destructor.
A reference to any unit tests defined by the module.
An array of references to any imported modules that have one or more of:
1. static constructors
2. static destructors
3. shared static constructors
4. shared static destructors
5. unit tests
6. transitive imports of any module that contains one or more of 1..5
7. order independent constructors (currently needed for implementing -cov)
This enables the runtime to run the unit tests, the module constructors in a depth-first order, and the module destructors in the reverse order.
An array of references to ClassInfo for each class defined in the module.
Note: this feature may be removed.

ModuleInfo is defined in Druntime's object.d, which must match the compiler's output in both the values of flags and layout of fields.

Modules compiled with -betterC do not have a ModuleInfo instance generated, because such modules must work without the D runtime library. Similarly, ImportC modules do not generate a ModuleInfo.

Module Initialization and Termination

All the static constructors for a module are aggregated into a single function, and a pointer to that function is inserted into the ctor member of the ModuleInfo instance for that module.

All the static destructors for a module are aggregated into a single function, and a pointer to that function is inserted into the dtor member of the ModuleInfo instance for that module.

Unit Testing

All the unit tests for a module are aggregated into a single function, and a pointer to that function is inserted into the unitTest member of the ModuleInfo instance for that module.
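
The ModuleInfo instances can be enumerated at run time through Druntime's object.ModuleInfo. The following is a sketch; ctorRan and countModules are invented for the example, and the shared static constructor is there so that this module certainly needs a ModuleInfo. The unitTest pointer is only populated when compiling with -unittest.

```d
import std.stdio : writeln;

__gshared bool ctorRan;

// A shared static constructor; the runtime calls it before main().
shared static this()
{
    ctorRan = true;
}

unittest
{
    assert(1 + 1 == 2);
}

/// Counts the ModuleInfo instances visible to the runtime.
/// (Hypothetical helper, for illustration only.)
size_t countModules()
{
    size_t n;
    foreach (m; ModuleInfo)
    {
        if (m !is null)
            ++n;
    }
    return n;
}

void main()
{
    assert(ctorRan);                // module initialization ran first
    assert(countModules() > 0);

    foreach (m; ModuleInfo)
    {
        if (m is null)
            continue;
        // Report which aggregated functions each module provides.
        writeln(m.name,
                " ctor=", m.ctor !is null,
                " dtor=", m.dtor !is null,
                " unitTest=", m.unitTest !is null);
    }
}
```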

Runtime Helper Functions

These are found in Druntime's rt/.

Symbolic Debugging

D has types that are not represented in existing C or C++ debuggers. These are dynamic arrays, associative arrays, and delegates. Representing these types as structs causes problems because function calling conventions for structs often differ from those for these types, which causes C/C++ debuggers to misrepresent things. For these debuggers, each such type is represented as a C type that does match the calling conventions for the type.

Types for C Debuggers
D type	C representation
dynamic array	unsigned long long
associative array	void*
delegate	long long
dchar	unsigned long

For debuggers that can be modified to accept new types, the following extensions help them fully support the types.

Codeview Debugger Extensions

The D dchar type is represented by the special primitive type 0x78.

D makes use of the Codeview OEM generic type record indicated by LF_OEM (0x0015). The format is:

Codeview OEM Extensions for D
D Type	Leaf Index	OEM Identifier	recOEM	num indices	type index	type index
field size	2	2	2	2	2	2
dynamic array	LF_OEM	OEM	1	2	@index	@element
associative array	LF_OEM	OEM	2	2	@key	@element
delegate	LF_OEM	OEM	3	2	@this	@function

where:


OEM	0x42
index	type index of array index
key	type index of key
element	type index of array element
this	type index of context pointer
function	type index of function

These extensions can be pretty-printed by obj2asm. The Ddbg debugger supports them.