Implementation

Be sure to review Microsoft Learn: Library Internals.

Compiler conformance

For Visual C++, the projects use the default C++11/C++14 mode rather than /std:c++17 mode. The library does not use newer C++17 language and library features such as string_view or static_assert without a message, although that may change in the future. The projects use /Wall, /permissive-, /Zc:__cplusplus, and /analyze to ensure a high level of C++ conformance.

For clang/LLVM for Windows, a CMakeLists.txt is provided to validate the code and ensure a high level of conformance. This primarily means addressing warnings generated using /Wall -Wpedantic -Wextra.

Language extensions

DirectXMath is written using standard Intel-style intrinsics, which should be portable to other compilers. The ARM and ARM64 codepaths use ARM-style intrinsics (earlier versions of the library used Visual C++ specific __n64 and __n128), so these are also portable.

The DirectXMath library makes use of two commonly implemented extensions to Standard C++:

anonymous structs, which are widely supported and are part of the C11 standard. Note that the library also uses anonymous unions, but these are part of standard C++ and of C11.
#pragma once rather than old-style #define-based include guards. It is not part of the standard but is widely supported.

Because of these extensions, DirectXMath is not compatible with Visual C++'s /Za switch, which enforces ISO C89 / C++11. It does work with /permissive-.
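To make the anonymous struct extension concrete, here is a minimal sketch. Float4Sketch and SumComponents are hypothetical names used only for illustration and are not the library's own definitions. Compilers in strict pedantic mode may warn about the anonymous struct, which is the behavior the text above describes.

```cpp
#include <cassert>

// Hypothetical type for illustration only; not the library's own definition.
struct Float4Sketch
{
    union                     // anonymous union: standard C++
    {
        struct                // anonymous struct: the compiler extension
        {
            float x;
            float y;
            float z;
            float w;
        };
        float f[4];           // indexed view of the same storage
    };
};

static_assert(sizeof(Float4Sketch) == 4 * sizeof(float),
              "named and indexed views should overlap exactly");

// Members of the anonymous struct are reachable directly, with no
// intermediate member name in between.
inline float SumComponents(const Float4Sketch& v) noexcept
{
    return v.x + v.y + v.z + v.w;
}
```

In a header, the same file would normally start with #pragma once instead of a #define-based guard.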

Naming conventions

PascalCase for class names, methods, functions, and enums.
camelCase for class member variables, struct members
UPPERCASE for preprocessor defines (and nameless enums)

The library does not generally make use of Hungarian notation, which has been deprecated for Win32 C++ APIs for many years, with the exception of a few uses of p for pointers and sz for strings.
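The casing rules above could look like this in practice. Every identifier in this sketch (SKETCH_MAX_SAMPLES, BlendMode, SampleBuffer, and so on) is made up for illustration and does not exist in DirectXMath.

```cpp
#include <cassert>
#include <cstddef>

// All names here are hypothetical and exist only to show the casing rules.
#define SKETCH_MAX_SAMPLES 16u                 // UPPERCASE preprocessor define

enum { SKETCH_FLAG_NONE = 0, SKETCH_FLAG_CLAMP = 1 };   // nameless enum: UPPERCASE

enum class BlendMode { Additive, Multiply };   // PascalCase enum

class SampleBuffer                             // PascalCase class
{
public:
    // PascalCase method; silently ignores samples beyond the capacity.
    void AddSample(float value) noexcept
    {
        if (sampleCount < SKETCH_MAX_SAMPLES)
            samples[sampleCount++] = value;
    }

    std::size_t GetCount() const noexcept { return sampleCount; }

private:
    float       samples[SKETCH_MAX_SAMPLES] = {};   // camelCase members
    std::size_t sampleCount = 0;
};
```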

Type usage

The use of Standard C++ types is preferred, including the fundamental types supplied by the language (e.g. int, unsigned int, size_t, ptrdiff_t, bool, true/false, char, wchar_t) with the addition of the C99 fixed-width types (e.g. uint32_t, uint64_t, intptr_t, uintptr_t, etc.).

Avoid using Windows "portability" types except when dealing directly with Win32 APIs: VOID, UINT, INT, DWORD, FLOAT, BOOL, TRUE/FALSE, WCHAR, CONST, etc.
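As a small illustration of this guidance, the following sketch uses only language and C99 fixed-width types where Win32-style code might have used DWORD, BYTE, or INT. PackRGBA and ElementDistance are hypothetical helpers, not library functions.

```cpp
#include <cassert>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: packs four 8-bit channels into one 32-bit value,
// using uint32_t/uint8_t rather than DWORD/BYTE.
inline uint32_t PackRGBA(uint8_t r, uint8_t g, uint8_t b, uint8_t a) noexcept
{
    return static_cast<uint32_t>(r)
        | (static_cast<uint32_t>(g) << 8)
        | (static_cast<uint32_t>(b) << 16)
        | (static_cast<uint32_t>(a) << 24);
}

// Hypothetical helper: element distance between two pointers into the same
// array, returned as ptrdiff_t rather than INT or LONG.
inline ptrdiff_t ElementDistance(const float* first, const float* last) noexcept
{
    return last - first;
}
```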

Error reporting

As a low-level math library, DirectXMath does not make use of C++ exception handling or HRESULT COM-style error values. Generally, parameter validation is limited to assert macros. All functions should be annotated with noexcept.
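A function written in this style might look like the sketch below: validation happens only through assert, nothing is thrown, and no HRESULT is returned. SumFloats is a hypothetical function used only to show the pattern.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical function written in the style the text describes.
inline float SumFloats(const float* pValues, std::size_t count) noexcept
{
    assert(pValues != nullptr);   // debug-only parameter validation

    float total = 0.0f;
    for (std::size_t i = 0; i < count; ++i)
        total += pValues[i];
    return total;
}

// The noexcept operator lets the annotation be checked at compile time.
static_assert(noexcept(SumFloats(nullptr, 0)), "math helpers should be noexcept");
```

Because assert compiles away when NDEBUG is defined, release builds pay nothing for the check.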

SAL annotation

The DirectXMath library makes extensive use of SAL2 annotations (_In_, _Outptr_opt_, etc.), which greatly improves the accuracy of the Visual C++ static code analysis (also known as PREFAST). The standard Windows headers #define them all to empty strings if not building with /analyze, so they have no effect on code generation.
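The sketch below shows what an annotated function can look like. MaxOf is a hypothetical function. The empty fallback definition for toolchains without sal.h is an assumption made for this example so that it builds outside Visual C++; it is not how the library itself handles the header.

```cpp
#include <cassert>
#include <cstddef>

#if defined(_MSC_VER)
#include <sal.h>                  // real SAL definitions on Visual C++
#else
// Assumption for this sketch: stub the annotation out on other toolchains.
#ifndef _In_reads_
#define _In_reads_(count)
#endif
#endif

// Hypothetical function: _In_reads_(count) tells the analyzer that pValues
// points to at least 'count' readable elements.
inline float MaxOf(_In_reads_(count) const float* pValues, std::size_t count) noexcept
{
    assert(pValues != nullptr && count > 0);

    float best = pValues[0];
    for (std::size_t i = 1; i < count; ++i)
    {
        if (pValues[i] > best)
            best = pValues[i];
    }
    return best;
}
```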

Calling-conventions

One of the more complicated aspects of DirectXMath's implementation is supporting the various calling conventions optimally for SIMD, which differ per architecture. This is detailed on Microsoft Learn.

128-bit SIMD

XMVECTOR XM_CALLCONV XMVectorHermite(FXMVECTOR Position0, FXMVECTOR Tangent0, FXMVECTOR Position1, GXMVECTOR Tangent1, float t) noexcept;

XMVECTOR is the standard 128-bit SIMD register type, and we return it by value.
XM_CALLCONV is set to __vectorcall where supported and to __fastcall otherwise, unless the target compiler does not support __fastcall either.
FXMVECTOR is used for the first three SIMD parameters to support SIMD-passing behavior for __fastcall.
GXMVECTOR is used for the fourth SIMD parameter to support __vectorcall and the ARM ABI passing of the first four SIMD registers.
HXMVECTOR is used for the fifth and sixth SIMD parameters to support __vectorcall.
CXMVECTOR is used for all remaining SIMD parameters, which are passed by 'const ref'.

In configurations where the platform cannot pass six SIMD values in registers, the affected types are equivalent to CXMVECTOR.
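The alias scheme can be sketched as a set of typedefs that switch between pass-by-value and pass-by-const-reference. This is a simplified model under stated assumptions, not the library's header: Vec4Sketch, the *VecSketch aliases, and SKETCH_PASS_IN_REGISTERS are hypothetical, a single switch stands in for the per-architecture rules, and a plain struct stands in for the SIMD register type.

```cpp
#include <cassert>

// Hypothetical stand-in for the 128-bit SIMD type.
struct Vec4Sketch { float f[4]; };

// Assumption: one switch models "the platform passes SIMD values in registers".
#ifndef SKETCH_PASS_IN_REGISTERS
#define SKETCH_PASS_IN_REGISTERS 1
#endif

#if SKETCH_PASS_IN_REGISTERS
typedef const Vec4Sketch  FVecSketch;   // parameters 1-3: by value
typedef const Vec4Sketch  GVecSketch;   // parameter 4: by value
typedef const Vec4Sketch  HVecSketch;   // parameters 5-6: by value
#else
typedef const Vec4Sketch& FVecSketch;   // fall back to const reference
typedef const Vec4Sketch& GVecSketch;
typedef const Vec4Sketch& HVecSketch;
#endif
typedef const Vec4Sketch& CVecSketch;   // parameter 7 and later: always const ref

// The alias used depends only on the parameter's position among the vector
// arguments, so the function body is the same in either configuration.
inline Vec4Sketch AddSeven(FVecSketch v0, FVecSketch v1, FVecSketch v2,
                           GVecSketch v3, HVecSketch v4, HVecSketch v5,
                           CVecSketch v6) noexcept
{
    Vec4Sketch r;
    for (int i = 0; i < 4; ++i)
        r.f[i] = v0.f[i] + v1.f[i] + v2.f[i] + v3.f[i] + v4.f[i] + v5.f[i] + v6.f[i];
    return r;
}
```

Callers are unaffected by the switch, because a by-value parameter and a const reference parameter accept the same arguments.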

4x4 Matrix

XMVECTOR XM_CALLCONV XMVector3Project(FXMVECTOR V, float ViewportX, float ViewportY, float ViewportWidth, float ViewportHeight, float ViewportMinZ, float ViewportMaxZ, FXMMATRIX Projection, CXMMATRIX View, CXMMATRIX World) noexcept;

Because of homogeneous vector aggregates (HVA), a matrix that consists of 4 SIMD values can be passed as if it were 4 individual SIMD values.

FXMMATRIX is generally used if there are 0, 1, or 2 XMVECTOR parameters preceding the matrix.
CXMMATRIX is used for all other matrix parameters, which are passed by 'const ref'.
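The same idea applied to matrices can be sketched as follows. All names (Vec4Sketch, Mat4Sketch, FMatSketch, CMatSketch, ScaleSketch, TransformSketch, TransformTwice) are hypothetical, plain structs stand in for the SIMD types, and the sketch assumes a configuration in which the first matrix is passed by value.

```cpp
#include <cassert>

// Hypothetical stand-ins: a matrix is an aggregate of four vector rows.
struct Vec4Sketch { float f[4]; };
struct Mat4Sketch { Vec4Sketch r[4]; };

typedef const Vec4Sketch  FVecSketch;   // leading vector: by value
typedef const Mat4Sketch  FMatSketch;   // first matrix: by value (assumed)
typedef const Mat4Sketch& CMatSketch;   // later matrices: const reference

// Builds a matrix that scales x, y, z by s and leaves w unchanged.
inline Mat4Sketch ScaleSketch(float s) noexcept
{
    Mat4Sketch m = { { { { s, 0.0f, 0.0f, 0.0f } },
                       { { 0.0f, s, 0.0f, 0.0f } },
                       { { 0.0f, 0.0f, s, 0.0f } },
                       { { 0.0f, 0.0f, 0.0f, 1.0f } } } };
    return m;
}

// Row vector times matrix.
inline Vec4Sketch TransformSketch(FVecSketch v, CMatSketch m) noexcept
{
    Vec4Sketch out = { { 0.0f, 0.0f, 0.0f, 0.0f } };
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            out.f[col] += v.f[row] * m.r[row].f[col];
    return out;
}

// One vector precedes the first matrix, so it takes the F alias; the second
// matrix takes the C alias.
inline Vec4Sketch TransformTwice(FVecSketch v, FMatSketch first, CMatSketch second) noexcept
{
    return TransformSketch(TransformSketch(v, first), second);
}
```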

Compiler directives

DirectXMath makes use of many preprocessor defines to target different instruction sets and architectures.

A full table of defines can be found on Microsoft Learn.

inline XMVECTOR XM_CALLCONV XMVectorRound(FXMVECTOR V) noexcept
{
#if defined(_XM_NO_INTRINSICS_)

    XMVECTORF32 Result = { { {
            MathInternal::round_to_nearest(V.vector4_f32[0]),
            MathInternal::round_to_nearest(V.vector4_f32[1]),
            MathInternal::round_to_nearest(V.vector4_f32[2]),
            MathInternal::round_to_nearest(V.vector4_f32[3])
        } } };
    return Result.v;

#elif defined(_XM_ARM_NEON_INTRINSICS_)
#if defined(_M_ARM64) || defined(_M_HYBRID_X86_ARM64) || defined(_M_ARM64EC) || __aarch64__

    // ARM-NEON v8 implementation

#else

    // ARM-NEON v7 implementation

#endif
#elif defined(_XM_SSE4_INTRINSICS_)

    // SSE 4.1 implementation

#elif defined(_XM_SSE_INTRINSICS_)

    // SSE/SSE2 implementation (the minimum required for x86/x64)

#endif
}
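The pattern above, one function body with a code path per instruction set, can be tried outside the library with a reduced sketch. RoundFour and SKETCH_SSE4 are hypothetical; the sketch keys off compiler-provided macros such as __SSE4_1__ rather than the library's _XM_*_INTRINSICS_ defines, and it covers only an SSE4.1 path and a scalar fallback.

```cpp
#include <cassert>
#include <cmath>

// Assumption: detect SSE4.1 from compiler macros instead of the library's defines.
#if defined(__SSE4_1__) || (defined(_MSC_VER) && defined(__AVX__))
#include <smmintrin.h>
#define SKETCH_SSE4 1
#endif

// Rounds four floats to the nearest integer value.
inline void RoundFour(const float* pSource, float* pDest) noexcept
{
#if defined(SKETCH_SSE4)

    // SSE4.1 path: a single rounding instruction for all four lanes
    __m128 v = _mm_loadu_ps(pSource);
    v = _mm_round_ps(v, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
    _mm_storeu_ps(pDest, v);

#else

    // Scalar fallback: nearbyint honors the current rounding mode, which is
    // round-to-nearest by default
    for (int i = 0; i < 4; ++i)
        pDest[i] = std::nearbyint(pSource[i]);

#endif
}
```

Both paths are meant to produce the same values, so callers do not need to know which one was compiled in.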

Instruction Set Usage

See this blog series for more details on how each is applied to DirectXMath:

Implementation macros

XM_ALIGNED_DATA is used to declare aligned data variables.
XM_ALIGNED_STRUCT is used to declare an aligned struct.
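A rough model of how such macros are used is shown below. SKETCH_ALIGNED_DATA and SKETCH_ALIGNED_STRUCT are hypothetical macros built on standard alignas for portability; the library's macros may expand to compiler-specific attributes instead. Float4ASketch, g_sketchRow, and IsAligned16 are likewise made up for this example.

```cpp
#include <cassert>
#include <stdint.h>

// Hypothetical macros; assumed here to map onto standard alignas.
#define SKETCH_ALIGNED_DATA(x)   alignas(x)
#define SKETCH_ALIGNED_STRUCT(x) struct alignas(x)

// An aligned struct declaration.
SKETCH_ALIGNED_STRUCT(16) Float4ASketch
{
    float x, y, z, w;
};
static_assert(alignof(Float4ASketch) == 16, "struct should be 16-byte aligned");

// An aligned data variable.
SKETCH_ALIGNED_DATA(16) const float g_sketchRow[4] = { 1.0f, 0.0f, 0.0f, 0.0f };

// Reports whether an address is a multiple of 16 bytes.
inline bool IsAligned16(const void* p) noexcept
{
    return (reinterpret_cast<uintptr_t>(p) & 0xF) == 0;
}
```

Sixteen-byte alignment matters because aligned SIMD loads and stores require it.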

x86/x64

XM_STREAM_PS, XM256_STREAM_PS, and XM_SFENCE are controlled by the _XM_NO_MOVNT_ define.
XM_PERMUTE_PS is _mm_permute_ps when building for AVX and _mm_shuffle_ps when building for SSE/SSE2.
XM_FMADD_PS and XM_FNMADD_PS depend on whether FMA3 is in use.
XM_LOADU_SI16 is a fix-up for older versions of GNUC which were missing _mm_loadu_si16.
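To show how a macro of this kind can hide an instruction-set difference, here is a sketch of a fused multiply-add wrapper. SKETCH_FMADD_PS, SKETCH_HAS_SSE, and MultiplyAdd4 are hypothetical, and the detection logic based on __FMA__ and __AVX2__ is an assumption of this example rather than the library's own configuration. A scalar path keeps the sketch buildable on non-x86 targets.

```cpp
#include <cassert>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <immintrin.h>
#define SKETCH_HAS_SSE 1

// Assumption: pick the fused form only when the compiler targets FMA3.
#if defined(__FMA__) || (defined(_MSC_VER) && !defined(__clang__) && defined(__AVX2__))
#define SKETCH_FMADD_PS(a, b, c) _mm_fmadd_ps((a), (b), (c))
#else
#define SKETCH_FMADD_PS(a, b, c) _mm_add_ps(_mm_mul_ps((a), (b)), (c))
#endif
#endif

// Computes pOut[i] = pA[i] * pB[i] + pC[i] for four lanes.
inline void MultiplyAdd4(const float* pA, const float* pB, const float* pC, float* pOut) noexcept
{
#if defined(SKETCH_HAS_SSE)
    const __m128 r = SKETCH_FMADD_PS(_mm_loadu_ps(pA), _mm_loadu_ps(pB), _mm_loadu_ps(pC));
    _mm_storeu_ps(pOut, r);
#else
    for (int i = 0; i < 4; ++i)
        pOut[i] = pA[i] * pB[i] + pC[i];
#endif
}
```

The fused form rounds once instead of twice, so results can differ in the last bit from the separate multiply and add for inputs that are not exactly representable.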

ARM/ARM64

XM_PREFETCH is __prefetch or __builtin_prefetch for ARM/ARM64.
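A prefetch macro along these lines can be sketched as follows. SKETCH_PREFETCH and SumWithPrefetch are hypothetical. Unlike the library macro described above, this sketch also uses __builtin_prefetch on non-ARM GCC and Clang targets and falls back to a no-op elsewhere, so that it builds anywhere; the lookahead of 16 elements is an arbitrary choice.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_PREFETCH(p) __builtin_prefetch(p)
#elif defined(_MSC_VER) && (defined(_M_ARM) || defined(_M_ARM64))
#include <intrin.h>
#define SKETCH_PREFETCH(p) __prefetch(p)
#else
#define SKETCH_PREFETCH(p) ((void)(p))   // no-op where no prefetch is assumed
#endif

// Sums an array while hinting that data a few elements ahead will be needed.
inline float SumWithPrefetch(const float* pValues, std::size_t count) noexcept
{
    const std::size_t lookahead = 16;
    float total = 0.0f;
    for (std::size_t i = 0; i < count; ++i)
    {
        if (i + lookahead < count)
            SKETCH_PREFETCH(pValues + i + lookahead);
        total += pValues[i];
    }
    return total;
}
```

A prefetch is only a hint, so the result is the same whether or not the hardware acts on it.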