Adding Microsoft SECURITY.MD by microsoft-github-policy-service[bot] · Pull Request #2 · microsoft/mssql-python

@microsoft-github-policy-service bot mentioned this pull request

Jan 24, 2025

bewithgaurav added a commit that referenced this pull request

Nov 10, 2025
- Replace pybind11 row[col-1] = value with PyLong_FromLong + PyList_SET_ITEM
- Applies to SQL_INTEGER, SQL_SMALLINT, SQL_BIGINT
- Eliminates pybind11 wrapper overhead, bounds checking, and extra reference counting
- Expected improvement: ~40-100ms for integer-heavy result sets (1.2M rows)
- Added PERF_TIMER for int_c_api_assign, smallint_c_api_assign, bigint_c_api_assign
- PyList_SET_ITEM steals the new reference returned by PyLong_FromLong (no Py_INCREF needed)

bewithgaurav added a commit that referenced this pull request

Nov 10, 2025
…ypes

- Replaced pybind11 wrappers with direct Python C API calls
- SQL_INTEGER, SQL_SMALLINT, SQL_BIGINT: PyLong_FromLong/PyLong_FromLongLong
- SQL_TINYINT, SQL_BIT: PyLong_FromLong/PyBool_FromLong
- SQL_REAL, SQL_DOUBLE, SQL_FLOAT: PyFloat_FromDouble
- Uses PyList_SET_ITEM macro for direct list assignment (no bounds checking)
- Eliminates pybind11 wrapper overhead for simple numeric types
- Added PERF_TIMER instrumentation for each numeric type conversion

bewithgaurav added a commit that referenced this pull request

Nov 10, 2025
- Created typedef ColumnProcessor for function pointer type
- Added ColumnProcessors namespace with specialized inline processors:
  * ProcessInteger, ProcessSmallInt, ProcessBigInt, ProcessTinyInt, ProcessBit
  * ProcessReal, ProcessDouble
  * ProcessChar, ProcessWChar, ProcessBinary (handle LOBs, NULL, zero-length)
- Added ColumnInfoExt struct to pass metadata efficiently
- Build columnProcessors array once during cache_column_metadata
- Fast path: Direct function call via columnProcessors[col-1] (no switch)
- Slow path: Fallback switch for complex types (DECIMAL, DATETIME, GUID)
- Reduces switch evaluations from O(rows × columns) to O(columns)
- All processors use direct Python C API from OPT #1 and OPT #2
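
The dispatch scheme described above can be sketched roughly as follows. This is a minimal illustration, not the driver's actual code: `SqlType`, `ColumnBuffer`, `BuildProcessors`, and `ConvertRows` are hypothetical names, and each cell is rendered to a `std::string` instead of a Python object so the example compiles without Python headers. Only `ColumnProcessor`, the `ColumnProcessors` namespace, and the processor names come from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for ODBC SQL type codes.
enum class SqlType { Integer, BigInt, Double, Decimal };

// Hypothetical column buffer: one fetched value per row.
struct ColumnBuffer {
    std::vector<std::int64_t> ints;
    std::vector<double> reals;
};

// Function pointer type: convert the cell at `row` of `col` into `out`.
typedef void (*ColumnProcessor)(const ColumnBuffer& col, std::size_t row,
                                std::string& out);

namespace ColumnProcessors {
inline void ProcessInteger(const ColumnBuffer& col, std::size_t row,
                           std::string& out) {
    out = std::to_string(col.ints[row]);
}
inline void ProcessDouble(const ColumnBuffer& col, std::size_t row,
                          std::string& out) {
    out = std::to_string(col.reals[row]);
}
}  // namespace ColumnProcessors

// Runs once per result set: the type switch is evaluated O(columns) times.
std::vector<ColumnProcessor> BuildProcessors(const std::vector<SqlType>& types) {
    std::vector<ColumnProcessor> processors;
    processors.reserve(types.size());
    for (SqlType t : types) {
        switch (t) {
            case SqlType::Integer:
            case SqlType::BigInt:
                processors.push_back(&ColumnProcessors::ProcessInteger);
                break;
            case SqlType::Double:
                processors.push_back(&ColumnProcessors::ProcessDouble);
                break;
            default:
                processors.push_back(nullptr);  // complex type: slow path
                break;
        }
    }
    return processors;
}

// Hot loop: fast-path cells cost one indirect call and no switch.
std::vector<std::vector<std::string>> ConvertRows(
        const std::vector<SqlType>& types,
        const std::vector<ColumnBuffer>& cols,
        std::size_t numRows) {
    assert(types.size() == cols.size());
    const std::vector<ColumnProcessor> processors = BuildProcessors(types);
    std::vector<std::vector<std::string>> rows(
        numRows, std::vector<std::string>(cols.size()));
    for (std::size_t r = 0; r < numRows; ++r) {
        for (std::size_t c = 0; c < cols.size(); ++c) {
            if (processors[c] != nullptr) {
                processors[c](cols[c], r, rows[r][c]);  // fast path
            } else {
                rows[r][c] = "<slow path>";  // placeholder for the fallback switch
            }
        }
    }
    return rows;
}
```

A null entry in the table plays the role of the fallback switch here. The type decision is made once per column in `BuildProcessors` instead of once per cell.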

bewithgaurav added a commit that referenced this pull request

Nov 10, 2025
Problem:
- All numeric conversions used pybind11 wrappers with overhead:
  * Type detection, wrapper object creation, bounds checking
  * ~20-40 CPU cycles of overhead per cell

Solution:
- Use direct Python C API calls:
  * PyLong_FromLong/PyLong_FromLongLong for integers
  * PyFloat_FromDouble for floats
  * PyBool_FromLong for booleans
  * PyList_SET_ITEM macro (no bounds check - list pre-sized)

Changes:
- SQL_INTEGER, SQL_SMALLINT, SQL_BIGINT, SQL_TINYINT → PyLong_*
- SQL_BIT → PyBool_FromLong
- SQL_REAL, SQL_DOUBLE, SQL_FLOAT → PyFloat_FromDouble
- Added explicit NULL handling for each type

Impact:
- Eliminates pybind11 wrapper overhead for simple numeric types
- Direct array access via PyList_SET_ITEM macro
- Affects 8 common numeric SQL types (INTEGER, SMALLINT, BIGINT, TINYINT, BIT, REAL, DOUBLE, FLOAT)

bewithgaurav added a commit that referenced this pull request

Nov 10, 2025
Problem:
--------
Row creation and assignment had multiple layers of overhead:
1. Per-row allocation: py::list(numCols) creates pybind11 wrapper for each row
2. Cell assignment: row[col-1] = value uses pybind11 operator[] with bounds checking
3. Final assignment: rows[i] = row uses pybind11 list assignment with refcount overhead
4. Fragmented allocation: 1,000 separate py::list() calls instead of batch allocation

For 1,000 rows: ~30-50 CPU cycles per row × 1,000 rows = 30K-50K wasted cycles

Solution:
---------
Replace pybind11 wrappers with direct Python C API throughout:

1. Row creation: PyList_New(numCols) instead of py::list(numCols)
2. Cell assignment: PyList_SET_ITEM(row, col-1, value) instead of row[col-1] = value
3. Final assignment: PyList_SET_ITEM(rows.ptr(), i, row) instead of rows[i] = row

This completes the transition to direct Python C API started in OPT #2.

Changes:
--------
- Replaced py::list row(numCols) → PyObject* row = PyList_New(numCols)
- Updated all NULL/SQL_NO_TOTAL handlers to use PyList_SET_ITEM
- Updated all zero-length data handlers to use direct Python C API
- Updated string handlers (SQL_CHAR, SQL_WCHAR) to use PyList_SET_ITEM
- Updated complex type handlers (DECIMAL, DATETIME, DATE, TIME, TIMESTAMPOFFSET, GUID, BINARY) to use PyList_SET_ITEM
- Updated final row assignment to use PyList_SET_ITEM(rows.ptr(), i, row)

All cell assignments now use direct Python C API:
- Numeric types: Already done in OPT #2 (PyLong_FromLong, PyFloat_FromDouble, etc.)
- Strings: PyUnicode_FromStringAndSize, PyUnicode_FromString
- Binary: PyBytes_FromStringAndSize
- Complex types: .release().ptr() to transfer ownership

Impact:
-------
- ✅ Eliminates pybind11 wrapper overhead for row creation
- ✅ No bounds checking in hot loop (PyList_SET_ITEM is a macro)
- ✅ Clean reference counting (each conversion returns a new reference, which PyList_SET_ITEM transfers to the list)
- ✅ Consistent with OPT #2 (entire row/cell management via Python C API)
- ✅ Expected 5-10% improvement (smaller than OPT #3, but completes the stack)

All type handlers now bypass pybind11 for maximum performance.