FEAT: Comprehensive Logging Framework with Python-C++ Bridge & Instrumentation by bewithgaurav · Pull Request #312 · microsoft/mssql-python
- Changed log filename format: timestamp now has no separator (YYYYMMDDHHMMSS)
- Added CSV header with metadata: script name, PID, Python version, OS info
- Converted log format to CSV: Timestamp, ThreadID, Level, Location, Source, Message
- Replaced trace ID with OS-native thread ID for debugger compatibility
- Updated Python formatter to parse [Python]/[DDBC] tags into the Source column
- Updated C++ logger_bridge to use makeRecord() for proper file/line attribution
- Logs are now easily parseable as CSV for analysis in Excel/pandas
- Counter logic for connection/cursor tracking kept internally but not displayed
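The new CSV layout can be consumed by any standard CSV reader. A minimal sketch using Python's stdlib `csv` module; the metadata header line and the sample rows below are illustrative, not actual driver output:

```python
import csv
import io

# Hypothetical sample in the new CSV log format. The "#"-prefixed metadata
# header and the exact row contents are assumptions based on this change.
sample = """\
# mssql_python log | script: app.py | PID: 1234 | Python: 3.12.0 | OS: Linux
Timestamp,ThreadID,Level,Location,Source,Message
2025-01-15 10:30:00.123,140245,DEBUG,cursor.py:210,Python,Executing query: SELECT 1
2025-01-15 10:30:00.456,140245,DEBUG,ddbc_bindings.cpp:88,DDBC,SQLExecDirectW returned SQL_SUCCESS
"""

# Skip metadata comment lines, then parse the remainder as standard CSV.
lines = [ln for ln in sample.splitlines() if not ln.startswith("#")]
rows = list(csv.DictReader(io.StringIO("\n".join(lines))))
sources = [r["Source"] for r in rows]
```

`pandas.read_csv` works the same way once the metadata lines are skipped (e.g. via its `comment="#"` parameter).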
- Updated LOGGING.md:
  * Changed log format examples from trace ID to CSV format
  * Updated filename format (YYYYMMDDHHMMSS with no separators)
  * Replaced trace ID section with Thread Tracking section
  * Added CSV parsing examples with pandas
  * Updated all log output samples to show CSV columns
- Updated MSSQL-Python-Logging-Design.md:
  * Changed file handler config to describe CSV format
  * Replaced Trace ID System with Thread Tracking System
  * Updated architecture to reflect OS-native thread IDs
  * Added CSV formatter implementation details
  * Updated all code examples to use the setup_logging() API
  * Changed log output examples to CSV format
- Thread tracking now uses OS-native thread IDs (threading.get_native_id())
- CSV columns: Timestamp, ThreadID, Level, Location, Source, Message
- File header includes metadata (PID, script name, Python version, etc.)
- Easy analysis with pandas/Excel/CSV tools
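A minimal sketch of how the OS-native thread id can be attached to log records via `threading.get_native_id()`; the filter class and the `native_tid` field name are illustrative, not the driver's internals:

```python
import logging
import threading

class NativeThreadIdFilter(logging.Filter):
    """Stamp each record with the kernel-assigned thread id, so the value in
    the logs matches what a native debugger shows for the same thread."""

    def filter(self, record: logging.LogRecord) -> bool:
        # threading.get_native_id() returns the OS-native thread id,
        # unlike threading.get_ident(), which is a Python-level handle.
        record.native_tid = threading.get_native_id()
        return True

logger = logging.getLogger("demo")
logger.addFilter(NativeThreadIdFilter())
```

Note that `threading.get_native_id()` requires Python 3.8+.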
- CSV format now mentioned only once, as an optional import capability
- Focus on log structure and content, not format
- Removed repetitive CSV parsing examples
- Single section 'Importing Logs as CSV (Optional)' in LOGGING.md
- Brief mention in the design doc that the format is importable as CSV
Features:
- Whitelist log file extensions: .txt, .log, .csv only
- Raise ValueError for invalid extensions
- Export driver_logger for use in application code
- Allow apps to use mssql-python's logger: from mssql_python.logging import driver_logger
- Updated documentation with usage examples
- Added validation in the _setLevel() method

Benefits:
- Prevents accidental use of wrong file types
- Clear error messages for invalid extensions
- Unified logging: apps can use the same logger as the driver
- Same format and thread tracking for app logs
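The extension whitelist can be sketched as follows; the function and constant names are illustrative, not the driver's actual internals:

```python
import os

# Allowed extensions from this change: .txt, .log, .csv only
ALLOWED_LOG_EXTENSIONS = {".txt", ".log", ".csv"}

def validate_log_path(path: str) -> str:
    """Sketch of the extension check: reject any log file whose extension
    is not on the whitelist, with a clear error message."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_LOG_EXTENSIONS:
        raise ValueError(
            f"Invalid log file extension {ext!r}: expected one of "
            f"{sorted(ALLOWED_LOG_EXTENSIONS)}"
        )
    return path
```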
- Added 'Executing query:' log at the start of the execute() method
- Removed duplicate log statement that caused queries to appear twice
- executemany() uses the existing detailed log (shows parameter set count)
- Each query is now logged exactly once at DEBUG level
- Parameters excluded from the basic query log (pending PII review)
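The single-log-per-query pattern can be sketched with a toy cursor; class and method names are illustrative and the real mssql-python cursor differs:

```python
import logging

class DemoCursor:
    """Toy cursor showing the 'log each query exactly once at DEBUG' pattern."""

    def __init__(self, logger: logging.Logger):
        self._logger = logger

    def execute(self, query: str, params=None):
        # Single DEBUG log at the start of execute(); parameters are
        # deliberately omitted from the message pending PII review.
        self._logger.debug("Executing query: %s", query)
        # ... actual execution would happen here ...
```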
… (Linux/macOS)

Problem:
- Linux/macOS performed a double conversion for NVARCHAR columns
- SQLWCHAR → std::wstring (via SQLWCHARToWString) → Python unicode
- Created an unnecessary intermediate std::wstring allocation

Solution:
- Use PyUnicode_DecodeUTF16() to convert UTF-16 directly to Python unicode
- Single-step conversion eliminates the intermediate allocation
- Platform-specific optimization (Linux/macOS only)

Impact:
- Reduces memory allocations for wide-character string columns
- Eliminates one full conversion step per NVARCHAR cell
- Regular VARCHAR/CHAR columns unchanged (already optimal)
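Conceptually, the single-step path decodes the raw UTF-16 buffer straight to a Python str, which is what `PyUnicode_DecodeUTF16()` does in the C++ code. A Python illustration of the idea, not the driver's actual code:

```python
# Stand-in for the UTF-16 bytes sitting in an SQLWCHAR column buffer.
utf16_buffer = "héllo".encode("utf-16-le")

# Two-step (old path, conceptually): copy the buffer into an intermediate
# object first (stand-in for the std::wstring copy), then convert to str.
intermediate = bytes(bytearray(utf16_buffer))
two_step = intermediate.decode("utf-16-le")

# One-step (new path): decode the raw UTF-16 buffer directly to str,
# with no intermediate copy.
one_step = utf16_buffer.decode("utf-16-le")

assert one_step == two_step == "héllo"
```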
Problem:
- All numeric conversions used pybind11 wrappers with overhead:
  * Type detection, wrapper object creation, bounds checking
  * ~20-40 CPU cycles of overhead per cell

Solution:
- Use direct Python C API calls:
  * PyLong_FromLong/PyLong_FromLongLong for integers
  * PyFloat_FromDouble for floats
  * PyBool_FromLong for booleans
  * PyList_SET_ITEM macro (no bounds check; the list is pre-sized)

Changes:
- SQL_INTEGER, SQL_SMALLINT, SQL_BIGINT, SQL_TINYINT → PyLong_*
- SQL_BIT → PyBool_FromLong
- SQL_REAL, SQL_DOUBLE, SQL_FLOAT → PyFloat_FromDouble
- Added explicit NULL handling for each type

Impact:
- Eliminates pybind11 wrapper overhead for simple numeric types
- Direct array access via the PyList_SET_ITEM macro
- Affects 7 common numeric SQL types
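A Python analogue of the pre-sized-list pattern (mirroring `PyList_New` plus `PyList_SET_ITEM`) with explicit NULL mapping; illustrative only, since the real code is C++:

```python
# Stand-in for SQL_NULL_DATA: a sentinel marking a NULL cell in the buffer.
NULL_SENTINEL = object()

def build_row(cells, ncols):
    """Build one result row: pre-size the list and assign by index (the
    PyList_SET_ITEM pattern) instead of appending, mapping NULL to None."""
    row = [None] * ncols  # pre-sized, like PyList_New(ncols)
    for i in range(ncols):
        value = cells[i]
        # Explicit per-cell NULL check, mirroring the added NULL handling.
        row[i] = None if value is NULL_SENTINEL else value
    return row
```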
Problem:
--------
Column metadata (dataType, columnSize, isLob, fetchBufferSize) was accessed
from the columnInfos vector inside the hot row-processing loop. For a query
with 1,000 rows × 10 columns, this resulted in 10,000 struct field accesses.
Each access involves:
- Vector bounds checking
- Large struct loading (~50+ bytes per ColumnInfo)
- Poor cache locality (struct fields scattered in memory)
- Cost: ~10-15 CPU cycles per access (L2 cache misses likely)

Solution:
---------
Prefetch metadata into tightly packed local arrays before the row loop:
- std::vector<SQLSMALLINT> dataTypes (2 bytes per element)
- std::vector<SQLULEN> columnSizes (8 bytes per element)
- std::vector<uint64_t> fetchBufferSizes (8 bytes per element)
- std::vector<bool> isLobs (1 byte per element)

Total: ~190 bytes for 10 columns vs 500+ bytes with structs. These arrays
stay hot in the L1 cache for the entire batch, eliminating repeated struct
access overhead.

Changes:
--------
- Added 4 prefetch vectors before the row-processing loop
- Added a prefetch loop to populate the metadata arrays (columnInfos is read once)
- Replaced all columnInfos[col-1].field accesses with array lookups
- Updated SQL_CHAR/SQL_VARCHAR cases
- Updated SQL_WCHAR/SQL_WVARCHAR cases
- Updated SQL_BINARY/SQL_VARBINARY cases

Impact:
-------
- Eliminates O(rows × cols) metadata lookups
- 10,000 array accesses @ 3-5 cycles vs 10,000 struct accesses @ 10-15 cycles
- ~70% reduction in metadata access overhead
- Better L1 cache utilization (190 bytes vs 500+ bytes)
- Expected 15-25% overall performance improvement on large result sets
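The prefetch pattern can be sketched in Python for readability: read the column structs once before the hot loop, then touch only the flat arrays inside it. Names are illustrative, not the C++ code's:

```python
from dataclasses import dataclass

@dataclass
class ColumnInfo:
    """Stand-in for the C++ ColumnInfo struct."""
    data_type: int
    column_size: int
    is_lob: bool
    fetch_buffer_size: int

def process_rows(rows, column_infos):
    ncols = len(column_infos)
    # Prefetch: one pass over the structs, before the row loop.
    data_types = [ci.data_type for ci in column_infos]
    is_lobs = [ci.is_lob for ci in column_infos]
    out = []
    for row in rows:
        # Hot loop touches only the flat arrays, never the structs.
        out.append([(data_types[c], is_lobs[c], row[c]) for c in range(ncols)])
    return out
```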
bewithgaurav changed the title from "FEAT: Logging Framework" to "FEAT: Comprehensive Logging Framework with Python-C++ Bridge & Instrumentation"