BulkCopy in `mssql-python` - API Spec & Discussion · microsoft/mssql-python · Discussion #414

BulkCopy API Specification

Issue #18

We’re adding a BulkCopy API (cursor.bulkcopy()) to mssql-python for fast, high-throughput inserts into SQL Server.

Given all the interest, we've opened this discussion thread to learn about your usage scenarios. Please share your suggestions and propose improvements!

If you’re using (or planning to use) bulk copy, feel free to comment with:

  • Your use case (ETL, migration, backfill, analytics, etc.)
  • Options you ended up using (batch_size, table_lock etc.)
  • Anything that felt unclear or surprising

Overview

The cursor.bulkcopy() method provides bulk data loading to SQL Server, similar to ADO.NET's SqlBulkCopy. It follows the standard DB-API cursor pattern.


API Revisions

Revision 1 (kwargs dict) — Deprecated

cursor.bulkcopy(
    table_name: str,
    data_source: Iterator[Tuple],
    **kwargs
) -> Dict[str, Any]

Limitation: Poor IDE discoverability - users couldn't see available options without reading docs.


Revision 2 (explicit parameters) ✅ Current

Based on community feedback, all options are now explicit parameters for better IDE autocomplete and discoverability.

cursor.bulkcopy(
    table_name: str,
    data_source: Iterator[Tuple],

    # Performance settings
    batch_size: int = 1000,
    timeout: int = 30,

    # Column mapping
    column_mappings: Optional[Union[List[str], List[Tuple[int, str]]]] = None,

    # Bulk copy options
    keep_identity: bool = False,
    check_constraints: bool = False,
    table_lock: bool = False,
    keep_nulls: bool = False,
    fire_triggers: bool = False,
    use_internal_transaction: bool = False
) -> Dict[str, Any]

Changes from Revision 1

| Change | Before | After |
| --- | --- | --- |
| Parameter style | `**kwargs` | Explicit parameters |
| IDE support | No autocomplete | Full autocomplete |
| `column_mappings` | `List[Tuple[Union[str, int], str]]` | `Union[List[str], List[Tuple[int, str]]]` |

Usage Example

import mssql_python

conn = mssql_python.connect(connection_string)
cursor = conn.cursor()

# Prepare data
data = [
    (1, "Alice", 100.5),
    (2, "Bob", 200.75),
    (3, "Charlie", 300.25),
]

# All options visible in IDE autocomplete
result = cursor.bulkcopy(
    "users",
    data,
    batch_size=5000,
    timeout=60,
    table_lock=True,
    fire_triggers=True
)

print(f"Copied {result['rows_copied']} rows in {result['elapsed_time']:.2f}s")

Parameters Reference

Required Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| `table_name` | `str` | Name of the destination table (can include schema: `'dbo.MyTable'`) |
| `data_source` | `Iterator[Tuple]` | Iterator yielding tuples of row data |
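Because `data_source` accepts any iterator of tuples, large inputs can be streamed row by row rather than materialized in memory. A minimal sketch using a stdlib `csv` generator (the file layout and type conversions here are hypothetical, for illustration only):

```python
import csv
import io

def rows_from_csv(fileobj):
    """Yield each CSV record as a tuple of (int, str, float).

    The (id, name, balance) column layout is a hypothetical example;
    adapt the conversions to your own schema.
    """
    reader = csv.reader(fileobj)
    next(reader)  # skip the header row
    for rec in reader:
        yield (int(rec[0]), rec[1], float(rec[2]))

# In-memory sample standing in for a real file handle
sample = io.StringIO("id,name,balance\n1,Alice,100.5\n2,Bob,200.75\n")
rows = list(rows_from_csv(sample))
print(rows)  # [(1, 'Alice', 100.5), (2, 'Bob', 200.75)]
```

The generator itself could then be passed directly, e.g. `cursor.bulkcopy("users", rows_from_csv(f))`, so only one batch of rows needs to be held in memory at a time.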

Performance Settings

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `batch_size` | `int` | `1000` | Number of rows per batch |
| `timeout` | `int` | `30` | Operation timeout in seconds (`0` = no timeout) |

batch_size guidelines:

  • Smaller values (100-500): Lower memory usage, more network round trips
  • Larger values (5000-10000): Higher memory usage, better throughput
  • Recommended: 1000–10000 depending on row size
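To make the trade-off concrete, here is a hedged sketch of what batching looks like conceptually: the data source is consumed in fixed-size chunks, and each chunk becomes one round trip. This is an illustration, not mssql-python's actual internal implementation:

```python
from itertools import islice

def batches(rows, batch_size):
    """Yield lists of up to batch_size rows from any iterator.

    Illustrates how a bulk copy loop might consume data_source;
    the real batching logic is internal to the driver.
    """
    it = iter(rows)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield chunk

row_source = ((i, f"user{i}") for i in range(7))
sizes = [len(b) for b in batches(row_source, batch_size=3)]
print(sizes)  # [3, 3, 1]
```

With 7 rows and `batch_size=3`, the copy would run as 3 batches, which is what the `batch_count` field in the result dictionary reports.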

Column Mapping

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `column_mappings` | `Union[List[str], List[Tuple[int, str]]]` | `None` | Maps source columns to destination column names |

Two formats supported:

Simple Format: List[str]

Just list destination column names in order — position in list = source index.

# Source data: (id, first_name, email)
# Maps: index 0 → UserID, index 1 → FirstName, index 2 → Email

column_mappings = ['UserID', 'FirstName', 'Email']

Advanced Format: List[Tuple[int, str]]

Explicit index mapping — allows skipping or reordering columns.

# Source data: (id, first_name, last_name, email)
# Skip last_name (index 2), reorder as needed

column_mappings = [
    (0, 'UserID'),      # source index 0 → UserID
    (1, 'FirstName'),   # source index 1 → FirstName
    (3, 'Email'),       # source index 3 → Email (skipped index 2)
]

When omitted: Columns are mapped by ordinal position (first source column → first table column, etc.)
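Both formats reduce to the same `(source_index, destination_name)` pairs; the simple format is just the advanced format with indices implied by list position. A small sketch of that normalization and of how a row would be projected under a mapping (purely illustrative helpers, not part of the library API):

```python
def normalize_mappings(column_mappings):
    """Convert either supported format into List[Tuple[int, str]]."""
    if all(isinstance(m, str) for m in column_mappings):
        # Simple format: list position = source index
        return list(enumerate(column_mappings))
    return list(column_mappings)

def project_row(row, mappings):
    """Select and reorder source values per (index, name) pairs."""
    return tuple(row[i] for i, _ in mappings)

simple = normalize_mappings(['UserID', 'FirstName', 'Email'])
advanced = normalize_mappings([(0, 'UserID'), (1, 'FirstName'), (3, 'Email')])

# Source row: (id, first_name, last_name, email)
row = (7, 'Ada', 'Lovelace', 'ada@example.com')
print(project_row(row, advanced))  # (7, 'Ada', 'ada@example.com')
```

Note how the advanced mapping drops `last_name` (index 2) entirely, which the simple format cannot express.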


Bulk Copy Options

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `keep_identity` | `bool` | `False` | Preserve source identity values (equivalent to `SET IDENTITY_INSERT ON`) |
| `check_constraints` | `bool` | `False` | Enforce table constraints during bulk copy |
| `table_lock` | `bool` | `False` | Acquire a table-level lock (reduces locking overhead) |
| `keep_nulls` | `bool` | `False` | Preserve explicit NULL values instead of applying column defaults |
| `fire_triggers` | `bool` | `False` | Fire INSERT triggers during bulk copy |
| `use_internal_transaction` | `bool` | `False` | Wrap the operation in an internal transaction (auto-rollback on error) |

Return Value

Returns a dictionary containing operation statistics:

{
    'rows_copied': 10000,        # Number of rows successfully copied
    'batch_count': 10,           # Number of batches processed
    'elapsed_time': 1.5,         # Total elapsed time (seconds)
    'rows_per_second': 6666.67   # Throughput
}
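The statistics are internally consistent: `rows_per_second` is simply `rows_copied` divided by `elapsed_time`, and with the default `batch_size` of 1000 (assumed here for illustration) 10,000 rows fill exactly 10 batches:

```python
# Sample result dict from the spec above
result = {
    'rows_copied': 10000,
    'batch_count': 10,
    'elapsed_time': 1.5,
    'rows_per_second': 6666.67,
}

# Throughput is derivable from the other two fields
derived = result['rows_copied'] / result['elapsed_time']
print(round(derived, 2))  # 6666.67

# Assuming batch_size=1000, 10000 rows -> 10 batches
print(result['rows_copied'] // 1000 == result['batch_count'])  # True
```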

Feedback

We're actively developing this API based on community feedback!

cc: @dlevy-msft-sql @saurabh500 @sumitmsft