BulkCopy in `mssql-python` - API Spec & Discussion · microsoft/mssql-python · Discussion #414

BulkCopy API Specification

Issue #18

We’re adding a BulkCopy API (cursor.bulkcopy()) to mssql-python for fast, high-throughput inserts into SQL Server.

Given all the interest, we've opened this discussion thread to learn about your usage scenarios. Please share your suggestions and propose improvements!

If you’re using (or planning to use) bulk copy, feel free to comment with:

  • Your use case (ETL, migration, backfill, analytics, etc.)
  • Options you ended up using (batch_size, table_lock etc.)
  • Anything that felt unclear or surprising

Overview

The cursor.bulkcopy() method provides bulk data loading to SQL Server, similar to ADO.NET's SqlBulkCopy. It follows the standard DB-API cursor pattern.


API Revisions

Revision 1 (kwargs dict) — Deprecated

cursor.bulkcopy(
    table_name: str,
    data_source: Iterator[Tuple],
    **kwargs
) -> Dict[str, Any]

Limitation: Poor IDE discoverability - users couldn't see available options without reading docs.


Revision 2 (explicit parameters) ✅ Current

Based on community feedback, all options are now explicit parameters for better IDE autocomplete and discoverability.

cursor.bulkcopy(
    table_name: str,
    data_source: Iterator[Tuple],

    # Performance settings
    batch_size: int = 1000,
    timeout: int = 30,

    # Column mapping
    column_mappings: Optional[Union[List[str], List[Tuple[int, str]]]] = None,

    # Bulk copy options
    keep_identity: bool = False,
    check_constraints: bool = False,
    table_lock: bool = False,
    keep_nulls: bool = False,
    fire_triggers: bool = False,
    use_internal_transaction: bool = False
) -> Dict[str, Any]

Changes from Revision 1

| Change | Before | After |
| --- | --- | --- |
| Parameter style | `**kwargs` | Explicit parameters |
| IDE support | No autocomplete | Full autocomplete |
| `column_mappings` | `List[Tuple[Union[str, int], str]]` | `Union[List[str], List[Tuple[int, str]]]` |

Usage Example

import mssql_python

conn = mssql_python.connect(connection_string)
cursor = conn.cursor()

# Prepare data
data = [
    (1, "Alice", 100.5),
    (2, "Bob", 200.75),
    (3, "Charlie", 300.25),
]

# All options visible in IDE autocomplete
result = cursor.bulkcopy(
    "users",
    data,
    batch_size=5000,
    timeout=60,
    table_lock=True,
    fire_triggers=True
)

print(f"Copied {result['rows_copied']} rows in {result['elapsed_time']:.2f}s")

Parameters Reference

Required Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| `table_name` | `str` | Name of the destination table (can include schema: `'dbo.MyTable'`) |
| `data_source` | `Iterator[Tuple]` | Iterator yielding tuples of row data |
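Because `data_source` accepts any iterator of tuples, large inputs can be streamed row by row rather than materialized in memory. A minimal sketch using a stdlib `csv` generator (the file layout and type conversions here are hypothetical, for illustration only):

```python
import csv
import io

def rows_from_csv(fileobj):
    """Yield each CSV record as a tuple of (int, str, float).

    The (id, name, balance) column layout is a hypothetical example;
    adapt the conversions to your own schema.
    """
    reader = csv.reader(fileobj)
    next(reader)  # skip the header row
    for rec in reader:
        yield (int(rec[0]), rec[1], float(rec[2]))

# In-memory sample standing in for a real file handle
sample = io.StringIO("id,name,balance\n1,Alice,100.5\n2,Bob,200.75\n")
rows = list(rows_from_csv(sample))
print(rows)  # [(1, 'Alice', 100.5), (2, 'Bob', 200.75)]
```

The generator itself could then be passed directly, e.g. `cursor.bulkcopy("users", rows_from_csv(f))`, so only one batch of rows needs to be held in memory at a time.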

Performance Settings

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `batch_size` | `int` | `1000` | Number of rows per batch |
| `timeout` | `int` | `30` | Operation timeout in seconds (`0` = no timeout) |

batch_size guidelines:

  • Smaller values (100-500): Lower memory usage, more network round trips
  • Larger values (5000-10000): Higher memory usage, better throughput
  • Recommended: 1000–10000 depending on row size
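To make the trade-off concrete, here is a hedged sketch of what batching looks like conceptually: the data source is consumed in fixed-size chunks, and each chunk becomes one round trip. This is an illustration, not mssql-python's actual internal implementation:

```python
from itertools import islice

def batches(rows, batch_size):
    """Yield lists of up to batch_size rows from any iterator.

    Illustrates how a bulk copy loop might consume data_source;
    the real batching logic is internal to the driver.
    """
    it = iter(rows)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield chunk

row_source = ((i, f"user{i}") for i in range(7))
sizes = [len(b) for b in batches(row_source, batch_size=3)]
print(sizes)  # [3, 3, 1]
```

With 7 rows and `batch_size=3`, the copy would run as 3 batches, which is what the `batch_count` field in the result dictionary reports.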

Column Mapping

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `column_mappings` | `Union[List[str], List[Tuple[int, str]]]` | `None` | Maps source columns to destination column names |

Two formats supported:

Simple Format: List[str]

Just list destination column names in order — position in list = source index.

# Source data: (id, first_name, email)
# Maps: index 0 → UserID, index 1 → FirstName, index 2 → Email

column_mappings = ['UserID', 'FirstName', 'Email']

Advanced Format: List[Tuple[int, str]]

Explicit index mapping — allows skipping or reordering columns.

# Source data: (id, first_name, last_name, email)
# Skip last_name (index 2), reorder as needed

column_mappings = [
    (0, 'UserID'),      # source index 0 → UserID
    (1, 'FirstName'),   # source index 1 → FirstName
    (3, 'Email'),       # source index 3 → Email (skipped index 2)
]

When omitted: Columns are mapped by ordinal position (first source column → first table column, etc.)
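Both formats reduce to the same `(source_index, destination_name)` pairs; the simple format is just the advanced format with indices implied by list position. A small sketch of that normalization and of how a row would be projected under a mapping (purely illustrative helpers, not part of the library API):

```python
def normalize_mappings(column_mappings):
    """Convert either supported format into List[Tuple[int, str]]."""
    if all(isinstance(m, str) for m in column_mappings):
        # Simple format: list position = source index
        return list(enumerate(column_mappings))
    return list(column_mappings)

def project_row(row, mappings):
    """Select and reorder source values per (index, name) pairs."""
    return tuple(row[i] for i, _ in mappings)

simple = normalize_mappings(['UserID', 'FirstName', 'Email'])
advanced = normalize_mappings([(0, 'UserID'), (1, 'FirstName'), (3, 'Email')])

# Source row: (id, first_name, last_name, email)
row = (7, 'Ada', 'Lovelace', 'ada@example.com')
print(project_row(row, advanced))  # (7, 'Ada', 'ada@example.com')
```

Note how the advanced mapping drops `last_name` (index 2) entirely, which the simple format cannot express.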


Bulk Copy Options

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `keep_identity` | `bool` | `False` | Preserve source identity values (equivalent to `SET IDENTITY_INSERT ON`) |
| `check_constraints` | `bool` | `False` | Enforce table constraints during bulk copy |
| `table_lock` | `bool` | `False` | Acquire a table-level lock (reduces locking overhead) |
| `keep_nulls` | `bool` | `False` | Preserve explicit NULL values instead of applying column defaults |
| `fire_triggers` | `bool` | `False` | Fire INSERT triggers during bulk copy |
| `use_internal_transaction` | `bool` | `False` | Wrap the operation in an internal transaction (auto-rollback on error) |

Return Value

Returns a dictionary containing operation statistics:

{
    'rows_copied': 10000,        # Number of rows successfully copied
    'batch_count': 10,           # Number of batches processed
    'elapsed_time': 1.5,         # Total elapsed time (seconds)
    'rows_per_second': 6666.67   # Throughput
}
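The statistics are internally consistent: `rows_per_second` is simply `rows_copied` divided by `elapsed_time`, and with the default `batch_size` of 1000 (assumed here for illustration) 10,000 rows fill exactly 10 batches:

```python
# Sample result dict from the spec above
result = {
    'rows_copied': 10000,
    'batch_count': 10,
    'elapsed_time': 1.5,
    'rows_per_second': 6666.67,
}

# Throughput is derivable from the other two fields
derived = result['rows_copied'] / result['elapsed_time']
print(round(derived, 2))  # 6666.67

# Assuming batch_size=1000, 10000 rows -> 10 batches
print(result['rows_copied'] // 1000 == result['batch_count'])  # True
```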

Feedback

We're actively developing this API based on community feedback!

cc: @dlevy-msft-sql @saurabh500 @sumitmsft