BulkCopy in `mssql-python` - API Spec & Discussion · microsoft/mssql-python · Discussion #414
# BulkCopy API Specification
Issue #18
We’re adding a BulkCopy API (cursor.bulkcopy()) to mssql-python for fast, high-throughput inserts into SQL Server.
Given all the interest, we've opened this discussion thread to learn about your usage scenarios - suggestions and proposed improvements are welcome!
If you’re using (or planning to use) bulk copy, feel free to comment with:
- Your use case (ETL, migration, backfill, analytics, etc.)
- Options you ended up using (`batch_size`, `table_lock`, etc.)
- Anything that felt unclear or surprising
## Overview
The cursor.bulkcopy() method provides bulk data loading to SQL Server, similar to ADO.NET's SqlBulkCopy. It follows the standard DB-API cursor pattern.
## API Revisions
### Revision 1 (kwargs dict) — Deprecated

```python
cursor.bulkcopy(
    table_name: str,
    data_source: Iterator[Tuple],
    **kwargs
) -> Dict[str, Any]
```
Limitation: Poor IDE discoverability - users couldn't see available options without reading docs.
### Revision 2 (explicit parameters) ✅ Current
Based on community feedback, all options are now explicit parameters for better IDE autocomplete and discoverability.
```python
cursor.bulkcopy(
    table_name: str,
    data_source: Iterator[Tuple],
    # Performance settings
    batch_size: int = 1000,
    timeout: int = 30,
    # Column mapping
    column_mappings: Optional[Union[List[str], List[Tuple[int, str]]]] = None,
    # Bulk copy options
    keep_identity: bool = False,
    check_constraints: bool = False,
    table_lock: bool = False,
    keep_nulls: bool = False,
    fire_triggers: bool = False,
    use_internal_transaction: bool = False
) -> Dict[str, Any]
```
### Changes from Revision 1

| Change | Before | After |
|---|---|---|
| Parameter style | `**kwargs` | Explicit parameters |
| IDE support | No autocomplete | Full autocomplete |
| `column_mappings` | `List[Tuple[Union[str, int], str]]` | `Union[List[str], List[Tuple[int, str]]]` |
## Usage Example

```python
import mssql_python

conn = mssql_python.connect(connection_string)
cursor = conn.cursor()

# Prepare data
data = [
    (1, "Alice", 100.5),
    (2, "Bob", 200.75),
    (3, "Charlie", 300.25),
]

# All options visible in IDE autocomplete
result = cursor.bulkcopy(
    "users",
    data,
    batch_size=5000,
    timeout=60,
    table_lock=True,
    fire_triggers=True
)

print(f"Copied {result['rows_copied']} rows in {result['elapsed_time']:.2f}s")
```
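The statistics in `result` hang together simply: `rows_per_second` is just `rows_copied / elapsed_time`. A quick sketch of that relationship (the values here are illustrative, not from a real run):

```python
# Illustrative result statistics (not from a real run); the dict shape
# follows the Return Value section of this spec.
result = {'rows_copied': 10000, 'batch_count': 10, 'elapsed_time': 1.5}

# Throughput is derived from the other two fields.
result['rows_per_second'] = round(result['rows_copied'] / result['elapsed_time'], 2)

print(result['rows_per_second'])  # 6666.67
```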
## Parameters Reference

### Required Parameters

| Parameter | Type | Description |
|---|---|---|
| `table_name` | `str` | Name of the destination table (can include schema: `'dbo.MyTable'`) |
| `data_source` | `Iterator[Tuple]` | Iterator yielding tuples of row data |
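Because `data_source` accepts any iterator, rows can be streamed lazily instead of materialized up front, which matters when loading millions of rows. A minimal sketch (the `row_stream` helper is hypothetical, not part of the API):

```python
from typing import Iterator, Tuple

def row_stream(n: int) -> Iterator[Tuple[int, str, float]]:
    """Hypothetical lazy source: yields rows one at a time, so only the
    current batch needs to be held in memory."""
    for i in range(1, n + 1):
        yield (i, f"user_{i}", i * 1.5)

# Would be passed directly: cursor.bulkcopy("users", row_stream(1_000_000))
print(next(row_stream(3)))  # (1, 'user_1', 1.5)
```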
### Performance Settings

| Parameter | Type | Default | Description |
|---|---|---|---|
| `batch_size` | `int` | `1000` | Number of rows per batch |
| `timeout` | `int` | `30` | Operation timeout in seconds (`0` = no timeout) |
`batch_size` guidelines:
- Smaller values (100-500): Lower memory usage, more network round trips
- Larger values (5000-10000): Higher memory usage, better throughput
- Recommended: 1000–10000, depending on row size
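One way to act on these guidelines is to size batches by approximate row width rather than picking a fixed count. The helper below is a rough heuristic of our own, not part of the API; the 4 MB target is an assumption:

```python
def suggest_batch_size(approx_row_bytes: int,
                       target_batch_bytes: int = 4 * 1024 * 1024) -> int:
    """Heuristic (not part of the bulkcopy API): aim for roughly 4 MB of
    data per batch, clamped to the 1000-10000 range recommended above."""
    size = target_batch_bytes // max(approx_row_bytes, 1)
    return max(1000, min(size, 10000))

print(suggest_batch_size(200))    # narrow rows: clamped to 10000
print(suggest_batch_size(50000))  # very wide rows: clamped to 1000
```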
### Column Mapping

| Parameter | Type | Default | Description |
|---|---|---|---|
| `column_mappings` | `Union[List[str], List[Tuple[int, str]]]` | `None` | Maps source columns to destination column names |
Two formats supported:
#### Simple Format: `List[str]`

Just list destination column names in order — position in list = source index.

```python
# Source data: (id, first_name, email)
# Maps: index 0 → UserID, index 1 → FirstName, index 2 → Email
column_mappings = ['UserID', 'FirstName', 'Email']
```
#### Advanced Format: `List[Tuple[int, str]]`

Explicit index mapping — allows skipping or reordering columns.

```python
# Source data: (id, first_name, last_name, email)
# Skip last_name (index 2), reorder as needed
column_mappings = [
    (0, 'UserID'),     # source index 0 → UserID
    (1, 'FirstName'),  # source index 1 → FirstName
    (3, 'Email'),      # source index 3 → Email (index 2 skipped)
]
```
When omitted: Columns are mapped by ordinal position (first source column → first table column, etc.)
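Both formats reduce to the same thing: explicit `(source_index, destination_name)` pairs. A sketch of that equivalence (`normalize_mappings` is our illustration, not library code):

```python
from typing import List, Tuple, Union

def normalize_mappings(
    mappings: Union[List[str], List[Tuple[int, str]]]
) -> List[Tuple[int, str]]:
    """Illustrative only: reduce either supported format to explicit
    (source_index, destination_name) pairs."""
    if mappings and isinstance(mappings[0], str):
        # Simple format: position in the list is the source index.
        return list(enumerate(mappings))
    return list(mappings)  # already explicit pairs

print(normalize_mappings(['UserID', 'FirstName', 'Email']))
# [(0, 'UserID'), (1, 'FirstName'), (2, 'Email')]
```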
### Bulk Copy Options

| Parameter | Type | Default | Description |
|---|---|---|---|
| `keep_identity` | `bool` | `False` | Preserve source identity values (equivalent to `SET IDENTITY_INSERT ON`) |
| `check_constraints` | `bool` | `False` | Enforce table constraints during bulk copy |
| `table_lock` | `bool` | `False` | Acquire table-level lock (reduces locking overhead) |
| `keep_nulls` | `bool` | `False` | Preserve explicit NULL values instead of applying column defaults |
| `fire_triggers` | `bool` | `False` | Fire INSERT triggers during bulk copy |
| `use_internal_transaction` | `bool` | `False` | Wrap operation in internal transaction (auto-rollback on error) |
## Return Value

Returns a dictionary containing operation statistics:

```python
{
    'rows_copied': 10000,       # Number of rows successfully copied
    'batch_count': 10,          # Number of batches processed
    'elapsed_time': 1.5,        # Total elapsed time (seconds)
    'rows_per_second': 6666.67  # Throughput
}
```

## Feedback
We're actively developing this API based on community feedback!