sqloxide wraps rust bindings for sqlparser-rs into a python package using PyO3.
The original goal of this project was to have a very fast, efficient, and accurate SQL parser I could use for building data lineage graphs across large code bases (think hundreds of auto-generated .sql files). Most existing sql parsing approaches for python are either very slow or not accurate (especially in regards to deeply nested queries, sub-selects and/or table aliases). Looking to the rust community for support, I found the excellent sqlparser-rs crate which is quite easy to wrap in python code.
Free-threaded Python Support
As of v0.61.0, sqloxide ships wheels for free-threaded Python (3.13t, 3.14t). This means you can parse SQL from multiple threads in parallel with the GIL disabled, no locks needed:
import concurrent.futures from sqloxide import parse_sql queries = [f"SELECT col_{i} FROM table_{i}" for i in range(100)] with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool: results = list(pool.map(lambda q: parse_sql(sql=q, dialect="generic"), queries))
Install a free-threaded Python with uv:
Installation
Wheels are available on PyPI for Linux (x86_64, i686, aarch64), macOS (x86_64, ARM), and Windows, covering Python 3.9 through 3.14 (including free-threaded 3.13t and 3.14t).
Usage
Parsing
Parsing a SQL query is relatively straight forward:
from sqloxide import parse_sql sql = """ SELECT employee.first_name, employee.last_name, call.start_time, call.end_time, call_outcome.outcome_text FROM employee INNER JOIN call ON call.employee_id = employee.id INNER JOIN call_outcome ON call.call_outcome_id = call_outcome.id ORDER BY call.start_time ASC; """ output = parse_sql(sql=sql, dialect='ansi') print(output) >>> [ { "Query": { "ctes": [], "body": { "Select": { "distinct": false, "top": null, "projection": [ { "UnnamedExpr": { "CompoundIdentifier": [ { "value": "employee", "quote_style": null }, { "value": "first_name", "quote_style": null } ] } }, { "UnnamedExpr": { "CompoundIdentifier": [ { "value": "employee", "quote_style": null }, { "value": "last_name", "quote_style": null } ] } }, { "UnnamedExpr": { "CompoundIdentifier": [ { "value": "call", "quote_style": null }, { "value": "start_time", "quote_style": null } ] } }, { # OUTPUT TRUNCATED
Note that you get back what looks like a JSON document but in actual python types, this is a typed AST that matches the sqlparser-rs AST schema.
We can convert this AST back into a SQL query by running:
from sqloxide import restore_ast query = restore_ast(ast=output) print(query)
This reconstruction is helpful if you want to make manual edits to the AST in python.
AST Rewrites
If you want a more structured approach to AST edits, we also expose APIs that allow you to use the visitor pattern to make query modifications.
Here is an example for mutating a subset of the expressions in the query to be SHOUTING UPPERCASE:
from sqloxide import parse_sql, mutate_expressions sql = "SELECT something from somewhere where something = 1 and something_else = 2" def func(x): if "CompoundIdentifier" in x.keys(): for y in x["CompoundIdentifier"]: y["value"] = y["value"].upper() return x ast = parse_sql(sql=sql, dialect="ansi") result = mutate_expressions(parsed_query=ast, func=func) print(result) --- >>> ['SELECT something FROM somewhere WHERE something = 1 AND something_else = 2']
What if you needed to make a structured edit to the table name in the above query? There is also an API for that as well:
from sqloxide import parse_sql, mutate_relations def func(x): return x.replace("somewhere", "anywhere") result = mutate_relations(parsed_query=ast, func=func) print(result) --- >>> ['SELECT something FROM anywhere WHERE something = 1 AND something_else = 2']
These features combined allow for powerful semantic rewrites of queries, if you have any examples you'd like to share please contribue back to the examples/ folder!
Benchmarks
We run 4 benchmarks, comparing to some python native sql parsing libraries:
test_sqloxide- parse query and get a python object back from rusttest_sqlparser- testing sqlparse, query -> ASTtest_mozsqlparser- testing moz-sql-parser, full roundtrip as in the docs, query -> JSONtest_sqlglot- testing sqlglot, query -> AST
To run them on your machine:
------------------------------------------------------------------------------------------- benchmark: 4 tests -------------------------------------------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_sqloxide 29.6800 (1.0) 50.4300 (1.0) 30.6219 (1.0) 0.7367 (1.0) 30.4900 (1.0) 0.2390 (1.0) 527;716 32,656.3811 (1.0) 9099 1
test_sqlglot 365.8420 (12.33) 692.8950 (13.74) 377.2422 (12.32) 11.7692 (15.98) 375.7825 (12.32) 4.3145 (18.05) 62;97 2,650.8168 (0.08) 2260 1
test_sqlparser 1,577.7720 (53.16) 9,751.9699 (193.38) 1,651.5547 (53.93) 355.5511 (482.64) 1,620.7315 (53.16) 30.9200 (129.37) 3;60 605.4901 (0.02) 538 1
test_mozsqlparser 2,793.8400 (94.13) 12,358.7790 (245.07) 3,091.8519 (100.97) 960.4173 (>1000.0) 2,937.6310 (96.35) 243.3220 (>1000.0) 4;4 323.4308 (0.01) 316 1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Example
The depgraph example reads a bunch of .sql files from disk using glob, and builds a dependency graph of all of the objects using graphviz.
uv run python ./examples/depgraph.py --path {path/to/folder/with/queries}