Designed for Multimodal. Built for Scale.
From agents to models, from search to training, one platform for all your AI data and workloads
Tomorrow's AI is being built on LanceDB today
The AI-Native
Multimodal Lakehouse
AI thrives on more than text. It needs multimodal data. Today’s complex workloads demand more than a database. They need a new foundation built for AI at scale.
Chunking
Vector storage
Model training
Hybrid search
Embedding pipelines
Multimodal data
Ad-hoc scripts
AI Needs Better
Data Infrastructure
Data lakes only handle tabular data, search engines just work with vectors, and neither works well with multimodal data. Researchers using today's infrastructure face more complexity, higher cost, and slower progress.
A Unified Solution
LanceDB provides one place for all your AI data and workloads so your team can move fast from idea to petabyte-scale production.
Storage
Search
Feature Engineering
Analytics
Training
The new columnar standard for multimodal data
Fast scans and random access. Large blob storage. Zero-copy, fine-grained data evolution at petabyte scale.
table.add_columns({
    "title_frame": extract_key_frame("video", 0),
    "description": img2txt("title_frame"),
    "embedding": embed("description")
})
Advanced retrieval for AI
Blazing-fast hybrid search, filtering, and reranking over billions of vectors. Compute-storage separation for up to 100x savings.
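Hybrid search combines a vector (semantic) ranking with a keyword ranking for the same query. This page does not say how the two rankings are merged, so the sketch below uses reciprocal rank fusion, one common technique, purely as an illustration. The function name, the document ids, and the constant `k=60` are assumptions and not part of any LanceDB API.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into a single list, best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # An item ranked higher (smaller rank) contributes a larger score.
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, descending; break ties by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from two retrievers for the query "flying cars".
vector_hits = ["doc3", "doc1", "doc7"]
keyword_hits = ["doc1", "doc9", "doc3"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

Documents that appear near the top of both lists rise to the top of the fused list, which is why `doc1` and `doc3` outrank the documents found by only one retriever.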
(table.search("flying cars", query_type="hybrid")
    .where("date > '2025-01-01'")
    .reranker("cross_encoder_tuned")
    .select(["id"]).limit(10)
    .to_pandas())
Automated feature engineering
Declarative, distributed and versioned pre-processing for faster feature experimentation and iteration cycles. Native support for LLM-as-UDF.
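import lance
import pyarrow as pa
import pyarrow.compute as pc

ds = lance.dataset("s3://bucket/path.lance")
# Batch UDF: derive a new column "two" from the existing "id" column.
@lance.batch_udf()
def multiply_by_two(x: pa.RecordBatch) -> pa.RecordBatch:
    return pa.RecordBatch.from_arrays(
        [pc.multiply(x["id"], 2)], ["two"]
    )
ds.add_columns(multiply_by_two)
Explore, curate, and analyze with ease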
High performance SQL for multimodal data.
db.sql("SELECT decode('audio_track', 'wav') "
       "FROM table WHERE id in ('1', '5', '324')")
Optimized training pipelines
Faster data loading, global shuffling, and integrated filters for large-scale training with PyTorch or JAX.
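The pattern named above (filter first, shuffle globally, then batch) can be sketched in plain Python. This is only an illustration of the idea under stated assumptions: `shuffled_batches` and the in-memory `rows` list are hypothetical stand-ins, not LanceDB, PyTorch, or JAX APIs.

```python
import random


def shuffled_batches(rows, predicate, batch_size, seed=0):
    """Yield batches of rows that pass `predicate`, in globally shuffled order."""
    # Integrated filter: keep only the indices of matching rows.
    indices = [i for i, row in enumerate(rows) if predicate(row)]
    # Global shuffle: permute every matching index, not just a local buffer.
    random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        # Random access: fetch rows by index in the shuffled order.
        yield [rows[i] for i in indices[start:start + batch_size]]


# Hypothetical in-memory stand-in for a table of video metadata.
heights = [480, 720, 1080, 360, 720, 2160]
rows = [{"id": i, "video_height": h} for i, h in enumerate(heights)]

batches = list(
    shuffled_batches(rows, lambda r: r["video_height"] >= 720, batch_size=2)
)
for batch in batches:
    print([r["id"] for r in batch])
```

Shuffling indices rather than rows only pays off when the storage layer supports cheap random access, which is the property the page claims for the format.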
for batch in DataLoader(table.where("video_height>=720").shuffle()):
    inputs, targets = batch["description"], batch["title_frame"]
    outputs = model(inputs)
    ...
How LanceDB Works
From prototype to production.
For Developers
01 Connect to LanceDB
Get started fast with a simple install and intuitive interface.
02 Ingest Data
Grow your project to petabyte scale without worrying about infrastructure.
03 Build and Index
Streamline your workflow and focus on high-value experimentation.
For Enterprises
01 Choose Deployment Model
Unlock the value in your sales calls, decks, contracts, and more.
02 Data Lake Compatible
Keep your data private and secure. Works with your existing data lake.
03 Build and Scale
Unlock massive scalability and unmatched price-performance.
Built for Enterprise Scale
Highest search QPS on a single table
Massive scalability at a fraction of the cost
Largest table under management
Enterprise-Grade Compliance
Safety and security guaranteed for your data.
SOC2 Type II
GDPR compliant
HIPAA compliant
Trusted By The Best
"Lance has been a significant enabler for our multimodal data workflows. Its performance and feature set offer a dramatic step up from legacy formats like WebDataset and Parquet. Using Lance has freed up considerable time and energy for our team, allowing us to iterate faster and focus more on research."
"Law firms, professional service providers, and enterprises rely on Harvey to process a large number of complex documents in a scalable and secure manner. LanceDB’s search/retrieval infrastructure has been instrumental in helping us meet those demands."
Gabriel Pereyra, Co-Founder
"Lance transformed our model training pipeline at Runway. The ability to append columns without rewriting entire datasets, combined with fast random access and multimodal support, lets us iterate on AI models faster than ever. For a company building cutting-edge generative AI, that speed of iteration is everything."
Kamil Sindil, Head of Engineering
Start Your Multimodal
Transformation Today