BiocPy: Facilitate Bioconductor Workflows in Python
BiocPy brings Bioconductor's core data structures and analysis tools to the Python ecosystem. Foundational structures such as BiocFrame and GenomicRanges serve as the building blocks for larger, more complex representations. For example, container classes like SummarizedExperiment, SingleCellExperiment, and MultiAssayExperiment represent single- or multi-omic experimental data and metadata.
Core Packages
For a complete list of packages, visit our GitHub organization.
Data Structures
| Package | Description | Links |
|---|---|---|
| BiocFrame | Bioconductor-like data frames | GitHub, Docs |
| IRanges | Interval arithmetic operations | GitHub, Docs, Bioconductor |
| GenomicRanges | Genomic location analysis | GitHub, Docs, Bioconductor |
Containers
| Package | Description | Links |
|---|---|---|
| SummarizedExperiment | Genomic experiments container | GitHub, Docs, Bioconductor |
| SingleCellExperiment | Single-cell genomics container | GitHub, Docs, Bioconductor |
| SpatialExperiment | Spatial transcriptomics container | GitHub, Docs, Bioconductor |
| SpatialFeatureExperiment | Extension of the spatial transcriptomics container | GitHub, Docs, Bioconductor |
| MultiAssayExperiment | Multi-omics data framework | GitHub, Docs, Bioconductor |
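To make the container idea concrete, here is a minimal sketch of what such a class bundles together: one or more assay matrices plus per-feature and per-sample annotations that must agree on dimensions. `ToyExperiment` and all of its fields are hypothetical and use only the standard library; this is not the BiocPy API, so see each package's documentation for the real constructors.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ToyExperiment:
    """Hypothetical stand-in for an experiment container (NOT the BiocPy API)."""

    assays: Dict[str, List[List[float]]]  # assay name -> features x samples matrix
    row_data: Dict[str, list]             # per-feature annotations
    column_data: Dict[str, list]          # per-sample annotations

    @property
    def shape(self) -> Tuple[int, int]:
        # Dimensions are taken from the first assay.
        first = next(iter(self.assays.values()))
        return (len(first), len(first[0]) if first else 0)

    def __post_init__(self) -> None:
        n_rows, n_cols = self.shape
        # Every assay must share the same features x samples dimensions.
        for name, matrix in self.assays.items():
            if len(matrix) != n_rows or any(len(row) != n_cols for row in matrix):
                raise ValueError(f"assay '{name}' does not match shape {self.shape}")
        # Annotations must line up with the assay rows and columns.
        for name, values in self.row_data.items():
            if len(values) != n_rows:
                raise ValueError(f"row_data '{name}' must have {n_rows} entries")
        for name, values in self.column_data.items():
            if len(values) != n_cols:
                raise ValueError(f"column_data '{name}' must have {n_cols} entries")


exp = ToyExperiment(
    assays={"counts": [[1, 0, 3], [2, 5, 0]]},
    row_data={"symbol": ["MAP1A", "BIN1"]},
    column_data={"condition": ["ctrl", "ctrl", "treated"]},
)
print(exp.shape)  # (2, 3)
```

Validating dimensions at construction time is the main design point: data and metadata cannot drift out of sync once they live in a single object.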
R Interoperability
| Package | Description | Links |
|---|---|---|
| rds2py | Read RDS files directly in Python | GitHub, Docs |
| BiocUtils | Common utilities mirroring R's base functionality | GitHub, Docs |
| mopsy | Matrix operations with R-like syntax | GitHub, Docs |
| pyBiocFileCache | Resource caching system | GitHub, Docs, Bioconductor |
| txdb | Genome annotations from TxDb objects | GitHub, Docs |
| orgdb | Access OrgDb objects | GitHub, Docs |
Delayed Operations
| Package | Description | Links |
|---|---|---|
| DelayedArray | Delayed operations in Python | GitHub, Docs, Bioconductor |
| HDF5Array | HDF5-backed arrays | GitHub, Docs, Bioconductor |
| TileDBArray | TileDB-backed arrays | GitHub, Docs, Bioconductor |
Get Started
All BiocPy packages are published under the BiocPy organization on PyPI. Install the core packages using the biocpy wrapper (`pip install biocpy`).
Individual packages can be installed separately. See each package's documentation for specific installation instructions.
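After installing, it can be handy to check which packages are actually present in the current environment. The sketch below uses only the standard library's `importlib.metadata`. The distribution names in `CORE_PACKAGES` are an assumption (lowercased package names from the tables above), so adjust the list to match what you installed.

```python
from importlib import metadata

# Assumed PyPI distribution names; edit to match your installation.
CORE_PACKAGES = ["biocframe", "iranges", "genomicranges", "summarizedexperiment"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions(CORE_PACKAGES)
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```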
Environments
We provide conda/mamba configuration files to create environments containing most BiocPy (& friends) packages. Check out the environments repository for more information.
Friends of BiocPy
BiocPy integrates with several analysis tools and frameworks.
Analysis Tools
- libscran: Multi-modal single-cell analysis in R, Python and JavaScript.
- SingleR-inc: Cell type annotation for single-cell data.
Data Management
- ArtifactDB: Language-agnostic access to data across computational environments.
- tatami-inc: Read various matrix representations through a common interface.
Model Training
- CellArr: TileDB-based genomic data storage with AI/ML dataloaders.
Contributing
We welcome contributions! Check out our developer guide to get started.
