GitHub - bioscala/bioscala: Bioinformatics for the Scala programming language

Scalable, functional bioinformatics on the JVM — written in Scala, usable from Scala/Java and friends.

This is a community-driven fork focused on code clarity, small safe refactors, and contributor experience.
This is a functional bioinformatics library.


✨ What’s inside (today)

  • Strongly-typed DNA/RNA/Protein sequences (with IUPAC ambiguity & gapped variants)
  • Transcription (DNA → RNA)
  • Translation (RNA → amino acids) with BioJava interop
  • Parsers/Writers:
    • Iterator-based FASTA reader/writer
    • Iterator-based PAML (PHY) reader
    • Phylip reader/writer (via BioJava)
  • Early alignment utilities and attribute system (immutable, WIP)
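
To picture what "iterator-based" FASTA reading means, here is a minimal, self-contained sketch. It is not BioScala's FASTA reader: `FastaRecord` and `FastaSketch.records` are hypothetical names made up for this illustration, and the parsing rules (first word of the header is the ID, the rest is the description) are an assumption.

```scala
// Hypothetical record type for this sketch; not BioScala's FASTA API.
final case class FastaRecord(id: String, description: String, sequence: String)

object FastaSketch {
  /** Lazily groups FASTA lines into records, one record per '>' header. */
  def records(lines: Iterator[String]): Iterator[FastaRecord] = {
    // Trim lines and skip blank ones; buffering lets us peek at the next line.
    val buffered = lines.map(_.trim).filter(_.nonEmpty).buffered
    new Iterator[FastaRecord] {
      def hasNext: Boolean = buffered.hasNext
      def next(): FastaRecord = {
        val header = buffered.next()
        require(header.startsWith(">"), s"expected a '>' header line, got: $header")
        // Assumption: the ID is the first whitespace-delimited word after '>'.
        val (id, description) = header.drop(1).span(!_.isWhitespace)
        val sequence = new StringBuilder
        // Consume sequence lines until the next header or the end of input.
        while (buffered.hasNext && !buffered.head.startsWith(">"))
          sequence ++= buffered.next()
        FastaRecord(id, description.trim, sequence.toString)
      }
    }
  }
}
```

Because records are produced one at a time, a caller could pass `scala.io.Source.fromFile(path).getLines()` and process a large file without holding it all in memory.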

Some APIs reflect “classic BioScala” semantics; modernization happens behind shims first to keep PRs tiny.

Why BioScala?

🧬 Functional Design: Immutable data structures and pure functions for reliable, reproducible analysis.

🧩 Modular Architecture: Plug-and-play modules for sequences, alignments, and attributes.

📊 Extensible: Easily add custom functionality or integrate with other tools.

🔬 Research-Ready: Designed with bioinformatics workflows in mind.


Features

Core Functionality

  • DNA/RNA/Protein Sequences: Immutable, type-safe representations with support for gaps and IUPAC symbols.

  • Sequence Alignment: Basic pairwise alignment and sparse alignment tools.

  • Transcription & Translation: Convert DNA to RNA and RNA to protein sequences.

  • Attributes: Attach metadata (e.g., IDs, descriptions) to sequences and alignments.

Quick Start

Installation

Since BioScala is a work in progress and not yet published on Maven Central, you’ll need to clone the repository and publish it locally:

  1. Clone the Repository:
git clone https://github.com/bioscala/bioscala.git
cd bioscala
  2. Publish Locally: Use sbt to publish the library to your local Ivy repository:
sbt publishLocal
  3. Add to Your Project: Add the dependency to your build.sbt:
libraryDependencies += "org.bioscala" %% "bioscala-core" % "0.2.0"

Example: DNA to RNA Transcription

val dnaSequence = new DNASequence("ATGGCCATTGTAATGGGCCGCTGAA")
val rnaSequence = dnaSequence.transcribe()

println(rnaSequence)  // Output: AUGGCCAUUGUAAUGGGCCGCUGAA
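
For readers curious what a strongly-typed transcription step can look like, the following is a self-contained sketch under stated assumptions. It does not show BioScala's internals: `TranscriptionSketch`, `parseDna`, `transcribe`, and `render` are hypothetical names, and only the four unambiguous bases are modeled (no IUPAC codes or gaps).

```scala
object TranscriptionSketch {
  // Separate alphabets make it a compile-time error to mix DNA and RNA symbols.
  sealed trait Dna
  object Dna {
    case object A extends Dna
    case object C extends Dna
    case object G extends Dna
    case object T extends Dna
  }

  sealed trait Rna
  object Rna {
    case object A extends Rna
    case object C extends Rna
    case object G extends Rna
    case object U extends Rna
  }

  private def fromChar(ch: Char): Either[String, Dna] = ch.toUpper match {
    case 'A'   => Right(Dna.A)
    case 'C'   => Right(Dna.C)
    case 'G'   => Right(Dna.G)
    case 'T'   => Right(Dna.T)
    case other => Left(s"not a DNA base: $other")
  }

  /** Parses text into typed bases, or returns an error message. */
  def parseDna(text: String): Either[String, List[Dna]] =
    text.toList.foldRight[Either[String, List[Dna]]](Right(Nil)) { (ch, acc) =>
      for { rest <- acc; base <- fromChar(ch) } yield base :: rest
    }

  /** Coding-strand transcription: every T becomes U, other bases are kept. */
  def transcribe(dna: List[Dna]): List[Rna] = dna.map {
    case Dna.A => Rna.A
    case Dna.C => Rna.C
    case Dna.G => Rna.G
    case Dna.T => Rna.U
  }

  /** Renders typed RNA back to text using each case object's name. */
  def render(rna: List[Rna]): String = rna.mkString
}
```

Because `Dna` is sealed, the compiler checks that the `transcribe` match covers every base, which is one practical benefit of the typed representation.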

Example: Protein Translation

val rnaSequence = new RNASequence("AUGGCCAUUGUAAUGGGCCGCUGAA")
val proteinSequence = SequenceTranslation.translate(rnaSequence.seq)

println(proteinSequence)  // Output: MAIVMGR*
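
The output above can be checked by hand with the standard genetic code. The sketch below is a self-contained illustration, not BioScala's or BioJava's translation code: `TranslationSketch` is a hypothetical name, and the choices to drop a trailing incomplete codon and to emit `X` for unrecognized codons are assumptions made for this example.

```scala
object TranslationSketch {
  // Standard genetic code, with codons enumerated in U, C, A, G order
  // for each of the three positions ('*' marks a stop codon).
  private val bases = "UCAG"
  private val aminoAcids =
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

  /** Maps each of the 64 RNA codons to a one-letter amino acid code. */
  val codonTable: Map[String, Char] = (for {
    (a, i) <- bases.zipWithIndex
    (b, j) <- bases.zipWithIndex
    (c, k) <- bases.zipWithIndex
  } yield s"$a$b$c" -> aminoAcids(16 * i + 4 * j + k)).toMap

  /** Translates in frame 1; an incomplete trailing codon is ignored. */
  def translate(rna: String): String =
    rna.toUpperCase
      .grouped(3)
      .filter(_.length == 3)
      .map(codon => codonTable.getOrElse(codon, 'X')) // 'X' = unknown codon
      .mkString
}
```

Reading the example sequence in triplets gives AUG GCC AUU GUA AUG GGC CGC UGA plus a leftover A, so `TranslationSketch.translate("AUGGCCAUUGUAAUGGGCCGCUGAA")` yields `MAIVMGR*`.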

Example: Sparse Alignment

val alignment: List[List[NTSymbol]] = List(
  List(A, C, G, T, Gap),
  List(A, Gap, G, T, C)
)

val (filteredAlignment, removedColumns) = SparseAlignment.removeSparseRows(alignment, minSymbols = 2)

println(filteredAlignment)  // Output: List(List(A, C, G, T, Gap), List(A, Gap, G, T, C))
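
One way to read the example above: a row is kept when it contains at least `minSymbols` non-gap symbols, which is why both rows survive with `minSymbols = 2`. The sketch below implements that reading in a self-contained form. It is an assumption about the semantics, not BioScala's `SparseAlignment` code, and `SparseRowsSketch` and its `Nt` symbols are hypothetical names.

```scala
object SparseRowsSketch {
  // Minimal nucleotide alphabet with a gap symbol, for illustration only.
  sealed trait Nt
  case object A extends Nt
  case object C extends Nt
  case object G extends Nt
  case object T extends Nt
  case object Gap extends Nt

  /**
   * Keeps rows with at least `minSymbols` non-gap symbols.
   * Returns the kept rows and the original indices of the removed rows.
   */
  def removeSparseRows(
      rows: List[List[Nt]],
      minSymbols: Int
  ): (List[List[Nt]], List[Int]) = {
    val (kept, removed) = rows.zipWithIndex.partition {
      case (row, _) => row.count(_ != Gap) >= minSymbols
    }
    (kept.map(_._1), removed.map(_._2))
  }
}
```

The input list is never modified; the function returns a new alignment alongside the removed indices, in keeping with the immutable design described earlier.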

Documentation

Core Concepts

  • Sequences: Immutable lists of nucleotides or amino acids.

  • Alignments: Lists of sequences with gap support.

  • Attributes: Metadata attached to sequences or alignments.
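
As a rough illustration of how immutable sequences and attached attributes can fit together, here is a small self-contained sketch. It is not BioScala's attribute system: `AttributeSketch`, `AnnotatedSequence`, `Id`, and `Description` are hypothetical names chosen for this example.

```scala
object AttributeSketch {
  // Attributes are plain immutable values.
  sealed trait Attribute
  final case class Id(value: String) extends Attribute
  final case class Description(value: String) extends Attribute

  /** A sequence paired with its metadata; both are immutable. */
  final case class AnnotatedSequence(
      residues: String,
      attributes: List[Attribute] = Nil
  ) {
    /** Returns a new sequence with the attribute added; the original is unchanged. */
    def withAttribute(attribute: Attribute): AnnotatedSequence =
      copy(attributes = attributes :+ attribute)

    /** Looks up the first Id attribute, if any. */
    def id: Option[String] = attributes.collectFirst { case Id(value) => value }
  }
}
```

Attaching metadata produces a new value rather than mutating the sequence, so earlier references stay valid and can be shared safely.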

API Reference

  • Alignment: Tools for working with sequence alignments.

  • Attribute: Managing metadata and properties.

  • Chemistry: Representing nucleotides, amino acids, and codons.

  • Nucleotide: Core DNA and RNA sequence handling.

  • Sequence: High-level sequence abstractions.

Please refer to these links for richer documentation:


Community

BioScala is an open-source project, and we welcome contributions from the community! Here’s how you can get involved:

  • 🐛 Report Bugs: Issue Tracker

  • 💡 Suggest Features: I will share the Medium post here.

  • 👩‍💻 Contribute Code: Contributing Guide

  • 💬 Join our Discord to participate in discussions.


Credits & license

Original author/maintainer: Pjotr Prins

Interop: BioJava for translation and IO helpers

License: BSD (see LICENSE).


Citing BioScala

If you use BioScala in your research, please cite:

@software{bioscala,
  author = {BioScala Team},
  title = {BioScala: A Functional Bioinformatics Library},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/bioscala/bioscala}}
}

License

BioScala is released under the BSD 3-Clause License, ensuring freedom for academic, commercial, and personal use.


Acknowledgments

BioScala is made possible by the contributions of developers like you. Special thanks to:

  • The Scala community for building a powerful and expressive language.

  • The open-source bioinformatics community for inspiring this project.


Join Us

BioScala is more than a library—it’s a community-driven effort to make bioinformatics more accessible and functional. Whether you’re a seasoned bioinformatician or a curious beginner, we welcome you to the BioScala community.

🌟 Star this repo to show your support. 🚀 Fork and contribute to shape the future of BioScala.

Let’s build the future of bioinformatics, together.


BioScala Team

On GitHub and Discord