Variation APIs

This project proposes a set of variation APIs that bring together Ensembl Variation, the VCF file format, the GFF3 and GVF file formats, samtools, Picard, GATK, etc.

Several similar file specifications exist for dealing with sequence variation, including VCF and GVF (which builds on GFF3).

Some support for these file specifications is already present in various bioinformatics libraries (and in fact biojava3 already provides GFF3 support); it would be desirable to pull these together behind a set of common APIs in biojava3.

Approach

  • Consider existing open source VCF and GVF implementations (Genome Analysis Toolkit (GATK), VCFtools, Picard, GVF-Parser, etc.)
  • Design APIs for common entities (Allele, Genotype, Haplotype, etc.)
  • Create adaptors to third-party implementations or implement support directly in biojava3
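
As a rough illustration of the approach above, the common entities and a format adaptor might be sketched as follows. Every type and method name here (VariationSketch, Allele, Genotype, Variation, VariationReader, VcfLineReader, genotype) is hypothetical and does not exist in biojava3; the reader handles only the first five tab-separated VCF columns (CHROM, POS, ID, REF, ALT) and is not a full VCF parser.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch only; none of these types are part of biojava3. */
final class VariationSketch {

    /** One observed sequence at a locus, for example "G" or "AT". */
    public static final class Allele {
        private final String bases;
        private final boolean reference;

        public Allele(String bases, boolean reference) {
            this.bases = bases;
            this.reference = reference;
        }
        public String getBases() { return bases; }
        public boolean isReference() { return reference; }
    }

    /** The alleles a single sample carries at a locus. */
    public static final class Genotype {
        private final List<Allele> alleles;
        private final boolean phased;

        public Genotype(List<Allele> alleles, boolean phased) {
            this.alleles = Collections.unmodifiableList(new ArrayList<Allele>(alleles));
            this.phased = phased;
        }
        public List<Allele> getAlleles() { return alleles; }
        public boolean isPhased() { return phased; }

        /** True when every called allele has the same bases. */
        public boolean isHomozygous() {
            for (Allele a : alleles) {
                if (!a.getBases().equals(alleles.get(0).getBases())) {
                    return false;
                }
            }
            return true;
        }
    }

    /** A variant site: location plus alleles, reference allele first. */
    public static final class Variation {
        private final String chromosome;
        private final int position;
        private final List<Allele> alleles;

        public Variation(String chromosome, int position, List<Allele> alleles) {
            this.chromosome = chromosome;
            this.position = position;
            this.alleles = Collections.unmodifiableList(new ArrayList<Allele>(alleles));
        }
        public String getChromosome() { return chromosome; }
        public int getPosition() { return position; }
        public List<Allele> getAlleles() { return alleles; }
    }

    /** Format adaptors (VCF, GVF, third-party wrappers) would implement this. */
    public interface VariationReader {
        Variation parse(String line);
    }

    /** Reads CHROM, POS, REF and ALT from one VCF data line; other columns are ignored. */
    public static final class VcfLineReader implements VariationReader {
        public Variation parse(String line) {
            String[] fields = line.split("\t");
            if (fields.length < 5) {
                throw new IllegalArgumentException("Expected at least 5 tab-separated columns");
            }
            List<Allele> alleles = new ArrayList<Allele>();
            alleles.add(new Allele(fields[3], true));
            if (!".".equals(fields[4])) {           // "." means no alternate allele
                for (String alt : fields[4].split(",")) {
                    alleles.add(new Allele(alt, false));
                }
            }
            return new Variation(fields[0], Integer.parseInt(fields[1]), alleles);
        }
    }

    /** Resolves a VCF GT value such as "0/1" or "1|1"; missing calls (".") are not handled. */
    public static Genotype genotype(Variation variation, String gt) {
        boolean phased = gt.indexOf('|') >= 0;
        List<Allele> called = new ArrayList<Allele>();
        for (String index : gt.split("[/|]")) {
            called.add(variation.getAlleles().get(Integer.parseInt(index)));
        }
        return new Genotype(called, phased);
    }
}
```

With this shape, client code depends only on Variation, Allele and Genotype, so a GVF reader or a wrapper around a third-party library could be swapped in behind VariationReader without changing callers. A Haplotype type is left out of the sketch for brevity.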

Suggested for GSoC 2013