Variation APIs
The goal is a set of variation APIs that bring together Ensembl Variation, the VCF file format, the GFF3 and GVF file formats, samtools, Picard, GATK, etc.
Several similar file specifications exist for dealing with sequence variation, including:
- VCF (Variant Call Format) is a text file format used by the 1000 Genomes project and others for representing variation against a reference sequence.
- The [Genome Variation Format](http://www.sequenceontology.org/resources/gvf_1.02.html) (GVF) is a text file format for describing sequence variants at nucleotide resolution relative to a reference genome. GVF is a type of [GFF3](http://www.sequenceontology.org/resources/gff3.html) file with additional pragmas and attributes specified.
- samtools
- Picard
- GATK
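To make the VCF description above concrete, here is a minimal sketch of reading one VCF data line. It relies only on the fixed, tab-separated columns that VCF defines (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) and on ALT being a comma-separated list. The names `VcfLineSketch` and `Record` are hypothetical and are not part of biojava3 or any existing library; a real implementation would also need to handle meta-information lines, the INFO field and per-sample genotype columns.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative only: VcfLineSketch and Record are hypothetical names, not biojava3 classes. */
class VcfLineSketch {

    /** Holds the first five fixed columns of one VCF data line. */
    static final class Record {
        final String chrom;
        final int pos;          // 1-based position on the reference
        final String id;
        final String ref;
        final List<String> alt; // one entry per alternate allele; "." is kept as-is

        Record(String chrom, int pos, String id, String ref, List<String> alt) {
            this.chrom = chrom;
            this.pos = pos;
            this.id = id;
            this.ref = ref;
            this.alt = alt;
        }
    }

    /** Parses a single data line; lines starting with '#' are rejected. */
    static Record parse(String line) {
        if (line.startsWith("#")) {
            throw new IllegalArgumentException("meta-information or header line, not a data line");
        }
        // Limit -1 keeps trailing empty columns instead of silently dropping them.
        String[] fields = line.split("\t", -1);
        if (fields.length < 8) {
            throw new IllegalArgumentException("expected at least 8 tab-separated columns");
        }
        List<String> alt = Arrays.asList(fields[4].split(","));
        return new Record(fields[0], Integer.parseInt(fields[1]), fields[2], fields[3], alt);
    }

    public static void main(String[] args) {
        Record r = parse("20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14");
        System.out.println(r.chrom + ":" + r.pos + " " + r.ref + ">" + r.alt);
    }
}
```

Because GVF is GFF3-based, a GVF reader would instead split nine GFF3 columns and read variant details from the attributes column, which is one reason a shared model above both parsers is attractive.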
Some support for these file specifications is already present in various bioinformatics libraries (and in fact biojava3 already provides GFF3 support); it would be desirable to pull these together behind a set of common APIs in biojava3.
Approach
- Consider existing open source VCF and GVF implementations (Genome Analysis Toolkit (GATK), VCFtools, Picard, GVF-Parser, etc.)
- Design APIs for common entities (Allele, Genotype, Haplotype, etc.)
- Create adaptors to third-party implementations or implement support directly in biojava3
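As a starting point for discussion, the common entities and the adaptor idea from the approach above might be sketched as follows. Every name here (`VariationApiSketch`, `Allele`, `Genotype`, `Haplotype`, `GenotypeSource`) is a hypothetical placeholder rather than an existing biojava3 or third-party API, and the fields are assumptions about what such a model would minimally need.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative only: all types below are hypothetical, not existing biojava3 classes. */
class VariationApiSketch {

    /** One observed sequence at a locus, either the reference or an alternate. */
    static final class Allele {
        final String bases;
        final boolean reference;

        Allele(String bases, boolean reference) {
            this.bases = bases;
            this.reference = reference;
        }
    }

    /** The alleles one sample carries at one locus. */
    static final class Genotype {
        final String sample;
        final List<Allele> alleles;
        final boolean phased; // true when the allele order is meaningful

        Genotype(String sample, List<Allele> alleles, boolean phased) {
            this.sample = sample;
            this.alleles = alleles;
            this.phased = phased;
        }

        /** True when there is at least one allele and all alleles have the same bases. */
        boolean isHomozygous() {
            for (Allele a : alleles) {
                if (!a.bases.equals(alleles.get(0).bases)) {
                    return false;
                }
            }
            return !alleles.isEmpty();
        }
    }

    /** Alleles at successive loci that lie on the same chromosome copy. */
    static final class Haplotype {
        final List<Allele> alleles;

        Haplotype(List<Allele> alleles) {
            this.alleles = alleles;
        }
    }

    /**
     * Adaptor contract: converts one record of type R (for example a
     * third-party parser's record, or a natively parsed line) into genotypes.
     */
    interface GenotypeSource<R> {
        List<Genotype> genotypes(R record);
    }

    public static void main(String[] args) {
        Allele g = new Allele("G", true);
        Allele a = new Allele("A", false);
        Genotype het = new Genotype("NA00001", Arrays.asList(g, a), true);
        System.out.println(het.sample + " homozygous? " + het.isHomozygous());
    }
}
```

Keeping the adaptor generic over the record type is one possible design: it lets a wrapper around a third-party library and a direct biojava3 parser expose the same entities without either depending on the other.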
Suggested for GSoC 2013