Introduction
POPBAM is a tool for evolutionary and population-based analyses of next-generation sequencing data. POPBAM takes a BAM file as its input and can compute many widely used evolutionary genetics measures in sliding windows across a genome. The motivation for developing POPBAM is to provide the community with a fundamental suite of evolutionary analysis tools that would otherwise be tedious to implement. Because POPBAM works directly with BAM files, no intermediate steps are necessary.
To enable POPBAM to perform population-level analyses, it is first necessary to modify the input BAM file header. Users must add the "PO" tag to the header line for each read group. The value of the "PO" tag can be any string, as long as the string is identical between samples from the same population. For example, suppose a BAM file has three read groups (R21, R22, and R25). The R22 and R25 read groups are from two different lines of Drosophila melanogaster called "MEL01" and "MEL02", while the third read group, R21, is from a single line of D. simulans called "SIM01". Below is an example of the BAM header including the "PO" tag:
@RG ID:R22 SM:MEL01 PO:MEL
@RG ID:R25 SM:MEL02 PO:MEL
@RG ID:R21 SM:SIM01 PO:SIM
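The header edit described above can be sketched in Python. This is a minimal sketch and not part of POPBAM: the function name `add_po_tags` and the sample-to-population mapping are hypothetical, and it assumes the header is available as text with tab-separated fields, as in the SAM format (for example, as printed by `samtools view -H`).

```python
def add_po_tags(header_text, population_by_sample):
    """Return header_text with a PO:<population> field on each matching @RG line.

    population_by_sample maps an SM (sample) value to a population label.
    Both names are illustrative; they are not part of POPBAM or SAMtools.
    """
    out_lines = []
    for line in header_text.splitlines():
        if line.startswith("@RG"):
            fields = line.split("\t")
            # Find the sample name carried by the SM field, if any.
            sample = next((f[3:] for f in fields if f.startswith("SM:")), None)
            if sample in population_by_sample:
                # Replace any existing PO field so the tag is never duplicated.
                fields = [f for f in fields if not f.startswith("PO:")]
                fields.append("PO:" + population_by_sample[sample])
            line = "\t".join(fields)
        out_lines.append(line)
    return "\n".join(out_lines) + "\n"


# Example header matching the three read groups described in the text.
header = (
    "@HD\tVN:1.0\tSO:coordinate\n"
    "@RG\tID:R22\tSM:MEL01\n"
    "@RG\tID:R25\tSM:MEL02\n"
    "@RG\tID:R21\tSM:SIM01\n"
)
populations = {"MEL01": "MEL", "MEL02": "MEL", "SIM01": "SIM"}

new_header = add_po_tags(header, populations)
print(new_header, end="")
```

The edited header text could then be written back to the BAM file with a tool such as `samtools reheader`.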
Functionality
POPBAM currently has six main commands:
snp | Call and output consensus single nucleotide polymorphisms (SNPs) for each sample. |
fasta | Output all samples in a BAM file as a fasta-formatted sequence alignment. |
tree | Calculate neighbor-joining (NJ) trees of all samples in the BAM file. |
nucdiv | Calculate nucleotide diversity within and between each population. |
ld | Calculate Wall's B, a measure of linkage disequilibrium. |
fspec | Calculate summaries of the frequency spectrum of mutations, including Tajima's D statistic and Fay and Wu's H statistic. |
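To make two of the statistics named above concrete, the following Python sketch computes nucleotide diversity and Tajima's D for a small set of aligned sequences using the standard textbook formulas. It is only an illustration and does not reproduce POPBAM's implementation: the function names are hypothetical, and it assumes equal-length haploid sequences with no missing data, at least four sequences, and that a site with more than two alleles counts once as a segregating site.

```python
from itertools import combinations
from math import sqrt


def nucleotide_diversity(seqs):
    """Mean number of pairwise differences per site (pi)."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / len(pairs) / len(seqs[0])


def tajimas_d(seqs):
    """Tajima's D, or None when it is undefined for the input."""
    n = len(seqs)
    # Count alignment columns that carry more than one allele.
    seg = sum(len(set(col)) > 1 for col in zip(*seqs))
    if n < 4 or seg == 0:
        return None
    # Mean pairwise differences over the whole alignment (not per site).
    k = nucleotide_diversity(seqs) * len(seqs[0])
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (k - seg / a1) / sqrt(e1 * seg + e2 * seg * (seg - 1))


# Toy alignment: four sequences, two segregating sites, both singletons.
seqs = ["ACGTACGT", "ACGTACGA", "ACGAACGT", "ACGTACGT"]
print("pi =", nucleotide_diversity(seqs))
print("D  =", tajimas_d(seqs))
```

In a sliding-window analysis, the same calculation would be repeated on the alignment columns that fall inside each window.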
Dependencies and limitations
POPBAM is a C++ program that requires the SAMtools library (libbam) and headers to compile correctly. These files are available from the SAMtools project page. Furthermore, POPBAM does not index, sort, merge, or fundamentally manipulate BAM files, and it is therefore intended to be used in conjunction with other programs such as SAMtools or Picard.
We sincerely hope the community will provide feedback about improving and extending the functionality of the POPBAM program. A discussion forum for POPBAM users and developers is provided here.
Getting POPBAM
POPBAM is hosted by SourceForge.net. The project page is here. The source code is available from the download page. You can check out the latest source code using the Subversion command:
Publication
- Garrigan, D. (2011). POPBAM: Tools for Evolutionary Analysis of BAM files. Bioinformatics, submitted.