Introduction

POPBAM is a tool for performing evolutionary or population-based analyses of next-generation sequencing data. POPBAM takes a BAM file as its input and can compute many widely used evolutionary genetics measures in sliding windows across a genome. The motivation for developing POPBAM is to provide the community with a fundamental suite of evolutionary analysis tools that would otherwise be tedious to implement. Because POPBAM works directly with BAM files, no intermediate steps are necessary.

To enable POPBAM to perform population-level analyses, it is first necessary to modify the input BAM file header. Users must add the "PO" tag to the header line for each read group. The value of the "PO" tag can be any string, as long as the string is identical for all samples from the same population. For example, suppose a BAM file has three read groups (R21, R22, and R25). The R22 and R25 read groups are from two different lines of Drosophila melanogaster called "MEL01" and "MEL02", while the third read group, R21, is from a single line of D. simulans called "SIM01". Below is an example of the BAM header including the "PO" tag:

@RG  ID:R22  SM:MEL01  PO:MEL
@RG  ID:R25  SM:MEL02  PO:MEL
@RG  ID:R21  SM:SIM01  PO:SIM
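
One way to add the "PO" tag is to edit the header as text. The following is a minimal sketch, not part of POPBAM: the function name add_population_tags and the sample-to-population mapping are hypothetical, and the sketch assumes the header has been exported as tab-separated SAM header lines in which each @RG line carries an SM field.

```python
def add_population_tags(header_lines, sample_to_pop):
    """Append a PO:<population> field to each @RG line that lacks one.

    header_lines  -- SAM header lines with tab-separated fields
    sample_to_pop -- hypothetical mapping from SM value to population label
    """
    tagged = []
    for line in header_lines:
        fields = line.rstrip("\n").split("\t")
        is_read_group = fields[0] == "@RG"
        has_po = any(f.startswith("PO:") for f in fields[1:])
        if is_read_group and not has_po:
            # Look up the population through the read group's SM (sample) field.
            sample = next((f[3:] for f in fields[1:] if f.startswith("SM:")), None)
            if sample in sample_to_pop:
                fields.append("PO:" + sample_to_pop[sample])
        tagged.append("\t".join(fields))
    return tagged


# Read groups from the example above, before the PO tag is added.
header = [
    "@HD\tVN:1.0\tSO:coordinate",
    "@RG\tID:R22\tSM:MEL01",
    "@RG\tID:R25\tSM:MEL02",
    "@RG\tID:R21\tSM:SIM01",
]
populations = {"MEL01": "MEL", "MEL02": "MEL", "SIM01": "SIM"}

for line in add_population_tags(header, populations):
    print(line)
```

In practice, the header text could be exported with "samtools view -H" and written back with "samtools reheader"; lines that already carry a "PO" field are left unchanged, so the function is safe to run twice.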

Functionality

POPBAM currently has six main commands:

snp Call and output consensus single nucleotide polymorphisms (SNPs) for each sample.
fasta Output all samples in a BAM file as a FASTA-formatted sequence alignment.
tree Calculate neighbor-joining (NJ) trees of all samples in the BAM file.
nucdiv Calculate nucleotide diversity within and between each population.
ld Calculate Wall's B, a measure of linkage disequilibrium.
fspec Calculate summaries of the frequency spectrum of mutations, including Tajima's D statistic and Fay and Wu's H statistic.

Analyses can be performed in sliding windows across user-specified regions. POPBAM will output the analyses for each window in a tab-delimited format that is amenable to post-processing with the R Project for Statistical Computing and indexing with the Tabix program.
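
To illustrate what a sliding-window analysis involves, the sketch below computes nucleotide diversity (the average number of pairwise differences per site) in non-overlapping windows over a toy alignment and prints one tab-delimited line per window. It is an illustration only and does not reproduce POPBAM's implementation or output columns: the function names, the toy sequences, and the three-column output are assumptions, and the sketch ignores missing data, base qualities, and coverage filters.

```python
from itertools import combinations


def pairwise_diversity(seqs):
    """Average number of pairwise differences per site among aligned sequences."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    if not pairs or length == 0:
        return 0.0
    # Count mismatching sites for every pair of sequences.
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)


def sliding_windows(seqs, window, step):
    """Yield (start, end, pi) for each full window; 0-based, end-exclusive."""
    length = len(seqs[0])
    for start in range(0, length - window + 1, step):
        chunk = [s[start:start + window] for s in seqs]
        yield start, start + window, pairwise_diversity(chunk)


# Toy alignment of three samples (hypothetical data).
alignment = ["ACGTACGT", "ACGTACGA", "ACCTACGA"]

for start, end, pi in sliding_windows(alignment, window=4, step=4):
    print(f"{start}\t{end}\t{pi:.4f}")
```

Setting step smaller than window gives overlapping windows; a trailing partial window is skipped here, which is a design choice of this sketch.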

Dependencies and limitations

POPBAM is a C++ program that requires the SAMtools library (libbam) and headers to compile correctly. These files are available from the SAMtools project page. Furthermore, POPBAM does not index, sort, merge, or fundamentally manipulate BAM files, and it is therefore intended to be used in conjunction with other programs such as SAMtools or Picard.

We sincerely hope the community will provide feedback about improving and extending the functionality of the POPBAM program. A discussion forum for POPBAM users and developers is provided here.

Getting POPBAM

POPBAM is hosted by SourceForge.net. The project page is here. The source code is available from the download page. You can check out the latest source code using the Subversion command:


Publication