Estimation of phage species trees using gene/species tree reconciliation
Brett Babec 1, Luis Pedro Coelho 1, Ben J. Woodcroft 1
1 Centre for Microbiome Research, School of Biomedical Sciences, Queensland University of Technology, Translational Research Institute, Woolloongabba, Queensland, Australia
Estimating the evolutionary history of phages is challenging because they share few genes. Unlike in prokaryotes, no single gene is present in all phages, so alternative frameworks are needed to establish their species tree. Traditional approaches either use direct genome similarity (average nucleotide identity) or gene-sharing networks, which do not reveal ancestral states, or rely on single gene trees, which assume perfect vertical inheritance and accurate tree reconstruction. Here, we propose a novel method, ‘Plover’, that uses phylogenetic signal from many individual gene trees, where each gene need only be present in a subset of phages. We impose a model of gene origination, transfer, and loss to probabilistically coalesce (‘reconcile’) these partial signals into a species tree that includes all phages.
Plover takes nucleotide sequences as input, identifies open reading frames (ORFs), and clusters similar ORFs across species under the assumption they are derived from a common ancestor. It then constructs individual gene trees from the clustered ORFs and reconciles them into a unified species tree. To ensure scalability, Plover employs agglomerative hierarchical clustering to progressively reconcile sub-clades of the species tree, offering greater efficiency than processing the entire tree at once. Plover was validated using two bacterial datasets (642 Gammaproteobacteria and 5,293 Bacilli), where it generated species trees congruent with accepted taxonomy.
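The progressive, bottom-up ordering described above can be illustrated with a minimal sketch of agglomerative hierarchical clustering. This is not Plover's implementation: the distance (Jaccard distance on shared gene clusters), the average-linkage rule, the function names, and the toy genomes are all assumptions chosen for illustration, and the reconciliation step itself is not shown. The sketch only shows how a merge order over sub-clades could be derived, so that each merge marks a point at which a sub-clade would be processed.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two sets of gene-cluster ids."""
    union = a | b
    if not union:
        return 1.0
    return 1.0 - len(a & b) / len(union)


def agglomerative_merge_order(gene_content):
    """Average-linkage agglomerative clustering of genomes.

    gene_content maps a genome name to the set of gene clusters it carries
    (hypothetical input format). Returns a nested-tuple guide tree and the
    list of merges in the order they were performed.
    """
    # Each active cluster is keyed by a label and holds (subtree, member names).
    clusters = {name: (name, [name]) for name in gene_content}
    merges = []
    while len(clusters) > 1:
        best = None
        for x, y in combinations(sorted(clusters), 2):
            members_x, members_y = clusters[x][1], clusters[y][1]
            # Average linkage: mean pairwise distance between the two clusters.
            total = sum(
                jaccard_distance(gene_content[i], gene_content[j])
                for i in members_x
                for j in members_y
            )
            dist = total / (len(members_x) * len(members_y))
            if best is None or dist < best[0]:
                best = (dist, x, y)
        dist, x, y = best
        tree_x, members_x = clusters.pop(x)
        tree_y, members_y = clusters.pop(y)
        clusters[x + "+" + y] = ((tree_x, tree_y), members_x + members_y)
        # In a progressive scheme, the merged sub-clade would be handled here.
        merges.append((sorted(members_x), sorted(members_y), dist))
    (tree, _members), = clusters.values()
    return tree, merges


# Toy example with made-up genomes and gene-cluster ids.
toy_gene_content = {
    "phageA": {"g1", "g2", "g3"},
    "phageB": {"g1", "g2", "g4"},
    "phageC": {"g5", "g6"},
    "phageD": {"g5", "g6", "g7"},
}
tree, merges = agglomerative_merge_order(toy_gene_content)
print(tree)
for left, right, dist in merges:
    print(left, right, round(dist, 3))
```

In this toy input the two closest genomes are joined first and larger sub-clades are assembled from smaller ones, so work proceeds on small groups before the full set is considered. The naive pairwise search shown here is cubic in the number of genomes and is meant only to make the ordering explicit.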
Applying Plover to phage genomes produced phylogenies largely consistent with the current ICTV phage taxonomy, although some discrepancies suggest that certain taxonomic groupings should be reconsidered. Plover offers a robust framework for improving phage taxonomy, guiding epidemiological strategies, and advancing our understanding of global evolution.