PlantCaduceus

Zong-Yan Liu | Aug 22, 2024

Revolutionizing Cross-Species Plant Genomics 🌿

PlantCaduceus is a pre-trained DNA language model that models plant genomes across species at single-nucleotide resolution. It aims to capture the evolutionary conservation of plant genomes, so that patterns learned in one species carry over to others. This speeds up the identification of functional genomic elements and supports agriculture, plant breeding, and genomic research.

Cross-Species Modeling at Its Finest

PlantCaduceus builds on the Caduceus and Mamba architectures and is pre-trained on 16 angiosperm genomes whose evolutionary histories span 160 million years. This breadth lets the model learn sequence patterns that are shared across species and identify genetic variations that matter. Trained with a masked language modeling objective, PlantCaduceus can be used to improve genome annotations and make biological predictions, helping researchers pinpoint causal mutations, as in the case of the sweet corn mutation.
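To make the masked language modeling idea concrete, here is a minimal sketch of how training examples for such an objective are typically built: some nucleotides are hidden and the model is asked to recover them from context. The function name `mask_sequence`, the `[MASK]` token, and the 15% mask rate are assumptions for illustration, not details taken from PlantCaduceus.

```python
import random

MASK_TOKEN = "[MASK]"  # hypothetical mask token, chosen for illustration


def mask_sequence(seq, mask_rate=0.15, seed=0):
    """Hide a random subset of nucleotides.

    Returns the masked token list and a dict mapping each masked
    position to the nucleotide the model would have to predict.
    """
    rng = random.Random(seed)  # seeded so the example is reproducible
    tokens = []
    targets = {}
    for i, base in enumerate(seq):
        if rng.random() < mask_rate:
            tokens.append(MASK_TOKEN)
            targets[i] = base  # ground truth for the training loss
        else:
            tokens.append(base)
    return tokens, targets


sequence = "ACGTACGTACGTACGTACGT"
tokens, targets = mask_sequence(sequence, mask_rate=0.15, seed=1)
```

A model trained this way only ever sees raw sequence, so it needs no labels; whatever it learns about conservation comes from predicting the hidden bases.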

Pioneering Zero-Shot Genomic Analysis

PlantCaduceus takes a zero-shot approach to predicting deleterious mutations, so it needs no species-specific training dataset. The mutations it prioritizes show a three-fold enrichment of rare alleles compared to traditional approaches. Because selection tends to keep harmful variants at low frequency, this enrichment suggests the model is effective at prioritizing important genetic mutations. This makes PlantCaduceus a valuable resource for cross-species genomic research, with applications in evolutionary biology and in improving plant resilience in agriculture.
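One common way to score a mutation zero-shot with a masked DNA language model is to mask the site and compare the probability the model assigns to the alternate allele with the probability it assigns to the reference allele. The sketch below assumes that scoring rule; this post does not spell out the exact rule PlantCaduceus uses, and the function name `zero_shot_score` and the probabilities are made up for illustration.

```python
import math


def zero_shot_score(probs, ref, alt):
    """Log-likelihood ratio of the alternate vs. reference allele.

    `probs` maps each nucleotide to the probability a masked language
    model assigns to it at the masked site. A strongly negative score
    means the model finds the alternate allele much less plausible.
    """
    return math.log(probs[alt]) - math.log(probs[ref])


# Made-up probabilities for one masked site (not real model output).
site_probs = {"A": 0.90, "C": 0.05, "G": 0.03, "T": 0.02}

score = zero_shot_score(site_probs, ref="A", alt="T")
is_candidate = score < 0  # alternate allele is less likely than reference
```

Under this assumption, variants can be ranked by score across a genome, with the most negative scores treated as the strongest candidates for being deleterious. No labeled examples from the target species are involved, which is what makes the procedure zero-shot.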

For more details, visit PlantCaduceus.

How to cite PlantCaduceus

@article {Zhai2024.06.04.596709,
  author = {Zhai, Jingjing and Gokaslan, Aaron and Schiff, Yair and Berthel, Ana and Liu, Zong-Yan and Miller, Zachary R and Scheben, Armin and Stitzer, Michelle C and Romay, Cinta and Buckler, Edward S. and Kuleshov, Volodymyr},
  title = {Cross-species plant genomes modeling at single nucleotide resolution using a pre-trained DNA language model},
  elocation-id = {2024.06.04.596709},
  year = {2024},
  doi = {10.1101/2024.06.04.596709},
  URL = {https://www.biorxiv.org/content/early/2024/06/05/2024.06.04.596709},
  eprint = {https://www.biorxiv.org/content/early/2024/06/05/2024.06.04.596709.full.pdf},
  journal = {bioRxiv}
}