Journal article

Scalable long read self-correction and assembly polishing with multiple sequence alignment

Pierre Morisse 1 Camille Marchet 2 Antoine Limasset 2 Thierry Lecroq 3 Arnaud Lefebvre 3
1 GenScale - Scalable, Optimized and Parallel Algorithms for Genomics
Inria Rennes – Bretagne Atlantique , IRISA-D7 - GESTION DES DONNÉES ET DE LA CONNAISSANCE
3 TIBS - LITIS - Equipe Traitement de l'information en Biologie Santé
LITIS - Laboratoire d'Informatique, de Traitement de l'Information et des Systèmes
Abstract: Third-generation sequencing technologies make it possible to sequence long reads of tens of kbp, which are expected to solve various problems. However, these reads display high error rates, currently capped around 10%. Self-correction is thus regularly used in long-read analysis projects. We introduce CONSENT, a new self-correction method that relies on both multiple sequence alignment and local de Bruijn graphs. To ensure scalability, multiple sequence alignment computation benefits from a new and efficient segmentation strategy, allowing a massive speedup. CONSENT compares well to the state of the art, and performs better on real Oxford Nanopore data. Specifically, CONSENT is the only method that efficiently scales to ultra-long reads, and it can process a full human dataset, containing reads reaching up to 1.5 Mbp, in 10 days. Moreover, our experiments show that error correction with CONSENT improves the quality of Flye assemblies. Additionally, CONSENT implements a polishing feature that corrects raw assemblies. Our experiments show that CONSENT is 2 to 38 times faster than other polishing tools, while providing comparable results. Furthermore, we show that, on a human dataset, assembling the raw data and polishing the assembly is less resource-consuming than correcting and then assembling the reads, while providing better results. CONSENT is available at https://github.com/morispi/CONSENT.

https://hal-cnrs.archives-ouvertes.fr/hal-03210290
Contributor: Admin Hal Ur1
Submitted on: Wednesday, May 26, 2021 - 14:03:43
Last modified on: Monday, April 4, 2022 - 09:28:27
Long-term archiving on: Friday, August 27, 2021 - 19:41:45

File

s41598-020-80757-5.pdf
Publisher files allowed on an open archive

Citation

Pierre Morisse, Camille Marchet, Antoine Limasset, Thierry Lecroq, Arnaud Lefebvre. Scalable long read self-correction and assembly polishing with multiple sequence alignment. Scientific Reports, Nature Publishing Group, 2021, 11 (1), pp.1-13. ⟨10.1038/s41598-020-80757-5⟩. ⟨hal-03210290⟩