RECANATI Antoine

< Back to ILB Patrimony
Affiliations
  • 2017 - 2018
    Ecole normale supérieure Paris
  • 2017 - 2018
    Département d'Informatique de l'Ecole Normale Supérieure
  • 2017 - 2018
    Communauté d'universités et établissements Université de Recherche Paris Sciences et Lettres
  • 2017 - 2018
    Sciences mathematiques de paris centre
  • 2017 - 2018
    Apprentissage statistique et parcimonie
  • 2018
  • Relaxations of the seriation problem and applications to de novo genome assembly.

    Antoine RECANATI
    2018
    In a sequencing experiment, we can only “read” small fragments (reads) of DNA due to physical limitations, whose location on the genome is unknown. De novo assembly aims to put them together to retrieve the full DNA sequence, like a jigsaw puzzle. The OLC approach computes pairwise Overlaps between reads to find their Layout, and then derive a Consensus sequence. The layout can be cast as an instance of the Seriation combinatorial problem, seeking to reorder a set of elements based on their pairwise similarity, such that similar elements are nearby. In a noiseless setting, a spectral method can solve Seriation efficiently. Still, it often fails on noisy, real DNA data. Notably, assembly is challenged by repeated genomic regions (repeats) causing distant fragments to be similar. Most assembly engines follow hierarchical, greedy schemes, including modules dedicated to detect and disambiguate repeats while constructing the output sequence. We explore a simpler approach using Seriation to lay out all reads at once. Our first contribution is to show that the spectral method can be seamlessly integrated in an OLC framework, yielding competitive results compared to standard methods on real data. However, due to repeats, the method can only find fragmented assemblies (with a few large assembled fragments), i.e., it does not succeed to lay- out all the reads together at once. In our second contribution, we extend the spectral method using a multidimensional spectral embedding. It provides a unifying framework for seriation and circular seriation, a variant searching for a cyclic ordering of the data. This method significantly improves the robustness of the original algorithm on noisy data, and yields single contig assembly of bacterial genomes. As a third contribution, we introduce the Robust Seriation framework, formalizing the task of seriation on corrupted data. We outline the relation between (robust) seriation and other combinatorial problems, particularly for stylized matrices modeling DNA sequencing data. We pro- pose dedicated algorithms that experimentally improve robustness on synthetic and real data, although they turn out to be more sensitive than the method constituting our second contribution. In a fourth contribution, we introduce the problem of Seriation with Duplications, which is motivated by the application of assembling cancer genome from spatial conformation (Hi-C) data. We propose an alternated minimization algorithm that can utilize methods designed to solve Robust Seriation, and evaluate it on toy data.
  • Relaxations of the seriation problem and applications to de novo genome assembly.

    Antoine RECANATI, Alexandre d ASPREMONT, Jean philippe VERT, Alexandre d ASPREMONT, Jean philippe VERT, Dominique LAVENIER, Stephane VIALETTE, Thomas BRULS, Fajwel FOGEL, Dominique LAVENIER, Stephane VIALETTE
    2018
    DNA sequencing technologies can only read short fragments, whose position on the genome is unknown. De novo assembly aims to reconstitute a whole DNA sequence by putting these fragments together like a puzzle. In the OLC (overlap-layout-consensus) approach, we calculate the overlap between fragments in order to rearrange them, and then extract a consensus sequence. The reordering can be written as a combinatorial seriation problem, where we reorder comparable elements between them, so that two adjacent elements are similar. This problem is efficiently solved by a spectral algorithm in the absence of noise, but the real genomic data is different. In particular, regions of the genome are similar although distant (repeated sequences), making assembly problematic. Assembly methods employ hierarchical and gluttonous algorithms to disambiguate repeated sequences. We propose here a purified approach where we rearrange all fragments "at once" via seriation resolution. Our first contribution shows that the use of the spectral method for rearrangement integrates well with the OLC scheme, producing results of similar quality to standard methods. However, due to the repeated sequences, this method produces fragmented assemblies (typically in a few subsequences instead of one). The second contribution is an extension of the spectral method related to dimension reduction under distance conservation, encompassing the problems of seriation and circular seriation (a variant where elements can be ordered in a cycle) in a unified framework. This extension makes the algorithm robust to noise and solves the fragmentation problem of the previous assembly. Our third contribution formalizes robust seriation, where one wishes to reorder noisy data. We describe connections with other combinatorial problems, in particular for matrices modeling real DNA data. We propose adapted al- gorithms, experimentally improving the robustness on synthetic and real data, although less clearly than the second contribution. The fourth contribution presents the problem of seriation with duplication, motivated by the assembly of cancerous genomes via spatial conformation data, which we attempt to solve with an alternating projection algorithm based in part on robust seriation methods, on synthetic data.
Affiliations are detected from the signatures of publications identified in scanR. An author can therefore appear to be affiliated with several structures or supervisors according to these signatures. The dates displayed correspond only to the dates of the publications found. For more information, see https://scanr.enseignementsup-recherche.gouv.fr