FOGEL Fajwel

< Back to ILB Patrimony
Affiliations
No affiliations identified.
  • 2018
  • Relaxations of the seriation problem and applications to de novo genome assembly.

    Antoine RECANATI, Alexandre d ASPREMONT, Jean philippe VERT, Alexandre d ASPREMONT, Jean philippe VERT, Dominique LAVENIER, Stephane VIALETTE, Thomas BRULS, Fajwel FOGEL, Dominique LAVENIER, Stephane VIALETTE
    2018
    DNA sequencing technologies can only read short fragments, whose position on the genome is unknown. De novo assembly aims to reconstitute a whole DNA sequence by putting these fragments together like a puzzle. In the OLC (overlap-layout-consensus) approach, we calculate the overlap between fragments in order to rearrange them, and then extract a consensus sequence. The reordering can be written as a combinatorial seriation problem, where we reorder comparable elements between them, so that two adjacent elements are similar. This problem is efficiently solved by a spectral algorithm in the absence of noise, but the real genomic data is different. In particular, regions of the genome are similar although distant (repeated sequences), making assembly problematic. Assembly methods employ hierarchical and gluttonous algorithms to disambiguate repeated sequences. We propose here a purified approach where we rearrange all fragments "at once" via seriation resolution. Our first contribution shows that the use of the spectral method for rearrangement integrates well with the OLC scheme, producing results of similar quality to standard methods. However, due to the repeated sequences, this method produces fragmented assemblies (typically in a few subsequences instead of one). The second contribution is an extension of the spectral method related to dimension reduction under distance conservation, encompassing the problems of seriation and circular seriation (a variant where elements can be ordered in a cycle) in a unified framework. This extension makes the algorithm robust to noise and solves the fragmentation problem of the previous assembly. Our third contribution formalizes robust seriation, where one wishes to reorder noisy data. We describe connections with other combinatorial problems, in particular for matrices modeling real DNA data. We propose adapted al- gorithms, experimentally improving the robustness on synthetic and real data, although less clearly than the second contribution. The fourth contribution presents the problem of seriation with duplication, motivated by the assembly of cancerous genomes via spatial conformation data, which we attempt to solve with an alternating projection algorithm based in part on robust seriation methods, on synthetic data.
Affiliations are detected from the signatures of publications identified in scanR. An author can therefore appear to be affiliated with several structures or supervisors according to these signatures. The dates displayed correspond only to the dates of the publications found. For more information, see https://scanr.enseignementsup-recherche.gouv.fr