New methods for image classification, image retrieval and semantic correspondence.

Authors
  • SAMPAIO DE REZENDE Rafael
  • PONCE Jean
  • BACH Francis
  • CORD Matthieu
  • PEREZ Patrick
  • JURIE Frederic
  • PERRONNIN Florent
Publication date
2017
Publication type
Thesis
Summary
Image representation is at the heart of computer vision. The appropriate representation depends on the task: image retrieval in large databases requires a compact global representation, while semantic segmentation requires a partition map over the image's pixels. Statistical learning techniques are the main tool for building these representations. In this manuscript, we address the learning of visual representations for three different problems: image retrieval, semantic correspondence and image classification. First, we study the Fisher vector representation and its dependence on the Gaussian mixture model employed. We introduce the use of several Gaussian mixture models for different types of backgrounds, e.g., different scene categories, and analyze the performance of these representations for classification and the impact of the scene category as a latent variable. Our second approach proposes an extension of the exemplar SVM representation pipeline. We first show that replacing the SVM hinge loss with the square loss yields similar results at a fraction of the computational cost. We call this model the "square-loss exemplar machine", or SLEM. We then introduce a kernelized variant of SLEM that has the same computational advantages but improved performance. We present experiments that establish the performance and efficiency of our methods using a wide variety of base representations and image retrieval datasets. Finally, we propose a deep neural network for the semantic correspondence problem. We use object boxes as matching elements to build an architecture that simultaneously learns appearance and geometric consistency. We propose new geometric consistency scores adapted to the neural network architecture. Our model is trained on pairs of images obtained from keypoints in a reference dataset and evaluated on multiple datasets, outperforming recent deep learning architectures and previous methods based on handcrafted features. We conclude the thesis by highlighting our contributions and suggesting possible future research directions.