JAKUBOWICZ Jeremie

< Back to ILB Patrimony
Affiliations
  • 2012 - 2019
    Services repartis, architectures, modélisation, validation, administration des réseaux
  • 2012 - 2017
    Télécom ParisTech
  • 2012 - 2017
    Institut Mines-Télécom
  • 2012 - 2013
    Laboratoire traitement et communication de l'information
  • 2019
  • 2018
  • 2017
  • 2016
  • 2015
  • 2014
  • 2013
  • Deep learning versus conventional machine learning for detection of healthcare-associated infections in French clinical narratives.

    Sara RABHI, Jeremie JAKUBOWICZ, Marie helene METZGER
    Methods of Information in Medicine | 2019
    Objective: The objective of this article was to compare the performances of health care-associated infection (HAI) detection between deep learning and conventional machine learning (ML) methods in French medical reports. Methods: The corpus consisted in different types of medical reports (discharge summaries, surgery reports, consultation reports, etc.). A total of 1,531 medical text documents were extracted and deidentified in three French university hospitals. Each of them was labeled as presence (1) or absence (0) of HAI. We started by normalizing the records using a list of preprocessing techniques. We calculated an overall performance metric, the F1 Score, to compare a deep learning method (convolutional neural network [CNN]) with the most popular conventional ML models (Bernoulli and multi-naïve Bayes, k-nearest neighbors, logistic regression, random forests, extra-trees, gradient boosting, support vector machines). We applied the hyperparameter Bayesian optimization for each model based on its HAI identification performances. We included the set of text representation as an additional hyperparameter for each model, using four different text representations (bag of words, term frequency–inverse document frequency, word2vec, and Glove). Results: CNN outperforms all other conventional ML algorithms for HAI classification. The best F1 Score of 97.7% ± 3.6% and best area under the curve score of 99.8% ± 0.41% were achieved when CNN was directly applied to the processed clinical notes without a pretrained word2vec embedding. Through receiver operating characteristic curve analysis, we could achieve a good balance between false notifications (with a specificity equal to 0.937) and system detection capability (with a sensitivity equal to 0.962) using the Youden's index reference. Conclusions: The main drawback of CNNs is their opacity. To address this issue, we investigated CNN inner layers' activation values to visualize the most meaningful phrases in a document. This method could be used to build a phrase-based medical assistant algorithm to help the infection control practitioner to select relevant medical records. Our study demonstrated that deep learning approach outperforms other classification learning algorithms for automatically identifying HAIs in medical reports.
  • Applications of artificial intelligence in e-commerce and finance.

    Yang JIAO, Walid BEN AMEUR, Amel BOUZEGHOUB, Jeremie JAKUBOWICZ, Bruno GOUTORBE, Arthur CHARPENTIER, Romuald ELIE
    2018
    Artificial Intelligence is present in every aspect of our lives in the era of Big Data. It has led to revolutionary changes in various sectors, including e-commerce and finance. In this thesis, we present four applications of AI that improve existing goods and services, enable automation, and dramatically increase the efficiency of many tasks in both fields. First, we improve the product search service offered by most e-commerce sites by using a novel term weighting system to better evaluate the importance of terms in a search query. Next, we build a predictive model on daily sales using a time series forecasting approach and leverage the predicted results to rank product search results to maximize a company's revenue. Next, we propose the difficulty of online product classification and analyze winning solutions, consisting of state-of-the-art classification algorithms, on our real-world dataset. Finally, we combine the skills previously learned from time series-based sales prediction and classification to predict one of the most challenging but attractive time series: inventory. We perform an in-depth study on each stock in the S&P 500 Index using four state-of-the-art classification algorithms and report very promising results.
  • Big data-driven optimization in transportation and communication networks.

    Longbiao CHEN, Thi mai trang NGUYEN, Gang PAN, Jeremie JAKUBOWICZ, Guy PUJOLLE, Igor monteiro MORAES, Shijian LI, Anelise MUNARETTO, Marco FIORE
    2018
    The evolution of metropolitan structures has created various types of urban networks. Among which two types of networks are of great importance for our daily life: transportation networks corresponding to human mobility in physical space and communication networks supporting human interactions in digital space. The rapid expansion in scope and scale of these two networks raises fundamental research questions about how to optimize these networks. Some of the primary objectives include on-demand resource provisioning, anomaly detection, energy efficiency and quality of service. Despite differences in design and implementation technologies, transportation networks and communication networks share common fundamental structures, and exhibit similar dynamic spatio-temporal characteristics. As a result, there are common challenges in optimizing these two networks: traffic profiling, mobility prediction, traffic aggregation, node clustering and resource allocation. To achieve the optimization goals and research challenges, various analytical models, optimization algorithms, and simulation systems have been proposed and widely studied across several disciplines. These analytical models are often validated by simulation and could lead to suboptimal results in deployment. With the emergence of the Internet, a massive volume of urban network data can be collected. Recent advances in Big Data analysis techniques have provided researchers with great potential to understand this data. Motivated by this trend, the objective of this thesis is to explore a new paradigm of data-driven network optimization. We address the above scientific challenges by applying data-driven methods for network optimization. We propose two data-driven algorithms for network traffic clustering and user mobility prediction, and apply these algorithms to optimization in transportation and communication networks. First, by analyzing the large-scale traffic datasets of the two networks, we propose a graph-based clustering algorithm to better understand the traffic similarities and traffic variations between different areas and times. Based on this, we apply the traffic clustering algorithm to the following two network optimization applications: 1. A dynamic traffic clustering for on-demand planning of bike-sharing networks. In this application, we dynamically cluster bike stations with similar traffic patterns to obtain more stable and predictable grouped (clustered) traffic demands, so that overcrowded stations in the network can be predicted and dynamic demand-based network planning can be performed. Evaluation results using real data from New York City and Washington, D.C. show that our solution accurately predicts overloaded clusters [.]
  • Complementary base station clustering for cost-effective and energy-efficient cloud-RAN.

    Longbiao CHEN, Linjin LIU, Xiaoliang FAN, Johnthan LI, Cheng WANG, Gang PAN, Jeremie JAKUBOWICZ, Thi mai trang NGUYEN
    2017 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computed, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI) | 2017
    No summary available.
  • Understanding bike trip patterns leveraging bike sharing system open data.

    Longbiao CHEN, Xiaojuan MA, Thi mai trang NGUYEN, Gang PAN, Jeremie JAKUBOWICZ
    Frontiers of Computer Science | 2017
    Bike sharing systems are booming globally as a green and flexible transportationmode, but the flexibility also brings difficulties in keeping the bike stations balanced with enough bikes and docks. Understanding the spatio-temporal bike trip patterns in a bike sharing system, such as the popular trip origins and destinations during rush hours, is important for researchers to design models for bike scheduling and station management. However, due to privacy and operational concerns, bike trip data are usually not publicly available in many cities. Instead, the station feeds about real-time bike and dock number in stations are usually public, which we refer to as bike sharing system open data. In this paper, we propose an approach to infer the spatio-temporal bike trip patterns from the public station feeds. Since the number of possible trips (i.e., origin-destination station pairs) is much larger than the number of stations, we define the trip inference as an ill-posed inverse problem. To solve this problem, we identify the sparsity and locality properties of bike trip patterns, and propose a sparse and weighted regularization model to impose both properties in the solution. We evaluate our method using real-world data fromWashington, D.C. and New York City. Results show that our method can effectively infer the spatio-temporal bike trip patterns and outperform the baselines in both cities.
  • Fine-Grained Urban Event Detection and Characterization Based on Tensor Cofactorization.

    Longbiao CHEN, Jeremie JAKUBOWICZ, Dingqi YANG, Daqing ZHANG, Gang PAN
    IEEE Transactions on Human-Machine Systems | 2017
    Understanding the irregular crowdmovement and social activities caused by urban events such as city festivals and concerts can benefit event management and city planning. Although various urban data can be exploited to detect such irregularities, the crowd mobility data (e.g., bike trip records) are usually in a mixed state with several basic patterns (e.g., eating, working, and recreation), making it difficult to separate concurrent events happening in the same region. The social activity data (e.g., social network check-ins) are usually oversparse, hindering the fine-grained characterization of urban events. In this paper, we propose a tensor cofactorization-based data fusion framework for fine-grained urban event detection and characterization leveraging crowd mobility data and social activity data. First, we adopt a nonnegative tensor cofactorization approach to decompose the crowd mobility tensor into several basic patterns, with the help of the auxiliary social activity tensor.We then use a multivariate-outlier-detectionbased method to identify irregularities from the decomposed basic patterns and aggregate them to detect and characterize the associated urban events.We evaluate the performance of our framework using real-world bike trip data and check-in data from New York City and Washington, DC, respectively. Results show that by fusing the two types of urban data, our method achieves fine-grained urban event detection and characterization in both cities and consistently outperforms the baselines.
  • Reranking Strategies Based on Fine-Grained Business User Events Benchmarked on a Large E-commerce Data Set.

    Yang JIAO, Bruno GOUTORBE, Matthieu CORNEC, Jeremie JAKUBOWICZ, Christelle GRAUER, Sebastien ROMANO, Maxime DANINI
    E-Commerce and Web Technologies | 2017
    No summary available.
  • Complementary Base Station Clustering for Cost-Effective and Energy-Efficient Cloud-RAN.

    Longbiao CHEN, Thi mai trang NGUYEN, Gang PAN, Jeremie JAKUBOWICZ, Linjin LIU, Xiaoliang FAN, Johnthan LI, Cheng WANG
    14th IEEE International Conference on Ubiquitous Intelligence and Computing (UIC 2017) | 2017
    The fast growing mobile network data traffic poses great challenges for operators to increase their data processing capacity in base stations in an efficient manner. With the emergence of Cloud Radio Access Network (Cloud-RAN), the data processing units can now be centralized in a data center and shared among several base stations. By clustering base stations with complementary traffic patterns to the same data center, the deployment cost and energy consumption can be reduced. In this paper, we propose a two-phase framework to find optimal base station clustering schemes in a city-wide Cloud-RAN. First, we design a traffic profile for each base station, and propose an entropy-based metric to characterize the complementarity among base stations. Second, we build a graph model to represent the complementarity as link weight, and propose a distance-constrained clustering algorithm to find optimal base station clustering schemes. We evaluate the performance of our framework using two months of real-world mobile network traffic data in Milan, Italy. Results show that our framework effectively reduces 12.88% of deployment cost and 9.45% of energy consumption compared with traditional architectures, and outperforms the baseline method.
  • A Multiresolution Analysis Framework for the Statistical Analysis of Incomplete Rankings.

    Eric SIBONY, Stephan CLEMENCON, Jeremie JAKUBOWICZ
    2016
    Though the statistical analysis of ranking data has been a subject of interest over the past centuries, especially in economics, psychology or social choice theory, it has been revitalized in the past 15 years by recent applications such as recommender or search engines and is receiving now increasing interest in the machine learning literature. Numerous modern systems indeed generate ranking data, representing for instance ordered results to a query or user preferences. Each such ranking usually involves a small but varying subset of the whole catalog of items only. The study of the variability of these data, i.e. the statistical analysis of incomplete rank-ings, is however a great statistical and computational challenge, because of their heterogeneity and the related combinatorial complexity of the problem. Whereas many statistical methods for analyzing full rankings (orderings of all the items in the catalog) are documented in the dedicated literature, partial rankings (full rankings with ties) or pairwise comparisons, only a few approaches are available today to deal with incomplete ranking, relying each on a strong specific assumption. It is the purpose of this article to introduce a novel general framework for the statistical analysis of incomplete rankings. It is based on a representation tailored to these specific data, whose construction is also explained here, which fits with the natural multi-scale structure of incomplete rankings and provides a new decomposition of rank information with a multiresolu-tion analysis interpretation (MRA). We show that the MRA representation naturally allows to overcome both the statistical and computational challenges without any structural assumption on the data. It therefore provides a general and flexible framework to solve a wide variety of statistical problems, where data are of the form of incomplete rankings.
  • A stochastic proximal point algorithm for total variation regularization over large scale graphs.

    Adil SALIM, Pascal BIANCHI, Walid HACHEM, Jeremie JAKUBOWICZ
    2016 IEEE 55th Conference on Decision and Control (CDC) | 2016
    No summary available.
  • Random Pairwise Gossip on $\text{CAT}(\kappa)$ Metric Spaces.

    Anass BELLACHEHAB, Jeremie JAKUBOWICZ
    IEEE Transactions on Automatic Control | 2016
    In the context of sensor networks, gossip algorithms are a popular, well esthablished technique, for achieving consensus when sensor data is encoded in linear spaces. Gossip algorithms also have several extensions to non linear data spaces. Most of these extensions deal with Riemannian manifolds and use Riemannian gradient descent. This paper, instead, exhibits a very simple metric property that do not rely on any differential structure. This property strongly suggests that gossip algorithms could be studied on a broader family than Riemannian manifolds. And it turns out that, indeed, (local) convergence is guaranteed as soon as the data space is a mere CAT(κ) metric space. We also study convergence speed in this setting and establish linear rates for CAT(0) spaces, and local linear rates for CAT(κ) spaces with κ > 0. Numerical simulations on several scenarii, with corresponding state spaces that are either Riemannian manifolds - as in the problem of positive definite matrices consensus - or bare metric spaces - as in the problem of arms consensus - validate the results. This shows that not only does our metric approach allows for a simpler and more general mathematical analysis but also paves the way for new kinds of applications that go beyond the Riemannian setting.
  • Dynamic cluster-based over-demand prediction in bike sharing systems.

    Longbiao CHEN, Daqing ZHANG, Leye WANG, Dingqi YANG, Xiaojuan MA, Shijian LI, Zhaohui WU, Gang PAN, Thi mai trang NGUYEN, Jeremie JAKUBOWICZ
    Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing | 2016
    Bike sharing is booming globally as a green transportation mode, but the occurrence of over-demand stations that have no bikes or docks available greatly affects user experiences. Directly predicting individual over-demand stations to carry out preventive measures is difficult, since the bike usage pattern of a station is highly dynamic and context dependent. In addition, the fact that bike usage pattern is affected not only by common contextual factors (e.g., time and weather) but also by opportunistic contextual factors (e.g., social and traffic events) poses a great challenge. To address these issues, we propose a dynamic cluster-based framework for over-demand prediction. Depending on the context, we construct a weighted correlation network to model the relationship among bike stations, and dynamically group neighboring stations with similar bike usage patterns into clusters. We then adopt Monte Carlo simulation to predict the over-demand probability of each cluster. Evaluation results using real-world data from New York City and Washington, D.C. show that our framework accurately predicts over-demand clusters and outperforms the baseline methods significantly.
  • Robust Distributed Consensus Using Total Variation.

    Walid BEN AMEUR, Pascal BIANCHI, Jeremie JAKUBOWICZ
    IEEE Transactions on Automatic Control | 2016
    Consider a connected network of agents endowed with local cost functions representing private objectives. Agents seek to find an agreement on some minimizer of the aggregate cost, by means of repeated communications between neighbors. Consensus on the average over the network, usually addressed by gossip algorithms, is a special instance of this problem, corresponding to quadratic private objectives. Consensus on the median, or more generally, consensus on a given quantile, is also a special instance of this problem. In this paper we show that optimizing the aggregate cost function regularized by a total variation (TV) term has appealing properties. First, it can be done very naturally in a distributed way, yielding algorithms that are efficient on numerical simulations. Secondly, the optimum for the regularized cost is shown to be also the optimum for the initial aggregate cost function under assumptions that are simple to state. Finally, these algorithms are robust to unreliable agents that keep injecting some false value in the network. This is remarkable enough, and is not the case, for instance, of gossip algorithms that are entirely ruled by unreliable agents as detailed in the paper.
  • Multi-resolution analysis of ranking data.

    Eric SIBONY, St?phan CL?MEN?ON, J?r?mie JAKUBOWICZ, St?phane MALLAT, Eyke HULLERMEIER, Jonathan HUANG, Miche?le SEBAG, Risi KONDOR, Devavrat SHAH
    2016
    This theme introduces a multi-resolution analysis framework for ranking data. Initiated in the 18th century in the context of elections, ranking data analysis has attracted major interest in many areas of scientific literature: psychometrics, statistics, economics, operations research, machine learning or computational social choice among others. It has also been revitalized by modern applications such as recommender systems, where the goal is to inform the preferences of users to offer them the best personalized suggestions. In these contexts, users express their preferences only on small subsets of objects varying within a large catalog. However, the analysis of such incomplete rankings poses an important challenge, both from a statistical and computational point of view, pushing industrial actors to use methods that exploit only a part of the available information. This theme introduces a new representation for the data, which overcomes by construction this double challenge. Although it is based on combinatorics and algebraic topology results, its numerous analogies with multiresolution analysis make it a natural and efficient framework for the analysis of incomplete classifications. Since it makes no assumptions on the data, it already provides state-of-the-art estimators for small catalogs of objects and can be combined with many regularization procedures for large catalogs. For all these reasons, we believe that this multi-resolution representation opens the way to many future developments and applications.
  • MRA-based Statistical Learning from Incomplete Rankings Stéphan Clémençon.

    Eric SIBONY, Stephan CLEMENCON, Jeremie JAKUBOWICZ
    MRA-based Statistical Learning from Incomplete Rankings Stéphan Clémençon | 2015
    Statistical analysis of rank data describing preferences over small and variable subsets of a potentially large ensemble of items {1,. . , n} is a very challenging problem. It is motivated by a wide variety of modern applications, such as recommender systems or search engines. However , very few inference methods have been documented in the literature to learn a ranking model from such incomplete rank data. The goal of this paper is twofold: it develops a rigorous mathematical framework for the problem of learning a ranking model from incomplete rankings and introduces a novel general statistical method to address it. Based on an original concept of multi-resolution analysis (MRA) of incomplete rank-ings, it finely adapts to any observation setting, leading to a statistical accuracy and an algorith-mic complexity that depend directly on the complexity of the observed data. Beyond theoretical guarantees, we also provide experimental results that show its statistical performance.
  • Inferring bike trip patterns from bike sharing system open data.

    Longbiao CHEN, Jeremie JAKUBOWICZ
    2015 IEEE International Conference on Big Data (Big Data) | 2015
    Understanding bike trip patterns in a bike sharing system is important for researchers designing models for station placement and bike scheduling. By bike trip patterns, we refer to the large number of bike trips observed between two stations. However, due to privacy and operational concerns, bike trip data are usually not made publicly available. In this paper, instead of relying on time-consuming surveys and inaccurate simulations, we attempt to infer bike trip patterns directly from station status data, which are usually public to help riders find nearby stations and bikes. However, the station status data do not contain information about where the bikes come from and go to, therefore the same observations on stations might correspond to different underlying bike trips. To address this challenge, We conduct an empirical study on a sample bike trip dataset to gain insights about the inner structure of bike trips. We then formulate the trip inference problem as an ill-posed inverse problem, and propose a regularization technique to incorporate the a priori information about bike trips to solve the problem. We evaluate our method using real-world bike sharing datasets from Washington, D.C. Results show that our method effectively infers bike trip patterns.
  • MRA-based statistical learning from incomplete rankings.

    Eric SIBONY, Stephan CLEMENCON, Jeremie JAKUBOWICZ
    ICML 2015 : 32nd International Conference on Machine Learning | 2015
    Statistical analysis of rank data describing preferences over small and variable subsets of a potentially large ensemble of items 1, ., n is a very challenging problem. It is motivated by a wide variety of modern applications, such as recommender systems or search engines. However, very few inference methods have been documented in the literature to learn a ranking model from such incomplete rank data. The goal of this paper is twofold: it develops a rigorous mathematical framework for the problem of learning a ranking model from incomplete rankings and introduces a novel general statistical method to address it. Based on an original concept of multi-resolution analysis (MRA) of incomplete rankings, it finely adapts to any observation setting, leading to a statistical accuracy and an algorithmic complexity that depend directly on the complexity of the observed data. Beyond theoretical guarantees, we also provide experimental results that show its statistical performance.
  • Random Pairwise Gossip on $$CAT(\kappa )$$ C A T ( κ ) Metric Spaces.

    Anass BELLACHEHAB, Jeremie JAKUBOWICZ
    Geometric Science of Information | 2015
    In the context of sensor networks, gossip algorithms are a popular, well established technique, for achieving consensus when sensor data are encoded in linear spaces. Gossip algorithms also have several extensions to non linear data spaces. Most of these extensions deal with Riemannian manifolds and use Riemannian gradient descent. This paper, instead, studies gossip in a broader CAT(k) metric setting, encompassing, but not restricted to, several interesting cases of Riemannian manifolds. As it turns out, convergence can be guaranteed as soon as the data lie in a small enough ball of a mere CAT(k) metric space. We also study convergence speed in this setting and establish linear rates of convergence.
  • Distributed consensus for metamorphic systems using a gossip algorithm for CAT(0) metric spaces.

    Anass BELLACHEHAB, Jeremie JAKUBOWICZ
    MAXENT 2014 : 34th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering | 2015
    We present an application of distributed consensus algorithms to metamorphic systems. A metamorphic system is a set of identical units that can self-assemble to form a rigid structure. For instance, one can think of a robotic arm composed of multiple links connected by joints. The system can change its shape in order to adapt to different environments via reconfiguration of its constituting units. We assume in this work that several metamorphic systems form a network: two systems are connected whenever they are able to communicate with each other. The aim of this paper is to propose a distributed algorithm that synchronizes all the systems in the network. Synchronizing means that all the systems should end up having the same configuration. This aim is achieved in two steps: (i) we cast the problem as a consensus problem on a metric space and (ii) we use a recent distributed consensus algorithm that only make use of metrical notions.
  • Sensing the Pulse of Urban Activity Centers Leveraging Bike Sharing Open Data.

    Longbiao CHEN, Dingqi YANG, Jeremie JAKUBOWICZ, Gang PAN, Daqing ZHANG, Shijian LI
    2015 IEEE 12th Intl Conf on Ubiquitous Intelligence and Computing and 2015 IEEE 12th Intl Conf on Autonomic and Trusted Computing and 2015 IEEE 15th Intl Conf on Scalable Computing and Communications and Its Associated Workshops (UIC-ATC-ScalCom) | 2015
    Understanding social activities in Urban Activ- ity Centers can benefit both urban authorities and citizens. Traditionally, monitoring large social activities usually incurs significant costs of human labor and time. Fortunately, with the recent booming of urban open data, a wide variety of human digital footprints have become openly accessible, providing us with new opportunities to understand the social dynamics in the cities. In this paper, we resort to urban open data from bike sharing systems, and propose a two-phase framework to identify social activities in Urban Activity Centers based on bike sharing open data. More specifically, we first detect bike usage anomalies from the bike trip data, and then identify the potential social activities from the detected anomalies using a proposed heuristic method by considering both spatial and temporal constraints. We evaluate our framework based on the large-scale real-world dataset collected from the bike sharing system of Washington, D.C. The results show that our framework can efficiently identify social activities in different types of Urban Activity Centers and outperforms the baseline approach. In particular, our framework can identify 89% of the social activities in the major Urban Activity Centers of Washington, D.C.
  • An entropy-based term weighting scheme and its application in e-commerce search engines.

    Yang JIAO, Matthieu CORNEC, Jeremie JAKUBOWICZ
    International Symposium on Web Algorithms | 2015
    Term weighting schemes are commonly used in information retrieval field to extract the most relevant terms of documents. The main contribution of this paper consists in defining a new term weighting scheme based on entropy. We believe that this scheme is particularly well adapted to compare queries from e-commerce sites. These queries have their own specificities. They tend to be short and a large proportion of them are unique queries, i.e. have no historical record. We claim that widely used weighting schemes, such as tf-idf, are not well-adapted to this kind of queries. This claim is backed up by numerical experiments where the proposed entropy-based approach is incorporated into a collabora-tive filtering framework. In this framework, well suited to e-commerce search engines, we found out, on real e-commerce purchase data, that the proposed weighting scheme outperforms the tf-idf weighting scheme.
  • Multiresolution analysis of incomplete rankings with applications to prediction.

    Eric SIBONY, Stephan CLEMENCON, Jeremie JAKUBOWICZ
    2014 IEEE International Conference on Big Data (Big Data) | 2014
    Data representing preferences of users are a typical example of the Big Datasets modern technologies, such as e- commerce portals, now permit to collect, in an explicit or implicit fashion. Such data are highly complex, insofar as the number of items n for which users may possibly express their preferences is explosive and the collection of items or products a given user actually examines and is capable of comparing is highly variable and of extremely low cardinality compared to n. It is the main purpose of this paper to promote a new representation of preference data, viewed as incomplete rankings. In contrast to alternative approaches, the very nature of preference data is preserved by the "multiscale analysis" we propose, identifying here "scale" with the set of items over which preferences are expressed, whose construction relies on recent results in algebraic topology. The representation of preference data it provides shares similarities with wavelet multiresolution analysis on a Euclidean space and can be computed at a reasonable cost given the complexity of the original data. Beyond computational and theoretical advantages, the "wavelet like" transform is shown to compress preference data into relatively few basis coefficients and thus facilitates statistical tasks such as distribution estimation or prediction. This is illustrated here by very encouraging empirical work based on popular benchmark real datasets.
  • Random pairwise gossip on CAT(0) metric spaces.

    Anass BELLACHEHAB, Jeremie JAKUBOWICZ
    53rd IEEE Conference on Decision and Control | 2014
    In the context of sensor networks, gossip algo-rithms are a popular, well established technique for achieving consensus when sensor data is encoded in Euclidean spaces. The algorithm also has several extensions to nonlinear data spaces. Most of these extensions deal with Riemannian data spaces and use gradient descent techniques to derive a generalization of the gossip algorithm. However, their heavy reliance on calculus tools hides the simple metric properties that make gossip algorithms work. This paper studies the pairwise gossip algorithm from a purely metric perspective. Focusing on the essential property that makes pairwise midpoint gossip converge on Riemannian manifolds, this paper argues that the notion of C AT (0) space is perfectly suited to the analysis of gossip algorithms. The transition from a Riemannian to a purely metrical framework has three virtues: it simplifies the proofs, provably preserves convergence speed of pairwise gossip, yet in a more general framework, and paves the way for new applications, as illustrated by our numerical experiments on robots arms viewed as points in the metric graph associated with the free group.
  • Multiresolution Analysis of Incomplete Rankings.

    Stephan CLEMENCON, Jeremie JAKUBOWICZ, Eric SIBONY
    2014
    Incomplete rankings on a set of items {1, ., n} are orderings of the form a_{1} \prec . \prec a_{k}, with {a_{1}, ., a_{k}} \subset {1, ., n} and k < n. Though they arise in many modern applications, only a few methods have been introduced to manipulate them, most of them consisting in representing any incomplete ranking by the set of all its possible linear extensions on {1, ., n}. It is the major purpose of this paper to introduce a completely novel approach, which allows to treat incomplete rankings directly, representing them as injective words over {1, ., n}. Unexpectedly, operations on incomplete rankings have very simple equivalents in this setting and the topological structure of the complex of injective words can be interpretated in a simple fashion from the perspective of ranking. We exploit this connection here and use recent results from algebraic topology to construct a multiresolution analysis and develop a wavelet framework for incomplete rankings. Though purely combinatorial, this construction relies on the same ideas underlying multiresolution analysis on a Euclidean space, and permits to localize the information related to rankings on each subset of items. It can be viewed as a crucial step toward nonlinear approximation of distributions of incomplete rankings and paves the way for many statistical applications, including preference data analysis and the design of recommender systems.
  • Scoring anomalies: a M-estimation formulation.

    Stephan CLEMENCON, Jeremie JAKUBOWICZ
    AISTATS 2013: 16th International Conference on Artificial Intelligence and Statistics | 2013
    It is the purpose of this paper to formulate the issue of scoring multivariate observations depending on their degree of abnormal-ity/novelty as an unsupervised learning task. Whereas in the 1-d situation, this problem can be dealt with by means of tail estimation techniques, observations being viewed as all the more "abnormal" as they are located far in the tail(s) of the underlying probability distribution. In a wide variety of applications , it is desirable to dispose of a scalar valued "scoring" function allowing for comparing the degree of abnormality of multi-variate observations. Here we formulate the issue of scoring anomalies as a M-estimation problem. A (functional) performance criterion is proposed, whose optimal elements are, as expected, nondecreasing transforms of the density. The question of empirical estimation of this criterion is tackled and preliminary statistical results related to the accuracy of partition-based techniques for optimizing empirical estimates of the empirical performance measure are established.
  • Convergence of a Multi-Agent Projected Stochastic Gradient Algorithm for Non-Convex Optimization.

    Pascal BIANCHI, Jeremie JAKUBOWICZ
    IEEE Transactions on Automatic Control | 2013
    We introduce a new framework for the convergence analysis of a class of distributed constrained non-convex optimization algorithms in multi-agent systems. The aim is to search for local minimizers of a non-convex objective function which is supposed to be a sum of local utility functions of the agents. The algorithm under study consists of two steps: a local stochastic gradient descent at each agent and a gossip step that drives the network of agents to a consensus. Under the assumption of decreasing stepsize, it is proved that consensus is asymptotically achieved in the network and that the algorithm converges to the set of Karush-Kuhn-Tucker points. As an important feature, the algorithm does not require the double-stochasticity of the gossip matrices. It is in particular suitable for use in a natural broadcast scenario for which no feedback messages between agents are required. It is proved that our results also holds if the number of communications in the network per unit of time vanishes at moderate speed as time increases, allowing potential savings of the network's energy. Applications to power allocation in wireless ad-hoc networks are discussed. Finally, we provide numerical results which sustain our claims.
  • Asynchronous peer-to-peer consensus in varieties.

    Anass BELLACHEHAB, Pascal BIANCHI, Jeremie JAKUBOWICZ
    15èmes Rencontres Francophones sur les Aspects Algorithmiques des Télécommunications (AlgoTel) | 2013
    The objective of this paper is to propose a generalization of the asynchronous distributed peer-to-peer consensus algorithm of \cite{boyd:gossip} for data belonging to a Riemmannian variety. We show its convergence in the case where the variety is simply connected and without focal points. The interest of this algorithm is illustrated in the case where a network of sensors seeks to find a consensus on discrete probability laws. However, the field of applications covered by this algorithm goes far beyond this framework.
  • A Total Variation based approach for robust consensus in distributed networks.

    Walid BEN AMEUR, Pascal BIANCHI, Jeremie JAKUBOWICZ
    52nd IEEE Conference on Decision and Control | 2013
    Consider a connected network of agents endowed with local cost functions representing private objectives. Agents seek to find an agreement on some minimizer of the aggregate cost, by means of repeated communications between neighbors. This paper investigates the case where some agents are unreliable in the sense that they permanently inject some false value in the network. We introduce a new relaxation of the initial optimization problem. We show that the relaxed problem is equivalent to the initial one under some regularity conditions which are characterized. We propose two iterative distributed algorithms for finding minimizers of the relaxed problem. When all agents are reliable, these algorithms converge to the sought consensus provided that the above regularity conditions are satisfied. In the presence of misbehaving agents, we show in simple scenario that our algorithms converge to a solution which remains in the vicinity of the sought consensus. Unlike standard distributed algorithms, our approach turns out to be unsensitive to large perturbations. Numerical experiments complete our theoretical results.
  • A stochastic opinion dynamics model with multiple contents.

    Julio cesar LOUZADA PINTO, Tijani CHAHED, Jeremie JAKUBOWICZ
    52nd IEEE Conference on Decision and Control | 2013
    We introduce a new model of opinion dynamics in which agents interact with each other about several distinct opinions/contents. In most of the literature about opinion dynamics, agents perform convex combinations of other agents' opinions. In our framework, a competition between opinions takes place: agents do not exchange all their opinions with each other, they only communicate about the opinions they like the most. Our model uses scores to take this competition into account: each agent maintains a list of scores for each opinion held. Opinions are selected according to their scores (the higher its score, the more likely an opinion is to be expressed) and then transmitted to neighbors. Once an agent receives an opinion it gives more credit to it, i.e. a higher score to this opinion. Under this new framework, we derive a convergence result which holds under mild assumptions on the way information is transmitted by agents and leads to consensus in a particular case. We also provide some numerical results illustrating the formation of consensus under different topologies (complete and ring graphs) and different initial conditions (random and biased towards a specific content).
  • Scoring anomalies : a M-estimation formulation.

    Stephan CLEMENCON, Jeremie JAKUBOWICZ
    AISTATS 2013 : 16th international conference on Artificial Intelligence and Statistics | 2013
    It is the purpose of this paper to formulate the issue of scoring multivariate observations depending on their degree of abnormality/novelty as an unsupervised learning task. Whereas in the 1-d situation, this problem can be dealt with by means of tail estimation techniques, observations being viewed as all the more "abnormal" as they are located far in the tail(s) of the underlying probability distribution. In a wide variety of applications, it is desirable to dispose of a scalar valued "scoring" function allowing for comparing the degree of abnormality of multivariate observations. Here we formulate the issue of scoring anomalies as a M-estimation problem. A (functional) performance criterion is proposed, whose optimal elements are, as expected, nondecreasing transforms of the density. The question of empirical estimation of this criterion is tackled and preliminary statistical results related to the accuracy of partition-based techniques for optimizing empirical estimates of the empirical performance measure are established.
  • Distributed stochastic approximation: the cost of non-bistochasticity.

    Gemma MORRAL, Pascal BIANCHI, Gersende FORT, Jeremie JAKUBOWICZ
    XXIVème Colloque Gretsi | 2013
    We are interested in the problem of stochastic approximation in a distributed context. The iterative algorithm under study consists of two steps: a local stochastic approximation step performed by each agent, and a gossip step consisting of weighted averages of the local estimates. The matrix of weights used in the gossip step is assumed to be stochastic per row. We relax the bistochasticity assumption to have less restrictive communication protocols and characterize the resulting performance degradation.
  • Aircraft classification with a low resolution infrared sensor.

    Sidonie LEFEBVRE, Sidonie ALLASSONNIERE, Jeremie JAKUBOWICZ, Thomas LASNE, Eric MOULINES
    Machine Vision and Applications | 2013
    Existing computer simulations of aircraft infrared signature (IRS) do not account for dispersion induced by uncertainty on input parameters, such as aircraft aspect angles and meteorological conditions. As a result, they are of little use to quantify the detection performance of IR optronic systems: in this case, the scenario encompasses a lot of possible situations that must indeed be considered, but cannot be individually simulated. In this paper, we focus on low resolution infrared sensors and we propose a methodological approach for predicting simulated IRS dispersion of an aircraft, and performing a classification of different aircraft on the resulting set of low resolution infrared images. It is based on a quasi-Monte Carlo survey of the code output dispersion, and on a maximum likelihood classification taking advantage of Bayesian dense deformable template models estimation. This method is illustrated in a typical scenario, i.e., a daylight air-to-ground full-frontal attack by a generic combat aircraft flying at low altitude, over a database of 30,000 simulated aircraft images. Assuming a spatially white noise background model, classification performance is very promising, and appears to be more accurate than more classical state of the art techniques (such as kernel-based support vector classifiers).
  • Adaptive Learning Vector Quantization for Online Parametric Estimation.

    Pascal BIANCHI, Jeremie JAKUBOWICZ
    IEEE Transactions on Signal Processing | 2013
    This paper addresses the problem of parameter estimation in a quantized and online setting. A sensing unit collects random vector-valued samples from the environment. These samples are quantized and transmitted to a central processor which generates an online estimate of the unknown parameter. This paper provides a closed-form expression of the excess mean square error (MSE) caused by quantization in the high-rate regime i.e., when the number of quantization levels is supposed to be large. Next, we determine the quantizers which mitigate the excess MSE. The optimal quantization rule unfortunately depends on the unknown parameter. To circumvent this issue, we introduce a novel adaptive learning vector quantization scheme which allows to simultaneously estimate the parameter of interest and select an efficient quantizer.
  • Random Pairwise Gossip on Hadamard manifolds.

    Anass BELLACHEHAB, Jeremie JAKUBOWICZ
    2013 5th IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP) | 2013
    No summary available.
  • On-Line Learning Gossip Algorithm in Multi-Agent Systems with Local Decision Rules.

    Pascal BIANCHI, Stephan CLEMENCON, Jeremie JAKUBOWICZ, Gemma MORRAL
    2013
    No summary available.
  • TV regularization for robust consensus in distributed systems.

    Walid BEN AMEUR, Pascal BIANCHI, Jeremie JAKUBOWICZ
    XXIVème Colloque Gretsi | 2013
    Consider a network of agents whose objective is to find a consensus on the minimizer of some unknown function. An important special case is obtained when this minimizer is the average of values locally observed by the agents. We are interested in distributed algorithms allowing to estimate the solution through exchanges between the agents, supposedly connected by a graph. The literature is rich in distributed algorithms solving such a problem. However, these algorithms are very sensitive to possible deficient behaviors of some agents (malicious agents or "stubborn" agents refusing to update their estimate). Our goal is to propose robust algorithms. We introduce a relaxation of the initial problem based on total variation. We show that the solutions of the initial problem and the relaxed problem coincide, under certain verifiable regularity conditions. We provide two distributed algorithms leading to the solution. Finally, we test the robustness of our algorithms in the presence of a stubborn agent continuously injecting an inconsistent estimate into the network: we show that, independently of the magnitude of the inconsistent estimate, the estimates are kept in a neighborhood of the desired consensus. Numerical results confirm the robustness of our algorithms in this paper.
Affiliations are detected from the signatures of publications identified in scanR. An author can therefore appear to be affiliated with several structures or supervisors according to these signatures. The dates displayed correspond only to the dates of the publications found. For more information, see https://scanr.enseignementsup-recherche.gouv.fr