Anomaly Ranking in a High Dimensional Space: The Unsupervised TreeRank Algorithm.

Authors

CLEMENCON S.
BASKIOTIS N.
VAYATIS N.

Publication date

2016

Publication type

Book Chapter

Summary

Ranking unsupervised data in a multivariate feature space $X \subset \mathbb{R}^{d}$, $d \geq 1$, by degree of abnormality is of crucial importance in many applications (e.g., fraud surveillance, monitoring of complex systems/infrastructures such as energy networks or aircraft engines, system management in data centers). However, the learning aspect of unsupervised ranking has only received attention in the machine-learning community in the past few years. The Mass-Volume (MV) curve has been recently introduced in order to evaluate the performance of any scoring function $s : X \to \mathbb{R}$ with regard to its ability to rank unlabeled data. It is expected that relevant scoring functions will induce a preorder similar to that induced by the density function $f(x)$ of the (supposedly continuous) probability distribution of the statistical population under study. As far as we know, there is no efficient algorithm to build a scoring function from (unlabeled) training data with nearly optimal MV curve when the dimension $d$ of the feature space is high. It is the major purpose of this chapter to introduce such an algorithm which we call the Unsupervised TreeRank algorithm. Beyond its description and the statistical analysis of its performance, numerical experiments are exhibited in order to provide empirical evidence of its accuracy.
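To make the role of the MV curve concrete, here is a minimal sketch of how it might be estimated for a given scoring function. This record does not define the curve, so the sketch assumes the usual reading: for a mass level alpha, the curve reports the Lebesgue volume of the level set {x : s(x) >= t} whose threshold t is chosen so that the set holds a fraction alpha of the data, with smaller volumes indicating a better ranking. The function name `empirical_mv_curve`, the Monte Carlo volume estimate over the bounding box of the sample, and the toy scoring functions are all illustrative assumptions. This is not the Unsupervised TreeRank algorithm itself.

```python
import numpy as np


def empirical_mv_curve(score, data, alphas, n_uniform=20000, seed=0):
    """Estimate the volume of the level sets of `score` at mass levels `alphas`.

    Hypothetical helper: the volume is approximated by Monte Carlo sampling
    in the bounding box of `data` (an assumption, not the chapter's method).
    """
    rng = np.random.default_rng(seed)
    lo, hi = data.min(axis=0), data.max(axis=0)
    box_volume = float(np.prod(hi - lo))

    # Uniform points in the bounding box, used to approximate Lebesgue volume.
    uniform = rng.uniform(lo, hi, size=(n_uniform, data.shape[1]))
    s_data = score(data)
    s_unif = score(uniform)

    # Threshold t_alpha: roughly a fraction alpha of the data scores >= t_alpha.
    thresholds = np.quantile(s_data, 1.0 - np.asarray(alphas))

    # Volume of {s >= t_alpha} = box volume * share of uniform points inside.
    return np.array([box_volume * np.mean(s_unif >= t) for t in thresholds])


# Toy data: a standard Gaussian sample in dimension 2.
sample = np.random.default_rng(1).normal(size=(2000, 2))
levels = np.array([0.1, 0.5, 0.9])

# A score that ranks points like the Gaussian density (high near the origin),
# and a deliberately reversed score for comparison.
density_like = empirical_mv_curve(lambda x: -np.sum(x**2, axis=1), sample, levels)
reversed_score = empirical_mv_curve(lambda x: np.sum(x**2, axis=1), sample, levels)
print(density_like, reversed_score)
```

Under these assumptions, the score that orders points like the density should yield the lower curve, which is the sense in which a density-like preorder is expected to be optimal. The bounding-box Monte Carlo estimate becomes inefficient as d grows, which is consistent with the difficulty in high dimension noted in the summary.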

Publisher

Springer International Publishing
