Binarsity: a penalization for one-hot encoded features.

Authors

ALAYA Mokhtar z.
BUSSY Simon
GAIFFAS Stephane
GUILLOUX Agathe

Publication date

2017

Publication type

Other

Summary This paper deals with the problem of large-scale linear supervised learning in settings where a large number of continuous features are available. We propose to combine the well-known trick of one-hot encoding of continuous features with a new penalization called binarsity. In each group of binary features coming from the one-hot encoding of a single raw continuous feature, this penalization uses total-variation regularization together with an extra linear constraint to avoid collinearity within groups. Non-asymptotic oracle inequalities for generalized linear models are proposed, and numerical experiments illustrate the good performances of our approach on several datasets. It is also noteworthy that our method has a numerical complexity comparable to standard L1 penalization.

See the publication

Topics of the publication

Themes detected by scanR from retrieved publications. For more information, see https://scanr.enseignementsup-recherche.gouv.fr