BAUDRY Marc

Affiliations
  • 1998 - 1999
    Université Rennes 1
  • Development of an automatic speech synthesis system from standard vowelized Arabic text.

    Sofiane BALOUL, Marc BAUDRY
    2003
    This thesis contributes to the study and development of a diphone-based speech synthesis system for vowelized Standard Arabic text. The contribution spans several levels of the system: construction of the acoustic database, syntactic analysis, grapheme-to-phoneme conversion and prosody generation. The morpho-syntactic analysis relies on a partial lexicon, default tagging and the propagation of contextual deductions. It splits the text into non-recursive chunks (intermediate between the word and the sentence). The syntax-prosody interface then places the pauses and generates the prosodic parameters of pitch and duration. All of this processing is integrated into the multilingual text-to-speech system from Elan Speech.
  • Proposal of an adaptive analysis/synthesis scheme in the time-frequency plane based on entropy criteria: application to transform-based audio coding.

    Gilles GONON, Marc BAUDRY, Silvio MONTRESOR
    2002
    Adapted representations support the study and processing of the information carried by signals by providing an analysis tailored to each signal. This thesis develops a representation that applies temporal and then frequency segmentations adapted to the signal, and that is more flexible than existing solutions. The scheme is applied in a high-fidelity perceptual transform coder. The signal is first segmented in time. The criterion is based on a local entropy estimator, which provides an index of variation suited to an automatic segmentation separating transient and stationary regions. The time segments thus delimited are then decomposed into wavelet packets, and a best-basis search adapts the representation in frequency. An extension of the best-basis search is proposed to enlarge the dictionary of available bases beyond the dyadic case. At the end of this analysis, the signal is localized in atoms of the time-frequency plane. An original coder architecture that includes this representation is then presented, along with the details of its implementation. The coder is evaluated by subjective tests comparing the compressed sounds to the originals and to the MPEG-1 Layer III standard at a bit rate of 96 kbit/s. The results show that the adapted representation scheme is competitive with standard coders, while many improvements remain possible.
  • Options: applications to resource and environmental economics.

    Marc BAUDRY, Christian MOUTON
    1999
    Many environmental policies rely on pollution thresholds above which environmental protection measures are triggered. However, economic theory says little about their validity. The purpose of this work is to fill this gap. The role played by technological change in pollution control is the main thread of the proposed answer. The major characteristic of such changes is their irreversibility. This is particularly important when the pollution considered is a stock pollution, so a dynamic approach to the problem is proposed. It is coupled with a stochastic approach that accounts for uncertainty about the evolution of the pollution and its effects. The first part of the work presents and discusses the tools used. Initially developed to deal with investment choices, the recent theory of "real options" has proven to be a particularly suitable tool for analyzing irreversibility and uncertainty in the field of resources and the environment. A synthesis is proposed of the notion of real option and those of quasi-option value and option price, which are older and specific to resource and environmental economics. It makes it possible to better separate the effects of irreversibility from those of uncertainty. With these points clarified, the second part develops a model that justifies the use of pollution thresholds from an economic point of view. The model differs from the usual real options models in that the decision considered here, technological change, affects the evolution of the variable of interest, the pollution. The technological aspects and the value of an environmental tax are studied in particular. An application to the greenhouse effect illustrates the argument.
  • Study of the parametrization of the speech signal from wavelet representations.

    Christophe GERARD, Marc BAUDRY
    1995
    The parametrization step consists in representing the signal by a reduced, relevant and robust set of parameters. Compared to the short-time Fourier transform, wavelet representations have interesting properties for parametrizing the speech signal. The purpose of our work is to determine the contribution of wavelet representations to speech recognition. In order to validate our parametrizations in existing recognition systems, we work within the framework of fixed-size frame analysis. The Morlet wavelet is particularly well suited to the processed signal, because of its adaptable frequency distribution and its minimal time-frequency spread in the sense of the uncertainty principle. The resulting parametrizations consist of a single energy coefficient in each frequency band, for each analysis window. Several variants have been tested: average or maximum coefficient, discrete or continuous wavelet decomposition, logarithmic or psychoacoustic frequency scale, synchronous or asynchronous maxima, spectral or pseudo-spectral domain. Our study establishes that the wavelet parametrizations implemented are, at most, as robust as the MFCC (mel-frequency cepstral coefficients). More precisely, the operating framework used appears too restrictive to bring out the expected contribution of wavelet representations to parametrization. Even if the parametrizations can be improved, the preferred operating framework for wavelet representations remains variable-time analysis, which will require the development of recognition systems with specific architectures.
  • Detection and identification of stop consonants using the wavelet transform.

    Francois MALBOS, Marc BAUDRY
    1995
    The work presented in this thesis is part of the acoustic-phonetic decoding of speech. In this context, two steps are separated: the detection and the recognition of French stop consonants using the wavelet transform. In the detection stage, we approximate the burst of the stop by an impulse. This model is validated by analyzing the correlation functions between the modulus of the wavelet transform of the speech signal and that of the analyzing wavelet. For voiceless stops (respectively voiced stops), a detection rate of 89.5% (respectively 67.6%) is associated with a false alarm rate of 10.5% (respectively 32.4%). The interest of our detection system is twofold. On the one hand, it locates the burst with an error of between 0.2 ms and 1 ms, depending on the frequency structure of the stop. On the other hand, it makes it possible to measure how impulsive the stop is. Although robust, the detection system is less effective on noisy signals. A prior reduction of the background noise level does not systematically improve the detection rates. Using burst detection, the recognition system is based on the statistical analysis of the average of the wavelet coefficients over a time support of one millisecond. Three analyses have been evaluated: discriminant analysis, segmentation trees and maximum likelihood trees. The discriminant analysis achieves an identification rate higher than 74% for contextual recognition. It also recognizes 70% of the false detections produced by the detection module. Given the evaluated performance of each of these methods, only the recognition rates of the discriminant analysis are compared to those of ten systems described in the literature. At the 99% confidence level, seven of them show performance that is not significantly different. This comparison shows that the high-frequency smoothing of the wavelet transform is not a major handicap for the recognition of voiceless stops, as one might have supposed.
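The last abstract above validates its impulse model by correlating the modulus of the wavelet transform of the speech signal with that of the analyzing wavelet. The sketch below illustrates that general idea only and is not the thesis's implementation: it assumes the modulus at a single scale is already available as a list of numbers, uses a Gaussian envelope as a stand-in for the modulus of a Morlet-type wavelet, and the names `gaussian_envelope`, `normalized_correlation`, `locate_burst` and the `min_score` threshold are hypothetical.

```python
import math


def gaussian_envelope(length, sigma):
    """Gaussian template, used here as an assumed stand-in for the
    modulus of the analyzing wavelet at one scale."""
    center = (length - 1) / 2.0
    return [math.exp(-0.5 * ((i - center) / sigma) ** 2) for i in range(length)]


def normalized_correlation(signal, template):
    """Normalized cross-correlation of the template with the signal,
    one score per lag at which the template fits entirely."""
    m = len(template)
    t_norm = math.sqrt(sum(v * v for v in template))
    scores = []
    for lag in range(len(signal) - m + 1):
        window = signal[lag:lag + m]
        w_norm = math.sqrt(sum(v * v for v in window))
        if w_norm == 0.0 or t_norm == 0.0:
            # An all-zero window (or template) carries no shape information.
            scores.append(0.0)
            continue
        dot = sum(a * b for a, b in zip(window, template))
        scores.append(dot / (w_norm * t_norm))
    return scores


def locate_burst(signal, template, min_score=0.9):
    """Return (lag, score) of the best match, or None when no lag
    reaches min_score (an arbitrary, assumed threshold)."""
    scores = normalized_correlation(signal, template)
    if not scores:
        return None
    best = max(range(len(scores)), key=scores.__getitem__)
    if scores[best] < min_score:
        return None
    return best, scores[best]
```

A score close to 1 means the local shape is nearly proportional to the template, which is one possible way to quantify how impulsive a candidate burst is. In practice the threshold would have to be tuned on labeled data.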
Affiliations are detected from the signatures of publications identified in scanR. An author can therefore appear to be affiliated with several structures or supervisors according to these signatures. The dates displayed correspond only to the dates of the publications found. For more information, see https://scanr.enseignementsup-recherche.gouv.fr