Conference paper. Year: 2016

Inferring phonemic classes from CNN activation maps using clustering techniques

Abstract

Today's state of the art in speech recognition involves deep neural networks (DNN). In recent years, considerable research effort has been invested in characterizing the feature representations learned by DNNs. In this paper, we focus on convolutional neural networks (CNN) trained for phoneme recognition in French. We report clustering experiments performed on activation maps extracted from the different layers of a CNN comprising two convolution and sub-sampling layers followed by three dense layers. Our goal was to gain insight into phone separability and the phonemic categories inferred by the network, and into how they vary across the successive layers. Two directions were explored, with both linear and non-linear clustering techniques. First, we fixed the number of classes at 33, the number of context-independent phone models for French, in order to assess the phoneme separability power of the different layers. As expected, we observed that this power increases with layer depth in the network: from 34% to 74% in F-measure from the first convolution layer to the last dense layer, when using spectral clustering. Second, optimal numbers of classes were automatically inferred through inter- and intra-cluster criteria. We analyze these classes in terms of standard French phonological features.
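The two evaluation ideas mentioned in the abstract, scoring a clustering against reference phone labels with an F-measure and comparing clusterings through intra- versus inter-cluster distances, can be illustrated with a minimal sketch. The abstract does not specify which F-measure variant or which cluster criteria the authors used, so the pair-counting F-measure and the simple distance ratio below are assumptions chosen for illustration. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from math import dist


def pairwise_f_measure(true_labels, cluster_ids):
    """Pair-counting F-measure between reference labels and cluster ids.

    Assumption: the paper may use a different F-measure definition.
    """
    tp = fp = fn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_class = true_labels[i] == true_labels[j]
        same_cluster = cluster_ids[i] == cluster_ids[j]
        if same_class and same_cluster:
            tp += 1  # pair correctly grouped together
        elif same_cluster:
            fp += 1  # pair grouped together but from different classes
        elif same_class:
            fn += 1  # pair from the same class but split apart
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def intra_inter_ratio(points, cluster_ids):
    """Mean within-cluster distance over mean between-cluster distance.

    Lower values indicate tighter, better separated clusters.
    Assumption: an illustrative criterion, not the one used in the paper.
    """
    intra, inter = [], []
    for i, j in combinations(range(len(points)), 2):
        d = dist(points[i], points[j])
        (intra if cluster_ids[i] == cluster_ids[j] else inter).append(d)
    if not intra or not inter:
        return float("inf")
    return (sum(intra) / len(intra)) / (sum(inter) / len(inter))


# Hypothetical toy data: six frames, three phone classes, 2-D "activations".
phones = ["a", "a", "i", "i", "s", "s"]
features = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0), (5.0, 5.0), (5.0, 5.1)]
perfect = [0, 0, 1, 1, 2, 2]  # one cluster per phone
merged = [0, 0, 0, 0, 1, 1]   # the two vowels collapsed into one cluster

f_perfect = pairwise_f_measure(phones, perfect)
f_merged = pairwise_f_measure(phones, merged)
ratio_perfect = intra_inter_ratio(features, perfect)
ratio_merged = intra_inter_ratio(features, merged)
```

On this toy data, merging the two vowel clusters lowers the F-measure and raises the intra/inter ratio, which is the kind of signal such criteria are meant to expose when comparing layers or candidate numbers of classes.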
Main file
pellegrini_17161.pdf (262.29 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01474886, version 1 (23-02-2017)

Identifiers

  • HAL Id: hal-01474886, version 1
  • OATAO: 17161

Cite

Thomas Pellegrini, Sandrine Mouysset. Inferring phonemic classes from CNN activation maps using clustering techniques. Annual conference Interspeech (INTERSPEECH 2016), Sep 2016, San Francisco, United States. pp. 1290-1294. ⟨hal-01474886⟩