Conference Paper, Year: 2019

Text mining with constrained tensor decomposition

Abstract

Text mining, a special case of data mining, refers to estimating, from a collection of observed documents, the knowledge or parameters needed for a given task such as unsupervised clustering. In this context, the topic of a document can be seen as a hidden variable, and words are multi-view variables related to each other through the topic. The main goal of this paper is to estimate the probabilities of the topics and the conditional probabilities of words given topics. To this end, we use a nonnegative Canonical Polyadic (CP) decomposition of a third-order moment tensor of the observed words. Our computer simulations show that the proposed algorithm performs better than a previously proposed algorithm, which applies the robust tensor power method after whitening by the second-order moment. Moreover, because our cost function includes a nonnegativity constraint on the estimated probabilities, we never obtain negative values for them, whereas negative values often occur with the power method combined with deflation. In addition, unlike deflation-based techniques, our algorithm can handle over-complete cases, where the number of hidden variables is larger than the number of multi-view variables. Further, the proposed method supports a larger degree of over-completeness than modified versions of the tensor power method that have been customized to handle the over-complete case.
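The estimation problem described in the abstract can be illustrated with a minimal sketch. It assumes a single-topic model in which three words of a document are drawn independently given the topic, so that the third-order moment tensor equals the sum over topics k of p_k a_k ⊗ a_k ⊗ a_k, where p_k is the topic probability and a_k the vector of word probabilities given topic k. The solver below uses generic multiplicative updates for nonnegative CP under a Frobenius loss; this is a stand-in chosen for brevity, not the algorithm proposed in the paper. All names (`nonneg_cp`, `to_probabilities`, the synthetic sizes) are hypothetical.

```python
import numpy as np

def cp_reconstruct(weights, A, B, C):
    """Rebuild sum_k weights[k] * A[:, k] (x) B[:, k] (x) C[:, k]."""
    return np.einsum('r,ir,jr,kr->ijk', weights, A, B, C)

def nonneg_cp(T, rank, n_iter=500, eps=1e-12, seed=0):
    """Nonnegative CP by multiplicative updates (a generic stand-in solver)."""
    rng = np.random.default_rng(seed)
    # Strictly positive initialisation keeps every later update nonnegative.
    A, B, C = (rng.random((dim, rank)) + 0.1 for dim in T.shape)
    for _ in range(n_iter):
        A *= np.einsum('ijk,jr,kr->ir', T, B, C) / (A @ ((B.T @ B) * (C.T @ C)) + eps)
        B *= np.einsum('ijk,ir,kr->jr', T, A, C) / (B @ ((A.T @ A) * (C.T @ C)) + eps)
        C *= np.einsum('ijk,ir,jr->kr', T, A, B) / (C @ ((A.T @ A) * (B.T @ B)) + eps)
    return A, B, C

def to_probabilities(A, B, C, eps=1e-12):
    """Rescale factor columns to sum to 1 and move the scales into topic weights."""
    sa, sb, sc = (np.maximum(M.sum(axis=0), eps) for M in (A, B, C))
    weights = sa * sb * sc
    return weights / weights.sum(), A / sa, B / sb, C / sc

# Synthetic ground truth (assumed sizes): 6 words, 2 topics.
rng = np.random.default_rng(1)
n_words, n_topics = 6, 2
p_true = np.array([0.3, 0.7])
A_true = rng.dirichlet(np.ones(n_words), size=n_topics).T  # columns sum to 1
T = cp_reconstruct(p_true, A_true, A_true, A_true)         # exact moment tensor

# Reconstruction error at the random initialisation (zero iterations).
A0, B0, C0 = nonneg_cp(T, n_topics, n_iter=0)
err_init = np.linalg.norm(T - cp_reconstruct(np.ones(n_topics), A0, B0, C0))

# Fit, then read off probability estimates.
A, B, C = nonneg_cp(T, n_topics)
err_final = np.linalg.norm(T - cp_reconstruct(np.ones(n_topics), A, B, C))
p_hat, A_hat, B_hat, C_hat = to_probabilities(A, B, C)
```

Here `p_hat` plays the role of the estimated topic probabilities and the columns of `A_hat` the estimated word probabilities given each topic, up to a permutation of the topics. Because the updates only multiply by nonnegative ratios, the estimates cannot become negative, which is the property the abstract contrasts with the power method combined with deflation. The sketch does not enforce symmetry between the three factors and uses an exact tensor rather than one estimated from documents.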
Main file
SobhCJB19_LOD28.pdf (502.46 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-02084803, version 3 (01-07-2019)

Identifiers

  • HAL Id: hal-02084803, version 3

Cite

Elaheh Sobhani, Pierre Comon, Christian Jutten, Massoud Babaie-Zadeh. Text mining with constrained tensor decomposition. LOD 2019 - 5th International Conference on Machine Learning, Optimization, and Data Science, Sep 2019, Certosa di Pontignano, Siena, Italy. ⟨hal-02084803⟩
375 Views
357 Downloads
