Optimal Probabilistic Generation of XML Documents - Département Informatique et Réseaux
Journal article, Theory of Computing Systems. Year: 2015

Optimal Probabilistic Generation of XML Documents

Abstract

Given a corpus of XML documents and its schema, we study the problem of finding an optimal generative probabilistic model, where optimality means maximizing the likelihood that the model generates this particular corpus. Focusing first on the structure of documents, we present an efficient algorithm for finding the best generative probabilistic model in the absence of constraints. We then study the problem in the presence of integrity constraints, namely key, inclusion, and domain constraints. In this case we study two different kinds of generators. First, we consider a continuation-test generator that performs tests of schema satisfiability while generating documents; these tests prevent the generation of a document that violates the constraints but, as we will see, they are computationally expensive. Second, we study a restart generator that may generate an invalid document and, when this happens, restarts and tries again. Finally, we consider the injection of data values into the structure to obtain a full XML document, and we study different approaches for generating these values.
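The two ideas summarized in the abstract, fitting choice probabilities to a corpus and restarting when a generated document is invalid, can be illustrated with a minimal Python sketch. It is not the authors' algorithm. It assumes a simplified setting in which each document is a sequence of independent (state, choice) pairs, so that relative frequencies give the maximum-likelihood estimate, and it models the restart generator as plain rejection sampling. The names `fit_choice_probabilities`, `restart_generate`, `generate_once`, `is_valid`, and `max_restarts` are hypothetical and do not come from the paper.

```python
import random
from collections import Counter, defaultdict


def fit_choice_probabilities(corpus):
    """Estimate per-state choice probabilities by relative frequency.

    Assumption: corpus is a list of documents, each a list of
    (state, choice) pairs, and choices are independent given the state.
    Under that assumption, relative frequencies maximize the likelihood.
    """
    counts = defaultdict(Counter)
    for doc in corpus:
        for state, choice in doc:
            counts[state][choice] += 1
    return {
        state: {c: n / sum(ctr.values()) for c, n in ctr.items()}
        for state, ctr in counts.items()
    }


def restart_generate(generate_once, is_valid, rng, max_restarts=1000):
    """Draw documents until one satisfies the constraints.

    generate_once(rng) returns a candidate document (hypothetical callback);
    is_valid(doc) checks the integrity constraints (hypothetical callback).
    Returns the valid document and the number of attempts used.
    """
    for attempt in range(1, max_restarts + 1):
        doc = generate_once(rng)
        if is_valid(doc):
            return doc, attempt
    raise RuntimeError("no valid document within max_restarts attempts")


if __name__ == "__main__":
    # Toy corpus: in state "a", choice "x" was taken twice and "y" once.
    probs = fit_choice_probabilities([[("a", "x"), ("a", "y")], [("a", "x")]])
    print(probs)

    # Toy key constraint: the three generated identifiers must be distinct.
    rng = random.Random(0)
    doc, attempts = restart_generate(
        lambda r: [r.randint(1, 4) for _ in range(3)],
        lambda d: len(set(d)) == len(d),
        rng,
    )
    print(doc, attempts)
```

In this toy setting the restart loop trades wasted attempts for cheap generation steps, which mirrors the contrast the abstract draws with the more expensive continuation tests.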
Main file
abiteboul2013optimal-1.pdf (747.56 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01261950, version 2 (23-12-2015)
hal-01261950, version 1 (26-01-2016)

Identifiers

  • HAL Id: hal-01261950, version 2

Cite

Serge Abiteboul, Yael Amsterdamer, Daniel Deutch, Tova Milo, Pierre Senellart. Optimal Probabilistic Generation of XML Documents. Theory of Computing Systems, 2015, 57 (4), pp.806-842. ⟨hal-01261950v2⟩
172 views
207 downloads
