Prediction by quantization of a conditional distribution
Abstract
Given a pair of random vectors $(X,Y)$, we consider the problem of approximating $Y$ by $\bc(X)=\{\bc_1(X),\dots,\bc_M(X)\}$, where $\bc$ is a measurable set-valued function. We give meaning to this approximation using the principles of vector quantization, which leads to the definition of a multifunction regression problem. The problem so formulated amounts to quantizing the conditional distributions of $Y$ given $X$. We propose a nonparametric estimate of the solutions of the multifunction regression problem by combining $M$-means clustering with the nonparametric smoothing technique of $k$-nearest neighbors. We provide an asymptotic analysis of the estimate and derive a convergence rate for its excess risk. The proposed methodology is illustrated on simulated examples and on a speed-flow traffic data set arising in road traffic forecasting.
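As a rough illustration of how $k$-nearest neighbors and $M$-means clustering might be combined, the following Python sketch selects the $k$ sample points whose covariates are closest to a query point $x$ and then clusters the corresponding responses into $M$ centers, which play the role of $\{\bc_1(x),\dots,\bc_M(x)\}$. This is only a minimal sketch under stated assumptions: the function names (`knn_indices`, `m_means`, `predict_set`), the Euclidean metric, the plain Lloyd iterations, and the toy bimodal data are all hypothetical choices made here for illustration, not the authors' implementation.

```python
import numpy as np


def knn_indices(X, x, k):
    """Indices of the k rows of X closest to x (Euclidean distance)."""
    dist = np.linalg.norm(X - x, axis=1)
    return np.argsort(dist)[:k]


def m_means(Y, M, n_iter=50, seed=0):
    """Plain Lloyd iterations on the rows of Y; returns an (M, d) array of centers."""
    rng = np.random.default_rng(seed)
    # Initialize the centers with M distinct sample points (assumes len(Y) >= M).
    centers = Y[rng.choice(len(Y), size=M, replace=False)].copy()
    for _ in range(n_iter):
        # Squared distance from every point to every center, shape (n, M).
        d2 = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(M):
            members = Y[labels == j]
            if len(members) > 0:  # keep the old center if a cell is empty
                centers[j] = members.mean(axis=0)
    return centers


def predict_set(X, Y, x, k, M):
    """Hypothetical estimate of the M-point set at x: M-means on the k-NN responses."""
    idx = knn_indices(X, x, k)
    return m_means(Y[idx], M)


# Toy data (an assumption for illustration): given X, the response Y is
# concentrated around two branches, +(1 + X) and -(1 + X).
rng = np.random.default_rng(42)
n = 2000
X = rng.uniform(0.0, 1.0, size=(n, 1))
sign = rng.choice([-1.0, 1.0], size=(n, 1))
Y = sign * (1.0 + X) + 0.1 * rng.normal(size=(n, 1))

centers = predict_set(X, Y, x=np.array([0.5]), k=100, M=2)
print(np.sort(centers.ravel()))
```

In this toy setting a single regression function would average the two branches and predict a value near zero, whereas the two-point set is expected to track both branches. The choice of $k$ and $M$ is left open here.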
Origin: Files produced by the author(s)