YouTube-8M: A large-scale video classification benchmark, 2016.
Sequential deep learning for human action recognition, 2011.
DOI : 10.1007/978-3-642-25446-8_4
URL : https://hal.archives-ouvertes.fr/hal-01354493
Glimpse clouds: Human activity recognition from unstructured feature points, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01713109
Interaction networks for learning about objects, relations and physics, 2016.
Temporal relational reasoning in videos, 2018.
Quo vadis, action recognition? A new model and the Kinetics dataset, 2017.
Scaling egocentric vision: The EPIC-KITCHENS dataset, 2018.
Long-term recurrent convolutional networks for visual recognition and description, 2015.
DOI : 10.21236/ada623249
URL : http://www.dtic.mil/dtic/tr/fulltext/u2/a623249.pdf
Lintel: Python video decoding, 2018.
Comparing machines and humans on a visual categorization test, Proceedings of the National Academy of Sciences of the United States of America, vol.108, pp.17621-17626, 2011.
DOI : 10.1073/pnas.1109168108
URL : http://www.pnas.org/content/108/43/17621.full.pdf
From lifestyle vlogs to everyday interactions, 2018.
The "something something" video database for learning and evaluating visual common sense, 2017.
DOI : 10.1109/iccv.2017.622
URL : http://arxiv.org/pdf/1706.04261
AVA: A video dataset of spatio-temporally localized atomic visual actions, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01764300
Mask R-CNN, 2017.
DOI : 10.1109/tpami.2018.2844175
Deep residual learning for image recognition, 2016.
DOI : 10.1109/cvpr.2016.90
URL : http://arxiv.org/pdf/1512.03385
Long short-term memory, Neural Computation, vol.9, issue.8, pp.1735-1780, 1997.
Compositional attention networks for machine reasoning, 2018.
Large-scale video classification with convolutional neural networks, 2014.
DOI : 10.1109/cvpr.2014.223
URL : http://www.cs.cmu.edu/~rahuls/pub/cvpr2014-deepvideo-rahuls.pdf
Not-so-CLEVR: Visual relations strain feedforward neural networks, 2018.
DOI : 10.1098/rsfs.2018.0011
Adam: A method for stochastic optimization, ICLR, 2015.
Visual genome: Connecting language and vision using crowdsourced dense image annotations, International Journal of Computer Vision (IJCV), vol.123, pp.32-73, 2017.
DOI : 10.1007/s11263-016-0981-7
URL : https://link.springer.com/content/pdf/10.1007%2Fs11263-016-0981-7.pdf
Predicting deeper into the future of semantic segmentation, 2017.
DOI : 10.1109/iccv.2017.77
URL : https://hal.archives-ouvertes.fr/hal-01494296
2D/3D pose estimation and action recognition using multitask deep learning, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01815703
Moments in Time dataset: One million videos for event understanding, 2018.
Learning visual reasoning without strong priors, ICML Machine Learning in Speech and Language Processing Workshop, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01648684
Seeing the arrow of time, 2014.
Faster R-CNN: Towards real-time object detection with region proposal networks, 2015.
DOI : 10.1109/tpami.2016.2577031
URL : http://arxiv.org/pdf/1506.01497
ImageNet large scale visual recognition challenge, IJCV, vol.115, issue.3, pp.211-252, 2015.
DOI : 10.1007/s11263-015-0816-y
URL : http://dspace.mit.edu/bitstream/1721.1/104944/1/11263_2015_Article_816.pdf
A simple neural network module for relational reasoning, 2017.
NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis, 2016.
DOI : 10.1109/cvpr.2016.115
URL : http://arxiv.org/pdf/1604.02808
Action recognition using visual attention, ICLR Workshop, 2016.
Two-stream convolutional networks for action recognition in videos, 2014.
An End-to-End Spatio-Temporal Attention Model for Human Action Recognition from Skeleton Data, 2016.
25 years of CNNs: Can we compare to human abstraction capabilities?, 2016.
Relational neural expectation maximization: Unsupervised discovery of objects and their interactions, 2018.
Lattice long short-term memory for human action recognition, 2017.
DOI : 10.1109/iccv.2017.236
URL : http://arxiv.org/pdf/1708.03958
Learning spatiotemporal features with 3D convolutional networks, 2015.
DOI : 10.1109/iccv.2015.510
URL : http://arxiv.org/pdf/1412.0767
Graph attention networks, 2018.
Action Recognition by Dense Trajectories, 2011.
DOI : 10.1109/cvpr.2011.5995407
URL : https://hal.archives-ouvertes.fr/inria-00583818
Visual interaction networks: Learning a physics simulator from video, 2017.
Rethinking spatiotemporal feature learning for video understanding, 2017.