MetaPose:
Fast 3D Pose from Multiple Views without 3D Supervision

Ben Usman    Andrea Tagliasacchi    Kate Saenko    Avneesh Sud

Boston University    Google Research    Simon Fraser University    MIT-IBM Watson AI Lab   

In CVPR 2022

Paper | Code | Demo Videos




MetaPose accurately estimates 3D human poses from multiple views, takes into account multi-view uncertainty, and uses only 2D supervision for training. Compared with its baselines, it is faster and more accurate, especially with fewer cameras.


Abstract


In the era of deep learning, human pose estimation from multiple cameras with unknown calibration has received little attention to date. We show how to train a neural model to perform this task with high precision and minimal latency overhead. The proposed model takes into account joint location uncertainty due to occlusion from multiple views, and requires only 2D keypoint data for training. Our method outperforms both classical bundle adjustment and weakly-supervised monocular 3D baselines on the well-established Human3.6M dataset, as well as the more challenging in-the-wild Ski-Pose PTZ dataset.
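To illustrate how 2D keypoints alone can supervise a 3D pose, here is a minimal sketch of a confidence-weighted reprojection error. It is not taken from the paper: the pinhole camera model, the function names `project` and `weighted_reprojection_loss`, and the use of a per-joint confidence weight as a stand-in for occlusion uncertainty are all assumptions made for illustration.

```python
import numpy as np

def project(points_3d, R, t, f):
    """Pinhole projection of (J, 3) world points into one camera, giving (J, 2).

    R (3x3 rotation), t (3-vector) and f (focal length) are hypothetical
    camera parameters; a real system would have to estimate them.
    """
    cam = points_3d @ R.T + t              # world -> camera coordinates
    return f * cam[:, :2] / cam[:, 2:3]    # perspective divide

def weighted_reprojection_loss(points_3d, cameras, keypoints_2d, confidences):
    """Confidence-weighted mean squared reprojection error over all views.

    cameras:      list of (R, t, f) tuples, one per view
    keypoints_2d: (V, J, 2) detected 2D joints per view
    confidences:  (V, J) weights; low values down-weight occluded joints
    """
    total = 0.0
    for (R, t, f), kp, w in zip(cameras, keypoints_2d, confidences):
        residual = project(points_3d, R, t, f) - kp
        total += np.sum(w * np.sum(residual ** 2, axis=1))
    return total / np.sum(confidences)

# Toy example: three joints seen by two cameras placed 90 degrees apart.
pose = np.array([[0.0, 0.0, 0.0], [0.1, 0.5, 0.0], [-0.2, 0.3, 0.1]])
cam_a = (np.eye(3), np.array([0.0, 0.0, 4.0]), 1.0)
R_b = np.array([[0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [-1.0, 0.0, 0.0]])
cam_b = (R_b, np.array([0.0, 0.0, 4.0]), 1.0)

keypoints = np.stack([project(pose, *cam_a), project(pose, *cam_b)])
confidences = np.ones((2, 3))
loss = weighted_reprojection_loss(pose, [cam_a, cam_b], keypoints, confidences)
```

In this toy setup the loss is zero for the true pose and grows when the pose is perturbed. Setting a joint's confidence to zero in one view removes that detection from the objective, which is one simple way an unreliable (for example, occluded) keypoint can be kept from corrupting the estimate.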

[slide deck link]


Citation


@inproceedings{usman2021metapose,
    author    = {Usman, Ben and Tagliasacchi, Andrea and Saenko, Kate and Sud, Avneesh},
    title     = {MetaPose: Fast 3D Pose from Multiple Views without 3D Supervision},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022}
}