Modern 3D human pose estimation techniques rely on deep networks, which require large amounts of training data.
In this paper, we propose to overcome this problem by learning a geometry-aware body representation from multi-view images without human annotations. Because this representation encodes 3D geometry, using it in a semi-supervised setting makes learning a mapping to 3D human pose much easier. As evidenced by our experiments, our approach outperforms other semi-supervised methods while using as little as 1% of the labeled data.
https://arxiv.org/abs/1804.01110
By Helge Rhodin, Mathieu Salzmann and Pascal Fua
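
The sketch below illustrates the two-stage idea in PyTorch: an encoder-decoder learns a latent that is treated as a rotatable 3D point set by reconstructing one camera view from another (no pose labels needed), and a shallow regressor then maps that geometric latent to 3D joint positions using only the small labeled subset. This is a minimal sketch of the concept, not the authors' released code; all class and variable names (GeometryAwareAutoencoder, PoseRegressor, the latent sizes, etc.) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GeometryAwareAutoencoder(nn.Module):
    """Encodes an image into a 3D-point latent plus an appearance code, then
    decodes a novel view after rotating the 3D latent into the target camera."""

    def __init__(self, latent_points=200, appearance_dim=128):
        super().__init__()
        self.latent_points = latent_points
        self.encoder = nn.Sequential(                       # image -> feature vector
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.to_geometry = nn.Linear(128, 3 * latent_points)   # rotatable 3D latent
        self.to_appearance = nn.Linear(128, appearance_dim)    # view-independent appearance
        self.decoder = nn.Sequential(                       # latent -> synthesized view
            nn.Linear(3 * latent_points + appearance_dim, 128 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (128, 8, 8)),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, img, relative_rotation):
        feat = self.encoder(img)
        geo = self.to_geometry(feat).view(-1, self.latent_points, 3)
        app = self.to_appearance(feat)
        # Rotate the geometric latent from the input camera into the target camera.
        geo_rot = torch.bmm(geo, relative_rotation.transpose(1, 2))
        code = torch.cat([geo_rot.flatten(1), app], dim=1)
        return self.decoder(code), geo


class PoseRegressor(nn.Module):
    """Shallow network mapping the geometric latent to 3D joint positions,
    trained only on the small labeled subset."""

    def __init__(self, latent_points=200, num_joints=17):
        super().__init__()
        self.num_joints = num_joints
        self.net = nn.Sequential(
            nn.Linear(3 * latent_points, 256), nn.ReLU(),
            nn.Linear(256, 3 * num_joints),
        )

    def forward(self, geo_latent):
        return self.net(geo_latent.flatten(1)).view(-1, self.num_joints, 3)


# Stage 1 (unsupervised): synthesize view b from view a of the same time instant.
model = GeometryAwareAutoencoder()
img_a = torch.rand(4, 3, 64, 64)                         # view a (dummy data)
img_b = torch.rand(4, 3, 64, 64)                         # view b, the reconstruction target
R_ab = torch.eye(3).unsqueeze(0).repeat(4, 1, 1)         # relative camera rotation a -> b
pred_b, geo = model(img_a, R_ab)
recon_loss = nn.functional.mse_loss(pred_b, img_b)

# Stage 2 (semi-supervised): regress 3D pose from the frozen geometric latent
# using only the few labeled examples.
regressor = PoseRegressor()
gt_pose = torch.rand(4, 17, 3)                           # dummy 3D joint annotations
pose_loss = nn.functional.mse_loss(regressor(geo.detach()), gt_pose)
```

Because the unsupervised stage already forces the latent to behave like 3D geometry, the supervised regressor can stay small, which is what makes training with roughly 1% of the labels feasible.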