3D Multi-Pigeon Pose Estimation and Tracking

Urs Waldmann1,2*, Alex Hoi Hang Chan2,3,4*, Hemal Naik2,4,5, Máté Nagy2,3,4,6,7, Iain D. Couzin2,3,4, Oliver Deussen1,2, Bastian Goldluecke1,2 and Fumihiro Kano2,3,4

* Contributed Equally
1 Department of Computer and Information Science, University of Konstanz, Germany.
2 Centre for the Advanced Study of Collective Behaviour, University of Konstanz, Germany.
3 Department of Collective Behavior, Max Planck Institute of Animal Behavior, Konstanz, Germany.
4 Department of Biology, University of Konstanz, Germany.
5 Department of Ecology of Animal Societies, Max Planck Institute of Animal Behavior, Konstanz, Germany.
6 Department of Biological Physics, Eötvös Loránd University, Budapest, Hungary.
7 MTA-ELTE 'Lendület' Collective Behaviour Research Group, Hungarian Academy of Sciences, Budapest, Hungary.


2023/10/13 We have officially launched our project site, as well as a Hugging Face demo for 2D posture estimation of pigeons in any environment [Hugging Face].


3D-MuPPET Framework

Markerless methods for animal posture tracking have been developing rapidly, but frameworks and benchmarks for tracking large animal groups in 3D are still lacking. To overcome this gap in the literature, we present 3D-MuPPET, a framework to estimate and track 3D poses of up to 10 pigeons at interactive speed using multiple views. We train a pose estimator to infer 2D keypoints and bounding boxes of multiple pigeons, then triangulate the keypoints to 3D. For correspondence matching, we first dynamically match 2D detections to global identities in the first frame, then use a 2D tracker to maintain correspondences across views in subsequent frames. We achieve comparable accuracy to a state-of-the-art 3D pose estimator in terms of Root Mean Square Error (RMSE) and Percentage of Correct Keypoints (PCK). We also showcase a novel use case where our model, trained on data of single pigeons, provides comparable results on data containing multiple pigeons. This can simplify the domain shift to new species, because annotating single-animal data is less labour-intensive than annotating multi-animal data. Additionally, we benchmark the inference speed of 3D-MuPPET, with up to 10 fps in 2D and 1.5 fps in 3D, and perform a quantitative tracking evaluation, which yields encouraging results. Finally, we show that 3D-MuPPET also works in natural environments without model fine-tuning on additional annotations. To the best of our knowledge, we are the first to present a framework for 2D/3D posture and trajectory tracking that works in both indoor and outdoor environments.
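The triangulation step described above (lifting matched 2D keypoints from calibrated views into 3D) can be sketched with the standard Direct Linear Transform. This is a minimal illustration, not the framework's actual implementation; the function name and two-camera setup are assumptions for the example.

```python
import numpy as np

def triangulate_point(projections, points_2d):
    """Triangulate one 3D point from its 2D observations in multiple
    calibrated views via the Direct Linear Transform (DLT).

    projections: list of 3x4 camera projection matrices
    points_2d:   list of (x, y) pixel observations, one per view
    """
    A = []
    for P, (x, y) in zip(projections, points_2d):
        # Each view contributes two linear constraints on the
        # homogeneous 3D point X: x * (P[2] @ X) = P[0] @ X, etc.
        A.append(x * P[2] - P[0])
        A.append(y * P[2] - P[1])
    A = np.asarray(A)
    # The solution is the right singular vector with the smallest
    # singular value of the stacked constraint matrix.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # de-homogenise
```

In practice each pigeon keypoint would be triangulated this way from all views in which its global identity was matched, and the same routine extends to any number of cameras.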

Additional Results

3D Pose Estimation and Tracking of Multiple Pigeons in Captive Environments

Using the 3D-POP dataset, we trained 2D keypoint detection models and triangulated the predicted keypoints into 3D. We also show that models trained on single-pigeon data work well on multi-pigeon data.

This video shows 3D keypoints from triangulation, reprojected to a single camera view.

This video shows 3D pose estimations of 10 foraging pigeons.

Pigeons in Outdoor Environments

Using the Segment Anything Model (SAM), we trained a 2D keypoint detector on masked pigeons from captive data, then applied the model to outdoor pigeon videos for 3D tracking in the wild.

Cite us

@article{waldmann2023muppet,
      title={3D-MuPPET: 3D Multi-Pigeon Pose Estimation and Tracking},
      author={Waldmann, Urs and Chan, Alex Hoi Hang and Naik, Hemal and Nagy, M{\'a}t{\'e} and Couzin, Iain D and Deussen, Oliver and Goldluecke, Bastian and Kano, Fumihiro},
      journal={arXiv preprint arXiv:2308.15316},
      year={2023}
}