[Human Pose Estimation] AdaFuse: Adaptive Multiview Fusion for Accurate Human Pose Estimation in the Wild

꾸준희 2023. 5. 23. 00:36

GitHub : https://github.com/zhezh/occlusion_person

Paper : https://arxiv.org/abs/2010.13302

The paper covered in this post is "AdaFuse: Adaptive Multiview Fusion for Accurate Human Pose Estimation in the Wild," published in 2020. Typical approaches to the occluded-joint problem often rely on intrusive sensors such as IMUs. Instead of using such sensors, the paper proposes AdaFuse, an adaptive multiview fusion method that uses the visible views to enhance the features of the occluded views. The key idea of AdaFuse is to exploit the sparsity of the heatmap representation to determine point-point correspondences between two views. It also applies an adaptive fusion weight to each camera view to reduce the chance that a good view is corrupted by a "bad" view.

The proposed method works as follows. It takes multiview images as input, outputs a 2D heatmap for each view, and fuses these heatmaps using epipolar geometry. In other words, an occluded view fills in its missing information from the other views.
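
A minimal sketch of the correspondence-and-fusion idea, assuming a known fundamental matrix F that maps a pixel in the reference view to its epipolar line in a source view. Because a heatmap is sparse, the matching point on the epipolar line is taken to be the location with the largest response, and that value is added to the reference heatmap with a per-view weight. The function names, the line-sampling scheme, and the fixed weights are illustrative assumptions, not the authors' code.

```python
import numpy as np


def epipolar_max(heatmap_src, F, x_ref, num_samples=128):
    """Largest source-heatmap response on the epipolar line of reference pixel x_ref=(x, y)."""
    # Epipolar line a*x + b*y + c = 0 in the source view.
    a, b, c = F @ np.array([x_ref[0], x_ref[1], 1.0])
    h, w = heatmap_src.shape
    if abs(a) < 1e-12 and abs(b) < 1e-12:
        return 0.0  # degenerate line
    if abs(b) >= abs(a):
        # Mostly horizontal line: step along x, solve for y.
        xs = np.linspace(0, w - 1, num_samples)
        ys = -(a * xs + c) / b
    else:
        # Mostly vertical line: step along y, solve for x.
        ys = np.linspace(0, h - 1, num_samples)
        xs = -(b * ys + c) / a
    # Keep only samples that round to a pixel inside the image.
    valid = (xs > -0.5) & (xs < w - 0.5) & (ys > -0.5) & (ys < h - 0.5)
    if not valid.any():
        return 0.0  # the line does not cross the image
    cols = np.round(xs[valid]).astype(int)
    rows = np.round(ys[valid]).astype(int)
    return float(heatmap_src[rows, cols].max())


def fuse_heatmap(h_ref, h_srcs, Fs, weights):
    """Weighted sum of the reference heatmap and epipolar-max maps from the source views.

    weights[0] applies to the reference view, weights[i + 1] to h_srcs[i].
    """
    fused = weights[0] * h_ref.astype(float)
    h, w = h_ref.shape
    for h_src, F, wt in zip(h_srcs, Fs, weights[1:]):
        corr = np.zeros((h, w))
        for y in range(h):          # brute-force loop over pixels, for clarity only
            for x in range(w):
                corr[y, x] = epipolar_max(h_src, F, (x, y))
        fused += wt * corr
    return fused
```

For example, with F = [[0, 0, 0], [0, 0, -1], [0, 1, 0]] (a purely horizontal camera shift, so each epipolar line is the same image row), a peak anywhere in row 2 of the source heatmap raises the whole of row 2 in the fused map. In the paper the weights are predicted adaptively per view rather than fixed as they are here.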

The appearance embedding and the geometry embedding are computed as follows.
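
This post does not spell out the weight network, so the following is only a hedged sketch under the assumption that each view's appearance embedding and geometry embedding are concatenated, mapped to one scalar score by a linear layer, and normalized across views with a softmax. The parameter names W and b, the embedding sizes, and the softmax normalization are hypothetical; in the paper the corresponding network is learned.

```python
import numpy as np


def view_fusion_weights(app_emb, geo_emb, W, b):
    """Per-view fusion weights from appearance and geometry embeddings (hypothetical layer).

    app_emb: (V, Da), geo_emb: (V, Dg), W: (Da + Dg,), b: scalar.
    Returns (V,) positive weights that sum to 1.
    """
    feats = np.concatenate([app_emb, geo_emb], axis=1)  # (V, Da + Dg)
    scores = feats @ W + b                              # one score per view
    scores = scores - scores.max()                      # subtract max for numerical stability
    exp = np.exp(scores)
    return exp / exp.sum()                              # softmax across views
```

With identical embeddings for every view the weights come out uniform; a view whose features produce a higher score (for instance, one whose joint is clearly visible) would receive a larger share of the fused heatmap.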

The experimental results are as follows.

The last row shows failure cases.

The paper also introduces a new dataset, Occlusion-Person.

GitHub : https://github.com/zhezh/occlusion_person

Unlike existing datasets, this dataset includes occlusion labels for each joint. About 20.3% of the joints are occluded, and here occlusion means human-object occlusion.

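In the figure below, occluded joints are marked in red. To build the dataset, the authors use UnrealCV to render multiview images and depth maps from 3D models. Thirteen human models wearing different clothes are placed in a living room, a bedroom, and an office. Their poses are taken from the CMU Motion Capture database, and objects such as sofas and desks are used to deliberately occlude joints. Eight cameras are set up to render the depth maps (at heights of 0.9 m and 2.3 m, spaced every 45 degrees on a circle with a 2 m radius), and 15 3D joints are annotated per person.

Finally, the occlusion label of each joint is determined by comparing the depth of the 3D joint in the camera coordinate system with the depth-map value at the pixel where that joint projects. If the difference between the two values is smaller than 30 cm, the joint is considered not occluded.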
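
The labeling rule above can be sketched as follows, assuming a pinhole camera with intrinsics K, joints already expressed in camera coordinates in meters with z > 0 pointing forward, and a depth map that stores z-depth in meters. Treating joints that project outside the image as occluded is an extra assumption of this sketch, not something stated in the post.

```python
import numpy as np


def occlusion_labels(joints_cam, K, depth_map, threshold_m=0.3):
    """Return a (J,) bool array: True = occluded, False = visible.

    joints_cam: (J, 3) joint positions in camera coordinates (meters, z > 0).
    K: (3, 3) pinhole intrinsics. depth_map: (H, W) rendered z-depth in meters.
    """
    proj = joints_cam @ K.T                 # (J, 3) homogeneous pixel coordinates
    u = proj[:, 0] / proj[:, 2]
    v = proj[:, 1] / proj[:, 2]
    h, w = depth_map.shape
    cols = np.round(u).astype(int)
    rows = np.round(v).astype(int)
    inside = (cols >= 0) & (cols < w) & (rows >= 0) & (rows < h)
    # Assumption: joints projecting outside the image are labeled occluded.
    occluded = np.ones(len(joints_cam), dtype=bool)
    d = depth_map[rows[inside], cols[inside]]
    # Visible when the joint depth and the rendered depth differ by less than 30 cm.
    occluded[inside] = np.abs(joints_cam[inside, 2] - d) >= threshold_m
    return occluded
```

For instance, if the rendered depth at a joint's pixel is 3.0 m, a joint at z = 3.2 m counts as visible (difference 0.2 m), while one at z = 4.0 m counts as occluded because something 1 m in front of it was rendered instead.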

Attribution-NonCommercial (CC BY-NC)