
[Paper Review] YOLO-Pose: Enhancing YOLO for Multi Person Pose Estimation Using Object Keypoint Similarity Loss

꾸준희 2023. 2. 14. 01:32

 

Paper : https://openaccess.thecvf.com/content/CVPR2022W/ECV/papers/Maji_YOLO-Pose_Enhancing_YOLO_for_Multi_Person_Pose_Estimation_Using_Object_CVPRW_2022_paper.pdf

 

GitHub : https://github.com/TexasInstruments/edgeai-yolov5/tree/yolo-pose

 


 

 

 

This post introduces YOLO-Pose, a human pose estimation method built on the YOLO framework. It was presented at CVPR 2022 (as a workshop paper, judging by the CVPRW link above)!

 

 

Paper Overview

The proposed model learns to detect the bounding boxes of multiple people together with their 2D poses in a single forward pass, so the authors argue that it combines the advantages of both bottom-up and top-down approaches. This approach needs no post-processing to group the detected keypoints into skeletons, because the keypoints associated with an anchor are already grouped. That does not mean there is no post-processing at all: it uses the standard NMS from object detection. The model also achieves state-of-the-art AP50 on the COCO validation set.

 

 

Existing Problems

The paper points out that existing heatmap-based two-stage approaches cannot be trained end-to-end, and that their training relies on an L1 loss that is not optimized for the evaluation metric.

 

 

Main Contribution

 

  • Because major challenges such as scale variation and occlusion are common to both tasks, the paper proposes solving multi-person pose estimation in line with object detection. The authors call this a first step toward unifying the two fields.

 

  • The heatmap-free approach uses the standard post-processing of object detection instead of complex post-processing involving pixel-level NMS, adjustment, refinement, line-integral, and various grouping algorithms.

 

  • The idea of IoU loss is extended from box detection to keypoints. OKS (object keypoint similarity) is used not only for evaluation but also as a loss during training. OKS loss is scale-invariant and inherently gives different weights to different keypoints.

 

  • It achieves SOTA AP50 with roughly 4x less compute. For example, on COCO test-dev2017, YOLOv5m6-pose runs at 66.3 GMACS, whereas the SOTA DEKR reaches an AP50 of 89.4 at 283.0 GMACS (283.0 / 66.3 ≈ 4.3x).

 

  • It proposes a joint detection and pose estimation framework. The authors describe pose estimation as coming almost for free with an object detection network, lol.

 

  • It proposes low-complexity variants of the model that significantly outperform real-time focused models such as EfficientHRNet.

 

 

YOLO-Pose

 

The YOLO-Pose model is built on YOLOv5, which normally detects 80 classes using a box head that predicts 85 elements per anchor: the bounding box, the object score, and the confidence scores for the 80 classes. Each grid location has 3 anchors of different shapes. Pose estimation is framed as a single-class person detection problem in which each person has 17 associated keypoints, and each keypoint is identified by a location and a confidence. There are therefore 51 elements for the 17 keypoints associated with an anchor: for each anchor, the keypoint head predicts 51 elements and the box head predicts 6. For an anchor with n keypoints, the overall prediction vector thus holds 6 + 3n elements: the 6 box-head outputs followed by (x, y, confidence) for each keypoint.
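
The equation was an image in the original post, so here is a reconstruction following the paper's notation as I understand it. The symbol names ($C_x$, $K_{conf}^n$, and so on) should be read as an assumption rather than an exact copy of the paper.

```latex
P_v = \{ C_x, C_y, W, H, box_{conf}, class_{conf},
         K_x^1, K_y^1, K_{conf}^1, \ldots, K_x^n, K_y^n, K_{conf}^n \}
```

With $n = 17$ this gives $6 + 3 \times 17 = 57$ elements per anchor.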

 

 

 

Keypoint confidence is trained from each keypoint's visibility flag. If a keypoint is visible or occluded, its GT confidence is set to 1; if it is not in the image, it is set to 0. At inference time, a keypoint with confidence above 0.5 is considered valid and all others are rejected. The confidence is not used in evaluation. However, the network predicts all 17 keypoints for every detection, so keypoints that lie outside the image have to be filtered out. Without filtering, you get dangling keypoints that deform the skeleton.
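
A minimal sketch of the inference-time filtering described above. The function name `filter_keypoints` and the `(x, y, conf)` tuple layout are made up for illustration; only the 0.5 threshold comes from the text.

```python
def filter_keypoints(keypoints, conf_threshold=0.5):
    """Keep keypoints whose confidence exceeds the threshold.

    keypoints: list of (x, y, conf) tuples for one detected person
               (hypothetical layout, chosen for this example).
    Rejected keypoints become None so that the keypoint indices
    (nose, left eye, ...) stay aligned with the skeleton definition.
    """
    return [(x, y) if conf > conf_threshold else None
            for x, y, conf in keypoints]


# Three keypoints of one person; the second lies outside the image
# and has low confidence, so it is rejected.
person = [(120.0, 80.0, 0.93), (300.0, -15.0, 0.12), (131.0, 74.0, 0.51)]
visible = filter_keypoints(person)
```

When drawing the skeleton, a limb is only drawn if both of its endpoints survived the filter, which is what prevents the dangling keypoints.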

 

YOLO-Pose uses CSP-darknet53 as its backbone and PANet to fuse features of various scales from the backbone. After that come four detection heads at different scales, and finally two decoupled heads for predicting boxes and keypoints. In this work the authors cap the complexity at 150 GMACS and achieve competitive results within that budget.

 

For a given image, an anchor matched with a person stores the entire 2D pose along with the bbox. The bbox coordinates are transformed relative to the anchor center, while the box dimensions are normalized by the anchor's height and width. The keypoint locations are likewise transformed relative to the anchor center, but they are not normalized by the anchor's height and width. Both keypoints and the bbox are predicted relative to the anchor center. Since the keypoint part is independent of the anchor's width and height, it can easily be extended to anchor-free approaches such as YOLOX and FCOS.
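
The difference between the box and keypoint encoding can be sketched as below. This is a simplified illustration of the paragraph above, not the actual decoding in the edgeai-yolov5 repository, which may apply further transforms (activation functions, stride scaling) that this post does not cover. `encode_targets` and its argument layout are hypothetical.

```python
def encode_targets(anchor_center, anchor_wh, gt_box, gt_keypoints):
    """Express a GT box and its keypoints relative to a matched anchor.

    anchor_center: (ax, ay) anchor center in pixels
    anchor_wh:     (aw, ah) anchor width and height in pixels
    gt_box:        (cx, cy, w, h) ground-truth box in pixels
    gt_keypoints:  list of (x, y) ground-truth keypoints in pixels
    """
    ax, ay = anchor_center
    aw, ah = anchor_wh
    cx, cy, w, h = gt_box
    # Box center: offset from the anchor center.
    # Box size: normalized by the anchor width and height.
    box_target = (cx - ax, cy - ay, w / aw, h / ah)
    # Keypoints: offset from the anchor center only, no size normalization.
    kpt_targets = [(kx - ax, ky - ay) for kx, ky in gt_keypoints]
    return box_target, kpt_targets


box_t, kpt_t = encode_targets(
    anchor_center=(100.0, 200.0),
    anchor_wh=(50.0, 100.0),
    gt_box=(110.0, 190.0, 100.0, 250.0),
    gt_keypoints=[(105.0, 150.0), (130.0, 260.0)],
)

# Same GT with a differently shaped anchor: the keypoint targets do not change.
_, kpt_t_other = encode_targets(
    (100.0, 200.0), (80.0, 40.0),
    (110.0, 190.0, 100.0, 250.0),
    [(105.0, 150.0), (130.0, 260.0)],
)
```

Because the keypoint targets never touch `anchor_wh`, dropping the anchor shapes entirely (as anchor-free detectors do) leaves the keypoint branch unchanged.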

 

IoU Based Bounding-box Loss Function

For box detection, most object detectors use variants of IoU loss such as GIoU, DIoU, and CIoU instead of distance-based losses. These losses are scale-invariant and directly optimize the evaluation metric. The paper uses CIoU for bbox supervision. For a GT bbox matched with the $k^{th}$ anchor at location $(i, j)$ and scale s, the loss is defined on the CIoU between the GT box and the predicted box.
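
The equation was an image in the original post. A reconstruction in the notation used by the surrounding text, assuming the usual "one minus CIoU" form and a symbol $Box_{gt}^{s, i, j, k}$ for the matched GT box:

```latex
\mathcal{L}_{box}(s, i, j, k) = 1 - CIoU\left(Box_{gt}^{s, i, j, k},\; Box_{pred}^{s, i, j, k}\right)
```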

 

 

Here $Box_{pred}^{s, i, j, k}$ denotes the predicted box of the $k^{th}$ anchor at location $(i, j)$ and scale s. In this paper there are 3 anchors at each location, and predictions are made at 4 scales.
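
For reference, a small pure-Python sketch of a CIoU-based box loss for a single pair of boxes. It follows the standard CIoU definition (IoU minus a center-distance term minus an aspect-ratio term); the `(cx, cy, w, h)` box layout and the function names are my own choices, and the real training code works on batched tensors.

```python
import math


def ciou(box_a, box_b, eps=1e-9):
    """Complete IoU between two boxes given as (cx, cy, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Convert to corner coordinates.
    ax1, ay1, ax2, ay2 = ax - aw / 2, ay - ah / 2, ax + aw / 2, ay + ah / 2
    bx1, by1, bx2, by2 = bx - bw / 2, by - bh / 2, bx + bw / 2, by + bh / 2
    # Plain IoU.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter + eps
    iou = inter / union
    # Squared center distance over squared diagonal of the enclosing box.
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw ** 2 + ch ** 2 + eps
    rho2 = (ax - bx) ** 2 + (ay - by) ** 2
    # Aspect-ratio consistency term.
    v = (4 / math.pi ** 2) * (math.atan(bw / (bh + eps)) - math.atan(aw / (ah + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return iou - rho2 / c2 - alpha * v


def box_loss(gt_box, pred_box):
    """Box loss as one minus CIoU."""
    return 1.0 - ciou(gt_box, pred_box)


gt = (0.0, 0.0, 2.0, 2.0)
loss_same = box_loss(gt, gt)                    # close to 0
loss_near = box_loss(gt, (5.0, 0.0, 2.0, 2.0))  # no overlap, nearby
loss_far = box_loss(gt, (10.0, 0.0, 2.0, 2.0))  # no overlap, farther away
```

Even when the boxes do not overlap at all, the loss still shrinks as the predicted box moves closer, thanks to the center-distance term.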

 

Human Pose Loss Function Formulation

OKS is the most widely used metric for evaluating keypoints. Heatmap-based bottom-up approaches typically use an L1 loss to detect keypoints. However, L1 loss may not be suitable for obtaining the best OKS. It is very naive and takes into account neither the scale of the object nor the type of keypoint. Since heatmaps are probability maps, OKS cannot be used as a loss in purely heatmap-based approaches. OKS can be used as a loss function only when the keypoint locations are regressed.

Geng et al. [30] propose a scale-normalized L1 loss for keypoint regression, which is a step toward OKS loss. Since YOLO-Pose regresses keypoints directly relative to the anchor center, it can be trained to optimize the evaluation metric itself. This is where the idea of IoU loss gets extended to keypoints. OKS loss is scale-invariant and gives more importance to certain keypoints. For example, keypoints on a person's head (eyes, nose, ears) are penalized more than keypoints on the body (shoulders, knees, hips, etc.) for the same pixel-level error. These weighting factors were chosen empirically by the COCO authors from redundantly annotated validation images.

Unlike vanilla IoU loss, which suffers from vanishing gradients in non-overlapping cases, OKS loss never plateaus. In that sense OKS loss is similar to DIoU loss. So when a GT bbox is matched with an anchor at location $(i, j)$ and scale s, the keypoints are predicted relative to the anchor center. OKS is computed for each keypoint individually and then summed to give the final OKS loss, also called the keypoint IoU loss.
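
The loss equation was an image in the original post, so here is a hedged sketch of an OKS-style loss for one person. It assumes the COCO evaluation convention: per-keypoint constants $k_n = 2\sigma_n$ with the published COCO sigmas, and $s^2$ taken as the object area. The function name `oks_loss` and the data layout are illustrative and not taken from the repository.

```python
import math

# Per-keypoint sigmas from the COCO keypoint evaluation, in COCO order:
# nose, eyes, ears, shoulders, elbows, wrists, hips, knees, ankles.
COCO_SIGMAS = [0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072,
               0.062, 0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089]


def oks_loss(pred, gt, visibility, area, sigmas=COCO_SIGMAS):
    """1 - OKS, where OKS averages exp(-d_n^2 / (2 * s^2 * k_n^2))
    over the labeled keypoints (visibility flag > 0).

    pred, gt:   lists of (x, y) in pixels
    visibility: list of COCO visibility flags (0, 1 or 2)
    area:       object area in pixels^2, used as s^2
    """
    total, count = 0.0, 0
    for (px, py), (gx, gy), v, sigma in zip(pred, gt, visibility, sigmas):
        if v <= 0:
            continue  # unlabeled keypoints do not contribute
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        k = 2 * sigma
        total += math.exp(-d2 / (2 * area * k ** 2))
        count += 1
    if count == 0:
        return 0.0
    return 1.0 - total / count


gt_kpts = [(10.0 * i, 10.0 * i) for i in range(17)]
vis = [2] * 17
area = 10000.0

# Same 5-pixel error, once on the nose (index 0) and once on the left hip (index 11).
nose_off = [(x + 5.0, y) if i == 0 else (x, y) for i, (x, y) in enumerate(gt_kpts)]
hip_off = [(x + 5.0, y) if i == 11 else (x, y) for i, (x, y) in enumerate(gt_kpts)]
loss_nose = oks_loss(nose_off, gt_kpts, vis, area)
loss_hip = oks_loss(hip_off, gt_kpts, vis, area)

# Doubling every coordinate (so the area grows 4x) leaves the loss unchanged.
scaled_loss = oks_loss([(2 * x, 2 * y) for x, y in nose_off],
                       [(2 * x, 2 * y) for x, y in gt_kpts], vis, 4 * area)
```

The nose error costs more than the identical hip error because the nose has a much smaller sigma, and the last call shows the scale invariance mentioned above.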

 

 

 

For each keypoint, the model also learns a confidence parameter that indicates whether the keypoint is present for that person. The keypoint's visibility flag is used as the ground truth.
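
A minimal sketch of this confidence loss, assuming a binary cross-entropy summed over the keypoints with the target set to 1 whenever the visibility flag is greater than 0. The name `kpt_conf_loss` is hypothetical, and a real implementation would work on logits in batches.

```python
import math


def kpt_conf_loss(pred_conf, visibility, eps=1e-7):
    """Sum of binary cross-entropy terms between predicted keypoint
    confidences (probabilities in [0, 1]) and visibility-derived targets."""
    loss = 0.0
    for p, v in zip(pred_conf, visibility):
        target = 1.0 if v > 0 else 0.0      # visible or occluded -> 1
        p = min(max(p, eps), 1.0 - eps)     # clamp to avoid log(0)
        loss += -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))
    return loss


# One labeled keypoint (flag 2) and one that is not in the image (flag 0).
good = kpt_conf_loss([0.9, 0.1], [2, 0])   # confident and correct
bad = kpt_conf_loss([0.1, 0.9], [2, 0])    # confident and wrong
```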

 

 

The loss at location $(i, j)$ is valid for the $k^{th}$ anchor at scale s only when a GT bbox is matched with that anchor. Finally, the total loss is the weighted sum of the loss terms over all scales, anchors, and locations.
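
The equation was an image in the original post. A reconstruction of its general shape, where the $\lambda$ weights are hyperparameters balancing the terms (their values are not given in this post) and the exact subscript names are an assumption:

```latex
\mathcal{L}_{total} = \sum_{s, i, j, k} \left(
    \lambda_{cls}\, \mathcal{L}_{cls}
  + \lambda_{box}\, \mathcal{L}_{box}
  + \lambda_{kpts}\, \mathcal{L}_{kpts}
  + \lambda_{kpts\_conf}\, \mathcal{L}_{kpts\_conf}
\right)
```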

 

 

 

One advantage of YOLO-Pose is that nothing constrains the keypoints to lie inside the predicted bbox. So a keypoint that falls outside the bbox because of occlusion can still be recognized correctly. Top-down methods are usually limited here, because their keypoints depend on the bbox.

 

 

 

 

Experiments

 

The paper reports evaluation results on the COCO dataset (val and test-dev).

 

 

 

The paper also compares OKS loss with L1 loss. Using OKS loss improves AP by about 5% over L1 loss, which looks like an impactful result.

 

 

The paper compares complexity as well. A model has to be at or below 30 GMACS to count as low complexity. Complexity clearly seems to scale with input size, and at the same input size YOLO-Pose has lower complexity and higher AP than EfficientHRNet.

 

 

Results after quantization are reported too.

 

 

Finally, the paper shows qualitative results.
