[Stereo Vision] Intrinsic and Extrinsic Parameters in Camera Calibration

꾸준희

2020. 12. 18. 02:33

What is camera calibration?

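The real world is three-dimensional, but photographing it with a camera projects it onto a two-dimensional image. Geometrically, where a 3D point lands in the image is determined by the position and orientation of the camera at the moment the image was taken. In practice, however, the image is also affected by factors internal to the camera, such as the lens used and the distance between the lens and the image sensor. To compute where a 3D point projects in the image, or conversely to recover 3D coordinates from image coordinates, these internal factors have to be accounted for. The process of finding the parameter values of these internal factors is called camera calibration.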

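In other words, it is the task of estimating the parameters of the pinhole camera model, a simplification of the real camera that takes the photo or video.

The pinhole camera model assumes that light from the scene passes through a single pinhole and is projected onto the image. The pinhole corresponds to the lens center, and the distance from it to the plane behind it where the image forms is the camera's focal length.

The parameters of a pinhole camera are usually expressed as a 3 x 4 matrix called the camera matrix. Camera calibration is the procedure of estimating that matrix.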

Calibration is used mainly in stereo vision applications, where the camera projection matrices of two cameras are used to compute the 3D world coordinates of a point seen by both cameras.
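As a rough illustration of how two projection matrices yield a 3D point, here is a minimal sketch of linear (DLT) triangulation in NumPy. The intrinsic values, the 0.1-unit baseline, the test point, and the function names `triangulate` and `project` are all assumptions made up for this example. The linear method is one common choice, not something this post prescribes.

```python
import numpy as np

def project(P, X):
    """Project a 3D point X with a 3x4 projection matrix P; returns (u, v)."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def triangulate(P1, P2, uv1, uv2):
    """Linear triangulation: find X such that both projections match.

    Each view contributes two linear equations in the homogeneous point X_h:
    u * (P[2] . X_h) - (P[0] . X_h) = 0 and v * (P[2] . X_h) - (P[1] . X_h) = 0.
    """
    A = np.stack([
        uv1[0] * P1[2] - P1[0],
        uv1[1] * P1[2] - P1[1],
        uv2[0] * P2[2] - P2[0],
        uv2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X_h = Vt[-1]
    return X_h[:3] / X_h[3]

# Hypothetical stereo rig: identical intrinsics, second camera shifted 0.1 along x.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.1], [0.0], [0.0]])])

X_true = np.array([0.2, -0.1, 2.0])
X_est = triangulate(P1, P2, project(P1, X_true), project(P2, X_true))
print(X_est)
```

With noise-free pixel coordinates the estimate matches the true point up to floating-point error. With real detections the two rays no longer intersect exactly and the SVD returns a least-squares compromise.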

Parameters of the camera model

A 2D position in pixel coordinates is written as follows.

$ \begin{bmatrix}
u & v & 1
\end{bmatrix}^{T} $

A 3D position in world coordinates is written as follows.

$ \begin{bmatrix}
x_{w} & y_{w} & z_{w} & 1
\end{bmatrix}^{T} $

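In the pinhole camera model, the camera matrix expresses the projective mapping from world coordinates to pixel coordinates. In the equation below, $ z_{c} $ is the depth of the point in the camera coordinate system, $ K $ is the intrinsic matrix, and $ R $, $ T $ are the extrinsic parameters.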

$ z_{c}\begin{bmatrix}
u\\
v\\
1
\end{bmatrix} = K\begin{bmatrix}
R & T
\end{bmatrix}\begin{bmatrix}
x_{w}\\
y_{w}\\
z_{w}\\
1
\end{bmatrix} $
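To make the projection equation concrete, the sketch below evaluates it with NumPy for a couple of points. The numeric values of `K`, `R`, and `T` and the helper name `project_points` are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import numpy as np

def project_points(X_world, K, R, T):
    """Evaluate z_c [u, v, 1]^T = K [R | T] [x_w, y_w, z_w, 1]^T for N points.

    X_world: Nx3 world points, K: 3x3 intrinsics, R: 3x3 rotation, T: length-3 translation.
    Returns (Nx2 pixel coordinates, length-N depths z_c).
    """
    X_world = np.asarray(X_world, dtype=float)
    Rt = np.hstack([R, np.reshape(T, (3, 1))])               # 3x4 matrix [R | T]
    X_h = np.hstack([X_world, np.ones((len(X_world), 1))])   # Nx4 homogeneous points
    x = (K @ Rt @ X_h.T).T                                   # each row is z_c * [u, v, 1]
    z_c = x[:, 2]
    return x[:, :2] / z_c[:, None], z_c

# Hypothetical camera: 800 px focal length, principal point (320, 240),
# looking down the world z axis from 5 units away.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
T = np.array([0.0, 0.0, 5.0])

points = np.array([[0.0, 0.0, 0.0],
                   [1.0, 0.5, 0.0]])
uv, depth = project_points(points, K, R, T)
print(uv)
print(depth)
```

The world origin sits on the optical axis here, so it lands on the principal point. The second point is offset by 800 * 1 / 5 = 160 pixels in u and 800 * 0.5 / 5 = 80 pixels in v.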

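Intrinsic parameters

The intrinsic matrix K contains five intrinsic parameters, as shown below. It is written here as the 3 x 3 matrix that multiplies the 3 x 4 matrix $ \begin{bmatrix} R & T \end{bmatrix} $ in the projection equation.

$ K = \begin{bmatrix}
a_{x} & \gamma  & u_{0}\\
0 & a_{y} & v_{0}\\
0 & 0 & 1
\end{bmatrix} $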

These five parameters encode the focal length, the image sensor format, and the principal point.

$ a_{x}= f\cdot m_{x} $ and $ a_{y}= f\cdot m_{y} $ are the focal lengths in pixel units. Here $ m_{x} $ and $ m_{y} $ are scale factors relating distance to pixels (the number of pixels per unit distance along each axis), and $ f $ is the focal length in units of distance.

$ \gamma $ is the skew coefficient between the $ x $ and $ y $ axes, and it is often 0.

$ u_{0} $ and $ v_{0} $ give the principal point, which in the ideal case is the center of the image.
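The relationship between these parameters and physical camera specifications can be sketched as follows. The lens focal length, pixel size, and image resolution are made-up values, `intrinsic_matrix` is a hypothetical helper, and the matrix is assumed to be in its 3 x 3 form with the principal point at the image center (the ideal case mentioned above).

```python
import numpy as np

def intrinsic_matrix(f_mm, pixel_size_mm, width, height, skew=0.0):
    """Build a 3x3 intrinsic matrix from physical specifications.

    f_mm: focal length in millimetres.
    pixel_size_mm: (pixel width, pixel height) in millimetres.
    width, height: image resolution in pixels.
    """
    m_x = 1.0 / pixel_size_mm[0]   # pixels per millimetre along x
    m_y = 1.0 / pixel_size_mm[1]   # pixels per millimetre along y
    a_x = f_mm * m_x               # focal length in pixels (x)
    a_y = f_mm * m_y               # focal length in pixels (y)
    u0, v0 = width / 2.0, height / 2.0   # assumption: principal point at the image center
    return np.array([[a_x, skew, u0],
                     [0.0, a_y, v0],
                     [0.0, 0.0, 1.0]])

# Hypothetical camera: 4 mm lens, 5 micrometre square pixels, 640 x 480 image.
K = intrinsic_matrix(f_mm=4.0, pixel_size_mm=(0.005, 0.005), width=640, height=480)
print(K)
```

A real calibration estimates these entries from images instead of datasheet values, because the actual principal point and effective focal length usually differ slightly from the nominal ones.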

Extrinsic parameters

$ \begin{bmatrix}
R_{3\times 3} & T_{3\times 1}\\
0_{1\times 3} & 1
\end{bmatrix}_{4\times 4} $

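The extrinsic parameters describe the transformation between the camera coordinate system and the world coordinate system, expressed as a rotation (R) and a translation (T) between the two. The matrix above writes this transformation in 4 x 4 homogeneous form; its top three rows are the 3 x 4 block $ \begin{bmatrix} R & T \end{bmatrix} $. To obtain the extrinsic parameters, first determine the camera's intrinsic parameters, for example with a calibration tool. Then compute the transformation matrix from pairs of 3D world coordinates and their matching 2D image coordinates, which are either known in advance or sampled.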
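The second step, recovering R and T from 3D-2D correspondences once K is known, could look roughly like the sketch below. It uses a plain direct linear transform (DLT) in NumPy on synthetic, noise-free data. The function names, the camera values, and the choice of DLT are assumptions for illustration, and the method needs at least six non-coplanar points. In practice a library routine such as OpenCV's `cv2.solvePnP` is a common alternative.

```python
import numpy as np

def estimate_projection_matrix(X_world, uv):
    """DLT: solve for the 3x4 matrix P (up to scale) from N >= 6 point pairs."""
    rows = []
    for X, (u, v) in zip(X_world, uv):
        Xh = np.append(X, 1.0)
        # From u = (p1 . Xh) / (p3 . Xh) and v = (p2 . Xh) / (p3 . Xh)
        rows.append(np.concatenate([Xh, np.zeros(4), -u * Xh]))
        rows.append(np.concatenate([np.zeros(4), Xh, -v * Xh]))
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    return Vt[-1].reshape(3, 4)

def extrinsics_from_projection(P, K):
    """Split P = s * K [R | T] into R and T, given known intrinsics K."""
    M = np.linalg.inv(K) @ P                # equals s * [R | T] for an unknown scale s
    s = np.cbrt(np.linalg.det(M[:, :3]))    # det(s R) = s^3 because det(R) = 1
    M = M / s
    U, _, Vt = np.linalg.svd(M[:, :3])      # snap to the nearest rotation matrix
    return U @ Vt, M[:, 3]

# Synthetic ground truth (hypothetical values).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
c, s = np.cos(0.3), np.sin(0.3)
R_true = np.array([[c, 0.0, s],
                   [0.0, 1.0, 0.0],
                   [-s, 0.0, c]])          # rotation about the y axis
T_true = np.array([0.1, -0.2, 5.0])

rng = np.random.default_rng(0)
X_world = rng.uniform(-1.0, 1.0, size=(8, 3))   # 8 points in general position
X_cam = X_world @ R_true.T + T_true
x = X_cam @ K.T
uv = x[:, :2] / x[:, 2:3]

P = estimate_projection_matrix(X_world, uv)
R_est, T_est = extrinsics_from_projection(P, K)
print(np.round(R_est, 4))
print(np.round(T_est, 4))
```

On noisy measurements the linear estimate is usually only a starting point, and it is typically refined by minimizing the reprojection error.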

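$ R $ and $ T $ are the extrinsic parameters that define the change of coordinates from 3D world coordinates to 3D camera coordinates. Equivalently, they define the position of the camera center and the camera's orientation in world coordinates. $ T $ is the position of the world origin expressed in the camera-centered coordinate system. It is sometimes mistaken for the position of the camera. The camera position $ C $ expressed in world coordinates is given below, using the fact that the inverse of a rotation matrix $ R $ is its transpose.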

$ C = -R^{-1}T = -R^{t}T $
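A quick numerical check of this formula, assuming the convention $ X_{c} = R X_{w} + T $ used in this post: the camera center is the world point that maps to the origin of the camera frame. The rotation angle, the translation, and the helper names are arbitrary choices for the example.

```python
import numpy as np

def rotation_z(theta):
    """Rotation by theta radians about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s, c, 0.0],
                     [0.0, 0.0, 1.0]])

def camera_center(R, T):
    """Camera position in world coordinates, C = -R^T T."""
    return -R.T @ T

R = rotation_z(np.pi / 2)
T = np.array([1.0, 2.0, 3.0])
C = camera_center(R, T)

origin_in_camera = R @ C + T              # the camera center maps to the camera origin
C_via_inverse = -np.linalg.inv(R) @ T     # same value, since R^{-1} = R^T for a rotation
print(C)
print(origin_in_camera)
```

Note that `C` differs from `T` unless `R` is the identity and `T` is zero, which is why treating `T` as the camera position gives wrong results.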

There are many ways to calibrate a camera; the tutorial at the site below is worth trying.

wiki.ros.org/camera_calibration/Tutorials/MonocularCalibration

Looking at a picture makes the idea a bit easier to grasp.

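Camera calibration in 3D human pose estimation

In 3D human pose estimation, the extrinsic parameters from camera calibration are used when mapping estimated 3D pose coordinates onto the 2D image.

Some examples of how they are used follow.

The two functions below are part of an implementation of "A simple yet effective baseline for 3d human pose estimation", a deep-learning-based 3D human pose estimation method. This method approaches 3D pose estimation by first estimating the 2D pose and then "lifting" it to 3D.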

github.com/una-dinosauria/3d-pose-baseline

def world_to_camera_frame(P, R, T):
  """
  Convert points from world to camera coordinates
  Args
    P: Nx3 3d points in world coordinates
    R: 3x3 Camera rotation matrix
    T: 3x1 Camera translation parameters
  Returns
    X_cam: Nx3 3d points in camera coordinates
  """

  assert len(P.shape) == 2
  assert P.shape[1] == 3

  # Translate, then rotate. Because T is subtracted before rotating,
  # T here is the camera position in world coordinates.
  X_cam = R.dot( P.T - T )

  return X_cam.T

def camera_to_world_frame(P, R, T):
  """Inverse of world_to_camera_frame
  Args
    P: Nx3 points in camera coordinates
    R: 3x3 Camera rotation matrix
    T: 3x1 Camera translation parameters
  Returns
    X_world: Nx3 points in world coordinates
  """

  assert len(P.shape) == 2
  assert P.shape[1] == 3

  X_world = R.T.dot( P.T ) + T # rotate back, then translate

  return X_world.T
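One detail worth checking: these two functions compute $ R(X_{w} - T) $, so their `T` plays the role of the camera position $ C $ in world coordinates, not the $ T $ of the projection equation earlier in this post. The sketch below restates the two functions compactly so it runs on its own, then verifies the round trip and the conversion `T_text = -R C` between the two conventions. The rotation, camera position, and test points are made-up values.

```python
import numpy as np

def world_to_camera_frame(P, R, T):
    """Nx3 world points to Nx3 camera points, as X_cam = R (X_w - T)."""
    return R.dot(P.T - T).T

def camera_to_world_frame(P, R, T):
    """Inverse mapping, X_w = R^T X_cam + T."""
    return (R.T.dot(P.T) + T).T

# Hypothetical camera: rotated 0.4 rad about z, located at (1, 2, 3) in the world.
c, s = np.cos(0.4), np.sin(0.4)
R = np.array([[c, -s, 0.0],
              [s, c, 0.0],
              [0.0, 0.0, 1.0]])
cam_pos = np.array([[1.0], [2.0], [3.0]])   # 3x1, passed as "T" to the functions

P_world = np.array([[0.0, 0.0, 0.0],
                    [1.0, 2.0, 3.0],        # the camera position itself
                    [4.0, 5.0, 6.0]])

P_cam = world_to_camera_frame(P_world, R, cam_pos)
P_back = camera_to_world_frame(P_cam, R, cam_pos)

# Same transform written in the X_cam = R X_w + T convention.
T_text = -R @ cam_pos
P_cam_alt = (R @ P_world.T + T_text).T
print(P_cam)
```

The second test point is the camera position, so it maps to the origin of the camera frame.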

The Lightweight OpenPose + Single-Shot Multi-Person 3D Pose Estimation From Monocular RGB demo likewise uses the extrinsic parameters from camera calibration to rotate the 3D pose coordinates, as shown below.

github.com/Daniil-Osokin/lightweight-human-pose-estimation-3d-demo.pytorch

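import numpy as np

def rotate_poses(poses_3d, R, t):
    # Apply the inverse extrinsic transform X = R^{-1} (X' - t) to every pose.
    R_inv = np.linalg.inv(R)
    for pose_id in range(len(poses_3d)):
        # Each pose is a flat array with 4 values per joint (x, y, z first); view it as 4 x J.
        pose_3d = poses_3d[pose_id].reshape((-1, 4)).transpose()
        # Transform only the xyz rows; t must broadcast against 3 x J (e.g. shape 3x1).
        pose_3d[0:3, :] = np.dot(R_inv, pose_3d[0:3, :] - t)
        poses_3d[pose_id] = pose_3d.transpose().reshape(-1)

    return poses_3d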
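Since `rotate_poses` computes $ R^{-1}(X - t) $, it undoes a transform of the form $ X' = RX + t $. The sketch below checks this on a toy pose. The three joints, the rotation, the translation, and the use of 1.0 as the fourth value per joint are all invented for the test, and the function is restated so the snippet runs on its own.

```python
import numpy as np

def rotate_poses(poses_3d, R, t):
    """Apply X = R^{-1} (X' - t) to the xyz part of every flattened pose."""
    R_inv = np.linalg.inv(R)
    for pose_id in range(len(poses_3d)):
        pose_3d = poses_3d[pose_id].reshape((-1, 4)).transpose()   # 4 x J view
        pose_3d[0:3, :] = np.dot(R_inv, pose_3d[0:3, :] - t)
        poses_3d[pose_id] = pose_3d.transpose().reshape(-1)
    return poses_3d

# Hypothetical extrinsics: rotation of 0.5 rad about x, plus a translation.
c, s = np.cos(0.5), np.sin(0.5)
R = np.array([[1.0, 0.0, 0.0],
              [0.0, c, -s],
              [0.0, s, c]])
t = np.array([[0.0], [10.0], [200.0]])      # 3x1 so it broadcasts over joints

# A toy "pose" with three joints in the original frame.
joints = np.array([[0.0, 0.0, 0.0],
                   [10.0, 20.0, 30.0],
                   [-5.0, 15.0, 25.0]])

# Forward transform X' = R X + t, then append a dummy fourth value per joint.
joints_moved = (R @ joints.T + t).T
extra = np.ones((len(joints), 1))
poses = np.hstack([joints_moved, extra]).reshape(1, -1)   # shape (1, J * 4)

restored = rotate_poses(poses.copy(), R, t)
restored_xyz = restored[0].reshape(-1, 4)[:, :3]
print(restored_xyz)
```

The fourth value of every joint passes through unchanged, because only rows `0:3` are transformed.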

Reference 1: en.wikipedia.org/wiki/Camera_resectioning

Reference 2: darkpgmr.tistory.com/32

Reference 3: www.cs.cmu.edu/~16385/s17/Slides/11.3_Pose_Estimation.pdf

Reference 4: en.wikipedia.org/wiki/Pinhole_camera_model

License: Attribution, NonCommercial
