[NVIDIA TAO Toolkit] TAO Toolkit Overview


꾸준희

2022. 5. 3. 01:00


NVIDIA TAO Toolkit is a toolkit for building Computer Vision (CV) models or Conversational AI (Conv AI) models by applying a custom dataset to NVIDIA's pre-trained models. In the vision domain, it is mainly used to fine-tune models for object detection, image classification, segmentation, keypoint estimation, and similar tasks.

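In particular, you can add new classes to a pre-trained model or retrain it for different use cases, and TAO Toolkit lets you create a custom AI model by modifying the training-related hyperparameters. For CV models, model pruning can also reduce the overall model size. Pruning reportedly removes the nodes that contribute least to the model's overall accuracy, which greatly reduces the model size and memory footprint and raises inference throughput. According to the accompanying figure, which compares the inference throughput (FPS) of the unpruned and pruned versions of TrafficCamNet, DashCamNet, and PeopleNet, the two differ by roughly a factor of two. FPS stands for frames per second, the number of frames processed each second, so a higher value means better throughput.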
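To get a feel for what "removing the nodes that contribute least" can look like, here is a minimal sketch in plain Python. It assumes a magnitude-based criterion: each filter is ranked by its L2 norm, normalized by the largest norm in the layer, and dropped if it falls at or below a threshold. The function names and the toy layer are hypothetical, and this is not TAO's actual implementation.

```python
import math


def filter_norms(filters):
    """Return the L2 norm of each filter (each filter is a flat list of weights)."""
    return [math.sqrt(sum(w * w for w in f)) for f in filters]


def select_filters_to_keep(filters, threshold):
    """Return indices of filters whose normalized L2 norm exceeds the threshold.

    Assumption: norms are normalized by the largest norm in the layer,
    so the threshold is a value between 0 and 1.
    """
    norms = filter_norms(filters)
    largest = max(norms)
    if largest == 0:
        # All-zero layer: there is nothing to rank, so keep everything.
        return list(range(len(filters)))
    return [i for i, n in enumerate(norms) if n / largest > threshold]


# Hypothetical layer with four filters of three weights each.
layer = [
    [0.9, -0.8, 0.7],
    [0.01, 0.02, -0.01],
    [0.5, 0.4, -0.6],
    [0.0, 0.03, 0.0],
]

kept = select_filters_to_keep(layer, threshold=0.1)
print(kept)  # [0, 2]
```

A higher threshold removes more filters, which shrinks the model further but usually costs more accuracy, which is why a retraining step follows pruning.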

TAO Toolkit is reportedly a Python package hosted on the NVIDIA Python Package Index. It interacts with the lower-level TAO Docker containers available from the NVIDIA GPU Accelerated Container Registry (NGC), which come with the dependencies needed for training already installed. The output of the TAO workflow is a model that can be deployed for inference on NVIDIA devices using DeepStream, TensorRT, Riva, or the TAO CV Inference Pipeline.

As the figure shows, the TAO application layer is built on top of CUDA-X, which contains all the lower-level NVIDIA libraries, including the NVIDIA Container Runtime, CUDA, cuDNN, and TensorRT.

TAO Toolkit provides two types of pre-trained models.

1. Purpose-built pre-trained model

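These are highly accurate models trained on thousands of inputs for a specific task. Such domain-focused models can be used for inference right away, or used with TAO Toolkit for transfer learning on a custom dataset. In other words, think of them as models that have already been trained to reach high accuracy in a given domain. (e.g. PeopleNet, ... )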

2. General purpose vision model

These models act as a starting point for building more complex models. Think of them as pre-trained models for backbones and model architectures. (e.g. EfficientNet, YOLO v3/v4 ... ) They are trained on the Open Images dataset and provide a better starting point than initializing the weights and training from scratch.

As shown below, TAO provides pre-trained models for many domains as well as general purpose model architectures (CV and Conv AI).

The characteristics and performance of the models trained for each domain are as follows.

Model Name	Network Architecture	Number of classes	Accuracy	Use Case
TrafficCamNet	DetectNet_v2-ResNet18	4	84% mAP	Detect and track cars.
PeopleNet	DetectNet_v2-ResNet18/34	3	84% mAP	People counting, heatmap generation, social distancing.
DashCamNet	DetectNet_v2-ResNet18	4	80% mAP	Identify objects from a moving object.
FaceDetectIR	DetectNet_v2-ResNet18	1	96% mAP	Detect face in a dark environment with IR camera.
VehicleMakeNet	ResNet18	20	91% mAP	Classifying car models.
VehicleTypeNet	ResNet18	6	96% mAP	Classifying type of cars as coupe, sedan, truck, etc.
PeopleSegNet	MaskRCNN-ResNet50	1	85% mAP	Creates segmentation masks around people, provides pixel
PeopleSemSegNet	Vanilla Unet Dynamic	2	92% mIOU	Creates semantic segmentation masks for people.
License Plate Detection	DetectNet_v2-ResNet18	1	98% mAP	Detecting and localizing License plates on vehicles
License Plate Recognition	Tuned ResNet18	36(US) / 68(CH)	97%(US)/99%(CH)	Recognize License plates numbers
Gaze Estimation	Four branch AlexNet based model	NA	6.5 RMSE	Detects person’s eye gaze
Facial Landmark	Recombinator networks	NA	6.1 pixel error	Estimates key points on person’s face
Heart Rate Estimation	Two branch model with attention	NA	0.7 BPM	Estimates person’s heartrate from RGB video
Gesture Recognition	ResNet18	6	0.85 F1 score	Recognize hand gestures
Emotion Recognition	5 Fully Connected Layers	6	0.91 F1 score	Recognize facial Emotion
FaceDetect	DetectNet_v2-ResNet18	1	85.3 mAP	Detect faces from RGB or grayscale image
BodyPoseNet	Single shot bottom-up	18	56.1% mAP*	Estimates body key points for persons in the image

* New in TAO Toolkit 3.0-21.08 GA

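Note that the BodyPoseNet accuracy was reportedly measured with a model trained on the COCO dataset.

The performance of each model, as measured with the trtexec tool, is as follows.

Information on the pre-trained backbone models is given below. Check the available combinations of backbone and model architecture and pick the one that fits your use case.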

TAO Computer Vision Workflow Overview

The TAO Toolkit workflow for CV is shown in the following figure and can be summarized as follows.

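1. Prepare the pre-trained model and the dataset (w/ augmentation)

2. Training (w/ hyperparameters)

3. Model evaluation

4. If the accuracy in step 3 is judged high enough, prune the model (pruning is supported for CV models only; Conv AI models do not go through the steps after this)

If the accuracy is not high enough, adjust the hyperparameters again and repeat training until the accuracy improves

5. Re-train (pruning can reduce accuracy, so the model is trained again on the same dataset to recover it)

6. Model evaluation

7. Once the accuracy matches the level before pruning, export the model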
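The steps above can be read as a small control loop. Below is a rough sketch in Python, only to show the order of operations. Every callable (`train`, `evaluate`, `prune`, `export`) and every number is a hypothetical stand-in, not a TAO API; in practice each step would be a TAO command driven by a spec file.

```python
def run_cv_workflow(train, evaluate, prune, export, target, tolerance=1, max_rounds=5):
    """Train until the target is met, prune, retrain to recover, then export.

    All callables are hypothetical stand-ins supplied by the caller.
    Accuracy is expressed in integer percent to keep comparisons exact.
    """
    model = None
    baseline = 0
    # Steps 2-4: repeat training (with adjusted settings) until accuracy is high enough.
    for round_index in range(max_rounds):
        model = train(model, round_index)
        baseline = evaluate(model)
        if baseline >= target:
            break
    else:
        raise RuntimeError("target accuracy not reached before pruning")

    # Step 4: prune, which may lower accuracy.
    pruned = prune(model)

    # Steps 5-7: retrain on the same dataset until accuracy is back near the baseline.
    for round_index in range(max_rounds):
        pruned = train(pruned, round_index)
        if evaluate(pruned) >= baseline - tolerance:
            return export(pruned)
    raise RuntimeError("accuracy not recovered after pruning")


# Toy stand-ins so the sketch runs end to end.
def fake_train(model, round_index):
    model = dict(model) if model else {"accuracy": 70, "pruned": False}
    model["accuracy"] = min(90, model["accuracy"] + 5)
    return model


def fake_evaluate(model):
    return model["accuracy"]


def fake_prune(model):
    return {"accuracy": model["accuracy"] - 8, "pruned": True}


def fake_export(model):
    return dict(model, format="etlt")


result = run_cv_workflow(fake_train, fake_evaluate, fake_prune, fake_export, target=85)
print(result)
```

The tolerance parameter is my own addition: demanding exactly the same accuracy after pruning is often unrealistic, so the sketch accepts a model that comes within one point of the baseline.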

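The exported model is in the '.etlt' format, which can be deployed to any NVIDIA GPU using DeepStream, TensorRT, and so on. An .etlt file can reportedly be converted to a TensorRT engine but not to ONNX.

At the export stage, you can also generate an INT8 calibration for quantization. This reportedly delivers more than 2x the performance of FP16 and FP32 precision without degrading accuracy.

(Hmm.. INT8 precision has always come with a clear accuracy loss compared to FP16 and FP32 ... Is accuracy unaffected by INT8 quantization here because the pruning step already removed the nodes that do not affect accuracy? 😵)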
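To make the role of calibration a bit more concrete, here is a minimal sketch of symmetric INT8 quantization in plain Python. It assumes the simplest possible scheme, max calibration, where the largest magnitude seen in the calibration data is mapped to 127. TensorRT's real calibrators are more sophisticated, and all names and values below are hypothetical.

```python
def calibrate_scale(samples):
    """Max calibration: map the largest observed magnitude to the INT8 value 127."""
    max_abs = max(abs(x) for x in samples)
    return max_abs / 127.0 if max_abs > 0 else 1.0


def quantize(x, scale):
    """Round to the nearest INT8 code and clip to the symmetric range [-127, 127]."""
    q = round(x / scale)
    return max(-127, min(127, q))


def dequantize(q, scale):
    """Map an INT8 code back to an approximate floating-point value."""
    return q * scale


# Hypothetical activations collected from a calibration dataset.
calibration_data = [-2.54, -1.0, 0.0, 0.5, 1.26, 2.54]

scale = calibrate_scale(calibration_data)
codes = [quantize(x, scale) for x in calibration_data]
restored = [dequantize(q, scale) for q in codes]
max_error = max(abs(a - b) for a, b in zip(calibration_data, restored))

print(codes)
print(max_error)
```

Values inside the calibrated range lose at most half a quantization step, while values outside it are clipped. That is why the calibration data should resemble the real inputs: a poorly chosen scale is one of the main sources of INT8 accuracy loss.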

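The user provides a pre-trained model from NGC and a custom dataset as input, and the dataset is fed to the data converter. The data converter can apply data augmentation, which matters a great deal in training because it improves overall quality and helps prevent overfitting. 👍 Users can reportedly also run offline data augmentation before training. To give a rough feel for it, augmentation is configured by changing values or adding settings in a config like the one below.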

# Spec file for augment.
spatial_config{
  rotation_config{
    angle: 5.0
    units: "degrees"
  }
  shear_config{
    shear_ratio_x: 0.3
  }
  translation_config{
    translate_x: 8
  }
}
color_config{
  hue_saturation_config{
    hue_rotation_angle: 25.0
    saturation_shift: 1.0
  }
}
# Setting up dataset config.
dataset_config{
  image_path: "image_2"
  label_path: "label_2"
}
output_image_width: 1248
output_image_height: 384
output_image_channel: 3
image_extension: ".png"
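To see what the spatial settings in the spec above amount to, here is a small sketch that combines rotation, shear, and translation into one 2D affine matrix and applies it to a couple of pixel coordinates. The composition order (rotate, then shear, then translate) and the shear definition are my assumptions, not something stated in the TAO documentation, and the helper names are hypothetical.

```python
import math


def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def rotation(angle_degrees):
    """Homogeneous 2D rotation matrix for an angle in degrees."""
    t = math.radians(angle_degrees)
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def shear_x(ratio):
    """Shear along x: x' = x + ratio * y (assumed meaning of shear_ratio_x)."""
    return [[1.0, ratio, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def translation(tx, ty=0.0):
    """Homogeneous 2D translation matrix, offsets in pixels."""
    return [[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]]


def apply(matrix, x, y):
    """Apply a 3x3 affine matrix to the point (x, y)."""
    return (
        matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
        matrix[1][0] * x + matrix[1][1] * y + matrix[1][2],
    )


# Values taken from the spec: angle 5.0 degrees, shear_ratio_x 0.3, translate_x 8.
transform = matmul3(translation(8.0), matmul3(shear_x(0.3), rotation(5.0)))

origin = apply(transform, 0.0, 0.0)
corner = apply(transform, 100.0, 0.0)
print(origin)
print(corner)
```

The same matrix would have to be applied to the bounding-box labels as well as the image, which is presumably why the spec takes both image_path and label_path.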

TAO Conversational AI Workflow Overview

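The Conv AI workflow is as follows. As explained above, it is similar to the CV workflow, except that pruning is not available and the re-train step uses a command called finetune. Export also works slightly differently: users can choose to export a .riva file that can be deployed to Riva, or export an .eonnx file that can be used with the infer_onnx command. Exporting Conv AI models to a TensorRT engine is reportedly not supported at present.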

For details on how to use TAO, refer to the blogs below, which provide guides for each domain.

Learn to Train with PeopleNet and other pre-trained model using TAO.
Learn how to train Instance segmentation model using MaskRCNN with TAO.
Learn how to improve INT8 accuracy using Quantization aware training(QAT) with TAO.
Learn how to Create a real time license plate detection and recognition app
Learn how to Prepare state of the art models for classification and object detection with TAO
Learn more on Building and Deploying Conversational AI models Using the NVIDIA TAO Toolkit
Learn how to train and optimize 2D body pose estimation model with TAO - 2D Pose Estimation Part 1 | 2D Pose Estimation Part 2.

The FAQ is also worth a read. I always make a point of reading FAQs.. 😊

https://docs.nvidia.com/tao/tao-toolkit/text/faqs.html#


Reference 1: https://docs.nvidia.com/tao/tao-toolkit/text/overview.html


Reference 2: https://forums.developer.nvidia.com/t/convert-rn-tlt-to-onnx/181237
