Posts in the 'AI Development' category (Page 3)

[DeepStream] GTC 2020, Certification (Deep Learning for Intelligent Video Analytics)

GTC 2020 (2020/10/6) NVIDIA Deep Learning Institute Deep Learning for Intelligent Video Analytics Certification

AI Development/TensorRT

[CUDA] Supported CUDA level of GPU and card

For more details, see the link at the bottom of the post.
CUDA SDK 1.0 support for compute capability 1.0 – 1.1 (Tesla)[25]
CUDA SDK 1.1 support for compute capability 1.0 – 1.1+x (Tesla)
CUDA SDK 2.0 support for compute capability 1.0 – 1.1+x (Tesla)
CUDA SDK 2.1 – 2.3.1 support for compute capability 1.0 – 1.3 (Tesla)[26][27][28][29]
CUDA SDK 3.0 – 3.1 support for compute capability 1.0 – 2.0 (Tesla, Fermi)[30][31]
CUD..
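The rows listed above can be turned into a small lookup. This is only a sketch covering the SDK versions visible in this excerpt; the helper names `parse_version` and `sdk_supports` are hypothetical, and the "1.1+x" upper bound is treated as plain 1.1, which is an assumption.

```python
# Rows copied from the excerpt: (sdk_low, sdk_high, cc_low, cc_high).
# Versions are tuples so that (2, 3) <= (2, 3, 1) compares as expected.
# Assumption: the "1.1+x" upper bound is treated as plain 1.1.
SUPPORT_TABLE = [
    ((1, 0), (1, 0), (1, 0), (1, 1)),
    ((1, 1), (1, 1), (1, 0), (1, 1)),
    ((2, 0), (2, 0), (1, 0), (1, 1)),
    ((2, 1), (2, 3, 1), (1, 0), (1, 3)),
    ((3, 0), (3, 1), (1, 0), (2, 0)),
]


def parse_version(text):
    """Turn a dotted version string such as '2.3.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def sdk_supports(sdk_version, compute_capability):
    """Return True/False for listed SDK versions, or None if the SDK is not listed."""
    sdk = parse_version(sdk_version)
    cc = parse_version(compute_capability)
    for sdk_low, sdk_high, cc_low, cc_high in SUPPORT_TABLE:
        if sdk_low <= sdk <= sdk_high:
            return cc_low <= cc <= cc_high
    return None
```

For example, `sdk_supports("3.0", "2.0")` consults the last row (Tesla, Fermi), while any SDK newer than 3.1 returns `None` because the excerpt is cut off before those rows.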

AI Development/GPU | CUDA | PyCUDA

[TensorRT] Implicit vs Explicit

When converting a deploy model created with PyTorch, TensorFlow, or a similar framework to a TensorRT model with an explicitly specified batch size, using the build settings introduced in TensorRT 7, namely the Optimization Profiles feature, yields a more optimized conversion. * Optimization Profiles : docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#opt_profiles Developer Guide :: NVIDIA Deep Learning TensorRT Documentation To optimize your model for inference, TensorRT takes your ne..
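A minimal sketch of attaching an optimization profile for a range of batch sizes follows. The function name `add_batch_profile`, the input name, and the shapes are hypothetical; `builder` and `config` are assumed to behave like `tensorrt.Builder` and `tensorrt.IBuilderConfig`, and the network is assumed to have been created in explicit-batch mode with a dynamic batch dimension. The sketch does not import TensorRT itself.

```python
def add_batch_profile(builder, config, input_name, chw, min_batch, opt_batch, max_batch):
    """Attach one optimization profile that covers a range of batch sizes.

    builder: object with create_optimization_profile() (e.g. tensorrt.Builder)
    config:  object with add_optimization_profile() (e.g. tensorrt.IBuilderConfig)
    chw:     the non-batch dimensions of the input, e.g. (3, 224, 224)
    """
    if not 1 <= min_batch <= opt_batch <= max_batch:
        raise ValueError("expected 1 <= min_batch <= opt_batch <= max_batch")

    profile = builder.create_optimization_profile()
    # set_shape takes the input name followed by the min, opt, and max shapes.
    profile.set_shape(
        input_name,
        (min_batch, *chw),
        (opt_batch, *chw),
        (max_batch, *chw),
    )
    config.add_optimization_profile(profile)
    return profile
```

With a real builder, a call might look like `add_batch_profile(builder, config, "input", (3, 224, 224), 1, 8, 16)`, after which the engine is built with that `config`.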

AI Development/TensorRT

[TensorRT] Issues that can arise when using a PyTorch model with a squeeze + unsqueeze + expand_as combination

Environment: PyTorch 1.4.0, TensorRT 7.1.3.4, CUDA 10.2, cuDNN 8.0.0. This post describes an issue that can occur during PyTorch - ONNX - TensorRT conversion when part of a PyTorch model's forward implementation uses a squeeze + transpose + unsqueeze + expand_as combination. With this combination, a dynamic value of -1 appears partway through the TensorRT conversion, and the converted result seems to come out distorted. The conclusion is that (PyTorch result == ONNX result) != TensorRT result. This is most apparent when there are multiple output nodes..
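One way to check the (PyTorch result == ONNX result) != TensorRT result observation is to compare each output node separately. The sketch below is framework-independent and assumes each output has already been flattened to a plain sequence of floats (for example via `.ravel().tolist()` on an array); the names `max_abs_diff` and `mismatched_outputs` and the tolerance are hypothetical.

```python
def max_abs_diff(reference, candidate):
    """Largest element-wise absolute difference between two flat sequences."""
    reference, candidate = list(reference), list(candidate)
    if len(reference) != len(candidate):
        raise ValueError("outputs have different numbers of elements")
    return max((abs(r - c) for r, c in zip(reference, candidate)), default=0.0)


def mismatched_outputs(reference_outputs, candidate_outputs, atol=1e-4):
    """Return the names of output nodes whose values differ by more than atol.

    Both arguments map output-node name -> flat sequence of floats.
    """
    return [
        name
        for name, reference in reference_outputs.items()
        if max_abs_diff(reference, candidate_outputs[name]) > atol
    ]
```

Running this once for PyTorch vs ONNX and once for ONNX vs TensorRT shows which stage introduces the difference and which output nodes are affected.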

AI Development/TensorRT

[TensorRT] Upsample scale_factor problem when converting from ONNX to TensorRT

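After converting a PyTorch model to ONNX, converting the ONNX model to a TensorRT model sometimes produces the following errors.
[TensorRT] ERROR: Network must have at least one output
[TensorRT] ERROR: Network validation failed.
These errors occur when the network is read through the ONNX parser and the conversion to a TensorRT engine hits an unsupported node: the parser cannot read the network any further and returns what it has, so TensorRT complains that the network needs at least one output. You can find where parsing stopped by checking the TensorRT log; the nodes of the ONNX model and TensorRT ..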
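A small sketch of surfacing the parser's own errors before building, rather than waiting for the "at least one output" message. The function name `collect_parse_problems` is hypothetical; `parser` is assumed to behave like `tensorrt.OnnxParser` (`parse`, `num_errors`, `get_error`) and `network` like `tensorrt.INetworkDefinition` (`num_outputs`).

```python
def collect_parse_problems(parser, network, model_bytes):
    """Parse an ONNX model and return a list of human-readable problems.

    An empty list means the parser reported success and the network has
    at least one output, so building the engine is worth attempting.
    """
    problems = []
    if not parser.parse(model_bytes):
        # Each parser error normally identifies the node where parsing stopped.
        for index in range(parser.num_errors):
            problems.append(str(parser.get_error(index)))
    if network.num_outputs == 0:
        problems.append("network has no outputs after parsing")
    return problems
```

Printing the returned list right after parsing usually points at the unsupported node more directly than the later validation error does.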

AI Development/TensorRT

[ONNX] Converting a PyTorch model to an ONNX model

ONNX models enable efficient inference across many platforms and hardware. This covers operating systems such as Linux, Windows, and macOS as well as hardware such as various CPUs and GPUs. The import statements needed for the ONNX conversion are as follows.
    # Required imports
    import io
    import numpy as np
    from torch import nn
    import torch.utils.model_zoo as model_zoo
    import torch.onnx
The example model is based on the model introduced in: “Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Networ..

AI Development/ONNX

[TensorRT] Checking logs with TRT_LOGGER

TensorRT provides a Logger for checking log output when running an engine or performing operations such as serialize and deserialize.
class tensorrt.Logger(self: tensorrt.tensorrt.Logger, min_severity: tensorrt.tensorrt.Logger.Severity = Severity.WARNING)
The Logger can be used with Builder / ICudaEngine / Runtime, and its parameter is as follows.
min_severity : The initial minimum severity of this Logger. This parameter controls how much of the log output is shown.
It can be used as shown below: if args..
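A minimal sketch of choosing `min_severity` from a command-line style flag. The `verbose` flag and both function names are hypothetical and are not taken from the original post; TensorRT is imported lazily so the name-selection helper can be used on its own.

```python
def severity_name(verbose):
    """Pick the Logger severity attribute name for a verbosity flag."""
    return "VERBOSE" if verbose else "WARNING"


def make_logger(verbose=False):
    """Create a tensorrt.Logger whose min_severity follows the flag."""
    import tensorrt as trt  # lazy import: only needed when a logger is created

    # trt.Logger.VERBOSE and trt.Logger.WARNING are Severity values.
    return trt.Logger(getattr(trt.Logger, severity_name(verbose)))
```

The resulting logger can then be passed to `trt.Builder(...)` or `trt.Runtime(...)`; with `verbose=True` the conversion log is detailed enough to follow the parser node by node.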

AI Development/TensorRT

[TensorRT] Implementing a Custom Plugin for ONNX models (work in progress)

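(2020/08/03) Still being updated. I plan to implement this, organize my notes, and post them; for now I am collecting related material. * Note: this should probably be done with TensorRT 7.1, which was released a month ago. TRT_SOURCE/parsers/onnx/ contains many ONNX plugins such as Split.hpp and ResizeNearest.hpp, which are reportedly registered with the system automatically through REGISTER_TENSORRT_PLUGIN so that ONNX models can be parsed directly at runtime. The easiest approach is to implement a plugin for the required function using builtin_op_importers.cpp and then rebuild the ONNX parser. Go to the onnx-tensorrt GitHub and builti..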

AI Development/TensorRT

[Pytorch] How to measure time in PyTorch

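Because CUDA calls in PyTorch are asynchronous, the code must be synchronized with torch.cuda.synchronize() before starting or stopping a timer.
    import torch
    # Example inputs (assumed here; any CUDA tensors work)
    x = torch.rand(1000, 1000, device="cuda")
    y = torch.rand(1000, 1000, device="cuda")
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    z = x + y
    end.record()
    # Waits for everything to finish running
    torch.cuda.synchronize()
    # Elapsed time in milliseconds
    print(start.elapsed_time(end))
Reference 1 : https://discuss.pytorch.org/t/best-way-to-measure-timing/39496 Best..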
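The same idea can be sketched with a wall-clock timer that synchronizes before starting and before stopping. The helper `time_call` is hypothetical; it takes the synchronization function as a parameter so that `torch.cuda.synchronize` can be passed in when timing GPU work, and it defaults to a no-op for CPU-only code.

```python
import time


def time_call(fn, sync=lambda: None, repeats=10):
    """Return the mean wall-clock seconds per call of fn.

    sync is called before the timer starts and again before it stops, so
    that queued asynchronous work is included in the measurement.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    sync()
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    sync()
    return (time.perf_counter() - start) / repeats
```

For GPU tensors `x` and `y`, a call might look like `time_call(lambda: x + y, sync=torch.cuda.synchronize)`. Without the second synchronization the timer would mostly measure kernel launch overhead rather than execution time.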

AI Development/PyTorch

[ONNX] Using a plugin with onnx-graphsurgeon - Group Normalization

TensorRT has supported a Group Normalization plugin since version 7.1.2. ONNX GraphSurgeon is available from the GitHub repository below: https://github.com/NVIDIA/TensorRT/tree/master/tools/onnx-graphsurgeon In the Python sample code, onnx_packnet is used to add the group normalization plugin and onnx ..

AI Development/ONNX

[ONNX] Netron : ONNX model Visualization

ONNX models can be visualized with Netron. https://github.com/lutzroeder/netron Netron supports ONNX (.onnx, .pb, .pbtxt), Keras (.h5, .keras), Core ML (.mlmodel), Caffe (.caffemodel, .prototxt), Caffe2 (predict_net.pb), Darknet (.cfg), MXNet (.model, -symbol.json), Barracuda (.nn..

AI Development/ONNX

[PyCUDA] Building and installing PyCUDA 2019.1.2 from source

After updating from CUDA 10.0 to CUDA 10.2, importing the already-installed PyCUDA produced the following error.
ImportError: libcurand.so.10.0: cannot open shared object file: No such file or directory
The message meant that PyCUDA kept looking for the removed CUDA 10.0. Even after deleting every related file found with the Linux command find / -name "libcurand.so.10.0*", PyCUDA still looked for CUDA 10.0. Running pip uninstall pycuda and reinstalling PyCUDA did not help either; the same problem kept occurring, so CUDA 10.2 ..
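Before rebuilding, it can help to confirm which shared-library names the dynamic loader can actually resolve. The sketch below uses only the standard library; the helper name `can_load` is hypothetical, and the library names in the usage note are examples rather than a statement of what a given CUDA release ships.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open the given shared library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        # Raised when the file is missing or cannot be loaded.
        return False
    return True
```

For example, `can_load("libcurand.so.10.0")` returning `False` while PyCUDA still asks for that file suggests the installed PyCUDA binary was linked against the old CUDA version and needs to be rebuilt against the new one.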

AI Development/GPU | CUDA | PyCUDA

