TensorRT 5.0.2 is said to include the Python samples yolov3_onnx and uff_ssd.

 

The most important part, the Compatibility section, is as follows.

 

 

TensorRT 5.0.2.6 Compatibility

  • TensorRT 5.0.2 has been tested with cuDNN 7.3.1.
  • TensorRT 5.0.2 has been tested with TensorFlow 1.9.
  • This TensorRT release supports CUDA 10.0 and CUDA 9.0.
  • CUDA 8.0 and CUDA 9.2 are no longer supported. On Windows only, CUDA 10.0 is supported for TensorRT 5.0.1 RC.

The original text is below. I have marked only the important parts in red.

 

TensorRT Release 5.0.2

This is the TensorRT 5.0.2 release notes for Desktop users. This release includes fixes from the previous TensorRT 5.0.x releases as well as the following additional fixes. For previous TensorRT 5.0.x release notes, see TensorRT Release Notes.

Key Features and Enhancements

This TensorRT release includes the following key features and enhancements.

Platforms
Added support for CentOS 7.5, Ubuntu 18.04, and Windows 10.

Turing
You must use CUDA 10.0 or later if you are using a Turing GPU.

DLA (Deep Learning Accelerator)
The layers supported by DLA are Activation, Concatenation, Convolution, Deconvolution, ElementWise, FullyConnected, LRN, Pooling, and Scale. For layer-specific constraints, see DLA Supported Layers. AlexNet, GoogleNet, ResNet-50, and LeNet for MNIST networks have been validated on DLA. Since DLA support is new to this release, it is possible that other CNN networks that have not been validated will not work. Report any failing CNN networks that satisfy the layer constraints by submitting a bug via the NVIDIA Developer website. Ensure you log in, click on your name in the upper right corner, click My account > My Bugs and select Submit a New Bug.

The trtexec tool can be used to run on DLA with the --useDLACore=N (where N is 0 or 1) and --fp16 options. To run the MNIST network on DLA using trtexec, issue:

./trtexec --deploy=data/mnist/mnist.prototxt --output=prob --useDLACore=0 --fp16 --allowGPUFallback

trtexec does not support ONNX models on DLA.
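
Since only the layers listed above are eligible for DLA, one practical approach is to probe each layer at build time and fall back to the GPU otherwise. The following is a minimal C++ sketch of that idea, assuming an already-parsed network; it relies on IBuilder::canRunOnDLA plus the setDeviceType/allowGPUFallback methods listed later in these notes, so the exact signatures should be checked against the 5.0.2 headers.

// Sketch: place supported layers on DLA and leave the rest on the GPU.
// Assumes `builder` and `network` were created earlier (createInferBuilder()
// plus a Caffe/UFF parser); not taken verbatim from any TensorRT sample.
#include "NvInfer.h"

void assignLayersToDLA(nvinfer1::IBuilder* builder, nvinfer1::INetworkDefinition* network)
{
    builder->setFp16Mode(true);        // DLA supports only FP16 in TensorRT 5.0.2
    builder->allowGPUFallback(true);   // let unsupported layers run on the GPU
    builder->setDLACore(0);            // target DLA core 0

    for (int i = 0; i < network->getNbLayers(); ++i)
    {
        nvinfer1::ILayer* layer = network->getLayer(i);
        nvinfer1::DeviceType target = builder->canRunOnDLA(layer)
            ? nvinfer1::DeviceType::kDLA
            : nvinfer1::DeviceType::kGPU;
        builder->setDeviceType(layer, target);
    }
}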

Redesigned Python API
The Python API has gone through a thorough redesign to bring the API up to modern Python standards. This fixed multiple issues, including making it possible to support serialization via the Python API. Python samples using the new API include parser samples for ResNet-50, a Network API sample for MNIST, a plugin sample using Caffe, and an end-to-end sample using TensorFlow.

INT8
Support has been added for user-defined INT8 scales, using the new ITensor::setDynamicRange function. This makes it possible to define dynamic range for INT8 tensors without the need for a calibration data set. setDynamicRange currently supports only symmetric quantization. A user must either supply a dynamic range for each tensor or use the calibrator interface to take advantage of INT8 support.

Plugin Registry
A new searchable plugin registry, IPluginRegistry, is a single registration point for all plugins in an application and is used to find plugin implementations during deserialization.

C++ Samples

sampleSSD
This sample demonstrates how to perform inference on the Caffe SSD network in TensorRT, use TensorRT plugins to speed up inference, and perform INT8 calibration on an SSD network. To generate the required prototxt file for this sample, perform the following steps:

  1. Download models_VGGNet_VOC0712_SSD_300x300.tar.gz from: https://drive.google.com/file/d/0BzKzrI_SkD1_WVVTSmQxU0dVRzA/view
  2. Extract the contents of the tar file: tar xvf ~/Downloads/models_VGGNet_VOC0712_SSD_300x300.tar.gz
  3. Edit the deploy.prototxt file and change all the Flatten layers to Reshape operations with the following parameters: reshape_param { shape { dim: 0 dim: -1 dim: 1 dim: 1 } }
  4. Update the detection_out layer by adding the keep_count output, for example, add: top: "keep_count"
  5. Rename the deploy.prototxt file to ssd.prototxt and run the sample.
  6. To run the sample in INT8 mode, install Pillow first by issuing the $ pip install Pillow command, then follow the instructions from the README.

sampleINT8API
This sample demonstrates how to perform INT8 inference using per-tensor dynamic range. To generate the required input data files for this sample, perform the following steps:

Running the sample:

  1. Download the Model files from GitHub, for example: wget https://s3.amazonaws.com/download.onnx/models/opset_3/resnet50.tar.gz
  2. Unzip the tar file: tar -xvzf resnet50.tar.gz
  3. Rename resnet50/model.onnx to resnet50/resnet50.onnx, then copy the resnet50.onnx file to the data/int8_api directory.
  4. Run the sample: ./sample_int8_api [-v or --verbose]

Running the sample with a custom configuration:

  1. Download the Model files from GitHub.
  2. Create an input image with a PPM extension and resize it to 224x224x3.
  3. Create a file called reference_labels.txt. Ensure each line corresponds to a single ImageNet label. You can download the ImageNet 1000-class human-readable labels from here. The reference label file contains only a single label name per line; for example, 0:'tench, Tinca tinca' is represented as tench.
  4. Create a file called dynamic_ranges.txt. Ensure each line corresponds to the tensor name and floating point dynamic range, for example <tensor_name> : <float dynamic range>. In order to generate tensor names, iterate over the network and generate the tensor names. The dynamic range can either be obtained from training (by measuring the min/max value of activation tensors in each epoch) or using custom post processing techniques (similar to TensorRT calibration). You can also choose to use a dummy per tensor dynamic range to run the sample.
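
For reference, the following is a minimal C++ sketch of the per-tensor dynamic-range path described above: INT8 mode is enabled on the builder and a symmetric range is attached to each network tensor through ITensor::setDynamicRange. The rangeFor() helper is hypothetical (it stands in for values parsed from dynamic_ranges.txt), and the (min, max) signature should be verified against the 5.0.2 headers.

// Sketch: user-supplied INT8 dynamic ranges instead of a calibration data set.
// `rangeFor()` is a hypothetical lookup of the per-tensor maximum absolute value.
#include "NvInfer.h"
#include <string>

float rangeFor(const std::string& tensorName);   // hypothetical, e.g. parsed from dynamic_ranges.txt

void setPerTensorDynamicRanges(nvinfer1::IBuilder* builder, nvinfer1::INetworkDefinition* network)
{
    builder->setInt8Mode(true);
    builder->setInt8Calibrator(nullptr);          // no calibrator: ranges are supplied explicitly

    for (int i = 0; i < network->getNbInputs(); ++i)      // network inputs
    {
        nvinfer1::ITensor* t = network->getInput(i);
        float r = rangeFor(t->getName());
        t->setDynamicRange(-r, r);                        // symmetric quantization only
    }
    for (int i = 0; i < network->getNbLayers(); ++i)      // outputs of every layer
    {
        nvinfer1::ILayer* layer = network->getLayer(i);
        for (int j = 0; j < layer->getNbOutputs(); ++j)
        {
            nvinfer1::ITensor* t = layer->getOutput(j);
            float r = rangeFor(t->getName());
            t->setDynamicRange(-r, r);
        }
    }
}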

Python Samples

yolov3_onnx
This sample demonstrates a full ONNX-based pipeline for inference with the network YOLOv3-608, including pre- and post-processing.

uff_ssd
This sample demonstrates a full UFF-based inference pipeline for performing inference with an SSD (InceptionV2 feature extractor) network.

IPluginV2
A plugin class IPluginV2 has been added together with a corresponding IPluginV2 layer. The IPluginV2 class includes similar methods to IPlugin and IPluginExt, so if your plugin implemented IPluginExt previously, you will change the class name to IPluginV2. The IPlugin and IPluginExt interfaces are to be deprecated in the future; therefore, moving to the IPluginV2 interface for this release is strongly recommended.

See the TensorRT Developer Guide for details.
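
As a rough illustration (not part of the release notes), a plugin registered in IPluginRegistry can be located through its IPluginCreator and added to a network with addPluginV2. The plugin name "LReLU_TRT" is taken from the limitations section below, but the field name "negSlope" and the rest of the setup are assumptions; check NvInferPlugin.h for the creators and parameters that actually ship with 5.0.2.

// Sketch: look up a plugin creator in the global registry and add it to a network.
// Assumes initLibNvInferPlugins() has already been called and `input` is a valid ITensor*.
#include "NvInfer.h"
#include "NvInferPlugin.h"

nvinfer1::ILayer* addLeakyRelu(nvinfer1::INetworkDefinition* network, nvinfer1::ITensor* input)
{
    nvinfer1::IPluginCreator* creator =
        getPluginRegistry()->getPluginCreator("LReLU_TRT", "1");   // name/version assumed
    if (creator == nullptr)
        return nullptr;

    float negSlope = 0.1f;                                         // hypothetical parameter
    nvinfer1::PluginField field("negSlope", &negSlope, nvinfer1::PluginFieldType::kFLOAT32, 1);
    nvinfer1::PluginFieldCollection fc{1, &field};

    nvinfer1::IPluginV2* plugin = creator->createPlugin("lrelu", &fc);
    return network->addPluginV2(&input, 1, *plugin);
}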

Breaking API Changes

  • The choice of which DLA core to run a layer on is now made at runtime. You can select the device type at build time using the following methods:

    IBuilder::setDeviceType(ILayer* layer, DeviceType deviceType)
    IBuilder::setDefaultDeviceType(DeviceType deviceType)

    where DeviceType is:
    {
        kGPU, //!< GPU Device
        kDLA, //!< DLA Core
    };

    The specific DLA core to execute the engine on can be set by the following methods:

    IBuilder::setDLACore(int dlaCore)
    IRuntime::setDLACore(int dlaCore)

    The following methods have been added to get the DLA core set on IBuilder or IRuntime objects:

    int IBuilder::getDLACore()
    int IRuntime::getDLACore()

    Another API has been added to query the number of accessible DLA cores:

    int IBuilder::getNbDLACores()
    int IRuntime::getNbDLACores()

    A short sketch combining these build-time and runtime calls appears after this list.

  • The --useDLA=<int> option on the trtexec tool has been changed to --useDLACore=<int>; the value can range from 0 to N-1, where N is the number of DLA cores. Similarly, to run any sample on DLA, use --useDLACore=<int> instead of --useDLA=<int>.
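
Putting these methods together, here is a minimal sketch, under the assumption of an already-parsed network, of building an engine that defaults to DLA and then selecting the core again at deserialization time; engineData and engineSize are placeholders for a serialized engine.

// Sketch: default all layers to DLA at build time, then pick the DLA core at runtime.
#include "NvInfer.h"
#include <cstddef>

nvinfer1::ICudaEngine* buildForDLA(nvinfer1::IBuilder* builder,
                                   nvinfer1::INetworkDefinition* network)
{
    builder->setFp16Mode(true);                                   // DLA requires FP16 in 5.0.2
    builder->setDefaultDeviceType(nvinfer1::DeviceType::kDLA);    // run layers on DLA by default
    builder->allowGPUFallback(true);                              // GPU fallback for unsupported layers
    if (builder->getNbDLACores() > 0)
        builder->setDLACore(0);                                   // build against core 0
    return builder->buildCudaEngine(*network);
}

nvinfer1::ICudaEngine* loadOnDLA(nvinfer1::IRuntime* runtime,
                                 const void* engineData, std::size_t engineSize, int dlaCore)
{
    runtime->setDLACore(dlaCore);                                 // choose the core when deserializing
    return runtime->deserializeCudaEngine(engineData, engineSize, nullptr);
}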

Compatibility

  • TensorRT 5.0.2 has been tested with cuDNN 7.3.1.
  • TensorRT 5.0.2 has been tested with TensorFlow 1.9.
  • This TensorRT release supports CUDA 10.0 and CUDA 9.0. CUDA 8.0 and CUDA 9.2 are no longer supported. On Windows only, CUDA 10.0 is supported for TensorRT 5.0.1 RC.

Limitations In 5.0.2

  • TensorRT 5.0.2 does not include support for DLA with the INT8 data type. Only DLA with the FP16 data type is supported by TensorRT at this time. DLA with INT8 support is planned for a future TensorRT release.
  • Android is not supported in TensorRT 5.0.2.
  • The Python API is only supported on x86-based Linux platforms.
  • The create*Plugin functions in the NvInferPlugin.h file do not have Python bindings.
  • ONNX models are not supported on DLA in TensorRT 5.0.2.
  • The included resnet_v1_152, resnet_v1_50, lenet5, and vgg19 UFF files do not support FP16 mode. This is because some of the weights fall outside the range of FP16.
  • The ONNX parser is not supported on Windows 10. This includes all samples which depend on the ONNX parser. ONNX support will be added in a future release.
  • Tensor Cores supporting INT4 were first introduced with Turing GPUs. This release of TensorRT 5.0 does not support INT4.
  • The yolov3_onnx Python sample is not supported on Ubuntu 14.04 and earlier.
  • The uff_ssd sample requires tensorflow-gpu for performing validation only. Other parts of the sample can use the CPU version of tensorflow.
  • The Leaky ReLU plugin (LReLU_TRT) allows for only a parameterized slope on a per tensor basis.

Deprecated Features

The following features are deprecated in TensorRT 5.0.2:

  • The majority of the old Python API, including the Lite and Utils APIs, is deprecated. It is currently still accessible in the tensorrt.legacy package, but will be removed in a future release.
  • The following Python examples are deprecated:
    • caffe_to_trt
    • pytorch_to_trt
    • tf_to_trt
    • onnx_mnist
    • uff_mnist
    • mnist_api
    • sample_onnx
    • googlenet
    • custom_layers
    • lite_examples
    • resnet_as_a_service
  • The detectionOutput Plugin has been renamed to the NMS Plugin.
  • The old ONNX parser will no longer be packaged with TensorRT; instead, use the open-source ONNX parser.
  • The DimensionTypes class is deprecated.
  • The plugin APIs that return INvPlugin are being deprecated and they now return IPluginV2. These APIs will be removed in a future release. Refer to NvInferPlugin.h inside the TensorRT package.
  • The nvinfer1::IPluginFactory, nvuffparser1::IPluginFactory, and nvuffparser1::IPluginFactoryExt plugins are still available for backward compatibility. However, it is still recommended to use the Plugin Registry and implement IPluginCreator for all new plugins.
  • The libnvinfer.a, libnvinfer_plugin.a, and libnvparsers.a libraries have been renamed to libnvinfer_static.a, libnvinfer_plugin_static.a, and libnvparsers_static.a respectively. This makes TensorRT consistent with CUDA, cuDNN, and other NVIDIA software libraries. It also avoids some ambiguity between dynamic and static libraries during linking.

Known Issues

  • Only AlexNet, GoogleNet, ResNet-50, and MNIST are known to work with DLA. Other networks may work, but they have not been extensively tested.
  • For this TensorRT release, there are separate JetPack L4T and Drive D5L packages due to differences in the DLA library dependencies. In a future release, this should become unified.
  • The static library libnvparsers_static.a requires a special build of protobuf to complete static linking. Due to filename conflicts with the official protobuf packages, these additional libraries are only included in the tar file at this time. The two additional libraries that you will need to link against are libprotobuf.a and libprotobuf-lite.a from the tar file.
  • The ONNX static libraries libnvonnxparser_static.a and libnvonnxparser_runtime_static.a require static libraries that are missing from the package in order to complete static linking. The two static libraries that are required to complete linking are libonnx_proto.a and libnvonnxparser_plugin.a, as well as the protobuf libraries mentioned earlier. You will need to build these two missing static libraries from the open source ONNX project. This issue will be resolved in a future release.
  • The C++ API documentation is not included in the TensorRT zip file. Refer to the online documentation if you want to view the TensorRT C++ API.
  • Most README files that are included with the samples assume that you are working on a Linux workstation. If you are using Windows and do not have access to a Linux system with an NVIDIA GPU, then you can try using VirtualBox to create a virtual machine based on Ubuntu. Many samples do not require any training, therefore the CPU versions of TensorFlow and PyTorch are enough to complete the samples.
  • The TensorRT Developer Guide has been written with Linux users in mind. Windows specific instructions, where possible, will be added in a future revision of the document.
  • If sampleMovieLensMPS crashes before completing execution, an artifact (/dev/shm/sem.engine_built) will not be properly destroyed. If the sample complains about being unable to create a semaphore, remove the artifact by running rm /dev/shm/sem.engine_built.
  • To create a valid UFF file for sampleMovieLensMPS, the correct command is: python convert_to_uff.py sampleMovieLens.pb -p preprocess.py, where preprocess.py is a script that is shipped with sampleMovieLens. Do not use the command specified by the README.
  • The trtexec tool does not currently validate command-line arguments. If you encounter failures, double check the command-line parameters that you provided.

Reference: https://docs.nvidia.com/deeplearning/sdk/tensorrt-release-notes/tensorrt-5.html

 
