Using DLA on Nvidia Jetson

Nvidia Jetson devices provide dedicated Deep Learning Accelerator (DLA) units that run convolutional neural networks on dedicated hardware. DLAs operate completely independently of the GPU, so a network placed on a DLA runs in parallel with the networks running on the GPU, which can improve overall pipeline performance.

However, DLAs have their own constraints, described in the Nvidia documentation, and are usually slower than the GPU; they are therefore best suited to low-demand tasks, such as running small networks in secondary inference stages.
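
Because of these constraints, it is worth checking that TensorRT can actually place a network on the DLA before wiring it into a pipeline. A minimal check with trtexec, which ships with TensorRT on JetPack (model.onnx is a placeholder for your model file):

# Try to build an engine entirely on DLA core 0; the build fails on the
# first layer the DLA does not support, revealing incompatibilities early.
trtexec --onnx=model.onnx --int8 --useDLACore=0

# Alternatively, allow unsupported layers to fall back to the GPU and
# inspect the build log to see which layers actually land on the DLA.
trtexec --onnx=model.onnx --int8 --useDLACore=0 --allowGPUFallback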

To place a model on a DLA, specify the target device in the YAML config. Use the following syntax for an inference block:

- element: nvinfer@detector
  name: LPDNet
  model:
    remote:
      url: https://api.ngc.nvidia.com/v2/models/nvidia/tao/lpdnet/versions/pruned_v2.2/zip
    format: onnx
    model_file: LPDNet_usa_pruned_tao5.onnx
    precision: int8
    int8_calib_file: usa_cal_8.6.1.bin
    batch_size: 16
    enable_dla: true   # allocate this model on DLA
    use_dla_core: 1    # use DLA core 1
    input:
      ...
    output:
      ...
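
Jetson AGX Xavier and AGX Orin modules expose two DLA cores, indexed 0 and 1, so a second model can be pinned to the other core and run alongside both the GPU and the first DLA model. A minimal sketch, assuming a hypothetical second detector (the element name, model file, and input/output sections are placeholders):

- element: nvinfer@detector
  name: SecondaryDetector       # hypothetical element, for illustration only
  model:
    format: onnx
    model_file: secondary.onnx  # placeholder file name
    precision: int8
    batch_size: 16
    enable_dla: true            # place this model on DLA as well
    use_dla_core: 0             # ...but on the other DLA core
    input:
      ...
    output:
      ...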