Detection Unit
==============

To create a detector unit, the developer has to write an ``nvinfer@detector`` declaration as shown in the following listing:

.. code-block:: yaml

    - element: nvinfer@detector
      name: Primary_Detector
      model:
        format: caffe
        model_file: resnet10.caffemodel
        batch_size: 1
        precision: int8
        int8_calib_file: cal_trt.bin
        label_file: labels.txt
        input:
          scale_factor: 0.0039215697906911373
        output:
          num_detected_classes: 4
          layer_names: [conv2d_bbox, conv2d_cov/Sigmoid]

Full documentation for the detector unit is located on the unit page: :py:class:`~savant.deepstream.nvinfer.model.NvInferDetector`.

You must configure the model parameters, input and output. The parameters may also be specified in the ``model.config_file`` file according to the Nvidia DeepStream specification for a detector (`example `_). YAML-based parameters override those configured in ``model.config_file``.

Batch Size
----------

The batch size directly affects inference performance. When handling multiple streams, consider setting ``batch_size`` equal to the maximum expected number of streams. For the Jetson family, Nvidia often uses a batch size of ``32`` in its benchmarks. Batching is a sophisticated topic: there is no silver bullet for selecting a batch size, especially when inference happens on the second and later steps; experiment with your values to achieve the best possible performance and GPU utilization.

Unit Input
----------

The ``input`` block describes how the unit preprocesses the data and selects the objects which are the subject of operation. All the ``input`` parameters can be found in the specification for :py:class:`~savant.deepstream.nvinfer.model.NvInferModelInput`.

Object Selection
^^^^^^^^^^^^^^^^

``Input`` defines several parameters which are important to understand. However, here we would like to highlight the ``object`` parameter, which specifies a label for the objects selected for inference. If the ``object`` parameter is not set, it is assumed to be equal to ``frame``: the default object with a ROI covering the whole frame (without paddings added). Primary detectors usually work with the default ``object: frame``, while secondary detectors specify labels to select objects of interest.

There are other parameters in the ``input`` section which select objects, like ``object_min_width``, ``object_min_height`` and others. Please keep in mind that the selected objects are passed to the model, while those not selected are ignored; however, they continue to persist in metadata, so downstream elements can consume them later.

Imagine a situation when you have two models: the first one works well with medium-size objects but is very heavyweight, while the second works perfectly with large objects and is lightweight. To utilize resources efficiently, you may place two inference elements with properly defined ``input`` blocks:

.. code-block:: yaml

    - element: nvinfer@detector
      name: Secondary_Detector_Small
      model:
        format: caffe
        model_file: model1
        input:
          object_min_height: 30
          object_max_height: 100
    ...
    - element: nvinfer@detector
      name: Secondary_Detector_Large
      model:
        format: caffe
        model_file: model2
        input:
          object_min_height: 101

Custom Preprocessing
^^^^^^^^^^^^^^^^^^^^

``Input`` also allows making custom preprocessing with ``preprocess_object_meta`` and ``preprocess_object_tensor``. To use the former, you have to implement code with the interface :py:class:`~savant.base.input_preproc.BasePreprocessObjectMeta`; the latter is an advanced topic which is not covered here.
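
For illustration, a metadata preprocessor that shrinks each selected object's ROI to its upper half might look roughly like the sketch below. The class name, the bbox property names, and the exact ``__call__`` signature are assumptions modeled on the bundled crop preprocessors; check :py:class:`~savant.base.input_preproc.BasePreprocessObjectMeta` and ``savant/input_preproc/crop.py`` for the actual interface.

.. code-block:: python

    # A rough sketch only: the __call__ signature and the bbox property names
    # are assumed; the actual interface is defined by
    # savant.base.input_preproc.BasePreprocessObjectMeta.
    from savant.base.input_preproc import BasePreprocessObjectMeta


    class UpperHalfCrop(BasePreprocessObjectMeta):
        """Shrink the object ROI to its upper half before inference."""

        def __call__(self, object_meta):
            bbox = object_meta.bbox
            # move the center up by a quarter of the original height and halve
            # the height, so the top edge of the ROI stays in place
            bbox.yc -= bbox.height / 4
            bbox.height /= 2
            return bbox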

Keep in mind that all modifications made during preprocessing are reverted (the original metadata is restored) after the unit is done with its processing.

Example of ``preprocess_object_meta``:

.. code-block:: yaml

    input:
      object: object_detector.something
      preprocess_object_meta:
        module: something_detector.input_preproc
        class_name: TopCrop

.. literalinclude:: ../../../savant/input_preproc/crop.py
  :language: Python

Unit Output
-----------

The ``output`` section describes how the unit processes metadata before passing it to the following unit. The parameters of ``output`` may be found in the specification for :py:class:`~savant.deepstream.nvinfer.model.NvInferObjectModelOutput`.

Layer Names
^^^^^^^^^^^

The parameter ``layer_names`` defines the names of the model output layers returned for postprocessing.

Converter
^^^^^^^^^

``Output`` defines an important parameter, ``converter``, which is basically a method that makes bounding boxes from a raw tensor. For "standard" detection models supported by DeepStream the ``converter`` parameter is not required; however, if the model's output cannot be parsed automatically, you have to provide an implementation of :py:class:`~savant.base.converter.BaseObjectModelOutputConverter` to produce boxes for detected objects. Example:

.. code-block:: yaml

    converter:
      module: savant.converter.yolo_x
      class_name: TensorToBBoxConverter
      kwargs:
        decode: true

The converter implementation can be found in the class :py:class:`~savant.converter.yolo_x.TensorToBBoxConverter`.

An example of the converter for YOLOv4 is listed below. The YOLOv4 model has two output layers: the first contains the box coordinates, and the second contains per-class confidence scores, from which both the ``class_id`` and the confidence are derived. When you are writing a converter, you must return objects relative to the ROI of the parent object.

.. code-block:: python

    class TensorToBBoxConverter(BaseObjectModelOutputConverter):
        """`YOLOv4 `_ output to bbox converter."""

        def __call__(
            self,
            *output_layers: np.ndarray,
            model: ObjectModel,
            roi: Tuple[float, float, float, float],
        ) -> np.ndarray:
            """Converts detector output layer tensor to bbox tensor.

            :param output_layers: Output layer tensor
            :param model: Model definition, required parameters:
                input tensor shape, maintain_aspect_ratio
            :param roi: [top, left, width, height] of the rectangle
                on which the model infers
            :return: BBox tensor (class_id, confidence, xc, yc, width, height, [angle])
                offset by roi upper left and scaled by roi width and height
            """
            boxes, confs = output_layers
            roi_left, roi_top, roi_width, roi_height = roi

            # [num, 1, 4] -> [num, 4]
            bboxes = np.squeeze(boxes)
            # left, top, right, bottom => xc, yc, width, height
            bboxes[:, 2] -= bboxes[:, 0]
            bboxes[:, 3] -= bboxes[:, 1]
            bboxes[:, 0] += bboxes[:, 2] / 2
            bboxes[:, 1] += bboxes[:, 3] / 2
            # scale
            if model.input.maintain_aspect_ratio:
                bboxes *= min(roi_width, roi_height)
            else:
                bboxes[:, [0, 2]] *= roi_width
                bboxes[:, [1, 3]] *= roi_height
            # correct xc, yc
            bboxes[:, 0] += roi_left
            bboxes[:, 1] += roi_top

            # [num, num_classes] --> [num]
            confidences = np.max(confs, axis=-1)
            class_ids = np.argmax(confs, axis=-1)

            return np.concatenate(
                (
                    class_ids.reshape(-1, 1).astype(np.float32),
                    confidences.reshape(-1, 1),
                    bboxes,
                ),
                axis=1,
            )
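
To make the ROI handling concrete, below is a small standalone check of the same coordinate arithmetic (plain NumPy, not part of Savant): a box predicted in normalized ``(left, top, right, bottom)`` coordinates is converted to ``(xc, yc, width, height)``, scaled by the ROI size and offset by the ROI top-left corner.

.. code-block:: python

    import numpy as np

    # one normalized box in (left, top, right, bottom) format
    boxes = np.array([[0.25, 0.25, 0.75, 0.75]], dtype=np.float32)

    # ROI of the parent object: left=0, top=0, width=640, height=480
    roi_left, roi_top, roi_width, roi_height = 0.0, 0.0, 640.0, 480.0

    bboxes = boxes.copy()
    # (left, top, right, bottom) -> (xc, yc, width, height)
    bboxes[:, 2] -= bboxes[:, 0]
    bboxes[:, 3] -= bboxes[:, 1]
    bboxes[:, 0] += bboxes[:, 2] / 2
    bboxes[:, 1] += bboxes[:, 3] / 2
    # scale to ROI pixels and shift to the ROI origin
    bboxes[:, [0, 2]] *= roi_width
    bboxes[:, [1, 3]] *= roi_height
    bboxes[:, 0] += roi_left
    bboxes[:, 1] += roi_top

    print(bboxes)  # [[320. 240. 320. 240.]]: box centered in the ROI, 320x240 px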

Object Filtering
^^^^^^^^^^^^^^^^

Within ``output`` you may also select only the necessary objects by specifying their IDs and labels:

.. code-block:: yaml

    output:
      layer_names: [output_bbox/BiasAdd, output_cov/Sigmoid]
      num_detected_classes: 3
      objects:
        - class_id: 0
          label: person
          selector:
            kwargs:
              min_width: 32
              min_height: 32
        - class_id: 2
          label: face
          selector:
            kwargs:
              confidence_threshold: 0.1

All skipped classes will be permanently excluded from the next steps of the pipeline. The ``selector`` block also allows defining a filter to eliminate unnecessary objects.

If the unit name is ``Primary_Detector``, then to address the selected objects in the following units use the ``Primary_Detector.person`` and ``Primary_Detector.face`` labels.

The default selector implementation runs NMS and allows selecting objects by specifying ``min_width``, ``min_height``, and ``confidence_threshold``. To create a custom ``selector``, you have to implement :py:class:`~savant.base.selector.BaseSelector`. You may take a look at :py:class:`~savant.selector.detector.BBoxSelector` to get an idea of how to write it.
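
As a rough illustration, a custom selector that keeps only the single most confident detection might look like the sketch below. This is not part of Savant: the class name is hypothetical, and the ``__call__`` signature and the bbox tensor column layout (class_id, confidence, xc, yc, width, height) are assumptions based on the converter output described above; check :py:class:`~savant.base.selector.BaseSelector` and :py:class:`~savant.selector.detector.BBoxSelector` for the actual interface.

.. code-block:: python

    # A sketch only: the __call__ signature and the bbox tensor column layout
    # are assumed to match the converter output described above.
    import numpy as np

    from savant.base.selector import BaseSelector


    class TopConfidenceSelector(BaseSelector):
        """Keep only the single most confident detection."""

        def __init__(self, confidence_threshold: float = 0.5, **kwargs):
            super().__init__(**kwargs)
            self.confidence_threshold = confidence_threshold

        def __call__(self, bbox_tensor: np.ndarray) -> np.ndarray:
            # columns: class_id, confidence, xc, yc, width, height
            selected = bbox_tensor[bbox_tensor[:, 1] >= self.confidence_threshold]
            if selected.shape[0] == 0:
                return selected
            return selected[np.argmax(selected[:, 1])].reshape(1, -1)

Such a selector can then be referenced from the ``selector`` block of the corresponding ``objects`` entry via ``module``, ``class_name`` and ``kwargs``, in the same way custom preprocessors and converters are configured above.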