Complex Model Unit
==================

The complex model unit is used for inferring complex models, which perform detection and determine object attributes simultaneously, i.e.,

.. code-block:: text

    complex model = detection model + attribute model

This element combines the detection and attribute model units. Below is an example of defining such a unit for inferring a model that detects faces and simultaneously finds facial keypoints.

.. code-block:: yaml

    - element: nvinfer@complex_model
      name: face_detector
      model:
        format: onnx
        onnx-file: retinaface_resnet50.onnx
        batch-size: 16
        precision: fp16
        input:
          object: person_detector.person
          shape: [3, 192, 192]
          offsets: [104.0, 117.0, 123.0]
        output:
          layer_names: ['bboxes', 'scores', 'landmarks']
          converter:
            module: customer_analysis.retinaface_converter
            class_name: RetinafaceConverter
          objects:
            - class_id: 0
              label: face
              selector:
                module: savant.selector.detector
                class_name: BBoxSelector
                kwargs:
                  confidence_threshold: 0.991
                  nms_iou_threshold: 0.4
                  min_height: 70
                  min_width: 90
          attributes:
            - name: landmarks

We will not describe the parameters of the ``input`` section, since they are the same as those described in :doc:`30_dm`. The ``output`` section is of particular interest: it specifies both the ``objects`` section (described in :doc:`30_dm`) and the ``attributes`` section (described in :doc:`43_am`).

The converter must be implemented with :py:class:`~savant.deepstream.nvinfer.model.BaseComplexModelOutputConverter` as the parent class. The converter for this example is provided below.

.. code-block:: python

    from typing import Any, List, Tuple

    import numpy as np

    # BaseComplexModelOutputConverter and ComplexModel are provided by the
    # Savant framework; detector_decoder is discussed below


    class RetinafaceConverter(BaseComplexModelOutputConverter):
        def __call__(
            self,
            *output_layers: np.ndarray,
            model: ComplexModel,
            roi: Tuple[float, float, float, float],
        ) -> Tuple[np.ndarray, List[List[Tuple[Any, float]]]]:
            """Converts raw model output tensors to savant format.

            :param output_layers: model output layer tensors
            :param model: complex model, required parameters:
                input tensor shape, maintain_aspect_ratio flag
            :param roi: the rectangle on which the model infers
            :return: BBox tensor ``(class_id, confidence, xc, yc, width,
                height, [angle])`` offset by the ROI top-left corner and
                scaled by the ROI width and height, and a list of attribute
                values with confidences
            """
            bboxes, scores, landmarks = detector_decoder(
                roi,
                *output_layers,  # bboxes, scores, landmarks layers
            )
            bbox_tensor = np.concatenate(
                (
                    # single-class model: class_id is always 0
                    np.zeros((len(bboxes), 1)),
                    scores.reshape(-1, 1),
                    bboxes,
                ),
                axis=1,
            )
            # one attribute list per detected box; confidence is left as None
            attrs = [
                [(model.output.attributes[0].name, x.tolist(), None)]
                for x in landmarks
            ]
            return bbox_tensor, attrs

The model used in the example has three outputs. Two are related to detections, and the third returns the coordinates of the facial keypoints for each detected face. The converter processes the first two outputs, named ``bboxes`` and ``scores``, to obtain the boxes, while the third output, named ``landmarks``, yields the keypoints, which are returned as attributes of each detected object. Note that the number of boxes and the number of attribute lists must match: the converter returns exactly one list of attributes per box.

The ``detector_decoder`` is a separate function written specifically to process the outputs of the RetinaFace model; it is not provided here, as it does not affect the overall understanding of how converters are written.
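
The data contract between the decoder and the converter can be traced with a standalone sketch. The snippet below is purely illustrative: ``dummy_decoder`` is a hypothetical stand-in for ``detector_decoder`` that fabricates two detections with plausible shapes, so the assembly of the bbox tensor and the attribute lists can be run with plain NumPy.

.. code-block:: python

    import numpy as np


    def dummy_decoder(roi, *output_layers):
        """Hypothetical stand-in for ``detector_decoder``: shapes only."""
        # (N, 4) boxes as (xc, yc, width, height) in ROI coordinates
        bboxes = np.array([[96.0, 96.0, 64.0, 80.0], [48.0, 64.0, 50.0, 60.0]])
        # (N,) detection confidences
        scores = np.array([0.997, 0.993])
        # (N, 10): five (x, y) facial keypoints per detection
        landmarks = np.zeros((2, 10))
        return bboxes, scores, landmarks


    bboxes, scores, landmarks = dummy_decoder((0.0, 0.0, 192.0, 192.0))
    bbox_tensor = np.concatenate(
        (np.zeros((len(bboxes), 1)), scores.reshape(-1, 1), bboxes),
        axis=1,
    )
    print(bbox_tensor.shape)  # (2, 6): class_id, confidence, xc, yc, w, h
    attrs = [[('landmarks', x.tolist(), None)] for x in landmarks]
    print(len(attrs) == len(bboxes))  # True: one attribute list per box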
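
Downstream elements can then read the resulting attributes from the object metadata. The following is a minimal sketch, assuming Savant's pyfunc API (``NvDsPyFuncPlugin`` and ``ObjectMeta.get_attr_meta``); the ``face_detector`` element name and the ``landmarks`` attribute name come from the unit definition above, while the ``LandmarksReader`` class is hypothetical.

.. code-block:: python

    from savant.deepstream.meta.frame import NvDsFrameMeta
    from savant.deepstream.pyfunc import NvDsPyFuncPlugin
    from savant.gstreamer import Gst


    class LandmarksReader(NvDsPyFuncPlugin):
        """Reads the facial keypoints attached by the face_detector unit."""

        def process_frame(self, buffer: Gst.Buffer, frame_meta: NvDsFrameMeta):
            for obj_meta in frame_meta.objects:
                if obj_meta.label != 'face':
                    continue
                # attributes are namespaced by the element that produced them
                attr_meta = obj_meta.get_attr_meta('face_detector', 'landmarks')
                if attr_meta is not None:
                    # the value is the list set by the converter
                    landmarks = attr_meta.value
                    self.logger.info(
                        'face with %d landmark values', len(landmarks)
                    )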