Working With Metadata

Units handle two types of data while the pipeline is running: images and their corresponding metadata. Some of the metadata is read-only, some is modifiable, and some can be added or deleted.

Metadata interaction differs depending on the unit type. For example, in model inference units the input section, which is responsible for the model input data, specifies which objects will be processed by filtering on the element_name and label metadata fields:

input:
  object: person_detector.person

In this example, we tell the unit that, of all the objects present on the frame, we want to select those that have element_name == person_detector and label == person in their metadata.

Similarly, the output section specifies how the model results are written to metadata (attribute names, for example) and can filter on these values to exclude some of the results from the metadata.
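For illustration, the output section of an attribute (classification) model might look like the following sketch; the exact schema depends on the unit type, and the attribute names used here (car_color, car_make) are assumptions:

output:
  attributes:
    - name: car_color
    - name: car_make

Only the attributes listed are written to the object’s metadata, which is one way the filtering mentioned above can be applied.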

You can obtain full access to all metadata in the Python Function unit. In this element, by implementing the processing you need, you can read frame and object metadata, change metadata if it is writable, delete existing metadata, and add new metadata.

For example, you may want to remove all objects that belong to red cars without a recognized license plate, or blur the areas containing license plates.

Let’s first understand what metadata categories exist and then describe the API for interacting with them. Different metadata have different access restrictions: some can only be read, some can be modified by writing new values, some can only be extended (you cannot delete or change existing values, but you can add new ones), and some can be deleted.

There are three types of metadata:

  • metadata for the entire frame;

  • metadata for objects on the frame;

  • metadata for object attributes.

Entire Frame Metadata

Let us first look at what metadata is defined for the frame. The access restrictions for each attribute are shown in square brackets; a short usage sketch follows the list:

  • The source_id [read] attribute is a unique identifier of the video stream source. With this identifier, you can determine which source the metadata you receive belongs to. Most often, it is used to keep state that must be separate for each video stream. In the TrafficMeter example, this property was used to keep separate people counts for different video streams (link).

  • The frame_num [read] attribute is a frame number for a particular source.

  • The roi [read, write] attribute stores meta-information about the region of the image that serves as the default input area for detection units when no input object is specified for them.

  • The objects_number [read] attribute represents the total number of objects on the frame.

  • The tags [read, extend] attribute represents additional tags with information about the frame, stored as an extensible dictionary. These tags can hold a variety of information. For example, if you use the standard video file adapters, the relative path of the video file is available in tags under the key ‘location’. Or, if you implement a method that determines the lightness level of a frame, you could define three gradations (light, regular, and dark) and add this information as a tag with the key illumination, then use it later in the pipeline.

  • The pts [read] attribute stores the presentation timestamp of the frame, taken from the source video stream.

  • The duration [read] attribute represents the duration of the frame. It may be absent, in which case None is returned.

  • The framerate [read] attribute stores the number of frames per second in the source video stream. This meta-information is represented as a string, for example, 20/1.
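As a minimal sketch, assuming a Python Function unit whose process_frame method receives the frame metadata (as in the examples below), the read-only frame properties listed above can be accessed directly:

from savant.deepstream.meta.frame import NvDsFrameMeta

def process_frame(self, buffer: Gst.Buffer, frame_meta: NvDsFrameMeta):
    # read-only frame properties described above
    print(f'source: {frame_meta.source_id}, frame: {frame_meta.frame_num}')
    # total number of objects currently on the frame
    print(f'objects on frame: {frame_meta.objects_number}')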

The second type of metadata is object data. All metadata of this type forms a single list that you can iterate over; a short sketch follows the property list below.

Per-Object Metadata

  • The label [read, write] attribute stores the object’s class. It describes what kind of object it is: a car, a person, a flower, etc.

  • The track_id [read, write] attribute is a unique object identifier used to track objects. If track_id is equal to the maximum uint64 value, the object is not tracked. This corresponds to DeepStream’s UNTRACKED_OBJECT_ID constant.

  • The element_name [read] attribute is the name of the unit that added this object. If the object is a result of the detection model, element_name is the name of the unit (the name field of the unit defined in the configuration file). For user-created objects, element_name is a mandatory constructor argument.

  • The confidence [read] attribute is a numeric value denoting the probability that the object belongs to the class specified in the label field. It is typically set by a detector; in the cases described in NvDsObjectMeta, a special value of -0.1 is possible.

  • The bbox [read, write] attribute is meta-information about the object’s position on the frame. Object position can be set by two types of boxes: an aligned bounding box (sides of the box are parallel to the coordinate axes) and an oriented bounding box (the box can have an angle). The position is set by the coordinates of the center of the box, the width and height of the box, and the rotation angle if it is an oriented bounding box.

  • The uid [read] attribute is a unique identifier of the box. This identifier is assigned when an object is added to the list of frame objects and does not change throughout the existence of the object’s meta-information, in contrast to track_id, which can change.

  • The parent [read, write] attribute stores a reference to the parent object. This value can be None if there is no parent object. The parent reference can be used to associate objects with each other. For example, a model can detect a face only within the area related to the human body, which forms the relationship between the “human” and “face” objects. If a model produces detections for both faces and people at the same time, these objects are on the same hierarchy level, and a manual association is required.

  • The is_primary [read] attribute shows whether this metadata structure describes the main frame object. You can read more about the main frame object below, in the context of associating metadata with each other.
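As a minimal sketch, assuming the property names listed above (and that bbox exposes its center coordinates and size as properties), iterating over objects and reading their metadata looks like this; my_label is a hypothetical class label:

from savant.deepstream.meta.frame import NvDsFrameMeta

def process_frame(self, buffer: Gst.Buffer, frame_meta: NvDsFrameMeta):
    for obj_meta in frame_meta.objects:
        if obj_meta.is_primary:
            # skip the whole-frame pseudo-object described below
            continue
        if obj_meta.label == 'my_label' and obj_meta.confidence > 0.5:
            # the position is given by center coordinates and size
            print(obj_meta.bbox.xc, obj_meta.bbox.yc,
                  obj_meta.bbox.width, obj_meta.bbox.height)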

Object Attributes

The third type of metadata represents object attributes; a short reading sketch follows the list:

  • The element_name [read, write] is the name of the unit that added this attribute. For example, if the attribute is a result of an attribute model, then element_name is the name of the unit (the name field specified in the configuration file). For attributes created manually by the user, the name of the element is a mandatory constructor argument.

  • The name [read, write] is the name of the attribute. It is necessary for future access to this attribute, given that one element can add more than one attribute to the object. For the attributes created manually by the user, the attribute name is a mandatory constructor argument.

  • The value [read, write] is the value of the attribute. The value can be a string, a numeric value, or an array of numeric values. For the attributes created manually by the user, the attribute’s value is a mandatory constructor argument.

  • The confidence [read, write] is a numeric value with the probability that the attribute for the object is true. It is usually obtained as a result of the attribute model inference. For attributes created manually by the user, it is an optional argument (1.0 by default).
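For example, once an attribute has been retrieved (the get_attr_meta method used here is described below; the element and attribute names come from the nvidia_car_classification sample), the properties listed above can be read directly:

for obj_meta in frame_meta.objects:
    attr_meta = obj_meta.get_attr_meta('Secondary_CarColor', 'car_color')
    if attr_meta is not None:
        # the four attribute properties listed above
        print(attr_meta.element_name, attr_meta.name,
              attr_meta.value, attr_meta.confidence)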

The different types of metadata are related to each other. Frame metadata gives access to an iterator over the objects on that frame, and object metadata gives access to the list of that object’s attributes.

In addition to this hierarchy, there is also a relationship between the metadata of different objects: an object can have a reference to a parent object located on the frame (the parent property).

In Savant, unlike DeepStream, objects usually have a parent, even if they were obtained from detector inference on the whole frame. For the purpose of flexible application of different models (for example, if you need to specify the region of interest or skip inference based on a user condition), Savant always creates one object on the frame equal to the whole frame; the default class label of such a pseudo-object is frame. See also Top-Level ROI.

All pipeline models configured without specifying an input object receive this pseudo-object, also called the primary object, as input. Then, in the case of detectors, the resulting objects will have the frame as a parent by default.

To work with metadata, it is necessary to get a frame metadata iterator for the batch from Gst.Buffer. You can see the details of how to do this in the code at the link, but Savant simplifies working with GStreamer/DeepStream structures: the Python Function unit provides the simple API described below.

Frame metadata is of the NvDsFrameMeta type. The objects property gives access to an iterator over the meta-information of objects on that frame. For example,

def process_frame(self, buffer: Gst.Buffer, frame_meta: NvDsFrameMeta):
    for obj_meta in frame_meta.objects:
        # use ObjectMeta API to process object metadata
        pass

The add_obj_meta method of frame metadata allows you to add a new object to the frame. This object will be completely similar to the objects obtained as a result of inference of detection models, i.e., it can serve as an input for subsequent processing steps in the pipeline, including other detection models, attribute models, etc.

def add_obj_meta(self, object_meta: ObjectMeta)

The method remove_obj_meta of frame metadata allows removing the object’s metadata from the metadata list.

def remove_obj_meta(self, object_meta: ObjectMeta)

For example, the remove_obj_meta method can be used to disable the detector inference by some condition by removing the main frame object:

def process_frame(self, buffer: Gst.Buffer, frame_meta: NvDsFrameMeta):
    primary_meta_object = None
    for obj_meta in frame_meta.objects:
        if obj_meta.is_primary:
            primary_meta_object = obj_meta
            break
    condition = True  # replace with your actual condition

    if condition and primary_meta_object:
        frame_meta.remove_obj_meta(primary_meta_object)

Object metadata is of ObjectMeta type. Initialization of a new ObjectMeta structure to describe a user object is defined as follows:

def __init__(
    self,
    element_name: str,
    label: str,
    bbox: Union[BBox, RBBox],
    confidence: Optional[float] = DEFAULT_CONFIDENCE,
    track_id: int = UNTRACKED_OBJECT_ID,
    parent: Optional['ObjectMeta'] = None,
    attributes: Optional[List[AttributeMeta]] = None,
)

For the new object, be sure to specify the element_name and label attributes described above, and the bbox structure, defining the object’s position on the frame.

The bbox parameter can be one of the two types described above for the bbox attribute. To create an aligned bbox, you must specify the coordinates of the center and the size of the bounding box, for example:

from savant_rs.primitives.geometry import BBox
primary_bbox = BBox(
    xc=400,
    yc=300,
    width=200,
    height=100,
)

To create an oriented bbox, in addition to the coordinates of the center and dimensions, you also need to specify the angle of rotation, given in degrees, for example:

from savant_rs.primitives.geometry import RBBox
primary_bbox = RBBox(
    xc=400,
    yc=300,
    width=200,
    height=100,
    angle=45
)

Thus, an example of adding metadata about a new object to the frame is as follows:

from savant_rs.primitives.geometry import BBox
from savant.deepstream.meta.frame import NvDsFrameMeta
from savant.meta.object import ObjectMeta

def process_frame(self, buffer: Gst.Buffer, frame_meta: NvDsFrameMeta):
    new_obj_meta = ObjectMeta(
        element_name='my_element_name',
        label='my_obj_class_label',
        bbox=BBox(
            xc=400,
            yc=300,
            width=200,
            height=100,
        ),
    )
    frame_meta.add_obj_meta(new_obj_meta)

It is not necessary to specify any parent, including the primary object, for objects added to the frame manually.
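If you do want to associate a manually created object with a parent, pass it to the constructor via the parent argument. A minimal sketch, using hypothetical element and label names, that attaches a new object to the primary frame object:

from savant_rs.primitives.geometry import BBox
from savant.deepstream.meta.frame import NvDsFrameMeta
from savant.meta.object import ObjectMeta

def process_frame(self, buffer: Gst.Buffer, frame_meta: NvDsFrameMeta):
    # find the primary (whole-frame) object
    primary_meta_object = None
    for obj_meta in frame_meta.objects:
        if obj_meta.is_primary:
            primary_meta_object = obj_meta
            break

    if primary_meta_object is not None:
        child_obj_meta = ObjectMeta(
            element_name='my_element_name',
            label='my_child_label',
            bbox=BBox(xc=100, yc=100, width=50, height=50),
            parent=primary_meta_object,  # explicit parent association
        )
        frame_meta.add_obj_meta(child_obj_meta)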

Next, let’s look at the methods of working with object attributes. The methods get_attr_meta and get_attr_meta_list are defined as follows:

def get_attr_meta(self, element_name: str, attr_name: str) -> Optional[AttributeMeta]

def get_attr_meta_list(self, element_name: str, attr_name: str) -> Optional[List[AttributeMeta]]

These methods return an attribute (or list of attributes in case of multi-label classification) with the specified name, created by the specified element, or None in case there is no such attribute.

For example, in the nvidia_car_classification sample, the attributes created by the classifiers are read in the user rendering procedure:

for obj_meta in frame_meta.objects:
    attr_meta = obj_meta.get_attr_meta('Secondary_CarColor', 'car_color')
    if attr_meta is not None:
        # use attr_meta.value to get the attribute value
        car_color = attr_meta.value

The add_attr_meta method allows adding a new attribute to an object. There is no need for a separate initialization for the metadata structure for the new attribute; all the properties described above are passed as arguments to add_attr_meta.

def add_attr_meta(
    self,
    element_name: str,
    name: str,
    value: Any,
    confidence: float = 1.0,
)

For example, in the traffic_meter sample, the counters resulting from the custom processing are added to the main frame object, using arbitrary strings as the element_name and name arguments:

primary_meta_object.add_attr_meta(
    'analytics', 'entries_n', self.entry_count[frame_meta.source_id]
)
primary_meta_object.add_attr_meta(
    'analytics', 'exits_n', self.exit_count[frame_meta.source_id]
)
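These attributes can later be read back, for example in a drawing or sink stage, with get_attr_meta, using the same element and attribute names:

attr_meta = primary_meta_object.get_attr_meta('analytics', 'entries_n')
if attr_meta is not None:
    entries_n = attr_meta.value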