For robots to become a bigger part of our society, they must be able to interface intelligently with their dynamic environments. Part of the challenge is integrating vast quantities of data from many cameras and sensors. Pre-written blocks of Linux® software make this task easier. This article from Qualcomm explores sample applications for AI inference and video in IoT and robotics.
Multi-camera streaming
The command-line application gst-multi-camera-example demonstrates streaming from two camera sensors simultaneously. It can compose the video streams side by side on a display device, or it can encode the streams and store them to files.
The application pipeline supports two configurations:
- Composition and display – A qtimmfsrc instance on each of camera 0 and camera 1 captures data from the two camera sensors. qtivcomposer composes the streams, then waylandsink displays them side by side on the screen.
- Video encoding – A qtimmfsrc instance on each of camera 0 and camera 1 captures data from the two camera sensors and passes it to the v4l2h264enc plugin, which encodes and compresses each camera stream to H.264 format. The streams are then parsed and multiplexed by the h264parse and mp4mux plugins, respectively, and handed off to the filesink plugin, which saves them as files. (See the pipeline sketches after this list.)
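For orientation, here is roughly how the two configurations translate into gst-launch-1.0 pipelines. This is a minimal sketch, not the sample app's exact command line: the qtimmfsrc camera property, the qtivcomposer position and dimensions pad properties, the 1080p caps, and the output file names are assumptions to verify with gst-inspect-1.0 on your development kit.

```shell
# Configuration 1 – composition and display. A sketch: property names are
# assumptions; check `gst-inspect-1.0 qtimmfsrc` and
# `gst-inspect-1.0 qtivcomposer` on your target for the exact interfaces.
gst-launch-1.0 -e \
  qtivcomposer name=mixer \
    sink_0::position="<0,0>"   sink_0::dimensions="<960,540>" \
    sink_1::position="<960,0>" sink_1::dimensions="<960,540>" ! \
  waylandsink fullscreen=true \
  qtimmfsrc camera=0 ! video/x-raw,format=NV12,width=1920,height=1080,framerate=30/1 ! \
    queue ! mixer.sink_0 \
  qtimmfsrc camera=1 ! video/x-raw,format=NV12,width=1920,height=1080,framerate=30/1 ! \
    queue ! mixer.sink_1

# Configuration 2 – encode each camera stream to H.264 and save it to its own
# MP4 file; `-e` sends EOS on Ctrl-C so mp4mux can finalize the files.
gst-launch-1.0 -e \
  qtimmfsrc camera=0 ! video/x-raw,format=NV12,width=1920,height=1080,framerate=30/1 ! \
    queue ! v4l2h264enc ! h264parse ! mp4mux ! filesink location=camera0.mp4 \
  qtimmfsrc camera=1 ! video/x-raw,format=NV12,width=1920,height=1080,framerate=30/1 ! \
    queue ! v4l2h264enc ! h264parse ! mp4mux ! filesink location=camera1.mp4
```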
Here’s an example of the output from the first configuration. The right-side image is monochrome because the second camera sensor on the development kit is monochrome.

When would you use this application?
gst-multi-camera-example is a building block for capturing data from two camera sensors, with options for either composing and displaying the video streams or encoding and storing the streams to files. You can use this sample app as the basis for your own camera capture/encoding applications, including dashcams and stereo cameras.
Video wall – Multi-channel video decode and display
The command-line application gst-concurrent-videoplay-composition performs concurrent video decode and playback of AVC-coded videos. The app composes multiple video streams coming from files or the network (e.g., IP cameras) for display as a video wall.
The application can take multiple video files (such as 4 or 8) as input, decode all the compressed videos, then scale and compose them into a video wall. It requires at least one input video file, in MP4 format with an AVC (H.264) codec.
In the 4-channel pipeline, each channel uses plugins to perform the following processing:
- Reads compressed video data from a file using filesrc.
- Demultiplexes the file with qtdemux.
- Parses H.264 video streams using h264parse.
- Decodes the streams using v4l2h264dec.
The decoded streams from all channels are then composed together using qtivcomposer and displayed using waylandsink.
Here’s an example of using the app gst-concurrent-videoplay-composition on 4 video streams:
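A minimal gst-launch-1.0 sketch of such a 4-channel pipeline follows. It is an illustration, not the sample app's exact command line: the qtivcomposer position and dimensions pad properties are assumptions (verify with gst-inspect-1.0 qtivcomposer), and video0.mp4 through video3.mp4 are placeholder file names.

```shell
# Sketch: decode four H.264/MP4 files and compose them as a 2x2 video wall
# on a 1920x1080 display. Pad property names on qtivcomposer are assumptions.
gst-launch-1.0 -e \
  qtivcomposer name=wall \
    sink_0::position="<0,0>"     sink_0::dimensions="<960,540>" \
    sink_1::position="<960,0>"   sink_1::dimensions="<960,540>" \
    sink_2::position="<0,540>"   sink_2::dimensions="<960,540>" \
    sink_3::position="<960,540>" sink_3::dimensions="<960,540>" ! \
  waylandsink fullscreen=true \
  filesrc location=video0.mp4 ! qtdemux ! queue ! h264parse ! v4l2h264dec ! wall.sink_0 \
  filesrc location=video1.mp4 ! qtdemux ! queue ! h264parse ! v4l2h264dec ! wall.sink_1 \
  filesrc location=video2.mp4 ! qtdemux ! queue ! h264parse ! v4l2h264dec ! wall.sink_2 \
  filesrc location=video3.mp4 ! qtdemux ! queue ! h264parse ! v4l2h264dec ! wall.sink_3
```

Each filesrc ! qtdemux ! h264parse ! v4l2h264dec branch corresponds to one channel from the list above; the queue elements decouple the decoders so one slow branch does not stall the others.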

When would you use this application?
With gst-concurrent-videoplay-composition you can decode multiple compressed video streams and compose them into a video wall, for example in retail spaces and digital signage. In an edge box for video surveillance, it can capture input from multiple IP cameras and display them on a single screen. In a video conferencing application, it can process and display feeds from multiple people on the call, with each participant streaming a video.
Next steps
You can get these applications, or the entire Qualcomm Intelligent Multimedia SDK, on GitHub. Then you can start incorporating them into your own applications.
