Multimedia SoC System Solutions

Presented By

Yashu Gosain & Forrest Picket: System Software & SoC Solutions Marketing
Girish Malipeddi: IP Subsystems Marketing
Agenda

> Zynq UltraScale+ MPSoC and Multimedia blocks
> Software overview
> Multimedia Framework
> Target Reference design
> Platforms
Multimedia Blocks
Next-Generation SoC with Integrated Video Codec

Integrated Video Codec
- UHD 4K (60fps) / 8K (15fps)
- 8 Simultaneous Encode/Decode Streams

Application Processor
- 64-bit Quad-core A53
- Up to 1.5GHz

Real-Time Processor
- 32-bit Dual-core R5
- 128KB TCM w/ ECC

Graphics Processor
- ARM Mali-400/MP2
- 2D/3D Visualization

16nm Programmable Logic
- Any-to-Any Connectivity
- Processor Offloading

High Speed Peripherals
- PCIe Gen2, USB 3.0
- DisplayPort, SATA 3.1
Different Classes of Graphics Processing Units

- High Performance Graphics
  - Gaming, 3D Vision, & 4K Display

- General Purpose GPU
  - Data Center Acceleration and High Performance Computing

- Power Optimized Graphics
  - Embedded Graphics

- Hardware Acceleration

- Power-Optimized GPU for Embedded Graphics
- Programmable Logic for Accelerated Compute

OpenCL

Massive Parallelism
## Graphics Processor Unit

### ARM Mali-400 MP2

<table>
<thead>
<tr>
<th>Feature</th>
<th>Benefit</th>
</tr>
</thead>
<tbody>
<tr>
<td>ARM Mali™-400 MP2 up to 667MHz</td>
<td>• Most power-optimized ARM GPU with Full HD support (1080p)</td>
</tr>
<tr>
<td></td>
<td>• Ideal for 2D vector graphics and 3D graphics (e.g., HMI, waveform processing)</td>
</tr>
<tr>
<td></td>
<td>• Supports open standards, e.g., OpenGL ES 1.1 &amp; 2.0</td>
</tr>
<tr>
<td>Native Embedded Linux Support</td>
<td>Out-of-the-box drivers and libraries for graphics support</td>
</tr>
<tr>
<td>Dual Pixel Processors</td>
<td>• Up to 1.3 GPix/s fill rate for smoother transition and frame rate</td>
</tr>
<tr>
<td></td>
<td>• Up to 20 GFLOPS shader rate for complex 3D scenes</td>
</tr>
<tr>
<td>Optimized Memory Interface</td>
<td>Tightly coupled w/memory controller for efficient communication with DisplayPort controller</td>
</tr>
</tbody>
</table>
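
To put the fill-rate figure in context, it can be compared with the pixel rate of a Full HD display. The sketch below is a rough estimate only: the 60 Hz refresh rate is an assumption (the table does not state one), and the quoted fill rate is a peak value.

```python
# Rough overdraw budget: how many times each displayed pixel could be
# written per frame at the quoted peak fill rate.
FILL_RATE_PIX_PER_S = 1.3e9   # "up to 1.3 GPix/s" from the table above
WIDTH, HEIGHT = 1920, 1080    # Full HD
REFRESH_HZ = 60               # assumption: 60 Hz display refresh


def overdraw_budget(fill_rate, width, height, refresh_hz):
    """Return the peak fill rate divided by the display's pixel rate."""
    display_pix_per_s = width * height * refresh_hz
    return fill_rate / display_pix_per_s


budget = overdraw_budget(FILL_RATE_PIX_PER_S, WIDTH, HEIGHT, REFRESH_HZ)
print(f"Display pixel rate: {WIDTH * HEIGHT * REFRESH_HZ / 1e6:.1f} MPix/s")
print(f"Overdraw budget at peak fill rate: {budget:.1f}x")
```

Under these assumptions the GPU could touch each visible pixel roughly ten times per frame, which is the headroom that layered HMI scenes draw on.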

---

**Full HD (1920x1080) GLmark2 Benchmark**

- **Performance (fps)**
  - GPU: 50x performance boost
- **Power (mW)**
  - Similar power consumption
Video Codec Implementation Strategies

- **Software Solution**
  - Server Class CPU/Cloud Based
  - Cost Effective
  - Flexible
  - Physically Large

- **Soft IP Solution**
  - Programmable Logic
  - Cost Effective
  - Flexible
  - Large Fabric footprint

- **Dedicated Video Codec**
  - Video Codec
  - Cost Effective
  - Highly Integrated
  - Flexible

© Copyright 2018 Xilinx
# Video Codec Unit

## Integrated H.264/H.265 Video Codec Engine

<table>
<thead>
<tr>
<th>Feature</th>
<th>Benefit</th>
</tr>
</thead>
<tbody>
<tr>
<td>Integrated Video Codec Unit</td>
<td>• Up to 4K UHD (60 fps) or 8Kx4K (15 fps)</td>
</tr>
<tr>
<td></td>
<td>• Up to 8 simultaneous streams</td>
</tr>
<tr>
<td></td>
<td>• Flexible memory topology to enable scalable system performance</td>
</tr>
<tr>
<td>Power Management, Performance Monitoring</td>
<td>• Clock gating (codec firmware automatically clock gates unused engines)</td>
</tr>
<tr>
<td></td>
<td>• Measure task execution time, bandwidth, and latency for fast design optimization</td>
</tr>
</tbody>
</table>
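
The two headline modes in the table describe the same pixel throughput. The sketch below works that out, assuming "4K UHD" means 3840x2160 and "8Kx4K" means 7680x4320. The eight-stream case uses 1080p30 per stream purely as an illustrative assumption; the table does not state a per-stream resolution.

```python
def pixel_rate(width, height, fps, streams=1):
    """Pixels per second for `streams` identical video streams."""
    return width * height * fps * streams


uhd_4k60 = pixel_rate(3840, 2160, 60)
uhd_8k15 = pixel_rate(7680, 4320, 15)
# Assumption for illustration: eight streams of 1080p at 30 fps.
eight_1080p30 = pixel_rate(1920, 1080, 30, streams=8)

for label, rate in [("4K60", uhd_4k60),
                    ("8Kx4K at 15 fps", uhd_8k15),
                    ("8 x 1080p30 (assumed)", eight_1080p30)]:
    print(f"{label}: {rate / 1e6:.1f} MPix/s")
```

All three cases come to the same aggregate pixel rate, so they can be read as different ways of spending one throughput budget.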

![Diagram of Video Codec Unit with connections to Camera, Ethernet, and Display]

Diagram blocks: Camera, Ethernet, Display, Memory Controller, Programmable Logic, and the Video Codec Unit (Encoder, Decoder).

## Architecture Overview

<table>
<thead>
<tr>
<th>Feature</th>
<th>Benefit</th>
</tr>
</thead>
<tbody>
<tr>
<td>Video Resolution</td>
<td>Up to 4K @ 30 Hz</td>
</tr>
<tr>
<td>Audio Support</td>
<td>2 channels of 24-bit audio, up to 96 kHz</td>
</tr>
<tr>
<td>Multiple Channels</td>
<td>One channel of graphics and video</td>
</tr>
<tr>
<td>Features</td>
<td>• Chroma Keying</td>
</tr>
<tr>
<td></td>
<td>• Alpha Blending</td>
</tr>
<tr>
<td></td>
<td>• Live and Non-live video</td>
</tr>
</tbody>
</table>

![Display Port Diagram](image-url)
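
Alpha blending and chroma keying both decide, per pixel, how the graphics channel is combined with the video channel. The sketch below shows the generic per-pixel formulas only; the exact arithmetic, rounding, and key matching used by the hardware are not described here, and the function name and argument layout are hypothetical.

```python
def blend_pixel(gfx, video, alpha, key=None):
    """Composite one graphics pixel over one video pixel.

    gfx, video: (r, g, b) tuples with 8-bit components.
    alpha:      graphics opacity, 0 (transparent) to 255 (opaque).
    key:        optional (r, g, b) chroma key; a graphics pixel equal
                to the key is treated as fully transparent.
    """
    if key is not None and gfx == key:
        return video  # chroma keying: the video shows through
    # Alpha blending with round-to-nearest integer division.
    return tuple((g * alpha + v * (255 - alpha) + 127) // 255
                 for g, v in zip(gfx, video))


red, blue, green = (255, 0, 0), (0, 0, 255), (0, 255, 0)
print(blend_pixel(red, blue, 128))               # half-transparent red over blue
print(blend_pixel(green, blue, 255, key=green))  # keyed pixel: video wins
```
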
## Memory Subsystem

<table>
<thead>
<tr>
<th>Feature</th>
<th>Benefit</th>
</tr>
</thead>
<tbody>
<tr>
<td>Dedicated DDR Memory Controller</td>
<td>Integrated in processing system for lower power usage and reduced latency</td>
</tr>
<tr>
<td>6 AXI Ports For Shared System Access</td>
<td>Multi-ported controller enables PS and PL shared access to common memory</td>
</tr>
<tr>
<td>32/64-bit Configurable Widths w/ECC</td>
<td>Supports varying data widths from processing engines</td>
</tr>
<tr>
<td>256KB On-Chip Memory (OCM) w/ECC</td>
<td>• Low latency memory decreases cost for additional external memory</td>
</tr>
<tr>
<td></td>
<td>• Shareable by Cortex-A53s, Cortex-R5s, and programmable logic</td>
</tr>
<tr>
<td>Tightly Coupled Memory (TCM)</td>
<td>Low-latency, deterministic memory access for Cortex-R5s in functional safety applications</td>
</tr>
</tbody>
</table>

### Supported Interfaces in Processing System

<table>
<thead>
<tr>
<th>Interface</th>
<th>Max DDR Rate (Mb/s)</th>
</tr>
</thead>
<tbody>
<tr>
<td>DDR4</td>
<td>2400*</td>
</tr>
<tr>
<td>LPDDR4</td>
<td>2400</td>
</tr>
<tr>
<td>DDR3</td>
<td>2133</td>
</tr>
<tr>
<td>DDR3L</td>
<td>1866</td>
</tr>
<tr>
<td>LPDDR3</td>
<td>1800</td>
</tr>
</tbody>
</table>

*DDR4 up to 2,667Mb/s in Programmable Logic
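
The per-pin rates above translate into a theoretical peak bandwidth once a bus width is chosen. The sketch below assumes a 64-bit data bus and ignores ECC, refresh, and protocol overhead, so sustained figures will be lower. Whether every listed interface supports that width is not stated here.

```python
# Max per-pin data rates in Mb/s, copied from the table above.
MAX_RATE_MBPS = {
    "DDR4": 2400,
    "LPDDR4": 2400,
    "DDR3": 2133,
    "DDR3L": 1866,
    "LPDDR3": 1800,
}


def peak_bandwidth_gbytes(rate_mbps, bus_width_bits):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return rate_mbps * 1e6 * bus_width_bits / 8 / 1e9


for name, rate in MAX_RATE_MBPS.items():
    # Assumption: 64-bit data bus for every interface.
    print(f"{name}: {peak_bandwidth_gbytes(rate, 64):.1f} GB/s peak")
```
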
# Programmable Logic IPs

## Programmable Logic IPs Video capture and Display

<table>
<thead>
<tr>
<th>HDMI</th>
<th>MIPI</th>
<th>SDI</th>
<th>DisplayPort</th>
</tr>
</thead>
<tbody>
<tr>
<td>HDMI 2.0 @ 6 Gbps/lane</td>
<td>MIPI CSI Rx and DSI Tx</td>
<td>12G-SDI</td>
<td>DisplayPort TX</td>
</tr>
<tr>
<td>4K60 RX and TX</td>
<td>D-PHY @ 1.5 Gbps/lane</td>
<td>4K60</td>
<td>4K60 in programmable logic</td>
</tr>
<tr>
<td>RGB and YUV</td>
<td>RAW, RGB and YUV</td>
<td>YUV</td>
<td>4K30 in PS</td>
</tr>
</tbody>
</table>

## Programmable Logic IPs Video and Image processing

<table>
<thead>
<tr>
<th>Video Processing subsystem</th>
<th>ISP</th>
<th>Video Mixer</th>
<th>Frame Buffer</th>
</tr>
</thead>
<tbody>
<tr>
<td>Scaling, color space conversion, deinterlacing; up to 4K60</td>
<td>Demosaic and Gamma LUT; up to 4K60</td>
<td>8 Layers of mixing + graphics</td>
<td>Write and Read Frames for Video codec consumption</td>
</tr>
</tbody>
</table>
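
Each frame that the Frame Buffer IP writes for the codec, or reads back from it, costs memory bandwidth. The sketch below estimates that cost for one 4K60 stream, assuming 8-bit 4:2:0 video stored at 12 bits per pixel (as in NV12). The pixel format is an assumption; it is not stated in the table.

```python
def frame_bytes(width, height, bits_per_pixel):
    """Size of one video frame in bytes."""
    return width * height * bits_per_pixel // 8


def stream_bandwidth(width, height, fps, bits_per_pixel):
    """Bytes per second needed to move one stream in one direction."""
    return frame_bytes(width, height, bits_per_pixel) * fps


# Assumption: 8-bit 4:2:0 (12 bits per pixel), 3840x2160 at 60 fps.
one_frame = frame_bytes(3840, 2160, 12)
one_way = stream_bandwidth(3840, 2160, 60, 12)
print(f"Frame size: {one_frame / 1e6:.2f} MB")
print(f"One 4K60 write or read: {one_way / 1e6:.1f} MB/s")
```

A capture-to-encode path moves each frame at least twice (one write by the frame buffer, one read by the codec), so the one-way figure is a lower bound on the traffic that path generates.
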
Software Overview
Multimedia Components
Typical Video Pipeline
Video Support in Linux

- Different solutions, provided by different subsystems:
  - FBDEV: Framebuffer Device
  - DRM/KMS: Direct Rendering Manager / Kernel Mode Setting
  - V4L2: Video For Linux 2

- How to choose one: it depends on your needs
  - Each subsystem provides its own set of features
  - Different levels of complexity
  - Different levels of activity
Video For Linux (V4L2)

Key Features

➤ Frame-based video pipelines with streaming and/or memory interfaces
  ➤ Video capture devices
  ➤ Video memory to memory devices
  ➤ Video output devices (no graphics)

➤ DMABUF
  ➤ Zero-copy buffer sharing

➤ Media controller
  ➤ Describes logical topology and data-flow

➤ Multimedia libraries
  ➤ GStreamer, OpenCV, OpenMAX
Top View - Capture Pipeline

Capture Pipeline
- HDMI Rx
- SCALER
- FRAME WRITE

Processing Pipeline

Display Pipeline

Source

CPU

DDR Memory

Media graph entities
- a0000000.v_hdmi_rx_ss (/dev/v4l-subdev2), pads 0 and 1
- a0080000.scaler (/dev/v4l-subdev1), pads 0 and 1
- vcap_hdmi write 0 (/dev/video0)
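
The media controller exposes this capture pipeline as a graph of entities and links. The sketch below models that graph as plain data and walks it from the HDMI receiver to the video node an application would open. Entity names and device nodes come from the slide; the link order (HDMI Rx, then scaler, then frame write) is an assumption based on the order of the capture pipeline blocks.

```python
# Entity name -> device node, as shown on the slide.
ENTITIES = {
    "a0000000.v_hdmi_rx_ss": "/dev/v4l-subdev2",
    "a0080000.scaler": "/dev/v4l-subdev1",
    "vcap_hdmi write 0": "/dev/video0",
}

# Assumed links: each entity feeds exactly one downstream entity.
LINKS = {
    "a0000000.v_hdmi_rx_ss": "a0080000.scaler",
    "a0080000.scaler": "vcap_hdmi write 0",
}


def walk(start):
    """Follow links downstream from `start` until no link remains."""
    path = [start]
    while path[-1] in LINKS:
        path.append(LINKS[path[-1]])
    return path


path = walk("a0000000.v_hdmi_rx_ss")
for name in path:
    print(f"{name} ({ENTITIES[name]})")
```

The last entity in the path is the one with a `/dev/videoN` node; the sub-devices before it are configured through their `/dev/v4l-subdevN` nodes.
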
Direct Rendering Manager (DRM)

- Introduced to deal with display cards with embedded GPUs
- KMS stands for Kernel Mode Setting and is a sub-part of the DRM API
  - Provides a way to configure the display pipeline of a graphics card (or an embedded system)
Top View of Display Pipeline

Display Pipeline
- HDMI Tx (Programmable Logic)
- MIXER

Other blocks in the diagram: Capture Pipeline, Processing Pipeline, Source, Sink, CPU, DDR Memory
Graphics Software Stack

User
- OpenGLES Application
- EGL (X11, fbdev, Wayland, SF), OpenGLES1, OpenGLES2, OpenVG
- Mali common user library

Kernel
- Mali kernel driver: MMU, GP, PP, L2 cache, PMU

Hardware
- Mali 400MP2: GP0, PP0, PP1, L2 cache, PMU

VCU Software Stack

- Control Software allows control of the VCU at a low level
  - Direct access to the low level drivers
- GStreamer provides Video Framework at a high level
- Zynq® UltraScale+™ EV devices are true solution-level products from Xilinx
Multimedia Solution
GStreamer Framework
What is the GStreamer framework?

> **GStreamer** is a pipeline-based multimedia framework for creating streaming media applications

> A Multimedia framework designed to be cross-platform

> Various types of media processing can be realized by describing data flows, called ‘pipelines’, with components, called ‘plugins’.

> Over 200 plugins exist

> GStreamer operates dynamically at *run time*
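
A pipeline is commonly written as a chain of elements joined by `!`, which is the syntax the `gst-launch-1.0` command-line tool accepts. The sketch below only builds and prints such a description; `pipeline_description` is a hypothetical helper, and the three elements are stock GStreamer elements chosen for illustration.

```python
def pipeline_description(elements):
    """Join element descriptions with '!' in gst-launch-1.0 syntax."""
    return " ! ".join(elements)


# A test source, a format converter, and an automatically chosen sink.
test_desc = pipeline_description([
    "videotestsrc num-buffers=120",
    "videoconvert",
    "autovideosink",
])
print("gst-launch-1.0 " + test_desc)
```

Pasting the printed command into a shell on a system with GStreamer installed should show a test pattern for 120 buffers. Swapping one element for another changes the pipeline without touching the rest of the chain.
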
Why GStreamer Framework?

> Multimedia challenges
  >> Creating a multimedia pipeline is a complex process
  >> Lack of code reuse among different media processing blocks
  >> Inconsistent APIs among different codecs, libraries, and devices

> GStreamer: an open-source, collaborative solution for non-trivial media frameworks
  >> Allows processing units to be treated generically: “elements” are connected at connection points
  >> Works along with related open solutions (e.g. Linux, DRM, ALSA, OMX, V4L2)

> Mature code base and widely used

> Fundamentally, the reason is to leverage the huge amount of existing work, i.e. “re-use”
GStreamer Framework

Input Protocols
- V4L2src
- ALSAsrc
- Network
- File System

GStreamer Framework core, (de)muxer, generic elements

- Wrapper Plugin Gst_omx
- Wrapper Plugin
- Custom Plugin
- Custom Logic: (Acceleration)
- Video Codec
- Audio Codec (libfdk-aac)

Output Protocols
- Kmssink
- ALSAsink
- Network
- File System
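
The blocks above map onto a capture, encode, and store pipeline: a V4L2 source on the input side, an OMX wrapper plugin driving the video codec, and a file on the output side. The sketch below builds a `gst-launch-1.0` style description for that case. Treat the element choice as an assumption: `omxh264enc` and `h264parse` are names from the gst-omx and GStreamer plugin sets, but which elements a given image provides, and which properties the codec needs, are not stated here. The helper name and output path are hypothetical.

```python
def capture_encode_pipeline(device, width, height, fps,
                            encoder, parser, out_path):
    """Describe: V4L2 capture -> encoder -> parser -> file."""
    caps = f"video/x-raw,width={width},height={height},framerate={fps}/1"
    return " ! ".join([
        f"v4l2src device={device}",       # input protocol: V4L2
        caps,                             # requested raw video format
        encoder,                          # wrapper plugin for the codec
        parser,                           # frames the encoded bitstream
        f"filesink location={out_path}",  # output protocol: file system
    ])


# Assumed element names and a hypothetical output file.
capture_desc = capture_encode_pipeline(
    "/dev/video0", 3840, 2160, 60, "omxh264enc", "h264parse", "capture.h264")
print("gst-launch-1.0 " + capture_desc)
```
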
Target Reference Designs
VCU TRD (ZCU106 board)
Platform for acceleration
Platform-Based Development

C/C++ Application

SDSoC Environment

Generated

Application
Driver
Interface IPs

Application
Driver

AXI Bus
Connectivity
Programmable Logic (PL)

Application
Driver
Interface IPs

> Custom platform = Vivado project + Bootable software image

> Available for commonly used development kits and SoMs
reVISION Platforms: Single sensor platform

- Platform Support for Zynq US+ Boards: ZCU102 and ZCU104
- Live capture over HDMI, MIPI, USB
- Display over HDMI or DP
- Neural network support for AlexNet, GoogLeNet, VGG, SSD, and FCN
- OpenCV acceleration support through xfOpenCV
- Linux sample designs
  - Dense optical flow Lucas-Kanade
  - 2D Filter for sharpening and edge detect
  - Stereo depth vision
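
For readers unfamiliar with the 2D filter sample, the sketch below shows what a 3x3 filter computes for sharpening and edge detection. It is a plain software reference for illustration, not the accelerated implementation: the kernels are common textbook choices, and the replicated-border handling and 8-bit clamping are assumptions.

```python
# Common 3x3 kernels (assumed for illustration).
SHARPEN = [[0, -1, 0], [-1, 5, -1], [0, -1, 0]]       # weights sum to 1
EDGE = [[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]]      # weights sum to 0


def filter_2d(image, kernel):
    """Apply a 3x3 kernel to a grayscale image (list of rows of ints).

    Border pixels are replicated; results are clamped to 0..255.
    """
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0
            for ky in range(3):
                for kx in range(3):
                    yy = min(max(y + ky - 1, 0), h - 1)
                    xx = min(max(x + kx - 1, 0), w - 1)
                    acc += kernel[ky][kx] * image[yy][xx]
            out[y][x] = min(max(acc, 0), 255)
    return out


step = [[0, 0, 255, 255] for _ in range(3)]  # vertical dark-to-bright edge
for row in filter_2d(step, EDGE):
    print(row)
```

Every output pixel depends only on a small input neighborhood, which is why this kind of filter maps well onto a streaming implementation in programmable logic.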

ZCU10X

ARM Cortex-A53

SDSoC Application

Gstreamer  Gstreamer  App Stub  Gstreamer

V4L2  OMX  DM* Driver  DRM

Linux

USB3  DDR  Stereo Depth Map

HDMI  DDR  Optical Flow

MIPI  DDR  CNN

ISP/VPSS*  DDR  Multi-sensor Design

VCU*  DDR  File

DP  DDR  HDMI

GigE
reVISION Platforms: Multi-camera Imaging and Analytics

ON Semi MARS: 2MP AR0231 camera, MAX96705 GMSL serializer
Avnet MULTI_CAM4-G: 4-camera input, MAX9286 GMSL quad deserializer

Kit sold by Avnet

Linux drivers for
- AR0231
- MAX96705 serializer
- MAX9286 deserializer

reVISION platform support for Zynq US+ Boards: ZCU102 and ZCU104
- Linux based reference designs with
  - Quad camera capture pipes, OpenCV accelerators and Live Display
- Sample designs showing OpenCV acceleration on quad cameras
  - Optical flow
  - Filter_2D

Optical Flow
Filter 2D
