July 6, 2022
Editor’s Note: This content is contributed by Gordon Lau, Systems Architect - Pro Audio/Video, Broadcast, and Consumer at AMD
We have all come to expect high-quality video when viewing video streams for work or play. The delivery of media using IP networks, known as AV-over-IP, helps to quench an insatiable demand for content, equipment, and bandwidth. Whether for live streaming your latest game challenge or attending virtual events, the expectation is that compressed video is delivered with higher detail, greater contrast, and a wide array of lifelike colors. Ultra high-definition (UHD) video as defined by the Ultra HD Forum combines improvements in resolution, wide color gamut (WCG), increased frame rates, higher bit depths, and high dynamic range (HDR) to deliver a more realistic visual experience.
HDR, in particular, is a relatively new feature to the video ecosystem, which focuses on increasing the contrast between the darkest blacks and the brightest whites, resulting in more detailed shadows, brighter reflections, and highlights. But the overall visual effect of HDR is much more compelling, as pictures appear richer, more lifelike, and the improved contrast delivers sharper, more detailed images. Viewers often perceive more of a visual impact due to HDR than an increase in resolution. So, it’s no surprise that both content developers and viewers who have been exposed to the visual benefits of HDR content want this same feature when distributing or consuming content via compressed workflows. This is especially true when transmission bandwidth or storage are real constraints throughout the AV-over-IP distribution chain and HDR can provide a remarkable visual impact while consuming only marginally more bandwidth or storage1.
To understand how HDR can give users the perception of higher detail and higher resolution, one needs to understand the human visual system. Our eyes have evolved over millions of years and are sensitive to a wide dynamic range of light. Think of low-light situations such as star gazing or cooking over the campfire, and the other extreme of watching sailboats on a glimmering waterfront on a sunny day – our eyes take it all in stride. Light levels are quantified using the unit candela per square meter, also known as a nit. And while our eyes are sensitive from near zero to well over 1,000 nits, a sunny day can exceed 5,000 nits with specular highlights in the tens of thousands – compare this to early displays, which were limited to approximately 100 nits.
Figure 1 shows the concept of an HDR system at a high level. The challenge of delivering HDR content begins with capturing it. Captured scene light is converted to an electrical representation using an opto-electronic transfer function (OETF). Generally, sensors and cameras are equipped to capture a fair amount of scene light, and this is often managed in post-production, where further processing can optimize color and light levels. That electrical representation is distributed to one or more displays, where it is converted back into display light using an electro-optical transfer function (EOTF). The original EOTF, designed for CRTs (cathode ray tubes) and used for more than 60 years, is commonly known as gamma.
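As a concrete illustration, the OETF/EOTF round trip can be sketched as a simple power law. A pure exponent of 2.4 is assumed here for brevity; real-world curves such as BT.709 (camera side) and BT.1886 (display side) add refinements like a linear toe segment:

```python
# Minimal sketch of classic "gamma" transfer functions, assuming a pure
# power law with exponent 2.4 (real broadcast curves differ in detail).

GAMMA = 2.4

def oetf(scene_light: float) -> float:
    """Encode normalized scene light [0, 1] to a signal value [0, 1]."""
    return scene_light ** (1.0 / GAMMA)

def eotf(signal: float) -> float:
    """Decode a signal value [0, 1] back to normalized display light [0, 1]."""
    return signal ** GAMMA

# Round-tripping recovers the original light level (to floating-point precision).
assert abs(eotf(oetf(0.5)) - 0.5) < 1e-9
```

Because a fixed power law allocates code values poorly at the extremes of a very wide luminance range, HDR systems replace it with purpose-built curves, as described below.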
<Figure 1: Scene Capture, Transmission, and Display>
Today’s mass-market consumer displays are capable of peak brightness approaching 1,200 nits. Combined with numerous other advances in lighting techniques and manufacturing, we now have affordable panels capable of impressive dynamic contrast, with the ability to deliver all the perceptual benefits HDR can bring to content.
As with all new technologies, we are now in a period of transition, where most sources and displays continue to support standard dynamic range (SDR) and the original gamma EOTF while the market adopts HDR. Has the entire cinema/broadcast/gaming industry agreed upon a single standard to communicate source and display capabilities? Answer: sort of. The short answer is that media delivery is now more complex than ever, since there isn’t a singular media and entertainment industry. Cinema has the luxury of the best cameras in the industry and a budget that allows for many hours of post-production, but with the recent complication of OTT/VOD (over-the-top/video on demand) streaming delivery as a primary consumption method. E-sports (video gaming) has quickly become a source of HDR content, with an equally impressive viewing-audience growth rate.
Meanwhile, traditional broadcast, sports, and media have quality, latency, and bandwidth restrictions that require unique solutions for each workflow. This is part of the reason why multiple HDR delivery formats, built on two different EOTFs, exist in the market today – different use cases require different solutions. To leverage existing broadcast infrastructure (and regulated bandwidth limits) during this period of transition, the hybrid log gamma (HLG) EOTF was developed. Its hybrid gamma curve remains backward compatible with the existing majority of SDR displays, yet can take advantage of the higher luminance capabilities of newer displays, producing an image on both SDR and HDR displays without the complications of additional metadata.
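The "hybrid" in HLG is visible in its OETF, which is compact enough to sketch in a few lines: a square-root (gamma-like) segment covers the lower part of the range, and a logarithmic segment handles the highlights above the knee at E = 1/12. The constants are the published values from ITU-R BT.2100; the code is an illustrative sketch, not production color-pipeline code:

```python
import math

# HLG OETF per ITU-R BT.2100 (illustrative sketch).
A = 0.17883277
B = 1.0 - 4.0 * A                 # 0.28466892
C = 0.5 - A * math.log(4.0 * A)   # 0.55991073...

def hlg_oetf(e: float) -> float:
    """Map normalized scene light E in [0, 1] to an HLG signal E' in [0, 1]."""
    if e <= 1.0 / 12.0:
        return math.sqrt(3.0 * e)       # gamma-like lower segment
    return A * math.log(12.0 * e - B) + C   # logarithmic highlight segment
```

The two segments meet smoothly at E = 1/12, where the signal value is exactly 0.5 – which is why an SDR display interpreting the lower half of the signal as ordinary gamma still produces a reasonable picture.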
There are other HDR formats that are based on the perceptual quantizer (PQ) EOTF, which is a non-linear transfer function based on human perception, designed to provide more bits of representation where the eye is most sensitive to light. These PQ-based formats require metadata as defined by SMPTE-2086, which standardizes parameters such as color primaries, white point, and luminance range of the original mastering display – potentially allowing all viewers to replicate the same viewing conditions. Each format, whether HLG or PQ based, delivers HDR content with tradeoffs in content fidelity, workflow impacts, metadata requirements, and royalty obligations that need to be considered when deploying HDR systems and workflows.
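The PQ curve itself is also well defined. Unlike gamma or HLG, it maps signal values to absolute luminance, up to 10,000 nits. The constants below are the exact rational values from SMPTE ST 2084; as above, this is an illustrative sketch rather than a production implementation:

```python
# PQ (SMPTE ST 2084) transfer functions (illustrative sketch).
# PQ is absolute: a signal in [0, 1] maps to luminance in cd/m2 (nits).
M1 = 2610.0 / 16384.0
M2 = 2523.0 / 4096.0 * 128.0
C1 = 3424.0 / 4096.0
C2 = 2413.0 / 4096.0 * 32.0
C3 = 2392.0 / 4096.0 * 32.0

def pq_eotf(signal: float) -> float:
    """Decode a PQ signal value [0, 1] to display luminance in nits."""
    e = signal ** (1.0 / M2)
    y = max(e - C1, 0.0) / (C2 - C3 * e)
    return 10000.0 * y ** (1.0 / M1)

def pq_inverse_eotf(nits: float) -> float:
    """Encode display luminance in nits to a PQ signal value [0, 1]."""
    y = (nits / 10000.0) ** M1
    return ((C1 + C2 * y) / (1.0 + C3 * y)) ** M2
```

Because the mapping is absolute, a PQ-encoded stream needs the SMPTE-2086 metadata described above so that a display with a different peak luminance than the mastering display can tone-map the content sensibly.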
Streaming has enabled high-quality media distribution and consumption to happen virtually anywhere on the planet, and has resulted in even more demand which continues to grow according to Sandvine’s 2022 Global Internet Phenomena Report. Streaming adds an additional layer of complexity as bandwidth is not only a physical limitation, but an additional cost as well. Viewed in this manner, HDR is a unique feature for streaming content because, given the same bandwidth and video compression settings, HDR can add perceived detail and resolution impact with a negligible change to bandwidth1, even with the additional metadata required by some formats.
The Zynq UltraScale+ MPSoC EV family of devices has been enabling equipment manufacturers with a cost-effective, low-power, high-performance single-chip solution that combines a full-featured multicore Arm® processing subsystem, an embedded real-time 4kp60 4:2:2 10-bit video codec unit (VCU) capable of simultaneous H.264/H.265 encoding/decoding, and flexible high-performance programmable logic for audio, video and custom interfaces. On top of this silicon platform AMD-Xilinx has built a driver and application stack based on common industry tools and frameworks such as Linux, V4L2 and GStreamer to enable customers to quickly evaluate functionality and to rapidly develop a customized solution.
The Zynq UltraScale+ MPSoC device is flexible enough to be both a single-chip capture/encode or decode/display device that supports existing HDR formats such as HLG, HDR10, and others. For HLG EOTF HDR formats that do not require metadata, the appropriate colorimetry information is extracted from the physical connection such as SDI or HDMI, and the required data is saved in the encoded bitstream within standardized video usability information (VUI) fields as defined by ITU for H.264 or H.265 bitstreams. For PQ EOTF HDR formats, there are key parameters defined by SMPTE-2086 that are captured from the physical connection, such as color primaries and transfer characteristics, along with elements such as the mastering display color volume (MDCV) and content light level (CLL).
These are stored in the appropriate fields in standardized VUI and supplemental enhancement information (SEI) fields of the compressed bitstream so that the data can be distributed, decoded, and extracted for the ultimate display element(s) of the system, as shown in Figure 2. AMD-Xilinx supports transporting any required HDR metadata via an open, standardized mechanism, allowing for proper interoperability and flexibility for future HDR formats.
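To make the metadata flow concrete, the sketch below shows how SMPTE-2086 mastering-display values map onto the integer units carried by the H.265 MDCV SEI message: chromaticity coordinates in increments of 0.00002, and luminance in units of 0.0001 cd/m². The function name and dictionary layout are illustrative assumptions for this article, not an AMD-Xilinx API:

```python
# Hypothetical helper: pack SMPTE-2086 mastering-display values into the
# integer units of the H.265 mastering display colour volume (MDCV) SEI.
# Chromaticity is carried in increments of 0.00002; luminance in units
# of 0.0001 cd/m2. Layout is illustrative, not an AMD-Xilinx API.

def to_mdcv_sei(primaries, white_point, max_lum_nits, min_lum_nits):
    """Convert CIE xy chromaticities and nit values to MDCV SEI integers."""
    return {
        "display_primaries": [
            (round(x / 0.00002), round(y / 0.00002)) for x, y in primaries
        ],
        "white_point": (round(white_point[0] / 0.00002),
                        round(white_point[1] / 0.00002)),
        "max_display_mastering_luminance": round(max_lum_nits / 0.0001),
        "min_display_mastering_luminance": round(min_lum_nits / 0.0001),
    }

# Typical HDR10 mastering values: BT.2020 primaries, D65 white point,
# 1000-nit peak, 0.0001-nit black.
sei = to_mdcv_sei(
    primaries=[(0.708, 0.292), (0.170, 0.797), (0.131, 0.046)],  # R, G, B
    white_point=(0.3127, 0.3290),
    max_lum_nits=1000.0,
    min_lum_nits=0.0001,
)
```

A decoder at the far end of the chain reverses the scaling to recover the mastering-display description and hand it to the display element.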
<Figure 2: Conceptual flow of PQ HDR metadata within AMD-Xilinx multimedia stack>
You’ll want an easy vehicle to show all the components of programmable hardware, processor systems, and software working together. AMD-Xilinx has created targeted reference designs (TRDs) that provide all the required sources, project files, and recipes to recreate reference designs operating at full performance (4kp60) with industry-standard interfaces such as SDI and HDMI 2.0, so one can evaluate the system components together and rapidly move to customization. There are two TRD variants demonstrating both HLG and PQ type HDR formats implemented on the Zynq UltraScale+ MPSoC ZCU106 Evaluation Kit that you can try today. Go to the Wiki page to get the latest TRDs.
PL DDR HLG SDI Audio Video Capture and Display
HLG/non-HLG video + 2/8 channels audio capture and display via SDI with VCU encoding from PS DDR and decoding from PL DDR
PL DDR HDR10 HDMI Video Capture and Display
HDMI design to showcase encoding with PS DDR and decoding with PL DDR. Supports HDR10 static metadata for HDMI, as well as DCI4K.
The adaptable hardware and software architecture of the Zynq UltraScale+ MPSoC enables multimedia system developers to implement new HDR formats as the ecosystem evolves, so they will always be ready for the future!