Optimal DDR4 System with Data Bus Inversion

Hing Yan (Thomas) To (Xilinx Inc.)
Changyi Su (Xilinx Inc.), Juan Wang (Xilinx Inc.)
Dmitry Klokotov (Xilinx Inc.), Lizhi Zhu (Xilinx Inc.), John Schmitz (Xilinx Inc.)
Penglin Niu (Xilinx Inc.), Yong Wang (Xilinx Inc.)
Hing Yan (Thomas) To  
*Technical Director, Xilinx Inc.*

tto@xilinx.com

Thomas is a Technical Director in the System Memory Signal Integrity & Device Power Group at Xilinx, Inc. Prior to joining Xilinx, he was with the NVIDIA Advanced Technology Group, where he focused on high-speed (32 GT/s) circuit and system channel designs and supported test chips for advanced process nodes such as 20nm SoC and 16nm FinFET. Before NVIDIA, he worked at Intel for more than 16 years, where he covered and led many types of system memory IO development, such as the Sandy Bridge server DDR IO, across system memory technologies ranging from DDR1 to DDR4. Thomas received his PhD in Electrical Engineering from the Ohio State University in 1995 and holds over 37 patents in the fields of mixed-signal IO circuits, system memory configurations, and high-speed clocking for high-speed memory designs.
Outline

- High Performance Computing Performance Requirement Trend
- Typical Power Distribution in Computing System Example
- System Memory Power Improvement Approach
  - Technology Process Node Scaling Trend
  - IO Voltage Scaling Trend
  - DDR4 IO signaling
- Data Bus Inversion (DBI) in DDR4 Interface
  - DQ bus data Functional View with DBI enabled
  - DDR4 System Power Improvement Example
  - DDR4 IO Interface Training & Calibration with DBI
- Power Noise Improvement with DBI
- Experimental Data Margin Validation and Results
- Summary & Conclusions
Computation Requirement Trend

→ Computing performance requirements increase exponentially.
→ The power envelope is expected to stay the same or shrink.
→ Traditionally, the CPU has been the dominant power component.
→ System memory becomes a larger factor as CPU power improves.
System Memory Power Improvement Approach

- Technology Process Node Scaling Trends
  - Improving Process Technology improves speed, power and memory density.

- IO Voltage Scaling Trends
  - Scaling down the IO voltage improves IO power.

- IO signaling Improvements
  - IO Signaling can improve IO power
DRAM is introduced on a new process technology node every year.
DRAM Power Improvement between DDR3 and DDR4

→ The DDR4 device improves on DDR3 device power
→ DDR IO Voltage has been scaling down from generation to generation.
→ Scaling rate is slowing down.
Only a logic low in DDR4 dissipates DC power, because the DQ bus is terminated to VDDQ.
Even with the power reduction relative to DDR3, read, write, and termination (RD/WD/Term) power is still a large portion.
DDR4 can enable DBI to further improve IO power opportunistically.
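
A rough sketch of why only a logic low costs DC power: with the DQ lines terminated to VDDQ, a line driven high sits at the termination rail and carries no static current, while a line driven low pulls current through the termination and the driver. The resistor values below (34 Ω driver, 48 Ω termination) and the function names are illustrative assumptions, not figures from this presentation.

```python
VDDQ = 1.2   # volts, DDR4 I/O supply
R_ON = 34.0  # ohms, assumed driver pull-down impedance
R_TT = 48.0  # ohms, assumed termination to VDDQ


def dc_power_per_line(level: str) -> float:
    """Static power (W) drawn from VDDQ by one DQ line held at 'H' or 'L'."""
    if level == "H":
        return 0.0  # both ends of the line sit at VDDQ: no DC current
    if level == "L":
        return VDDQ ** 2 / (R_ON + R_TT)
    raise ValueError("level must be 'H' or 'L'")


def dc_power_bus(levels: str) -> float:
    """Static power (W) for a group of lines, e.g. 'HLLHHHLH'."""
    return sum(dc_power_per_line(c) for c in levels)


if __name__ == "__main__":
    print(f"one low line: {dc_power_per_line('L') * 1e3:.1f} mW")
    print(f"all-low byte: {dc_power_bus('L' * 8) * 1e3:.1f} mW")
```

Under this model the static IO power of a byte lane is proportional to the number of low lines, which is the quantity DBI tries to reduce.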
DBI Functional View

[Figure: block diagram. Data from core → controller with DBI-enabled capability → channel (DQ & DQS, DBI#) → DRAM.]

\[ \mathrm{DBI\#}[k] = \mathrm{func}(DQ(7{:}0)[k]) \quad \forall \ k \in \mathbb{N}_0 \]

\[ \mathrm{func}(\eta) = \mathrm{sum}_{\mathrm{logic\_low}}(\eta) > 4 \]
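
The rule above can be sketched for a single byte lane as follows. This is a minimal illustration rather than a controller implementation; the names `dbi_encode` and `dbi_decode` are hypothetical, and a byte is represented as an integer whose bits are DQ7..DQ0.

```python
from __future__ import annotations


def dbi_encode(byte: int) -> tuple[int, int]:
    """Return (bus_byte, dbi_n) for one 8-bit beat.

    dbi_n == 0 means DBI# is driven low and the byte on the bus is inverted.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must fit in 8 bits")
    lows = 8 - bin(byte).count("1")  # number of logic-low DQ bits
    if lows > 4:                     # func(eta): more than four lows
        return byte ^ 0xFF, 0
    return byte, 1


def dbi_decode(bus_byte: int, dbi_n: int) -> int:
    """Receiver side: undo the inversion when DBI# is low."""
    return bus_byte ^ 0xFF if dbi_n == 0 else bus_byte


if __name__ == "__main__":
    for data in (0x00, 0x0F, 0x07, 0xFF):
        bus, dbi_n = dbi_encode(data)
        print(f"core={data:08b} bus={bus:08b} DBI#={'H' if dbi_n else 'L'}")
```

A byte with exactly four lows is left unchanged, since the condition is strictly greater than four.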
DBI Functional Burst Length View

[Figure: data from core → controller with DBI-enabled capability → channel (DQ & DQS, DBI#) → DRAM. Each DQ entry in the table reads "data from core → bus data".]

<table>
<thead>
<tr>
<th>DBI#</th>
<th>L</th>
<th>H</th>
<th>L</th>
<th>L</th>
<th>H</th>
<th>L</th>
<th>H</th>
<th>H</th>
<th>H</th>
</tr>
</thead>
<tbody>
<tr>
<td>DQ0</td>
<td>L→H</td>
<td>H→H</td>
<td>L→H</td>
<td>H→H</td>
<td>L→H</td>
<td>H→H</td>
<td>L→H</td>
<td>L→L</td>
<td>L→L</td>
</tr>
<tr>
<td>DQ1</td>
<td>L→H</td>
<td>H→H</td>
<td>L→H</td>
<td>L→H</td>
<td>L→H</td>
<td>L→H</td>
<td>L→H</td>
<td>L→H</td>
<td>H→H</td>
</tr>
<tr>
<td>DQ2</td>
<td>L→H</td>
<td>H→H</td>
<td>H→L</td>
<td>H→L</td>
<td>H→H</td>
<td>H→L</td>
<td>H→L</td>
<td>H→L</td>
<td>L→L</td>
</tr>
<tr>
<td>DQ3</td>
<td>H→L</td>
<td>H→H</td>
<td>L→H</td>
<td>L→H</td>
<td>H→H</td>
<td>L→H</td>
<td>H→H</td>
<td>L→H</td>
<td>H→H</td>
</tr>
<tr>
<td>DQ4</td>
<td>H→L</td>
<td>H→H</td>
<td>H→H</td>
<td>H→H</td>
<td>H→H</td>
<td>H→H</td>
<td>H→H</td>
<td>H→H</td>
<td>H→H</td>
</tr>
<tr>
<td>DQ5</td>
<td>L→H</td>
<td>H→H</td>
<td>L→H</td>
<td>H→H</td>
<td>L→H</td>
<td>H→H</td>
<td>H→H</td>
<td>H→H</td>
<td>H→H</td>
</tr>
<tr>
<td>DQ6</td>
<td>L→H</td>
<td>L→L</td>
<td>H→L</td>
<td>L→H</td>
<td>L→H</td>
<td>L→H</td>
<td>L→H</td>
<td>L→H</td>
<td>H→H</td>
</tr>
<tr>
<td>DQ7</td>
<td>L→H</td>
<td>H→H</td>
<td>L→H</td>
<td>H→H</td>
<td>L→H</td>
<td>H→H</td>
<td>L→H</td>
<td>H→H</td>
<td>H→H</td>
</tr>
</tbody>
</table>

DBI#[3]=func(DQ_controller(7:0)[3])
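
To see what the per-beat rule does over a burst, the sketch below counts the logic-low lines on the bus with and without DBI. The burst contents are made up for illustration and are not the values from the table; `lows_on_bus` and `burst_lows` are hypothetical helper names, and the average assumes uniformly random data.

```python
from __future__ import annotations


def lows_on_bus(byte: int, dbi_enabled: bool) -> int:
    """Count logic-low lines for one beat (8 DQ, plus DBI# when enabled)."""
    lows = 8 - bin(byte & 0xFF).count("1")
    if not dbi_enabled:
        return lows
    if lows > 4:
        return (8 - lows) + 1  # inverted byte, plus DBI# itself driven low
    return lows                # byte unchanged, DBI# stays high


def burst_lows(burst: list[int], dbi_enabled: bool) -> list[int]:
    """Per-beat low counts for a burst of bytes."""
    return [lows_on_bus(b, dbi_enabled) for b in burst]


if __name__ == "__main__":
    burst = [0x18, 0xBF, 0x00, 0xFF, 0x0F, 0x07, 0xA5, 0x01]  # hypothetical data
    print("no DBI:", burst_lows(burst, False))
    print("DBI:   ", burst_lows(burst, True))

    # Exhaustive statistics over all 256 byte values.
    for enabled in (False, True):
        counts = [lows_on_bus(b, enabled) for b in range(256)]
        print(enabled, "avg", sum(counts) / 256, "max", max(counts))
```

Enumerating all 256 byte values, the average number of low lines per beat drops from 4 to about 3.27, and the worst case drops from 8 to 4.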
System Power Comparison Set Up

Test programs (traffic generators) with different Rd%--Wr% ratios

[Figure: FPGA running test programs (TG_a, TG_m), each with a write % and read % setting, connected to DRAM. Test programs with no DBI (TG_a, TG_m) vs. test programs with DBI (TG_a, TG_m).]
Read & Write Percentage Ratio for Relative Power Comparison

- Analyze the relative power improvement under different workloads.
Relative Power Improvement with DBI

The system with DBI enabled shows a relative power improvement.
The amount of improvement varies with the read/write ratio.
DBI Needs Calibration

→ The DBI bit needs to be calibrated together with the other DQ bits
Step Function Representation of DQ Pattern

\[ DQ[0](t) = DQ[0]_r(t - r_1T) - DQ[0]_f(t - f_1T) + \ldots + DQ[0]_r(t - r_iT) - DQ[0]_f(t - f_iT) + \ldots \]

\[ DQ[7](t) = DQ[7]_r(t - r_1T) - DQ[7]_f(t - f_1T) + \ldots + DQ[7]_r(t - r_iT) - DQ[7]_f(t - f_iT) + \ldots \]

\[ DQS(t) = DQS_r\left(t - r_1\left(T - \frac{T}{2}\right)\right) - DQS_f\left(t - f_1\left(T - \frac{T}{2}\right)\right) + \ldots + DQS_r\left(t - r_i\left(T - \frac{T}{2}\right)\right) - DQS_f\left(t - f_i\left(T - \frac{T}{2}\right)\right) + \ldots \]
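
The superposition in the DQ equations can be sketched numerically. The code below assumes a first-order (RC-like) unit step response with separate rise and fall time constants; that response shape, the function names, and the convention that the line starts low are all assumptions made for illustration. In practice the rise and fall step responses would come from channel simulation or measurement.

```python
import math


def step_response(t: float, tau: float) -> float:
    """Assumed first-order unit step response; zero before t = 0."""
    if t < 0:
        return 0.0
    if tau == 0:
        return 1.0  # ideal step
    return 1.0 - math.exp(-t / tau)


def edges_from_bits(bits):
    """Return (rise_indices, fall_indices) for a bit sequence starting from low."""
    rises, falls, prev = [], [], 0
    for i, b in enumerate(bits):
        if b and not prev:
            rises.append(i)
        elif prev and not b:
            falls.append(i)
        prev = b
    return rises, falls


def dq_waveform(t, bits, T, tau_r, tau_f):
    """DQ(t) = sum_i s_r(t - r_i*T) - sum_i s_f(t - f_i*T)."""
    rises, falls = edges_from_bits(bits)
    up = sum(step_response(t - r * T, tau_r) for r in rises)
    down = sum(step_response(t - f * T, tau_f) for f in falls)
    return up - down


if __name__ == "__main__":
    bits = [0, 1, 1, 0, 1, 0, 0, 1]
    T = 1.0  # unit interval, arbitrary units
    for i in range(len(bits)):
        print(i, round(dq_waveform(i + 0.5, bits, T, 0.15, 0.10), 3))
```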
DQ Eye Referenced to DQS

\[ DQ_{DQS}\,\text{Eye}(t) = \{\, y(t + k_i T) \mid 0 \leq t \leq T, \ \forall \ k_i \in \mathbb{N}_0 \,\}, \quad i \in \{r, f\} \]

→ Based on the rise and fall unit step responses and their combinations:
→ Construct a calibration pattern and search for worst-case jitter and eye height.
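
The eye definition amounts to folding the waveform onto one unit interval. A minimal sketch on sampled data follows, assuming a uniformly sampled waveform with an integer number of samples per unit interval and levels normalized to the range 0 to 1; the function names and the 0.5 threshold are illustrative choices.

```python
def fold_into_eye(samples, samples_per_ui):
    """Group samples y[n] by phase within the unit interval: eye[p] = {y[p + k*N]}."""
    eye = [[] for _ in range(samples_per_ui)]
    for n, y in enumerate(samples):
        eye[n % samples_per_ui].append(y)
    return eye


def eye_height(eye, phase, threshold=0.5):
    """Vertical opening at one phase: lowest 'high' trace minus highest 'low' trace."""
    highs = [y for y in eye[phase] if y >= threshold]
    lows = [y for y in eye[phase] if y < threshold]
    if not highs or not lows:
        return None  # the pattern did not exercise both levels at this phase
    return min(highs) - max(lows)


if __name__ == "__main__":
    # Hypothetical waveform, two samples per unit interval.
    samples = [0.0, 0.1, 0.6, 0.9, 0.4, 0.2, 0.7, 1.0]
    eye = fold_into_eye(samples, samples_per_ui=2)
    for phase in range(2):
        print(phase, eye[phase], eye_height(eye, phase))
```

Sweeping `phase` and looking for the smallest opening is one simple way to search for the worst case that a calibration pattern exposes.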
DBI bit Calibration with DQ

\[
\mathrm{DBI\#}[k] = \mathrm{func}(DQ(7{:}0)[k]) \quad \forall \ k \in \mathbb{N}_0
\]

\[
\mathrm{func}(\eta) = \mathrm{sum}_{\mathrm{logic\_low}}(\eta) > 4
\]

\[
\mathrm{DBI\#}[k] = \mathrm{func}(\mathrm{rand\_cal}(DQ(7{:}0))[k]) \quad \forall \ k \in \mathbb{N}_0
\]

→ Make sure all DQ bits will have toggling coverage.
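
One way to check toggling coverage is sketched below, reading `rand_cal` as a random calibration pattern (an interpretation, since the slides do not define it). The helper names are hypothetical. The check looks at the levels actually driven on the bus after DBI encoding and requires every line, including DBI#, to show both a rising and a falling transition.

```python
import random


def dbi_n(byte):
    """DBI# level for one byte: 0 (low) when more than four DQ bits are low."""
    lows = 8 - bin(byte & 0xFF).count("1")
    return 0 if lows > 4 else 1


def bus_lines(byte):
    """Levels driven on the bus: DQ0..DQ7 after optional inversion, then DBI#."""
    n = dbi_n(byte)
    bus = byte if n else byte ^ 0xFF
    return [(bus >> i) & 1 for i in range(8)] + [n]


def random_cal_pattern(length, seed=0):
    """Hypothetical stand-in for rand_cal: uniformly random bytes."""
    rng = random.Random(seed)
    return [rng.randrange(256) for _ in range(length)]


def toggle_coverage(pattern):
    """Map each line name to True if it made both 0->1 and 1->0 transitions."""
    names = [f"DQ{i}" for i in range(8)] + ["DBI#"]
    seen = {name: set() for name in names}
    prev = None
    for byte in pattern:
        cur = bus_lines(byte)
        if prev is not None:
            for name, a, b in zip(names, prev, cur):
                if a != b:
                    seen[name].add((a, b))
        prev = cur
    return {name: seen[name] == {(0, 1), (1, 0)} for name in names}


if __name__ == "__main__":
    print(toggle_coverage([0xFF, 0x00, 0xFF]))        # naive pattern
    print(toggle_coverage(random_cal_pattern(1000)))  # random pattern
```

The naive all-ones/all-zeros pattern is instructive: with DBI enabled, `0x00` is sent inverted, so the DQ lines never toggle and only DBI# does.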
Power Noise Improvement with DBI enabled

→ PDN Impedance ($Z_{pdn}$) is a function of frequency
→ Jitter is a function of $Z_{pdn}$ and step current load characteristic.
→ Average step current reduced by enabling DBI.
→ Voltage Droop performance improves.
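
As a back-of-the-envelope illustration of these points, supply droop can be approximated as the PDN impedance magnitude at the frequency of interest times the step current. The lumped PDN model and all component values below are assumptions chosen only to make the sketch runnable. The worst-case line counts (8 lines low without DBI, 4 with DBI) follow from the DBI encoding rule and serve as a stand-in for the step current; the slides refer to the average step current.

```python
import math

VDDQ = 1.2                    # volts
I_LINE = VDDQ / (34.0 + 48.0)  # amps per low line, assumed driver + termination


def z_pdn(freq_hz, r=0.005, l=100e-12, c=100e-9):
    """|Z| of an assumed lumped PDN: (R + jwL) in parallel with C. freq_hz > 0."""
    w = 2 * math.pi * freq_hz
    z_rl = complex(r, w * l)
    z_c = 1 / complex(0, w * c)
    return abs(z_rl * z_c / (z_rl + z_c))


def droop(z_ohm, delta_i):
    """First-order droop estimate: delta_V = |Z_pdn| * delta_I."""
    return z_ohm * delta_i


if __name__ == "__main__":
    for f in (1e6, 50e6, 400e6):
        z = z_pdn(f)
        no_dbi = droop(z, 8 * I_LINE)  # all eight DQ lines switch low together
        dbi = droop(z, 4 * I_LINE)     # at most four lines low with DBI
        print(f"{f / 1e6:6.0f} MHz  |Z|={z:.4f} ohm  "
              f"droop no DBI={no_dbi * 1e3:.2f} mV  DBI={dbi * 1e3:.2f} mV")
```

Because the estimate is linear in the step current, any reduction in step current from DBI carries over proportionally to the droop at every frequency.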
Validation Methods:
- Direct measurement of DQ Eye at DRAM inputs.
- Write and Read Eye Shmoo.
- Compare with and without DBI enabled.
Direct Write Eye Measurement at DRAM

→ The write eye measurement shows a jitter improvement of 5% UI.
→ Validation was then extended to functional read and write eye shmoos.
Read and Write Shmoo Set Up
Read Eye Shmoo without DBI Enabled
Read Eye Shmoo with and without DBI Enabled
Write Eye Shmoo without DBI Enabled
Write Eye Shmoo with and without DBI Enabled
Eye Shmoo Comparison

→ Eye width improvement is observed in both directions.
→ The amount of improvement differs between directions.
→ Write improved by 11%
→ Read improved by 7%
→ The different improvements imply different step current impacts
→ The DRAM unit and the controller PHY have different PDNs.
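
For readers unfamiliar with how an eye width is read off a shmoo, a minimal sketch follows. It assumes one shmoo row is a string of pass/fail marks across the swept delay taps; the rows below are invented for illustration and are not the measured data, and the function names are hypothetical.

```python
def eye_width(row):
    """Longest run of passing points ('P') in one shmoo row of 'P'/'F' marks."""
    best = run = 0
    for mark in row:
        run = run + 1 if mark == "P" else 0
        best = max(best, run)
    return best


def relative_improvement(width_without, width_with):
    """Fractional eye-width gain with DBI relative to the no-DBI baseline."""
    return (width_with - width_without) / width_without


if __name__ == "__main__":
    row_no_dbi = "FFFPPPPPPPPFFF"  # hypothetical sweep, DBI disabled
    row_dbi = "FFPPPPPPPPPPFF"     # hypothetical sweep, DBI enabled
    w0, w1 = eye_width(row_no_dbi), eye_width(row_dbi)
    print(w0, w1, f"{relative_improvement(w0, w1):.1%}")
```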
Summary and Conclusions

- Computing Performance requirements drive the need to reduce system power.
- System memory power has become one of the major factors in total system power.
- Traditional improvement methods, such as process node and IO voltage scaling, are slowing down.
- DDR4 IO introduced the DBI function to opportunistically reduce IO power.
- The amount of power improvement varies with the write/read ratio.
- DBI reduces the average step current in the memory system and hence improves channel margin.
- Experimental data showed that the channel jitter improvement differs between the write and read directions.
Thank you!

---

QUESTIONS?