AI Engine Intrinsics User Guide  (AIE) r2p22
Peak Cancellation Crest Factor Reduction (PC-CFR)

Overview

These are the intrinsic functions used to implement a peak cancellation based crest factor reduction (PC-CFR) application. The functionality of this application is split between the AIE and the programmable logic (PL): the PL performs peak detection and the AIE computes the aggregate cancellation signal for the detected peaks. The cancellation signal samples computed by the AIE are subtracted in the PL from the delayed original signal to cancel the peaks.

The AIE computes the cancellation signal samples by scaling the cancellation pulse (CP) coefficients, which are stored in AIE memory, for the different peaks and summing the results. The two input stream interfaces of the AI Engine receive the following information from the PL:
1) metadata containing the LUT indices used to read the CP coefficients, plus configuration information for the vectorized mul/mac operations;
2) complex scaling factors for the detected peaks.
The output stream interface of the AI Engine sends the computed cancellation signal samples to the PL.
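The scale-and-sum computation described above can be illustrated with a small floating-point reference model in standard C++, of the kind one might run on a host to check AIE output. This is a sketch under stated assumptions, not the AIE implementation:

- The Peak struct and the cancellation_signal function are hypothetical.
- Each pulse is assumed to start at a per-peak offset in the output block.
- Fixed-point effects (cint16 inputs, cint48 accumulators) are ignored.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical description of one detected peak (not an AIE or PL type).
struct Peak {
    std::size_t offset;           // output sample where this peak's pulse starts (assumption)
    std::complex<double> scale;   // complex scaling factor received from the PL
};

// Floating-point reference model: the cancellation signal is the sum, over
// all peaks, of the CP coefficients multiplied by that peak's scaling factor.
std::vector<std::complex<double>> cancellation_signal(
        const std::vector<std::complex<double>>& cp,   // CP coefficient LUT
        const std::vector<Peak>& peaks,
        std::size_t length) {                          // output block length
    std::vector<std::complex<double>> out(length);     // zero-initialized
    for (const Peak& p : peaks) {
        for (std::size_t k = 0; k < cp.size(); ++k) {
            const std::size_t n = p.offset + k;
            if (n >= length) {
                break;                                 // rest of the pulse is outside this block
            }
            out[n] += p.scale * cp[k];
        }
    }
    return out;
}
```

For example, take CP coefficients {1, 2}, one peak at offset 0 scaled by 1, and another at offset 1 scaled by j. The model returns {1, 2 + j, 2j}.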

Typically the AIE program computing the aggregate cancellation signal for N detected peaks comprises the following steps :

Functions

void split (int a, unsigned n, int &d0, unsigned &d1)
 Intrinsic used to split the 32 bit input data into two resulting variables at the n-th bit.

CFR Multiplication Intrinsics

v8cacc48 mul8_cfr (v16cint16 xbufa, v16cint16 xbufb, int rev_xstart, int xrot, v8cint16 zbuf, unsigned int zstart)
 Complex multiply intrinsic function for cancellation signal calculations in peak-cancellation crest factor reduction algorithm.
v8cacc48 mac8_cfr (v8cacc48 acc, v16cint16 xbufa, v16cint16 xbufb, int rev_xstart, int xrot, v8cint16 zbuf, unsigned int zstart)
 Complex multiply intrinsic function for cancellation signal calculations in peak-cancellation crest factor reduction algorithm.

Function Documentation

v8cacc48 mac8_cfr (v8cacc48 acc, v16cint16 xbufa, v16cint16 xbufb, int rev_xstart, int xrot, v8cint16 zbuf, unsigned int zstart)

Complex multiply intrinsic function for cancellation signal calculations in peak-cancellation crest factor reduction algorithm.

PARAMETERS

Input/Output  Type          Valid bits  Comments
acc           v8cacc48      All         Running accumulation vector (8 x cint48 lanes). mac variant only.
xbufa         v16cint16     All         First input buffer of 16 complex samples of type cint16.
xbufb         v16cint16     All         Second input buffer of 16 complex samples of type cint16.
rev_xstart    int           5 LSBs      Bit 4 (MSB): flag for backward input selection. Bits 3 to 0: starting point within the input data.
xrot          int           2 LSBs      Selects which 256-bit half (8 complex samples) of xbufa and of xbufb is used. Must be a compile-time constant.
zbuf          v8cint16      All         Buffer of scaling factors, one per qualified peak.
zstart        unsigned int  3 LSBs      Selects which of the 8 scaling factors is used. Must be a compile-time constant.
return value  v8cacc48      All         Resulting accumulation vector (8 x cint48 lanes).
Note
Parameters 'xrot' and 'zstart' must be compile-time constants.

The input data provided by xbufa and xbufb can be seen as a concatenation of 8 cancellation pulse (CP) coefficients of type cint16 from xbufa, followed by the next 8 coefficients from xbufb, as selected by xrot. The resulting 16 samples are referred to as "CP" in this document. The CP coefficients are loaded into xbufa and xbufb from memory. zbuf contains the 8 scaling factors, one for each qualified peak, and the zstart parameter selects which of them is applied in the operation.

DATA SELECTORS

xrot:

Selects the first or second set of 8 values to be used from xbufa and from xbufb:

xrot value   Selection in xbufa      Selection in xbufb
0x0          coefficients 0 to 7     coefficients 0 to 7
0x1          coefficients 8 to 15    coefficients 0 to 7
0x2          coefficients 0 to 7     coefficients 8 to 15
0x3          coefficients 8 to 15    coefficients 8 to 15

Examples:

If you have previously updated xbufa with upd_w(0) (values 0 to 7 replaced) and xbufb with upd_w(1) (values 8 to 15 replaced), you would choose xrot=0x2, which gives:

CP(0) to CP(7)  = xbufa(0) to xbufa(7)
CP(8) to CP(15) = xbufb(8) to xbufb(15)

With xrot=0x1:

CP(0) to CP(7)  = xbufa(8) to xbufa(15)
CP(8) to CP(15) = xbufb(0) to xbufb(7)

It is standard practice to use only upd_w(0) and leave xrot at 0x0 unless your application can benefit from this option.
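The xrot table and the examples above can be expressed as a scalar model that builds the 16-entry CP view from the two buffers. This is a hypothetical host-side sketch, not the intrinsic:

- select_cp and the cplx struct (a widened stand-in for cint16) are illustrative names.
- As the table indicates, bit 0 of xrot is assumed to select the half of xbufa and bit 1 the half of xbufb.

```cpp
#include <array>
#include <cassert>

// Widened stand-in for cint16 (hypothetical; no 16-bit range checks).
struct cplx {
    int re;
    int im;
};

inline bool operator==(const cplx& x, const cplx& y) {
    return x.re == y.re && x.im == y.im;
}

// Scalar model of the xrot selection: bit 0 picks the half of xbufa,
// bit 1 picks the half of xbufb (0 = coefficients 0-7, 1 = 8-15).
std::array<cplx, 16> select_cp(const std::array<cplx, 16>& xbufa,
                               const std::array<cplx, 16>& xbufb,
                               unsigned xrot) {
    assert(xrot <= 0x3);                             // only 2 valid bits
    const unsigned a_off = (xrot & 0x1u) ? 8u : 0u;
    const unsigned b_off = (xrot & 0x2u) ? 8u : 0u;
    std::array<cplx, 16> cp{};
    for (unsigned i = 0; i < 8; ++i) {
        cp[i] = xbufa[a_off + i];                    // CP(0) to CP(7)
        cp[8 + i] = xbufb[b_off + i];                // CP(8) to CP(15)
    }
    return cp;
}
```

Calling select_cp with xrot=0x2 reproduces the first example, where CP(8) comes from xbufb(8). With xrot=0x1, CP(0) comes from xbufa(8) and CP(8) from xbufb(0).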

zstart:

Selects which of the 8 scaling factors in zbuf is used for the multiply operation. Valid values are 0x0 to 0x7.

rev_xstart:

The 4 LSBs select the starting point within the 16 CP values; each mul or mac operation uses only 8 consecutive CP values.

The remaining bit (bit 4, the MSB of the 5 valid bits) selects the direction. When it is 0, the CP values are read forward from the starting point; when it is 1, they are read backward and their complex conjugates are used. This flag improves memory efficiency for conjugate-symmetric CPs, since only half of the CP coefficients need to be stored in memory.

Example:

If the 4 LSBs are set to 0x7, CP(7) is selected as the starting point. The MSB of rev_xstart then determines how the operation proceeds:

  • Set to 0: CP(7) up to CP(14) are used.
    CP(7) -> CP(8) -> CP(9) -> CP(10) -> CP(11) -> CP(12) -> CP(13) -> CP(14)
  • Set to 1: the complex conjugates of CP(7) down to CP(0) are used.
    conj(CP(7)) -> conj(CP(6)) -> conj(CP(5)) -> conj(CP(4)) -> conj(CP(3)) -> conj(CP(2)) -> conj(CP(1)) -> conj(CP(0))
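The decoding of rev_xstart described above can be sketched as a helper. It returns the 8 CP indices used, in output-lane order, and whether they are conjugated. The helper cp_window and the CpWindow struct are hypothetical.

This guide only describes windows that stay inside CP(0) to CP(15). The sketch therefore rejects a forward start above 8 or a backward start below 7 rather than guessing the hardware behavior.

```cpp
#include <array>
#include <cassert>
#include <stdexcept>

// Hypothetical result type: CP index used by each output lane, plus the
// conjugation flag.
struct CpWindow {
    std::array<int, 8> index;   // index[i] = CP index feeding output lane i
    bool conjugate;             // true when bit 4 of rev_xstart is set
};

CpWindow cp_window(unsigned rev_xstart) {
    assert(rev_xstart <= 0x1F);                               // only 5 valid bits
    const int start = static_cast<int>(rev_xstart & 0xFu);    // bits 3..0
    const bool reverse = ((rev_xstart >> 4) & 1u) != 0;       // bit 4
    // Windows leaving CP(0)..CP(15) are not described in this guide.
    if (reverse ? start < 7 : start > 8) {
        throw std::out_of_range("CP window leaves CP(0)..CP(15)");
    }
    CpWindow w{};
    w.conjugate = reverse;
    for (int i = 0; i < 8; ++i) {
        w.index[i] = reverse ? start - i : start + i;
    }
    return w;
}
```

For example, cp_window(0x07) yields indices 7 to 14 without conjugation, and cp_window(0x17) yields indices 7 down to 0 with conjugation. These match the two cases above.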



FULL EXAMPLES

In both examples, the CP values are loaded into the lower halves of xbufa and xbufb before the call, so xrot is left at 0x0.

The mac variant behaves the same way, but adds the products to acc instead of assigning them to the result.

Example 1 (MSB of rev_xstart=0)

Command: mul8_cfr(xbufa, xbufb, 0x04, 0x0, zbuf, 0x2)

  • zstart is 0x2, so the third scaling factor in zbuf is used.
  • The 4 LSBs of rev_xstart are 0x4, so the starting point within the 16 available CP values is CP(4).

Resulting operation:

acc(0) = zbuf(2) * CP(4)
acc(1) = zbuf(2) * CP(5)
acc(2) = zbuf(2) * CP(6)
acc(3) = zbuf(2) * CP(7)
acc(4) = zbuf(2) * CP(8)
acc(5) = zbuf(2) * CP(9)
acc(6) = zbuf(2) * CP(10)
acc(7) = zbuf(2) * CP(11)

Example 2 (MSB of rev_xstart=1)

Command: mul8_cfr(xbufa, xbufb, 0x19, 0x0, zbuf, 0x3)

  • zstart is 0x3, so the fourth scaling factor in zbuf is used.
  • The 4 LSBs of rev_xstart are 0x9, so the starting point within the 16 available CP values is CP(9).
  • The MSB of rev_xstart (bit 4 of 0x19 = 1 1001 in binary) is 1, so the CP values run from CP(9) down to CP(2) and their complex conjugates are used.

Resulting operation:

acc(0) = zbuf(3) * conj(CP(9))
acc(1) = zbuf(3) * conj(CP(8))
acc(2) = zbuf(3) * conj(CP(7))
acc(3) = zbuf(3) * conj(CP(6))
acc(4) = zbuf(3) * conj(CP(5))
acc(5) = zbuf(3) * conj(CP(4))
acc(6) = zbuf(3) * conj(CP(3))
acc(7) = zbuf(3) * conj(CP(2))
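Putting the selectors together, the two full examples can be checked against a scalar reference model of mul8_cfr and mac8_cfr. This is a hypothetical host-side sketch, not the intrinsic:

- mac8_cfr_model, mul8_cfr_model and the cplx struct are illustrative names.
- The model takes the 16-entry CP view after xrot selection as its input.
- It uses 64-bit integer lanes, so the 48-bit accumulator width of v8cacc48 and any wrap-around or saturation are not modelled.

```cpp
#include <array>
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for a complex lane; 64-bit parts, so the 48-bit
// accumulator width (and any wrap or saturation) is NOT modelled.
struct cplx {
    long long re;
    long long im;
};

inline bool operator==(const cplx& x, const cplx& y) {
    return x.re == y.re && x.im == y.im;
}

inline cplx cmul(const cplx& a, const cplx& b) {
    return {a.re * b.re - a.im * b.im, a.re * b.im + a.im * b.re};
}

inline cplx cconj(const cplx& a) {
    return {a.re, -a.im};
}

// Scalar model of mac8_cfr. cp is the 16-entry CP view after xrot selection.
std::array<cplx, 8> mac8_cfr_model(std::array<cplx, 8> acc,
                                   const std::array<cplx, 16>& cp,
                                   unsigned rev_xstart,
                                   const std::array<cplx, 8>& zbuf,
                                   unsigned zstart) {
    assert(rev_xstart <= 0x1F && zstart <= 0x7);       // 5 and 3 valid bits
    const int start = static_cast<int>(rev_xstart & 0xFu);
    const bool reverse = ((rev_xstart >> 4) & 1u) != 0;
    if (reverse ? start < 7 : start > 8) {
        throw std::out_of_range("CP window leaves CP(0)..CP(15)");
    }
    const cplx z = zbuf[zstart];                       // same factor for all 8 lanes
    for (int i = 0; i < 8; ++i) {
        const cplx c = reverse ? cconj(cp[start - i]) : cp[start + i];
        const cplx p = cmul(z, c);
        acc[i].re += p.re;
        acc[i].im += p.im;
    }
    return acc;
}

// mul8_cfr: same selection, but starting from a zero accumulator,
// i.e. the products are assigned rather than accumulated.
std::array<cplx, 8> mul8_cfr_model(const std::array<cplx, 16>& cp,
                                   unsigned rev_xstart,
                                   const std::array<cplx, 8>& zbuf,
                                   unsigned zstart) {
    return mac8_cfr_model(std::array<cplx, 8>{}, cp, rev_xstart, zbuf, zstart);
}
```

Take CP(k) = k + (k + 1)j, zbuf(2) = 2 and zbuf(3) = j:

- mul8_cfr_model(cp, 0x04, zbuf, 0x2) gives lane 0 = 2 * CP(4) = 8 + 10j (Example 1).
- mul8_cfr_model(cp, 0x19, zbuf, 0x3) gives lane 0 = j * conj(CP(9)) = 10 + 9j (Example 2).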
v8cacc48 mul8_cfr (v16cint16 xbufa, v16cint16 xbufb, int rev_xstart, int xrot, v8cint16 zbuf, unsigned int zstart)

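Complex multiply intrinsic function for cancellation signal calculations in peak-cancellation crest factor reduction algorithm. It behaves like mac8_cfr without the acc input: the products are assigned to the result instead of being accumulated. The parameters, data selectors and examples documented for mac8_cfr above apply unchanged.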

void split (int a, unsigned n, int &d0, unsigned &d1)

Intrinsic used to split the 32-bit input data into two resulting variables at the n-th bit.

In PC-CFR, split separates the 32 bits of a metadata sample into the index information used to update the CP LUT pointers and the configuration bits. In DPD, a slightly different variant of the intrinsic prepares the magnitude values for further processing. The parameters are the following:

Parameters

Parameter  Type       Comments
a          int        Input data as a 32-bit signed integer.
n          unsigned   Number of LSBs that end up in d1. Must be a compile-time constant.
d0         int&       Output variable that receives bits n to 31 of the input. Intended as an index; it is a signed number (sign-extended).
d1         unsigned&  Output variable that receives bits 0 to n-1 of the input.

Example:

Command: split(data, 6, out0, out1)

Assume that data = 0x44FA, which gives the following operation:

data = 0100 0100 11|11 1010 (split after the 6 LSBs, since n = 6)

This gives:

out0 = 0000 0001 0001 0011  (0x0113)
out1 = 0000 0000 0011 1010  (0x003A)
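The bit manipulation performed by split can be modelled on a host with ordinary shifts and masks. This is a sketch that assumes a 32-bit int:

- split_model is a hypothetical name, not the intrinsic.
- d0 relies on an arithmetic right shift for the sign extension. C++20 guarantees this, and mainstream compilers already behave this way under earlier standards.

```cpp
#include <cassert>

// Hypothetical host-side model of split(a, n, d0, d1); assumes 32-bit int.
//   d0 <- bits n..31 of a, sign-extended (intended as an index)
//   d1 <- bits 0..n-1 of a
void split_model(int a, unsigned n, int& d0, unsigned& d1) {
    assert(n < 32);                                     // shifting by 32 is undefined
    d0 = a >> n;                                        // arithmetic shift keeps the sign
    d1 = static_cast<unsigned>(a) & ((1u << n) - 1u);   // mask of n ones
}
```

split_model(0x44FA, 6, d0, d1) reproduces the example above (d0 = 0x113, d1 = 0x3A). A negative input such as -64 with n = 6 yields d0 = -1 and d1 = 0, which shows the sign extension of d0.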

Crest Factor Reduction Application

For Peak Cancellation CFR, one of the two input streams into an AI Engine is dedicated to communicating 32-bit metadata samples. The 27 MSBs of a metadata sample are used for Cancellation Pulse (CP) LUT indexing, and the 5 LSBs provide configuration information for the subsequent mul or mac operation.
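As an illustration of the 27/5-bit layout, the sketch below packs a metadata word, as a host test bench standing in for the PL might. It then unpacks the word with the same shift-and-mask logic as split(meta, 5, idx, cfg). The names Metadata, pack_metadata and unpack_metadata are hypothetical.

The guide does not state which mul/mac argument the 5 configuration bits feed. Since rev_xstart also has 5 valid bits, passing cfg as rev_xstart is a plausible reading, but it is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical metadata layout from the text: bits 31..5 = CP LUT index
// (27 bits), bits 4..0 = mul/mac configuration.
constexpr unsigned kCfgBits = 5;

struct Metadata {
    int lut_index;   // signed, sign-extended like d0 of split()
    unsigned cfg;    // 5 configuration bits (assumed to be passed as rev_xstart)
};

// Pack a metadata word, e.g. in a host test bench standing in for the PL.
std::uint32_t pack_metadata(int lut_index, unsigned cfg) {
    assert(lut_index >= -(1 << 26) && lut_index < (1 << 26));   // fits in 27 signed bits
    assert(cfg < (1u << kCfgBits));
    return (static_cast<std::uint32_t>(lut_index) << kCfgBits) | cfg;
}

// Same shift-and-mask logic as split(meta, 5, idx, cfg).
Metadata unpack_metadata(std::uint32_t meta) {
    const int word = static_cast<int>(meta);   // two's-complement view (guaranteed since C++20)
    return Metadata{word >> kCfgBits, meta & ((1u << kCfgBits) - 1u)};
}
```

A word packed from LUT index -3 and cfg 0x04 unpacks back to the same pair. The negative index survives because of the sign extension applied to the upper field.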

See below for the mul and mac operations that the 5 LSBs configure:

v8cacc48 mul8_cfr(v16cint16 xbufa, v16cint16 xbufb, int rev_xstart, int xrot, v8cint16 zbuf, unsigned int zstart)

v8cacc48 mac8_cfr(v8cacc48 acc, v16cint16 xbufa, v16cint16 xbufb, int rev_xstart, int xrot, v8cint16 zbuf, unsigned int zstart)

Digital Pre-Distortion Application

The split intrinsic used in DPD applications is slightly different and has an additional parameter:

void split(int mag, int frac_bits, int lut_width, int& idx, unsigned& frac)