Vector update intrinsic functions allow substitution of the lanes within a vector value.

Overview

Vector update intrinsic functions allow substitution of the lanes within a vector value.

In the signatures below, the buffer sizes are denoted as follows:

V - 128 bit
W - 256 bit
X - 512 bit
Y - 1024 bit

For more information see Integer Vector Types.

Note: All intrinsics require a compile time constant for the idx parameter.

load_hi and load_lo intrinsic functions

Updates a 48/80 bit accumulator from a 64/128 bit integer vector, using the saturation mode set by the user. load_lo updates the lower lanes of the accumulator and load_hi the higher ones.
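
The lane selection and saturation described above can be sketched with a portable C++ model. This is not the intrinsic itself: the type aliases and the `model_load_lo`/`model_load_hi` names are hypothetical, and plain signed saturation to 48 bits is assumed, whereas the real behaviour depends on the saturation mode set by the user.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for the hardware types (not the real intrinsics).
using Acc48x8 = std::array<std::int64_t, 8>;  // models v8acc48, one 48-bit value per lane
using Vec64x4 = std::array<std::int64_t, 4>;  // models v4int64

// Clamp a 64-bit value to the signed 48-bit range (assumed saturation mode).
constexpr std::int64_t saturate48(std::int64_t x) {
    return x > (std::int64_t{1} << 47) - 1 ? (std::int64_t{1} << 47) - 1
         : x < -(std::int64_t{1} << 47)    ? -(std::int64_t{1} << 47)
                                           : x;
}

// Model of load_lo: lanes 0..3 are replaced, lanes 4..7 are kept.
Acc48x8 model_load_lo(Acc48x8 acc, const Vec64x4& vec) {
    for (std::size_t i = 0; i < 4; ++i) acc[i] = saturate48(vec[i]);
    return acc;
}

// Model of load_hi: lanes 4..7 are replaced, lanes 0..3 are kept.
Acc48x8 model_load_hi(Acc48x8 acc, const Vec64x4& vec) {
    for (std::size_t i = 0; i < 4; ++i) acc[4 + i] = saturate48(vec[i]);
    return acc;
}

// Usage: a value that does not fit in 48 bits is clamped on load.
inline void demo_load() {
    Acc48x8 acc{};
    acc = model_load_lo(acc, Vec64x4{{1, -2, INT64_MAX, INT64_MIN}});
    assert(acc[2] == (std::int64_t{1} << 47) - 1);
}
```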

upd_hi and upd_lo intrinsic functions

upd_hi updates the top half of the lanes within a data type and upd_lo the bottom half.
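
As a rough illustration of the half update, the following portable C++ sketch models a wide value as an array and overwrites one half of it. The names `model_upd_lo` and `model_upd_hi` are hypothetical, and the sketch assumes that the bottom half corresponds to the lower-indexed lanes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Model of upd_lo: the lower H lanes of buf are replaced by half.
template <typename T, std::size_t W, std::size_t H>
std::array<T, W> model_upd_lo(std::array<T, W> buf, const std::array<T, H>& half) {
    static_assert(W == 2 * H, "the update value must be half as wide as the buffer");
    for (std::size_t i = 0; i < H; ++i) buf[i] = half[i];
    return buf;
}

// Model of upd_hi: the upper H lanes of buf are replaced by half.
template <typename T, std::size_t W, std::size_t H>
std::array<T, W> model_upd_hi(std::array<T, W> buf, const std::array<T, H>& half) {
    static_assert(W == 2 * H, "the update value must be half as wide as the buffer");
    for (std::size_t i = 0; i < H; ++i) buf[H + i] = half[i];
    return buf;
}

// Usage: build an 8-lane value from two 4-lane halves.
inline void demo_upd_halves() {
    std::array<int, 8> buf{};
    buf = model_upd_lo(buf, std::array<int, 4>{{1, 2, 3, 4}});
    buf = model_upd_hi(buf, std::array<int, 4>{{5, 6, 7, 8}});
    assert(buf[0] == 1 && buf[7] == 8);
}
```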

upd_v({W,X,Y} buf,int idx,V val) 128-bit intrinsic functions

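upd_v(buf,0...7,val) updates successive 128-bit lanes of a 256/512/1024 bit vector. The valid idx range follows from the buffer width: 0...1 for 256 bit, 0...3 for 512 bit and 0...7 for 1024 bit.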

In the following example, a large 32-way complex vector is updated 4 elements at a time using a 128-bit update. The notation "XX++" in comments shows the starting element index of the lane being updated, while the notation "XX.." shows that the lane update was initiated in a prior cycle. This example also shows that the updates can be pipelined.

const v4cint16 * input  = d_in;
v32cint16 sbuff = undef_v32cint16();
...
sbuff = upd_v(sbuff,0, *input++);   // 00++|____|____|____    ____|____|____|____
sbuff = upd_v(sbuff,1, *input++);   // 00..|04++|____|____    ____|____|____|____
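
The element range touched by each lane index can be checked with a portable C++ model. This is only a sketch: `CInt16` and `model_upd_lane` are hypothetical stand-ins, and the lane index is passed as a template argument to mirror the requirement that idx be a compile-time constant.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for one cint16 element (16-bit real, 16-bit imaginary).
struct CInt16 {
    std::int16_t re;
    std::int16_t im;
};

// Model of a lane update: copies the L elements of val into lane Idx of buf.
template <std::size_t Idx, typename T, std::size_t W, std::size_t L>
std::array<T, W> model_upd_lane(std::array<T, W> buf, const std::array<T, L>& val) {
    static_assert(W % L == 0, "buffer must hold a whole number of lanes");
    static_assert(Idx < W / L, "lane index out of range");
    for (std::size_t i = 0; i < L; ++i) {
        buf[Idx * L + i] = val[i];  // lane Idx starts at element Idx * L
    }
    return buf;
}

// Usage: lane 1 of a 32-element buffer covers elements 4..7 ("04++" above).
inline void demo_upd_lane() {
    std::array<CInt16, 32> buf{};
    const std::array<CInt16, 4> val = {{{1, 10}, {2, 20}, {3, 30}, {4, 40}}};
    buf = model_upd_lane<1>(buf, val);
    assert(buf[4].re == 1 && buf[7].im == 40);
}
```

An out-of-range index such as `model_upd_lane<8>` on a 32-element buffer with 4-element lanes is rejected at compile time by the second `static_assert`.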

upd_w({X,Y} buf,int idx,W val) 256-bit intrinsic functions

upd_w(buf,0...3,val) updates successive 256-bit lanes of a 512/1024 bit vector.

The following example shows the update of a large 32-way complex vector 8 elements at a time using a 256-bit update. These updates are also pipelined.

const v8cint16 * input  = d_in;
v32cint16 sbuff = undef_v32cint16();
...
sbuff = upd_w(sbuff,0, *input++);   // 00++|04++|____|____    ____|____|____|____
sbuff = upd_w(sbuff,1, *input++);   // 00..|04..|08++|12++    ____|____|____|____

upd_x(Y buf,int idx,X val) 512-bit intrinsic functions

upd_x(buf,0...1,val) updates successive 512-bit lanes of a 1024 bit vector.

The following example shows the update of a large 32-way complex vector 16 elements at a time using a 512-bit update. These updates are also pipelined.

const v16cint16 * input  = d_in;
v32cint16 sbuff = undef_v32cint16();
...
sbuff = upd_x(sbuff,0, *input++);   // 00++|04++|08++|12++    ____|____|____|____
sbuff = upd_x(sbuff,1, *input++);   // 00..|04..|08..|12..    16++|20++|24++|28++
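
The three granularities fill the same 32-element buffer in 8, 4 or 2 steps. The portable C++ sketch below checks this with a hypothetical `model_upd` helper that uses plain `int` elements. It takes a runtime index for brevity, whereas the real intrinsics need a compile-time constant, so an equivalent loop over the intrinsics would presumably have to be unrolled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Model of upd_v/upd_w/upd_x: writes the L elements of val into lane idx of buf.
template <typename T, std::size_t W, std::size_t L>
std::array<T, W> model_upd(std::array<T, W> buf, std::size_t idx, const std::array<T, L>& val) {
    static_assert(W % L == 0, "buffer must hold a whole number of lanes");
    assert(idx < W / L);
    for (std::size_t i = 0; i < L; ++i) buf[idx * L + i] = val[i];
    return buf;
}

// Fill a 32-element buffer from src using lanes of L elements each.
// L = 4, 8 and 16 correspond to 128-, 256- and 512-bit updates of cint16 data.
template <std::size_t L>
std::array<int, 32> fill_by_lanes(const std::array<int, 32>& src) {
    std::array<int, 32> buf{};
    for (std::size_t idx = 0; idx < 32 / L; ++idx) {
        std::array<int, L> lane{};
        for (std::size_t i = 0; i < L; ++i) lane[i] = src[idx * L + i];
        buf = model_upd(buf, idx, lane);
    }
    return buf;
}
```

Under this model, `fill_by_lanes<4>`, `fill_by_lanes<8>` and `fill_by_lanes<16>` all reproduce the source buffer; only the number of update steps differs.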

Accumulator load operations
v8acc48	load_lo (v8acc48 acc, v4int64 vec)
	Loads the 64-bit elements of the vector into the 48-bit accumulator lanes with saturation. This is done for the 4 lower lanes.

v8acc48	load_hi (v8acc48 acc, v4int64 vec)
	Loads the 64-bit elements of the vector into the 48-bit accumulator lanes with saturation. This is done for the 4 higher lanes.

v4cacc48	load_lo (v4cacc48 acc, v2cint64 vec)
	Loads the 64-bit elements of the vector into the 48-bit accumulator lanes with saturation. This is done for the 2 lower lanes.

v4cacc48	load_hi (v4cacc48 acc, v2cint64 vec)
	Loads the 64-bit elements of the vector into the 48-bit accumulator lanes with saturation. This is done for the 2 higher lanes.

v4acc80	load_lo (v4acc80 acc, v2int128 vec)
	Loads the 128-bit elements of the vector into the 80-bit accumulator lanes with saturation. This is done for the 2 lower lanes.

v4acc80	load_hi (v4acc80 acc, v2int128 vec)
	Loads the 128-bit elements of the vector into the 80-bit accumulator lanes with saturation. This is done for the 2 higher lanes.

v2cacc80	load_lo (v2cacc80 acc, v1cint128 vec)
	Loads the 128-bit element of the vector into the 80-bit accumulator lane with saturation. This is done for the lower lane.

v2cacc80	load_hi (v2cacc80 acc, v1cint128 vec)
	Loads the 128-bit element of the vector into the 80-bit accumulator lane with saturation. This is done for the higher lane.

Vector Unpacking Operations
v16int16	unpack (v16int8 vec)
	Load and unpack vectors. These intrinsics can only be used coupled with a memory access operation. Ex: v16int16 vec = unpack(*src); // where src is a pointer to v16int8

v16int16	unpack (v16uint8 vec)
	Load and unpack vectors. These intrinsics can only be used coupled with a memory access operation. Ex: v16int16 vec = unpack(*src); // where src is a pointer to v16int8

v32int16	unpack (v32int8 vec)
	Load and unpack vectors. These intrinsics can only be used coupled with a memory access operation. Ex: v16int16 vec = unpack(*src); // where src is a pointer to v16int8

v32int16	unpack (v32uint8 vec)
	Load and unpack vectors. These intrinsics can only be used coupled with a memory access operation. Ex: v16int16 vec = unpack(*src); // where src is a pointer to v16int8
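
The widening from 8-bit to 16-bit elements can be pictured with a portable C++ model. The sketch assumes that unpack preserves each element's numeric value, so signed inputs are sign-extended and unsigned inputs are zero-extended; the source does not state this, and `model_unpack` is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Model of unpack: widen every 8-bit element to 16 bits, preserving its value.
// Narrow is expected to be std::int8_t or std::uint8_t.
template <typename Narrow, std::size_t N>
std::array<std::int16_t, N> model_unpack(const std::array<Narrow, N>& src) {
    static_assert(sizeof(Narrow) == 1, "source elements must be 8 bits wide");
    std::array<std::int16_t, N> out{};
    for (std::size_t i = 0; i < N; ++i) {
        out[i] = static_cast<std::int16_t>(src[i]);
    }
    return out;
}

// Usage: the bit pattern 0xFF widens to -1 when signed and to 255 when unsigned.
inline void demo_unpack() {
    std::array<std::int8_t, 16> s{};
    std::array<std::uint8_t, 16> u{};
    s[0] = -1;
    u[0] = 255;
    assert(model_unpack(s)[0] == -1);
    assert(model_unpack(u)[0] == 255);
}
```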

128-bit lane update into 256-bit vector
Update the successive 128-bit lanes in a 256-bit vector. idx parameter must be a compile time constant.
promotion v32int8	upd_v (v32int8, int, v16int8)
	Update a 256-bit vector 16 elements at a time using a 128-bit update.

promotion v32uint8	upd_v (v32uint8, int, v16uint8)
	Update a 256-bit vector 16 elements at a time using a 128-bit update.

promotion v16int16	upd_v (v16int16, int, v8int16)
	Update a 256-bit vector 8 elements at a time using a 128-bit update.

promotion v8cint16	upd_v (v8cint16, int, v4cint16)
	Update a 256-bit vector 4 elements at a time using a 128-bit update.

promotion v8int32	upd_v (v8int32, int, v4int32)
	Update a 256-bit vector 4 elements at a time using a 128-bit update.

promotion v4cint32	upd_v (v4cint32, int, v2cint32)
	Update a 256-bit vector 2 elements at a time using a 128-bit update.

promotion v8float	upd_v (v8float, int, v4float)
	Update a 256-bit vector 4 elements at a time using a 128-bit update.

promotion v4cfloat	upd_v (v4cfloat, int, v2cfloat)
	Update a 256-bit vector 2 elements at a time using a 128-bit update.

128-bit lane update into 512-bit vector
Update the successive 128-bit lanes in a 512-bit vector. idx parameter must be a compile time constant.
promotion v64int8	upd_v (v64int8, int, v16int8)
	Update a 512-bit vector 16 elements at a time using a 128-bit update.

promotion v64uint8	upd_v (v64uint8, int, v16uint8)
	Update a 512-bit vector 16 elements at a time using a 128-bit update.

promotion v32int16	upd_v (v32int16, int, v8int16)
	Update a 512-bit vector 8 elements at a time using a 128-bit update.

promotion v16cint16	upd_v (v16cint16, int, v4cint16)
	Update a 512-bit vector 4 elements at a time using a 128-bit update.

promotion v16int32	upd_v (v16int32, int, v4int32)
	Update a 512-bit vector 4 elements at a time using a 128-bit update.

promotion v8cint32	upd_v (v8cint32, int, v2cint32)
	Update a 512-bit vector 2 elements at a time using a 128-bit update.

promotion v16float	upd_v (v16float, int, v4float)
	Update a 512-bit vector 4 elements at a time using a 128-bit update.

promotion v8cfloat	upd_v (v8cfloat, int, v2cfloat)
	Update a 512-bit vector 2 elements at a time using a 128-bit update.

128-bit lane update into 1024-bit vector
Set 128-bit lanes in a 1024-bit vector.
promotion v128int8	upd_v (v128int8, int, v16int8)
	Update a 1024-bit vector 16 elements at a time using a 128-bit update.

promotion v128uint8	upd_v (v128uint8, int, v16uint8)
	Update a 1024-bit vector 16 elements at a time using a 128-bit update.

promotion v64int16	upd_v (v64int16, int, v8int16)
	Update a 1024-bit vector 8 elements at a time using a 128-bit update.

promotion v32cint16	upd_v (v32cint16, int, v4cint16)
	Update a 1024-bit vector 4 elements at a time using a 128-bit update.

promotion v32int32	upd_v (v32int32, int, v4int32)
	Update a 1024-bit vector 4 elements at a time using a 128-bit update.

promotion v16cint32	upd_v (v16cint32, int, v2cint32)
	Update a 1024-bit vector 2 elements at a time using a 128-bit update.

promotion v32float	upd_v (v32float, int, v4float)
	Update a 1024-bit vector 4 elements at a time using a 128-bit update.

promotion v16cfloat	upd_v (v16cfloat, int, v2cfloat)
	Update a 1024-bit vector 2 elements at a time using a 128-bit update.

promotion v128int8	yset_v (int, v16int8)
	Creates a new 1024-bit vector with 16 elements already set using a 128-bit set.

promotion v128uint8	yset_v (int, v16uint8)
	Creates a new 1024-bit vector with 16 elements already set using a 128-bit set.

promotion v64int16	yset_v (int, v8int16)
	Creates a new 1024-bit vector with 8 elements already set using a 128-bit set.

promotion v32cint16	yset_v (int, v4cint16)
	Creates a new 1024-bit vector with 4 elements already set using a 128-bit set.

promotion v32int32	yset_v (int, v4int32)
	Creates a new 1024-bit vector with 4 elements already set using a 128-bit set.

promotion v16cint32	yset_v (int, v2cint32)
	Creates a new 1024-bit vector with 2 elements already set using a 128-bit set.

promotion v32float	yset_v (int, v4float)
	Creates a new 1024-bit vector with 4 elements already set using a 128-bit set.

promotion v16cfloat	yset_v (int, v2cfloat)
	Creates a new 1024-bit vector with 2 elements already set using a 128-bit set.
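
One way to picture yset_v is as a lane update applied to a fresh vector whose other lanes hold no defined value. The portable C++ sketch below makes that assumption explicit by tracking a validity flag per element; `Tracked` and `model_yset` are hypothetical names, and the real intrinsic carries no such flags.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical model of a vector whose elements may not have been set yet.
template <typename T, std::size_t W>
struct Tracked {
    std::array<T, W> data{};
    std::array<bool, W> valid{};  // all false: nothing has been set
};

// Model of yset_v: a new W-element vector in which only lane idx is set.
template <typename T, std::size_t W, std::size_t L>
Tracked<T, W> model_yset(std::size_t idx, const std::array<T, L>& val) {
    static_assert(W % L == 0, "vector must hold a whole number of lanes");
    assert(idx < W / L);
    Tracked<T, W> out;
    for (std::size_t i = 0; i < L; ++i) {
        out.data[idx * L + i] = val[i];
        out.valid[idx * L + i] = true;
    }
    return out;
}

// Usage: set lane 2 (elements 8..11) of a new 32-element vector.
inline void demo_yset() {
    const Tracked<int, 32> y = model_yset<int, 32>(2, std::array<int, 4>{{1, 2, 3, 4}});
    assert(y.valid[8] && y.data[8] == 1);
    assert(!y.valid[0]);
}
```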

256-bit lane update into 512-bit vector
Update 256-bit lanes in a 512-bit vector. idx parameter must be a compile time constant.
promotion v64int8	upd_w (v64int8, int, v32int8)
	Update a 512-bit vector 32 elements at a time using a 256-bit update.

promotion v64uint8	upd_w (v64uint8, int, v32uint8)
	Update a 512-bit vector 32 elements at a time using a 256-bit update.

promotion v32int16	upd_w (v32int16, int, v16int16)
	Update a 512-bit vector 16 elements at a time using a 256-bit update.

promotion v16cint16	upd_w (v16cint16, int, v8cint16)
	Update a 512-bit vector 8 elements at a time using a 256-bit update.

promotion v16int32	upd_w (v16int32, int, v8int32)
	Update a 512-bit vector 8 elements at a time using a 256-bit update.

promotion v8cint32	upd_w (v8cint32, int, v4cint32)
	Update a 512-bit vector 4 elements at a time using a 256-bit update.

promotion v16float	upd_w (v16float, int, v8float)
	Update a 512-bit vector 8 elements at a time using a 256-bit update.

promotion v8cfloat	upd_w (v8cfloat, int, v4cfloat)
	Update a 512-bit vector 4 elements at a time using a 256-bit update.

256-bit lane update into 1024-bit vector
Update 256-bit lanes in a 1024-bit vector.
promotion v128int8	upd_w (v128int8, int, v32int8)
	Update a 1024-bit vector 32 elements at a time using a 256-bit update.

promotion v128uint8	upd_w (v128uint8, int, v32uint8)
	Update a 1024-bit vector 32 elements at a time using a 256-bit update.

promotion v64int16	upd_w (v64int16, int, v16int16)
	Update a 1024-bit vector 16 elements at a time using a 256-bit update.

promotion v32cint16	upd_w (v32cint16, int, v8cint16)
	Update a 1024-bit vector 8 elements at a time using a 256-bit update.

promotion v32int32	upd_w (v32int32, int, v8int32)
	Update a 1024-bit vector 8 elements at a time using a 256-bit update.

promotion v16cint32	upd_w (v16cint32, int, v4cint32)
	Update a 1024-bit vector 4 elements at a time using a 256-bit update.

promotion v32float	upd_w (v32float, int, v8float)
	Update a 1024-bit vector 8 elements at a time using a 256-bit update.

promotion v16cfloat	upd_w (v16cfloat, int, v4cfloat)
	Update a 1024-bit vector 4 elements at a time using a 256-bit update.

512-bit lane update into 1024-bit vector
Update 512-bit lanes in a 1024-bit vector. idx parameter must be a compile time constant.
promotion v128int8	upd_x (v128int8, int, v64int8)
	Update a 1024-bit vector 64 elements at a time using a 512-bit update.

promotion v128uint8	upd_x (v128uint8, int, v64uint8)
	Update a 1024-bit vector 64 elements at a time using a 512-bit update.

promotion v64int16	upd_x (v64int16, int, v32int16)
	Update a 1024-bit vector 32 elements at a time using a 512-bit update.

promotion v32cint16	upd_x (v32cint16, int, v16cint16)
	Update a 1024-bit vector 16 elements at a time using a 512-bit update.

promotion v32int32	upd_x (v32int32, int, v16int32)
	Update a 1024-bit vector 16 elements at a time using a 512-bit update.

promotion v16cint32	upd_x (v16cint32, int, v8cint32)
	Update a 1024-bit vector 8 elements at a time using a 512-bit update.

promotion v32float	upd_x (v32float, int, v16float)
	Update a 1024-bit vector 16 elements at a time using a 512-bit update.

promotion v16cfloat	upd_x (v16cfloat, int, v8cfloat)
	Update a 1024-bit vector 8 elements at a time using a 512-bit update.

320/384-bit lane update into 640/768-bit accumulator
v16acc48	upd_lo (v16acc48, v8acc48)

v16acc48	upd_hi (v16acc48, v8acc48)

v8cacc48	upd_lo (v8cacc48, v4cacc48)

v8cacc48	upd_hi (v8cacc48, v4cacc48)

v8acc80	upd_lo (v8acc80, v4acc80)

v8acc80	upd_hi (v8acc80, v4acc80)

v4cacc80	upd_lo (v4cacc80, v2cacc80)

v4cacc80	upd_hi (v4cacc80, v2cacc80)

Float accumulator updates
v8accfloat	upd_lo (v8accfloat, v4accfloat)

v16accfloat	upd_lo (v16accfloat, v8accfloat)

v32accfloat	upd_lo (v32accfloat, v16accfloat)

v8accfloat	upd_hi (v8accfloat, v4accfloat)

v16accfloat	upd_hi (v16accfloat, v8accfloat)

v32accfloat	upd_hi (v32accfloat, v16accfloat)

v4caccfloat	upd_lo (v4caccfloat, v2caccfloat)

v8caccfloat	upd_lo (v8caccfloat, v4caccfloat)

v16caccfloat	upd_lo (v16caccfloat, v8caccfloat)

v4caccfloat	upd_hi (v4caccfloat, v2caccfloat)

v8caccfloat	upd_hi (v8caccfloat, v4caccfloat)

v16caccfloat	upd_hi (v16caccfloat, v8caccfloat)

Updates for configuration registers
mac_idx	upd0 (mac_idx, unsigned int)

mac_idx	upd1 (mac_idx, unsigned int)

pmx_idx	upd0 (pmx_idx, unsigned int)

pmx_idx	upd1 (pmx_idx, unsigned int)

pmx_idx	upd2 (pmx_idx, unsigned int)

Function Documentation

v8acc48 load_hi	(	v8acc48	acc,
		v4int64	vec
	)

Loads the 64-bit elements of the vector into the 48-bit accumulator lanes with saturation. This is done for the 4 higher lanes.

Parameters

acc	Target accumulator
vec	Input vector

Returns: The updated accumulator

v4cacc48 load_hi	(	v4cacc48	acc,
		v2cint64	vec
	)

Loads the 64-bit elements of the vector into the 48-bit accumulator lanes with saturation. This is done for the 2 higher lanes.

Parameters

acc	Target accumulator
vec	Input vector

Returns: The updated accumulator

v4acc80 load_hi	(	v4acc80	acc,
		v2int128	vec
	)

Loads the 128-bit elements of the vector into the 80-bit accumulator lanes with saturation. This is done for the 2 higher lanes.

Parameters

acc	Target accumulator
vec	Input vector

Returns: The updated accumulator

v2cacc80 load_hi	(	v2cacc80	acc,
		v1cint128	vec
	)

Loads the 128-bit element of the vector into the 80-bit accumulator lane with saturation. This is done for the higher lane.

Parameters

acc	Target accumulator
vec	Input vector

Returns: The updated accumulator

v8acc48 load_lo	(	v8acc48	acc,
		v4int64	vec
	)

Loads the 64-bit elements of the vector into the 48-bit accumulator lanes with saturation. This is done for the 4 lower lanes.

Parameters

acc	Target accumulator
vec	Input vector

Returns: The updated accumulator

v4cacc48 load_lo	(	v4cacc48	acc,
		v2cint64	vec
	)

Loads the 64-bit elements of the vector into the 48-bit accumulator lanes with saturation. This is done for the 2 lower lanes.

Parameters

acc	Target accumulator
vec	Input vector

Returns: The updated accumulator

v4acc80 load_lo	(	v4acc80	acc,
		v2int128	vec
	)

Loads the 128-bit elements of the vector into the 80-bit accumulator lanes with saturation. This is done for the 2 lower lanes.

Parameters

acc	Target accumulator
vec	Input vector

Returns: The updated accumulator

v2cacc80 load_lo	(	v2cacc80	acc,
		v1cint128	vec
	)

Loads the 128-bit element of the vector into the 80-bit accumulator lane with saturation. This is done for the lower lane.

Parameters

acc	Target accumulator
vec	Input vector

Returns: The updated accumulator

v16int16 unpack ( v16int8 vec )

Load and unpack vectors. These intrinsics can only be used coupled with a memory access operation. Ex: v16int16 vec = unpack(*src); // where src is a pointer to v16int8

v16int16 unpack ( v16uint8 vec )

Load and unpack vectors. These intrinsics can only be used coupled with a memory access operation. Ex: v16int16 vec = unpack(*src); // where src is a pointer to v16int8

v32int16 unpack ( v32int8 vec )

Load and unpack vectors. These intrinsics can only be used coupled with a memory access operation. Ex: v16int16 vec = unpack(*src); // where src is a pointer to v16int8

v32int16 unpack ( v32uint8 vec )

Load and unpack vectors. These intrinsics can only be used coupled with a memory access operation. Ex: v16int16 vec = unpack(*src); // where src is a pointer to v16int8

mac_idx upd0	(	mac_idx	,
		unsigned	int
	)

pmx_idx upd0	(	pmx_idx	,
		unsigned	int
	)

mac_idx upd1	(	mac_idx	,
		unsigned	int
	)

pmx_idx upd1	(	pmx_idx	,
		unsigned	int
	)

pmx_idx upd2	(	pmx_idx	,
		unsigned	int
	)

v16acc48 upd_hi	(	v16acc48	,
		v8acc48
	)

v8cacc48 upd_hi	(	v8cacc48	,
		v4cacc48
	)

v8acc80 upd_hi	(	v8acc80	,
		v4acc80
	)

v4cacc80 upd_hi	(	v4cacc80	,
		v2cacc80
	)

v8accfloat upd_hi	(	v8accfloat	,
		v4accfloat
	)

v16accfloat upd_hi	(	v16accfloat	,
		v8accfloat
	)

v32accfloat upd_hi	(	v32accfloat	,
		v16accfloat
	)

v4caccfloat upd_hi	(	v4caccfloat	,
		v2caccfloat
	)

v8caccfloat upd_hi	(	v8caccfloat	,
		v4caccfloat
	)

v16caccfloat upd_hi	(	v16caccfloat	,
		v8caccfloat
	)

v16acc48 upd_lo	(	v16acc48	,
		v8acc48
	)

v8cacc48 upd_lo	(	v8cacc48	,
		v4cacc48
	)

v8acc80 upd_lo	(	v8acc80	,
		v4acc80
	)

v4cacc80 upd_lo	(	v4cacc80	,
		v2cacc80
	)

v8accfloat upd_lo	(	v8accfloat	,
		v4accfloat
	)

v16accfloat upd_lo	(	v16accfloat	,
		v8accfloat
	)

v32accfloat upd_lo	(	v32accfloat	,
		v16accfloat
	)

v4caccfloat upd_lo	(	v4caccfloat	,
		v2caccfloat
	)

v8caccfloat upd_lo	(	v8caccfloat	,
		v4caccfloat
	)

v16caccfloat upd_lo	(	v16caccfloat	,
		v8caccfloat
	)

promotion v32int8 upd_v	(	v32int8	,
		int	,
		v16int8
	)

Update a 256-bit vector 16 elements at a time using a 128-bit update.

v32int8 sbuff = undef_v32int8();