AI Engine API User Guide (AIE) 2022.1
Matrix Multiplication

Overview

The AIE API encapsulates matrix multiplication in the aie::mmul class template. This class template is parametrized with the matrix multiplication shape (MxKxN), the data types and, optionally, the requested accumulation precision. The resulting class defines a function that performs the multiplication and a data type for the result that can be converted to an accumulator or a vector. The function interprets the input vectors as matrices, as described by the shape parameters.

The following code snippet shows a sample blocked multiplication using the aie::mmul class. The matrices are assumed to be pre-tiled as defined by the mmul shape (MxK for A, KxN for B, and MxN for C).

template <unsigned M, unsigned K, unsigned N>
void mmul_blocked(unsigned rowA, unsigned colA, unsigned colB,
                  const int16 * __restrict pA, const int16 * __restrict pB, int16 * __restrict pC)
{
    using MMUL = aie::mmul<M, K, N, int16, int16>;

    // rowA, colA and colB are counted in tiles. Each (z, j) iteration computes a 2x2 group of C tiles.
    for (unsigned z = 0; z < rowA; z += 2) chess_loop_range(2,) {
        int16 * __restrict pC1 = pC + (      z * colB + 0) * MMUL::size_C;
        int16 * __restrict pC2 = pC + ((z + 1) * colB + 0) * MMUL::size_C;

        for (unsigned j = 0; j < colB; j += 2) chess_loop_range(2,) {
            // Two tile rows of A and two tile columns of B feed the four accumulators.
            const int16 * __restrict pA1 = pA + (      z * colA + 0) * MMUL::size_A;
            const int16 * __restrict pA2 = pA + ((z + 1) * colA + 0) * MMUL::size_A;
            const int16 * __restrict pB1 = pB + (0 * colB +       j) * MMUL::size_B;
            const int16 * __restrict pB2 = pB + (0 * colB + (j + 1)) * MMUL::size_B;

            aie::vector<int16, MMUL::size_A> A0 = aie::load_v<MMUL::size_A>(pA1); pA1 += MMUL::size_A;
            aie::vector<int16, MMUL::size_A> A1 = aie::load_v<MMUL::size_A>(pA2); pA2 += MMUL::size_A;
            aie::vector<int16, MMUL::size_B> B0 = aie::load_v<MMUL::size_B>(pB1); pB1 += MMUL::size_B * colB;
            aie::vector<int16, MMUL::size_B> B1 = aie::load_v<MMUL::size_B>(pB2); pB2 += MMUL::size_B * colB;

            // The first product initializes each accumulator.
            MMUL C00; C00.mul(A0, B0);
            MMUL C01; C01.mul(A0, B1);
            MMUL C10; C10.mul(A1, B0);
            MMUL C11; C11.mul(A1, B1);

            // The remaining tiles along the shared dimension are accumulated.
            for (unsigned i = 1; i < colA; ++i) chess_prepare_for_pipelining chess_loop_range(3,) {
                A0 = aie::load_v<MMUL::size_A>(pA1); pA1 += MMUL::size_A;
                A1 = aie::load_v<MMUL::size_A>(pA2); pA2 += MMUL::size_A;
                B0 = aie::load_v<MMUL::size_B>(pB1); pB1 += MMUL::size_B * colB;
                B1 = aie::load_v<MMUL::size_B>(pB2); pB2 += MMUL::size_B * colB;

                C00.mac(A0, B0);
                C01.mac(A0, B1);
                C10.mac(A1, B0);
                C11.mac(A1, B1);
            }

            aie::store_v(pC1, C00.template to_vector<int16>()); pC1 += MMUL::size_C;
            aie::store_v(pC1, C01.template to_vector<int16>()); pC1 += MMUL::size_C;
            aie::store_v(pC2, C10.template to_vector<int16>()); pC2 += MMUL::size_C;
            aie::store_v(pC2, C11.template to_vector<int16>()); pC2 += MMUL::size_C;
        }
    }
}
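
The snippet above expects pA, pB and pC to already be stored tile by tile. The following host-side sketch shows one way such a layout could be produced from a plain row-major matrix. It is ordinary C++ and not part of the AIE API: the helper name tile_row_major is hypothetical, and the row-major ordering both across tiles and inside each tile is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not an AIE API function): rearranges a row-major
// rows x cols matrix into contiguous R x C tiles.
// Assumed layout: tiles are stored in row-major tile order, and the
// elements inside each tile are row-major as well.
template <unsigned R, unsigned C>
std::vector<std::int16_t> tile_row_major(const std::vector<std::int16_t>& src,
                                         unsigned rows, unsigned cols)
{
    assert(rows % R == 0 && cols % C == 0);
    assert(src.size() == std::size_t(rows) * cols);

    std::vector<std::int16_t> dst(src.size());
    std::size_t out = 0;
    for (unsigned tr = 0; tr < rows / R; ++tr)        // tile row
        for (unsigned tc = 0; tc < cols / C; ++tc)    // tile column
            for (unsigned r = 0; r < R; ++r)          // row inside the tile
                for (unsigned c = 0; c < C; ++c)      // column inside the tile
                    dst[out++] = src[std::size_t(tr * R + r) * cols + (tc * C + c)];
    return dst;
}
```

Under this assumed layout, tile (z, i) of A starts at pA + (z * colA + i) * MMUL::size_A, which matches the pointer arithmetic in the snippet, where rowA, colA and colB count tiles rather than elements.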

Classes

struct  aie::mmul< M, N, K, TypeA, TypeB, AccumTag >
 

Supported matrix multiplication shapes

Matrix multiplication modes for real types
8b x 8b | 16b x 8b | 8b x 16b | 16b x 16b | 32b x 16b | 16b x 32b | 32b x 32b | float
4x8x4
4x16x4
8x8x4
2x8x8
4x8x8
2x16x8
4x16x8
4x4x4
8x4x4
4x8x4
4x4x8
4x4x8
4x4x4
4x4x4
2x4x8
4x4x8
4x2x8
2x4x8
4x4x4
4x2x4
2x2x4
2x4x4
4x4x2
2x2x8
4x2x2
2x4x8
4x4x4
4x2x4
2x2x2
2x4x2
2x8x2
4x2x2
4x4x2
2x4x4
4x2x4
2x2x2
2x4x2
2x8x2
4x2x2
4x4x2
2x4x4
Matrix multiplication modes for complex types (c16b/c32b/cfloat represent complex types)
16b x c16b | 16b x c32b | c16b x 16b | c16b x c16b | c16b x 32b | c16b x c32b | 32b x c16b | 32b x c32b | c32b x 16b | c32b x c16b | c32b x 32b | c32b x c32b | float x cfloat | cfloat x float | cfloat x cfloat
4x2x2
4x4x4
2x4x2
2x4x4
2x8x2
4x4x2
2x2x4
2x2x8
2x4x4
2x4x8
4x2x4
4x4x2
4x4x4
2x2x2
2x4x2
2x8x2
2x4x4
4x2x2
4x4x2
4x2x4
2x2x2
2x4x2
2x8x2
2x4x4
4x2x2
4x4x2
4x2x4
2x2x2
2x4x2
2x2x2
2x4x2
2x8x2
2x4x4
4x2x2
4x4x2
4x2x4
2x2x2
2x4x2
2x4x2
2x8x2
2x4x4
4x4x2
2x2x2
2x4x2
1x2x2
2x2x2
2x4x2
1x2x2
2x2x1
2x2x2
2x2x2
2x4x2
2x2x2
2x4x2
2x2x2
2x2x4
2x4x2
4x2x2
Matrix multiplication modes for real types
8b x 4b | 8b x 8b | 16b x 8b | 8b x 16b | 16b x 16b | 32b x 16b | 16b x 32b | 32b x 32b | bfloat16 x bfloat16
4x16x16 4x8x8
8x8x8
8x4x8
4x8x8
4x4x8
8x2x8
8x2x8
4x4x8
4x2x8
4x4x8
4x4x8 4x2x8
4x4x8
8x2x8

Matrix multiplication modes for real types (sparse B matrix)
8b x 4b | 8b x 8b | 16b x 8b | 16b x 16b | bfloat16 x bfloat16

Matrix multiplication modes for complex types (c16b/c32b represent complex types)
c16b x 16b | c16b x c16b | c32b x c16b | c32b x c32b
1x4x8 2x2x16
1x2x4 1x2x8 1x2x16
1x2x8

Class Documentation

◆ aie::mmul

struct aie::mmul
template<unsigned M, unsigned N, unsigned K, typename TypeA, typename TypeB = TypeA, typename AccumTag = accauto>
struct aie::mmul< M, N, K, TypeA, TypeB, AccumTag >

Type that encapsulates a blocked matrix multiplication C = A x B

Objects of this type encapsulate the current result of the multiplication. The first result is computed with the mul method. New multiplications can be accumulated using the mac method.
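
To make the mul / mac pattern concrete, here is a scalar reference model in plain C++. It is a sketch only and does not use the AIE API: the type name ref_mmul, the 64-bit accumulator lanes, and the row-major interpretation of the flattened tiles are assumptions chosen for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Scalar reference model (not the AIE API). It keeps an M x N accumulator
// tile and mirrors the mul / mac calling pattern on flattened tiles.
// Assumption: tiles are flattened in row-major order.
template <unsigned M, unsigned K, unsigned N>
struct ref_mmul {
    std::array<std::int64_t, M * N> acc{};

    // acc = A x B: the first result overwrites whatever was stored.
    void mul(const std::array<std::int16_t, M * K>& a,
             const std::array<std::int16_t, K * N>& b)
    {
        acc.fill(0);
        mac(a, b);
    }

    // acc += A x B: later products are added to the current result.
    void mac(const std::array<std::int16_t, M * K>& a,
             const std::array<std::int16_t, K * N>& b)
    {
        for (unsigned m = 0; m < M; ++m)
            for (unsigned n = 0; n < N; ++n)
                for (unsigned k = 0; k < K; ++k)
                    acc[m * N + n] += std::int64_t(a[m * K + k]) * b[k * N + n];
    }
};
```

For a 2x2x2 shape with A = {1, 2, 3, 4} and B = {5, 6, 7, 8}, mul leaves {19, 22, 43, 50} in the accumulator, and a following mac with the same inputs doubles each entry.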

Template Parameters
M_Elems: Rows in matrix A.
K_Elems: Columns in matrix A / Rows in matrix B.
N_Elems: Columns in matrix B.
TypeA: Type of the elements in matrix A. It must meet ElemBaseType.
TypeB: Type of the elements in matrix B. Defaults to TypeA. It must meet ElemBaseType.
AccumTag: Type of the elements of the accumulator that holds the results to be written to matrix C. It must meet AccumElemBaseType. If not specified, the default accumulation type for multiplications of TypeA x TypeB is used.
Inheritance diagram for aie::mmul< M, N, K, TypeA, TypeB, AccumTag >:
aie::detail::mmul< M_Elems, K_Elems, N_Elems, TypeA, TypeA, detail::to_native_accum_bits_for_mul_types_tag< TypeA, TypeA, accauto >()>

Public Types

using accum_type = typename mmul_impl::accum_type
 
using mmul_impl = detail::mmul< M_Elems, K_Elems, N_Elems, TypeA, TypeB, detail::to_native_accum_bits_for_mul_types_tag< TypeA, TypeB, AccumTag >()>
 

Public Member Functions

mmul ()

mmul (const accum_type &acc)

template<typename T >
mmul (const vector< T, M *N > &v, int shift=0)

template<VectorOrOp VecA, VectorOrOp VecB>
void mac (const VecA &a, const VecB &b)

template<VectorOrOp VecA, VectorOrOp VecB>
void mul (const VecA &a, const VecB &b)

operator accum_type () const

accum_type to_accum () const

template<typename T >
vector< T, M *N > to_vector (int shift=0) const

Static Public Member Functions

static constexpr unsigned size ()

Static Public Attributes

static constexpr unsigned K = K_Elems

static constexpr unsigned M = M_Elems

static constexpr unsigned N = N_Elems

static constexpr unsigned size_A = M * K

static constexpr unsigned size_B = K * N

static constexpr unsigned size_C = M * N

Constructor & Destructor Documentation

◆ mmul() [1/3]

template<unsigned M, unsigned N, unsigned K, typename TypeA , typename TypeB = TypeA, typename AccumTag = accauto>
aie::mmul< M, N, K, TypeA, TypeB, AccumTag >::mmul ( )
inline

Constructor. Data is undefined.

◆ mmul() [2/3]

template<unsigned M, unsigned N, unsigned K, typename TypeA , typename TypeB = TypeA, typename AccumTag = accauto>
aie::mmul< M, N, K, TypeA, TypeB, AccumTag >::mmul ( const accum_type &  acc )
inline

Constructor. Data is initialized from the given accumulator.

Parameters
acc: Accumulator from which the data is initialized.

◆ mmul() [3/3]

template<unsigned M, unsigned N, unsigned K, typename TypeA , typename TypeB = TypeA, typename AccumTag = accauto>
template<typename T >
aie::mmul< M, N, K, TypeA, TypeB, AccumTag >::mmul ( const vector< T, M *N > &  v,
int  shift = 0 
)
inline

Constructor. Data is initialized from the given vector.

Parameters
v: Vector from which the data is initialized.
shift: Upshift in bits to be applied to the input data. This parameter is ignored for floating-point types.
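
As a rough illustration of the upshift, the sketch below widens each input element and scales it by 2^shift. It is plain C++ rather than the AIE API: the function name upshift_to_acc and the 64-bit accumulator lanes are assumptions, and real accumulator widths depend on the selected accumulation type.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reference model (not the AIE API): widen each element and multiply it
// by 2^shift, which is what an upshift by `shift` bits amounts to.
// Multiplication is used instead of << so negative inputs stay well defined.
template <std::size_t Elems>
std::array<std::int64_t, Elems> upshift_to_acc(const std::array<std::int16_t, Elems>& v,
                                               int shift = 0)
{
    const std::int64_t scale = std::int64_t(1) << shift;
    std::array<std::int64_t, Elems> acc{};
    for (std::size_t i = 0; i < Elems; ++i)
        acc[i] = std::int64_t(v[i]) * scale;
    return acc;
}
```

With shift = 4, the inputs {1, -2, 3, 4} become {16, -32, 48, 64} in this model.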

Member Function Documentation

◆ mac()

template<unsigned M, unsigned N, unsigned K, typename TypeA , typename TypeB = TypeA, typename AccumTag = accauto>
template<VectorOrOp VecA, VectorOrOp VecB>
void aie::mmul< M, N, K, TypeA, TypeB, AccumTag >::mac ( const VecA &  a,
const VecB &  b 
)
inline

Multiply the two given matrices and add it to the result.

Parameters
a: Vector that represents the A input matrix. The number of elements must be M * K (size_A). Must meet VectorOrOp.
b: Vector that represents the B input matrix. The number of elements must be K * N (size_B). Must meet VectorOrOp.

◆ mul()

template<unsigned M, unsigned N, unsigned K, typename TypeA , typename TypeB = TypeA, typename AccumTag = accauto>
template<VectorOrOp VecA, VectorOrOp VecB>
void aie::mmul< M, N, K, TypeA, TypeB, AccumTag >::mul ( const VecA &  a,
const VecB &  b 
)
inline

Initialize the result value with the multiplication of the two given matrices.

Parameters
a: Vector that represents the A input matrix. The number of elements must be M * K (size_A). Must meet VectorOrOp.
b: Vector that represents the B input matrix. The number of elements must be K * N (size_B). Must meet VectorOrOp.

◆ operator accum_type()

template<unsigned M, unsigned N, unsigned K, typename TypeA , typename TypeB = TypeA, typename AccumTag = accauto>
aie::mmul< M, N, K, TypeA, TypeB, AccumTag >::operator accum_type ( ) const
inline

Conversion operator to accumulator.

◆ size()

template<unsigned M, unsigned N, unsigned K, typename TypeA , typename TypeB = TypeA, typename AccumTag = accauto>
static constexpr unsigned aie::mmul< M, N, K, TypeA, TypeB, AccumTag >::size ( )
inline static constexpr

Returns number of elements in matrix C

◆ to_accum()

template<unsigned M, unsigned N, unsigned K, typename TypeA , typename TypeB = TypeA, typename AccumTag = accauto>
accum_type aie::mmul< M, N, K, TypeA, TypeB, AccumTag >::to_accum ( ) const
inline

Return the result of the multiplication as an accumulator.

◆ to_vector()

template<unsigned M, unsigned N, unsigned K, typename TypeA , typename TypeB = TypeA, typename AccumTag = accauto>
template<typename T >
vector<T, M * N> aie::mmul< M, N, K, TypeA, TypeB, AccumTag >::to_vector ( int  shift = 0) const
inline

Return the result of the multiplication as a vector.

Parameters
shift: Downshift in bits to be applied to the output data. This parameter is ignored for floating-point types.
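
The effect of the downshift can be sketched in plain C++ as a division by 2^shift followed by narrowing. This is not the AIE API: the function name downshift_to_vector is hypothetical, and the sketch assumes floor rounding, no saturation, and results that fit in int16. The behavior on hardware depends on the configured rounding and saturation modes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reference model (not the AIE API): divide each accumulator lane by
// 2^shift, rounding toward negative infinity, then narrow to int16.
// Assumptions: floor rounding, no saturation, result fits in int16.
template <std::size_t Elems>
std::array<std::int16_t, Elems> downshift_to_vector(const std::array<std::int64_t, Elems>& acc,
                                                    int shift = 0)
{
    const std::int64_t divisor = std::int64_t(1) << shift;
    std::array<std::int16_t, Elems> out{};
    for (std::size_t i = 0; i < Elems; ++i) {
        std::int64_t q = acc[i] / divisor;              // truncates toward zero
        if (acc[i] % divisor != 0 && acc[i] < 0) --q;   // correct to floor for negatives
        out[i] = static_cast<std::int16_t>(q);
    }
    return out;
}
```

With shift = 4, the accumulator values {1000, -1000, 16, 15} map to {62, -63, 1, 0} under these assumptions.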

Member Data Documentation

◆ K

template<unsigned M, unsigned N, unsigned K, typename TypeA , typename TypeB = TypeA, typename AccumTag = accauto>
constexpr unsigned aie::mmul< M, N, K, TypeA, TypeB, AccumTag >::K = K_Elems
static constexpr

Number of columns in matrix A, and number of rows in matrix B.

◆ M

template<unsigned M, unsigned N, unsigned K, typename TypeA , typename TypeB = TypeA, typename AccumTag = accauto>
constexpr unsigned aie::mmul< M, N, K, TypeA, TypeB, AccumTag >::M = M_Elems
static constexpr

Number of rows in matrix A.

◆ N

template<unsigned M, unsigned N, unsigned K, typename TypeA , typename TypeB = TypeA, typename AccumTag = accauto>
constexpr unsigned aie::mmul< M, N, K, TypeA, TypeB, AccumTag >::N = N_Elems
static constexpr

Number of columns in matrix B.

◆ size_A

template<unsigned M, unsigned N, unsigned K, typename TypeA , typename TypeB = TypeA, typename AccumTag = accauto>
constexpr unsigned aie::mmul< M, N, K, TypeA, TypeB, AccumTag >::size_A = M * K
static constexpr

Number of elements in matrix A

◆ size_B

template<unsigned M, unsigned N, unsigned K, typename TypeA , typename TypeB = TypeA, typename AccumTag = accauto>
constexpr unsigned aie::mmul< M, N, K, TypeA, TypeB, AccumTag >::size_B = K * N
static constexpr

Number of elements in matrix B

◆ size_C

template<unsigned M, unsigned N, unsigned K, typename TypeA , typename TypeB = TypeA, typename AccumTag = accauto>
constexpr unsigned aie::mmul< M, N, K, TypeA, TypeB, AccumTag >::size_C = M * N
static constexpr

Number of elements in matrix C
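
The relationship between the shape parameters and the size constants can be checked at compile time. The sketch below uses a hypothetical stand-in type, shape_sizes, that only reproduces the constant definitions documented above. It is plain C++ and does not instantiate aie::mmul.

```cpp
#include <cassert>

// Hypothetical stand-in (not the AIE API) that mirrors the documented
// constants: size_A = M * K, size_B = K * N, size_C = M * N.
template <unsigned M_Elems, unsigned K_Elems, unsigned N_Elems>
struct shape_sizes {
    static constexpr unsigned M = M_Elems;
    static constexpr unsigned K = K_Elems;
    static constexpr unsigned N = N_Elems;

    static constexpr unsigned size_A = M * K;  // elements in one A tile
    static constexpr unsigned size_B = K * N;  // elements in one B tile
    static constexpr unsigned size_C = M * N;  // elements in one C tile

    // Mirrors size(), which reports the number of elements in matrix C.
    static constexpr unsigned size() { return size_C; }
};

// A 4x8x4 shape uses 32-element A and B tiles and a 16-element C tile.
static_assert(shape_sizes<4, 8, 4>::size_A == 32, "A tile is M * K elements");
static_assert(shape_sizes<4, 8, 4>::size_B == 32, "B tile is K * N elements");
static_assert(shape_sizes<4, 8, 4>::size_C == 16, "C tile is M * N elements");
static_assert(shape_sizes<4, 8, 4>::size() == 16, "size() matches size_C");
```

These are the element counts that the input vectors passed to mul and mac, and the vector returned by to_vector, are expected to have for a given shape.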