Vitis HLS Libraries Reference

The following foundational C libraries are provided with Vitis HLS:

You can use each of the C libraries in your design by including the library header file in your code. These header files are located in the include directory in the Vitis HLS installation area.

IMPORTANT: The header files for the Vitis HLS C libraries do not have to be in the include path if the design is used in Vitis HLS. The paths to the library header files are automatically added.

Arbitrary Precision Data Types Library

C-based native data types are on 8-bit boundaries (8, 16, 32, 64 bits). RTL buses (corresponding to hardware) support arbitrary lengths. HLS needs a mechanism to allow the specification of arbitrary precision bit-width and not rely on the artificial boundaries of native C data types: if a 17-bit multiplier is required, you should not be forced to implement this with a 32-bit multiplier.

Vitis HLS provides both integer and fixed-point arbitrary precision data types for C++. The advantage of arbitrary precision data types is that they allow the C code to be updated to use variables with smaller bit-widths and then for the C simulation to be re-executed to validate that the functionality remains identical or acceptable.

Using Arbitrary Precision Data Types

Vitis HLS provides arbitrary precision integer data types that manage the value of the integer numbers within the boundaries of the specified width, as shown in the following table.

Table 1. Arbitrary Precision Data Types

Language   Integer Data Type                     Required Header
C++        ap_[u]int<W> (1024 bits; can be       #include "ap_int.h"
           extended to 32K bits wide)
C++        ap_[u]fixed<W,I,Q,O,N>                #include "ap_fixed.h"

The header files that define the arbitrary precision types are also provided with Vitis HLS as a standalone package with the rights to use them in your own source code. The package, xilinx_hls_lib_<release_number>.tgz, is provided in the include directory in the Vitis HLS installation area.

Arbitrary Integer Precision Types with C++

The header file ap_int.h defines the arbitrary precision integer data type for the C++ ap_[u]int data types. To use arbitrary precision integer data types in a C++ function:

  • Add header file ap_int.h to the source code.
  • Change the bit types to ap_int<N> for signed types or ap_uint<N> for unsigned types, where N is a bit-size from 1 to 1024.

The following example shows how the header file is added and two variables implemented to use 9-bit integer and 10-bit unsigned integer types:


#include "ap_int.h"

void foo_top (…) {

 ap_int<9>  var1;           // 9-bit signed
 ap_uint<10>  var2;         // 10-bit unsigned
}

Arbitrary Precision Fixed-Point Data Types

In Vitis HLS, it is important to use fixed-point data types, because the behavior of C++ simulations performed using fixed-point data types matches that of the hardware created by synthesis. This allows you to analyze the effects of bit-accuracy, quantization, and overflow with fast C-level simulation.

These data types manage the value of real (non-integer) numbers within the boundaries of a specified total width and integer width, as shown in the following figure.

Figure 1: Fixed-Point Data Type


Fixed-Point Identifier Summary

The following table provides a brief overview of the identifiers (template parameters) used by fixed-point types.

Table 2. Fixed-Point Identifier Summary

Identifier   Description
W            Word length in bits.
I            The number of bits used to represent the integer value (the number of bits above the decimal point).
Q            Quantization mode: This dictates the behavior when greater precision is generated than can be defined by the smallest fractional bit in the variable used to store the result.

             ap_fixed Types    Description
             AP_RND            Round to plus infinity
             AP_RND_ZERO       Round to zero
             AP_RND_MIN_INF    Round to minus infinity
             AP_RND_INF        Round to infinity
             AP_RND_CONV       Convergent rounding
             AP_TRN            Truncation to minus infinity (default)
             AP_TRN_ZERO       Truncation to zero

O            Overflow mode: This dictates the behavior when the result of an operation exceeds the maximum (or minimum in the case of negative numbers) possible value that can be stored in the variable used to store the result.

             ap_fixed Types    Description
             AP_SAT            Saturation
             AP_SAT_ZERO       Saturation to zero
             AP_SAT_SYM        Symmetrical saturation
             AP_WRAP           Wrap around (default)
             AP_WRAP_SM        Sign magnitude wrap around

N            This defines the number of saturation bits in overflow wrap modes.

Example Using ap_fixed

In this example, the Vitis HLS ap_fixed type is used to define an 18-bit variable with 6 bits representing the numbers above the decimal point and 12 bits representing the value below the decimal point. The variable is specified as signed, the quantization mode is set to round to plus infinity, and the default wrap-around mode is used for overflow.

#include <ap_fixed.h>
...
ap_fixed<18,6,AP_RND > my_type;
...
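To make the quantization modes in Table 2 more concrete, the following is a minimal sketch in plain standard C++ that does not use ap_fixed.h. The helper names quantize_rnd and quantize_trn are hypothetical. They only model how AP_RND (round to plus infinity) and AP_TRN (truncation to minus infinity) pick a representable value for a given number of fractional bits, and they ignore overflow handling entirely.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: model AP_RND (round to plus infinity) by scaling to
// the fractional resolution, adding half an LSB, and rounding down.
double quantize_rnd(double value, int frac_bits) {
    assert(frac_bits >= 0 && frac_bits < 52);
    const double scale = std::ldexp(1.0, frac_bits);  // 2^frac_bits
    return std::floor(value * scale + 0.5) / scale;
}

// Hypothetical helper: model AP_TRN (truncation to minus infinity), the
// default mode, by discarding everything below the smallest fractional bit.
double quantize_trn(double value, int frac_bits) {
    assert(frac_bits >= 0 && frac_bits < 52);
    const double scale = std::ldexp(1.0, frac_bits);  // 2^frac_bits
    return std::floor(value * scale) / scale;
}
```

With 2 fractional bits, 0.375 becomes 0.5 under the AP_RND model and 0.25 under the AP_TRN model, while -0.375 becomes -0.25 and -0.5 respectively.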

C++ Arbitrary Precision Integer Types

The native data types in C++ are on 8-bit boundaries (8, 16, 32 and 64 bits). RTL signals and operations support arbitrary bit-lengths.

Vitis HLS provides arbitrary precision data types for C++ to allow variables and operations in the C++ code to be specified with any arbitrary bit-widths: 6-bit, 17-bit, 234-bit, up to 1024 bits.

TIP: The default maximum width allowed is 1024 bits. You can override this default by defining the macro AP_INT_MAX_W with a positive integer value less than or equal to 32768 before inclusion of the ap_int.h header file.

Arbitrary precision data types have two primary advantages over the native C++ types:

  • Better quality hardware: If, for example, a 17-bit multiplier is required, arbitrary precision types can specify that exactly 17 bits are used in the calculation.

    Without arbitrary precision data types, such a multiplication (17-bit) must be implemented using 32-bit integer data types, which results in the multiplication being implemented with multiple DSP modules.

  • Accurate C++ simulation/analysis: Arbitrary precision data types in the C++ code allow the C++ simulation to be performed using accurate bit-widths and allow the C++ simulation to validate the functionality (and accuracy) of the algorithm before synthesis.

The arbitrary precision types in C++ have none of the disadvantages of those in C:

  • C++ arbitrary types can be compiled with standard C++ compilers (there is no C++ equivalent of apcc).
  • C++ arbitrary precision types do not suffer from Integer Promotion Issues.

It is not uncommon for users to change a file extension from .c to .cpp so the file can be compiled as C++, where neither of these issues are present.

For the C++ language, the header file ap_int.h defines the arbitrary precision integer data types ap_(u)int<W>. For example, ap_int<8> represents an 8-bit signed integer data type and ap_uint<234> represents a 234-bit unsigned integer type.

The ap_int.h file is located in the directory $HLS_ROOT/include, where $HLS_ROOT is the Vitis HLS installation directory.

The code shown in the following example is a repeat of the code shown in the Basic Arithmetic example in Standard Types. In this example, the data types in the top-level function to be synthesized are specified as dinA_t, dinB_t, and so on.


#include "cpp_ap_int_arith.h"

void cpp_ap_int_arith(dinA_t  inA, dinB_t  inB, dinC_t  inC, dinD_t  inD,
 dout1_t *out1, dout2_t *out2, dout3_t *out3, dout4_t *out4
) {

 // Basic arithmetic operations
 *out1 = inA * inB;
 *out2 = inB + inA;
 *out3 = inC / inA;
 *out4 = inD % inA;

}

In this latest update to this example, the C++ arbitrary precision types are used:

  • Add header file ap_int.h to the source code.
  • Change the native C++ types to arbitrary precision types ap_int<N> or ap_uint<N>, where N is a bit-size from 1 to 1024 (as noted above, this can be extended to 32K-bits if required).

The data types are defined in the header cpp_ap_int_arith.h.

Compared with the Basic Arithmetic example in Standard Types, the input data types have simply been reduced to represent the maximum size of the real input data (for example, 8-bit input inA is reduced to 6-bit input). The output types have been refined to be more accurate, for example, out2, the sum of inA and inB, need only be 13-bit and not 32-bit.

The following example shows basic arithmetic with C++ arbitrary precision types.


#ifndef _CPP_AP_INT_ARITH_H_
#define _CPP_AP_INT_ARITH_H_

#include <stdio.h>
#include "ap_int.h"

#define N 9

// Old data types
//typedef char dinA_t;
//typedef short dinB_t;
//typedef int dinC_t;
//typedef long long dinD_t;
//typedef int dout1_t;
//typedef unsigned int dout2_t;
//typedef int32_t dout3_t;
//typedef int64_t dout4_t;

typedef ap_int<6> dinA_t;
typedef ap_int<12> dinB_t;
typedef ap_int<22> dinC_t;
typedef ap_int<33> dinD_t;

typedef ap_int<18> dout1_t;
typedef ap_uint<13> dout2_t;
typedef ap_int<22> dout3_t;
typedef ap_int<6> dout4_t;

void cpp_ap_int_arith(dinA_t inA, dinB_t inB, dinC_t inC, dinD_t inD,
 dout1_t *out1, dout2_t *out2, dout3_t *out3, dout4_t *out4);

#endif

If C++ Arbitrary Precision Integer Types are synthesized, it results in a design that is functionally identical to Standard Types. Rather than use the C++ cout operator to output the results to a file, the built-in ap_int method .to_int() is used to convert the ap_int results to integer types used with the standard fprintf function.


fprintf(fp, "%d*%d=%d; %d+%d=%d; %d/%d=%d; %d mod %d=%d;\n",
 inA.to_int(), inB.to_int(), out1.to_int(), 
 inB.to_int(), inA.to_int(), out2.to_int(), 
 inC.to_int(), inA.to_int(), out3.to_int(), 
 inD.to_int(), inA.to_int(), out4.to_int());

C++ Arbitrary Precision Integer Types: Reference Information

For comprehensive information on the methods, synthesis behavior, and all aspects of using the ap_(u)int<N> arbitrary precision data types, see C++ Arbitrary Precision Types. This section includes:

  • Techniques for assigning constant and initialization values to arbitrary precision integers (including values greater than 1024-bit).
  • A description of Vitis HLS helper methods, such as printing, concatenating, bit-slicing and range selection functions.
  • A description of operator behavior, including a description of shift operations (a negative shift value results in a shift in the opposite direction).

C++ Arbitrary Precision Types

Vitis HLS provides a C++ template class, ap_[u]int<>, that implements arbitrary precision (or bit-accurate) integer data types with consistent, bit-accurate behavior between software and hardware modeling.

This class provides all arithmetic, bitwise, logical and relational operators allowed for native C integer types. In addition, this class provides methods to handle some useful hardware operations, such as allowing initialization and conversion of variables of widths greater than 64 bits. Details for all operators and class methods are discussed below.

Compiling ap_[u]int<> Types

To use the ap_[u]int<> classes, you must include the ap_int.h header file in all source files that reference ap_[u]int<> variables.

When compiling software models that use these classes, it may be necessary to specify the location of the Vitis HLS header files, for example by adding the -I/<HLS_HOME>/include option for g++ compilation.

Declaring/Defining ap_[u]int Variables

There are separate signed and unsigned classes:

  • ap_int<int_W> (signed)
  • ap_uint<int_W> (unsigned)

The template parameter int_W specifies the total width of the variable being declared.

User-defined types may be created with the C/C++ typedef statement as shown in the following examples:


#include "ap_int.h" // use ap_[u]int<> types

typedef ap_uint<128> uint128_t; // 128-bit user defined type
ap_int<96> my_wide_var; // a global variable declaration

The default maximum width allowed is 1024 bits. This default may be overridden by defining the macro AP_INT_MAX_W with a positive integer value less than or equal to 32768 before inclusion of the ap_int.h header file.

CAUTION: Setting the value of AP_INT_MAX_W too high may cause slow software compile and run times.

Following is an example of overriding AP_INT_MAX_W:


#define AP_INT_MAX_W 4096 // Must be defined before next line
#include "ap_int.h"

ap_int<4096> very_wide_var;

Initialization and Assignment from Constants (Literals)

The class constructor and assignment operator overloads allow initialization of and assignment to ap_[u]int<> variables using standard C/C++ integer literals.

This method of assigning values to ap_[u]int<> variables is subject to the limitations of C++ and the system upon which the software will run. This typically leads to a 64-bit limit on integer literals (for example, those with the LL or ULL suffixes).

To allow assignment of values wider than 64 bits, the ap_[u]int<> classes provide constructors that allow initialization from a string of arbitrary length (less than or equal to the width of the variable).

By default, the string provided is interpreted as a hexadecimal value as long as it contains only valid hexadecimal digits (that is, 0-9 and a-f). To assign a value from such a string, an explicit C++ style cast of the string to the appropriate type must be made.

Following are examples of initialization and assignment, including for values greater than 64 bits:


ap_int<42> a_42b_var(-1424692392255LL); // long long decimal format
a_42b_var = 0x14BB648B13FLL; // hexadecimal format

a_42b_var = -1; // negative int literal sign-extended to full width

ap_uint<96> wide_var("76543210fedcba9876543210", 16); // Greater than 64-bit
wide_var = ap_int<96>("0123456789abcdef01234567", 16);

Note: To avoid unexpected behavior during co-simulation, do not initialize ap_uint<N> a ={0}.

The ap_[u]int<> constructor may be explicitly instructed to interpret the string as representing the number in radix 2, 8, 10, or 16 formats. This is accomplished by adding the appropriate radix value as a second parameter to the constructor call.

A compilation error occurs if the string literal contains any characters that are invalid as digits for the radix specified.

The following examples use different radix formats:


ap_int<6> a_6bit_var("101010", 2); // 42d in binary format
a_6bit_var = ap_int<6>("40", 8); // 32d in octal format
a_6bit_var = ap_int<6>("55", 10); // decimal format
a_6bit_var = ap_int<6>("2A", 16); // 42d in hexadecimal format

a_6bit_var = ap_int<6>("42", 2);   // COMPILE-TIME ERROR! "42" is not binary

The radix of the number encoded in the string can also be inferred by the constructor, when it is prefixed with a zero (0) followed by one of the following characters: “b”, “o” or “x”. The prefixes “0b”, “0o” and “0x” correspond to binary, octal and hexadecimal formats respectively.

The following examples use alternate initializer string formats:


ap_int<6> a_6bit_var("0b101010", 2); // 42d in binary format
a_6bit_var = ap_int<6>("0o40", 8); // 32d in octal format
a_6bit_var = ap_int<6>("0x2A", 16); // 42d in hexadecimal format

a_6bit_var = ap_int<6>("0b42", 2); // COMPILE-TIME ERROR! "42" is not binary

If the bit-width is greater than 53-bits, the ap_[u]fixed value must be initialized with a string, for example:


ap_ufixed<72,10> Val("2460508560057040035.375");

Support for Console I/O (Printing)

As with initialization and assignment to ap_[u]int<> variables, Vitis HLS supports printing values that require more than 64 bits to represent.

Using the C++ Standard Output Stream

The easiest way to output any value stored in an ap_[u]int variable is to use the C++ standard output stream:

std::cout (#include <iostream>)

The stream insertion operator (<<) is overloaded to correctly output the full range of values possible for any given ap_[u]int variable. The following stream manipulators are also supported:

  • dec (decimal)
  • hex (hexadecimal)
  • oct (octal)

These allow formatting of the value as indicated.

The following example uses cout to print values:


#include <iostream>
#include "ap_int.h"

ap_uint<72> Val("10fedcba9876543210");

std::cout << Val << std::endl; // Yields: "313512663723845890576"
std::cout << std::hex << Val << std::endl; // Yields: "10fedcba9876543210"
std::cout << std::oct << Val << std::endl; // Yields: "41773345651416625031020"

Using the Standard C Library

You can also use the standard C library (#include <stdio.h>) to print out values larger than 64-bits:

  1. Convert the value to a C++ std::string using the ap_[u]int class method to_string().
  2. Convert the result to a null-terminated C character string using the std::string class method c_str().

Optional Argument One (Specifying the Radix)

You can pass the ap_[u]int::to_string() method an optional argument specifying the radix of the numerical format desired. The valid radix argument values are:

  • 2 (binary) (default)
  • 8 (octal)
  • 10 (decimal)
  • 16 (hexadecimal)

Optional Argument Two (Printing as Signed Values)

A second optional argument to ap_[u]int::to_string() specifies whether to print the non-decimal formats as signed values. This argument is boolean. The default value is false, causing the non-decimal formats to be printed as unsigned values.

The following examples use printf to print values:


ap_int<72> Val("80fedcba9876543210");

printf("%s\n", Val.to_string().c_str()); // => "80FEDCBA9876543210"
printf("%s\n", Val.to_string(10).c_str()); // => "-2342818482890329542128"
printf("%s\n", Val.to_string(8).c_str()); // => "401773345651416625031020"
printf("%s\n", Val.to_string(16, true).c_str()); // => "-7F0123456789ABCDF0"

Expressions Involving ap_[u]int<> Types

Variables of ap_[u]int<> types may generally be used freely in expressions involving C/C++ operators. Some behaviors may be unexpected. These are discussed in detail below.

Zero- and Sign-Extension on Assignment From Narrower to Wider Variables

When assigning the value of a narrower bit-width signed (ap_int<>) variable to a wider one, the value is sign-extended to the width of the destination variable, regardless of its signedness.

Similarly, an unsigned source variable is zero-extended before assignment.

Explicit casting of the source variable may be necessary to ensure expected behavior on assignment. See the following example:


ap_uint<10> Result;

ap_int<7> Val1 = 0x7f;
ap_uint<6> Val2 = 0x3f;

Result = Val1; // Yields: 0x3ff (sign-extended)
Result = Val2; // Yields: 0x03f (zero-padded)

Result = ap_uint<7>(Val1); // Yields: 0x07f (zero-padded)
Result = ap_int<6>(Val2); // Yields: 0x3ff (sign-extended)

Truncation on Assignment of Wider to Narrower Variables

Assigning the value of a wider source variable to a narrower one leads to truncation of the value. All bits beyond the most significant bit (MSB) position of the destination variable are lost.

There is no special handling of the sign information during truncation. This may lead to unexpected behavior. Explicit casting may help avoid this unexpected behavior.
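The extension and truncation rules above can be sketched with ordinary 64-bit integers. This is a plain C++ model under stated assumptions, not the ap_int.h implementation: truncate_bits and sign_extend are hypothetical helper names, and widths are limited to 63 bits so the arithmetic stays within built-in types.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: keep only the low `width` bits, as happens when a
// wider value is assigned to a narrower variable.
std::uint64_t truncate_bits(std::uint64_t value, int width) {
    assert(width >= 1 && width <= 63);
    return value & ((std::uint64_t{1} << width) - 1);
}

// Hypothetical helper: interpret the low `width` bits as a two's complement
// signed number, which is the value that sign extension preserves.
std::int64_t sign_extend(std::uint64_t value, int width) {
    assert(width >= 1 && width <= 63);
    const std::uint64_t low = truncate_bits(value, width);
    const std::uint64_t sign_bit = std::uint64_t{1} << (width - 1);
    // XOR with the sign bit, then subtract it: patterns with the top bit set
    // map to negative numbers, all others are unchanged.
    return static_cast<std::int64_t>(low ^ sign_bit) -
           static_cast<std::int64_t>(sign_bit);
}
```

In this model, the 7-bit pattern 0x7f is -1, and storing it in 10 bits gives 0x3ff. Truncating the 10-bit value 0x1C8 (456) to 8 signed bits gives -56, which shows how truncation can change the sign of a value.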

Class Methods and Operators

The ap_[u]int types do not support implicit conversion from wide ap_[u]int (>64 bits) to built-in C/C++ integer types. For example, the following code returns 1, because the implicit cast from ap_uint<65> to bool returns 0.

   bool nonzero(ap_uint<65> data) {
      return data; // This leads to implicit truncation to 64b int
    }

   int main() {
     if (nonzero((ap_uint<65>)1 << 64)) {
        return 0;
     }
     printf("FAIL\n");
     return 1;
   }

To convert wide ap_[u]int types to built-in integers, use the explicit conversion functions included with the ap_[u]int types:

  • to_int()
  • to_long()
  • to_bool()

In general, any valid operation that can be done on a native C/C++ integer data type is supported using operator overloading for ap_[u]int types.

In addition to these overloaded operators, some class specific operators and methods are included to ease bit-level operations.

Binary Arithmetic Operators

Standard binary integer arithmetic operators are overloaded to provide arbitrary precision arithmetic. These operators take either:

  • Two operands of ap_[u]int, or
  • One ap_[u]int type and one C/C++ fundamental integer data type

For example:

  • char
  • short
  • int

The width and signedness of the resulting value is determined by the width and signedness of the operands, before sign-extension, zero-padding or truncation are applied based on the width of the destination variable (or expression). Details of the return value are described for each operator.

When expressions contain a mix of ap_[u]int and C/C++ fundamental integer types, the C++ types assume the following widths:

  • char (8-bits)
  • short (16-bits)
  • int (32-bits)
  • long (32-bits)
  • long long (64-bits)
Addition
ap_(u)int::RType ap_(u)int::operator + (ap_(u)int op)

Returns the sum of:

  • Two ap_[u]int, or
  • One ap_[u]int and a C/C++ integer type

The width of the sum value is:

  • One bit more than the wider of the two operands, or
  • Two bits more than the wider of the two operands if and only if the wider is unsigned and the narrower is signed

The sum is treated as signed if either (or both) of the operands is of a signed type.

Subtraction
ap_(u)int::RType ap_(u)int::operator - (ap_(u)int op)

Returns the difference of two integers.

The width of the difference value is:

  • One bit more than the wider of the two operands, or
  • Two bits more than the wider of the two operands if and only if the wider is unsigned and the narrower is signed

This is true before assignment, at which point it is sign-extended, zero-padded, or truncated based on the width of the destination variable.

The difference is treated as signed regardless of the signedness of the operands.

Multiplication
ap_(u)int::RType ap_(u)int::operator * (ap_(u)int op)

Returns the product of two integer values.

The width of the product is the sum of the widths of the operands.

The product is treated as a signed type if either of the operands is of a signed type.

Division
ap_(u)int::RType ap_(u)int::operator / (ap_(u)int op)

Returns the quotient of two integer values.

The width of the quotient is the width of the dividend if the divisor is an unsigned type. Otherwise, it is the width of the dividend plus one.

The quotient is treated as a signed type if either of the operands is of a signed type.

Modulus
ap_(u)int::RType ap_(u)int::operator % (ap_(u)int op)

Returns the modulus, or remainder of integer division, for two integer values.

The width of the modulus is the minimum of the widths of the operands, if they are both of the same signedness.

If the divisor is an unsigned type and the dividend is signed, then the width is that of the divisor plus one.

The modulus is treated as having the same signedness as the dividend.

IMPORTANT: Vitis HLS synthesis of the modulus (%) operator will lead to instantiation of appropriately parameterized Xilinx LogiCORE divider cores in the generated RTL.

Following are examples of arithmetic operators:


ap_uint<71> Rslt;

ap_uint<42> Val1 = 5;
ap_int<23> Val2 = -8;

Rslt = Val1 + Val2; // Yields: -3 (44 bits) sign-extended to 71 bits
Rslt = Val1 - Val2; // Yields: +13 sign extended to 71 bits
Rslt = Val1 * Val2; // Yields: -40 (65 bits) sign extended to 71 bits
Rslt = 50 / Val2; // Yields: -6 (33 bits) sign extended to 71 bits
Rslt = 50 % Val2; // Yields: +2 (23 bits) sign extended to 71 bits
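The result-width rules for the arithmetic operators can be written down as a small calculator. The following is a sketch in plain C++ under stated assumptions: IntType and the *_width functions are hypothetical names, the sum/difference rule is encoded by widening an unsigned operand by one bit when the other operand is signed, and the modulus case of an unsigned dividend with a signed divisor (not covered by the text above) is assumed to use the narrower width.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical description of an operand: bit-width and signedness.
struct IntType {
    int width;
    bool is_signed;
};

// Sum or difference: one bit more than the wider operand, or two bits more
// when an unsigned operand must first be widened to mix with a signed one.
int add_sub_width(IntType a, IntType b) {
    assert(a.width > 0 && b.width > 0);
    const int a_w = a.width + ((b.is_signed && !a.is_signed) ? 1 : 0);
    const int b_w = b.width + ((a.is_signed && !b.is_signed) ? 1 : 0);
    return std::max(a_w, b_w) + 1;
}

// Product: the sum of the operand widths.
int mul_width(IntType a, IntType b) { return a.width + b.width; }

// Quotient: dividend width, plus one bit when the divisor is signed.
int div_width(IntType dividend, IntType divisor) {
    return dividend.width + (divisor.is_signed ? 1 : 0);
}

// Modulus: divisor width plus one for a signed dividend and unsigned divisor,
// otherwise the narrower width (assumption for the unsigned/signed case).
int mod_width(IntType dividend, IntType divisor) {
    if (dividend.is_signed && !divisor.is_signed) return divisor.width + 1;
    return std::min(dividend.width, divisor.width);
}
```

For the operands used in the examples, this model gives 65 bits for the product of a 42-bit unsigned and a 23-bit signed value, 33 bits for a 32-bit int divided by a 23-bit signed value, and 23 bits for the corresponding modulus.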
Bitwise Logical Operators

The bitwise logical operators all return a value with a width that is the maximum of the widths of the two operands. It is treated as unsigned if and only if both operands are unsigned. Otherwise, it is of a signed type.

Sign-extension (or zero-padding) may occur, based on the signedness of the expression, not the destination variable.

Bitwise OR
ap_(u)int::RType ap_(u)int::operator | (ap_(u)int op)

Returns the bitwise OR of the two operands.

Bitwise AND
ap_(u)int::RType ap_(u)int::operator & (ap_(u)int op)

Returns the bitwise AND of the two operands.

Bitwise XOR
ap_(u)int::RType ap_(u)int::operator ^ (ap_(u)int op)

Returns the bitwise XOR of the two operands.

Unary Operators
Addition
ap_(u)int ap_(u)int::operator + ()

Returns a copy of the ap_[u]int operand.

Subtraction
ap_(u)int::RType ap_(u)int::operator - ()

Returns the following:

  • The negated value of the operand with the same width if it is a signed type, or
  • Its width plus one if it is unsigned.

The return value is always a signed type.

Bitwise Inverse
ap_(u)int::RType ap_(u)int::operator ~ ()

Returns the bitwise-NOT of the operand with the same width and signedness.

Logical Invert
bool ap_(u)int::operator ! ()

Returns a Boolean false value if and only if the operand is not equal to zero (0).

Returns a Boolean true value if the operand is equal to zero (0).

Ternary Operators

When you use the ternary operator with the standard C int type, you must explicitly cast from one type to the other to ensure that both results have the same type. For example:

// Integer type is cast to ap_int type
ap_int<32> testc3(int a, ap_int<32> b, ap_int<32> c, bool d) {
 return d?ap_int<32>(a):b;
}
// ap_int type is cast to an integer type
ap_int<32> testc4(int a, ap_int<32> b, ap_int<32> c, bool d) {
 return d?a+1:(int)b;
}
// Integer type is cast to ap_int type
ap_int<32> testc5(int a, ap_int<32> b, ap_int<32> c, bool d) {
 return d?ap_int<33>(a):b+1;
}

Shift Operators

Each shift operator comes in two versions:

  • One version for unsigned right-hand side (RHS) operands
  • One version for signed right-hand side (RHS) operands

A negative value supplied to the signed RHS versions reverses the shift operations direction. That is, a shift by the absolute value of the RHS operand in the opposite direction occurs.

The shift operators return a value with the same width as the left-hand side (LHS) operand. As with C/C++, if the LHS operand of a shift-right is a signed type, the sign bit is copied into the most significant bit positions, maintaining the sign of the LHS operand.

Unsigned Integer Shift Left
ap_(u)int ap_(u)int::operator << (ap_uint<int_W2> op)
Integer Shift Left
ap_(u)int ap_(u)int::operator << (ap_int<int_W2> op)
Unsigned Integer Shift Right
ap_(u)int ap_(u)int::operator >> (ap_uint<int_W2> op)
Integer Shift Right
ap_(u)int ap_(u)int::operator >> (ap_int<int_W2> op)

CAUTION: When assigning the result of a shift-left operator to a wider destination variable, some or all information may be lost. Xilinx recommends that you explicitly cast the shift expression to the destination type to avoid unexpected behavior.

Following are examples of shift operations:

ap_uint<13> Rslt;

ap_uint<7> Val1 = 0x41;

Rslt = Val1 << 6;  // Yields: 0x0040, i.e. msb of Val1 is lost
Rslt = ap_uint<13>(Val1) << 6;  // Yields: 0x1040, no info lost

ap_int<7> Val2 = -63;
Rslt = Val2 >> 4;  //Yields: 0x1ffc, sign is maintained and extended
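The two shift rules that most often surprise users, that the result keeps the width of the LHS operand and that a negative signed shift amount reverses the direction, can be modeled in plain C++. This is a sketch for unsigned values only, and shift_left_model is a hypothetical helper name, not part of ap_int.h.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: shift an unsigned `width`-bit value left by `amount`.
// A negative amount shifts right by its absolute value instead. The result
// is limited to `width` bits, so bits shifted past the top are lost.
std::uint64_t shift_left_model(std::uint64_t value, int width, int amount) {
    assert(width >= 1 && width <= 63);
    assert(amount > -64 && amount < 64);
    const std::uint64_t mask = (std::uint64_t{1} << width) - 1;
    value &= mask;
    if (amount >= 0) {
        return (value << amount) & mask;  // left shift, then drop overflow bits
    }
    return value >> -amount;              // negative amount: shift right
}
```

Shifting the 7-bit value 0x41 left by 6 gives 0x40 in this model, because the top bit is lost. Widening the value to 13 bits first gives 0x1040, which is why casting the LHS operand to the destination width before shifting preserves the information.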
Compound Assignment Operators

Vitis HLS supports compound assignment operators:

  • *=
  • /=
  • %=
  • +=
  • -=
  • <<=
  • >>=
  • &=
  • ^=
  • |=

The RHS expression is first evaluated then supplied as the RHS operand to the base operator, the result of which is assigned back to the LHS variable. The expression sizing, signedness, and potential sign-extension or truncation rules apply as discussed above for the relevant operations.

ap_uint<10> Val1 = 630;
ap_int<3> Val2 = -3;
ap_uint<5> Val3 = 27;

Val1 += Val2 - Val3; // Yields: 600 and is equivalent to:

// Val1 = ap_uint<10>(ap_int<11>(Val1) +
// ap_int<11>((ap_int<6>(Val2) -
// ap_int<6>(Val3))));

Increment and Decrement Operators

The increment and decrement operators are provided. All return a value of the same width as the operand, which is unsigned if and only if the operand is of an unsigned type and signed otherwise.

Pre-Increment
ap_(u)int& ap_(u)int::operator ++ ()

Returns the incremented value of the operand.

Assigns the incremented value to the operand.

Post-Increment
const ap_(u)int ap_(u)int::operator ++ (int)

Returns the value of the operand before assignment of the incremented value to the operand variable.

Pre-Decrement
ap_(u)int& ap_(u)int::operator -- ()

Returns the decremented value of, as well as assigning the decremented value to, the operand.

Post-Decrement
const ap_(u)int ap_(u)int::operator -- (int)

Returns the value of the operand before assignment of the decremented value to the operand variable.

Relational Operators

Vitis HLS supports all relational operators. They return a Boolean value based on the result of the comparison. You can compare variables of ap_[u]int types to C/C++ fundamental integer types with these operators.

Equality
bool ap_(u)int::operator == (ap_(u)int op)
Inequality
bool ap_(u)int::operator != (ap_(u)int op)
Less than
bool ap_(u)int::operator < (ap_(u)int op)
Greater than
bool ap_(u)int::operator > (ap_(u)int op)
Less than or equal to
bool ap_(u)int::operator <= (ap_(u)int op)
Greater than or equal to
bool ap_(u)int::operator >= (ap_(u)int op)

Other Class Methods, Operators, and Data Members

The following sections discuss other class methods, operators, and data members.

Bit-Level Operations

The following methods facilitate common bit-level operations on the value stored in ap_[u]int type variables.

Length

int ap_(u)int::length ()

Returns an integer value providing the total number of bits in the ap_[u]int variable.

Concatenation

ap_concat_ref ap_(u)int::concat (ap_(u)int low)  
ap_concat_ref ap_(u)int::operator , (ap_(u)int high, ap_(u)int low)

Concatenates two ap_[u]int variables. The width of the returned value is the sum of the widths of the operands.

The High and Low arguments are placed in the higher and lower order bits of the result respectively; the concat() method places the argument in the lower order bits.

When using the overloaded comma operator, the parentheses are required. The comma operator version may also appear on the LHS of assignment.

Note: To avoid unexpected results, explicitly cast C/C++ native types (including integer literals) to an appropriate ap_[u]int type before concatenating.

ap_uint<10> Rslt;

ap_int<3> Val1 = -3;
ap_int<7> Val2 = 54;

Rslt = (Val2, Val1); // Yields: 0x1B5
Rslt = Val1.concat(Val2); // Yields: 0x2B6
(Val1, Val2) = 0xAB; // Yields: Val1 == 1, Val2 == 43
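Concatenation operates on raw bit patterns, which the following plain C++ sketch makes explicit. The helper name concat_bits is hypothetical, and the model is limited to a combined width of 63 bits so that it fits in a built-in integer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: place the `high_width`-bit pattern `high` above the
// `low_width`-bit pattern `low`. Signed inputs contribute their two's
// complement bit pattern, not their numeric value.
std::uint64_t concat_bits(std::uint64_t high, int high_width,
                          std::uint64_t low, int low_width) {
    assert(high_width >= 1 && low_width >= 1);
    assert(high_width + low_width <= 63);
    const std::uint64_t high_mask = (std::uint64_t{1} << high_width) - 1;
    const std::uint64_t low_mask = (std::uint64_t{1} << low_width) - 1;
    return ((high & high_mask) << low_width) | (low & low_mask);
}
```

With a 7-bit 54 as the high part and a 3-bit -3 (bit pattern 101) as the low part, the model returns 0x1B5. Swapping the roles returns 0x2B6.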
Bit Selection

ap_bit_ref ap_(u)int::operator [] (int bit)

Selects one bit from an arbitrary precision integer value and returns it.

The returned value is a reference value that can set or clear the corresponding bit in this ap_[u]int.

The bit argument must be an int value. It specifies the index of the bit to select. The least significant bit has index 0. The highest permissible index is one less than the bit-width of this ap_[u]int.

The result type ap_bit_ref represents the reference to one bit of this ap_[u]int instance specified by bit.

Range Selection

ap_range_ref ap_(u)int::range (unsigned Hi, unsigned Lo)
ap_range_ref ap_(u)int::operator () (unsigned Hi, unsigned Lo)

Returns the value represented by the range of bits specified by the arguments.

The Hi argument specifies the most significant bit (MSB) position of the range, and Lo specifies the least significant bit (LSB).

The LSB of the source variable is in position 0. If the Hi argument has a value less than Lo, the bits are returned in reverse order.


ap_uint<4> Rslt;

ap_uint<8> Val1 = 0x5f;
ap_uint<8> Val2 = 0xaa;

Rslt = Val1.range(3, 0); // Yields: 0xF
Val1(3,0) = Val2(3, 0); // Yields: 0x5A
Val1(3,0) = Val2(4, 1); // Yields: 0x55
Rslt = Val1.range(4, 7); // Yields: 0xA; bit-reversed!
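The reversed-order behavior when Hi is less than Lo is easy to misread, so here is a minimal read-only model in plain C++. The helper name range_bits is hypothetical, and the sketch covers only reading a range, not assigning to one.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: read bits hi..lo of `value`. Bit `hi` always becomes
// the most significant bit of the result, so when hi < lo the selected bits
// come out in reverse order.
std::uint64_t range_bits(std::uint64_t value, int hi, int lo) {
    assert(hi >= 0 && hi < 64 && lo >= 0 && lo < 64);
    std::uint64_t result = 0;
    const int step = (hi >= lo) ? -1 : 1;
    // Walk from index hi toward index lo, appending one bit per step.
    for (int i = hi;; i += step) {
        result = (result << 1) | ((value >> i) & 1u);
        if (i == lo) break;
    }
    return result;
}
```

Reading bits 3..0 of 0x5f gives 0xF, reading bits 4..1 of 0xaa gives 0x5, and reading 0x55 with Hi=4 and Lo=7 gives 0xA (the upper nibble 0101 in reverse).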
AND reduce

bool ap_(u)int::and_reduce ()
  • Applies the AND operation on all bits in this ap_(u)int.
  • Returns the resulting single bit.
  • Equivalent to comparing this value against -1 (all ones) and returning true if it matches, false otherwise.
OR reduce

bool ap_(u)int::or_reduce ()
  • Applies the OR operation on all bits in this ap_(u)int.
  • Returns the resulting single bit.
  • Equivalent to comparing this value against 0 (all zeros) and returning false if it matches, true otherwise.
XOR reduce

bool ap_(u)int::xor_reduce ()
  • Applies the XOR operation on all bits in this ap_int.
  • Returns the resulting single bit.
  • Equivalent to counting the number of 1 bits in this value and returning false if the count is even or true if the count is odd.
NAND reduce

bool ap_(u)int::nand_reduce ()
  • Applies the NAND operation on all bits in this ap_(u)int.
  • Returns the resulting single bit.
  • Equivalent to comparing this value against -1 (all ones) and returning false if it matches, true otherwise.
NOR reduce

bool ap_(u)int::nor_reduce ()
  • Applies the NOR operation on all bits in this ap_(u)int.
  • Returns the resulting single bit.
  • Equivalent to comparing this value against 0 (all zeros) and returning true if it matches, false otherwise.
XNOR reduce

bool ap_(u)int::xnor_reduce ()
  • Applies the XNOR operation on all bits in this ap_(u)int.
  • Returns the resulting single bit.
  • Equivalent to counting the number of 1 bits in this value and returning true if the count is even or false if the count is odd.
Bit Reduction Method Examples

ap_uint<8> Val = 0xaa;

bool t = Val.and_reduce(); // Yields: false
t = Val.or_reduce();       // Yields: true
t = Val.xor_reduce();      // Yields: false
t = Val.nand_reduce();     // Yields: true
t = Val.nor_reduce();      // Yields: false
t = Val.xnor_reduce();     // Yields: true
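
The equivalences listed for the six reduction methods can be checked with std::bitset from the C++ standard library. This is a sketch under the assumption of an 8-bit value; Reductions, reduce_all, and check_reductions are hypothetical names, and the ap_[u]int methods themselves are not used.

```cpp
#include <bitset>
#include <cassert>

// Hypothetical container for the six reduction results.
struct Reductions {
    bool and_r, or_r, xor_r, nand_r, nor_r, xnor_r;
};

inline Reductions reduce_all(const std::bitset<8>& bits) {
    Reductions r;
    r.and_r  = bits.all();                  // every bit is 1
    r.or_r   = bits.any();                  // at least one bit is 1
    r.xor_r  = (bits.count() % 2) == 1;     // odd number of 1 bits
    r.nand_r = !r.and_r;
    r.nor_r  = !r.or_r;
    r.xnor_r = !r.xor_r;
    return r;
}

// Same value as the example above: 0xaa has four 1 bits.
inline void check_reductions() {
    const Reductions r = reduce_all(std::bitset<8>(0xaa));
    assert(!r.and_r && r.or_r && !r.xor_r);
    assert(r.nand_r && !r.nor_r && r.xnor_r);
}
```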
Bit Reverse

void ap_(u)int::reverse ()

Reverses the contents of ap_[u]int instance:

  • The LSB becomes the MSB.
  • The MSB becomes the LSB.
Reverse Method Example

ap_uint<8> Val = 0x12;

Val.reverse(); // Yields: 0x48
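
The effect of reverse() on an 8-bit value can be modeled with a short loop over a native integer. This is an illustrative sketch only; reverse8 and check_reverse are hypothetical names and the model is fixed at 8 bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 8-bit model: bit 0 becomes bit 7, bit 1 becomes bit 6, and so on.
inline std::uint8_t reverse8(std::uint8_t v) {
    std::uint8_t r = 0;
    for (int i = 0; i < 8; ++i)
        r = static_cast<std::uint8_t>((r << 1) | ((v >> i) & 1u));
    return r;
}

inline void check_reverse() {
    assert(reverse8(0x12) == 0x48);            // value from the example above
    assert(reverse8(reverse8(0x12)) == 0x12);  // reversing twice restores the value
}
```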
Test Bit Value

bool ap_(u)int::test (unsigned i)

Checks whether the specified bit of the ap_(u)int instance is 1.

Returns true if the bit is 1, false if it is 0.

Test Method Example

ap_uint<8> Val = 0x12;
bool t = Val.test(5); // Yields: false (0x12 has only bits 1 and 4 set)
Set Bit Value

void ap_(u)int::set (unsigned i, bool v)                              
void ap_(u)int::set_bit (unsigned i, bool v)

Sets the specified bit of the ap_(u)int instance to the value of v.

Set Bit (to 1)

void ap_(u)int::set (unsigned i)

Sets the specified bit of the ap_(u)int instance to the value 1 (one).

Clear Bit (to 0)

void ap_(u)int:: clear(unsigned i)

Sets the specified bit of the ap_(u)int instance to the value 0 (zero).

Invert Bit

void ap_(u)int:: invert(unsigned i)

Inverts the bit specified in the function argument of the ap_(u)int instance. The specified bit becomes 0 if its original value is 1 and vice versa.

Example of bit set, clear and invert bit methods:


ap_uint<8> Val = 0x12;
Val.set(0, 1); // Yields: 0x13
Val.set_bit(4, false); // Yields: 0x03
Val.set(7); // Yields: 0x83
Val.clear(1); // Yields: 0x81
Val.invert(4); // Yields: 0x91 
Rotate Right

void ap_(u)int:: rrotate(unsigned n)

Rotates the ap_(u)int instance n places to right.

Rotate Left

void ap_(u)int:: lrotate(unsigned n)

Rotates the ap_(u)int instance n places to left.


ap_uint<8> Val = 0x12;

Val.rrotate(3); // Yields: 0x42
Val.lrotate(6); // Yields: 0x90
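
The two rotations in the example can be reproduced on a native 8-bit integer. This is a minimal sketch; rrotate8, lrotate8, and check_rotate are hypothetical names, and the width is fixed at 8 bits for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 8-bit models of rotate right and rotate left.
inline std::uint8_t rrotate8(std::uint8_t v, unsigned n) {
    n %= 8;
    if (n == 0) return v;
    return static_cast<std::uint8_t>((v >> n) | (v << (8 - n)));
}

inline std::uint8_t lrotate8(std::uint8_t v, unsigned n) {
    n %= 8;
    if (n == 0) return v;
    return static_cast<std::uint8_t>((v << n) | (v >> (8 - n)));
}

// Replays the example: the second rotation acts on the result of the first.
inline void check_rotate() {
    std::uint8_t val = 0x12;
    val = rrotate8(val, 3);
    assert(val == 0x42);
    val = lrotate8(val, 6);
    assert(val == 0x90);
}
```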
Bitwise NOT

void ap_(u)int:: b_not()
  • Complements every bit of the ap_(u)int instance.

Bitwise NOT Example

ap_uint<8> Val = 0x12;

Val.b_not(); // Yields: 0xED

Test Sign

bool ap_int:: sign()
  • Checks whether the ap_(u)int instance is negative.
  • Returns true if negative.
  • Returns false if zero or positive.
Explicit Conversion Methods
To C/C++ “(u)int”

int ap_(u)int::to_int ()
unsigned ap_(u)int::to_uint ()
  • Returns native C/C++ (32-bit on most systems) integers with the value contained in the ap_[u]int.
  • Truncation occurs if the value is greater than can be represented by an [unsigned] int.
To C/C++ 64-bit “(u)int”

long long ap_(u)int::to_int64 ()
unsigned long long ap_(u)int::to_uint64 ()
  • Returns native C/C++ 64-bit integers with the value contained in the ap_[u]int.
  • Truncation occurs if the value is greater than can be represented by an [unsigned] long long.
To C/C++ “double”

double ap_(u)int::to_double ()
  • Returns a native C/C++ double 64-bit floating point representation of the value contained in the ap_[u]int.
  • If the ap_[u]int is wider than 53 bits (the number of bits in the mantissa of a double), the resulting double may not have the exact value expected.
Note: Xilinx recommends that you explicitly call member functions instead of using C-style cast to convert ap_[u]int to other data types.
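
The 53-bit limit mentioned for to_double() is a property of the double type itself and can be observed with native 64-bit integers. The sketch below assumes an IEEE-754 double; survives_double and check_double_precision are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// True if v converts to double and back without changing.
inline bool survives_double(std::uint64_t v) {
    const double d = static_cast<double>(v);
    if (d >= 18446744073709551616.0) return false;  // 2^64 does not fit back into 64 bits
    return static_cast<std::uint64_t>(d) == v;
}

inline void check_double_precision() {
    const std::uint64_t two_53 = std::uint64_t(1) << 53;
    assert(survives_double(two_53 - 1));   // needs 53 bits: exact
    assert(survives_double(two_53));       // power of two: exact
    assert(!survives_double(two_53 + 1));  // needs 54 bits: rounded
}
```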
Sizeof

Do not use the standard C++ sizeof() operator to determine the bit-width of an ap_[u]int type or variable. The ap_int<> data type is a class, and sizeof returns the storage used by that class or instance object. sizeof(ap_int<N>) always returns the number of bytes used, not the number of data bits. For example:


 sizeof(ap_int<127>)=16
 sizeof(ap_int<128>)=16
 sizeof(ap_int<129>)=24
 sizeof(ap_int<130>)=24
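
The four values listed above are consistent with storage that is rounded up to whole 64-bit words. The sketch below encodes that assumption only for widths like those listed; storage_bytes and check_storage are hypothetical names, and the rule is inferred from the listed values rather than taken from the library headers.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: storage is a whole number of 8-byte (64-bit) words.
constexpr std::size_t storage_bytes(unsigned width_bits) {
    return ((width_bits + 63u) / 64u) * 8u;
}

static_assert(storage_bytes(127) == 16, "two 64-bit words");
static_assert(storage_bytes(128) == 16, "two 64-bit words");
static_assert(storage_bytes(129) == 24, "three 64-bit words");
static_assert(storage_bytes(130) == 24, "three 64-bit words");

inline void check_storage() {
    // The byte count changes only when a new 64-bit word is needed.
    assert(storage_bytes(128) != storage_bytes(129));
    assert(storage_bytes(129) == storage_bytes(130));
}
```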
Compile Time Access to Data Type Attributes

The ap_[u]int<> types are provided with a static member that allows the size of the variables to be determined at compile time. The data type is provided with the static const member width, which is automatically assigned the width of the data type:


static const int width = _AP_W;

You can use the width data member to extract the data width of an existing ap_[u]int<> data type to create another ap_[u]int<> data type at compile time. The following example shows how the size of variable Res is defined as 1-bit greater than variables Val1 and Val2:


// Definition of basic data type
#define INPUT_DATA_WIDTH 8
typedef ap_int<INPUT_DATA_WIDTH> data_t;
// Definition of variables 
data_t Val1, Val2;
// Res is automatically sized at compile-time to be 1-bit greater than data type data_t
ap_int<data_t::width+1> Res = Val1 + Val2;

This ensures that Vitis HLS correctly models the bit-growth caused by the addition even if you update the value of INPUT_DATA_WIDTH for data_t.
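
The same compile-time pattern can be tried without the Vitis HLS headers by using a stand-in class template. In the sketch below, toy_int is a hypothetical placeholder for ap_int<W> that only carries the static width member; it does not model the arithmetic of the real type.

```cpp
#include <cassert>

// Hypothetical stand-in for ap_int<W>: exposes the width as a static member.
template <int W>
struct toy_int {
    static const int width = W;
    long long value;
};

#define INPUT_DATA_WIDTH 8
typedef toy_int<INPUT_DATA_WIDTH> data_t;

// The result type is derived from data_t, so it follows INPUT_DATA_WIDTH.
typedef toy_int<data_t::width + 1> sum_t;

static_assert(sum_t::width == INPUT_DATA_WIDTH + 1, "one extra bit for the sum");

inline void check_width() {
    const int w = sum_t::width;   // copy the compile-time constant
    assert(w == 9);
}
```

Changing INPUT_DATA_WIDTH changes sum_t::width with it, which is the property the paragraph above relies on.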

C++ Arbitrary Precision Fixed-Point Types

C++ functions can take advantage of the arbitrary precision fixed-point types included with Vitis HLS. The following figure summarizes the basic features of these fixed-point types:

  • The word can be signed (ap_fixed) or unsigned (ap_ufixed).
  • A word of any arbitrary size W can be defined.
  • The number of places above the decimal point, I, also defines the number of decimal places in the word, W-I (represented by B in the following figure).
  • The type of rounding or quantization (Q) can be selected.
  • The overflow behavior (O and N) can be selected.
Figure 2: Arbitrary Precision Fixed-Point Types


TIP: The arbitrary precision fixed-point types can be used when header file ap_fixed.h is included in the code.

Arbitrary precision fixed-point types use more memory during C simulation. If using very large arrays of ap_[u]fixed types, refer to the discussion of C simulation in Arrays.

The advantages of using fixed-point types are:

  • They allow fractional numbers to be easily represented.
  • When variables have a different number of integer and decimal place bits, the alignment of the decimal point is handled automatically.
  • There are numerous options to handle how rounding should happen when there are too few decimal bits to represent the precision of the result.
  • There are numerous options to handle how variables should overflow when the result is greater than the number of integer bits can represent.

These attributes are summarized by examining the code in the example below. First, the header file ap_fixed.h is included. The ap_fixed types are then defined using the typedef statement:

  • A 10-bit input: 8-bit integer value with 2 decimal places.
  • A 6-bit input: 3-bit integer value with 3 decimal places.
  • A 22-bit variable for the accumulation: 17-bit integer value with 5 decimal places.
  • A 36-bit variable for the result: 30-bit integer value with 6 decimal places.

The function contains no code to manage the alignment of the decimal point after operations are performed. The alignment is done automatically.

The following code sample shows ap_fixed type.


#include "ap_fixed.h"

typedef ap_ufixed<10,8, AP_RND, AP_SAT> din1_t;
typedef ap_fixed<6,3, AP_RND, AP_WRAP> din2_t;
typedef ap_fixed<22,17, AP_TRN, AP_SAT> dint_t;
typedef ap_fixed<36,30> dout_t;

dout_t cpp_ap_fixed(din1_t d_in1, din2_t d_in2) {

 static dint_t sum;
 sum += d_in1; 
 return sum * d_in2;
}

Using ap_(u)fixed types, the C++ simulation is bit accurate. Fast simulation can validate the algorithm and its accuracy. After synthesis, the RTL exhibits the identical bit-accurate behavior.

Arbitrary precision fixed-point types can be freely assigned literal values in the code. This is shown in the test bench (see the example below) used with the example above, in which the values of in1 and in2 are declared and assigned constant values.

When assigning literal values involving operators, the literal values must first be cast to ap_(u)fixed types. Otherwise, the C compiler and Vitis HLS interpret the literal as an integer or float/double type and may fail to find a suitable operator. As shown in the following example, in the assignment of in1 = in1 + din1_t(0.25), the literal 0.25 is cast to an ap_fixed type.


#include <cmath>
#include <fstream>
#include <iostream>
#include <iomanip>
#include <cstdio>
#include <cstdlib>
using namespace std;
#include "ap_fixed.h"

typedef ap_ufixed<10,8, AP_RND, AP_SAT> din1_t;
typedef ap_fixed<6,3, AP_RND, AP_WRAP> din2_t;
typedef ap_fixed<22,17, AP_TRN, AP_SAT> dint_t;
typedef ap_fixed<36,30> dout_t;

dout_t cpp_ap_fixed(din1_t d_in1, din2_t d_in2);
int main()
 {
 ofstream result;
 din1_t in1 = 0.25;
 din2_t in2 = 2.125;
 dout_t output;
 int retval=0;


 result.open("result.dat");
 // Persistent manipulators
 result << right << fixed << setbase(10) << setprecision(15);

 for (int i = 0; i <= 250; i++)
 {
 output = cpp_ap_fixed(in1,in2);

 result << setw(10) << i;
 result << setw(20) << in1;
 result << setw(20) << in2;
 result << setw(20) << output;
 result << endl;

 in1 = in1 + din1_t(0.25);
 in2 = in2 - din2_t(0.125);
 }
 result.close();

 // Compare the results file with the golden results
 retval = system("diff --brief -w result.dat result.golden.dat");
 if (retval != 0) {
 printf("Test failed  !!!\n"); 
 retval=1;
 } else {
 printf("Test passed !\n");
 }

 // Return 0 if the test passes
 return retval;
}

Fixed-Point Identifier Summary

The following table shows the quantization and overflow modes.

TIP: Quantization and overflow modes that do more than the default behavior of standard hardware arithmetic (wrap and truncate) result in operators with more associated hardware. It costs logic (LUTs) to implement the more advanced modes, such as round to minus infinity or saturate symmetrically.
Table 3. Fixed-Point Identifier Summary
Identifier  Description
W           Word length in bits.
I           The number of bits used to represent the integer value (the number of bits above the decimal point).
Q           Quantization mode. Dictates the behavior when greater precision is generated than can be defined by the smallest fractional bit in the variable used to store the result.
            Mode            Description
            AP_RND          Rounding to plus infinity
            AP_RND_ZERO     Rounding to zero
            AP_RND_MIN_INF  Rounding to minus infinity
            AP_RND_INF      Rounding to infinity
            AP_RND_CONV     Convergent rounding
            AP_TRN          Truncation to minus infinity (default)
            AP_TRN_ZERO     Truncation to zero
O           Overflow mode. Dictates the behavior when more bits are generated than the variable used to store the result contains.
            Mode            Description
            AP_SAT          Saturation
            AP_SAT_ZERO     Saturation to zero
            AP_SAT_SYM      Symmetrical saturation
            AP_WRAP         Wrap around (default)
            AP_WRAP_SM      Sign magnitude wrap around
N           The number of saturation bits in wrap modes.

C++ Arbitrary Precision Fixed-Point Types: Reference Information

For comprehensive information on the methods, synthesis behavior, and all aspects of using the ap_[u]fixed<W,I,Q,O,N> arbitrary precision fixed-point data types, see C++ Arbitrary Precision Fixed-Point Types. That section includes:

  • Techniques for assigning constant and initialization values to arbitrary precision integers (including values greater than 1024-bit).
  • A detailed description of the overflow and saturation modes.
  • A description of Vitis HLS helper methods, such as printing, concatenating, bit-slicing and range selection functions.
  • A description of operator behavior, including a description of shift operations (a negative shift value results in a shift in the opposite direction).
IMPORTANT: For the compiler to process these types, you must use the appropriate header files for the language.
C++ Arbitrary Precision Fixed-Point Types

Vitis HLS supports fixed-point types that allow fractional arithmetic to be easily handled. The advantage of fixed-point arithmetic is shown in the following example.


ap_fixed<11, 6> Var1 = 22.96875; // 11-bit signed word, 5 fractional bits
ap_ufixed<12,11> Var2 = 512.5; // 12-bit word, 1 fractional bit
ap_fixed<16,11> Res1; // 16-bit signed word, 5 fractional bits

Res1 = Var1 + Var2; // Result is 535.46875

Even though Var1 and Var2 have different precisions, the fixed-point type ensures that the decimal point is correctly aligned before the operation (an addition in this case) is performed. You are not required to perform any operations in the C code to align the decimal point.
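
The alignment that the fixed-point types perform automatically can be written out by hand on raw integers, which shows the work the library saves. This is a sketch under the assumption that a fixed-point value is stored as an integer scaled by a power of two; Fixed, add_aligned, to_double, and check_alignment are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical raw representation: value = raw / 2^frac_bits.
struct Fixed {
    std::int64_t raw;
    int frac_bits;
};

// Align both operands to the larger number of fraction bits, then add.
inline Fixed add_aligned(Fixed a, Fixed b) {
    const int frac = a.frac_bits > b.frac_bits ? a.frac_bits : b.frac_bits;
    const std::int64_t ra = a.raw * (std::int64_t(1) << (frac - a.frac_bits));
    const std::int64_t rb = b.raw * (std::int64_t(1) << (frac - b.frac_bits));
    return Fixed{ra + rb, frac};
}

inline double to_double(Fixed f) {
    return static_cast<double>(f.raw) /
           static_cast<double>(std::int64_t(1) << f.frac_bits);
}

inline void check_alignment() {
    const Fixed var1 = {735, 5};    // 22.96875 with 5 fraction bits
    const Fixed var2 = {1025, 1};   // 512.5 with 1 fraction bit
    const Fixed res = add_aligned(var1, var2);
    assert(res.frac_bits == 5);
    assert(to_double(res) == 535.46875);
}
```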

The type used to store the result of any fixed-point arithmetic operation must be large enough (in both the integer and fractional bits) to store the full result.

If this is not the case, the ap_fixed type performs:

  • overflow handling (when the result has more MSBs than the assigned type supports)
  • quantization (or rounding, when the result has more LSBs than the assigned type supports)

The ap_[u]fixed type provides various options on how the overflow and quantization are performed. The options are discussed below.

ap_[u]fixed Representation

In ap_[u]fixed types, a fixed-point value is represented as a sequence of bits with a specified position for the binary point.

  • Bits to the left of the binary point represent the integer part of the value.
  • Bits to the right of the binary point represent the fractional part of the value.

ap_[u]fixed type is defined as follows:


ap_[u]fixed<int W, 
 int I, 
 ap_q_mode Q, 
 ap_o_mode O,
 ap_sat_bits N>;

Quantization Modes
Rounding to plus infinity AP_RND
Rounding to zero AP_RND_ZERO
Rounding to minus infinity AP_RND_MIN_INF
Rounding to infinity AP_RND_INF
Convergent rounding AP_RND_CONV
Truncation AP_TRN
Truncation to zero AP_TRN_ZERO
AP_RND
  • Round the value to the nearest representable value for the specific ap_[u]fixed type.
    ap_fixed<3, 2, AP_RND, AP_SAT> UAPFixed4 = 1.25; // Yields: 1.5
    ap_fixed<3, 2, AP_RND, AP_SAT> UAPFixed4 = -1.25; // Yields: -1.0
AP_RND_ZERO
  • Round the value to the nearest representable value.
  • Round towards zero.
    • For positive values, delete the redundant bits.
    • For negative values, add the least significant bits to get the nearest representable value.
    ap_fixed<3, 2, AP_RND_ZERO, AP_SAT> UAPFixed4 = 1.25; // Yields: 1.0
    ap_fixed<3, 2, AP_RND_ZERO, AP_SAT> UAPFixed4 = -1.25; // Yields: -1.0
AP_RND_MIN_INF
  • Round the value to the nearest representable value.
  • Round towards minus infinity.
    • For positive values, delete the redundant bits.
    • For negative values, add the least significant bits.
    ap_fixed<3, 2, AP_RND_MIN_INF, AP_SAT> UAPFixed4 = 1.25; // Yields: 1.0
    ap_fixed<3, 2, AP_RND_MIN_INF, AP_SAT> UAPFixed4 = -1.25; // Yields: -1.5
AP_RND_INF
  • Round the value to the nearest representable value.
  • The rounding depends on the least significant bit.
    • For positive values, if the least significant bit is set, round towards plus infinity. Otherwise, round towards minus infinity.
    • For negative values, if the least significant bit is set, round towards minus infinity. Otherwise, round towards plus infinity.
    ap_fixed<3, 2, AP_RND_INF, AP_SAT> UAPFixed4 = 1.25; // Yields: 1.5
    ap_fixed<3, 2, AP_RND_INF, AP_SAT> UAPFixed4 = -1.25; // Yields: -1.5
AP_RND_CONV
  • Round the value to the nearest representable value.
  • The rounding depends on the least significant bit.
    • If least significant bit is set, round towards plus infinity.
    • Otherwise, round towards minus infinity.
    ap_fixed<3, 2, AP_RND_CONV, AP_SAT> UAPFixed4 = 0.75; // Yields: 1.0
    ap_fixed<3, 2, AP_RND_CONV, AP_SAT> UAPFixed4 = -1.25; // Yields: -1.0
AP_TRN
  • Always round the value towards minus infinity.
    ap_fixed<3, 2, AP_TRN, AP_SAT> UAPFixed4 = 1.25; // Yields: 1.0
    ap_fixed<3, 2, AP_TRN, AP_SAT> UAPFixed4 = -1.25; // Yields: -1.5
AP_TRN_ZERO

Round the value towards zero:

  • For positive values, the rounding is the same as mode AP_TRN.
  • For negative values, round towards zero.
    ap_fixed<3, 2, AP_TRN_ZERO, AP_SAT> UAPFixed4 = 1.25; // Yields: 1.0
    ap_fixed<3, 2, AP_TRN_ZERO, AP_SAT> UAPFixed4 = -1.25; // Yields: -1.0
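
The seven quantization modes can be compared side by side with a small model built on <cmath>. The sketch below is an approximation written for illustration, not the library's implementation; QMode, quantize, and check_quantization are hypothetical names, and the model is intended for small magnitudes where the scaled value is exact in a double.

```cpp
#include <cassert>
#include <cmath>

enum class QMode { Rnd, RndZero, RndMinInf, RndInf, RndConv, Trn, TrnZero };

// Hypothetical model: quantize x to frac_bits fraction bits.
inline double quantize(double x, int frac_bits, QMode mode) {
    const double scale = std::ldexp(1.0, frac_bits);
    const double s = x * scale;   // value in units of the LSB
    double r = 0.0;
    switch (mode) {
    case QMode::Rnd:       r = std::floor(s + 0.5); break;           // ties toward +infinity
    case QMode::RndZero:   r = (s >= 0.0) ? std::ceil(s - 0.5)
                                          : std::floor(s + 0.5); break;  // ties toward zero
    case QMode::RndMinInf: r = std::ceil(s - 0.5); break;            // ties toward -infinity
    case QMode::RndInf:    r = std::round(s); break;                 // ties away from zero
    case QMode::RndConv:                                             // ties to even LSB
        r = std::floor(s + 0.5);
        if (r == s + 0.5 && std::fmod(r, 2.0) != 0.0) r -= 1.0;
        break;
    case QMode::Trn:       r = std::floor(s); break;                 // toward -infinity
    case QMode::TrnZero:   r = std::trunc(s); break;                 // toward zero
    }
    return r / scale;
}

// Values from the examples above (one fraction bit).
inline void check_quantization() {
    assert(quantize(1.25, 1, QMode::Rnd) == 1.5);
    assert(quantize(-1.25, 1, QMode::Rnd) == -1.0);
    assert(quantize(-1.25, 1, QMode::RndZero) == -1.0);
    assert(quantize(-1.25, 1, QMode::RndMinInf) == -1.5);
    assert(quantize(-1.25, 1, QMode::RndInf) == -1.5);
    assert(quantize(0.75, 1, QMode::RndConv) == 1.0);
    assert(quantize(-1.25, 1, QMode::Trn) == -1.5);
    assert(quantize(-1.25, 1, QMode::TrnZero) == -1.0);
}
```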
Overflow Modes
Saturation AP_SAT
Saturation to zero AP_SAT_ZERO
Symmetrical saturation AP_SAT_SYM
Wrap-around AP_WRAP
Sign magnitude wrap-around AP_WRAP_SM
AP_SAT

Saturate the value:

  • To the maximum value in case of overflow.
  • To the minimum value in case of negative overflow (the most negative value for signed types, zero for unsigned types).
    ap_fixed<4, 4, AP_RND, AP_SAT> UAPFixed4 = 19.0; // Yields: 7.0
    ap_fixed<4, 4, AP_RND, AP_SAT> UAPFixed4 = -19.0; // Yields: -8.0
    ap_ufixed<4, 4, AP_RND, AP_SAT> UAPFixed4 = 19.0; // Yields: 15.0
    ap_ufixed<4, 4, AP_RND, AP_SAT> UAPFixed4 = -19.0; // Yields: 0.0
AP_SAT_ZERO

Force the value to zero in case of overflow, or negative overflow.

ap_fixed<4, 4, AP_RND, AP_SAT_ZERO> UAPFixed4 = 19.0; // Yields: 0.0
ap_fixed<4, 4, AP_RND, AP_SAT_ZERO> UAPFixed4 = -19.0; // Yields: 0.0
ap_ufixed<4, 4, AP_RND, AP_SAT_ZERO> UAPFixed4 = 19.0; // Yields: 0.0
ap_ufixed<4, 4, AP_RND, AP_SAT_ZERO> UAPFixed4 = -19.0; // Yields: 0.0
AP_SAT_SYM

Saturate the value:

  • To the maximum value in case of overflow.
  • To the minimum value in case of negative overflow.
    • Negative maximum for signed ap_fixed types
    • Zero for unsigned ap_ufixed types
    ap_fixed<4, 4, AP_RND, AP_SAT_SYM> UAPFixed4 = 19.0; // Yields: 7.0
    ap_fixed<4, 4, AP_RND, AP_SAT_SYM> UAPFixed4 = -19.0; // Yields: -7.0
    ap_ufixed<4, 4, AP_RND, AP_SAT_SYM> UAPFixed4 = 19.0; // Yields: 15.0
    ap_ufixed<4, 4, AP_RND, AP_SAT_SYM> UAPFixed4 = -19.0; // Yields: 0.0
AP_WRAP

Wrap the value around in case of overflow.

ap_fixed<4, 4, AP_RND, AP_WRAP> UAPFixed4 = 31.0; // Yields: -1.0
ap_fixed<4, 4, AP_RND, AP_WRAP> UAPFixed4 = -19.0; // Yields: -3.0
ap_ufixed<4, 4, AP_RND, AP_WRAP> UAPFixed4 = 19.0; // Yields: 3.0
ap_ufixed<4, 4, AP_RND, AP_WRAP> UAPFixed4 = -19.0; // Yields: 13.0

If the value of N is set to zero (the default overflow mode):

  • All MSB bits outside the range are deleted.
  • For unsigned numbers, the value wraps around to zero after the maximum.
  • For signed numbers, the value wraps to the minimum value after the maximum.

If N>0:

  • N MSB bits are saturated or set to 1.
  • The sign bit is retained, so positive numbers remain positive and negative numbers remain negative.
  • The bits that are not saturated are copied starting from the LSB side.
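
The saturation and wrap-around modes described so far can be compared with a small integer model of a 4-bit word with no fraction bits, matching the examples above. The sketch covers only N = 0 and passes in-range values through unchanged; how the library treats the most negative in-range value under AP_SAT_SYM is not modeled. OMode, overflow_signed4, overflow_unsigned4, and check_overflow are hypothetical names.

```cpp
#include <cassert>

enum class OMode { Sat, SatZero, SatSym, Wrap };

// Hypothetical model of a signed 4-bit integer word (range -8..7).
inline int overflow_signed4(int v, OMode mode) {
    const int max = 7, min = -8;
    if (v >= min && v <= max) return v;   // no overflow
    switch (mode) {
    case OMode::Sat:     return v > max ? max : min;
    case OMode::SatZero: return 0;
    case OMode::SatSym:  return v > max ? max : -max;
    case OMode::Wrap: {
        const int low = ((v % 16) + 16) % 16;   // keep the low 4 bits
        return low >= 8 ? low - 16 : low;       // reinterpret as signed
    }
    }
    return v;
}

// Hypothetical model of an unsigned 4-bit integer word (range 0..15).
inline int overflow_unsigned4(int v, OMode mode) {
    const int max = 15;
    if (v >= 0 && v <= max) return v;     // no overflow
    switch (mode) {
    case OMode::Sat:     return v > max ? max : 0;
    case OMode::SatZero: return 0;
    case OMode::SatSym:  return v > max ? max : 0;
    case OMode::Wrap:    return ((v % 16) + 16) % 16;
    }
    return v;
}

// Values from the examples above.
inline void check_overflow() {
    assert(overflow_signed4(19, OMode::Sat) == 7);
    assert(overflow_signed4(-19, OMode::Sat) == -8);
    assert(overflow_signed4(-19, OMode::SatSym) == -7);
    assert(overflow_signed4(31, OMode::Wrap) == -1);
    assert(overflow_signed4(-19, OMode::Wrap) == -3);
    assert(overflow_unsigned4(19, OMode::Wrap) == 3);
    assert(overflow_unsigned4(-19, OMode::Wrap) == 13);
}
```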
AP_WRAP_SM

The value is wrapped around using sign-magnitude wrapping.

ap_fixed<4, 4, AP_RND, AP_WRAP_SM> UAPFixed4 = 19.0; // Yields: -4.0
ap_fixed<4, 4, AP_RND, AP_WRAP_SM> UAPFixed4 = -19.0; // Yields: 2.0

If the value of N is set to zero (the default overflow mode):

  • This mode uses sign magnitude wrapping.
  • The sign bit is set to the value of the least significant deleted bit.
  • If the most significant remaining bit is different from the new sign bit, all the remaining bits are inverted.
  • If the two bits are the same, the remaining bits are copied over unchanged.

For the -19.0 example above, the steps are:

    1. Delete the redundant MSBs.
    2. The new sign bit is the least significant bit of the deleted bits (0 in this case).
    3. Compare the new sign bit with the sign of the new value. If they are different, invert all the bits. They are different in this case, so the result is 2.0.

If N>0:

  • Uses sign magnitude saturation
  • N MSBs are saturated to 1.
  • Behaves similarly to the case in which N = 0, except that positive numbers stay positive and negative numbers stay negative.
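
The N = 0 steps for AP_WRAP_SM can be followed on a 4-bit integer word. The sketch below is one reading of those steps, checked only against the two example values shown for this mode; wrap_sm_signed4 and check_wrap_sm are hypothetical names, and the N > 0 case is not modeled.

```cpp
#include <cassert>

// Hypothetical model of sign-magnitude wrap-around into a signed 4-bit word.
inline int wrap_sm_signed4(int v) {
    const unsigned u = static_cast<unsigned>(v);   // two's complement bit pattern
    unsigned kept = u & 0xFu;                      // step 1: delete the redundant MSBs
    const unsigned new_sign = (u >> 4) & 1u;       // step 2: lowest deleted bit
    const unsigned kept_msb = (kept >> 3) & 1u;    // step 3: compare with the kept MSB
    if (kept_msb != new_sign)
        kept = ~kept & 0xFu;                       // different: invert the kept bits
    return kept >= 8u ? static_cast<int>(kept) - 16 : static_cast<int>(kept);
}

inline void check_wrap_sm() {
    assert(wrap_sm_signed4(19) == -4);    // 0b010011 -> keep 0011, sign 1 -> 1100
    assert(wrap_sm_signed4(-19) == 2);    // 0b101101 -> keep 1101, sign 0 -> 0010
    assert(wrap_sm_signed4(5) == 5);      // in range: unchanged
    assert(wrap_sm_signed4(-3) == -3);    // in range: unchanged
}
```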
Compiling ap_[u]fixed<> Types

To use the ap_[u]fixed<> classes, you must include the ap_fixed.h header file in all source files that reference ap_[u]fixed<> variables.

When compiling software models that use these classes, it may be necessary to specify the location of the Vitis HLS header files, for example by adding the “-I/<HLS_HOME>/include” option for g++ compilation.

Declaring and Defining ap_[u]fixed<> Variables

There are separate signed and unsigned classes:

  • ap_fixed<W,I> (signed)
  • ap_ufixed<W,I> (unsigned)

You can create user-defined types with the C/C++ typedef statement:


User-Defined Types Examples

#include "ap_fixed.h" // use ap_[u]fixed<> types

typedef ap_ufixed<128,32> uint128_t; // 128-bit user defined type, 
 //  32 integer bits

Initialization and Assignment from Constants (Literals)

You can initialize an ap_[u]fixed variable with normal floating-point constants of the usual C/C++ width:

  • 32 bits for type float
  • 64 bits for type double

That is, typically, a floating-point value in single-precision or double-precision form.

Note that the value assigned to the fixed-point variable is limited by the precision of the constant. Use string initialization as described in Initialization and Assignment from Constants (Literals) to ensure that all bits of the fixed-point variable are populated according to the precision described by the string.


#include <ap_fixed.h>

ap_ufixed<30, 15> my15BitInt = 3.1415;
ap_fixed<42, 23> my42BitInt = -1158.987;
ap_ufixed<99, 40> my99BitInt = 287432.0382911;
ap_fixed<36,30> my36BitInt = -0x123.456p-1; // hexadecimal floating-point literal

The ap_[u]fixed types do not support initialization if they are used in an array of std::complex types. The following form is not supported:


typedef ap_fixed<DIN_W, 1, AP_TRN, AP_SAT> coeff_t; // MUST have IW >= 1
std::complex<coeff_t> twid_rom[REAL_SZ/2] = {{ 1, -0 },{ 0.9,-0.006 }, etc.}

The initialization values must first be cast to std::complex:


typedef ap_fixed<DIN_W, 1, AP_TRN, AP_SAT> coeff_t; // MUST have IW >= 1
std::complex<coeff_t> twid_rom[REAL_SZ/2] = {std::complex<coeff_t>( 1, -0 ), 
std::complex<coeff_t>(0.9,-0.006 ),etc.}
Support for Console I/O (Printing)

As with initialization and assignment to ap_[u]fixed<> variables, Vitis HLS supports printing values that require more than 64 bits to represent.

The easiest way to output any value stored in an ap_[u]fixed variable is to use the C++ standard output stream, std::cout (#include <iostream>). The stream insertion operator, <<, is overloaded to correctly output the full range of values possible for any given ap_[u]fixed variable. The following stream manipulators are also supported, allowing formatting of the value as shown.

  • dec (decimal)
  • hex (hexadecimal)
  • oct (octal)
    #include <iostream>
    using namespace std;
    
    ap_fixed<6,3, AP_RND, AP_WRAP> Val = 3.25;
    
    cout << Val << endl;     // Yields: 3.25
Using the Standard C Library

You can also use the standard C library (#include <stdio.h>) to print out values larger than 64-bits:

  1. Convert the value to a C++ std::string using the ap_[u]fixed classes method to_string().
  2. Convert the result to a null-terminated C character string using the std::string class method c_str().
Optional Argument One (Specifying the Radix)

You can pass the ap_[u]fixed::to_string() method an optional argument specifying the radix of the numerical format desired. The valid radix argument values are:

  • 2 (binary) (default)
  • 8 (octal)
  • 10 (decimal)
  • 16 (hexadecimal)
Optional Argument Two (Printing as Signed Values)

A second optional argument to ap_[u]fixed::to_string() specifies whether to print the non-decimal formats as signed values. This argument is boolean. The default value is false, causing the non-decimal formats to be printed as unsigned values.

ap_fixed<6,3, AP_RND, AP_WRAP> Val = 3.25;

printf("%s \n", Val.to_string().c_str()); // Yields: 0b011.010
printf("%s \n", Val.to_string(10).c_str()); // Yields: 3.25

The ap_[u]fixed types are supported by the following C++ manipulator functions:

  • setprecision
  • setw
  • setfill

The setprecision manipulator sets the decimal precision to be used. It takes one parameter n as the value of decimal precision, where n specifies the maximum number of meaningful digits to display in total (counting both those before and those after the decimal point).

The default value of n is 6, which is consistent with native C float type.

ap_fixed<64, 32> f =3.14159;
cout << setprecision (5) << f << endl;
cout << setprecision (9) << f << endl;
f = 123456;
cout << setprecision (5) << f << endl;

The example above displays the following results where the printed results are rounded when the actual precision exceeds the specified precision:

   3.1416
   3.14159
   1.2346e+05

The setw manipulator:

  • Sets the number of characters to be used for the field width.
  • Takes one parameter w as the value of the width

    where

    • w determines the minimum number of characters to be written in some output representation.

If the standard width of the representation is shorter than the field width, the representation is padded with fill characters. Fill characters are controlled by the setfill manipulator which takes one parameter f as the padding character.

For example, given:

    ap_fixed<65,32> aa = 123456;
    int precision = 5;
    cout<<setprecision(precision)<<setw(13)<<setfill('T')<<aa<<endl;

The output is:

     TTT1.2346e+05
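
The outputs shown for setprecision, setw, and setfill match what the standard stream library produces for a native double, which is a convenient way to preview the formatting without the Vitis HLS headers. The sketch below uses double in place of ap_fixed; format_value and check_formatting are hypothetical names.

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: format v with the given precision, field width, and fill.
inline std::string format_value(double v, int precision, int width = 0,
                                char fill = ' ') {
    std::ostringstream os;
    os << std::setprecision(precision) << std::setw(width)
       << std::setfill(fill) << v;
    return os.str();
}

// Same values and settings as the two examples above.
inline void check_formatting() {
    assert(format_value(3.14159, 5) == "3.1416");
    assert(format_value(3.14159, 9) == "3.14159");
    assert(format_value(123456.0, 5) == "1.2346e+05");
    assert(format_value(123456.0, 5, 13, 'T') == "TTT1.2346e+05");
}
```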
Expressions Involving ap_[u]fixed<> types

Arbitrary precision fixed-point values can participate in expressions that use any operators supported by C/C++. After an arbitrary precision fixed-point type or variable is defined, its usage is the same as for any floating point type or variable in the C/C++ languages.

Observe the following caveats:

  • Zero and Sign Extensions

    All values of smaller bit-width are zero or sign-extended depending on the sign of the source value. You may need to insert casts to obtain alternative signs when assigning smaller bit-widths to larger.

  • Truncations

    Truncation occurs when you assign an arbitrary precision fixed-point value of larger bit-width than the destination variable.

Class Methods, Operators, and Data Members

In general, any valid operation that can be done on a native C/C++ integer data type is supported (using operator overloading) for ap_[u]fixed types. In addition to these overloaded operators, some class specific operators and methods are included to ease bit-level operations.

Binary Arithmetic Operators
Addition
ap_[u]fixed::RType ap_[u]fixed::operator + (ap_[u]fixed op)

Adds an arbitrary precision fixed-point value and a given operand op.

The operand can be any of the following types:

  • ap_[u]fixed
  • ap_[u]int
  • C/C++ integer types

The result type ap_[u]fixed::RType depends on the type information of the two operands.

ap_fixed<76, 63> Result;

ap_fixed<5, 2> Val1 = 1.125;
ap_fixed<75, 62> Val2 = 6721.35595703125;

Result = Val1 + Val2; //Yields 6722.480957

Because Val2 has the larger bit-width in both the integer part and the fraction part, the result type has the same bit-widths as Val2 plus one extra integer bit, so that it can store all possible result values.

When you use the power functions, specifying the data width controls the resources used, as shown below. In similar cases, Xilinx recommends specifying the width of the stored result instead of specifying the width of the fixed-point operations.

ap_ufixed<16,6> x=5; 
ap_ufixed<16,7> y=hls::rsqrt<16,6>(x+x); 
Subtraction
ap_[u]fixed::RType ap_[u]fixed::operator - (ap_[u]fixed op)

Subtracts a given operand op from an arbitrary precision fixed-point value.

The result type ap_[u]fixed::RType depends on the type information of the two operands.

ap_fixed<76, 63> Result;

ap_fixed<5, 2> Val1 = 1625.153;         // Does not fit in ap_fixed<5, 2>; wraps to 1.125
ap_fixed<75, 62> Val2 = 6721.355992351; // Stored with 13 fraction bits as 6721.35595703125

Result = Val2 - Val1; // Yields 6720.230957

Because Val2 has the larger bit-width in both the integer part and the fraction part, the result type has the same bit-widths as Val2 plus one extra integer bit, so that it can store all possible result values.
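
The operand values in this example are easier to follow once the initializers have been reduced to what the types can hold: 1625.153 does not fit in ap_fixed<5, 2>, and the documented results are consistent with the default truncation and wrap-around leaving 1.125. The sketch below models that reduction with <cmath>; truncate_fraction, store_wrapped, and check_subtraction_operands are hypothetical names, and the model assumes a word narrower than 32 bits for the wrapped operand.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Drop fraction bits beyond F (truncation toward minus infinity).
inline double truncate_fraction(double x, int F) {
    return std::ldexp(std::floor(std::ldexp(x, F)), -F);
}

// Hypothetical model: store x in a signed W-bit word with I integer bits,
// using truncation and wrap-around (W < 32 in this sketch).
inline double store_wrapped(double x, int W, int I) {
    const int F = W - I;
    const std::int64_t raw =
        static_cast<std::int64_t>(std::floor(std::ldexp(x, F)));
    const std::int64_t modulus = std::int64_t(1) << W;
    std::int64_t kept = raw % modulus;            // keep the low W bits
    if (kept < 0) kept += modulus;
    if (kept >= modulus / 2) kept -= modulus;     // reinterpret as signed
    return std::ldexp(static_cast<double>(kept), -F);
}

inline void check_subtraction_operands() {
    const double val1 = store_wrapped(1625.153, 5, 2);
    const double val2 = truncate_fraction(6721.355992351, 13);
    assert(val1 == 1.125);
    assert(val2 == 6721.35595703125);
    assert(val2 - val1 == 6720.23095703125);
}
```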

Multiplication
ap_[u]fixed::RType ap_[u]fixed::operator * (ap_[u]fixed op)

Multiplies an arbitrary precision fixed-point value by a given operand op.

ap_fixed<80, 64> Result;

ap_fixed<5, 2> Val1 = 1625.153;
ap_fixed<75, 62> Val2 = 6721.355992351;

Result = Val1 * Val2; // Yields 7561.525452

This shows the multiplication of Val1 and Val2. The integer bit-width of the result type is the sum of their integer bit-widths, and its fraction bit-width is the sum of their fraction bit-widths.
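
The bit-growth rules stated for addition and multiplication can be expressed as compile-time functions and checked against the Result types used in the examples. This is a sketch that assumes two signed operands and encodes only the rules as worded in this section; Format, add_format, mul_format, and check_formats are hypothetical names.

```cpp
#include <cassert>

// Hypothetical description of a fixed-point format.
struct Format {
    int W;   // total bits
    int I;   // integer bits
};

constexpr int larger(int a, int b) { return a > b ? a : b; }

// Addition: larger integer part plus one bit, larger fraction part.
constexpr Format add_format(Format a, Format b) {
    return Format{larger(a.I, b.I) + 1 + larger(a.W - a.I, b.W - b.I),
                  larger(a.I, b.I) + 1};
}

// Multiplication: integer widths add, fraction widths add.
constexpr Format mul_format(Format a, Format b) {
    return Format{a.W + b.W, a.I + b.I};
}

static_assert(add_format(Format{5, 2}, Format{75, 62}).W == 76, "addition W");
static_assert(add_format(Format{5, 2}, Format{75, 62}).I == 63, "addition I");
static_assert(mul_format(Format{5, 2}, Format{75, 62}).W == 80, "multiplication W");
static_assert(mul_format(Format{5, 2}, Format{75, 62}).I == 64, "multiplication I");

inline void check_formats() {
    const Format sum = add_format(Format{5, 2}, Format{75, 62});
    assert(sum.W - sum.I == 13);   // fraction width follows the wider operand
}
```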

Division
ap_[u]fixed::RType ap_[u]fixed::operator / (ap_[u]fixed op)

Divides an arbitrary precision fixed-point by a given operand op.

ap_fixed<84, 66> Result;

ap_fixed<5, 2> Val1 = 1625.153;
ap_fixed<75, 62> Val2 = 6721.355992351;

Result = Val2 / Val1; // Yields 5974.538628

This shows the division of Val2 by Val1. To preserve enough precision:

  • The integer bit-width of the result type is sum of the integer bit-width of Val2 and the fraction bit-width of Val1.
  • The fraction bit-width of the result type is equal to the fraction bit-width of Val2.
Bitwise Logical Operators
Bitwise OR
ap_[u]fixed::RType ap_[u]fixed::operator | (ap_[u]fixed op)

Applies a bitwise OR operation on an arbitrary precision fixed-point and a given operand op.

ap_fixed<75, 62> Result;

ap_fixed<5, 2> Val1 = 1625.153;
ap_fixed<75, 62> Val2 = 6721.355992351;

Result = Val1 | Val2; // Yields 6721.480957
Bitwise AND
ap_[u]fixed::RType ap_[u]fixed::operator & (ap_[u]fixed op)

Applies a bitwise AND operation on an arbitrary precision fixed-point and a given operand op.

ap_fixed<75, 62> Result;

ap_fixed<5, 2> Val1 = 1625.153;
ap_fixed<75, 62> Val2 = 6721.355992351;

Result = Val1 & Val2;  // Yields 1.00000
Bitwise XOR
ap_[u]fixed::RType ap_[u]fixed::operator ^ (ap_[u]fixed op)

Applies a bitwise XOR operation on an arbitrary precision fixed-point and a given operand op.

ap_fixed<75, 62> Result;

ap_fixed<5, 2> Val1 = 1625.153;
ap_fixed<75, 62> Val2 = 6721.355992351;

Result = Val1 ^ Val2; // Yields 6720.480957
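
The three bitwise results can be reproduced by aligning both operands to 13 fraction bits and operating on the raw integers. The sketch below assumes the operand values that the types actually hold in these examples (1.125 and 6721.35595703125) and handles only non-negative inputs; BitOp, bitwise_fixed, and check_bitwise are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

enum class BitOp { Or, And, Xor };

// Hypothetical model: a and b must be non-negative and exactly representable
// with F fraction bits.
inline double bitwise_fixed(double a, double b, int F, BitOp op) {
    const std::uint64_t ra = static_cast<std::uint64_t>(std::ldexp(a, F));
    const std::uint64_t rb = static_cast<std::uint64_t>(std::ldexp(b, F));
    std::uint64_t r = 0;
    switch (op) {
    case BitOp::Or:  r = ra | rb; break;
    case BitOp::And: r = ra & rb; break;
    case BitOp::Xor: r = ra ^ rb; break;
    }
    return std::ldexp(static_cast<double>(r), -F);
}

inline void check_bitwise() {
    const double val1 = 1.125;              // value held by ap_fixed<5, 2>
    const double val2 = 6721.35595703125;   // value held by ap_fixed<75, 62>
    assert(bitwise_fixed(val1, val2, 13, BitOp::Or) == 6721.48095703125);
    assert(bitwise_fixed(val1, val2, 13, BitOp::And) == 1.0);
    assert(bitwise_fixed(val1, val2, 13, BitOp::Xor) == 6720.48095703125);
}
```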
Increment and Decrement Operators
Pre-Increment
ap_[u]fixed ap_[u]fixed::operator ++ ()

This prefix operator increases an arbitrary precision fixed-point variable by 1 and returns the new value.

ap_fixed<25, 8> Result;
ap_fixed<8, 5> Val1 = 5.125;

Result = ++Val1; // Yields 6.125000
Post-Increment
ap_[u]fixed ap_[u]fixed::operator ++ (int)

This postfix operator:

  • Increases an arbitrary precision fixed-point variable by 1.
  • Returns the original value of this arbitrary precision fixed-point.
    ap_fixed<25, 8> Result;
    ap_fixed<8, 5> Val1 = 5.125;
    
    Result = Val1++; // Yields 5.125000
Pre-Decrement
ap_[u]fixed ap_[u]fixed::operator -- ()

This prefix operator decreases this arbitrary precision fixed-point variable by 1 and returns the new value.

ap_fixed<25, 8> Result;
ap_fixed<8, 5> Val1 = 5.125;

Result = --Val1; // Yields 4.125000
Post-Decrement
ap_[u]fixed ap_[u]fixed::operator -- (int)

This postfix operator:

  • Decreases this arbitrary precision fixed-point variable by 1.
  • Returns the original value of this arbitrary precision fixed-point.
    ap_fixed<25, 8> Result;
    ap_fixed<8, 5> Val1 = 5.125;
    
    Result = Val1--; // Yields 5.125000
Unary Operators
Addition
ap_[u]fixed ap_[u]fixed::operator + ()

Returns a self copy of an arbitrary precision fixed-point variable.

ap_fixed<25, 8> Result;
ap_fixed<8, 5> Val1 = 5.125;

Result = +Val1;  // Yields 5.125000
Subtraction
ap_[u]fixed::RType ap_[u]fixed::operator - ()

Returns a negative value of an arbitrary precision fixed-point variable.

ap_fixed<25, 8> Result;
ap_fixed<8, 5> Val1 = 5.125;

Result = -Val1; // Yields -5.125000
Equality Zero
bool ap_[u]fixed::operator ! ()

This operator function:

  • Compares an arbitrary precision fixed-point variable with 0.
  • Returns true if the value is equal to 0, false otherwise.
    bool  Result;
    ap_fixed<8, 5> Val1 = 5.125;
    
    Result = !Val1; // Yields false
Bitwise Inverse
ap_[u]fixed::RType ap_[u]fixed::operator ~ ()

Returns a bitwise complement of an arbitrary precision fixed-point variable.

ap_fixed<25, 15> Result;
ap_fixed<8, 5> Val1 = 5.125;

Result = ~Val1; // Yields -5.25
Shift Operators
Unsigned Shift Left
ap_[u]fixed ap_[u]fixed::operator << (ap_uint<_W2> op) 

This operator function:

  • Shifts left by a given integer operand.
  • Returns the result.

The operand can be a C/C++ integer type:

  • char
  • short
  • int
  • long

The return type of the shift left operation is the same width as the type being shifted.

Note: Shift does not support overflow or quantization modes.
ap_fixed<25, 15> Result;
ap_fixed<8, 5> Val = 5.375;

ap_uint<4> sh = 2;

Result = Val << sh; // Yields -10.5

The bit-width of Result is (W = 25, I = 15). However, because the result type of the shift left operation is the same as the type of Val:

  • The high order two bits of Val are shifted out.
  • The result is -10.5.

If a result of 21.5 is required, Val must first be cast to a wider type, for example ap_fixed<10, 7>(Val).
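
The wrap-around described above can be reproduced without the Vitis HLS headers. The following is a minimal sketch in standard C++ that models only the raw bits of a signed fixed-point value. The helper names wrap_signed and shl_model are hypothetical and are not part of the ap_fixed API.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: interpret the low `width` bits of `raw` as a
// two's-complement number (models the wrap-around of a W-bit signed type).
inline int64_t wrap_signed(int64_t raw, int width) {
    assert(width > 0 && width < 63);
    const uint64_t modulus = uint64_t{1} << width;
    const uint64_t low = static_cast<uint64_t>(raw) & (modulus - 1);
    const bool negative = ((low >> (width - 1)) & 1u) != 0;
    return negative ? static_cast<int64_t>(low) - static_cast<int64_t>(modulus)
                    : static_cast<int64_t>(low);
}

// Hypothetical model of `value << sh` for a signed fixed-point type with
// `width` total bits and `frac_bits` fraction bits. The result keeps the
// same width as the operand, so high-order bits are shifted out.
inline double shl_model(double value, int width, int frac_bits, int sh) {
    assert(frac_bits >= 0 && sh >= 0);
    const int64_t raw = static_cast<int64_t>(std::floor(std::ldexp(value, frac_bits)));
    const int64_t shifted = wrap_signed(raw * (int64_t{1} << sh), width);
    return std::ldexp(static_cast<double>(shifted), -frac_bits);
}
```

With these assumptions, shl_model(5.375, 8, 3, 2) models the 8-bit type with 3 fraction bits and gives -10.5, while shl_model(5.375, 10, 3, 2) models the 10-bit cast and gives 21.5.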

Signed Shift Left
ap_[u]fixed ap_[u]fixed::operator << (ap_int<_W2> op)

This operator:

  • Shifts left by a given integer operand.
  • Returns the result.

The shift direction depends on whether the operand is positive or negative.

  • If the operand is positive, a shift left is performed.
  • If the operand is negative, a shift right (opposite direction) is performed.

The operand can be a C/C++ integer type:

  • char
  • short
  • int
  • long

The return type of the shift left operation is the same width as the type being shifted.

ap_fixed<25, 15> Result;
ap_fixed<8, 5> Val = 5.375;

ap_int<4> Sh = 2;
Result = Val << Sh; // Shift left, yields -10.5

Sh = -2;
Result = Val << Sh; // Shift right, yields 1.25
Unsigned Shift Right
ap_[u]fixed ap_[u]fixed::operator >> (ap_uint<_W2> op) 

This operator function:

  • Shifts right by a given integer operand.
  • Returns the result.

The operand can be a C/C++ integer type:

  • char
  • short
  • int
  • long

The return type of the shift right operation is the same width as the type being shifted.

ap_fixed<25, 15> Result;
ap_fixed<8, 5> Val = 5.375;

ap_uint<4> sh = 2;

Result = Val >> sh; // Yields 1.25

To preserve all significant bits, first extend the fraction part of Val, for example ap_fixed<10, 5>(Val).
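
The loss of low-order bits can be checked with a small model in standard C++. This is only a sketch of the arithmetic, not the ap_fixed implementation, and the helper name shr_model is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical model of `value >> sh` for a fixed-point type with
// `frac_bits` fraction bits: bits shifted below the least significant
// bit are dropped, because the result keeps the operand's format.
inline double shr_model(double value, int frac_bits, int sh) {
    assert(frac_bits >= 0 && sh >= 0);
    const double scale = std::ldexp(1.0, frac_bits);          // 2^frac_bits
    const double raw = std::floor(value * scale);             // raw integer bits
    const double shifted = std::floor(std::ldexp(raw, -sh));  // drop the low bits
    return shifted / scale;
}
```

With 3 fraction bits, as in ap_fixed<8, 5>, shr_model(5.375, 3, 2) gives 1.25. With 5 fraction bits, as in ap_fixed<10, 5>, shr_model(5.375, 5, 2) gives the exact quotient 1.34375.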

Signed Shift Right
ap_[u]fixed ap_[u]fixed::operator >> (ap_int<_W2> op) 

This operator:

  • Shifts right by a given integer operand.
  • Returns the result.

The shift direction depends on whether the operand is positive or negative.

  • If the operand is positive, a shift right is performed.
  • If the operand is negative, a shift left (opposite direction) is performed.

The operand can be a C/C++ integer type (char, short, int, or long).

The return type of the shift right operation is the same width as the type being shifted. For example:

ap_fixed<25, 15> Result;
ap_fixed<8, 5> Val = 5.375;

ap_int<4> Sh = 2;
Result = Val >> Sh; // Shift right, yields 1.25

Sh = -2;
Result = Val >> Sh; // Shift left, yields -10.5

Relational Operators
Equality
bool ap_[u]fixed::operator == (ap_[u]fixed op)

This operator compares the arbitrary precision fixed-point variable with a given operand.

Returns true if they are equal and false if they are not equal.

The type of operand op can be ap_[u]fixed, ap_int or C/C++ integer types. For example:

bool Result;

ap_ufixed<8, 5> Val1 = 1.25;
ap_fixed<9, 4> Val2 = 17.25; // Overflows the 4 integer bits and wraps to 1.25
ap_fixed<10, 5> Val3 = 3.25;

Result = Val1 == Val2; // Yields  true
Result = Val1 == Val3; // Yields  false
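
The first comparison yields true because 17.25 does not fit in the 4 integer bits of ap_fixed<9, 4> and wraps to 1.25. The following sketch in standard C++ models that storage step. It assumes the default truncation and wrap modes, and the helper name store_wrapped is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical model: store `value` in a signed fixed-point format with
// `width` total bits and `int_bits` integer bits, truncating extra fraction
// bits and wrapping on overflow (assumed default modes).
inline double store_wrapped(double value, int width, int int_bits) {
    assert(width > 0 && width < 63 && int_bits <= width);
    const int frac_bits = width - int_bits;
    const int64_t raw = static_cast<int64_t>(std::floor(std::ldexp(value, frac_bits)));
    const uint64_t modulus = uint64_t{1} << width;
    const uint64_t low = static_cast<uint64_t>(raw) & (modulus - 1);  // keep W bits
    const bool negative = ((low >> (width - 1)) & 1u) != 0;           // sign bit
    const int64_t wrapped = negative
        ? static_cast<int64_t>(low) - static_cast<int64_t>(modulus)
        : static_cast<int64_t>(low);
    return std::ldexp(static_cast<double>(wrapped), -frac_bits);
}
```

Under these assumptions, store_wrapped(17.25, 9, 4) returns 1.25, which matches the value held by Val1, while store_wrapped(3.25, 10, 5) returns 3.25 unchanged.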
Inequality
bool ap_[u]fixed::operator != (ap_[u]fixed op)

This operator compares this arbitrary precision fixed-point variable with a given operand.

Returns true if they are not equal and false if they are equal.

The type of operand op can be:

  • ap_[u]fixed
  • ap_int
  • C or C++ integer types

For example:

bool Result;

ap_ufixed<8, 5> Val1 = 1.25;
ap_fixed<9, 4> Val2 = 17.25;
ap_fixed<10, 5> Val3 = 3.25;

Result = Val1 != Val2; // Yields false
Result = Val1 != Val3; // Yields true
Greater than or equal to
bool ap_[u]fixed::operator >= (ap_[u]fixed op)

This operator compares a variable with a given operand.

Returns true if they are equal or if the variable is greater than the operand, and false otherwise.

The type of operand op can be ap_[u]fixed, ap_int or C/C++ integer types.

For example:

bool Result;

ap_ufixed<8, 5> Val1 = 1.25;
ap_fixed<9, 4> Val2 = 17.25;
ap_fixed<10, 5> Val3 = 3.25;

Result = Val1 >= Val2; // Yields true
Result = Val1 >= Val3; // Yields false
Less than or equal to
bool ap_[u]fixed::operator <= (ap_[u]fixed op)

This operator compares a variable with a given operand, and returns true if it is equal to or less than the operand and false if not.

The type of operand op can be ap_[u]fixed, ap_int or C/C++ integer types.

For example:

bool Result;

ap_ufixed<8, 5> Val1 = 1.25;
ap_fixed<9, 4> Val2 = 17.25;
ap_fixed<10, 5> Val3 = 3.25;

Result = Val1 <= Val2; // Yields true
Result = Val1 <= Val3; // Yields true
Greater than
bool ap_[u]fixed::operator > (ap_[u]fixed op)

This operator compares a variable with a given operand, and returns true if it is greater than the operand and false if not.

The type of operand op can be ap_[u]fixed, ap_int, or C/C++ integer types.

For example:

bool Result;

ap_ufixed<8, 5> Val1 = 1.25;
ap_fixed<9, 4> Val2 = 17.25;
ap_fixed<10, 5> Val3 = 3.25;

Result = Val1 > Val2; // Yields false
Result = Val1 > Val3; // Yields false
Less than
bool ap_[u]fixed::operator < (ap_[u]fixed op)

This operator compares a variable with a given operand, and returns true if it is less than the operand and false if not.

The type of operand op can be ap_[u]fixed, ap_int, or C/C++ integer types. For example:

bool Result;

ap_ufixed<8, 5> Val1 = 1.25;
ap_fixed<9, 4> Val2 = 17.25;
ap_fixed<10, 5> Val3 = 3.25;

Result = Val1 < Val2; // Yields false
Result = Val1 < Val3; // Yields true
Bit Operator
Bit-Select and Set
af_bit_ref ap_[u]fixed::operator [] (int bit) 

This operator selects one bit from an arbitrary precision fixed-point value and returns it.

The returned value is a reference value that can set or clear the corresponding bit in the ap_[u]fixed variable. The bit argument must be an integer value and it specifies the index of the bit to select. The least significant bit has index 0. The highest permissible index is one less than the bit-width of this ap_[u]fixed variable.

The result type is af_bit_ref with a value of either 0 or 1. For example:

ap_fixed<8, 5> Value = 1.375;

Value[3]; // Yields  1
Value[4]; // Yields  0

Value[2] = 1; // Yields 1.875
Value[3] = 0; // Yields 0.875
Bit Range
af_range_ref ap_[u]fixed::range (unsigned Hi, unsigned Lo)
af_range_ref ap_[u]fixed::operator () (unsigned Hi, unsigned Lo)

This operation is similar to bit-select operator [] except that it operates on a range of bits instead of a single bit.

It selects a group of bits from the arbitrary precision fixed-point variable. The Hi argument provides the upper range of bits to be selected. The Lo argument provides the lowest bit to be selected. If Lo is larger than Hi the bits selected are returned in the reverse order.

The return type af_range_ref represents a reference in the range of the ap_[u]fixed variable specified by Hi and Lo. For example:

ap_uint<4> Result = 0;
ap_ufixed<4, 2> Value = 1.25;
ap_uint<8> Repl = 0xAA;

Result = Value.range(3, 0); // Yields: 0x5
Value(3, 0) = Repl(3, 0); // Value now holds the bits 0xA, which is 2.5 for this unsigned type

// When Lo > Hi, the bits are returned in reverse order
Result = Value.range(0, 3); // Yields: 0x5 (the bits 0xA reversed)
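
The effect of swapping Hi and Lo can be modelled on a plain integer that holds the raw bits. This is a sketch in standard C++ only, and the helper name range_model is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper modelling range(hi, lo) on the raw bits of a value.
// When lo > hi, the selected bits are returned in reverse order.
inline uint32_t range_model(uint32_t raw, unsigned hi, unsigned lo) {
    assert(hi < 32 && lo < 32);
    const bool reversed = lo > hi;
    const unsigned top = reversed ? lo : hi;
    const unsigned bottom = reversed ? hi : lo;
    uint32_t result = 0;
    for (unsigned i = bottom; i <= top; ++i) {
        const uint32_t bit = (raw >> i) & 1u;
        // Normal order: bit i lands at position i - bottom.
        // Reversed order: bit i lands at position top - i.
        result |= bit << (reversed ? (top - i) : (i - bottom));
    }
    return result;
}
```

For the raw pattern 0x5 (the bits of 1.25 in ap_ufixed<4, 2>), range_model(0x5, 3, 0) returns 0x5 and range_model(0x5, 0, 3) returns 0xA.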
Range Select
af_range_ref ap_[u]fixed::range ()
af_range_ref ap_[u]fixed::operator () ()

This operation is the special case of the range select operator (). It selects all bits from this arbitrary precision fixed-point value in the normal order.

The return type af_range_ref represents a reference to the range specified by Hi = W - 1 and Lo = 0. For example:

ap_uint<4> Result = 0;

ap_ufixed<4, 2> Value = 1.25;
ap_uint<8> Repl = 0xAA;

Result = Value.range(); // Yields: 0x5
Value() = Repl(3, 0); // Value now holds the bits 0xA, which is 2.5 for this unsigned type
Length
int ap_[u]fixed::length ()

This function returns an integer value that provides the number of bits in an arbitrary precision fixed-point value. It can be used with a type or a value. For example:

ap_ufixed<128, 64> My128APFixed;

int bitwidth = My128APFixed.length(); // Yields 128
Explicit Conversion Methods
Fixed to Double
double ap_[u]fixed::to_double ()

This member function returns this fixed-point value in the form of the IEEE double-precision format. For example:

ap_ufixed<256, 77> MyAPFixed = 333.789;
double Result;

Result = MyAPFixed.to_double(); // Yields 333.789
Fixed to Float
float ap_[u]fixed::to_float()

This member function returns this fixed-point value in the form of the IEEE single-precision (float) format. For example:

ap_ufixed<256, 77> MyAPFixed = 333.789;
float Result;

Result = MyAPFixed.to_float();  // Yields 333.789
Fixed to Half-Precision Floating Point
half ap_[u]fixed::to_half()

This member function returns this fixed-point value in the form of the HLS half-precision (16-bit) floating-point format. For example:

ap_ufixed<256, 77> MyAPFixed = 333.789;
half Result;

Result = MyAPFixed.to_half();  // Yields approximately 333.789 (limited by half precision)
Fixed to ap_int
ap_int ap_[u]fixed::to_ap_int ()

This member function explicitly converts this fixed-point value to ap_int that captures all integer bits (fraction bits are truncated). For example:

ap_ufixed<256, 77> MyAPFixed = 333.789;
ap_uint<77> Result;

Result = MyAPFixed.to_ap_int(); //Yields 333
Fixed to Integer
int ap_[u]fixed::to_int ()
unsigned ap_[u]fixed::to_uint ()
ap_slong ap_[u]fixed::to_int64 ()
ap_ulong ap_[u]fixed::to_uint64 ()

These member functions explicitly convert this fixed-point value to C built-in integer types. For example:

ap_ufixed<256, 77> MyAPFixed = 333.789;
unsigned int Result;

Result = MyAPFixed.to_uint(); // Yields 333

unsigned long long Result64;
Result64 = MyAPFixed.to_uint64(); // Yields 333
Note: Xilinx recommends that you explicitly call member functions instead of using C-style cast to convert ap_[u]fixed to other data types.
Compile Time Access to Data Type Attributes

The ap_[u]fixed<> types are provided with several static members that allow the size and configuration of data types to be determined at compile time. The data type is provided with the static const members: width, iwidth, qmode and omode:

static const int width = _AP_W;
static const int iwidth = _AP_I;
static const ap_q_mode qmode = _AP_Q;
static const ap_o_mode omode = _AP_O;

You can use these data members to extract the following information from any existing ap_[u]fixed<> data type:

  • width: The width of the data type.
  • iwidth: The width of the integer part of the data type.
  • qmode: The quantization mode of the data type.
  • omode: The overflow mode of the data type.

For example, you can use these data members to extract the data width of an existing ap_[u]fixed<> data type to create another ap_[u]fixed<> data type at compile time.

The following example shows how the size of variable Res is automatically defined as 1-bit greater than variables Val1 and Val2, with the same quantization and overflow modes:

// Definition of basic data type
#define INPUT_DATA_WIDTH 12
#define IN_INTG_WIDTH 6
#define IN_QMODE AP_RND_ZERO
#define IN_OMODE AP_WRAP
typedef ap_fixed<INPUT_DATA_WIDTH, IN_INTG_WIDTH, IN_QMODE, IN_OMODE> data_t;
// Definition of variables 
data_t Val1, Val2;
// Res is automatically sized at compile time to be 1-bit greater than INPUT_DATA_WIDTH
// The bit growth in Res will be in the integer bits
ap_fixed<data_t::width+1, data_t::iwidth+1, data_t::qmode, data_t::omode> Res = Val1 + Val2;

This ensures that Vitis HLS correctly models the bit-growth caused by the addition even if you update the value of INPUT_DATA_WIDTH, IN_INTG_WIDTH, or the quantization modes for data_t.
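
The same pattern can be tried in any C++ compiler with a stand-in type. The sketch below is not the ap_fixed implementation: fixed_model and grow_by_one_t are hypothetical names that expose only the width and iwidth members discussed above, so that the compile-time derivation can be checked with static_assert.

```cpp
// Hypothetical stand-in for a fixed-point type. It carries no data and
// exposes only the static members used for compile-time sizing.
template <int W, int I>
struct fixed_model {
    static_assert(W > 0, "total width must be positive");
    static const int width = W;
    static const int iwidth = I;
};

// Derive a type that is one bit wider than T, with the growth placed in
// the integer part, entirely at compile time.
template <typename T>
using grow_by_one_t = fixed_model<T::width + 1, T::iwidth + 1>;

typedef fixed_model<12, 6> data_t;      // plays the role of the input type
typedef grow_by_one_t<data_t> sum_t;    // plays the role of the type of Res

static_assert(sum_t::width == 13, "one extra bit in total");
static_assert(sum_t::iwidth == 7, "the growth is in the integer bits");
```

Changing the parameters of data_t automatically changes sum_t, which is the property the ap_fixed example relies on.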

Vitis HLS Math Library

The Vitis HLS Math Library (hls_math.h) provides support for the synthesis of the standard C (math.h) and C++ (cmath) libraries and is automatically used to specify the math operations during synthesis. The support includes floating point (single-precision, double-precision and half-precision) for all functions and fixed-point support for some functions.

The hls_math.h library can optionally be used in C++ source code in place of the standard C++ math library (cmath), but it cannot be used in C source code. Vitis HLS will use the appropriate simulation implementation to avoid accuracy differences between C simulation and C/RTL co-simulation.

HLS Math Library Accuracy

The HLS math functions are implemented as synthesizable bit-approximate functions from the hls_math.h library. Bit-approximate HLS math library functions do not provide the same accuracy as the standard C function. To achieve the desired result, the bit-approximate implementation might use a different underlying algorithm than the standard C math library version. The accuracy of the function is specified in terms of ULP (Unit of Least Precision). This difference in accuracy has implications for both C simulation and C/RTL co-simulation.

The ULP difference is typically in the range of 1-4 ULP.

  • If the standard C math library is used in the C source code, there may be a difference between the C simulation and the C/RTL co-simulation due to the fact that some functions exhibit a ULP difference from the standard C math library.
  • If the HLS math library is used in the C source code, there will be no difference between the C simulation and the C/RTL co-simulation. A C simulation using the HLS math library, may however differ from a C simulation using the standard C math library.
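
A ULP difference between two single-precision results can be measured directly in a test bench. The following is a minimal sketch in standard C++, assuming IEEE-754 single precision and finite inputs. The helper names ordered_bits and ulp_distance are hypothetical and are not part of the HLS math library.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Map a float onto an integer line that is monotonic in the float's value,
// so that adjacent representable floats differ by exactly 1.
inline int64_t ordered_bits(float x) {
    uint32_t u;
    std::memcpy(&u, &x, sizeof u);                 // well-defined type pun
    const int64_t magnitude = u & 0x7FFFFFFFu;
    return (u & 0x80000000u) ? -magnitude : magnitude;
}

// Number of representable steps between a and b (0 if they are equal).
inline int64_t ulp_distance(float a, float b) {
    assert(std::isfinite(a) && std::isfinite(b));
    const int64_t d = ordered_bits(a) - ordered_bits(b);
    return d < 0 ? -d : d;
}
```

A test bench could then accept a result when ulp_distance(expected, actual) is within the range stated above, instead of requiring bit-exact equality.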

In addition, the following seven functions might show some differences, depending on the C standard used to compile and run the C simulation:

  • copysign
  • fpclassify
  • isinf
  • isfinite
  • isnan
  • isnormal
  • signbit

C90 mode

Only isinf, isnan, and copysign are usually provided by the system header files, and they operate on doubles. In particular, copysign always returns a double result. This might result in unexpected results after synthesis if it must be returned to a float, because a double-to-float conversion block is introduced into the hardware.

C99 mode (-std=c99)

All seven functions are usually provided under the expectation that the system header files will redirect them to __isnan(double) and __isnan(float). The usual GCC header files do not redirect isnormal, but implement it in terms of fpclassify.

C++ Using math.h

All seven are provided by the system header files, and they operate on doubles.

copysign always returns a double result. This might cause unexpected results after synthesis if it must be returned to a float, because a double-to-float conversion block is introduced into the hardware.

C++ Using cmath

Similar to C99 mode(-std=c99), except that:

  • The system header files are usually different.
  • The functions are properly overloaded for float and double arguments, for example isnan(double) and isinf(double).

copysign and copysignf are handled as built-ins even when using namespace std;.

C++ Using cmath and namespace std

No issues. Xilinx recommends using the following for best results:

  • -std=c99 for C
  • -fno-builtin for C and C++
Note: To specify the C compile options, such as -std=c99, use the Tcl command add_files with the -cflags option. Alternatively, use the Edit CFLAGs button in the Project Settings dialog box.

The HLS Math Library

The following functions are provided in the HLS math library. Each function supports half-precision (type half), single-precision (type float) and double precision (type double).

IMPORTANT: For each function func listed below, there is also an associated half-precision only function named half_func and single-precision only function named funcf provided in the library.

When mixing half-precision, single-precision and double-precision data types, check for common synthesis errors to prevent introducing type-conversion hardware in the final FPGA implementation.

Trigonometric Functions

acos acospi asin asinpi
atan atan2 atan2pi cos
cospi sin sincos sinpi
tan tanpi

Hyperbolic Functions

acosh asinh atanh cosh
sinh tanh

Exponential Functions

exp exp10 exp2 expm1
frexp ldexp modf

Logarithmic Functions

ilogb log log10 log1p

Power Functions

cbrt hypot pow rsqrt
sqrt

Error Functions

erf erfc

Rounding Functions

ceil floor llrint llround
lrint lround nearbyint rint
round trunc

Remainder Functions

fmod remainder remquo

Floating-point

copysign nan nextafter nexttoward

Difference Functions

fdim fmax fmin maxmag
minmag

Other Functions

abs divide fabs fma
fract mad recip

Classification Functions

fpclassify isfinite isinf isnan
isnormal signbit

Comparison Functions

isgreater isgreaterequal isless islessequal
islessgreater isunordered

Relational Functions

all any bitselect isequal
isnotequal isordered select

Fixed-Point Math Functions

Fixed-point implementations are also provided for the following math functions.

All fixed-point math functions support ap_[u]fixed and ap_[u]int data types with the following bit-width specifications:

  1. ap_fixed<W,I> where I<=33 and W-I<=32
  2. ap_ufixed<W,I> where I<=32 and W-I<=32
  3. ap_int<I> where I<=33
  4. ap_uint<I> where I<=32

Trigonometric Functions

cos sin tan acos asin atan atan2 sincos
cospi sinpi

Hyperbolic Functions

cosh sinh tanh acosh asinh atanh

Exponential Functions

exp frexp modf exp2 expm1

Logarithmic Functions

log log10 ilogb log1p

Power Functions

pow sqrt rsqrt cbrt hypot

Error Functions

erf erfc

Rounding Functions

ceil floor trunc round rint nearbyint

Floating Point

nextafter nexttoward

Difference Functions

erf erfc fdim fmax fmin maxmag minmag

Other Functions

fabs recip abs fract divide

Classification Functions

signbit

Comparison Functions

isgreater isgreaterequal isless islessequal islessgreater

Relational Functions

isequal isnotequal any all bitselect

The fixed-point type provides a slightly-less accurate version of the function value, but a smaller and faster RTL implementation.

The methodology for implementing a math function with fixed-point data types is:

  1. Determine if a fixed-point implementation is supported.
  2. Update the math functions to use ap_fixed types.
  3. Perform C simulation to validate the design still operates with the required precision. The C simulation is performed using the same bit-accurate types as the RTL implementation.
  4. Synthesize the design.

For example, a fixed-point implementation of the function sin is specified by using fixed-point types with the math function as follows:

#include "hls_math.h"
#include "ap_fixed.h"

ap_fixed<32,2> my_input, my_output;

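// Note: 24.675 is outside the range of ap_fixed<32,2> (-2 to just under 2),
// so with the default AP_WRAP overflow mode it wraps to approximately 0.675.
my_input = 24.675;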
my_output = sin(my_input);

When using fixed-point math functions, the result type must have the same width and integer bits as the input.
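
Before changing types, it can help to estimate in software how much accuracy a candidate format retains. The following is a rough sketch in standard C++ only. It does not use ap_fixed, it assumes round-to-nearest quantization for simplicity (the ap_fixed default is truncation), and the helper names quantize and sin_quantization_error are hypothetical. The bit-accurate C simulation remains the real check.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: round `value` to the nearest multiple of
// 2^-frac_bits, the resolution of a format with that many fraction bits.
inline double quantize(double value, int frac_bits) {
    assert(frac_bits >= 0 && frac_bits < 52);
    const double scale = std::ldexp(1.0, frac_bits);
    return std::round(value * scale) / scale;
}

// Estimated error when both the input and the output of sin are held
// with `frac_bits` fraction bits, relative to double-precision sin.
inline double sin_quantization_error(double angle, int frac_bits) {
    const double q_in = quantize(angle, frac_bits);
    const double q_out = quantize(std::sin(q_in), frac_bits);
    return std::fabs(q_out - std::sin(angle));
}
```

For example, sin_quantization_error(0.675, 30) estimates the error for a format with 30 fraction bits, such as the ap_fixed<32,2> type used above. Comparing the estimate for several values of frac_bits gives a first indication of how narrow the type can be made.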

Verification and Math Functions

If the standard C math library is used in the C source code, the C simulation results and the C/RTL co-simulation results may differ: any math function in the source code that has a ULP difference from the standard C math library can produce different results when the RTL is simulated.

If the hls_math.h library is used in the C source code, the C simulation and C/RTL co-simulation results are identical. However, the results of C simulation using hls_math.h are not the same as those using the standard C libraries. The hls_math.h library simply ensures the C simulation matches the C/RTL co-simulation results. In both cases, the same RTL implementation is created. The following explains each of the possible options which are used to perform verification when using math functions.

Verification Option 1: Standard Math Library and Verify Differences

In this option, the standard C math libraries are used in the source code. If any of the synthesized functions do not have exact accuracy, the C/RTL co-simulation results differ from the C simulation results. The following example highlights this approach.

#include <cmath>
#include <fstream>
#include <iostream>
#include <iomanip>
#include <cstdlib>
using namespace std;

typedef float data_t;

data_t cpp_math(data_t angle) {
     data_t s = sinf(angle);
     data_t c = cosf(angle);
     return sqrtf(s*s+c*c);
}

In this case, the results between C simulation and C/RTL co-simulation are different. Keep in mind when comparing the outputs of simulation, any results written from the test bench are written to the working directory where the simulation executes:

  • C simulation: Folder <project>/<solution>/csim/build
  • C/RTL co-simulation: Folder <project>/<solution>/sim/<RTL>

where <project> is the project folder, <solution> is the name of the solution folder and <RTL> is the type of RTL verified (verilog or vhdl). The following figure shows a typical comparison of the pre-synthesis results file on the left-hand side and the post-synthesis RTL results file on the right-hand side. The output is shown in the third column.

Figure 3: Pre-Synthesis and Post-Synthesis Simulation Differences


The results of pre-synthesis simulation and post-synthesis simulation differ by fractional amounts. You must decide whether these fractional amounts are acceptable in the final RTL implementation.

The recommended flow for handling these differences is using a test bench that checks the results to ensure that they lie within an acceptable error range. This can be accomplished by creating two versions of the same function, one for synthesis and one as a reference version. In this example, only function cpp_math is synthesized.


#include <cmath>
#include <fstream>
#include <iostream>
#include <iomanip>
#include <cstdlib>
using namespace std;

typedef float data_t;

data_t cpp_math(data_t angle) {
 data_t s = sinf(angle);
 data_t c = cosf(angle);
 return sqrtf(s*s+c*c);
}

data_t cpp_math_sw(data_t angle) {
 data_t s = sinf(angle);
 data_t c = cosf(angle);
 return sqrtf(s*s+c*c);
}

The test bench that verifies the design compares the outputs of both functions to determine the difference, using variable diff in the following example. During C simulation, both functions produce identical outputs. During C/RTL co-simulation, function cpp_math produces different results and the difference in results is checked.


int main() {
    data_t angle = 0.01;
    data_t output, exp_output, diff;
    int retval = 0;

    for (data_t i = 0; i <= 250; i++) {
        output = cpp_math(angle);
        exp_output = cpp_math_sw(angle);

        // Check for differences
        diff = ((exp_output > output) ? exp_output - output : output - exp_output);
        if (diff > 0.0000005) {
            printf("Difference %.10f exceeds tolerance at angle %.10f \n", diff, angle);
            retval = 1;
        }

        angle = angle + .1;
    }

    if (retval != 0) {
        printf("Test failed  !!!\n");
        retval = 1;
    } else {
        printf("Test passed !\n");
    }
    // Return 0 if the test passes
    return retval;
}

If the margin of difference is lowered to 0.00000005, this test bench highlights the margin of error during C/RTL co-simulation:


Difference 0.0000000596 at angle 1.1100001335
Difference 0.0000000596 at angle 1.2100001574
Difference 0.0000000596 at angle 1.5100002289
Difference 0.0000000596 at angle 1.6100002527
etc..
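
The reported difference of 0.0000000596 is consistent with a one-ULP step for single-precision values just below 1.0, which is where sqrt(s*s + c*c) is expected to land. A minimal sketch of that calculation, assuming IEEE-754 single precision (the helper name ulp_below_one is hypothetical):

```cpp
#include <cassert>
#include <cmath>

// Gap between 1.0f and the closest representable float below it.
// For IEEE-754 single precision this gap is 2^-24.
inline float ulp_below_one() {
    const float below = std::nextafter(1.0f, 0.0f);
    assert(below < 1.0f);
    return 1.0f - below;
}
```

The value 2^-24 is approximately 5.96e-8, so a tolerance of 0.00000005 sits just under a single-ULP step in this range.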

When using the standard C math libraries (math.h and cmath), create a “smart” test bench to verify that any differences in accuracy are acceptable.

Verification Option 2: HLS Math Library and Validate Differences

An alternative verification option is to convert the source code to use the HLS math library. With this option, there are no differences between the C simulation and C/RTL co-simulation results. The following example shows how the code above is modified to use the hls_math.h library.

Note: This option is only available in C++.

  • Include the hls_math.h header file.
  • Replace the math functions with the equivalent hls:: function.

#include <cmath>
#include "hls_math.h"
#include <fstream>
#include <iostream>
#include <iomanip>
#include <cstdlib>
using namespace std;

typedef float data_t;

data_t cpp_math(data_t angle) {
 data_t s = hls::sinf(angle);
 data_t c = hls::cosf(angle);
 return hls::sqrtf(s*s+c*c);
}

Verification Option 3: HLS Math Library File and Validate Differences

Including the HLS math library file lib_hlsm.cpp as a design file ensures Vitis HLS uses the HLS math library for C simulation. This option is identical to option 2; however, it does not require the C code to be modified.

The HLS math library file is located in the src directory in the Vitis HLS installation area. Simply copy the file to your local folder and add the file as a standard design file.

Note: This option is only available in C++.

As with option 2, with this option there is now a difference between the C simulation results using the HLS math library file and those previously obtained without adding this file. These differences should be validated with C simulation using a “smart” test bench similar to option 1.

Common Synthesis Errors

The following are common use errors when synthesizing math functions. These are often (but not exclusively) caused by converting C functions to C++ to take advantage of synthesis for math functions.

C++ cmath

If the C++ cmath header file is used, the floating point functions (for example, sinf and cosf) can be used. These result in 32-bit operations in hardware. The cmath header file also overloads the standard functions (for example, sin and cos) so they can be used for float and double types.

C math.h

If the C math.h library is used, the single-precision functions (for example, sinf and cosf) are required to synthesize 32-bit floating point operations. All standard function calls (for example, sin and cos) result in doubles and 64-bit double-precision operations being synthesized.

Cautions

When converting C functions to C++ to take advantage of math.h support, be sure that the new C++ code compiles correctly before synthesizing with Vitis HLS. For example, if sqrtf() is used in the code with math.h, the following extern declaration must be added to the C++ code to support it:

#include <math.h>
extern "C" float sqrtf(float);

To avoid unnecessary hardware caused by type conversion, follow the warnings on mixing double and float types discussed in Floats and Doubles.

HLS Stream Library

Streaming data is a type of data transfer in which data samples are sent in sequential order starting from the first sample. Streaming requires no address management.

Modeling designs that use streaming data can be difficult in C. The approach of using pointers to perform multiple read and/or write accesses can introduce issues, because there are implications for the type qualifier and how the test bench is constructed.

Vitis HLS provides a C++ template class hls::stream<> for modeling streaming data structures. The streams implemented with the hls::stream<> class have the following attributes.

  • In the C code, an hls::stream<> behaves like a FIFO of infinite depth. There is no requirement to define the size of an hls::stream<>.
  • They are read from and written to sequentially. That is, after data is read from an hls::stream<>, it cannot be read again.
  • An hls::stream<> on the top-level interface is by default implemented with an ap_fifo interface.
  • An hls::stream<> internal to the design is implemented as a FIFO with a depth of 2. The optimization directive STREAM is used to change this default size.

This section shows how the hls::stream<> class can more easily model designs with streaming data. The topics in this section provide:

  • An overview of modeling with streams and the RTL implementation of streams.
  • Rules for global stream variables.
  • How to use streams.
  • Blocking reads and writes.
  • Non-blocking reads and writes.
  • Controlling the FIFO depth.
Note: The hls::stream class should always be passed between functions as a C++ reference argument. For example, &my_stream.
IMPORTANT: The hls::stream class is only used in C++ designs. Arrays of streams are not supported.

C Modeling and RTL Implementation

Streams are modeled as an infinite queue in software (and in the test bench during RTL co-simulation). There is no need to specify any depth to simulate streams in C++. Streams can be used inside functions and on the interface to functions. Internal streams may be passed as function parameters.
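
The software behavior described above can be pictured with a small stand-in class. The sketch below is not hls::stream and does not use hls_stream.h: fifo_model and sum_all are hypothetical names that only mimic an unbounded FIFO whose data is consumed when read.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>

// Hypothetical stand-in for the C simulation behavior of a stream:
// unbounded depth, sequential access, and read-once semantics.
template <typename T>
class fifo_model {
public:
    void write(const T& value) { q_.push(value); }  // never full in software
    T read() {
        assert(!q_.empty());  // a real blocking read would stall instead
        T value = q_.front();
        q_.pop();             // the data is consumed and cannot be read again
        return value;
    }
    bool empty() const { return q_.empty(); }
    std::size_t size() const { return q_.size(); }
private:
    std::queue<T> q_;
};

// The FIFO is passed by reference so that the producer and the consumer
// operate on the same object.
inline int sum_all(fifo_model<int>& in) {
    int total = 0;
    while (!in.empty()) {
        total += in.read();
    }
    return total;
}
```

Because no depth is specified, any number of values can be written before the first read, which is why a design can pass C simulation and still stall in hardware if the synthesized FIFO is too shallow.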

Streams can be used only in C++ based designs. Each hls::stream<> object must be written by a single process and read by a single process.

If an hls::stream is used on the top-level interface, it is by default implemented in the RTL as a FIFO interface (ap_fifo) but may be optionally implemented as a handshake interface (ap_hs) or an AXI-Stream interface (axis).

If an hls::stream is used inside the design function and synthesized into hardware, it is implemented as a FIFO with a default depth of 2. In some cases, such as when interpolation is used, the depth of the FIFO might have to be increased to ensure the FIFO can hold all the elements produced by the hardware. Failure to ensure the FIFO is large enough to hold all the data samples generated by the hardware can result in a stall in the design (seen in C/RTL co-simulation and in the hardware implementation). The depth of the FIFO can be adjusted using the STREAM directive with the depth option. An example of this is provided in the example design hls_stream.

IMPORTANT: Ensure hls::stream variables are correctly sized when used in the default non-DATAFLOW regions.

If an hls::stream is used to transfer data between tasks (sub-functions or loops), you should immediately consider implementing the tasks in a DATAFLOW region where data streams from one task to the next. The default (non-DATAFLOW) behavior is to complete each task before starting the next task, in which case the FIFOs used to implement the hls::stream variables must be sized to ensure they are large enough to hold all the data samples generated by the producer task. Failure to increase the size of the hls::stream variables results in the error below:


ERROR: [XFORM 203-733] An internal stream xxxx.xxxx.V.user.V' with default size is 
used in a non-dataflow region, which may result in deadlock. Please consider to 
resize the stream using the directive 'set_directive_stream' or the 'HLS stream' 
pragma.

This error informs you that, in a non-DATAFLOW region, the default FIFO depth of 2 may not be large enough to hold all the data samples written to the FIFO by the producer task.
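
The sizing requirement can be illustrated with a simple occupancy model. This is a sketch under stated assumptions, not a description of the tool's analysis: it assumes the producer task runs to completion before the consumer task starts, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of sequential (non-DATAFLOW) execution: the producer
// writes `n_samples` while nothing is reading. Returns the number of
// samples written before the producer would stall on a full FIFO.
inline std::size_t samples_written_before_stall(std::size_t fifo_depth,
                                                std::size_t n_samples) {
    assert(fifo_depth > 0);  // a FIFO holds at least one element
    std::size_t occupancy = 0;
    for (std::size_t i = 0; i < n_samples; ++i) {
        if (occupancy == fifo_depth) {
            return i;        // FIFO is full and no consumer is running
        }
        ++occupancy;         // the blocking write succeeds
    }
    return n_samples;        // the producer completed without stalling
}

// True when the producer can finish, so the consumer can then start.
inline bool completes_sequentially(std::size_t fifo_depth, std::size_t n_samples) {
    return samples_written_before_stall(fifo_depth, n_samples) == n_samples;
}
```

In this model, a producer that generates 8 samples stalls after 2 writes with the default depth of 2, and completes only when the depth is at least 8.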

Global and Local Streams

Streams may be defined either locally or globally. Local streams are always implemented as internal FIFOs. Global streams can be implemented as internal FIFOs or ports:

  • Globally-defined streams that are only read from, or only written to, are inferred as external ports of the top-level RTL block.
  • Globally-defined streams that are both read from and written to (in the hierarchy below the top-level function) are implemented as internal FIFOs.

Streams defined in the global scope follow the same rules as any other global variables.

Using HLS Streams

To use hls::stream<> objects, include the header file hls_stream.h. Streaming data objects are defined by specifying the type and variable name. In this example, a 128-bit unsigned integer type is defined and used to create a stream variable called my_wide_stream.


#include "ap_int.h"
#include "hls_stream.h"

typedef ap_uint<128> uint128_t;  // 128-bit user defined type
hls::stream<uint128_t> my_wide_stream;  // A stream declaration

Streams must use scoped naming. Xilinx recommends using the scoped hls:: naming shown in the example above. However, if you want to use the hls namespace, you can rewrite the preceding example as:


#include <ap_int.h>
#include <hls_stream.h>
using namespace hls;

typedef ap_uint<128> uint128_t;  // 128-bit user defined type
stream<uint128_t> my_wide_stream;  // hls:: no longer required

Given a stream specified as hls::stream<T>, the type T may be:

  • Any C++ native data type
  • A Vitis HLS arbitrary precision type (for example, ap_int<>, ap_ufixed<>)
  • A user-defined struct containing either of the above types
Note: General user-defined classes (or structures) that contain methods (member functions) should not be used as the type (T) for a stream variable.

A stream can also be specified as hls::stream<Type, Depth>, where Depth indicates the depth of the FIFO needed in the verification adapter that the HLS tool creates for RTL co-simulation.

Streams may be optionally named. Providing a name for the stream allows the name to be used in reporting. For example, Vitis HLS automatically checks to ensure all elements from an input stream are read during simulation. Given the following two streams:


stream<uint8_t> bytestr_in1;
stream<uint8_t> bytestr_in2("input_stream2");

Any warnings about elements left in the streams are reported as follows, where it is clear which message relates to bytestr_in2:

WARNING: Hls::stream 'hls::stream<unsigned char>.1' contains leftover data, which 
may result in RTL simulation hanging.
WARNING: Hls::stream 'input_stream2' contains leftover data, which may result in RTL 
simulation hanging.

When streams are passed into and out of functions, they must be passed-by-reference as in the following example:


   void stream_function (
         hls::stream<uint8_t> &strm_out,
         hls::stream<uint8_t> &strm_in,
        uint16_t strm_len
       )

Vitis HLS supports both blocking and non-blocking access methods.

  • Non-blocking accesses can be implemented only as FIFO interfaces.
  • Streaming ports that are implemented as ap_fifo ports and that are defined with an AXI4-Stream resource must not use non-blocking accesses.

A complete design example using streams is provided in the Vitis HLS examples. Refer to the hls_stream example in the design examples available from the GUI welcome screen.

Blocking Reads and Writes

The basic accesses to an hls::stream<> object are blocking reads and writes. These are accomplished using class methods. These methods stall (block) execution if a read is attempted on an empty stream FIFO, a write is attempted to a full stream FIFO, or until a full handshake is accomplished for a stream mapped to an ap_hs interface protocol.

A stall can be observed in C/RTL co-simulation as the continued execution of the simulator without any progress in the transactions. The following shows a classic example of a stall situation, where the RTL simulation time keeps increasing, but there is no progress in the inter or intra transactions:


// RTL Simulation : "Inter-Transaction Progress" ["Intra-Transaction Progress"] @ 
"Simulation Time"
///////////////////////////////////////////////////////////////////////////////////
// RTL Simulation : 0 / 1 [0.00%] @ "110000"
// RTL Simulation : 0 / 1 [0.00%] @ "202000"
// RTL Simulation : 0 / 1 [0.00%] @ "404000"

Blocking Write Methods

In this example, the value of variable src_var is pushed into the stream.


// Usage of void write(const T & wdata)

hls::stream<int> my_stream;
int src_var = 42;

my_stream.write(src_var);

The << operator is overloaded such that it may be used in a similar fashion to the stream insertion operators for C++ stream (for example, iostreams and filestreams). The hls::stream<> object to be written to is supplied as the left-hand side argument and the value to be written as the right-hand side.


// Usage of void operator << (T & wdata)

hls::stream<int> my_stream;
int src_var = 42;

my_stream << src_var;

Blocking Read Methods

This method reads from the head of the stream and assigns the values to the variable dst_var.


// Usage of void read(T &rdata)

hls::stream<int> my_stream;
int dst_var;

my_stream.read(dst_var);

Alternatively, the next object in the stream can be read by assigning (using for example =, +=) the stream to an object on the left-hand side:


// Usage of T read(void)

hls::stream<int> my_stream;

int dst_var = my_stream.read();

The '>>' operator is overloaded to allow use similar to the stream extraction operator for C++ stream (for example, iostreams and filestreams). The hls::stream is supplied as the LHS argument and the destination variable the RHS.


// Usage of void operator >> (T & rdata)

hls::stream<int> my_stream;
int dst_var;

my_stream >> dst_var;

Non-Blocking Reads and Writes

Non-blocking write and read methods are also provided. These allow execution to continue even when a read is attempted on an empty stream or a write to a full stream.

These methods return a Boolean value indicating the status of the access (true if successful, false otherwise). Additional methods are included for testing the status of an hls::stream<> stream.

IMPORTANT: Non-blocking behavior is only supported on interfaces using the ap_fifo protocol. More specifically, the AXI-Stream standard and the Xilinx ap_hs IO protocol do not support non-blocking accesses.

During C simulation, streams have an infinite size. It is therefore not possible to validate with C simulation whether the stream is full. These methods can be verified only during RTL simulation, when the FIFO sizes are defined (either the default depth of 2, or an arbitrary depth defined with the STREAM directive).
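
The difference between the two simulations can be illustrated with a small software model. The sketch below is plain C++ rather than the hls::stream<> class. ModelFifo and its capacity argument are hypothetical, with a capacity of 0 standing in for the unbounded stream of C simulation.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical FIFO model (not hls::stream<>). capacity == 0 means unbounded,
// which mimics C simulation; a non-zero capacity mimics a sized RTL FIFO.
template <typename T>
class ModelFifo {
public:
    explicit ModelFifo(std::size_t capacity = 0) : capacity_(capacity) {}

    bool full() const { return capacity_ != 0 && data_.size() >= capacity_; }
    bool empty() const { return data_.empty(); }

    // Non-blocking write: returns false and leaves the FIFO untouched if full.
    bool write_nb(const T &value) {
        if (full()) return false;
        data_.push_back(value);
        assert(capacity_ == 0 || data_.size() <= capacity_);  // invariant
        return true;
    }

    // Non-blocking read: returns false and leaves the FIFO untouched if empty.
    bool read_nb(T &value) {
        if (empty()) return false;
        value = data_.front();
        data_.pop_front();
        return true;
    }

private:
    std::size_t capacity_;
    std::deque<T> data_;
};

// Counts how many of `attempts` back-to-back non-blocking writes succeed
// when nothing reads from the FIFO in between.
std::size_t successful_writes(std::size_t capacity, std::size_t attempts) {
    ModelFifo<int> fifo(capacity);
    std::size_t ok = 0;
    for (std::size_t i = 0; i < attempts; ++i) {
        if (fifo.write_nb(static_cast<int>(i))) ++ok;
    }
    return ok;
}
```

In this model every write succeeds when the capacity is unbounded, so a failure path that depends on a full FIFO is never exercised. With a finite capacity the same sequence of writes starts to fail once the FIFO fills.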

IMPORTANT: If the design is specified to use the block-level I/O protocol ap_ctrl_none and the design contains any hls::stream variables that employ non-blocking behavior, C/RTL co-simulation is not guaranteed to complete.

Non-Blocking Writes

This method attempts to push variable src_var into the stream my_stream, returning a boolean true if successful. Otherwise, false is returned and the queue is unaffected.


// Usage of bool write_nb(const T & wdata)

hls::stream<int> my_stream;
int src_var = 42;

if (my_stream.write_nb(src_var)) {
 // Perform standard operations
 ...
} else {
 // Write did not occur
 return;
}

Fullness Test
bool full(void)

Returns true, if and only if the hls::stream<> object is full.

// Usage of bool full(void)

hls::stream<int> my_stream;
int src_var = 42;
bool stream_full;

stream_full = my_stream.full();

Non-Blocking Read
bool read_nb(T & rdata)

This method attempts to read a value from the stream, returning true if successful. Otherwise, false is returned and the queue is unaffected.

// Usage of bool read_nb(T & rdata)

hls::stream<int> my_stream;
int dst_var;

if (my_stream.read_nb(dst_var)) {
 // Perform standard operations
 ...
} else {
 // Read did not occur
 return;
}

Emptiness Test
bool empty(void)

Returns true if the hls::stream<> is empty.

// Usage of bool empty(void)

hls::stream<int> my_stream;
int dst_var;
bool stream_empty;

stream_empty = my_stream.empty();

The following example shows how a combination of non-blocking accesses and full/empty tests can provide error handling functionality when the RTL FIFOs are full or empty:

#include "hls_stream.h"
using namespace hls;

typedef struct {
   short    data;
   bool     valid;
   bool     invert;
} input_interface;

bool invert(stream<input_interface>& in_data_1,
            stream<input_interface>& in_data_2,
            stream<short>& output
  ) {
  input_interface in;
  bool full_n = false;  // Initialized so the return value is defined when in.valid is false

// Read an input value or return
  if (!in_data_1.read_nb(in))
      if (!in_data_2.read_nb(in))
          return false;

// If the valid data is written, return not-full (full_n) as true
  if (in.valid) {
    if (in.invert)
      full_n = output.write_nb(~in.data);
    else
      full_n = output.write_nb(in.data);
  }
  return full_n;
}

Controlling the RTL FIFO Depth

For most designs using streaming data, the default RTL FIFO depth of 2 is sufficient. Streaming data is generally processed one sample at a time.

For multirate designs in which the implementation requires a FIFO with a depth greater than 2, you must determine (and set using the STREAM directive) the depth necessary for the RTL simulation to complete. If the FIFO depth is insufficient, RTL co-simulation stalls.

Because stream objects cannot be viewed in the GUI directives pane, the STREAM directive cannot be applied directly in that pane.

Right-click the function in which an hls::stream<> object is declared (or is used, or exists in the argument list) to:

  • Select the STREAM directive.
  • Populate the variable field manually with the name of the stream variable.

Alternatively, you can:

  • Specify the STREAM directive manually in the directives.tcl file, or
  • Add it as a pragma in source.
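
As a minimal sketch of the pragma form, the function below sizes a local stream to hold a full burst. The function name sum_via_stream, the burst length N, and the depth value are hypothetical. The fallback hls::stream stand-in is not the real class; it exists only so that the sketch also compiles where the Vitis HLS headers are unavailable, and an ordinary compiler simply ignores the pragma.

```cpp
#include <cassert>
#if defined(__has_include)
#  if __has_include("hls_stream.h")
#    include "hls_stream.h"
#    define HAVE_HLS_STREAM 1
#  endif
#endif
#ifndef HAVE_HLS_STREAM
// Stand-in used only when the Vitis HLS headers are missing. NOT the real class.
#include <queue>
namespace hls {
template <typename T>
class stream {
public:
    void write(const T &v) { q_.push(v); }
    T read() { T v = q_.front(); q_.pop(); return v; }
    bool empty() const { return q_.empty(); }
private:
    std::queue<T> q_;
};
}  // namespace hls
#endif

const int N = 8;  // hypothetical burst length

// Hypothetical function: the first loop writes all N samples before the second
// loop reads any of them, so the FIFO has to hold N elements at once.
int sum_via_stream(const int in[N]) {
    hls::stream<int> buf;
#pragma HLS STREAM variable=buf depth=8
    for (int i = 0; i < N; ++i) buf.write(in[i]);
    int acc = 0;
    for (int i = 0; i < N; ++i) acc += buf.read();
    assert(buf.empty());  // every element written was also read
    return acc;
}
```

The depth in the pragma is a literal here and has to be kept in step with N by hand.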

C/RTL Co-Simulation Support

The Vitis HLS C/RTL co-simulation feature does not support structures or classes containing hls::stream<> members in the top-level interface. Vitis HLS supports these structures or classes for synthesis.


typedef struct {
   hls::stream<uint8_t> a;
   hls::stream<uint16_t> b;
} strm_strct_t;

void dut_top(strm_strct_t indata, strm_strct_t outdata) { … }

These restrictions apply to both top-level function arguments and globally declared objects. If structs of streams are used for synthesis, the design must be verified using an external RTL simulator and user-created HDL test bench. There are no such restrictions on hls::stream<> objects with strictly internal linkage.

HLS IP Libraries

Vitis HLS provides C++ libraries to implement a number of Xilinx IP blocks. These libraries allow the following Xilinx IP blocks to be directly inferred from the C++ source code, ensuring a high-quality implementation in the FPGA.

Table 4. HLS IP Libraries
Library Header File Description
hls_fft.h Allows the Xilinx LogiCORE IP FFT to be simulated in C and implemented using the Xilinx LogiCORE block.
hls_fir.h Allows the Xilinx LogiCORE IP FIR to be simulated in C and implemented using the Xilinx LogiCORE block.
hls_dds.h Allows the Xilinx LogiCORE IP DDS to be simulated in C and implemented using the Xilinx LogiCORE block.
ap_shift_reg.h Provides a C++ class to implement a shift register which is implemented directly using a Xilinx SRL primitive.

FFT IP Library

The Xilinx FFT IP block can be called within a C++ design using the library hls_fft.h. This section explains how the FFT can be configured in your C++ code.

Note: Xilinx highly recommends that you review the Fast Fourier Transform LogiCORE IP Product Guide (PG109) for information on how to implement and use the features of the IP.

To use the FFT in your C++ code:

  1. Include the hls_fft.h library in the code
  2. Set the default parameters using the pre-defined struct hls::ip_fft::params_t
  3. Define the run time configuration
  4. Call the FFT function
  5. Optionally, check the run time status

The following code examples provide a summary of how each of these steps is performed. Each step is discussed in more detail below.

First, include the FFT library in the source code. This header file resides in the include directory in the Vitis HLS installation area which is automatically searched when Vitis HLS executes.

#include "hls_fft.h"

Define the static parameters of the FFT. These include the input width, the number of channels, and the type of architecture, none of which change dynamically. The FFT library includes a parameterization struct hls::ip_fft::params_t, which can be used to initialize all static parameters with default values.

In this example, the default values for output ordering and the widths of the configuration and status ports are over-ridden using a user-defined struct param1 based on the pre-defined struct.

struct param1 : hls::ip_fft::params_t {
    static const unsigned ordering_opt = hls::ip_fft::natural_order;
    static const unsigned config_width = FFT_CONFIG_WIDTH;
    static const unsigned status_width = FFT_STATUS_WIDTH;
};

Define types and variables for both the run time configuration and run time status. These values can be dynamic and are therefore defined as variables in the C code which can change and are accessed through APIs.

typedef hls::ip_fft::config_t<param1> config_t;
typedef hls::ip_fft::status_t<param1> status_t;
config_t fft_config1;
status_t fft_status1;

Next, set the run time configuration. This example sets the direction of the FFT (Forward or Inverse) based on the value of variable “direction” and also sets the value of the scaling schedule.

fft_config1.setDir(direction);
fft_config1.setSch(0x2AB);

Call the FFT function using the HLS namespace with the defined static configuration (param1 in this example). The function parameters are, in order, input data, output data, output status and input configuration.

hls::fft<param1> (xn1, xk1, &fft_status1, &fft_config1);

Finally, check the output status. This example checks the overflow flag and stores the results in variable “ovflo”.

    *ovflo = fft_status1.getOvflo();

Design examples using the FFT C library are provided in the Vitis HLS examples and can be accessed using menu option Help > Welcome > Open Example Project > Design Examples > FFT.

FFT Static Parameters

The static parameters of the FFT define how the FFT is configured and specify the fixed parameters, such as the size of the FFT, whether the size can be changed dynamically, and whether the implementation is pipelined or radix_4_burst_io.

The hls_fft.h header file defines a struct hls::ip_fft::params_t which can be used to set default values for the static parameters. If the default values are to be used, the parameterization struct can be used directly with the FFT function.


 hls::fft<hls::ip_fft::params_t >  
     (xn1, xk1, &fft_status1, &fft_config1);

A more typical use is to change some of the parameters to non-default values. This is performed by creating a new user-defined parameterization struct based on the default parameterization struct and changing some of the default values.

In the following example, a new user struct my_fft_config is defined with a new value for the output ordering (changed to natural_order). All other static parameters to the FFT use the default values.


struct my_fft_config : hls::ip_fft::params_t {
    static const unsigned ordering_opt = hls::ip_fft::natural_order;
};

hls::fft<my_fft_config >  
     (xn1, xk1, &fft_status1, &fft_config1);
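
The override mechanism used above is ordinary C++ name hiding. The sketch below shows it in isolation with stand-in types. default_params, my_params, and the two helper templates are hypothetical and are not the hls_fft.h definitions.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; these are not the hls_fft.h types.
enum ordering { bit_reversed_order = 0, natural_order = 1 };

struct default_params {
    static const unsigned input_width = 16;
    static const unsigned ordering_opt = bit_reversed_order;
};

// Redeclaring a static member in the derived struct hides the inherited one;
// members that are not redeclared keep the values from the base struct.
struct my_params : default_params {
    static const unsigned ordering_opt = natural_order;
};

// A template reads the settings by name at compile time, in the same way a
// function template parameterized on such a struct would.
template <typename P>
constexpr unsigned selected_ordering() { return P::ordering_opt; }

template <typename P>
constexpr unsigned selected_input_width() { return P::input_width; }

static_assert(selected_ordering<my_params>() == 1u, "override is visible");
static_assert(selected_input_width<my_params>() == 16u, "default is kept");
```

Because the lookup happens through the struct name, only the members that are redeclared change; everything else falls through to the base struct.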

The values used for the parameterization struct hls::ip_fft::params_t are explained in FFT Struct Parameters. The default values for the parameters and a list of possible values are provided in FFT Struct Parameter Values.

Note: Xilinx highly recommends that you review the LogiCORE IP Fast Fourier Transform Product Guide (PG109) for details on the parameters and the implication for their settings.

FFT Struct Parameters
Table 5. FFT Struct Parameters
Parameter Description
input_width Data input port width.
output_width Data output port width.
status_width Output status port width.
config_width Input configuration port width.
max_nfft The size of the FFT data set is specified as 1 << max_nfft.
has_nfft Determines if the size of the FFT can be run time configurable.
channels Number of channels.
arch_opt The implementation architecture.
phase_factor_width Configure the internal phase factor precision.
ordering_opt The output ordering mode.
ovflo Enable overflow mode.
scaling_opt Define the scaling options.
rounding_opt Define the rounding modes.
mem_data Specify using block or distributed RAM for data memory.
mem_phase_factors Specify using block or distributed RAM for phase factors memory.
mem_reorder Specify using block or distributed RAM for output reorder memory.
stages_block_ram Defines the number of block RAM stages used in the implementation.
mem_hybrid When block RAMs are specified for data, phase factor, or reorder buffer, mem_hybrid specifies whether or not to use a hybrid of block and distributed RAMs to reduce block RAM count in certain configurations.
complex_mult_type Defines the types of multiplier to use for complex multiplications.
butterfly_type Defines the implementation used for the FFT butterfly.

When specifying parameter values which are not integer or boolean, the HLS FFT namespace should be used.

For example, the possible values for parameter butterfly_type in the following table are use_luts and use_xtremedsp_slices. The values used in the C program should be butterfly_type = hls::ip_fft::use_luts and butterfly_type = hls::ip_fft::use_xtremedsp_slices.

FFT Struct Parameter Values

The following table covers all features and functionality of the FFT IP. Features and functionality not described in this table are not supported in the Vitis HLS implementation.

Table 6. FFT Struct Parameter Values
Parameter C Type Default Value Valid Values
input_width unsigned 16 8-34
output_width unsigned 16 input_width to (input_width + max_nfft + 1)
status_width unsigned 8 Depends on FFT configuration
config_width unsigned 16 Depends on FFT configuration
max_nfft unsigned 10 3-16
has_nfft bool false True, False
channels unsigned 1 1-12
arch_opt unsigned pipelined_streaming_io automatically_select, pipelined_streaming_io, radix_4_burst_io, radix_2_burst_io, radix_2_lite_burst_io
phase_factor_width unsigned 16 8-34
ordering_opt unsigned bit_reversed_order bit_reversed_order, natural_order
ovflo bool true false, true
scaling_opt unsigned scaled scaled, unscaled, block_floating_point
rounding_opt unsigned truncation truncation, convergent_rounding
mem_data unsigned block_ram block_ram, distributed_ram
mem_phase_factors unsigned block_ram block_ram, distributed_ram
mem_reorder unsigned block_ram block_ram, distributed_ram
stages_block_ram unsigned (max_nfft < 10) ? 0 : (max_nfft - 9) 0-11
mem_hybrid bool false false, true
complex_mult_type unsigned use_mults_resources use_luts, use_mults_resources, use_mults_performance
butterfly_type unsigned use_luts use_luts, use_xtremedsp_slices
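
The numeric ranges in Table 6 can be written down as compile-time checks. The helpers below are a sketch only; in_range, default_stages_block_ram, and fft_widths_valid are hypothetical names and are not provided by hls_fft.h.

```cpp
#include <cassert>

// Hypothetical helpers that mirror the ranges listed in Table 6.
constexpr bool in_range(unsigned v, unsigned lo, unsigned hi) {
    return v >= lo && v <= hi;
}

// Default for stages_block_ram, as given in the table.
constexpr unsigned default_stages_block_ram(unsigned max_nfft) {
    return (max_nfft < 10) ? 0 : (max_nfft - 9);
}

// Checks the numeric parameters against the documented valid values.
constexpr bool fft_widths_valid(unsigned input_width, unsigned output_width,
                                unsigned max_nfft, unsigned channels) {
    return in_range(input_width, 8, 34) &&
           in_range(max_nfft, 3, 16) &&
           in_range(channels, 1, 12) &&
           in_range(output_width, input_width, input_width + max_nfft + 1);
}

// The documented defaults satisfy the checks at compile time.
static_assert(fft_widths_valid(16, 16, 10, 1), "defaults should be valid");
```

For the default input_width of 16 and max_nfft of 10, the widest output_width accepted by this check is 16 + 10 + 1 = 27.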

FFT Runtime Configuration and Status

The FFT supports runtime configuration and runtime status monitoring through the configuration and status ports. These ports are defined as arguments to the FFT function, shown here as variables fft_status1 and fft_config1:


hls::fft<param1> (xn1, xk1, &fft_status1, &fft_config1);

The runtime configuration and status can be accessed using the predefined structs from the FFT C library:

  • hls::ip_fft::config_t<param1>
  • hls::ip_fft::status_t<param1>
Note: In both cases, the struct requires the name of the static parameterization struct, shown in these examples as param1. Refer to the previous section for details on defining the static parameterization struct.

The runtime configuration struct allows the following actions to be performed in the C code:

  • Set the FFT length, if runtime configuration is enabled
  • Set the FFT direction as forward or inverse
  • Set the scaling schedule

The FFT length can be set as follows:

typedef hls::ip_fft::config_t<param1> config_t;
config_t fft_config1;
// Set FFT length to 512 => log2(512) = 9
fft_config1.setNfft(9);

IMPORTANT: The length specified during runtime cannot exceed the size defined by max_nfft in the static configuration.
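
The relationship between the FFT length, the value passed to setNfft(), and max_nfft can be sketched as follows. Both helper functions are hypothetical and are not part of hls_fft.h.

```cpp
#include <cassert>

// Hypothetical helper: returns log2(length) for a power-of-two FFT length,
// which is the value passed to setNfft().
unsigned nfft_for_length(unsigned length) {
    assert(length != 0 && (length & (length - 1)) == 0);  // power of two only
    unsigned nfft = 0;
    while ((1u << nfft) < length) ++nfft;
    return nfft;
}

// The run time length must not exceed the static maximum, 1 << max_nfft.
bool nfft_allowed(unsigned nfft, unsigned max_nfft) {
    return nfft <= max_nfft;
}
```

With the default max_nfft of 10, a 512-point transform (nfft of 9) is allowed, while a 2048-point transform (nfft of 11) is not.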

The FFT direction can be set as follows:


typedef hls::ip_fft::config_t<param1> config_t;
config_t fft_config1;
// Forward FFT
fft_config1.setDir(1);
// Inverse FFT 
fft_config1.setDir(0);

The FFT scaling schedule can be set as follows:


typedef hls::ip_fft::config_t<param1> config_t;
config_t fft_config1;
fft_config1.setSch(0x2AB);
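
The value 0x2AB is easier to read once it is split into per-stage fields. The sketch below assumes the encoding described in PG109 for the scaled architectures (two bits per stage, the first stage in the two least significant bits, each field giving a right shift of 0 to 3 bits). That encoding is not stated in this section, so confirm it against PG109 for the architecture in use. The function names and the stage count of 5 in the note are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Assumption (from PG109, not from this section): two bits per stage, first
// stage in the two LSBs, each field is a right shift of 0 to 3 bits.
std::vector<unsigned> decode_scaling_schedule(unsigned sch, unsigned stages) {
    assert(stages <= 16);  // keeps the shift amount below 32 bits
    std::vector<unsigned> shifts;
    for (unsigned s = 0; s < stages; ++s) {
        shifts.push_back((sch >> (2 * s)) & 0x3u);
    }
    return shifts;
}

// Total number of bits by which the result is scaled down.
unsigned total_shift(unsigned sch, unsigned stages) {
    unsigned total = 0;
    for (unsigned shift : decode_scaling_schedule(sch, stages)) total += shift;
    return total;
}
```

Under that assumption, and taking five stages, 0x2AB decodes to shifts of 3, 2, 2, 2, 2 from the first stage to the last, for a total of 11 bits.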

The output status port can be accessed using the pre-defined struct to determine:

  • If any overflow occurred during the FFT
  • The value of the block exponent

The FFT overflow mode can be checked as follows:


typedef hls::ip_fft::status_t<param1> status_t;
status_t fft_status1;
// Check the overflow flag
bool ovflo = fft_status1.getOvflo();

IMPORTANT: After each transaction completes, check the overflow status to confirm the correct operation of the FFT.

And the block exponent value can be obtained using:


typedef hls::ip_fft::status_t<param1> status_t;
status_t fft_status1;
// Obtain the block exponent
unsigned int blk_exp = fft_status1.getBlkExp();

Using the FFT Function

The FFT function is defined in the HLS namespace and can be called as follows:


hls::fft<STATIC_PARAM> (
INPUT_DATA_ARRAY,
OUTPUT_DATA_ARRAY, 
OUTPUT_STATUS, 
INPUT_RUN_TIME_CONFIGURATION);

The STATIC_PARAM is the static parameterization struct that defines the static parameters for the FFT.

Both the input and output data are supplied to the function as arrays (INPUT_DATA_ARRAY and OUTPUT_DATA_ARRAY). In the final implementation, the ports on the FFT RTL block will be implemented as AXI4-Stream ports. Xilinx recommends always using the FFT function in a region using dataflow optimization (set_directive_dataflow), because this ensures the arrays are implemented as streaming arrays. An alternative is to specify both arrays as streaming using the set_directive_stream command.

IMPORTANT: The FFT cannot be used in a region which is pipelined. If high-performance operation is required, pipeline the loops or functions before and after the FFT then use dataflow optimization on all loops and functions in the region.

The data types for the arrays can be float or ap_fixed.


typedef float data_t;
complex<data_t> xn[FFT_LENGTH];
complex<data_t> xk[FFT_LENGTH];

To use fixed-point data types, the Vitis HLS arbitrary precision type ap_fixed should be used.


#include "ap_fixed.h"
typedef ap_fixed<FFT_INPUT_WIDTH,1> data_in_t;
typedef ap_fixed<FFT_OUTPUT_WIDTH,FFT_OUTPUT_WIDTH-FFT_INPUT_WIDTH+1> data_out_t;
#include <complex>
typedef hls::x_complex<data_in_t> cmpxData;
typedef hls::x_complex<data_out_t> cmpxDataOut;

In both cases, the FFT should be parameterized with the same correct data sizes. In the case of floating point data, the data widths will always be 32-bit and any other specified size will be considered invalid.

IMPORTANT: The input and output width of the FFT can be configured to any arbitrary value within the supported range. The variables which connect to the input and output parameters must be defined in increments of 8-bit. For example, if the output width is configured as 33-bit, the output variable must be defined as a 40-bit variable.
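
The 8-bit increment rule amounts to rounding the configured width up to the next multiple of 8. A minimal sketch, using a hypothetical helper name:

```cpp
#include <cassert>

// Hypothetical helper: smallest multiple of 8 bits that can hold `width` bits.
constexpr unsigned padded_width(unsigned width) {
    return ((width + 7u) / 8u) * 8u;
}

// Matches the example in the text: a 33-bit output needs a 40-bit variable.
static_assert(padded_width(33) == 40, "33-bit output needs a 40-bit variable");
```

A width that is already a multiple of 8, such as 16, is returned unchanged.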

The multichannel functionality of the FFT can be used by using two-dimensional arrays for the input and output data. In this case, the array data should be configured with the first dimension representing each channel and the second dimension representing the FFT data.


typedef float data_t;
static complex<data_t> xn[CHANNEL][FFT_LENGTH];
static complex<data_t> xk[CHANNEL][FFT_LENGTH];

The FFT core consumes and produces data as interleaved channels (for example, ch0-data0, ch1-data0, ch2-data0, and so on, followed by ch0-data1, ch1-data1, ch2-data1, and so on). Therefore, to stream the input or output arrays of the FFT using the same sequential order that the data was read or written, you must fill or empty the two-dimensional arrays for multiple channels by iterating through the channel index first, as shown in the following example:


cmpxData   in_fft[FFT_CHANNELS][FFT_LENGTH];
cmpxData  out_fft[FFT_CHANNELS][FFT_LENGTH];
 
// Write to FFT Input Array
for (unsigned i = 0; i < FFT_LENGTH; i++) {
 for (unsigned j = 0; j < FFT_CHANNELS; ++j) {
 in_fft[j][i] = in.read().data;
 }
}
   
// Read from FFT Output Array
for (unsigned i = 0; i < FFT_LENGTH; i++) {
 for (unsigned j = 0; j < FFT_CHANNELS; ++j) {
 out.data = out_fft[j][i];
 
 }
}
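
The resulting access order can be checked with a small stand-alone model. The sketch below is plain C++ with hypothetical sizes and names; it tags each element with its channel and sample index and records the order in which the loop nest above visits them.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sizes chosen only for illustration.
const std::size_t CHANNELS = 3;
const std::size_t LENGTH = 4;

// Encodes "channel c, sample i" as one number so the order is easy to read.
int tag(std::size_t c, std::size_t i) { return static_cast<int>(c * 100 + i); }

// Fills a [channel][sample] array, then drains it with the sample index in
// the outer loop and the channel index in the inner loop.
std::vector<int> interleaved_order() {
    int data[CHANNELS][LENGTH];
    for (std::size_t c = 0; c < CHANNELS; ++c)
        for (std::size_t i = 0; i < LENGTH; ++i)
            data[c][i] = tag(c, i);

    std::vector<int> order;
    for (std::size_t i = 0; i < LENGTH; ++i)
        for (std::size_t c = 0; c < CHANNELS; ++c)
            order.push_back(data[c][i]);
    return order;
}
```

The sequence starts 0, 100, 200, 1, 101, 201, which corresponds to ch0-data0, ch1-data0, ch2-data0, ch0-data1, ch1-data1, ch2-data1.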

Design examples using the FFT C library are provided in the Vitis HLS examples and can be accessed using menu option Help > Welcome > Open Example Project > Design Examples > FFT.

FIR Filter IP Library

The Xilinx FIR IP block can be called within a C++ design using the library hls_fir.h. This section explains how the FIR can be configured in your C++ code.

Note: Xilinx highly recommends that you review the FIR Compiler LogiCORE IP Product Guide (PG149) for information on how to implement and use the features of the IP.

To use the FIR in your C++ code:

  1. Include the hls_fir.h library in the code.
  2. Set the static parameters using the pre-defined struct hls::ip_fir::params_t.
  3. Call the FIR function.
  4. Optionally, define a run time input configuration to modify some parameters dynamically.

The following code examples provide a summary of how each of these steps is performed. Each step is discussed in more detail below.

First, include the FIR library in the source code. This header file resides in the include directory in the Vitis HLS installation area. This directory is automatically searched when Vitis HLS executes. There is no need to specify the path to this directory if compiling inside Vitis HLS.

#include "hls_fir.h"

Define the static parameters of the FIR. These include static attributes such as the input width, the coefficients, and the filter rate (single, decimation, hilbert). The FIR library includes a parameterization struct hls::ip_fir::params_t which can be used to initialize all static parameters with default values.

In this example, the coefficients are defined as residing in array coeff_vec, and the default values for the number of coefficients, the input width, and the quantization mode are overridden using a user-defined struct myconfig based on the pre-defined struct.

struct myconfig : hls::ip_fir::params_t {
    static const double coeff_vec[sg_fir_srrc_coeffs_len];
    static const unsigned num_coeffs = sg_fir_srrc_coeffs_len;
    static const unsigned input_width = INPUT_WIDTH; 
    static const unsigned quantization = hls::ip_fir::quantize_only;
};

Create an instance of the FIR function using the HLS namespace with the defined static parameters (myconfig in this example) and then call the function with the run method to execute the function. The function arguments are, in order, input data and output data.

static hls::FIR<myconfig> fir1;
fir1.run(fir_in, fir_out);

Optionally, a run time input configuration can be used. In some modes of the FIR, the data on this input determines how the coefficients are used during interleaved channels or when coefficient reloading is required. This configuration can be dynamic and is therefore defined as a variable. For a complete description of which modes require this input configuration, refer to the FIR Compiler LogiCORE IP Product Guide (PG149).

When the run time input configuration is used, the FIR function is called with three arguments: input data, output data and input configuration.

// Define the configuration type
typedef ap_uint<8> config_t;
// Define the configuration variable
config_t fir_config = 8;
// Use the configuration in the FIR
static hls::FIR<myconfig> fir1;
fir1.run(fir_in, fir_out, &fir_config);

Design examples using the FIR C library are provided in the Vitis HLS examples and can be accessed using menu option Help > Welcome > Open Example Project > Design Examples > FIR.

FIR Static Parameters

The static parameters of the FIR define how the FIR IP is parameterized and specify non-dynamic items such as the input and output widths, the number of fractional bits, the coefficient values, and the interpolation and decimation rates. Most of these parameters have default values; the coefficients do not.

The hls_fir.h header file defines a struct hls::ip_fir::params_t that can be used to set the default values for most of the static parameters.

IMPORTANT: There are no defaults defined for the coefficients. Therefore, Xilinx does not recommend using the pre-defined struct to directly initialize the FIR. A new user defined struct which specifies the coefficients should always be used to perform the static parameterization.

In this example, a new user struct myconfig is defined with a new value for the coefficients. The coefficients are specified as residing in array coeff_vec. All other parameters to the FIR use the default values.

struct myconfig : hls::ip_fir::params_t {
    static const double coeff_vec[sg_fir_srrc_coeffs_len];
};
static hls::FIR<myconfig> fir1;
fir1.run(fir_in, fir_out);
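
Because the struct only declares coeff_vec, the array also needs a definition that supplies its values. The sketch below shows that pattern with stand-in types. params_like, my_coeff_config, coeff_sum, and the coefficient values are hypothetical placeholders, not the hls_fir.h definitions.

```cpp
#include <cassert>

// Hypothetical stand-in for a parameter struct; not hls::ip_fir::params_t.
struct params_like {
    static const unsigned num_coeffs = 21;
};

// The derived struct only declares the array ...
struct my_coeff_config : params_like {
    static const unsigned num_coeffs = 5;
    static const double coeff_vec[num_coeffs];
};

// ... so it needs exactly one definition, with its values, at namespace scope.
// The values here are arbitrary placeholders.
const double my_coeff_config::coeff_vec[my_coeff_config::num_coeffs] = {
    0.25, 0.5, 1.0, 0.5, 0.25};

// A template can then read the coefficients through the struct name.
template <typename P>
double coeff_sum() {
    double sum = 0.0;
    for (unsigned i = 0; i < P::num_coeffs; ++i) sum += P::coeff_vec[i];
    return sum;
}
```

Omitting the out-of-class definition typically shows up as an undefined-symbol error at link time rather than as a compile error.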

FIR Struct Parameters describes the parameters used for the parameterization struct hls::ip_fir::params_t. FIR Struct Parameter Values provides the default values for the parameters and a list of possible values.

Note: Xilinx highly recommends that you refer to the FIR Compiler LogiCORE IP Product Guide (PG149) for details on the parameters and the implication for their settings.

FIR Struct Parameters
Table 7. FIR Struct Parameters
Parameter Description
input_width Data input port width
input_fractional_bits Number of fractional bits on the input port
output_width Data output port width
output_fractional_bits Number of fractional bits on the output port
coeff_width Bit-width of the coefficients
coeff_fractional_bits Number of fractional bits in the coefficients
num_coeffs Number of coefficients
coeff_sets Number of coefficient sets
input_length Number of samples in the input data
output_length Number of samples in the output data
num_channels Specify the number of channels of data to process
total_num_coeff Total number of coefficients
coeff_vec[total_num_coeff] The coefficient array
filter_type The type implementation used for the filter
rate_change Specifies integer or fractional rate changes
interp_rate The interpolation rate
decim_rate The decimation rate
zero_pack_factor Number of zero coefficients used in interpolation
rate_specification Specify the rate as frequency or period
hardware_oversampling_rate Specify the rate of over-sampling
sample_period The hardware oversample period
sample_frequency The hardware oversample frequency
quantization The quantization method to be used
best_precision Enable or disable the best precision
coeff_structure The type of coefficient structure to be used
output_rounding_mode Type of rounding used on the output
filter_arch Selects a systolic or transposed architecture
optimization_goal Specify a speed or area goal for optimization
inter_column_pipe_length The pipeline length required between DSP columns
column_config Specifies the number of DSP module columns
config_method Specifies how the DSP module columns are configured
coeff_padding Number of zero padding added to the front of the filter

When specifying parameter values that are not integer or boolean, the HLS FIR namespace should be used.

For example, the possible values for rate_change are shown in the following table as integer and fixed_fractional. The values used in the C program should be rate_change = hls::ip_fir::integer and rate_change = hls::ip_fir::fixed_fractional.

FIR Struct Parameter Values

The following table covers all features and functionality of the FIR IP. Features and functionality not described in this table are not supported in the Vitis HLS implementation.

Table 8. FIR Struct Parameter Values
Parameter C Type Default Value Valid Values
input_width unsigned 16 No limitation
input_fractional_bits unsigned 0 Limited by size of input_width
output_width unsigned 24 No limitation
output_fractional_bits unsigned 0 Limited by size of output_width
coeff_width unsigned 16 No limitation
coeff_fractional_bits unsigned 0 Limited by size of coeff_width
num_coeffs unsigned 21 Full
coeff_sets unsigned 1 1-1024
input_length unsigned 21 No limitation
output_length unsigned 21 No limitation
num_channels unsigned 1 1-1024
total_num_coeff unsigned 21 num_coeffs * coeff_sets
coeff_vec[total_num_coeff] double array None Not applicable
filter_type unsigned single_rate single_rate, interpolation, decimation, hilbert_filter, interpolated
rate_change unsigned integer integer, fixed_fractional
interp_rate unsigned 1 1-1024
decim_rate unsigned 1 1-1024
zero_pack_factor unsigned 1 1-8
rate_specification unsigned period frequency, period
hardware_oversampling_rate unsigned 1 No Limitation
sample_period bool 1 No Limitation
sample_frequency unsigned 0.001 No Limitation
quantization unsigned integer_coefficients integer_coefficients, quantize_only, maximize_dynamic_range
best_precision unsigned false false, true

coeff_structure unsigned non_symmetric inferred, non_symmetric, symmetric, negative_symmetric, half_band, hilbert
output_rounding_mode unsigned full_precision full_precision, truncate_lsbs, non_symmetric_rounding_down, non_symmetric_rounding_up, symmetric_rounding_to_zero, symmetric_rounding_to_infinity, convergent_rounding_to_even, convergent_rounding_to_odd
filter_arch unsigned systolic_multiply_accumulate systolic_multiply_accumulate, transpose_multiply_accumulate
optimization_goal unsigned area area, speed
inter_column_pipe_length unsigned 4 1-16
column_config unsigned 1 Limited by number of DSP48s used
config_method unsigned single single, by_channel
coeff_padding bool false false, true

Using the FIR Function

The FIR function is defined in the HLS namespace and can be called as follows:


// Create an instance of the FIR 
static hls::FIR<STATIC_PARAM> fir1;
// Execute the FIR instance fir1
fir1.run(INPUT_DATA_ARRAY, OUTPUT_DATA_ARRAY);

The STATIC_PARAM is the static parameterization struct that defines most static parameters for the FIR.

Both the input and output data are supplied to the function as arrays (INPUT_DATA_ARRAY and OUTPUT_DATA_ARRAY). In the final implementation, these ports on the FIR IP will be implemented as AXI4-Stream ports. Xilinx recommends always using the FIR function in a region using the dataflow optimization (set_directive_dataflow), because this ensures the arrays are implemented as streaming arrays. An alternative is to specify both arrays as streaming using the set_directive_stream command.

IMPORTANT: The FIR cannot be used in a pipelined region. If high-performance operation is required, pipeline the loops or functions before and after the FIR, then use dataflow optimization on all loops and functions in the region.
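The recommended structure can be sketched with pragmas in place of the Tcl directives. This is a minimal sketch, assuming the pragma forms HLS DATAFLOW, HLS STREAM, and HLS PIPELINE are used; the function names, the frame length N, and the pass-through stand-in for the FIR call are hypothetical, so the code also compiles with an ordinary C++ compiler, which ignores the pragmas.

```cpp
const unsigned N = 8;  // hypothetical frame length

// Stage before the core: its loop may be pipelined.
void stage_before(const int in[N], int out[N]) {
    for (unsigned i = 0; i < N; ++i) {
#pragma HLS PIPELINE
        out[i] = in[i] + 1;
    }
}

// Stand-in for the FIR call. In a real design this is where
// fir1.run(in, out) would go. No pipeline pragma is applied here.
void stage_core(const int in[N], int out[N]) {
    for (unsigned i = 0; i < N; ++i)
        out[i] = in[i];
}

// Stage after the core: its loop may be pipelined.
void stage_after(const int in[N], int out[N]) {
    for (unsigned i = 0; i < N; ++i) {
#pragma HLS PIPELINE
        out[i] = in[i] * 2;
    }
}

// The enclosing region uses dataflow, not pipelining.
void region_top(const int in[N], int out[N]) {
#pragma HLS DATAFLOW
    int to_core[N];
    int from_core[N];
#pragma HLS STREAM variable=to_core
#pragma HLS STREAM variable=from_core
    stage_before(in, to_core);
    stage_core(to_core, from_core);
    stage_after(from_core, out);
}
```

Each intermediate array is written by exactly one stage and read by exactly one stage, in order, which is what allows it to be implemented as a stream.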

The multichannel functionality of the FIR is supported through interleaving the data in a single input and single output array.

  • The size of the input array should be large enough to accommodate all samples: num_channels * input_length.
  • The output array size should be specified to contain all output samples: num_channels * output_length.

The following code example demonstrates how the data is interleaved for two channels. In this example, the top-level function has two channels of input data (din_i, din_q) and two channels of output data (dout_i, dout_q). Two functions, one at the front end (fe) and one at the back end (be), order the data correctly in the FIR input array and extract it from the FIR output array.


void dummy_fe(din_t din_i[LENGTH], din_t din_q[LENGTH], din_t out[FIR_LENGTH]) {
    for (unsigned i = 0; i < LENGTH; ++i) {
        out[2*i] = din_i[i];
        out[2*i + 1] = din_q[i];
    }
}
void dummy_be(dout_t in[FIR_LENGTH], dout_t dout_i[LENGTH], dout_t dout_q[LENGTH]) {   
    for(unsigned i = 0; i < LENGTH; ++i) {
        dout_i[i] = in[2*i];
        dout_q[i] = in[2*i+1];
    }
}
void fir_top(din_t din_i[LENGTH], din_t din_q[LENGTH],
             dout_t dout_i[LENGTH], dout_t dout_q[LENGTH]) {   

    din_t fir_in[FIR_LENGTH];
    dout_t fir_out[FIR_LENGTH];
    static hls::FIR<myconfig> fir1;

    dummy_fe(din_i, din_q, fir_in);
    fir1.run(fir_in, fir_out);
    dummy_be(fir_out, dout_i, dout_q);
}
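The same ordering generalizes to more than two channels: sample i of channel c sits at index i * num_channels + c. The following is a minimal sketch of that indexing in plain C++; the constants, the sample_t typedef, and the function names are hypothetical and stand in for the design's own types.

```cpp
const unsigned NUM_CHANNELS = 3;                    // hypothetical channel count
const unsigned LENGTH = 4;                          // hypothetical samples per channel
const unsigned FIR_LENGTH = NUM_CHANNELS * LENGTH;  // num_channels * input_length

typedef int sample_t;  // stand-in for din_t / dout_t

// Front end: place sample i of channel c at index i * NUM_CHANNELS + c.
void interleave(const sample_t channels[NUM_CHANNELS][LENGTH],
                sample_t fir_in[FIR_LENGTH]) {
    for (unsigned i = 0; i < LENGTH; ++i)
        for (unsigned c = 0; c < NUM_CHANNELS; ++c)
            fir_in[i * NUM_CHANNELS + c] = channels[c][i];
}

// Back end: recover each channel from the interleaved array.
void deinterleave(const sample_t fir_out[FIR_LENGTH],
                  sample_t channels[NUM_CHANNELS][LENGTH]) {
    for (unsigned i = 0; i < LENGTH; ++i)
        for (unsigned c = 0; c < NUM_CHANNELS; ++c)
            channels[c][i] = fir_out[i * NUM_CHANNELS + c];
}
```

Running deinterleave on the output of interleave returns the original per-channel data, which is a convenient sanity check in a C test bench before the FIR is placed between the two functions.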

Optional FIR Runtime Configuration

In some modes of operation, the FIR requires an additional input to configure how the coefficients are used. For a complete description of which modes require this input configuration, refer to the FIR Compiler LogiCORE IP Product Guide (PG149).

This input configuration can be performed in the C code using a standard ap_int.h 8-bit data type. In this example, the header file fir_top.h specifies the use of the FIR and ap_fixed libraries, defines a number of the design parameter values and then defines some fixed-point types based on these:


#include "ap_fixed.h"
#include "hls_fir.h"

const unsigned FIR_LENGTH   = 21;
const unsigned INPUT_WIDTH = 16;
const unsigned INPUT_FRACTIONAL_BITS = 0;
const unsigned OUTPUT_WIDTH = 24;
const unsigned OUTPUT_FRACTIONAL_BITS = 0;
const unsigned COEFF_WIDTH = 16;
const unsigned COEFF_FRACTIONAL_BITS = 0;
const unsigned COEFF_NUM = 7;
const unsigned COEFF_SETS = 3;
const unsigned INPUT_LENGTH = FIR_LENGTH;
const unsigned OUTPUT_LENGTH = FIR_LENGTH;
const unsigned CHAN_NUM = 1;
typedef ap_fixed<INPUT_WIDTH, INPUT_WIDTH - INPUT_FRACTIONAL_BITS> s_data_t;
typedef ap_fixed<OUTPUT_WIDTH, OUTPUT_WIDTH - OUTPUT_FRACTIONAL_BITS> m_data_t;
typedef ap_uint<8> config_t;

In the top-level code, the information in the header file is included, the static parameterization struct is created using the same constant values used to specify the bit-widths, ensuring the C code and FIR configuration match, and the coefficients are specified. At the top-level, an input configuration, defined in the header file as 8-bit data, is passed into the FIR.


#include "fir_top.h"

struct param1 : hls::ip_fir::params_t {
    static const double coeff_vec[total_num_coeff];
    static const unsigned input_length = INPUT_LENGTH;
    static const unsigned output_length = OUTPUT_LENGTH;
    static const unsigned num_coeffs = COEFF_NUM;
    static const unsigned coeff_sets = COEFF_SETS;
};
const double param1::coeff_vec[total_num_coeff] = 
    {6,0,-4,-3,5,6,-6,-13,7,44,64,44,7,-13,-6,6,5,-3,-4,0,6};

void dummy_fe(s_data_t in[INPUT_LENGTH], s_data_t out[INPUT_LENGTH], 
                config_t* config_in, config_t* config_out)
{
    *config_out = *config_in;
    for(unsigned i = 0; i < INPUT_LENGTH; ++i)
        out[i] = in[i];
}

void dummy_be(m_data_t in[OUTPUT_LENGTH], m_data_t out[OUTPUT_LENGTH])
{
    for(unsigned i = 0; i < OUTPUT_LENGTH; ++i)
        out[i] = in[i];
}

// DUT
void fir_top(s_data_t in[INPUT_LENGTH],
             m_data_t out[OUTPUT_LENGTH],
             config_t* config)
{

    s_data_t fir_in[INPUT_LENGTH];
    m_data_t fir_out[OUTPUT_LENGTH];
    config_t fir_config;
    // Create struct for config
    static hls::FIR<param1> fir1;
    
    //==================================================
    // Dataflow process
    dummy_fe(in, fir_in, config, &fir_config);
    fir1.run(fir_in, fir_out, &fir_config);
    dummy_be(fir_out, out);
    //==================================================
}
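One detail of this example is easy to miss: coeff_vec is declared before num_coeffs and coeff_sets are redefined, so total_num_coeff in that declaration is the inherited constant (21 by default in Table 8), which happens to equal 7 * 3. The sketch below reproduces the situation with a hypothetical stand-in base struct (fir_defaults::params_t, not the real hls_fir.h) and adds a compile-time check that the array length matches the redefined values.

```cpp
// Hypothetical stand-in mirroring a few defaults from Table 8; NOT hls_fir.h.
namespace fir_defaults {
    struct params_t {
        static const unsigned num_coeffs = 21;
        static const unsigned coeff_sets = 1;
        static const unsigned total_num_coeff = num_coeffs * coeff_sets;
    };
}

const unsigned COEFF_NUM = 7;
const unsigned COEFF_SETS = 3;

struct checked_params : fir_defaults::params_t {
    // total_num_coeff is the inherited value (21): the redefinitions
    // below do not change the constant computed in the base struct.
    static const double coeff_vec[total_num_coeff];
    static const unsigned num_coeffs = COEFF_NUM;
    static const unsigned coeff_sets = COEFF_SETS;
};

const double checked_params::coeff_vec[total_num_coeff] =
    {6,0,-4,-3,5,6,-6,-13,7,44,64,44,7,-13,-6,6,5,-3,-4,0,6};

// Guard: the array must hold num_coeffs * coeff_sets coefficients.
static_assert(sizeof(checked_params::coeff_vec) / sizeof(double) ==
                  checked_params::num_coeffs * checked_params::coeff_sets,
              "coeff_vec length must equal num_coeffs * coeff_sets");
```

If COEFF_NUM or COEFF_SETS is later changed so that the product is no longer 21, the static_assert fails at compile time instead of leaving a silent mismatch.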

Design examples using the FIR C library are provided in the Vitis HLS examples and can be accessed using menu option Help > Welcome > Open Example Project > Design Examples > FIR.

DDS IP Library

You can use the Xilinx Direct Digital Synthesizer (DDS) IP block within a C++ design using the hls_dds.h library. This section explains how to configure DDS IP in your C++ code.

Note: Xilinx highly recommends that you review the LogiCORE IP DDS Compiler Product Guide (PG141) for information on how to implement and use the features of the IP.
IMPORTANT: The C IP implementation of the DDS IP core supports the fixed mode for the Phase_Increment and Phase_Offset parameters and supports the none mode for Phase_Offset, but it does not support programmable and streaming modes for these parameters.

To use the DDS in the C++ code:

  1. Include the hls_dds.h library in the code.
  2. Set the default parameters using the pre-defined struct hls::ip_dds::params_t.
  3. Call the DDS function.

First, include the DDS library in the source code. This header file resides in the include directory in the Vitis HLS installation area, which is automatically searched when Vitis HLS executes.


#include "hls_dds.h"

Define the static parameters of the DDS. For example, define the phase width, clock rate, phase increment, and phase offset. The DDS C library includes a parameterization struct hls::ip_dds::params_t, which is used to initialize all static parameters with default values. By redefining any of the values in this struct, you can customize the implementation.

The following example shows how to override the default values for the phase width, clock rate, phase increment, and phase offset using a user-defined struct param1, which is based on the existing predefined struct hls::ip_dds::params_t:


struct param1 : hls::ip_dds::params_t {
 static const unsigned Phase_Width = PHASEWIDTH;
 static const double   DDS_Clock_Rate = 25.0;
 static const double PINC[16];
 static const double POFF[16];
}; 
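The struct above only declares the PINC and POFF arrays. As with coeff_vec in the FIR example, static array members are normally also defined outside the struct. The following is a minimal sketch of that step, assuming the arrays are handled the same way as coeff_vec; the base struct dds_sketch::params_t is a hypothetical stand-in (not the real hls_dds.h), and the numeric values are placeholders with no particular meaning.

```cpp
// Hypothetical stand-in for the defaults struct; NOT the real hls_dds.h.
namespace dds_sketch {
    struct params_t {
        static const unsigned Phase_Width = 16;
        static const unsigned Channels = 1;
    };
}

const unsigned PHASEWIDTH = 24;  // placeholder value

struct dds_params : dds_sketch::params_t {
    static const unsigned Phase_Width = PHASEWIDTH;
    static const double PINC[16];  // declaration only
    static const double POFF[16];  // declaration only
};

// Definitions: one entry per possible channel. Entries that are not
// listed in the initializer are zero-initialized.
const double dds_params::PINC[16] = {256.0};
const double dds_params::POFF[16] = {0.0};
```

Without these definitions, code that reads the array elements would typically fail at link time during C simulation.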

Create an instance of the DDS function using the HLS namespace with the defined static parameters (for example, param1). Then, call the function with the run method to execute the function. Following are the data and phase function arguments shown in order:


static hls::DDS<param1> dds1;
dds1.run(data_channel, phase_channel);

To access design examples that use the DDS C library, select Help > Welcome > Open Example Project > Design Examples > DDS.

DDS Static Parameters

The static parameters of the DDS define how to configure the DDS, such as the clock rate, phase interval, and modes. The hls_dds.h header file defines an hls::ip_dds::params_t struct, which sets the default values for the static parameters. To use the default values, you can use the parameterization struct directly with the DDS function.

static hls::DDS< hls::ip_dds::params_t > dds1;
dds1.run(data_channel, phase_channel);

The following table describes the parameters for the hls::ip_dds::params_t parameterization struct.

Note: Xilinx highly recommends that you review the DDS Compiler LogiCORE IP Product Guide (PG141) for details on the parameters and values.
Table 9. DDS Struct Parameters
Parameter Description
DDS_Clock_Rate Specifies the clock rate for the DDS output.
Channels Specifies the number of channels. The DDS and phase generator can support up to 16 channels. The channels are time-multiplexed, which reduces the effective clock frequency per channel.
Mode_of_Operation Specifies one of the following operation modes: standard mode, for use when the accumulated phase can be truncated before it is used to access the SIN/COS LUT; or rasterized mode, for use when the desired frequencies and system clock are related by a rational fraction.
Modulus Describes the relationship between the system clock frequency and the desired frequencies. Use this parameter in rasterized mode only.
Spurious_Free_Dynamic_Range Specifies the targeted purity of the tone produced by the DDS.
Frequency_Resolution Specifies the minimum frequency resolution in Hz and determines the Phase Width used by the phase accumulator, including associated phase increment (PINC) and phase offset (POFF) values.
Noise_Shaping Controls whether to use phase truncation, dithering, or Taylor series correction.
Phase_Width Sets the width of the following: the PHASE_OUT field within m_axis_phase_tdata; the Phase field within s_axis_phase_tdata when the DDS is configured to be a SIN/COS LUT only; the phase accumulator; the associated phase increment and offset registers; and the Phase field in s_axis_config_tdata. For rasterized mode, the phase width is fixed as the number of bits required to describe the valid input range [0, Modulus-1], that is, log2 (Modulus-1) rounded up.
Output_Width Sets the width of SINE and COSINE fields within m_axis_data_tdata. The SFDR provided by this parameter depends on the selected Noise Shaping option.
Phase_Increment Selects the phase increment value.
Phase_Offset Selects the phase offset value.
Output_Selection Sets the output selection to SINE, COSINE, or both in the m_axis_data_tdata bus.
Negative_Sine Negates the SINE field at run time.
Negative_Cosine Negates the COSINE field at run time.
Amplitude_Mode Sets the amplitude to full range or unit circle.
Memory_Type Controls the implementation of the SIN/COS LUT.
Optimization_Goal Controls whether the implementation decisions target highest speed or lowest resource.
DSP48_Use Controls the implementation of the phase accumulator and addition stages for phase offset, dither noise addition, or both.
Latency_Configuration Sets the latency of the core to the optimum value based upon the Optimization Goal.
Latency Specifies the manual latency value.
Output_Form Sets the output form to two’s complement or to sign and magnitude. In general, the output of SINE and COSINE is in two’s complement form. However, when quadrant symmetry is used, the output form can be changed to sign and magnitude.
PINC[XIP_DDS_CHANNELS_MAX] Sets the values for the phase increment for each output channel.
POFF[XIP_DDS_CHANNELS_MAX] Sets the values for the phase offset for each output channel.
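
The relationships between several of these parameters can be sketched numerically. The code below models a generic phase accumulator: the output frequency is the clock rate times the phase increment divided by 2^Phase_Width, the frequency resolution is the per-channel clock rate divided by 2^Phase_Width, and the rasterized phase width is the bit count needed for [0, Modulus-1]. The function names are hypothetical, phase_increment is treated as a raw integer step, and nothing here claims how hls_dds.h itself scales PINC and POFF.

```cpp
#include <cmath>
#include <cstdint>

// Output frequency of a phase accumulator that is phase_width bits wide
// and advances by phase_increment each clock cycle (standard mode).
// The result is in the same unit as clock_rate.
double dds_output_frequency(double clock_rate, std::uint64_t phase_increment,
                            unsigned phase_width) {
    return std::ldexp(clock_rate * static_cast<double>(phase_increment),
                      -static_cast<int>(phase_width));
}

// Smallest frequency step. Channels are time-multiplexed, so each
// channel sees an effective clock of clock_rate / channels.
double dds_frequency_resolution(double clock_rate, unsigned phase_width,
                                unsigned channels) {
    return std::ldexp(clock_rate / channels, -static_cast<int>(phase_width));
}

// Number of bits required to describe the range [0, modulus - 1].
unsigned rasterized_phase_width(unsigned modulus) {
    unsigned bits = 0;
    for (unsigned max_value = modulus - 1; max_value != 0; max_value >>= 1)
        ++bits;
    return bits;
}
```

For example, with a 16-bit phase width, a phase increment of 16384 is one quarter of the accumulator range, so the output frequency is one quarter of the clock rate. A Modulus of 200 needs 8 bits.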

DDS Struct Parameter Values

The following table shows the possible values for the hls::ip_dds::params_t parameterization struct parameters.

Table 10. DDS Struct Parameter Values
Parameter C Type Default Value Valid Values
DDS_Clock_Rate double 20.0 Any double value
Channels unsigned 1 1 to 16
Mode_of_Operation unsigned XIP_DDS_MOO_CONVENTIONAL XIP_DDS_MOO_CONVENTIONAL truncates the accumulated phase. XIP_DDS_MOO_RASTERIZED selects rasterized mode.
Modulus unsigned 200 129 to 256
Spurious_Free_Dynamic_Range double 20.0 18.0 to 150.0
Frequency_Resolution double 10.0 0.000000001 to 125000000
Noise_Shaping unsigned XIP_DDS_NS_NONE XIP_DDS_NS_NONE produces phase truncation DDS. XIP_DDS_NS_DITHER uses phase dither to improve SFDR at the expense of increased noise floor. XIP_DDS_NS_TAYLOR interpolates sine/cosine values using the otherwise discarded bits from phase truncation. XIP_DDS_NS_AUTO automatically determines noise shaping.
Phase_Width unsigned 16 Must be an integer multiple of 8
Output_Width unsigned 16 Must be an integer multiple of 8
Phase_Increment unsigned XIP_DDS_PINCPOFF_FIXED XIP_DDS_PINCPOFF_FIXED fixes PINC at generation time, and PINC cannot be changed at run time. This is the only value supported.
Phase_Offset unsigned XIP_DDS_PINCPOFF_NONE XIP_DDS_PINCPOFF_NONE does not generate phase offset. XIP_DDS_PINCPOFF_FIXED fixes POFF at generation time, and POFF cannot be changed at run time.
Output_Selection unsigned XIP_DDS_OUT_SIN_AND_COS XIP_DDS_OUT_SIN_ONLY produces sine output only. XIP_DDS_OUT_COS_ONLY produces cosine output only. XIP_DDS_OUT_SIN_AND_COS produces both sine and cosine output.
Negative_Sine unsigned XIP_DDS_ABSENT XIP_DDS_ABSENT produces standard sine wave. XIP_DDS_PRESENT negates sine wave.
Negative_Cosine unsigned XIP_DDS_ABSENT XIP_DDS_ABSENT produces standard cosine wave. XIP_DDS_PRESENT negates cosine wave.
Amplitude_Mode unsigned XIP_DDS_FULL_RANGE XIP_DDS_FULL_RANGE normalizes amplitude to the output width with the binary point in the first place. For example, an 8-bit output has a binary amplitude of 100000000 - 10 giving values between 01111110 and 11111110, which corresponds to just less than 1 and just more than -1 respectively. XIP_DDS_UNIT_CIRCLE normalizes amplitude to half full range, that is, values range from 01000 .. (+0.5) to 110000 .. (-0.5).
Memory_Type unsigned XIP_DDS_MEM_AUTO XIP_DDS_MEM_AUTO selects distributed ROM for small cases where the table can be contained in a single layer of memory and selects block ROM for larger cases. XIP_DDS_MEM_BLOCK always uses block RAM. XIP_DDS_MEM_DIST always uses distributed RAM.
Optimization_Goal unsigned XIP_DDS_OPTGOAL_AUTO XIP_DDS_OPTGOAL_AUTO automatically selects the optimization goal. XIP_DDS_OPTGOAL_AREA optimizes for area. XIP_DDS_OPTGOAL_SPEED optimizes for performance.
DSP48_Use unsigned XIP_DDS_DSP_MIN XIP_DDS_DSP_MIN implements the phase accumulator and the stages for phase offset, dither noise addition, or both in FPGA logic. XIP_DDS_DSP_MAX implements the phase accumulator and the phase offset, dither noise addition, or both using DSP slices. In the single-channel case, the DSP slice can also provide the register to store programmable phase increment, phase offset, or both, and thereby save further fabric resources.
Latency_Configuration unsigned XIP_DDS_LATENCY_AUTO XIP_DDS_LATENCY_AUTO automatically determines the latency. XIP_DDS_LATENCY_MANUAL manually specifies the latency using the Latency option.
Latency unsigned 5 Any value
Output_Form unsigned XIP_DDS_OUTPUT_TWOS XIP_DDS_OUTPUT_TWOS outputs two's complement. XIP_DDS_OUTPUT_SIGN_MAG outputs signed magnitude.
PINC[XIP_DDS_CHANNELS_MAX] double array {0} Any value for the phase increment for each channel
POFF[XIP_DDS_CHANNELS_MAX] double array {0} Any value for the phase offset for each channel

SRL IP Library

C code is typically written to satisfy several requirements: reuse, readability, and performance. It is unlikely that existing C code was written to produce the best possible hardware after high-level synthesis.

In the same way that code is written for reuse, readability, and performance, certain coding techniques or predefined constructs can help synthesis produce more optimal hardware, or model hardware more closely in C so that the algorithm is easier to validate.

Mapping Directly into SRL Resources

Many C algorithms sequentially shift data through arrays. They add a new value to the start of the array, shift the existing data through the array, and drop the oldest data value. This operation is implemented in hardware as a shift register.

The most common way to implement a shift register from C into hardware is to completely partition the array into individual elements, and allow the data dependencies between the elements in the RTL to imply a shift register.
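
The conventional approach described above can be sketched as follows. This is an illustrative example, not code from the library: the function name delay_line and the depth are hypothetical, and the ARRAY_PARTITION and UNROLL pragmas are one way to request complete partitioning and a fully unrolled shift loop. An ordinary C++ compiler ignores the pragmas.

```cpp
const int DEPTH = 4;  // hypothetical shift register depth

// Shift new_sample into a delay line and return the value that drops out.
int delay_line(int new_sample) {
    static int regs[DEPTH] = {0};
#pragma HLS ARRAY_PARTITION variable=regs complete dim=1

    int oldest = regs[DEPTH - 1];   // value about to be dropped
    for (int i = DEPTH - 1; i > 0; --i) {
#pragma HLS UNROLL
        regs[i] = regs[i - 1];      // move every element up by one
    }
    regs[0] = new_sample;           // load the new value at the start
    return oldest;
}
```

With a depth of 4, a value written on one call is returned on the fourth call after it. Whether the resulting registers are mapped to an SRL is then left to logic synthesis.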

Logic synthesis typically implements the RTL shift register into a Xilinx SRL resource, which efficiently implements shift registers. The issue is that sometimes logic synthesis does not implement the RTL shift register using an SRL component:

  • When data is accessed in the middle of the shift register, logic synthesis cannot directly infer an SRL.
  • Sometimes, even when an SRL is ideal, logic synthesis may implement the shift register in flip-flops because of other factors. (Logic synthesis is itself a complex process.)

Vitis HLS provides a C++ class (ap_shift_reg) to ensure that the shift register defined in the C code is always implemented using an SRL resource. The ap_shift_reg class has two methods to perform the various read and write accesses supported by an SRL component.

Read from the Shifter

The read method allows a specified location to be read from the shift register.

The ap_shift_reg.h header file that defines the ap_shift_reg class is also included with Vitis HLS as a standalone package. You have the right to use it in your own source code. The package xilinx_hls_lib_<release_number>.tgz is located in the include directory in the Vitis HLS installation area.

// Include the Class
#include "ap_shift_reg.h"

// Define a variable of type ap_shift_reg<type, depth>
// - Sreg must use the static qualifier
// - Sreg will hold integer data types
// - Sreg will hold 4 data values
static ap_shift_reg<int, 4> Sreg;
int var1;

// Read location 2 of Sreg into var1
var1 = Sreg.read(2);

Read, Write, and Shift Data

A shift method allows a read, write, and shift operation to be performed.


// Include the Class
#include "ap_shift_reg.h"

// Define a variable of type ap_shift_reg<type, depth>
// - Sreg must use the static qualifier
// - Sreg will hold integer data types
// - Sreg will hold 4 data values
static ap_shift_reg<int, 4> Sreg;
int var1, In1;

// Read location 3 of Sreg into var1
// THEN shift all values up one and load In1 into location 0
var1 = Sreg.shift(In1,3);

Read, Write, and Enable-Shift

The shift method also supports an enabled input, allowing the shift process to be controlled and enabled by a variable.


// Include the Class
#include "ap_shift_reg.h"

// Define a variable of type ap_shift_reg<type, depth>
// - Sreg must use the static qualifier
// - Sreg will hold integer data types
// - Sreg will hold 4 data values
static ap_shift_reg<int, 4> Sreg;
int var1, In1;
bool En;

// Read location 3 of Sreg into var1
// THEN if En=1 
// Shift all values up one and load In1 into location 0
var1 = Sreg.shift(In1,3,En);

When using the ap_shift_reg class, Vitis HLS creates a unique RTL component for each shifter. When logic synthesis is performed, this component is synthesized into an SRL resource.
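
The behavior described in the comments of the three examples above (read a location, then optionally shift and load a new value at location 0) can be modeled in plain C++ for experimentation outside the tool. The class shift_reg_model below is a hypothetical behavioral sketch of that description; it is not the ap_shift_reg implementation and does not infer an SRL.

```cpp
// Behavioral sketch of the read/shift semantics described above.
// Hypothetical model class; NOT the library's ap_shift_reg.
template <typename T, unsigned DEPTH>
class shift_reg_model {
public:
    shift_reg_model() : data_() {}  // all locations start at zero

    // Read the value stored at location addr without shifting.
    T read(unsigned addr) const { return data_[addr]; }

    // Read location addr, THEN (if enable is true) shift all values
    // up one and load in into location 0.
    T shift(T in, unsigned addr, bool enable = true) {
        T value = data_[addr];
        if (enable) {
            for (unsigned i = DEPTH - 1; i > 0; --i)
                data_[i] = data_[i - 1];
            data_[0] = in;
        }
        return value;
    }

private:
    T data_[DEPTH];
};
```

In this model, a value shifted into a 4-deep register and read back from location 3 appears on the fourth following shift. A call with enable set to false returns the addressed value and leaves the contents unchanged.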