OpenOnload-201811-u1
====================

This is a minor update to refresh the sfc net driver included in the Onload
package and to provide fixes to various bugs found in openonload-201811.

See the accompanying ChangeLog for a list of bugs fixed.

Linux distribution support
--------------------------

This package is supported on:
- Red Hat Enterprise Linux 6.8 - 6.10
- Red Hat Enterprise Linux 7.4 - 7.6
- Red Hat Enterprise Linux 8.0
- SuSE Linux Enterprise Server 12 sp3 and sp4
- SuSE Linux Enterprise Server 15 and sp1
- Canonical Ubuntu Server LTS 18.04
- Canonical Ubuntu Server 19.04
- Debian 8 "Jessie"
- Debian 9 "Stretch"
- Linux kernels 3.0 - 5.0

Running control plane server as non-root user
---------------------------------------------

The control plane server now optionally drops privileges. This can be
configured using the ONLOAD_CPLANE_USER setting in /etc/sysconfig/openonload,
or by passing the cplane_server_uid and cplane_server_gid parameters to the
onload module.

The server retains the CAP_NET_ADMIN capability. This also means that the
server depends on the libcap library.

Red Hat Enterprise Linux 8.0
----------------------------

Users of RHEL 8 will need to ensure that the new Codeready-Builder repository
is included in their system's repository list to provide the libpcap-devel
build dependency.

Onload Docker support has not been qualified for RHEL 8.0 in this release.

OpenOnload-201811
=================

This is a feature release that adds new features to TCPDirect, extends
architecture, NIC and OS support, provides improvements to Onload's
performance in lossy network environments and includes many bug fixes.

Preview-level support is provided for ARM's 64-bit architecture, and support
for AMD CPUs prior to the Zen architecture is deprecated.

The notes below describe some of the new features. See the Onload user guide
for full details of new features and associated configuration options.

See the accompanying ChangeLog for a list of bugs fixed.

Linux distribution support
--------------------------

This package is supported on:
- Red Hat Enterprise Linux 6.7 - 6.10
- Red Hat Enterprise Linux 7.3 - 7.6
- SuSE Linux Enterprise Server 12 sp3 and sp4*
- SuSE Linux Enterprise Server 15
- Canonical Ubuntu Server LTS 18.04
- Canonical Ubuntu Server 18.10
- Debian 8 "Jessie"
- Debian 9 "Stretch"
- Linux kernels 3.0 - 4.19

* SLES 12 sp4 was only available in pre-release form at the time this
  OpenOnload release was tested.

TCPDirect bonding
-----------------

This release enables TCPDirect to send and receive traffic over a bonded
interface. Bonded interfaces can now be specified using the interface
attribute, as with regular interfaces. See the user guide for further
details.

The following bonding modes are supported:
- Active-backup
- LACP with layer2, layer2+3 and layer3+4 transmit hash policies

TCPDirect bonding has the following constraints:
- Bonded interfaces of up to 4 Solarflare network adapters are supported.
- Failover is supported, but bonds may not be otherwise reconfigured while
  TCPDirect is running.
- TX Alternatives must be disabled in stacks using bonding.
- All interfaces in the bond must be in the same network namespace.
- VLAN, IPVLAN and MACVLAN interfaces created over a bond are not supported.
- As with Onload, bonding of VLAN, IPVLAN and MACVLAN interfaces is not
  supported.

TCPDirect transmit timestamping
-------------------------------

Transmit timestamping support has been added to TCPDirect and can be enabled
by specifying the tx_timestamping attribute. Transmit timestamps can be
retrieved by calling zft_get_tx_timestamps for TCP zockets and
zfut_get_tx_timestamps for UDP zockets. Please see the TCPDirect user guide
and the zftcppingpong/zfudppingpong sample applications for more details.

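As a minimal sketch of how the interface and tx_timestamping attributes named
above might be supplied, the fragment below sets both through an environment
variable. The ZF_ATTR variable name, the semicolon-separated name=value
syntax, the value 1 for tx_timestamping and the interface name bond0 are all
assumptions that these notes do not state; check them against the TCPDirect
user guide before relying on them.

```shell
# Assumption: TCPDirect reads its attributes from ZF_ATTR as name=value
# pairs separated by semicolons. bond0 is a hypothetical bonded interface.
export ZF_ATTR="interface=bond0;tx_timestamping=1"

# Show what an application started from this shell would inherit.
echo "ZF_ATTR=$ZF_ATTR"
```
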
Onload TCP performance in lossy network environments
----------------------------------------------------

This release makes several improvements to Onload's TCP core in the presence
of loss and reordering, as can be the case, for example, where the route to
the peer traverses the internet.

- EF_TAIL_DROP_PROBE
  Classical TCP implementations recover poorly from the case where the last
  segment(s) in flight are dropped. This results in no visible gap in
  sequence space, and so there is nothing to trigger fast retransmissions;
  instead, the segments are retransmitted by the RTO mechanism.

  In order to attempt to trigger a fast retransmission in the case where such
  tail-loss is suspected, a "tail-drop probe" segment can be sent after a
  short timeout. This segment would either be the next segment due to be
  transmitted, or an opportunistic retransmission of the most recent
  in-flight segment.

  Previous releases of Onload had a tail-drop probe implementation, but it
  was not compiled in by default. In this release, the mechanism has been
  rewritten, and is now built by default. Its use is controlled at runtime by
  the EF_TAIL_DROP_PROBE environment variable, which previously defaulted to
  0 (off) but now defaults to the value read from the kernel configuration at
  /proc/sys/net/ipv4/tcp_early_retrans, which defaults to on.

- EF_TCP_EARLY_RETRANSMIT
  This release implements the Early Retransmit (RFC 5827) algorithm for TCP,
  and also the Limited Transmit (RFC 3042) algorithm, on which Early
  Retransmit depends. As for tail-drop probes, the purpose of these
  algorithms is to allow fast retransmissions to happen more readily.

  The use of these algorithms is controlled by the EF_TCP_EARLY_RETRANSMIT
  environment variable, whose default value is read from the kernel
  configuration at /proc/sys/net/ipv4/tcp_early_retrans.

- SACK improvements
  Selective acknowledgments received from the peer are now used to grow the
  congestion window more aggressively when recovering from loss.

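The following is a minimal sketch of inspecting the kernel setting from which
both defaults are read, and of overriding one of them for processes started
from the current shell. The kernel file and the variable name come from the
notes above, as does the meaning of 0 (off). The commented launch line assumes
the usual onload launcher is installed, and ./my_app is a hypothetical
application.

```shell
# Report the kernel setting that supplies both Onload defaults.
retrans_file=/proc/sys/net/ipv4/tcp_early_retrans
if [ -r "$retrans_file" ]; then
    echo "tcp_early_retrans = $(cat "$retrans_file")"
else
    echo "tcp_early_retrans is not readable on this kernel"
fi

# Override the default: 0 switches tail-drop probes off, as in
# earlier releases.
export EF_TAIL_DROP_PROBE=0

# Hypothetical launch line, left commented out:
# onload ./my_app
```

Setting the variable in the environment affects only applications started
afterwards from that shell; the kernel setting itself is left unchanged.
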
Onload initial sequence number caching
--------------------------------------

Applications which rapidly open and close a large number of connections to
other machines may experience occasional connection failures due to the rapid
reuse of TCP sequence numbers being detected as retransmits in the TIME-WAIT
state. This is most commonly a problem with Windows and FreeBSD TCP stacks.

The standard RFC-derived algorithm for avoiding this problem relies on a
clock ticking at a rate which is faster than bytes are transmitted. However,
a link running at 100Mb can theoretically transmit faster than the clock can
tick, and 10Gb+ links can do so in practice.

Onload has added the EF_TCP_ISN_MODE option to provide a solution. The
default "clocked" setting uses the standard best-effort algorithm. The
"clocked+cache" setting will store the last sequence number used for every
remote endpoint to guarantee that the problem is avoided. This mode is
recommended for applications such as proxies which rapidly open and close
connections to a variety of unknown, third-party servers.

The following settings may be used to fine-tune the clocked+cache mode:

- EF_TCP_ISN_CACHE_SIZE
  Number of entries to allocate in the cache of remote endpoints. The default
  value of 0 selects a size automatically.

- EF_TCP_ISN_INCLUDE_PASSIVE
  Store data for closed passively-opened connections in the cache. This data
  would only be needed by an application which closed its listening socket
  and continued to run, so the option is disabled by default.

- EF_TCP_ISN_OFFSET
  Distance by which to step the initial sequence number of new connections
  relative to the previous connection. Only extremely specialized
  applications would consider changing the default.

- EF_TCP_ISN_2MSL
  Maximum amount of time that any remote TCP stack's implementation will
  leave a socket in the TIME-WAIT state. This is configurable on many
  systems; however, the default of 240 seconds is the maximum value commonly
  used across a variety of operating systems.

Other Onload configuration options changes
------------------------------------------

In addition to those already mentioned, this release of Onload adds or
modifies the following environment options:

- EF_TCP_URG_MODE
  New option to allow disabling TCP urgent data processing. Urgent data is a
  rarely-used feature which is inconsistently implemented on various
  operating systems; however, applications which are written to the Linux
  convention will experience corrupt data if they use the "ignore" setting
  and actually receive urgent data.

- EF_TCP_TIME_WAIT_ASSASSINATION
  New option to implement the RFC 1337 behaviour of replacing old TIME-WAIT
  sockets with newly-received incoming connections. The default value is read
  from /proc/sys/net/ipv4/tcp_rfc1337.

- EF_TCP_SHARED_LOCAL_PORTS_PER_IP_MAX
  New option to fine-tune the shared local ports feature. Sets the maximum
  size of the pool of local shared ports for a given local IP address. When
  used with scalable RSS mode, this setting limits the total number within
  the cluster.

- EF_TCP_SHARED_LOCAL_PORTS_STEP
  New option to fine-tune the shared local ports feature. Controls the number
  of ports allocated when expanding the pool of shared local ports.

TCPDirect errors issued by newer C++ compilers
----------------------------------------------

Applications using TCPDirect may fail to build with g++ version 6 and above
with the message "error: flexible array member 'zft_msg::iov' not at end of
'struct my_msg'". To work around this issue, application code may be modified
from

  struct my_msg {
    zft_msg msg;
    iovec iov[1];
  };

to

  typedef struct {
    zft_msg msg;
    iovec iov[1];
  } my_msg;