fi_rxm(7) | @VERSION@ | fi_rxm(7) |
fi_rxm - The RxM (RDM over MSG) Utility Provider
The RxM provider (ofi_rxm) is a utility provider that supports FI_EP_RDM endpoints emulated over FI_EP_MSG endpoints of an underlying core provider. FI_EP_RDM endpoints have a reliable datagram interface, and RxM emulates this by hiding the connection management of the underlying FI_EP_MSG endpoints from the user. Additionally, RxM can hide the memory registration requirement of a core provider such as verbs if the application does not support it.
RxM provider requires the core provider to support the following features:
Since RxM emulates RDM endpoints by hiding connection management, and connections are established only on demand (when the application tries to send data), the first several data transfer calls can return -FI_EAGAIN. Applications should be aware of this and retry until the operation succeeds.
If an application has chosen manual data progress (FI_PROGRESS_MANUAL), it should also read the CQ so that connection establishment progresses. Not doing so results in a stall. See also the ERRORS section in fi_msg(3).
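The retry-and-progress pattern described above can be sketched as follows. This is a minimal illustration rather than RxM code: the callback types, `retry_while_eagain`, and the `fake_ep` simulation are hypothetical names introduced here. In a real application `try_op` would wrap a data transfer call such as fi_send() and `drive_progress` would wrap a CQ read such as fi_cq_read(). The sketch relies on FI_EAGAIN having the same value as the system EAGAIN.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback types: try_op stands in for a data transfer call
 * (e.g. fi_send), drive_progress for a CQ read (e.g. fi_cq_read). */
typedef int (*try_op_fn)(void *ctx);
typedef void (*progress_fn)(void *ctx);

/* Simulated endpoint: the connection needs setup_steps_left progress
 * calls before a transfer can be accepted. Illustration only. */
struct fake_ep {
    int setup_steps_left;
    int progress_calls;
};

int fake_send(void *arg)
{
    struct fake_ep *ep = arg;
    return ep->setup_steps_left > 0 ? -EAGAIN : 0;
}

void fake_progress(void *arg)
{
    struct fake_ep *ep = arg;
    if (ep->setup_steps_left > 0)
        ep->setup_steps_left--;
    ep->progress_calls++;
}

/* Retry an operation while it reports -EAGAIN, driving progress between
 * attempts. Returns the first result other than -EAGAIN, or -EAGAIN if
 * max_tries attempts were not enough. */
int retry_while_eagain(try_op_fn try_op, progress_fn drive_progress,
                       void *ctx, size_t max_tries)
{
    assert(try_op != NULL && drive_progress != NULL);
    for (size_t i = 0; i < max_tries; i++) {
        int ret = try_op(ctx);
        if (ret != -EAGAIN)
            return ret;
        drive_progress(ctx);  /* lets connection setup advance */
    }
    return -EAGAIN;
}
```

Bounding the number of attempts is a design choice of this sketch; an application may prefer to retry indefinitely or to apply a timeout instead.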
The RxM provider currently supports FI_MSG, FI_TAGGED, FI_RMA and FI_ATOMIC capabilities.
When using the RxM provider, some limitations of the underlying MSG provider can also show up. Please refer to the corresponding MSG provider man page to find out about those limitations.
RxM provider does not support the following features:
When sending large messages, an application doing an sread or waiting on the CQ file descriptor may not get a completion when reading the CQ after being woken up from the wait. The application has to do the sread or wait on the file descriptor again. This is needed because RxM uses a rendezvous protocol for large message sends. The application is woken up from waiting on the CQ fd when the rendezvous protocol request completes, but it has to wait again for the ACK from the receiver, which indicates that the large message transfer by remote RMA read has completed.
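The wait-again behavior above can be sketched as a loop around a blocking CQ read. This is a hedged illustration, not RxM code: `cq_wait_fn`, `wait_for_completion`, and the `fake_cq` simulation are hypothetical names, and the sketch assumes that a wake-up without a user-visible completion is reported as -EAGAIN by the blocking read (as a call such as fi_cq_sread() would be expected to do).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a blocking CQ read (e.g. fi_cq_sread):
 * returns 1 when a completion was read, -EAGAIN when the wake-up carried
 * no user-visible completion, or another negative value on error. */
typedef int (*cq_wait_fn)(void *cq);

/* Simulated CQ for one large send: the first internal_events wake-ups are
 * internal (the rendezvous request completing); the wake-up after those
 * is the receiver's ACK, which yields the completion. Illustration only. */
struct fake_cq {
    int internal_events;
    int wakeups;
};

int fake_cq_wait(void *arg)
{
    struct fake_cq *cq = arg;
    cq->wakeups++;
    if (cq->internal_events > 0) {
        cq->internal_events--;
        return -EAGAIN;  /* woken up, but nothing to report yet */
    }
    return 1;            /* completion for the large send */
}

/* Wait repeatedly until a completion (or a real error) is returned,
 * giving up with -EAGAIN after max_waits wake-ups. */
int wait_for_completion(cq_wait_fn wait, void *cq, size_t max_waits)
{
    assert(wait != NULL);
    for (size_t i = 0; i < max_waits; i++) {
        int ret = wait(cq);
        if (ret != -EAGAIN)
            return ret;
    }
    return -EAGAIN;
}
```

With one internal event queued, the simulated CQ needs two wake-ups before the completion is delivered, which mirrors the request-then-ACK sequence described in the text.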
The FI_ATOMIC capability is only listed in the fi_info if the fi_info hints parameter specifies FI_ATOMIC. If FI_ATOMIC is requested, the message orders FI_ORDER_RAR, FI_ORDER_RAW, FI_ORDER_WAR, FI_ORDER_WAW, FI_ORDER_SAR, and FI_ORDER_SAW cannot be supported.
The ofi_rxm provider checks for the following environment variables.
To optimize for bandwidth, ensure you use higher values than default for FI_OFI_RXM_TX_SIZE, FI_OFI_RXM_RX_SIZE, FI_OFI_RXM_MSG_TX_SIZE, FI_OFI_RXM_MSG_RX_SIZE subject to memory limits of the system and the tx and rx sizes supported by the MSG provider.
FI_OFI_RXM_SAR_LIMIT is another knob that can be experimented with to optimize for bandwidth.
To conserve memory, ensure that FI_UNIVERSE_SIZE is set to no more than what is required. Similarly, check that the FI_OFI_RXM_TX_SIZE, FI_OFI_RXM_RX_SIZE, FI_OFI_RXM_MSG_TX_SIZE and FI_OFI_RXM_MSG_RX_SIZE environment variables are set only as high as required.
The data transfer API may return -FI_EAGAIN during on-demand connection setup of the core provider's FI_EP_MSG endpoints. See fi_msg(3) for a detailed description of handling FI_EAGAIN.
If an RxM endpoint is expected to communicate with more peers than the default value of FI_UNIVERSE_SIZE (256), CQ overruns can happen. To avoid this, set a higher value for FI_UNIVERSE_SIZE. A CQ overrun can make a MSG endpoint unusable.
At higher rank counts, there may be connection errors due to a node running out of memory. The workaround is to use shared receive contexts for the MSG provider (FI_OFI_RXM_USE_SRX=1) or to reduce the eager message size (FI_OFI_RXM_BUFFER_SIZE) and the MSG provider TX/RX queue sizes (FI_OFI_RXM_MSG_TX_SIZE / FI_OFI_RXM_MSG_RX_SIZE).
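As a minimal sketch of applying the two notes above from within a program, the helper below sets the relevant environment variables with POSIX setenv(). The function name `rxm_prepare_env` and the policy it encodes are hypothetical. The sketch also assumes that the variables are read when the library initializes, so the helper would need to run before the first libfabric call; exporting the same variables in the launching shell achieves the same effect.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* for setenv() under strict C modes */
#endif
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Default FI_UNIVERSE_SIZE quoted in the text above. */
#define DEFAULT_UNIVERSE_SIZE 256UL

/* Hypothetical helper: raise FI_UNIVERSE_SIZE when more peers than the
 * default are expected, and enable shared receive contexts unless the
 * user already chose a value. Returns 0 on success, -1 on failure. */
int rxm_prepare_env(unsigned long expected_peers)
{
    char buf[32];

    assert(expected_peers > 0);
    if (expected_peers > DEFAULT_UNIVERSE_SIZE) {
        snprintf(buf, sizeof(buf), "%lu", expected_peers);
        if (setenv("FI_UNIVERSE_SIZE", buf, 1) != 0)
            return -1;
    }
    /* Last argument 0: keep any value already present in the environment. */
    if (setenv("FI_OFI_RXM_USE_SRX", "1", 0) != 0)
        return -1;
    return 0;
}
```

Whether shared receive contexts should be enabled unconditionally depends on the workload; the text presents it as a workaround for memory pressure at scale, so an application may want to gate it on the rank count as well.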
fabric(7), fi_provider(7), fi_getinfo(3)
OpenFabrics.
2020-06-06 | Libfabric Programmer's Manual |