fi_poll(3) | @VERSION@ | fi_poll(3) |
fi_poll - Polling and wait set operations
    #include <rdma/fi_domain.h>

    int fi_poll_open(struct fid_domain *domain, struct fi_poll_attr *attr,
        struct fid_poll **pollset);

    int fi_close(struct fid *pollset);

    int fi_poll_add(struct fid_poll *pollset, struct fid *event_fid,
        uint64_t flags);

    int fi_poll_del(struct fid_poll *pollset, struct fid *event_fid,
        uint64_t flags);

    int fi_poll(struct fid_poll *pollset, void **context, int count);

    int fi_wait_open(struct fid_fabric *fabric, struct fi_wait_attr *attr,
        struct fid_wait **waitset);

    int fi_close(struct fid *waitset);

    int fi_wait(struct fid_wait *waitset, int timeout);

    int fi_trywait(struct fid_fabric *fabric, struct fid **fids, size_t count);

    int fi_control(struct fid *waitset, int command, void *arg);
fi_poll_open creates a new polling set. A poll set enables an optimized method for progressing asynchronous operations across multiple completion queues and counters and checking for their completions.
A poll set is defined with the following attributes.
    struct fi_poll_attr {
        uint64_t flags;     /* operation flags */
    };
The fi_close call releases all resources associated with a poll set. The poll set must not be associated with any other resources prior to being closed, otherwise the call will return -FI_EBUSY.
fi_poll_add associates a completion queue or counter with a poll set.
fi_poll_del removes a completion queue or counter from a poll set.
fi_poll progresses all completion queues and counters associated with a poll set and checks for events. If events might have occurred, contexts associated with the completion queues and/or counters are returned. Completion queues will return their context if they are not empty. The context associated with a counter will be returned if the counter's success value or error value has changed since the last time fi_poll, fi_cntr_set, or fi_cntr_add was called. The number of contexts is limited to the size of the context array, indicated by the count parameter.
Note that fi_poll only indicates that events might be available. In some cases, providers may consume such events internally, to drive progress, for example. This can result in fi_poll returning false positives. Applications should drive their progress based on the results of reading events from a completion queue or reading counter values. The fi_poll function will always return all completion queues and counters that do have new events.
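The following is a minimal sketch of how an application might treat the contexts returned by fi_poll as hints rather than guarantees. It does not call libfabric: struct demo_ctx, demo_read_one, and process_hints are hypothetical stand-ins, with demo_read_one taking the place of a non-blocking read such as fi_cq_read. The contexts array is assumed to be the one that fi_poll filled in.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-queue context; a real application would store
 * whatever it needs to locate the completion queue or counter. */
struct demo_ctx {
    int pending;    /* events still queued (stand-in for the CQ contents) */
    int handled;    /* events the application has processed */
};

/* Stand-in for a non-blocking read: consumes one event if present.
 * Returns 1 if an event was read, 0 if the queue was empty. */
int demo_read_one(struct demo_ctx *ctx)
{
    if (ctx->pending == 0)
        return 0;
    ctx->pending--;
    ctx->handled++;
    return 1;
}

/* Visit every context reported by the poll call. A context whose queue
 * turns out to be empty is a false positive and is simply skipped.
 * Returns the number of events actually read. */
int process_hints(void **contexts, int n)
{
    int total = 0;

    assert(contexts != NULL && n >= 0);
    for (int i = 0; i < n; i++) {
        struct demo_ctx *ctx = contexts[i];

        while (demo_read_one(ctx))
            total++;
    }
    return total;
}
```

Progress is driven by the return value of the read, not by the fact that a context was reported, so a false positive costs one empty read and nothing else.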
fi_wait_open allocates a new wait set. A wait set enables an optimized method of waiting for events across multiple completion queues and counters. Where possible, a wait set uses a single underlying wait object that is signaled when a specified condition occurs on an associated completion queue or counter.
The properties and behavior of a wait set are defined by struct fi_wait_attr.
    struct fi_wait_attr {
        enum fi_wait_obj wait_obj;  /* requested wait object */
        uint64_t flags;             /* operation flags */
    };
The fi_close call releases all resources associated with a wait set. The wait set must not be bound to any other opened resources prior to being closed, otherwise the call will return -FI_EBUSY.
fi_wait waits on a wait set until one or more of its underlying wait objects is signaled.
The fi_trywait call was introduced in libfabric version 1.3. The behavior of using native wait objects without the use of fi_trywait is provider specific and should be considered non-deterministic.
The fi_trywait() call is used in conjunction with native operating system calls to block on wait objects, such as file descriptors. The application must call fi_trywait and obtain a return value of FI_SUCCESS prior to blocking on a native wait object. Failure to do so may result in the wait object not being signaled, and the application not observing the desired events. The following pseudo-code demonstrates the use of fi_trywait in conjunction with the OS select(2) call.
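    struct fid *fids[1] = { &cq->fid };

    /* Retrieve the file descriptor backing the CQ's wait object */
    fi_control(&cq->fid, FI_GETWAIT, (void *) &fd);

    while (1) {
        /* Block only when fi_trywait reports that it is safe to do so */
        if (fi_trywait(fabric, fids, 1) == FI_SUCCESS) {
            /* select() modifies its sets, so rebuild them on every pass */
            FD_ZERO(&fds);
            FD_SET(fd, &fds);
            efds = fds;
            select(fd + 1, &fds, NULL, &efds, &timeout);
        }

        /* Drain all queued completions before trying to block again */
        do {
            ret = fi_cq_read(cq, &comp, 1);
        } while (ret > 0);
    }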
fi_trywait() will return FI_SUCCESS if it is safe to block on the wait object(s) corresponding to the fabric descriptor(s), or -FI_EAGAIN if there are events queued on the fabric descriptor or if blocking could hang the application.
The call takes an array of fabric descriptors. For each wait object that will be passed to the native wait routine, the corresponding fabric descriptor should first be passed to fi_trywait. All fabric descriptors passed into a single fi_trywait call must make use of the same underlying wait object type.
The following types of fabric descriptors may be passed into fi_trywait: event queues, completion queues, counters, and wait sets. Applications that wish to use native wait calls should select specific wait objects when allocating such resources, for example by setting the wait_obj field of the item's creation attributes to FI_WAIT_FD.
In the case the wait object to check belongs to a wait set, only the wait set itself needs to be passed into fi_trywait. The fabric resources associated with the wait set do not.
On receiving a return value of -FI_EAGAIN from fi_trywait, an application should read all queued completions and events, and call fi_trywait again before attempting to block. Applications can make use of a fabric poll set to identify completion queues and counters that may require processing.
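The retry rule described above can be sketched as a small control loop. This is an illustration only and does not use libfabric: struct demo_cq, demo_trywait, demo_read, and drain_until_safe are hypothetical names, with demo_trywait and demo_read standing in for fi_trywait and a non-blocking read such as fi_cq_read. The stand-ins return -EAGAIN where the real calls would return -FI_EAGAIN.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a completion queue: it only counts events. */
struct demo_cq {
    int queued;
};

/* Stand-in for fi_trywait: returns 0 when it is safe to block,
 * or -EAGAIN when any queue still holds events. */
int demo_trywait(struct demo_cq **cqs, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (cqs[i]->queued > 0)
            return -EAGAIN;
    }
    return 0;
}

/* Stand-in for a non-blocking read: returns 1 after consuming one
 * event, or -EAGAIN when the queue is empty. */
int demo_read(struct demo_cq *cq)
{
    if (cq->queued == 0)
        return -EAGAIN;
    cq->queued--;
    return 1;
}

/* Read everything that is queued, then ask again whether blocking is
 * safe. Returns the number of events processed once it is. */
int drain_until_safe(struct demo_cq **cqs, size_t count)
{
    int processed = 0;

    assert(cqs != NULL);
    while (demo_trywait(cqs, count) == -EAGAIN) {
        for (size_t i = 0; i < count; i++) {
            while (demo_read(cqs[i]) > 0)
                processed++;
        }
    }
    return processed;
}
```

Only after drain_until_safe returns would the application pass the corresponding wait object to a native call such as select(2) or poll(2).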
The fi_control call is used to access provider or implementation specific details of fids that support blocking calls, such as wait sets, completion queues, counters, and event queues. Access to the wait set or fid should be serialized across all calls when fi_control is invoked, as it may redirect the implementation of wait set operations. The following control commands are usable with a wait set or fid.
Returns FI_SUCCESS on success. On error, a negative value corresponding to fabric errno is returned.
Fabric errno values are defined in rdma/fi_errno.h.
In many situations, blocking calls may need to wait on signals sent to a number of file descriptors. For example, this is the case for socket based providers, such as tcp and udp, as well as utility providers such as multi-rail. For simplicity, when epoll is available, it can be used to limit the number of file descriptors that an application must monitor. The use of epoll may also be required in order to support FI_WAIT_FD.
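As an illustration of how epoll reduces many descriptors to one, the sketch below registers several descriptors with an epoll instance and then waits on the epoll descriptor alone. It is Linux-specific and uses only standard system calls; aggregate_fds and aggregate_ready are hypothetical helper names, and nothing here reflects how any particular provider is implemented.

```c
#include <assert.h>
#include <poll.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Register every descriptor in fds[] with a new epoll instance.
 * Returns the epoll descriptor, or -1 on failure. */
int aggregate_fds(const int *fds, int nfds)
{
    int epfd;

    assert(fds != NULL && nfds > 0);
    epfd = epoll_create1(0);
    if (epfd < 0)
        return -1;

    for (int i = 0; i < nfds; i++) {
        struct epoll_event ev = { .events = EPOLLIN };

        ev.data.fd = fds[i];
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[i], &ev) < 0) {
            close(epfd);
            return -1;
        }
    }
    return epfd;
}

/* Wait on the single aggregated descriptor. Returns 1 if any
 * registered descriptor is readable, 0 on timeout, -1 on error. */
int aggregate_ready(int epfd, int timeout_ms)
{
    struct pollfd pfd = { .fd = epfd, .events = POLLIN };
    int ret = poll(&pfd, 1, timeout_ms);

    if (ret < 0)
        return -1;
    return (ret == 1 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```

The caller monitors one descriptor regardless of how many were registered, which is the property that makes a single FI_WAIT_FD wait object practical when several underlying descriptors are involved.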
However, FI_WAIT_POLLFD provides a mechanism for waiting on multiple file descriptors on systems where epoll is not available, or where the use of epoll may negatively impact performance. A significant difference between using POLLFD versus FD wait objects is that with FI_WAIT_POLLFD, the file descriptors may change dynamically. As an example, the file descriptors associated with a completion queue's wait set may change as endpoint associations with the CQ are added and removed.
Struct fi_wait_pollfd is used to retrieve all file descriptors for fids using FI_WAIT_POLLFD to support blocking calls.
    struct fi_wait_pollfd {
        uint64_t change_index;
        size_t nfds;
        struct pollfd *fd;
    };

The change_index is updated only when the file descriptors associated with the pollfd file set have changed. Checking the change_index is an additional step needed when working with FI_WAIT_POLLFD wait objects directly. The use of the fi_trywait() function is still required if accessing wait objects directly.
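One way an application might act on change_index is to keep a private copy of the descriptor list and refresh it only when the index differs from the last value seen. The sketch below assumes that approach. To stay compilable without libfabric headers, it uses struct demo_wait_pollfd, a local mirror of the fields shown above; struct fd_cache and cache_refresh are hypothetical names.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local mirror of the struct fi_wait_pollfd fields (an assumption made
 * so that this sketch builds without libfabric installed). */
struct demo_wait_pollfd {
    uint64_t change_index;
    size_t nfds;
    struct pollfd *fd;
};

/* Application-side copy of the descriptor list. */
struct fd_cache {
    int valid;              /* nonzero once the cache has been filled */
    uint64_t seen_index;    /* change_index at the time of the copy */
    size_t nfds;
    struct pollfd *fds;
};

/* Refresh the cache if change_index differs from the value last seen.
 * Returns 1 if refreshed, 0 if still current, -1 if allocation failed. */
int cache_refresh(struct fd_cache *cache, const struct demo_wait_pollfd *cur)
{
    struct pollfd *copy = NULL;

    assert(cache != NULL && cur != NULL);
    if (cache->valid && cache->seen_index == cur->change_index)
        return 0;

    if (cur->nfds > 0) {
        copy = malloc(cur->nfds * sizeof(*copy));
        if (copy == NULL)
            return -1;
        memcpy(copy, cur->fd, cur->nfds * sizeof(*copy));
    }

    free(cache->fds);
    cache->fds = copy;
    cache->nfds = cur->nfds;
    cache->seen_index = cur->change_index;
    cache->valid = 1;
    return 1;
}
```

The cached array can then be handed to poll(2), after fi_trywait has reported that blocking is safe.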
fi_getinfo(3), fi_domain(3), fi_cntr(3), fi_eq(3)
OpenFabrics.
2020-04-14 | Libfabric Programmer's Manual |