IO_URING_SETUP(2) | Linux Programmer's Manual | IO_URING_SETUP(2) |
io_uring_setup - setup a context for performing asynchronous I/O
#include <linux/io_uring.h>
int io_uring_setup(u32 entries, struct io_uring_params *p);
The io_uring_setup() system call sets up a submission queue (SQ) and completion queue (CQ) with at least entries entries, and returns a file descriptor which can be used to perform subsequent operations on the io_uring instance. The submission and completion queues are shared between userspace and the kernel, which eliminates the need to copy data when initiating and completing I/O.
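If the C library in use does not provide a wrapper for this system call, it can be invoked directly through syscall(2). A minimal sketch (the wrapper name sys_io_uring_setup() is illustrative):

#include <linux/io_uring.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Raw wrapper sketch: __NR_io_uring_setup is the system call number;
 * the return value is the ring file descriptor, or -1 with errno set. */
static int sys_io_uring_setup(unsigned entries, struct io_uring_params *p)
{
    return (int) syscall(__NR_io_uring_setup, entries, p);
}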
params is used by the application to pass options to the kernel, and by the kernel to convey information about the ring buffers.
struct io_uring_params {
    __u32 sq_entries;
    __u32 cq_entries;
    __u32 flags;
    __u32 sq_thread_cpu;
    __u32 sq_thread_idle;
    __u32 features;
    __u32 resv[4];
    struct io_sqring_offsets sq_off;
    struct io_cqring_offsets cq_off;
};
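For example, an application with no special requirements might zero the structure and ask for a small ring, using the sys_io_uring_setup() wrapper sketched above (the queue depth of 8 is illustrative):

struct io_uring_params params;
int ring_fd;

memset(&params, 0, sizeof(params));        /* resv must be zero; flags of 0 selects interrupt driven I/O */
ring_fd = sys_io_uring_setup(8, &params);  /* at least 8 submission queue entries */
if (ring_fd < 0)
    perror("io_uring_setup");              /* errno describes the failure */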
The flags, sq_thread_cpu, and sq_thread_idle fields are used to configure the io_uring instance. flags is a bit mask of 0 or more values ORed together. When the IORING_SETUP_SQPOLL flag is specified, a kernel thread is created to perform submission queue polling. If that kernel thread is idle for more than sq_thread_idle milliseconds, it will set the IORING_SQ_NEED_WAKEUP bit in the flags field of the struct io_sq_ring. When this happens, the application must call io_uring_enter(2) to wake the kernel thread. If I/O is kept busy, the kernel thread will never sleep. An application making use of this feature will need to guard the io_uring_enter(2) call with the following code sequence:
/*
* Ensure that the wakeup flag is read after the tail pointer has been
* written.
 */
smp_mb();
if (*sq_ring->flags & IORING_SQ_NEED_WAKEUP)
    io_uring_enter(fd, 0, 0, IORING_ENTER_SQ_WAKEUP);
where sq_ring is a submission queue ring set up using the struct io_sqring_offsets described below.
If no flags are specified, the io_uring instance is setup for interrupt driven I/O. I/O may be submitted using io_uring_enter(2) and can be reaped by polling the completion queue.
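For example, with a wrapper around io_uring_enter(2) analogous to the one sketched above (here called sys_io_uring_enter(), an illustrative name), a queued request can be submitted and its completion waited for in a single call:

/* Submit 1 pending SQE and wait until at least 1 completion is available;
 * IORING_ENTER_GETEVENTS asks the call to wait for completions. */
int ret = sys_io_uring_enter(ring_fd, 1, 1, IORING_ENTER_GETEVENTS, NULL);
if (ret < 0)
    perror("io_uring_enter");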
The resv array must be initialized to zero.
features is filled in by the kernel, and specifies various features supported by the current kernel version.
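For example, if the kernel sets the IORING_FEAT_SINGLE_MMAP bit, the submission and completion rings can be covered by a single mmap(2) call instead of the two shown below. A sketch of testing for it, where params is the structure passed to io_uring_setup():

if (params.features & IORING_FEAT_SINGLE_MMAP) {
    /* one mmap(2) at IORING_OFF_SQ_RING covers both rings; the SQE
     * array must still be mapped separately at IORING_OFF_SQES */
}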
The rest of the fields in the struct io_uring_params are filled in by the kernel, and provide the information necessary to memory map the submission queue, completion queue, and the array of submission queue entries. sq_entries specifies the number of submission queue entries allocated. sq_off describes the offsets of various ring buffer fields:
struct io_sqring_offsets {
    __u32 head;
    __u32 tail;
    __u32 ring_mask;
    __u32 ring_entries;
    __u32 flags;
    __u32 dropped;
    __u32 array;
    __u32 resv[3];
};
Taken together, sq_entries and sq_off provide all of the information necessary for accessing the submission queue ring buffer and the submission queue entry array. The submission queue can be mapped with a call like:
ptr = mmap(0, sq_off.array + sq_entries * sizeof(__u32),
PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE,
ring_fd, IORING_OFF_SQ_RING);
where sq_off is the io_sqring_offsets structure, and ring_fd is the file descriptor returned from io_uring_setup(2). The addition of sq_off.array to the length of the region accounts for the fact that the ring is located at the end of the data structure. As an example, the ring buffer head pointer can be accessed by adding sq_off.head to the address returned from mmap(2):
head = ptr + sq_off.head;
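Since mmap(2) returns a void pointer, applications typically apply the offsets with explicit casts. A sketch, with the sq_* names chosen purely for illustration:

unsigned *sq_head      = (unsigned *)((char *)ptr + sq_off.head);
unsigned *sq_tail      = (unsigned *)((char *)ptr + sq_off.tail);
unsigned *sq_ring_mask = (unsigned *)((char *)ptr + sq_off.ring_mask);
unsigned *sq_flags     = (unsigned *)((char *)ptr + sq_off.flags);
unsigned *sq_dropped   = (unsigned *)((char *)ptr + sq_off.dropped);
unsigned *sq_array     = (unsigned *)((char *)ptr + sq_off.array);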
The flags field is used by the kernel to communicate state information to the application. Currently, it is used to inform the application when a call to io_uring_enter(2) is necessary. See the documentation for the IORING_SETUP_SQPOLL flag above. The dropped member is incremented for each invalid submission queue entry encountered in the ring buffer.
The head and tail track the ring buffer state. The tail is incremented by the application when submitting new I/O, and the head is incremented by the kernel when the I/O has been successfully submitted. Determining the index of the head or tail into the ring is accomplished by applying a mask:
index = tail & ring_mask;
The array of submission queue entries is mapped with:
sqentries = mmap(0, sq_entries * sizeof(struct io_uring_sqe),
PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE,
ring_fd, IORING_OFF_SQES);
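Putting the pieces together, queueing one request means filling in the next free submission queue entry, publishing its index through the array, and advancing the tail. A sketch, assuming the sq_* pointers from above and a write_barrier() that stands in for whatever store-ordering primitive the application uses:

struct io_uring_sqe *sqes = (struct io_uring_sqe *) sqentries;  /* mapping shown above */
unsigned tail  = *sq_tail;
unsigned index = tail & *sq_ring_mask;
struct io_uring_sqe *sqe = &sqes[index];

memset(sqe, 0, sizeof(*sqe));
sqe->opcode = IORING_OP_NOP;               /* a no-op request, for illustration */

sq_array[index] = index;                   /* publish which SQE to submit */
write_barrier();                           /* make the SQE stores visible before the new tail */
*sq_tail = tail + 1;                       /* hand the entry to the kernel */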
The completion queue is described by cq_entries and cq_off shown here:
struct io_cqring_offsets {
    __u32 head;
    __u32 tail;
    __u32 ring_mask;
    __u32 ring_entries;
    __u32 overflow;
    __u32 cqes;
    __u32 flags;
    __u32 resv[3];
};
The completion queue is simpler, since the entries are not separated from the queue itself, and can be mapped with:
ptr = mmap(0, cq_off.cqes + cq_entries * sizeof(struct io_uring_cqe),
PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, ring_fd,
IORING_OFF_CQ_RING);
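Completions are then reaped by comparing the completion ring head and tail and consuming the struct io_uring_cqe entries in between. A sketch, assuming cq_head, cq_tail, and cq_ring_mask were derived from cq_off in the same way as the submission ring pointers above, cqes points at ptr + cq_off.cqes, and read_barrier() stands in for a load-ordering primitive:

unsigned head = *cq_head;

while (head != *cq_tail) {
    read_barrier();                        /* observe the CQE before the tail that published it */
    struct io_uring_cqe *cqe = &cqes[head & *cq_ring_mask];

    /* cqe->user_data identifies the request; cqe->res holds its result */
    head++;
}
*cq_head = head;                           /* return the consumed entries to the kernel */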
Closing the file descriptor returned by io_uring_setup(2) will free all resources associated with the io_uring context.
io_uring_setup(2) returns a new file descriptor on success. The application may then provide the file descriptor in a subsequent mmap(2) call to map the submission and completion queues, or to the io_uring_register(2) or io_uring_enter(2) system calls.
On error, -1 is returned and errno is set appropriately.
2019-01-29 | Linux |