Does io_uring work with buffered (non-O_DIRECT) file I/O?

Yes — unlike Linux AIO, io_uring genuinely supports buffered reads and writes. Operations that cannot complete immediately are handed to an internal async worker pool rather than silently falling back to blocking in the calling thread.

Is io_uring safe to enable on a public-facing server?

With care. On kernel 6.6 or later, set `kernel.io_uring_disabled=1` to limit ring creation to processes that hold CAP_SYS_ADMIN or belong to the group named in `kernel.io_uring_group`, keep your kernel patched (several CVEs landed between 2022 and 2024), and block the three io_uring syscalls in seccomp profiles for any untrusted workloads such as containers running user-supplied code.

What applications already use io_uring in production?

NGINX (1.25.x with an out-of-tree patch), RocksDB, Tokio (the async Rust runtime, through the separate tokio-uring crate), PostgreSQL (experimental), and several high-performance storage daemons such as Ceph's Crimson OSD all have io_uring backends or are actively developing them.

How do fixed buffers and registered files help performance?

Every normal read/write forces the kernel to validate, pin, and unpin user-space memory pages and look up the file descriptor on each call. Registering buffers and files once amortises that cost across thousands of operations, saving measurable CPU time at very high IOPS rates.

Can io_uring replace epoll for network servers?

Effectively yes. Since kernel 5.6, io_uring supports accept, connect, send, and recv, and multishot accept (5.19+) lets a single SQE continuously produce new connections without re-arming. Libraries like liburing and frameworks like Glommio are designed around exactly this model.

An Introduction to io_uring

Linux I/O has always been a story of trade-offs. Traditional blocking calls are simple but waste threads; Linux AIO was non-blocking but awkward, incomplete, and riddled with silent fallbacks to blocking behaviour. io_uring, merged in Linux 5.1 (2019) and hardened through subsequent kernels, addresses both problems with a pair of lock-free ring buffers in memory shared between user space and the kernel. The result is lower syscall overhead, true async I/O for almost any file type, and throughput numbers that close the gap between software and raw hardware limits. This guide explains the mechanics, shows how to measure performance with fio, and covers the security considerations every sysadmin needs to know before enabling advanced features.

Why io_uring Exists

Before io_uring, the async I/O landscape looked like this:

POSIX AIO: thread-pool based in glibc; not truly async at the kernel level for buffered I/O.
Linux AIO (io_submit): works for O_DIRECT reads and writes on block devices, but not for sockets, pipes, or buffered files. Any path that couldn't satisfy the request immediately fell back to blocking.
epoll + non-blocking I/O: genuinely async for sockets, but requires two syscalls per operation (wait, then read/write) and does not help with file I/O at all.

io_uring replaces all of these with a single, unified interface. A submission queue entry (SQE) describes any supported operation: read, write, accept, send, fsync, openat, even splice and statx. The kernel drains the submission queue, does the work (possibly on an internal async worker pool), and writes a completion queue entry (CQE). User space can reap completions straight from the shared completion queue without a syscall, and with the SQPOLL feature active it can submit without entering the kernel as well.

Request Flow in Detail

The Two Rings

io_uring_setup(2) creates two ring buffers that the application maps with mmap(2), so kernel and user space share the same memory. The Submission Queue (SQ) is an array of indices into an SQE array; user space writes SQEs and advances the tail. The Completion Queue (CQ) is written by the kernel; user space reads CQEs and advances the head. Crucially, each side reads the other's head or tail with atomic loads, so the common path needs no locking and no context switch.
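
The head/tail bookkeeping can be sketched with plain shell arithmetic. This is a toy model only: it tracks two free-running counters and a mask, and it ignores the shared memory mapping and memory ordering entirely. The names ENTRIES, ring_slot, ring_push, and ring_pop are made up for the illustration and are not part of any io_uring interface.

```shell
#!/bin/sh
# Toy model of one ring's index arithmetic; not a real io_uring interface.
ENTRIES=8                 # ring size, a power of two
MASK=$((ENTRIES - 1))     # counter & MASK gives the array slot
head=0                    # consumer counter
tail=0                    # producer counter

# Slot in the backing array for a free-running counter value.
ring_slot() { echo $(( $1 & MASK )); }

# Producer: claim the next slot, or fail if the consumer has not caught up.
ring_push() {
  [ $((tail - head)) -lt "$ENTRIES" ] || return 1
  tail=$((tail + 1))
}

# Consumer: release the oldest entry, or fail if the ring is empty.
ring_pop() {
  [ "$tail" -ne "$head" ] || return 1
  head=$((head + 1))
}

# Fill the ring, show that a ninth push is refused, then drain one entry.
i=0
while [ "$i" -lt "$ENTRIES" ]; do ring_push; i=$((i + 1)); done
ring_push || echo "ring full at tail=$tail head=$head"
ring_pop && echo "after one pop: $((tail - head)) entries pending"
echo "counter 9 maps to slot $(ring_slot 9)"
```

Because the counters only ever increase and the slot is derived by masking, producer and consumer never write the same variable, which is what lets the real rings work without locks.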

Syscall Reduction

With a plain setup, submitting work costs one io_uring_enter(2) syscall per batch, regardless of batch size. Enable IORING_SETUP_SQPOLL and the kernel spawns a dedicated thread that polls the SQ; user space does not call into the kernel at all during steady-state operation. The trade-off is a CPU core spinning at ~100 % whenever the poller is awake. After sq_thread_idle milliseconds without new submissions the thread goes to sleep, and the next submission must wake it with an io_uring_enter(2) call that passes IORING_ENTER_SQ_WAKEUP.
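
To put rough numbers on the batching effect, the arithmetic can be sketched in shell. The function name enter_calls_per_sec and the IOPS and batch figures are illustrative assumptions, not measurements, and the model simply assumes every batch is full.

```shell
#!/bin/sh
# Estimate io_uring_enter(2) calls per second when every call submits a
# full batch. Purely arithmetic; it does not touch io_uring.
enter_calls_per_sec() {
  iops=$1
  batch=$2
  # Round up: a partial final batch still costs one syscall.
  echo $(( (iops + batch - 1) / batch ))
}

for batch in 1 8 32 128; do
  echo "batch=$batch -> $(enter_calls_per_sec 800000 "$batch") syscalls/s"
done
```

At an assumed 800 k IOPS, moving from one request per call to batches of 32 cuts the submission syscall rate from 800,000 to 25,000 per second.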

Fixed Buffers and Registered Files

Every time you pass a user-space buffer pointer through a normal read/write, the kernel must validate and pin those pages. Register buffers once with io_uring_register(2) using IORING_REGISTER_BUFFERS, and subsequent IORING_OP_READ_FIXED and IORING_OP_WRITE_FIXED operations skip that per-request overhead. Similarly, IORING_REGISTER_FILES pre-registers file descriptors so that each SQE flagged with IOSQE_FIXED_FILE references a slot index rather than triggering repeated fdget calls.
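
Registered buffers stay pinned, and the io_uring_register(2) man page notes that they are charged against the caller's RLIMIT_MEMLOCK limit. A pre-flight check along the following lines can catch a too-small limit before registration fails. The helper name fits_memlock and the buffer count and size are hypothetical, chosen only for the example.

```shell
#!/bin/sh
# fits_memlock COUNT SIZE_KIB LIMIT
# Succeeds if COUNT buffers of SIZE_KIB KiB each fit within LIMIT, where
# LIMIT is a KiB figure as printed by `ulimit -l`, or the word "unlimited".
# Returns 2 when LIMIT is not a number, because no decision is possible.
fits_memlock() {
  need=$(( $1 * $2 ))
  case $3 in
    unlimited) return 0 ;;
    ''|*[!0-9]*) return 2 ;;
  esac
  [ "$need" -le "$3" ]
}

limit=$(ulimit -l 2>/dev/null || echo unknown)
if fits_memlock 64 1024 "$limit"; then
  echo "64 x 1 MiB buffers fit within the memlock limit ($limit)"
else
  echo "64 x 1 MiB buffers may not fit (memlock limit: $limit)"
fi
```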

Kernel Version Requirements

The feature set grew rapidly across kernel versions. Pick your minimum kernel by the features you need:

Kernel	Notable addition
5.1	Initial merge, basic read/write/fsync
5.4	Fixed buffers, SQPOLL stabilised
5.6	Socket ops: send, recv, accept, connect
5.18	Registered ring fds
5.19	Multishot accept, NVMe passthrough (IORING_OP_URING_CMD)
6.0	Zero-copy send
6.1 (LTS)	IORING_SETUP_DEFER_TASKRUN
6.6 (LTS)	kernel.io_uring_disabled sysctl for restricting unprivileged access

Check your running kernel: uname -r. Ubuntu 24.04 LTS ships 6.8; Fedora 40 ships 6.8–6.9; RHEL 9 ships 5.14 (backports included, verify with grep IO_URING /boot/config-$(uname -r)).
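
A version comparison can be scripted as a first approximation. The sketch below compares only MAJOR.MINOR from a release string against a small feature list; the helper names kver_ge and report_features are made up for the example, and distro backports (as on RHEL) mean a "no" here is not conclusive.

```shell
#!/bin/sh
# kver_ge RELEASE MIN
# Succeeds if RELEASE (e.g. "6.8.0-45-generic") is at least MIN (e.g. "5.19").
# Only MAJOR.MINOR is compared. The body runs in a subshell so its helper
# variables do not leak into the caller.
kver_ge() (
  rel=${1%%[-+]*}            # strip distro suffix such as "-45-generic"
  rmaj=${rel%%.*}
  rest=${rel#*.}
  rmin=${rest%%.*}
  mmaj=${2%%.*}
  mrest=${2#*.}
  mmin=${mrest%%.*}
  [ "$rmaj" -gt "$mmaj" ] || { [ "$rmaj" -eq "$mmaj" ] && [ "$rmin" -ge "$mmin" ]; }
)

# Print one yes/no line per feature for the given kernel release.
report_features() {
  while read -r min feature; do
    if kver_ge "$1" "$min"; then mark=yes; else mark=no; fi
    echo "$mark  >= $min  $feature"
  done <<'EOF'
5.1 basic read/write/fsync
5.6 socket ops (send, recv, accept, connect)
5.19 multishot accept
EOF
}

report_features "$(uname -r)"
```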

Benchmarking with fio

Install fio

Debian/Ubuntu:

sudo apt install fio

Fedora/RHEL:

sudo dnf install fio

Arch:

sudo pacman -S fio

Baseline: libaio vs io_uring (random read, O_DIRECT)

Run both engines against the same NVMe device. Replace /dev/nvme0n1 with your target. These jobs only read (--rw=randread), but switching --rw to a write mode would overwrite the device, so use a scratch disk or a test file path.

sudo fio \
  --name=libaio-randread \
  --ioengine=libaio \
  --filename=/dev/nvme0n1 \
  --rw=randread \
  --bs=4k \
  --iodepth=128 \
  --numjobs=4 \
  --direct=1 \
  --runtime=30 \
  --time_based \
  --group_reporting

sudo fio \
  --name=iou-randread \
  --ioengine=io_uring \
  --filename=/dev/nvme0n1 \
  --rw=randread \
  --bs=4k \
  --iodepth=128 \
  --numjobs=4 \
  --direct=1 \
  --runtime=30 \
  --time_based \
  --group_reporting

On a mid-range NVMe (Samsung 980 Pro class), expect libaio to saturate around 650 k IOPS and io_uring to push 750–800 k IOPS at the same queue depth, primarily because of reduced per-request overhead. Results vary significantly by device, CPU, and queue depth.
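
Expressed as a relative gain, the figures quoted above work out as follows. The helper name speedup_pct is an assumption for the example; it only does the arithmetic and runs no benchmark.

```shell
#!/bin/sh
# speedup_pct BASE NEW: percentage gain of NEW over BASE, one decimal place.
speedup_pct() {
  awk -v base="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (new - base) * 100 / base }'
}

echo "650k -> 750k IOPS: $(speedup_pct 650 750)% gain"
echo "650k -> 800k IOPS: $(speedup_pct 650 800)% gain"
```

That is a gain of roughly 15 to 23 percent, so run-to-run variance of a few percent matters when you compare engines on your own hardware.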

SQPOLL mode (zero-syscall path)

sudo fio \
  --name=iou-sqpoll \
  --ioengine=io_uring \
  --filename=/dev/nvme0n1 \
  --rw=randread \
  --bs=4k \
  --iodepth=128 \
  --numjobs=4 \
  --direct=1 \
  --runtime=30 \
  --time_based \
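  --sqthread_poll=1 \
  --sqthread_poll_cpu=0 \
  --group_reporting

Pin the SQPOLL thread to an isolated CPU (sqthread_poll_cpu) and ensure that core is removed from the kernel scheduler via isolcpus= or cset for the cleanest numbers. Kernels before 5.11 require root for SQPOLL, 5.11 and 5.12 require CAP_SYS_NICE, and from 5.13 on no special privilege is needed.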
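
The privilege needed for SQPOLL depends on the kernel version, according to the io_uring_setup(2) man page. A small lookup like the one below can tell you what to expect before a run. The function name sqpoll_privilege is hypothetical, and the mapping describes mainline kernels only; the man page notes that some stable kernels older than 5.13 also allow unprivileged SQPOLL.

```shell
#!/bin/sh
# sqpoll_privilege RELEASE
# Prints the privilege IORING_SETUP_SQPOLL needs on a mainline kernel of the
# given release: "root", "CAP_SYS_NICE", or "none". Compares MAJOR.MINOR only.
sqpoll_privilege() {
  rel=${1%%[-+]*}                     # strip distro suffix
  major=${rel%%.*}
  rest=${rel#*.}
  minor=${rest%%.*}
  ver=$(( major * 1000 + minor ))     # 5.13 -> 5013
  if [ "$ver" -ge 5013 ]; then
    echo none
  elif [ "$ver" -ge 5011 ]; then
    echo CAP_SYS_NICE
  else
    echo root
  fi
}

echo "SQPOLL on $(uname -r) needs: $(sqpoll_privilege "$(uname -r)")"
```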

Reading fio Output

Key fields in the summary line: IOPS, BW (bandwidth), and lat (usec) — specifically the 99th/99.9th percentile clat (completion latency). io_uring's advantage shows most clearly in clat tail latency under deep queues, not just peak IOPS.
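
For scripted comparisons, fio's terse output is easier to parse than the human-readable report. The sketch below assumes the terse version 3 layout, where field 8 is read IOPS and field 49 is write IOPS; the function name terse_iops is made up, and the input is a synthetic line rather than real fio output.

```shell
#!/bin/sh
# terse_iops: read fio terse (v3) lines on stdin and print read and write IOPS.
# In that layout field 8 is read IOPS and field 49 is write IOPS.
terse_iops() {
  awk -F';' '{ printf "read_iops=%s write_iops=%s\n", $8, $49 }'
}

# Synthetic 60-field line in which each field holds its own position, to show
# which columns are picked. Real input would come from
#   fio ... --output-format=terse
seq -s';' 1 60 | terse_iops
```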

Security Considerations

io_uring's power makes it a serious attack surface. Several CVEs (CVE-2022-29582, CVE-2023-2598, and others) have been found in the subsystem. The kernel community has responded, but the threat model for multi-tenant systems is real.

Restricting Unprivileged Access

Since kernel 6.6, the kernel exposes a sysctl to control access:

# 0 = unrestricted (default on most distros)
# 1 = ring creation requires CAP_SYS_ADMIN or membership in the group set in kernel.io_uring_group
# 2 = completely disabled for all processes
cat /proc/sys/kernel/io_uring_disabled

To restrict to privileged processes only (recommended for shared servers):

sudo sysctl -w kernel.io_uring_disabled=1
# Persist across reboots:
echo 'kernel.io_uring_disabled=1' | sudo tee /etc/sysctl.d/99-io_uring.conf
sudo sysctl --system
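
A small helper can turn the raw sysctl value into a readable statement, which is handy in audit scripts. The meanings follow the kernel's sysctl documentation for kernel.io_uring_disabled; the function name describe_io_uring_disabled is an assumption for the example.

```shell
#!/bin/sh
# describe_io_uring_disabled VALUE: explain a kernel.io_uring_disabled setting.
describe_io_uring_disabled() {
  case $1 in
    0) echo "all processes may create rings" ;;
    1) echo "ring creation needs CAP_SYS_ADMIN or membership in kernel.io_uring_group" ;;
    2) echo "ring creation is disabled for every process" ;;
    *) echo "unrecognised value: $1"; return 1 ;;
  esac
}

f=/proc/sys/kernel/io_uring_disabled
if [ -r "$f" ]; then
  describe_io_uring_disabled "$(cat "$f")" || :
else
  echo "$f not present (older kernel, or io_uring not built in)"
fi
```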

Some container runtimes (Docker, Podman with default seccomp profiles) already block io_uring_setup inside containers. Verify with:

docker run --rm alpine:latest /bin/sh -c 'apk add -q fio && fio --name=test --ioengine=io_uring --filename=/tmp/t --rw=read --size=1m --bs=4k 2>&1 | head -5'

seccomp and systemd

If you run untrusted code, add io_uring_setup, io_uring_enter, and io_uring_register to your seccomp deny list. For systemd services, you can restrict the syscalls in the unit file:

# In a .service file [Service] section:
SystemCallFilter=~io_uring_setup io_uring_enter io_uring_register

After editing, reload and verify:

sudo systemctl daemon-reload
sudo systemctl restart your-service
systemctl show your-service --property=SystemCallFilter

Verifying io_uring is Functional

A quick smoke test without needing a spare block device — write a 512 MB test file:

fio \
  --name=smoke \
  --ioengine=io_uring \
  --filename=/tmp/iou_smoke \
  --rw=write \
  --bs=64k \
  --size=512m \
  --iodepth=32 \
  --numjobs=1 \
  --direct=0 \
  --output-format=terse | cut -d';' -f49
# Field 49 in terse output is write IOPS (field 8 is read IOPS); a non-zero value confirms io_uring is working

Check kernel tracepoints to watch the ring in action (requires root):

sudo perf trace -e 'io_uring:*' -- fio --name=t --ioengine=io_uring \
  --filename=/tmp/iou_t --rw=read --size=64m --bs=4k --iodepth=16 --runtime=5 --time_based 2>&1 | head -30

Troubleshooting

fio reports engine not available: Your fio was built without io_uring support. Install a current distro package (not a manually compiled old binary) or build a recent fio from source. fio talks to io_uring through raw syscalls, so it does not need liburing.
SQPOLL fails with EPERM: Kernels before 5.11 require root for SQPOLL, and 5.11 and 5.12 require CAP_SYS_NICE. Run as root, grant the capability via setcap cap_sys_nice+ep on your binary, or move to kernel 5.13 or later, where no special privilege is needed.
io_uring is disabled and you did not disable it: Some hardened distro configurations and cloud images ship with kernel.io_uring_disabled=2, and some kernels are built without io_uring at all. Check the sysctl first, then use grep IO_URING /boot/config-$(uname -r); if CONFIG_IO_URING is not set, you need a different kernel.
Performance no better than libaio: At low queue depths (< 16) or on rotational storage, the overhead difference is negligible. io_uring's gains are most visible at high concurrency on NVMe or fast network I/O.
Kernel OOM or hangs under SQPOLL: Kernels older than 5.15 had several SQPOLL stability bugs. Update your kernel; the 5.15 LTS branch is the minimum recommendable for production SQPOLL use.