From: wen.yang@linux.dev
To: Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
Wen Yang <wen.yang@linux.dev>
Subject: [RFC PATCH v5 0/2] eventfd: add configurable maximum counter value for flow control
Date: Thu, 9 Apr 2026 01:24:47 +0800
Message-ID: <cover.1775668339.git.wen.yang@linux.dev>
From: Wen Yang <wen.yang@linux.dev>
eventfd's counter is bounded only by ULLONG_MAX (~1.8x10^19). In
non-semaphore mode a fast producer can write continuously while a slow
consumer falls behind: the producer never stalls, the counter grows
without limit, both sides burn CPU at 100%, and consumer lag is
invisible. There is no mechanism to apply back-pressure.
Add EFD_IOC_SET_MAXIMUM and EFD_IOC_GET_MAXIMUM ioctl commands that
set and query a configurable overflow threshold. A write(2) that would
push the counter to or beyond maximum blocks, or fails with EAGAIN on
O_NONBLOCK fds. The kernel-internal eventfd_signal() path may still
raise the counter to maximum, which is reported as EPOLLERR, preserving
the original overflow semantics. The default maximum is ULLONG_MAX,
preserving backward compatibility.
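
The write rule described above can be sketched as a small userspace
model. This is only an illustration of the stated semantics, not code
from the patch; the function name efd_write_would_block() is
hypothetical. With maximum left at ULLONG_MAX the check reduces to the
one eventfd_write() already performs today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace model only (hypothetical helper, not the patch's code):
 * would a write of `value` have to wait, given the current `count` and
 * the configured `maximum`? Rule taken from the text: a write may not
 * push the counter to or beyond maximum.
 */
static bool efd_write_would_block(uint64_t count, uint64_t value,
				  uint64_t maximum)
{
	assert(count <= maximum);	/* invariant of this model */

	/* Same as count + value >= maximum, but immune to u64 wraparound. */
	return value >= maximum - count;
}
```

Under this model and maximum=10, a writer adding 1 per write is admitted
until the counter holds 9; the next write has to wait for a read.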
This follows the back-pressure pattern already established in the
kernel: pipe(2) writers block when the buffer is full, and the capacity
is tunable via fcntl(F_SETPIPE_SZ); mq_send(3) blocks when the queue
depth reaches mq_maxmsg. EFD_IOC_SET_MAXIMUM applies the same pattern
to eventfd.
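
For comparison, the existing pipe behaviour can be observed from
userspace with a short sketch. It uses only long-standing APIs (pipe(2),
fcntl(2) with F_SETPIPE_SZ); the helper name pipe_fill_until_full() is
made up for this illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* exposes F_SETPIPE_SZ in <fcntl.h> */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

#ifndef F_SETPIPE_SZ
#define F_SETPIPE_SZ 1031	/* Linux value, if feature macros were fixed earlier */
#endif

/*
 * Fill a non-blocking pipe until the kernel refuses more data.
 * Returns the number of bytes accepted, or -1 on an unexpected error.
 * On success errno is EAGAIN: the writer has hit back-pressure.
 */
static long pipe_fill_until_full(int requested_size)
{
	char chunk[4096] = { 0 };
	long total = 0;
	int fds[2], saved;

	assert(requested_size > 0);
	if (pipe(fds) < 0)
		return -1;
	/* Non-blocking write end: a full pipe yields EAGAIN instead of sleeping. */
	if (fcntl(fds[1], F_SETFL, O_NONBLOCK) < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	/* The kernel may round the size up; a failure here is not fatal. */
	(void)fcntl(fds[1], F_SETPIPE_SZ, requested_size);

	for (;;) {
		ssize_t n = write(fds[1], chunk, sizeof(chunk));

		if (n < 0)
			break;
		total += n;
	}
	saved = errno;
	close(fds[0]);
	close(fds[1]);
	errno = saved;
	return saved == EAGAIN ? total : -1;
}
```

A blocking writer would sleep at the point where this sketch sees
EAGAIN, which is the behaviour the series proposes for eventfd.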
Measured on a 4-core x86_64 machine with writer and reader pinned to
separate CPUs; the reader sleeps 1 ms between reads to simulate
processing time:
Bench 1 - burst/CPU (5 s, blocking write)
  maximum      wcpu_ms  rcpu_ms  EAGAIN   writes  reads
  ------------------------------------------------------
  ULLONG_MAX      5002      132       0  6517388   4506
  10               133      150       0    40456   4496
(O_NONBLOCK+spin bypasses flow control; use O_NONBLOCK+poll(POLLOUT)
to avoid wasting CPU on EAGAIN retries while still multiplexing fds)
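
A minimal sketch of the O_NONBLOCK+poll(POLLOUT) writer suggested above,
using only the existing eventfd(2) and poll(2) interfaces. The helper
name efd_write_polled() is hypothetical, the timeout applies to each
wait rather than to the whole call, and POLLOUT only promises room for a
small write, so a large value may still loop.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Add `value` to a non-blocking eventfd. On EAGAIN, sleep in poll()
 * until the fd is writable instead of retrying in a tight loop.
 * Returns 0 on success, -1 with errno set (ETIMEDOUT if a wait expired).
 */
static int efd_write_polled(int efd, uint64_t value, int timeout_ms)
{
	for (;;) {
		struct pollfd pfd = { .fd = efd, .events = POLLOUT };
		ssize_t n = write(efd, &value, sizeof(value));
		int ready;

		if (n == (ssize_t)sizeof(value))
			return 0;
		if (errno == EINTR)
			continue;
		if (errno != EAGAIN)
			return -1;

		ready = poll(&pfd, 1, timeout_ms);
		if (ready == 0) {
			errno = ETIMEDOUT;
			return -1;
		}
		if (ready < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		/* Overflowed or invalid fd: retrying would only spin. */
		if (pfd.revents & (POLLERR | POLLNVAL)) {
			errno = EIO;
			return -1;
		}
	}
}

/* Usage: the counter ends up holding the value just written. */
static void efd_write_polled_demo(void)
{
	uint64_t got = 0;
	int efd = eventfd(0, EFD_NONBLOCK);

	assert(efd >= 0);
	assert(efd_write_polled(efd, 3, 100) == 0);
	assert(read(efd, &got, sizeof(got)) == (ssize_t)sizeof(got));
	assert(got == 3);
	close(efd);
}
```

In a real event loop the eventfd would be one entry in a larger pollfd
array, so the same wait also serves the other descriptors.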
Bench 2 - latency tail (EFD_SEMAPHORE, 10 K/s writer, ~8 K/s reader,
5000 events)
  maximum      p99_us  p999_us  max_us
  --------------------------------------
  ULLONG_MAX   141218   142477  142588
  10             1719     2378    2381
Bench 3 - coalescing (non-EFD_SEMAPHORE, 10000 writes, 125 us/read
reader; each read drains the full counter)
  maximum      writes  reads  avg_batch
  ---------------------------------------
  ULLONG_MAX    10000     79      126.6
  10            10000   1121        8.9
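
The coalescing that Bench 3 measures is existing eventfd behaviour and
can be reproduced in a few lines. The sketch below is not part of the
series; the helper name efd_first_read_after() is invented for the
illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Write 1 to a fresh eventfd `writes` times, then read once.
 * Returns the value delivered by that single read, or 0 on error.
 */
static uint64_t efd_first_read_after(unsigned int writes, int flags)
{
	uint64_t one = 1, got = 0;
	int efd = eventfd(0, flags | EFD_NONBLOCK);

	if (efd < 0)
		return 0;
	for (unsigned int i = 0; i < writes; i++)
		if (write(efd, &one, sizeof(one)) != (ssize_t)sizeof(one))
			break;
	if (read(efd, &got, sizeof(got)) != (ssize_t)sizeof(got))
		got = 0;
	close(efd);
	return got;
}

static void coalescing_demo(void)
{
	/* Counter mode: one read drains all five events at once. */
	assert(efd_first_read_after(5, 0) == 5);
	/* Semaphore mode: each read hands out exactly one event. */
	assert(efd_first_read_after(5, EFD_SEMAPHORE) == 1);
}
```

The batch size seen by a counter-mode reader is therefore whatever
accumulated since its previous read, which is the quantity a maximum
would cap.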
With maximum=10: writer CPU time in the burst test drops by >97%
(5002 ms -> 133 ms); p999 latency drops ~60x (142 ms -> 2.4 ms); the
coalesced batch is bounded (avg 8.9 vs 126.6), so the consumer always
knows the backlog is small.
Notes:
- Magic 'J': 'E' conflicts with linux/input.h and xen/evtchn.h; 'J' was
  unregistered and is now added to ioctl-number.rst.
- Command numbers 0/1: explicit distinct numbers are clearer than
  relying solely on direction bits to disambiguate SET from GET.
- .compat_ioctl = compat_ptr_ioctl handles 32-bit user pointers.
- Writers are woken on SET_MAXIMUM, so a raised limit takes effect
  immediately without waiting for the next read(2).
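
To illustrate the note on command numbers, here is a sketch of how such
an encoding could look. Only the magic 'J' and the numbers 0/1 come from
the notes above; which command gets which number, the _IOW/_IOR
directions and the uint64_t argument type are assumptions, and the
ALT_* macros are purely hypothetical. See patch 1/2 for the real
definitions.

```c
#include <assert.h>
#include <linux/ioctl.h>
#include <stdint.h>

/*
 * ASSUMED encoding, for illustration only. The argument type and the
 * SET=0/GET=1 assignment are guesses, not taken from the patch.
 */
#define EFD_IOC_MAGIC		'J'
#define EFD_IOC_SET_MAXIMUM	_IOW(EFD_IOC_MAGIC, 0, uint64_t)
#define EFD_IOC_GET_MAXIMUM	_IOR(EFD_IOC_MAGIC, 1, uint64_t)

/* Hypothetical alternative: same number, told apart by direction only. */
#define ALT_SET			_IOW(EFD_IOC_MAGIC, 0, uint64_t)
#define ALT_GET			_IOR(EFD_IOC_MAGIC, 0, uint64_t)

static void efd_ioc_encoding_checks(void)
{
	assert(_IOC_TYPE(EFD_IOC_SET_MAXIMUM) == 'J');
	assert(_IOC_NR(EFD_IOC_SET_MAXIMUM) == 0);
	assert(_IOC_NR(EFD_IOC_GET_MAXIMUM) == 1);
	assert(_IOC_DIR(EFD_IOC_SET_MAXIMUM) == _IOC_WRITE);
	assert(_IOC_DIR(EFD_IOC_GET_MAXIMUM) == _IOC_READ);
	assert(_IOC_SIZE(EFD_IOC_GET_MAXIMUM) == sizeof(uint64_t));

	/* Direction bits alone already make the alternative pair distinct... */
	assert(ALT_SET != ALT_GET);
	/* ...but the command numbers collide, which distinct numbers avoid. */
	assert(_IOC_NR(ALT_SET) == _IOC_NR(ALT_GET));
}
```

Either scheme yields two different request values; the distinct-number
form simply keeps the two commands apart even when only _IOC_NR() is
inspected.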
Changes since v4
(https://lore.kernel.org/all/20250310051832.5658-1-wen.yang@linux.dev/)
- Use ioctl magic 'J' instead of 'E' (conflict with input.h/xen).
- Add .compat_ioctl = compat_ptr_ioctl.
- Expose eventfd-maximum in /proc/self/fdinfo.
- Return -ENOTTY for unrecognised ioctl commands (was -ENOENT).
- Remove the unnecessary !argp guard in eventfd_ioctl().
- Register magic 'J' in Documentation/userspace-api/ioctl/ioctl-number.rst.
- Add kselftest correctness tests.
Wen Yang (2):
  eventfd: add configurable per-fd counter maximum for flow control
  selftests/eventfd: add EFD_IOC_{SET,GET}_MAXIMUM tests

 .../userspace-api/ioctl/ioctl-number.rst      |   1 +
 fs/eventfd.c                                  |  74 +++++-
 include/uapi/linux/eventfd.h                  |   6 +
 .../filesystems/eventfd/eventfd_test.c        | 238 +++++++++++++++++-
 4 files changed, 306 insertions(+), 13 deletions(-)
--
2.25.1