public inbox for linux-kernel@vger.kernel.org
* [RFC PATCH v5 0/2] eventfd: add configurable maximum counter value for flow control
@ 2026-04-08 17:24 wen.yang
  2026-04-08 17:24 ` [RFC PATCH v5 1/2] eventfd: add configurable per-fd counter maximum " wen.yang
  2026-04-08 17:24 ` [RFC PATCH v5 2/2] selftests/eventfd: add EFD_IOC_{SET,GET}_MAXIMUM tests wen.yang
From: wen.yang @ 2026-04-08 17:24 UTC
  To: Christian Brauner, Jan Kara, Alexander Viro
  Cc: linux-fsdevel, linux-kernel, Wen Yang

From: Wen Yang <wen.yang@linux.dev>

eventfd's counter is bounded only by ULLONG_MAX (~1.8x10^19). In
non-semaphore mode a fast producer can write continuously while a slow
consumer falls behind: the producer never stalls, the counter grows
without limit, both sides burn CPU at 100%, and consumer lag is
invisible. There is no mechanism to apply back-pressure.

Add EFD_IOC_SET_MAXIMUM and EFD_IOC_GET_MAXIMUM ioctl commands that
set and query a configurable overflow threshold. A write(2) that would
push the counter to or beyond maximum blocks, or fails with EAGAIN on
O_NONBLOCK fds. The kernel-internal eventfd_signal() path may still
raise the counter to maximum, which is reported as EPOLLERR, preserving
the original overflow semantics. The default is ULLONG_MAX, preserving
backward compatibility.
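
As a rough illustration of the rule above, here is a userspace model of
the admission check. It is a sketch only, not the code from patch 1/2;
the helper name efd_write_fits() is made up for this example. It assumes
a write is admitted only while count + value stays strictly below
maximum, which with maximum == ULLONG_MAX reduces to the check eventfd
already performs today.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace model only (hypothetical helper, not the patch's code).
 * Returns true when a write(2) of `value` may proceed, i.e. when the
 * resulting counter would stay strictly below `maximum`.
 */
static bool efd_write_fits(uint64_t count, uint64_t value, uint64_t maximum)
{
	/* eventfd_signal() may already have raised count to maximum. */
	if (count >= maximum)
		return false;

	/* Same as count + value < maximum, written to avoid overflow. */
	return maximum - count > value;
}
```

With maximum = 10, a write of 9 into an empty counter fits, while a
write of 10 would block (or return EAGAIN on an O_NONBLOCK fd).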

This follows the back-pressure pattern already established in the
kernel: pipe(2) writers block when the buffer is full, and the capacity
is tunable via fcntl(F_SETPIPE_SZ); mq_send(3) blocks when the queue
depth reaches mq_maxmsg. EFD_IOC_SET_MAXIMUM applies the same
pattern to eventfd.
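
For comparison, the pipe(2) precedent can be observed on a stock kernel
with a few lines of C. This is a minimal sketch; the helper name
pipe_fill_until_eagain() and the 512-byte chunk size are arbitrary
choices for the example. It shrinks a pipe with F_SETPIPE_SZ, writes
non-blocking until the kernel pushes back with EAGAIN, and reports how
many bytes were accepted.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for F_SETPIPE_SZ / F_GETPIPE_SZ */
#endif
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Fallbacks in case the feature macro above came too late (Linux values). */
#ifndef F_SETPIPE_SZ
#define F_SETPIPE_SZ 1031
#endif
#ifndef F_GETPIPE_SZ
#define F_GETPIPE_SZ 1032
#endif

/*
 * Ask for a pipe of `want` bytes (the kernel rounds up to at least one
 * page), fill it with non-blocking writes, and return the number of
 * bytes accepted before the first EAGAIN. The granted capacity is
 * stored in *cap. Returns -1 on an unexpected error.
 */
static long pipe_fill_until_eagain(int want, int *cap)
{
	int fds[2];
	char chunk[512];
	long total = 0;

	if (pipe(fds) < 0)
		return -1;
	if (fcntl(fds[1], F_SETFL, O_NONBLOCK) < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	*cap = fcntl(fds[1], F_SETPIPE_SZ, want);
	if (*cap < 0)
		*cap = fcntl(fds[1], F_GETPIPE_SZ);	/* keep current size */

	memset(chunk, 'x', sizeof(chunk));
	for (;;) {
		ssize_t n = write(fds[1], chunk, sizeof(chunk));

		if (n > 0) {
			total += n;
			continue;
		}
		if (n < 0 && errno == EAGAIN)
			break;		/* buffer full: back-pressure */
		total = -1;
		break;
	}

	close(fds[0]);
	close(fds[1]);
	return total;
}
```

Since nobody reads from the pipe, the loop ends once the buffer is full,
which is the same stall point a bounded eventfd counter would give a
writer.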

Measured on a 4-core x86_64, writer and reader pinned to separate CPUs,
reader sleeps 1 ms between reads to simulate processing time:

  Bench 1 - burst/CPU (5 s, blocking write)
  maximum      wcpu_ms  rcpu_ms      EAGAIN      writes    reads
  --------------------------------------------------------------
  ULLONG_MAX      5002      132           0     6517388     4506
  10               133      150           0       40456     4496
  (O_NONBLOCK+spin bypasses flow control; use O_NONBLOCK+poll(POLLOUT)
   to avoid wasting CPU on EAGAIN retries while still multiplexing fds)

  Bench 2 - latency tail (EFD_SEMAPHORE, 10 K/s writer, ~8 K/s reader,
            5000 events)
  maximum      p99_us   p999_us    max_us
  ----------------------------------------
  ULLONG_MAX   141218   142477    142588
  10             1719     2378      2381

  Bench 3 - coalescing (non-EFD_SEMAPHORE, 10000 writes, 125 us/read
            reader; each read drains the full counter)
  maximum      writes    reads   avg_batch
  -----------------------------------------
  ULLONG_MAX    10000       79       126.6
  10            10000     1121         8.9

With maximum=10: writer CPU in the burst test drops >97% (5002 ms ->
133 ms); p999 latency drops ~60x (142 ms -> 2.4 ms); the average
coalescing batch falls from ~127 to ~9 and is bounded by the maximum,
so the consumer always knows the backlog is small.
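
The O_NONBLOCK+poll(POLLOUT) pattern mentioned under Bench 1 might look
roughly like the sketch below. The helper name efd_write_poll() is
invented for this example and is not part of the series. It only uses
existing eventfd behaviour, so on an unpatched kernel the ceiling it
runs into is the stock one (ULLONG_MAX - 1) rather than a configured
maximum.

```c
#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Non-blocking write that sleeps in poll(POLLOUT) instead of spinning
 * on EAGAIN. Returns 0 on success, -1 on error or when no POLLOUT
 * arrives within timeout_ms.
 *
 * Caveat: POLLOUT only says that some value can be written; a large
 * `value` may still get EAGAIN and go around the loop again.
 */
static int efd_write_poll(int efd, uint64_t value, int timeout_ms)
{
	for (;;) {
		ssize_t n = write(efd, &value, sizeof(value));

		if (n == (ssize_t)sizeof(value))
			return 0;
		if (n >= 0 || errno != EAGAIN)
			return -1;

		struct pollfd pfd = { .fd = efd, .events = POLLOUT };
		int r = poll(&pfd, 1, timeout_ms);

		if (r <= 0)
			return -1;	/* 0: timed out, <0: poll error */
	}
}
```

A writer multiplexing several fds would put the eventfd into its
existing poll set rather than call a helper like this per fd; the point
is only that the wait happens in the kernel, not in a retry loop.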

Notes:
- Magic 'J': 'E' conflicts with linux/input.h and xen/evtchn.h; 'J' was
  unregistered and is now added to ioctl-number.rst.
- Command numbers 0/1: explicit distinct numbers are clearer than
  relying solely on direction bits to disambiguate SET from GET.
- .compat_ioctl = compat_ptr_ioctl handles 32-bit user pointers.
- Writers woken on SET_MAXIMUM: a raised limit takes effect immediately
  without waiting for the next read(2).
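
To make the numbering notes concrete, here is a hedged sketch of how
such commands could be encoded and called from userspace. Only the
magic 'J' and the command numbers 0/1 come from the notes above; the
macro names with the _SKETCH suffix, the 64-bit argument type, the
pointer-argument convention and the helper efd_set_maximum() are
assumptions. The real definitions are in patch 1/2
(include/uapi/linux/eventfd.h).

```c
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>

/* Assumed layout: magic 'J', nr 0 = SET, nr 1 = GET, 64-bit argument. */
#define EFD_IOC_SET_MAXIMUM_SKETCH	_IOW('J', 0, uint64_t)
#define EFD_IOC_GET_MAXIMUM_SKETCH	_IOR('J', 1, uint64_t)

/* Decode helpers, handy when checking for clashes with other magics. */
static unsigned int ioc_type(unsigned long cmd) { return _IOC_TYPE(cmd); }
static unsigned int ioc_nr(unsigned long cmd)   { return _IOC_NR(cmd); }
static unsigned int ioc_dir(unsigned long cmd)  { return _IOC_DIR(cmd); }

/*
 * Hypothetical caller: returns 0 on success, -1 with errno set
 * otherwise. A kernel without the series is expected to reject the
 * command.
 */
static int efd_set_maximum(int efd, uint64_t maximum)
{
	return ioctl(efd, EFD_IOC_SET_MAXIMUM_SKETCH, &maximum);
}
```

Note that _IOW('J', 0, ...) and _IOR('J', 0, ...) would already differ
in their direction bits; using 0 and 1 just makes the two commands
distinguishable by number alone.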

Changes since v4
  (https://lore.kernel.org/all/20250310051832.5658-1-wen.yang@linux.dev/)
- Use ioctl magic 'J' instead of 'E' (conflict with input.h/xen).
- Add .compat_ioctl = compat_ptr_ioctl.
- Expose eventfd-maximum in /proc/self/fdinfo.
- Return -ENOTTY for unrecognised ioctl commands (was -ENOENT).
- Remove the unnecessary !argp guard in eventfd_ioctl().
- Register magic 'J' in Documentation/userspace-api/ioctl/ioctl-number.rst.
- Add kselftest correctness tests.

Wen Yang (2):
  eventfd: add configurable per-fd counter maximum for flow control
  selftests/eventfd: add EFD_IOC_{SET,GET}_MAXIMUM tests

 .../userspace-api/ioctl/ioctl-number.rst      |   1 +
 fs/eventfd.c                                  |  74 +++++-
 include/uapi/linux/eventfd.h                  |   6 +
 .../filesystems/eventfd/eventfd_test.c        | 238 +++++++++++++++++-
 4 files changed, 306 insertions(+), 13 deletions(-)

-- 
2.25.1

