* [PATCH] eventpoll: Add sysctl quirk to avoid synchronous wakeup
From: Gabriel Krisman Bertazi @ 2026-04-17 1:46 UTC
To: viro, brauner, jack
Cc: corbet, linux-fsdevel, linux-doc, Gabriel Krisman Bertazi,
Mel Gorman
Upstream commit 900bbaae67e9 ("epoll: Add synchronous wakeup support for
ep_poll_callback") fixes a bug where epoll did not honor the "sync" part
of the wake_up_*_sync request by the original waker when waking up the
epoll waiter. That patch is correct, as I understand it, because it lets
the caller decide and the most likely general case for a
producer-consumer application using epoll is "wait on data on the socket
and then consume it".
Nevertheless, it caused a regression in a proprietary database benchmark
that communicates over TCP on localhost. The TCP detail is only relevant
because TCP unconditionally uses a WF_SYNC wakeup (in sock_def_readable)
to wake its waiters. In general, though, for threads that merely signal
an operation via epoll and do not necessarily consume the data, pulling
the epoll waiter closer to a CPU-intensive waker task can actually harm
performance, since there is little data access to benefit from data
locality. This seems to be the case for this workload.
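
For illustration only (this is not part of the patch and not the
benchmark in question), a minimal user-space sketch of a "signal-only"
epoll waiter could look like the following. It assumes an AF_UNIX
socketpair as a stand-in for the TCP localhost connection, and the
function name signal_only_wait is hypothetical. The sketch is
single-threaded, so epoll_wait() finds the event already pending; a real
workload would block there and be woken by the producer.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns the number of ready events seen by the
 * waiter (1 on success) or -1 on error. The waiter only notes that the
 * event fired; it never reads the payload, mimicking an epoll user that
 * treats the wakeup as a signal rather than as data to consume.
 */
int signal_only_wait(void)
{
	int sv[2];
	int epfd, n = -1;
	struct epoll_event ev = { .events = EPOLLIN };
	struct epoll_event out;

	/* Assumption: a socketpair stands in for TCP on localhost. */
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;

	epfd = epoll_create1(0);
	if (epfd < 0)
		goto out_close_sockets;

	ev.data.fd = sv[0];
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &ev) < 0)
		goto out_close_epoll;

	/* Producer side: a single byte acts as the notification. */
	if (write(sv[1], "x", 1) != 1)
		goto out_close_epoll;

	/* Waiter side: 1000 ms timeout so the sketch can never hang. */
	n = epoll_wait(epfd, &out, 1, 1000);
	assert(n <= 1);	/* maxevents is 1 */

out_close_epoll:
	close(epfd);
out_close_sockets:
	close(sv[0]);
	close(sv[1]);
	return n;
}
```

Since the waiter never touches the payload, there is little cache
content for it to gain by running next to the producer, which is the
situation described above.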
This is a tricky case for a heuristic, IMO, since it would be hard to
predict what the epoll user wants. I considered adding an epoll_ctl
flag to let the user configure the desired behavior, but it feels too
much like a scheduler-specific detail to expose in the syscall API,
and it would likely cause user confusion. In addition, it would require
recompilation of user applications needing this behavior.
Instead, this patch adds a new sysctl for a system-wide quirk that can
be enabled only when it is known to benefit the workload. While
different workloads would benefit from different behaviors, it is
unclear whether they run in parallel on the same system and whether
reverting to the older behavior would cause performance regressions.
Cc: Mel Gorman <mgorman@suse.de>
Fixes: 900bbaae67e9 ("epoll: Add synchronous wakeup support for ep_poll_callback")
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
---
I realize the Fixes tag is hardly appropriate here, but it serves as a
reasonable way to link to the original patch.
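
For anyone testing this, a rough sketch of how a tool might check the
new knob from user space follows. It assumes the file shows up as
/proc/sys/fs/epoll/force_async_wake (next to max_user_watches, since
the entry is added to epoll_table); the helper name read_bool_sysctl
and the path macro are hypothetical. On a kernel without this patch the
file does not exist and the helper returns -1.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed location, based on the entry being added to epoll_table. */
#define FORCE_ASYNC_WAKE_PATH "/proc/sys/fs/epoll/force_async_wake"

/*
 * Hypothetical helper: read a boolean sysctl file.
 * Returns 0 or 1 on success, or -1 if the file cannot be opened or
 * does not start with an integer.
 */
int read_bool_sysctl(const char *path)
{
	FILE *f;
	int val;

	assert(path != NULL);

	f = fopen(path, "r");
	if (!f)
		return -1;

	if (fscanf(f, "%d", &val) != 1)
		val = -1;
	else
		val = !!val;	/* normalize any non-zero value to 1 */

	fclose(f);
	return val;
}
```

A caller would pass FORCE_ASYNC_WAKE_PATH and treat -1 as "quirk not
available". Enabling the quirk itself is a matter of writing 1 to the
same file as root.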
---
Documentation/admin-guide/sysctl/fs.rst | 10 ++++++++++
fs/eventpoll.c | 12 +++++++++++-
2 files changed, 21 insertions(+), 1 deletion(-)
diff --git a/Documentation/admin-guide/sysctl/fs.rst b/Documentation/admin-guide/sysctl/fs.rst
index 9b7f65c3efd8..9052ad3f8404 100644
--- a/Documentation/admin-guide/sysctl/fs.rst
+++ b/Documentation/admin-guide/sysctl/fs.rst
@@ -338,6 +338,16 @@ on a 64-bit one.
The current default value for ``max_user_watches`` is 4% of the
available low memory, divided by the "watch" cost in bytes.
+force_async_wake
+----------------
+
+When an epoll event occurs, the kernel attempts to "pull" the epoll
+waiter task closer to the CPU running the task that initiated the
+event, and to switch to it sooner. While most workloads benefit from
+this behavior, this switch allows disabling it, leaving the epoll task
+where it is. Setting it to 1 can harm performance for most
+applications, but might benefit others.
+
5. /proc/sys/fs/fuse - Configuration options for FUSE filesystems
=====================================================================
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 23f3c6ac0bad..aed0dcc50530 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -257,6 +257,9 @@ struct ep_pqueue {
/* Maximum number of epoll watched descriptors, per user */
static long max_user_watches __read_mostly;
+/* Whether the wakee should always be woken up asynchronously */
+static bool sysctl_force_async_wake __read_mostly = false;
+
/* Used for cycles detection */
static DEFINE_MUTEX(epnested_mutex);
@@ -332,6 +335,13 @@ static const struct ctl_table epoll_table[] = {
.extra1 = &long_zero,
.extra2 = &long_max,
},
+ {
+ .procname = "force_async_wake",
+ .data = &sysctl_force_async_wake,
+ .maxlen = sizeof(sysctl_force_async_wake),
+ .mode = 0644,
+ .proc_handler = proc_dobool,
+ },
};
static void __init epoll_sysctls_init(void)
@@ -1318,7 +1328,7 @@ static int ep_poll_callback(wait_queue_entry_t *wait, unsigned mode, int sync, v
break;
}
}
- if (sync)
+ if (sync && !sysctl_force_async_wake)
wake_up_sync(&ep->wq);
else
wake_up(&ep->wq);
--
2.52.0