From: Max Kellermann <max.kellermann@ionos.com>
To: axboe@kernel.dk, asml.silence@gmail.com,
io-uring@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Max Kellermann <max.kellermann@ionos.com>
Subject: [PATCH 8/8] io_uring: skip redundant poll wakeups
Date: Tue, 28 Jan 2025 14:39:27 +0100 [thread overview]
Message-ID: <20250128133927.3989681-9-max.kellermann@ionos.com> (raw)
In-Reply-To: <20250128133927.3989681-1-max.kellermann@ionos.com>
Using io_uring with epoll is very expensive because every completion
leads to a __wake_up() call, most of which are unnecessary because the
polling process has already been woken up but has not had a chance to
process the completions. During this time, wq_has_sleeper() still
returns true, therefore this check is not enough.
Perf diff for a HTTP server pushing a static file with splice() into
the TCP socket:
37.91% -2.00% [kernel.kallsyms] [k] queued_spin_lock_slowpath
1.69% -1.67% [kernel.kallsyms] [k] ep_poll_callback
0.95% +1.64% [kernel.kallsyms] [k] io_wq_free_work
0.88% -0.35% [kernel.kallsyms] [k] _raw_spin_lock_irqsave
1.66% +0.28% [kernel.kallsyms] [k] io_worker_handle_work
1.14% +0.18% [kernel.kallsyms] [k] _raw_spin_lock
0.24% -0.17% [kernel.kallsyms] [k] __wake_up
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
---
include/linux/io_uring_types.h | 10 ++++++++++
io_uring/io_uring.c | 4 ++++
io_uring/io_uring.h | 2 +-
3 files changed, 15 insertions(+), 1 deletion(-)
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 623d8e798a11..01514cb76095 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -384,6 +384,16 @@ struct io_ring_ctx {
struct wait_queue_head poll_wq;
struct io_restriction restrictions;
+ /**
+ * Non-zero if a process is waiting for #poll_wq and reset to
+ * zero when #poll_wq is woken up. This is supposed to
+ * eliminate redundant wakeup calls. Only checking
+ * wq_has_sleeper() is not enough because it will return true
+ * until the sleeper has actually woken up and has been
+ * scheduled.
+ */
+ atomic_t poll_wq_waiting;
+
u32 pers_next;
struct xarray personalities;
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 137c2066c5a3..b65efd07e9f0 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2760,6 +2760,7 @@ static __cold void io_activate_pollwq_cb(struct callback_head *cb)
* Wake ups for some events between start of polling and activation
* might've been lost due to loose synchronisation.
*/
+ atomic_set_release(&ctx->poll_wq_waiting, 0);
wake_up_all(&ctx->poll_wq);
percpu_ref_put(&ctx->refs);
}
@@ -2793,6 +2794,9 @@ static __poll_t io_uring_poll(struct file *file, poll_table *wait)
if (unlikely(!ctx->poll_activated))
io_activate_pollwq(ctx);
+
+ atomic_set(&ctx->poll_wq_waiting, 1);
+
/*
* provides mb() which pairs with barrier from wq_has_sleeper
* call in io_commit_cqring
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index f65e3f3ede51..186cee066f9f 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -287,7 +287,7 @@ static inline void io_commit_cqring(struct io_ring_ctx *ctx)
static inline void io_poll_wq_wake(struct io_ring_ctx *ctx)
{
- if (wq_has_sleeper(&ctx->poll_wq))
+ if (wq_has_sleeper(&ctx->poll_wq) && atomic_xchg_release(&ctx->poll_wq_waiting, 0) > 0)
__wake_up(&ctx->poll_wq, TASK_NORMAL, 0,
poll_to_key(EPOLL_URING_WAKE | EPOLLIN));
}
--
2.45.2
next prev parent reply other threads:[~2025-01-28 13:39 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-28 13:39 [PATCH 0/8] Various io_uring micro-optimizations (reducing lock contention) Max Kellermann
2025-01-28 13:39 ` [PATCH 1/8] io_uring/io-wq: eliminate redundant io_work_get_acct() calls Max Kellermann
2025-01-28 13:39 ` [PATCH 2/8] io_uring/io-wq: add io_worker.acct pointer Max Kellermann
2025-01-28 13:39 ` [PATCH 3/8] io_uring/io-wq: move worker lists to struct io_wq_acct Max Kellermann
2025-01-28 13:39 ` [PATCH 4/8] io_uring/io-wq: cache work->flags in variable Max Kellermann
2025-01-29 18:57 ` Pavel Begunkov
2025-01-29 19:11 ` Max Kellermann
2025-01-29 23:41 ` Pavel Begunkov
2025-01-30 5:36 ` Max Kellermann
2025-01-30 14:57 ` Jens Axboe
2025-01-31 14:06 ` Pavel Begunkov
2025-01-30 14:54 ` Jens Axboe
2025-01-28 13:39 ` [PATCH 5/8] io_uring/io-wq: do not use bogus hash value Max Kellermann
2025-01-28 13:39 ` [PATCH 6/8] io_uring/io-wq: pass io_wq to io_get_next_work() Max Kellermann
2025-01-28 13:39 ` [PATCH 7/8] io_uring: cache io_kiocb->flags in variable Max Kellermann
2025-01-29 19:11 ` Pavel Begunkov
2025-02-04 12:07 ` Pavel Begunkov
2025-02-04 19:45 ` Jens Axboe
2025-01-28 13:39 ` Max Kellermann [this message]
2025-01-31 13:54 ` [PATCH 8/8] io_uring: skip redundant poll wakeups Pavel Begunkov
2025-01-31 17:16 ` Max Kellermann
2025-01-31 17:25 ` Pavel Begunkov
2025-01-29 17:18 ` [PATCH 0/8] Various io_uring micro-optimizations (reducing lock contention) Jens Axboe
2025-01-29 17:39 ` Max Kellermann
2025-01-29 17:45 ` Jens Axboe
2025-01-29 18:01 ` Max Kellermann
2025-01-31 16:13 ` Jens Axboe
2025-02-01 15:25 ` Jens Axboe
2025-02-01 15:30 ` Max Kellermann
2025-02-01 15:38 ` Jens Axboe
2025-01-29 19:30 ` Pavel Begunkov
2025-01-29 19:43 ` Max Kellermann
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250128133927.3989681-9-max.kellermann@ionos.com \
--to=max.kellermann@ionos.com \
--cc=asml.silence@gmail.com \
--cc=axboe@kernel.dk \
--cc=io-uring@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.