From: Stefan Hajnoczi <stefanha@redhat.com>
To: qemu-devel@nongnu.org
Cc: qemu-block@nongnu.org, Stefan Hajnoczi <stefanha@redhat.com>,
hibriansong@gmail.com, Kevin Wolf <kwolf@redhat.com>,
Hanna Czenczek <hreitz@redhat.com>
Subject: [RFC 01/11] aio-posix: fix polling mode with fdmon-io_uring
Date: Wed, 28 May 2025 15:09:06 -0400
Message-ID: <20250528190916.35864-2-stefanha@redhat.com>
In-Reply-To: <20250528190916.35864-1-stefanha@redhat.com>

The io_uring(7) file descriptor monitor cannot enter polling mode
because it needs to submit a POLL_ADD SQE every time a file descriptor
becomes active. Submitting SQEs only happens in FDMonOps->wait() outside
of polling mode.

Fix this using the multi-shot mechanism introduced in Linux 5.13 and
liburing 2.1. Stable and enterprise Linux distros ship 5.14+ as of March
2025, so it is safe to require this. Note that fdmon-io_uring is
currently not enabled at runtime and is not essential, so QEMU can still
be built without it on older hosts.

In multi-shot mode, a POLL_ADD SQE remains active until canceled with
POLL_REMOVE. This avoids the need to submit a new SQE every time a file
descriptor becomes active.

When POLL_REMOVE is processed by the host kernel, the multi-shot
POLL_ADD operation completes with -ECANCELED. Adjust the code slightly
to take this into account.
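
For readers who want the completion handling at a glance, here is a
minimal standalone sketch of the decision that process_cqe() makes for
each multi-shot completion after this patch. It is not the patch's code:
every SKETCH_/Sketch name is hypothetical, SKETCH_NODE_REMOVE stands in
for FDMON_IO_URING_REMOVE, and SKETCH_CQE_F_MORE is assumed to mirror
the kernel's IORING_CQE_F_MORE bit so that the sketch builds without
liburing.

```c
#include <assert.h>
#include <errno.h>

/* Assumption: intended to mirror IORING_CQE_F_MORE from <linux/io_uring.h> */
#define SKETCH_CQE_F_MORE   (1U << 1)

/* Hypothetical stand-in for the FDMON_IO_URING_REMOVE node flag */
#define SKETCH_NODE_REMOVE  (1U << 0)

/* What the caller should do with one POLL_ADD multi-shot completion */
typedef enum {
    SKETCH_DELETE_HANDLER, /* poll was cancelled or fd closed: free handler */
    SKETCH_IGNORE,         /* fd became ready while removal is in flight */
    SKETCH_READY,          /* fd is ready, multi-shot poll is still armed */
    SKETCH_READY_REARM,    /* fd is ready, but the poll must be re-armed */
} SketchAction;

static SketchAction sketch_classify_cqe(int res, unsigned cqe_flags,
                                        unsigned node_flags)
{
    /* Cancelled by POLL_REMOVE, or fd closed before POLL_REMOVE finished */
    if (res == -ECANCELED || res == -EBADF) {
        /* A terminated multi-shot poll never promises more completions */
        assert(!(cqe_flags & SKETCH_CQE_F_MORE));
        /* Termination is only expected after removal was requested */
        assert(node_flags & SKETCH_NODE_REMOVE);
        return SKETCH_DELETE_HANDLER;
    }

    /* Readiness reported during removal is dropped */
    if (node_flags & SKETCH_NODE_REMOVE) {
        return SKETCH_IGNORE;
    }

    /* Multi-shot can stop at any time; without F_MORE it must be re-armed */
    if (!(cqe_flags & SKETCH_CQE_F_MORE)) {
        return SKETCH_READY_REARM;
    }

    return SKETCH_READY;
}
```

In this sketch a caller would pass cqe->res, cqe->flags and the handler's
current flags; only the SKETCH_READY_REARM case requires a new SQE, which
is the property that lets the monitor stay in polling mode.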
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 meson.build           |  2 +-
 util/fdmon-io_uring.c | 34 +++++++++++++++++++++-------------
 2 files changed, 22 insertions(+), 14 deletions(-)

diff --git a/meson.build b/meson.build
index fdad3fb528..6a362b9209 100644
--- a/meson.build
+++ b/meson.build
@@ -1157,7 +1157,7 @@ linux_io_uring_test = '''
 
 linux_io_uring = not_found
 if not get_option('linux_io_uring').auto() or have_block
-  linux_io_uring = dependency('liburing', version: '>=0.3',
+  linux_io_uring = dependency('liburing', version: '>=2.1',
                               required: get_option('linux_io_uring'),
                               method: 'pkg-config')
   if not cc.links(linux_io_uring_test)
diff --git a/util/fdmon-io_uring.c b/util/fdmon-io_uring.c
index b0d68bdc44..6cd665e565 100644
--- a/util/fdmon-io_uring.c
+++ b/util/fdmon-io_uring.c
@@ -124,8 +124,7 @@ static AioHandler *dequeue(AioHandlerSList *head, unsigned *flags)
     /*
      * Don't clear FDMON_IO_URING_REMOVE. It's sticky so it can serve two
      * purposes: telling fill_sq_ring() to submit IORING_OP_POLL_REMOVE and
-     * telling process_cqe() to delete the AioHandler when its
-     * IORING_OP_POLL_ADD completes.
+     * telling process_cqe() to ignore IORING_OP_POLL_ADD completions.
      */
     *flags = qatomic_fetch_and(&node->flags, ~(FDMON_IO_URING_PENDING |
                                               FDMON_IO_URING_ADD));
@@ -166,12 +165,12 @@ static void fdmon_io_uring_update(AioContext *ctx,
     }
 }
 
-static void add_poll_add_sqe(AioContext *ctx, AioHandler *node)
+static void add_poll_multishot_sqe(AioContext *ctx, AioHandler *node)
 {
     struct io_uring_sqe *sqe = get_sqe(ctx);
     int events = poll_events_from_pfd(node->pfd.events);
 
-    io_uring_prep_poll_add(sqe, node->pfd.fd, events);
+    io_uring_prep_poll_multishot(sqe, node->pfd.fd, events);
     io_uring_sqe_set_data(sqe, node);
 }
 
@@ -213,7 +212,7 @@ static void fill_sq_ring(AioContext *ctx)
     while ((node = dequeue(&submit_list, &flags))) {
         /* Order matters, just in case both flags were set */
         if (flags & FDMON_IO_URING_ADD) {
-            add_poll_add_sqe(ctx, node);
+            add_poll_multishot_sqe(ctx, node);
         }
         if (flags & FDMON_IO_URING_REMOVE) {
             add_poll_remove_sqe(ctx, node);
@@ -234,21 +233,30 @@ static bool process_cqe(AioContext *ctx,
         return false;
     }
 
+    flags = qatomic_read(&node->flags);
+
     /*
-     * Deletion can only happen when IORING_OP_POLL_ADD completes. If we race
-     * with enqueue() here then we can safely clear the FDMON_IO_URING_REMOVE
-     * bit before IORING_OP_POLL_REMOVE is submitted.
+     * poll_multishot cancelled by poll_remove? Or completed early because fd
+     * was closed before poll_remove finished?
      */
-    flags = qatomic_fetch_and(&node->flags, ~FDMON_IO_URING_REMOVE);
-    if (flags & FDMON_IO_URING_REMOVE) {
+    if (cqe->res == -ECANCELED || cqe->res == -EBADF) {
+        assert(!(cqe->flags & IORING_CQE_F_MORE));
+        assert(flags & FDMON_IO_URING_REMOVE);
         QLIST_INSERT_HEAD_RCU(&ctx->deleted_aio_handlers, node, node_deleted);
         return false;
     }
 
-    aio_add_ready_handler(ready_list, node, pfd_events_from_poll(cqe->res));
+    /* Ignore if it becomes ready during removal */
+    if (flags & FDMON_IO_URING_REMOVE) {
+        return false;
+    }
 
-    /* IORING_OP_POLL_ADD is one-shot so we must re-arm it */
-    add_poll_add_sqe(ctx, node);
+    /* Multi-shot can stop at any time, so re-arm if necessary */
+    if (!(cqe->flags & IORING_CQE_F_MORE)) {
+        add_poll_multishot_sqe(ctx, node);
+    }
+
+    aio_add_ready_handler(ready_list, node, pfd_events_from_poll(cqe->res));
     return true;
 }
 
--
2.49.0
Thread overview: 33+ messages
2025-05-28 19:09 [RFC 00/11] aio: add the aio_add_sqe() io_uring API Stefan Hajnoczi
2025-05-28 19:09 ` Stefan Hajnoczi [this message]
2025-05-28 20:29 ` [RFC 01/11] aio-posix: fix polling mode with fdmon-io_uring Eric Blake
2025-05-28 19:09 ` [RFC 02/11] aio-posix: keep polling enabled with fdmon-io_uring.c Stefan Hajnoczi
2025-05-28 20:34 ` Eric Blake
2025-05-28 19:09 ` [RFC 03/11] tests/unit: skip test-nested-aio-poll with io_uring Stefan Hajnoczi
2025-05-28 20:40 ` Eric Blake
2025-05-28 19:09 ` [RFC 04/11] aio-posix: integrate fdmon into glib event loop Stefan Hajnoczi
2025-05-28 21:01 ` Eric Blake
2025-05-28 19:09 ` [RFC 05/11] aio: remove aio_context_use_g_source() Stefan Hajnoczi
2025-05-28 21:02 ` Eric Blake
2025-05-28 19:09 ` [RFC 06/11] aio: free AioContext when aio_context_new() fails Stefan Hajnoczi
2025-05-28 21:06 ` Eric Blake
2025-06-05 17:49 ` Stefan Hajnoczi
2025-05-28 19:09 ` [RFC 07/11] aio: add errp argument to aio_context_setup() Stefan Hajnoczi
2025-05-28 21:07 ` Eric Blake
2025-05-28 19:09 ` [RFC 08/11] aio-posix: gracefully handle io_uring_queue_init() failure Stefan Hajnoczi
2025-05-28 22:12 ` Eric Blake
2025-05-29 15:38 ` Stefan Hajnoczi
2025-06-03 6:05 ` Markus Armbruster
2025-06-03 18:48 ` Stefan Hajnoczi
2025-06-02 12:26 ` Brian
2025-06-02 20:20 ` Stefan Hajnoczi
2025-06-02 22:37 ` Brian
2025-05-28 19:09 ` [RFC 09/11] aio-posix: add aio_add_sqe() API for user-defined io_uring requests Stefan Hajnoczi
2025-05-28 22:15 ` Eric Blake
2025-05-29 20:02 ` Eric Blake
2025-06-05 17:52 ` Stefan Hajnoczi
2025-05-28 19:09 ` [RFC 10/11] aio-posix: avoid EventNotifier for cqe_handler_bh Stefan Hajnoczi
2025-05-29 20:09 ` Eric Blake
2025-05-28 19:09 ` [RFC 11/11] block/io_uring: use aio_add_sqe() Stefan Hajnoczi
2025-05-29 21:11 ` Eric Blake
2025-06-05 18:40 ` Stefan Hajnoczi