From: Stefan Hajnoczi <stefanha@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: qemu-devel@nongnu.org, Aarushi Mehta <mehta.aaru20@gmail.com>,
Fam Zheng <fam@euphon.net>,
Stefano Garzarella <sgarzare@redhat.com>,
Hanna Czenczek <hreitz@redhat.com>,
eblake@redhat.com, qemu-block@nongnu.org, hibriansong@gmail.com,
Stefan Weil <sw@weilnetz.de>, Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [PATCH v4 09/12] aio-posix: add aio_add_sqe() API for user-defined io_uring requests
Date: Thu, 23 Oct 2025 16:18:46 -0400 [thread overview]
Message-ID: <20251023201846.GC62080@fedora> (raw)
In-Reply-To: <aOkyUdEJFhFMlIfD@redhat.com>
[-- Attachment #1: Type: text/plain, Size: 3346 bytes --]
On Fri, Oct 10, 2025 at 06:20:33PM +0200, Kevin Wolf wrote:
> Am 10.10.2025 um 17:23 hat Kevin Wolf geschrieben:
> > Am 10.09.2025 um 19:57 hat Stefan Hajnoczi geschrieben:
> > > Introduce the aio_add_sqe() API for submitting io_uring requests in the
> > > current AioContext. This allows other components in QEMU, like the block
> > > layer, to take advantage of io_uring features without creating their own
> > > io_uring context.
> > >
> > > This API supports nested event loops just like file descriptor
> > > monitoring and BHs do. This comes at a complexity cost: a BH is required
> > > to dispatch CQE callbacks and they are placed on a list so that a nested
> > > event loop can invoke its parent's pending CQE callbacks. If you're
> > > wondering why CqeHandler exists instead of just a callback function
> > > pointer, this is why.
> >
> > This is a mechanism that we know from other places in the code like the
> > Linux AIO or indeed the old io_uring block driver code, because a BH is
> > the only thing that makes sure that the main loop will call into the
> > code again later.
> >
> > Do we really need it here, though? This _is_ literally the main loop
> > implementation, we don't have to make the main loop call us.
> > .need_wait() checks io_uring_cq_ready(), so as long as there are
> > unprocessed completions, we know that .wait() will be called in nested
> > event loops. We just can't take more than one completion out of the
> > queue to process them later for this to work, but have to process them
> > one by one as we get them from the ring. But that's what we already do.
> >
> > Am I missing something?
>
> I think what I missed is that we probably don't want to call arbitrary
> callbacks from .wait(), but only in the dispatching phase. At the same
Yes, exactly. The polling/wait phase of aio_poll() is sensitive since it
measures time, holds ctx->list_lock, and has ctx->notify_me enabled. In
that state we cannot call arbitrary code.
> time, we need to fill the ready_list during .wait() and can't do that
> later, so we do have to go through all CQEs here. The only way I can see
> to get rid of the extra list - which I would really like to see - would
> be processing the CQEs twice (once during .wait() for the internal ones
> and once during the dispatch phase for the add_sqe() ones). That's a bit
> annoying.
>
> Either way, even if we keep the list, scheduling and cancelling BHs from
> the fdmon still doesn't feel quite right to me. I wonder if we shouldn't
> introduce a .dispatch() callback in FDMonOps that could run the
> cqe_handler_ready_list for fdmon-io_uring. That would also make the
> interface more consistent with the set of callbacks we have for GSource,
> and maybe eventually simplify deduplicating them.
>
> Then you also don't need the ugly optimisation in the next patch that
> fixes the slowness of scheduling BHs in .wait() by moving io_uring code
> to the AioContext core.
You're right, cqe_handler_bh is ugly. Either every dispatched callback
in the event loop should be its own BH (and BHs are the one and only
list of callbacks) or I can add the .dispatch() callback you suggested.
I need to play with the code a bit to compare the approaches. I'll
choose an approach for the next revision of this series.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
next prev parent reply other threads:[~2025-10-23 21:07 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-10 17:56 [PATCH v4 00/12] aio: add the aio_add_sqe() io_uring API Stefan Hajnoczi
2025-09-10 17:56 ` [PATCH v4 01/12] aio-posix: fix race between io_uring CQE and AioHandler deletion Stefan Hajnoczi
2025-10-09 14:16 ` Kevin Wolf
2025-09-10 17:56 ` [PATCH v4 02/12] aio-posix: keep polling enabled with fdmon-io_uring.c Stefan Hajnoczi
2025-10-09 14:19 ` Kevin Wolf
2025-09-10 17:56 ` [PATCH v4 03/12] tests/unit: skip test-nested-aio-poll with io_uring Stefan Hajnoczi
2025-10-09 14:20 ` Kevin Wolf
2025-10-20 18:53 ` Stefan Hajnoczi
2025-09-10 17:56 ` [PATCH v4 04/12] aio-posix: integrate fdmon into glib event loop Stefan Hajnoczi
2025-10-09 15:25 ` Kevin Wolf
2025-10-20 20:08 ` Stefan Hajnoczi
2025-09-10 17:56 ` [PATCH v4 05/12] aio: remove aio_context_use_g_source() Stefan Hajnoczi
2025-10-09 15:46 ` Kevin Wolf
2025-10-09 16:59 ` Kevin Wolf
2025-10-21 19:10 ` Stefan Hajnoczi
2025-10-22 9:02 ` Kevin Wolf
2025-10-23 19:43 ` Stefan Hajnoczi
2025-10-23 20:47 ` Stefan Hajnoczi
2025-10-21 19:01 ` Stefan Hajnoczi
2025-09-10 17:56 ` [PATCH v4 06/12] aio: free AioContext when aio_context_new() fails Stefan Hajnoczi
2025-10-09 16:06 ` Kevin Wolf
2025-10-21 20:42 ` Stefan Hajnoczi
2025-09-10 17:56 ` [PATCH v4 07/12] aio: add errp argument to aio_context_setup() Stefan Hajnoczi
2025-10-09 16:16 ` Kevin Wolf
2025-10-14 19:48 ` Stefan Hajnoczi
2025-09-10 17:56 ` [PATCH v4 08/12] aio-posix: gracefully handle io_uring_queue_init() failure Stefan Hajnoczi
2025-10-09 16:19 ` Kevin Wolf
2025-10-14 19:49 ` Stefan Hajnoczi
2025-09-10 17:57 ` [PATCH v4 09/12] aio-posix: add aio_add_sqe() API for user-defined io_uring requests Stefan Hajnoczi
2025-10-10 15:23 ` Kevin Wolf
2025-10-10 16:20 ` Kevin Wolf
2025-10-23 20:18 ` Stefan Hajnoczi [this message]
2025-10-23 20:09 ` Stefan Hajnoczi
2025-09-10 17:57 ` [PATCH v4 10/12] aio-posix: avoid EventNotifier for cqe_handler_bh Stefan Hajnoczi
2025-09-10 17:57 ` [PATCH v4 11/12] block/io_uring: use aio_add_sqe() Stefan Hajnoczi
2025-09-10 17:57 ` [PATCH v4 12/12] block/io_uring: use non-vectored read/write when possible Stefan Hajnoczi
2025-10-10 16:33 ` Kevin Wolf
2025-10-14 19:52 ` Stefan Hajnoczi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20251023201846.GC62080@fedora \
--to=stefanha@redhat.com \
--cc=eblake@redhat.com \
--cc=fam@euphon.net \
--cc=hibriansong@gmail.com \
--cc=hreitz@redhat.com \
--cc=kwolf@redhat.com \
--cc=mehta.aaru20@gmail.com \
--cc=pbonzini@redhat.com \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=sgarzare@redhat.com \
--cc=sw@weilnetz.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).