All of lore.kernel.org
 help / color / mirror / Atom feed
From: Stefan Hajnoczi <stefanha@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: qemu-devel@nongnu.org, Aarushi Mehta <mehta.aaru20@gmail.com>,
	Fam Zheng <fam@euphon.net>,
	Stefano Garzarella <sgarzare@redhat.com>,
	Hanna Czenczek <hreitz@redhat.com>,
	eblake@redhat.com, qemu-block@nongnu.org, hibriansong@gmail.com,
	Stefan Weil <sw@weilnetz.de>, Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [PATCH v4 09/12] aio-posix: add aio_add_sqe() API for user-defined io_uring requests
Date: Thu, 23 Oct 2025 16:18:46 -0400	[thread overview]
Message-ID: <20251023201846.GC62080@fedora> (raw)
In-Reply-To: <aOkyUdEJFhFMlIfD@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 3346 bytes --]

On Fri, Oct 10, 2025 at 06:20:33PM +0200, Kevin Wolf wrote:
> Am 10.10.2025 um 17:23 hat Kevin Wolf geschrieben:
> > Am 10.09.2025 um 19:57 hat Stefan Hajnoczi geschrieben:
> > > Introduce the aio_add_sqe() API for submitting io_uring requests in the
> > > current AioContext. This allows other components in QEMU, like the block
> > > layer, to take advantage of io_uring features without creating their own
> > > io_uring context.
> > > 
> > > This API supports nested event loops just like file descriptor
> > > monitoring and BHs do. This comes at a complexity cost: a BH is required
> > > to dispatch CQE callbacks and they are placed on a list so that a nested
> > > event loop can invoke its parent's pending CQE callbacks. If you're
> > > wondering why CqeHandler exists instead of just a callback function
> > > pointer, this is why.
> > 
> > This is a mechanism that we know from other places in the code like the
> > Linux AIO or indeed the old io_uring block driver code, because a BH is
> > the only thing that makes sure that the main loop will call into the
> > code again later.
> > 
> > Do we really need it here, though? This _is_ literally the main loop
> > implementation, we don't have to make the main loop call us.
> > .need_wait() checks io_uring_cq_ready(), so as long as there are
> > unprocessed completions, we know that .wait() will be called in nested
> > event loops. We just can't take more than one completion out of the
> > queue to process them later for this to work, but have to process them
> > one by one as we get them from the ring. But that's what we already do.
> > 
> > Am I missing something?
> 
> I think what I missed is that we probably don't want to call arbitrary
> callbacks from .wait(), but only in the dispatching phase. At the same

Yes, exactly. The polling/wait phase of aio_poll() is sensitive since it
measures time, holds ctx->list_lock, and has ctx->notify_me enabled. In
that state we cannot call arbitrary code.

> time, we need to fill the ready_list during .wait() and can't do that
> later, so we do have to go through all CQEs here. The only way I can see
> to get rid of the extra list - which I would really like to see - would
> be processing the CQEs twice (once during .wait() for the internal ones
> and once during the dispatch phase for the add_sqe() ones). That's a bit
> annoying.
> 
> Either way, even if we keep the list, scheduling and cancelling BHs from
> the fdmon still doesn't feel quite right to me. I wonder if we shouldn't
> introduce a .dispatch() callback in FDMonOps that could run the
> cqe_handler_ready_list for fdmon-io_uring. That would also make the
> interface more consistent with the set of callbacks we have for GSource,
> and maybe eventually simplify deduplicating them.
> 
> Then you also don't need the ugly optimisation in the next patch that
> fixes the slowness of scheduling BHs in .wait() by moving io_uring code
> to the AioContext core.

You're right, cqe_handler_bh is ugly. Either every dispatched callback
in the event loop should be its own BH (and BHs are the one and only
list of callbacks) or I can add the .dispatch() callback you suggested.
I need to play with the code a bit to compare the approaches. I'll
choose an approach for the next revision of this series.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

  reply	other threads:[~2025-10-23 21:07 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-10 17:56 [PATCH v4 00/12] aio: add the aio_add_sqe() io_uring API Stefan Hajnoczi
2025-09-10 17:56 ` [PATCH v4 01/12] aio-posix: fix race between io_uring CQE and AioHandler deletion Stefan Hajnoczi
2025-10-09 14:16   ` Kevin Wolf
2025-09-10 17:56 ` [PATCH v4 02/12] aio-posix: keep polling enabled with fdmon-io_uring.c Stefan Hajnoczi
2025-10-09 14:19   ` Kevin Wolf
2025-09-10 17:56 ` [PATCH v4 03/12] tests/unit: skip test-nested-aio-poll with io_uring Stefan Hajnoczi
2025-10-09 14:20   ` Kevin Wolf
2025-10-20 18:53     ` Stefan Hajnoczi
2025-09-10 17:56 ` [PATCH v4 04/12] aio-posix: integrate fdmon into glib event loop Stefan Hajnoczi
2025-10-09 15:25   ` Kevin Wolf
2025-10-20 20:08     ` Stefan Hajnoczi
2025-09-10 17:56 ` [PATCH v4 05/12] aio: remove aio_context_use_g_source() Stefan Hajnoczi
2025-10-09 15:46   ` Kevin Wolf
2025-10-09 16:59     ` Kevin Wolf
2025-10-21 19:10       ` Stefan Hajnoczi
2025-10-22  9:02         ` Kevin Wolf
2025-10-23 19:43           ` Stefan Hajnoczi
2025-10-23 20:47       ` Stefan Hajnoczi
2025-10-21 19:01     ` Stefan Hajnoczi
2025-09-10 17:56 ` [PATCH v4 06/12] aio: free AioContext when aio_context_new() fails Stefan Hajnoczi
2025-10-09 16:06   ` Kevin Wolf
2025-10-21 20:42     ` Stefan Hajnoczi
2025-09-10 17:56 ` [PATCH v4 07/12] aio: add errp argument to aio_context_setup() Stefan Hajnoczi
2025-10-09 16:16   ` Kevin Wolf
2025-10-14 19:48     ` Stefan Hajnoczi
2025-09-10 17:56 ` [PATCH v4 08/12] aio-posix: gracefully handle io_uring_queue_init() failure Stefan Hajnoczi
2025-10-09 16:19   ` Kevin Wolf
2025-10-14 19:49     ` Stefan Hajnoczi
2025-09-10 17:57 ` [PATCH v4 09/12] aio-posix: add aio_add_sqe() API for user-defined io_uring requests Stefan Hajnoczi
2025-10-10 15:23   ` Kevin Wolf
2025-10-10 16:20     ` Kevin Wolf
2025-10-23 20:18       ` Stefan Hajnoczi [this message]
2025-10-23 20:09     ` Stefan Hajnoczi
2025-09-10 17:57 ` [PATCH v4 10/12] aio-posix: avoid EventNotifier for cqe_handler_bh Stefan Hajnoczi
2025-09-10 17:57 ` [PATCH v4 11/12] block/io_uring: use aio_add_sqe() Stefan Hajnoczi
2025-09-10 17:57 ` [PATCH v4 12/12] block/io_uring: use non-vectored read/write when possible Stefan Hajnoczi
2025-10-10 16:33   ` Kevin Wolf
2025-10-14 19:52     ` Stefan Hajnoczi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251023201846.GC62080@fedora \
    --to=stefanha@redhat.com \
    --cc=eblake@redhat.com \
    --cc=fam@euphon.net \
    --cc=hibriansong@gmail.com \
    --cc=hreitz@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=mehta.aaru20@gmail.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=sgarzare@redhat.com \
    --cc=sw@weilnetz.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.