From: Stefan Hajnoczi <stefanha@redhat.com>
To: Eric Blake <eblake@redhat.com>
Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org,
hibriansong@gmail.com, Kevin Wolf <kwolf@redhat.com>,
Hanna Czenczek <hreitz@redhat.com>
Subject: Re: [RFC 09/11] aio-posix: add aio_add_sqe() API for user-defined io_uring requests
Date: Thu, 5 Jun 2025 13:52:24 -0400
Message-ID: <20250605175224.GA481264@fedora>
In-Reply-To: <lwn6k4zy3rovxboe4lia46islqxaagpklba2mggqxinsvy2u7k@yhthtmjlh2mn>
On Thu, May 29, 2025 at 03:02:16PM -0500, Eric Blake wrote:
> On Wed, May 28, 2025 at 03:09:14PM -0400, Stefan Hajnoczi wrote:
> > Introduce the aio_add_sqe() API for submitting io_uring requests in the
> > current AioContext. This allows other components in QEMU, like the block
> > layer, to take advantage of io_uring features without creating their own
> > io_uring context.
> >
> > This API supports nested event loops just like file descriptor
> > monitoring and BHs do. This comes at a complexity cost: a BH is required
> > to dispatch CQE callbacks and they are placed on a list so that a nested
> > event loop can invoke its parent's pending CQE callbacks. If you're
> > wondering why CqeHandler exists instead of just a callback function
> > pointer, this is why.
> >
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
>
> Large patch. I found a couple of nits, but the overall design looks
> sound.
>
> Reviewed-by: Eric Blake <eblake@redhat.com>
>
> > include/block/aio.h | 82 ++++++++++++++++++++++++
> > util/aio-posix.h | 1 +
> > util/aio-posix.c | 9 +++
> > util/fdmon-io_uring.c | 145 +++++++++++++++++++++++++++++++-----------
> > 4 files changed, 200 insertions(+), 37 deletions(-)
> >
> > diff --git a/include/block/aio.h b/include/block/aio.h
> > index d919d7c8f4..95beef28c3 100644
> > --- a/include/block/aio.h
> > +++ b/include/block/aio.h
> > @@ -61,6 +61,27 @@ typedef struct LuringState LuringState;
> > /* Is polling disabled? */
> > bool aio_poll_disabled(AioContext *ctx);
> >
> > +#ifdef CONFIG_LINUX_IO_URING
> > +/*
> > + * Each io_uring request must have a unique CqeHandler that processes the cqe.
> > + * The lifetime of a CqeHandler must be at least from aio_add_sqe() until
> > + * ->cb() invocation.
> > + */
> > +typedef struct CqeHandler CqeHandler;
> > +struct CqeHandler {
> > + /* Called by the AioContext when the request has completed */
> > + void (*cb)(CqeHandler *handler);
>
> I see an opaque callback pointer in prep_cqe below, but not one here.
> Is that because callers can write their own struct that includes a
> CqeHandler as its first member, if more state is needed?
Yes.
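
A minimal sketch of that embedding pattern follows. It uses a simplified
stand-in for CqeHandler rather than the real struct from this patch (the
queue link is omitted and a plain int32_t replaces struct io_uring_cqe).
MyRequest, my_request_init(), my_request_cb() and fake_complete() are
hypothetical names used only for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified stand-in for the CqeHandler in this patch. Assumption: the real
 * struct also carries a QSIMPLEQ_ENTRY link and a struct io_uring_cqe; here
 * a single result field stands in for cqe.res.
 */
typedef struct CqeHandler CqeHandler;
struct CqeHandler {
    void (*cb)(CqeHandler *handler);
    int32_t res;
};

/* Hypothetical per-request state with the handler embedded as first member */
typedef struct {
    CqeHandler handler; /* must stay first for the cast in my_request_cb() */
    int ret;            /* result copied out of the completion */
    int done;           /* set to 1 once the callback has run */
} MyRequest;

/* Completion callback: recover the enclosing request from the handler */
static void my_request_cb(CqeHandler *handler)
{
    assert(handler != NULL);

    /* Valid because handler is the first member of MyRequest */
    MyRequest *req = (MyRequest *)handler;

    req->ret = handler->res;
    req->done = 1;
}

static void my_request_init(MyRequest *req)
{
    req->handler.cb = my_request_cb;
    req->handler.res = 0;
    req->ret = 0;
    req->done = 0;
}

/*
 * Stand-in for what the AioContext would do on completion: fill in the
 * result, then invoke ->cb(). This is not the real dispatch code.
 */
static void fake_complete(CqeHandler *handler, int32_t res)
{
    handler->res = res;
    handler->cb(handler);
}
```

No opaque pointer is needed in the callback signature because the handler
pointer itself leads back to the caller's state. If the handler cannot be
the first member, a container_of()-style lookup achieves the same thing.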
>
> > +
> > + /* Used internally, do not access this */
> > + QSIMPLEQ_ENTRY(CqeHandler) next;
> > +
> > + /* This field is filled in before ->cb() is called */
> > + struct io_uring_cqe cqe;
> > +};
> > +
> > +typedef QSIMPLEQ_HEAD(, CqeHandler) CqeHandlerSimpleQ;
> > +#endif /* CONFIG_LINUX_IO_URING */
> > +
> > /* Callbacks for file descriptor monitoring implementations */
> > typedef struct {
> > /*
> > @@ -138,6 +159,27 @@ typedef struct {
> > * Called with list_lock incremented.
> > */
> > void (*gsource_dispatch)(AioContext *ctx, AioHandlerList *ready_list);
> > +
> > +#ifdef CONFIG_LINUX_IO_URING
> > + /**
> > + * aio_add_sqe: Add an io_uring sqe for submission.
> > + * @prep_sqe: invoked with an sqe that should be prepared for submission
> > + * @opaque: user-defined argument to @prep_sqe()
> > + * @cqe_handler: the unique cqe handler associated with this request
> > + *
> > + * The caller's @prep_sqe() function is invoked to fill in the details of
> > + * the sqe. Do not call io_uring_sqe_set_data() on this sqe.
> > + *
> > + * The kernel may see the sqe as soon as @pre_sqe() returns or it may take
>
> prep_sqe
Oops, will fix.
>
> > + * until the next event loop iteration.
> > + *
> > + * This function is called from the current AioContext and is not
> > + * thread-safe.
> > + */
> > + void (*add_sqe)(AioContext *ctx,
> > + void (*prep_sqe)(struct io_uring_sqe *sqe, void *opaque),
> > + void *opaque, CqeHandler *cqe_handler);
> > +#endif /* CONFIG_LINUX_IO_URING */
> > } FDMonOps;
> >
> > /*
> > @@ -255,6 +297,10 @@ struct AioContext {
> > struct io_uring fdmon_io_uring;
> > AioHandlerSList submit_list;
> > gpointer io_uring_fd_tag;
> > +
> > + /* Pending callback state for cqe handlers */
> > + CqeHandlerSimpleQ cqe_handler_ready_list;
> > + QEMUBH *cqe_handler_bh;
> > #endif
>
> While here, is it worth adding a comment to state which matching #if
> it ends (similar to what you did above in FDMonOps add_sqe)?
Sounds good.
>
> >
> > /* TimerLists for calling timers - one per clock type. Has its own
> > @@ -761,4 +807,40 @@ void aio_context_set_aio_params(AioContext *ctx, int64_t max_batch);
> > */
> > void aio_context_set_thread_pool_params(AioContext *ctx, int64_t min,
> > int64_t max, Error **errp);
> > +
> > +#ifdef CONFIG_LINUX_IO_URING
> > +/**
> > + * aio_has_io_uring: Return whether io_uring is available.
> > + *
> > + * io_uring is either available in all AioContexts or in none, so this only
> > + * needs to be called once from within any thread's AioContext.
> > + */
> > +static inline bool aio_has_io_uring(void)
> > +{
> > + AioContext *ctx = qemu_get_current_aio_context();
> > + return ctx->fdmon_ops->add_sqe;
> > +}
> > +
> > +/**
> > + * aio_add_sqe: Add an io_uring sqe for submission.
> > + * @prep_sqe: invoked with an sqe that should be prepared for submission
> > + * @opaque: user-defined argument to @prep_sqe()
> > + * @cqe_handler: the unique cqe handler associated with this request
> > + *
> > + * The caller's @prep_sqe() function is invoked to fill in the details of the
> > + * sqe. Do not call io_uring_sqe_set_data() on this sqe.
> > + *
> > + * The sqe is submitted by the current AioContext. The kernel may see the sqe
> > + * as soon as @pre_sqe() returns or it may take until the next event loop
>
> prep_sqe
Will fix.