* [PATCHSET 0/2] io_uring fixes
@ 2026-02-13 14:26 Jens Axboe
  2026-02-13 14:26 ` [PATCH 1/2] fdmon-io_uring: notify main loop when SQEs are queued Jens Axboe
                   ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: Jens Axboe @ 2026-02-13 14:26 UTC
  To: qemu-block; +Cc: qemu-devel, fam, stefanha

Hi,

Patch 1 here is the real meat of this, patch 2 is just a slight
improvement. For patch 1, it can literally yield a 50-80x improvement
on the io_uring side for idle systems, where ppoll() ends up sleeping
for 500 msec while there's IO to submit! I noticed this running the
io_uring regression tests in a vm, where I use a variety of block
devices for some of the tests. They would often randomly time out on
AHCI devices, while running them on a virtio-blk or nvme device would
finish in one second or so. I then wrote a reproducer to try and grok
this and had Claude dive into this, which helped me better grasp the
various event loops.

Please take a look and tell me what you think. Some variant of patch 1
should definitely be considered, but let me know if this is the right
approach. I can easily test anything.

Also note - this seems to trigger more easily or consistently on
aarch64, which is where I run most of my local/immediate testing.

 util/fdmon-io_uring.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

-- 
Jens Axboe




* [PATCH 1/2] fdmon-io_uring: notify main loop when SQEs are queued
  2026-02-13 14:26 [PATCHSET 0/2] io_uring fixes Jens Axboe
@ 2026-02-13 14:26 ` Jens Axboe
  2026-02-13 16:04   ` Kevin Wolf
  2026-02-13 14:26 ` [PATCH 2/2] fdmon-io_uring: check CQ ring directly in gsource_check Jens Axboe
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 22+ messages in thread
From: Jens Axboe @ 2026-02-13 14:26 UTC
  To: qemu-block; +Cc: qemu-devel, fam, stefanha, Jens Axboe

When a vCPU thread handles MMIO (holding BQL), aio_co_enter() runs the
block I/O coroutine inline on the vCPU thread because
qemu_get_current_aio_context() returns the main AioContext when BQL is
held. The coroutine calls luring_co_submit() which queues an SQE via
fdmon_io_uring_add_sqe(), but the actual io_uring_submit() only happens
in gsource_prepare() on the main loop thread.

Since the coroutine ran inline (not via aio_co_schedule()), no BH is
scheduled and aio_notify() is never called. The main loop remains asleep
in ppoll() with up to a 499ms timeout, leaving the SQE unsubmitted until
the next timer fires.

Fix this by calling aio_notify() after queuing the SQE. This wakes the
main loop via the eventfd so it can run gsource_prepare() and submit the
pending SQE promptly.
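
The wakeup mechanism described here can be illustrated outside of QEMU.
The following is a minimal sketch, not QEMU code: it assumes a plain
Linux eventfd and poll() in place of the AioContext notifier and
ppoll(), and the names notify_sleeper() and sleep_until_notified() are
made up for illustration. A child process stands in for the vCPU thread
and writes to the eventfd after 10ms; the parent, sleeping with a 499ms
timeout, returns as soon as the write lands rather than at the timeout.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-in for aio_notify(): make the eventfd readable. */
static int notify_sleeper(int efd)
{
    uint64_t one = 1;

    assert(efd >= 0);
    return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * Sleep in poll() with a 499ms timeout while a child process (standing in
 * for the vCPU thread) notifies after 10ms.  Returns the time spent in
 * poll() in milliseconds, or -1 on error or if the timeout expired.
 */
static long sleep_until_notified(void)
{
    struct timespec delay = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };
    struct timespec t0, t1;
    struct pollfd pfd;
    pid_t pid;
    int efd, ready;

    efd = eventfd(0, EFD_CLOEXEC);
    if (efd < 0) {
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(efd);
        return -1;
    }
    if (pid == 0) {
        /* Child: "queue work", then wake the sleeper. */
        nanosleep(&delay, NULL);
        _exit(notify_sleeper(efd) ? 1 : 0);
    }

    pfd.fd = efd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    ready = poll(&pfd, 1, 499);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    waitpid(pid, NULL, 0);
    close(efd);

    if (ready != 1) {
        return -1;
    }
    return (long)(t1.tv_sec - t0.tv_sec) * 1000 +
           (t1.tv_nsec - t0.tv_nsec) / 1000000;
}
```

Calling sleep_until_notified() should return a value well below 499.
Without the write in the child, poll() would only return once the
timeout expires, which is the behavior the missing aio_notify() caused.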

This is a generic fix that benefits all devices using aio=io_uring.
Without it, AHCI/SATA devices see MUCH worse I/O latency since they use
MMIO (not ioeventfd like virtio) and have no other mechanism to wake the
main loop after queuing block I/O.

This is usually a bit hard to detect, as it also relies on the ppoll
loop not waking up for other activity, and micro benchmarks tend not to
see it because they don't have any real processing time. With a
synthetic test case that has a few usleep() to simulate processing of
read data, it's very noticeable. The below example reads 128MB with
O_DIRECT in 128KB chunks in batches of 16, and has a 1ms delay before
each batch submit, and a 1ms delay after processing each completion.
Running it on /dev/sda yields:

time sudo ./iotest /dev/sda

________________________________________________________
Executed in   25.76 secs      fish           external
   usr time    6.19 millis  783.00 micros    5.41 millis
   sys time   12.43 millis  642.00 micros   11.79 millis

while on a virtio-blk or NVMe device we get:

time sudo ./iotest /dev/vdb

________________________________________________________
Executed in    1.25 secs      fish           external
   usr time    1.40 millis    0.30 millis    1.10 millis
   sys time   17.61 millis    1.43 millis   16.18 millis

time sudo ./iotest /dev/nvme0n1

________________________________________________________
Executed in    1.26 secs      fish           external
   usr time    6.11 millis    0.52 millis    5.59 millis
   sys time   13.94 millis    1.50 millis   12.43 millis

where the latter are consistent. If we run the same test but keep the
socket for the ssh connection active by having activity there, then
the sda test looks as follows:

time sudo ./iotest /dev/sda

________________________________________________________
Executed in    1.23 secs      fish           external
   usr time    2.70 millis   39.00 micros    2.66 millis
   sys time    4.97 millis  977.00 micros    3.99 millis

as now the ppoll loop is woken all the time anyway.

After this fix, on an idle system:

time sudo ./iotest /dev/sda

________________________________________________________
Executed in    1.30 secs      fish           external
   usr time    2.14 millis    0.14 millis    2.00 millis
   sys time   16.93 millis    1.16 millis   15.76 millis
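
As a rough sanity check on these timings, the arithmetic can be written
out. This is only a sketch under stated assumptions: 128MB and 128KB are
taken as binary units, every read completes, and the helper names are
invented for illustration. The deliberate sleeps alone add up to about
1.09 seconds, which is close to the ~1.25 second runs. If the remaining
time of the 25.76 second run is spread evenly over the batches (an
assumption, not something measured), each batch stalls for roughly
385ms, which is in line with a ppoll() timeout of up to 499ms.

```c
#include <assert.h>

enum {
    TOTAL_BYTES = 128 * 1024 * 1024,  /* 128MB read in total */
    CHUNK_BYTES = 128 * 1024,         /* 128KB per read */
    BATCH_SIZE  = 16,                 /* reads submitted per batch */
    DELAY_MS    = 1                   /* delay per batch and per completion */
};

/* Number of individual reads issued by the test. */
static unsigned num_reads(void)
{
    return TOTAL_BYTES / CHUNK_BYTES;
}

/* Number of batches submitted by the test. */
static unsigned num_batches(void)
{
    return num_reads() / BATCH_SIZE;
}

/* Time spent in the deliberate sleeps alone, in milliseconds. */
static unsigned sleep_floor_ms(void)
{
    return num_batches() * DELAY_MS + num_reads() * DELAY_MS;
}

/*
 * Average extra wait per batch for a given wall-clock time, assuming the
 * time beyond the sleep floor is spread evenly over all batches.
 */
static unsigned stall_per_batch_ms(unsigned wall_ms)
{
    assert(num_batches() > 0);
    if (wall_ms < sleep_floor_ms()) {
        return 0;
    }
    return (wall_ms - sleep_floor_ms()) / num_batches();
}
```

For example, stall_per_batch_ms(25760) uses the 25.76 second run from
above as its input.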

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 util/fdmon-io_uring.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/util/fdmon-io_uring.c b/util/fdmon-io_uring.c
index d0b56127c670..96392876b490 100644
--- a/util/fdmon-io_uring.c
+++ b/util/fdmon-io_uring.c
@@ -181,6 +181,14 @@ static void fdmon_io_uring_add_sqe(AioContext *ctx,
 
     trace_fdmon_io_uring_add_sqe(ctx, opaque, sqe->opcode, sqe->fd, sqe->off,
                                  cqe_handler);
+
+    /*
+     * Wake the main loop if it is sleeping in ppoll().  When a vCPU thread
+     * runs a coroutine inline (holding BQL), it queues SQEs here but the
+     * actual io_uring_submit() only happens in gsource_prepare().  Without
+     * this notify, ppoll() can sleep up to 499ms before submitting.
+     */
+    aio_notify(ctx);
 }
 
 static void fdmon_special_cqe_handler(CqeHandler *cqe_handler)
-- 
2.51.0




* [PATCH 2/2] fdmon-io_uring: check CQ ring directly in gsource_check
  2026-02-13 14:26 [PATCHSET 0/2] io_uring fixes Jens Axboe
  2026-02-13 14:26 ` [PATCH 1/2] fdmon-io_uring: notify main loop when SQEs are queued Jens Axboe
@ 2026-02-13 14:26 ` Jens Axboe
  2026-02-13 16:22   ` Kevin Wolf
  2026-02-18 16:24   ` Stefan Hajnoczi
  2026-02-18 10:07 ` [PATCHSET 0/2] io_uring fixes Fiona Ebner
  2026-03-03 11:52 ` Fiona Ebner
  3 siblings, 2 replies; 22+ messages in thread
From: Jens Axboe @ 2026-02-13 14:26 UTC
  To: qemu-block; +Cc: qemu-devel, fam, stefanha, Jens Axboe

gsource_check() only looks at the ppoll revents for the io_uring fd,
but CQEs can be posted during gsource_prepare()'s io_uring_submit()
call via kernel task_work processing on syscall exit. These completions
are already sitting in the CQ ring but the ring fd may not be signaled
yet, causing gsource_check() to return false.

Add a fallback io_uring_cq_ready() check so completions that arrive
during submission are dispatched immediately rather than waiting for
the next ppoll() cycle.
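
The shape of that fallback can be sketched without liburing. The ring
below is a hypothetical, simplified stand-in (struct mock_cq and the
mock_* helpers are not real liburing or QEMU names): the producer
advances tail, the consumer advances head, and the difference is the
number of completions waiting. The real io_uring_cq_ready() reads the
kernel-shared ring with the appropriate memory ordering, which this
single-threaded sketch leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified completion ring. */
struct mock_cq {
    unsigned head;  /* advanced by the consumer */
    unsigned tail;  /* advanced by the producer */
};

/* Completions waiting to be reaped; unsigned wraparound is intended. */
static unsigned mock_cq_ready(const struct mock_cq *cq)
{
    assert(cq != NULL);
    return cq->tail - cq->head;
}

/*
 * Mirror of the check described above: report ready if the fd was
 * signaled, or if completions are already sitting in the ring.
 */
static bool mock_gsource_check(bool fd_readable, const struct mock_cq *cq)
{
    if (fd_readable) {
        return true;
    }
    return mock_cq_ready(cq) > 0;
}
```

With head == tail and the fd not signaled, the check returns false. Once
tail moves ahead of head it returns true even though the fd was never
reported readable.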

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 util/fdmon-io_uring.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/util/fdmon-io_uring.c b/util/fdmon-io_uring.c
index 96392876b490..124e40594c17 100644
--- a/util/fdmon-io_uring.c
+++ b/util/fdmon-io_uring.c
@@ -352,7 +352,19 @@ static void fdmon_io_uring_gsource_prepare(AioContext *ctx)
 static bool fdmon_io_uring_gsource_check(AioContext *ctx)
 {
     gpointer tag = ctx->io_uring_fd_tag;
-    return g_source_query_unix_fd(&ctx->source, tag) & G_IO_IN;
+
+    /* Check ppoll revents (normal path) */
+    if (g_source_query_unix_fd(&ctx->source, tag) & G_IO_IN) {
+        return true;
+    }
+
+    /*
+     * Also check for CQEs that may have been posted during prepare's
+     * io_uring_submit() via task_work on syscall exit.  Without this,
+     * the main loop can miss completions and sleep in ppoll() until the
+     * next timer fires.
+     */
+    return io_uring_cq_ready(&ctx->fdmon_io_uring);
 }
 
 /* Dispatch CQE handlers that are ready */
-- 
2.51.0




* Re: [PATCH 1/2] fdmon-io_uring: notify main loop when SQEs are queued
  2026-02-13 14:26 ` [PATCH 1/2] fdmon-io_uring: notify main loop when SQEs are queued Jens Axboe
@ 2026-02-13 16:04   ` Kevin Wolf
  2026-02-18  9:57     ` Fiona Ebner
  2026-02-18 15:56     ` [PATCH 1/2] fdmon-io_uring: " Stefan Hajnoczi
  0 siblings, 2 replies; 22+ messages in thread
From: Kevin Wolf @ 2026-02-13 16:04 UTC
  To: Jens Axboe; +Cc: qemu-block, qemu-devel, fam, stefanha, f.ebner

On 13.02.2026 at 15:26, Jens Axboe wrote:
> When a vCPU thread handles MMIO (holding BQL), aio_co_enter() runs the
> block I/O coroutine inline on the vCPU thread because
> qemu_get_current_aio_context() returns the main AioContext when BQL is
> held. The coroutine calls luring_co_submit() which queues an SQE via
> fdmon_io_uring_add_sqe(), but the actual io_uring_submit() only happens
> in gsource_prepare() on the main loop thread.

Ouch! Yes, looks like we completely missed I/O submitted in vCPU threads
in the recent changes (or I guess worker threads in theory, but I don't
think there are any that actually make use of aio_add_sqe()).

> Since the coroutine ran inline (not via aio_co_schedule()), no BH is
> scheduled and aio_notify() is never called. The main loop remains asleep
> in ppoll() with up to a 499ms timeout, leaving the SQE unsubmitted until
> the next timer fires.
> 
> Fix this by calling aio_notify() after queuing the SQE. This wakes the
> main loop via the eventfd so it can run gsource_prepare() and submit the
> pending SQE promptly.
> 
> This is a generic fix that benefits all devices using aio=io_uring.
> Without it, AHCI/SATA devices see MUCH worse I/O latency since they use
> MMIO (not ioeventfd like virtio) and have no other mechanism to wake the
> main loop after queuing block I/O.
> 
> This is usually a bit hard to detect, as it also relies on the ppoll
> loop not waking up for other activity, and micro benchmarks tend not to
> see it because they don't have any real processing time. With a
> synthetic test case that has a few usleep() to simulate processing of
> read data, it's very noticeable. The below example reads 128MB with
> O_DIRECT in 128KB chunks in batches of 16, and has a 1ms delay before
> each batch submit, and a 1ms delay after processing each completion.
> Running it on /dev/sda yields:
> 
> time sudo ./iotest /dev/sda
> 
> ________________________________________________________
> Executed in   25.76 secs      fish           external
>    usr time    6.19 millis  783.00 micros    5.41 millis
>    sys time   12.43 millis  642.00 micros   11.79 millis
> 
> while on a virtio-blk or NVMe device we get:
> 
> time sudo ./iotest /dev/vdb
> 
> ________________________________________________________
> Executed in    1.25 secs      fish           external
>    usr time    1.40 millis    0.30 millis    1.10 millis
>    sys time   17.61 millis    1.43 millis   16.18 millis
> 
> time sudo ./iotest /dev/nvme0n1
> 
> ________________________________________________________
> Executed in    1.26 secs      fish           external
>    usr time    6.11 millis    0.52 millis    5.59 millis
>    sys time   13.94 millis    1.50 millis   12.43 millis
> 
> where the latter are consistent. If we run the same test but keep the
> socket for the ssh connection active by having activity there, then
> the sda test looks as follows:
> 
> time sudo ./iotest /dev/sda
> 
> ________________________________________________________
> Executed in    1.23 secs      fish           external
>    usr time    2.70 millis   39.00 micros    2.66 millis
>    sys time    4.97 millis  977.00 micros    3.99 millis
> 
> as now the ppoll loop is woken all the time anyway.
> 
> After this fix, on an idle system:
> 
> time sudo ./iotest /dev/sda
> 
> ________________________________________________________
> Executed in    1.30 secs      fish           external
>    usr time    2.14 millis    0.14 millis    2.00 millis
>    sys time   16.93 millis    1.16 millis   15.76 millis
> 
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
>  util/fdmon-io_uring.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/util/fdmon-io_uring.c b/util/fdmon-io_uring.c
> index d0b56127c670..96392876b490 100644
> --- a/util/fdmon-io_uring.c
> +++ b/util/fdmon-io_uring.c
> @@ -181,6 +181,14 @@ static void fdmon_io_uring_add_sqe(AioContext *ctx,
>  
>      trace_fdmon_io_uring_add_sqe(ctx, opaque, sqe->opcode, sqe->fd, sqe->off,
>                                   cqe_handler);
> +
> +    /*
> +     * Wake the main loop if it is sleeping in ppoll().  When a vCPU thread
> +     * runs a coroutine inline (holding BQL), it queues SQEs here but the
> +     * actual io_uring_submit() only happens in gsource_prepare().  Without
> +     * this notify, ppoll() can sleep up to 499ms before submitting.
> +     */
> +    aio_notify(ctx);
>  }

Makes sense to me.

At first I wondered if we should use defer_call() for the aio_notify()
to batch the submission, but of course holding the BQL will already take
care of that. And in iothreads where there is no BQL, the aio_notify()
shouldn't make a difference anyway because we're already in the right
thread.

I suppose the other variation could be to have another io_uring_enter()
call here (but then probably really through defer_call()) to avoid
waiting for another CPU to submit the request in its main loop. But I
don't really have an intuition if that would make things better or worse
in the common case.

Fiona, does this fix your case, too?

Kevin




* Re: [PATCH 2/2] fdmon-io_uring: check CQ ring directly in gsource_check
  2026-02-13 14:26 ` [PATCH 2/2] fdmon-io_uring: check CQ ring directly in gsource_check Jens Axboe
@ 2026-02-13 16:22   ` Kevin Wolf
  2026-02-18 16:24   ` Stefan Hajnoczi
  1 sibling, 0 replies; 22+ messages in thread
From: Kevin Wolf @ 2026-02-13 16:22 UTC
  To: Jens Axboe; +Cc: qemu-block, qemu-devel, fam, stefanha

On 13.02.2026 at 15:26, Jens Axboe wrote:
> gsource_check() only looks at the ppoll revents for the io_uring fd,
> but CQEs can be posted during gsource_prepare()'s io_uring_submit()
> call via kernel task_work processing on syscall exit. These completions
> are already sitting in the CQ ring but the ring fd may not be signaled
> yet, causing gsource_check() to return false.
> 
> Add a fallback io_uring_cq_ready() check so completions that arrive
> during submission are dispatched immediately rather than waiting for
> the next ppoll() cycle.
> 
> Signed-off-by: Jens Axboe <axboe@kernel.dk>

Reviewed-by: Kevin Wolf <kwolf@redhat.com>




* Re: [PATCH 1/2] fdmon-io_uring: notify main loop when SQEs are queued
  2026-02-13 16:04   ` Kevin Wolf
@ 2026-02-18  9:57     ` Fiona Ebner
  2026-02-18 16:06       ` Stefan Hajnoczi
  2026-02-18 16:11       ` Stefan Hajnoczi
  2026-02-18 15:56     ` [PATCH 1/2] fdmon-io_uring: " Stefan Hajnoczi
  1 sibling, 2 replies; 22+ messages in thread
From: Fiona Ebner @ 2026-02-18  9:57 UTC
  To: Kevin Wolf, Jens Axboe; +Cc: qemu-block, qemu-devel, fam, stefanha

On 13.02.26 at 5:05 PM, Kevin Wolf wrote:
> On 13.02.2026 at 15:26, Jens Axboe wrote:
>> When a vCPU thread handles MMIO (holding BQL), aio_co_enter() runs the
>> block I/O coroutine inline on the vCPU thread because
>> qemu_get_current_aio_context() returns the main AioContext when BQL is
>> held. The coroutine calls luring_co_submit() which queues an SQE via
>> fdmon_io_uring_add_sqe(), but the actual io_uring_submit() only happens
>> in gsource_prepare() on the main loop thread.
> 
> Ouch! Yes, looks like we completely missed I/O submitted in vCPU threads
> in the recent changes (or I guess worker threads in theory, but I don't
> think there are any that actually make use of aio_add_sqe()).
> 
>> Since the coroutine ran inline (not via aio_co_schedule()), no BH is
>> scheduled and aio_notify() is never called. The main loop remains asleep
>> in ppoll() with up to a 499ms timeout, leaving the SQE unsubmitted until
>> the next timer fires.
>>
>> Fix this by calling aio_notify() after queuing the SQE. This wakes the
>> main loop via the eventfd so it can run gsource_prepare() and submit the
>> pending SQE promptly.
>>
>> This is a generic fix that benefits all devices using aio=io_uring.
>> Without it, AHCI/SATA devices see MUCH worse I/O latency since they use
>> MMIO (not ioeventfd like virtio) and have no other mechanism to wake the
>> main loop after queuing block I/O.
>>
>> This is usually a bit hard to detect, as it also relies on the ppoll
>> loop not waking up for other activity, and micro benchmarks tend not to
>> see it because they don't have any real processing time. With a
>> synthetic test case that has a few usleep() to simulate processing of
>> read data, it's very noticeable. The below example reads 128MB with
>> O_DIRECT in 128KB chunks in batches of 16, and has a 1ms delay before
>> each batch submit, and a 1ms delay after processing each completion.
>> Running it on /dev/sda yields:
>>
>> time sudo ./iotest /dev/sda
>>
>> ________________________________________________________
>> Executed in   25.76 secs      fish           external
>>    usr time    6.19 millis  783.00 micros    5.41 millis
>>    sys time   12.43 millis  642.00 micros   11.79 millis
>>
>> while on a virtio-blk or NVMe device we get:
>>
>> time sudo ./iotest /dev/vdb
>>
>> ________________________________________________________
>> Executed in    1.25 secs      fish           external
>>    usr time    1.40 millis    0.30 millis    1.10 millis
>>    sys time   17.61 millis    1.43 millis   16.18 millis
>>
>> time sudo ./iotest /dev/nvme0n1
>>
>> ________________________________________________________
>> Executed in    1.26 secs      fish           external
>>    usr time    6.11 millis    0.52 millis    5.59 millis
>>    sys time   13.94 millis    1.50 millis   12.43 millis
>>
>> where the latter are consistent. If we run the same test but keep the
>> socket for the ssh connection active by having activity there, then
>> the sda test looks as follows:
>>
>> time sudo ./iotest /dev/sda
>>
>> ________________________________________________________
>> Executed in    1.23 secs      fish           external
>>    usr time    2.70 millis   39.00 micros    2.66 millis
>>    sys time    4.97 millis  977.00 micros    3.99 millis
>>
>> as now the ppoll loop is woken all the time anyway.
>>
>> After this fix, on an idle system:
>>
>> time sudo ./iotest /dev/sda
>>
>> ________________________________________________________
>> Executed in    1.30 secs      fish           external
>>    usr time    2.14 millis    0.14 millis    2.00 millis
>>    sys time   16.93 millis    1.16 millis   15.76 millis
>>
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>> ---
>>  util/fdmon-io_uring.c | 8 ++++++++
>>  1 file changed, 8 insertions(+)
>>
>> diff --git a/util/fdmon-io_uring.c b/util/fdmon-io_uring.c
>> index d0b56127c670..96392876b490 100644
>> --- a/util/fdmon-io_uring.c
>> +++ b/util/fdmon-io_uring.c
>> @@ -181,6 +181,14 @@ static void fdmon_io_uring_add_sqe(AioContext *ctx,
>>  
>>      trace_fdmon_io_uring_add_sqe(ctx, opaque, sqe->opcode, sqe->fd, sqe->off,
>>                                   cqe_handler);
>> +
>> +    /*
>> +     * Wake the main loop if it is sleeping in ppoll().  When a vCPU thread
>> +     * runs a coroutine inline (holding BQL), it queues SQEs here but the
>> +     * actual io_uring_submit() only happens in gsource_prepare().  Without
>> +     * this notify, ppoll() can sleep up to 499ms before submitting.
>> +     */
>> +    aio_notify(ctx);
>>  }
> 
> Makes sense to me.
> 
> At first I wondered if we should use defer_call() for the aio_notify()
> to batch the submission, but of course holding the BQL will already take
> care of that. And in iothreads where there is no BQL, the aio_notify()
> shouldn't make a difference anyway because we're already in the right
> thread.
> 
> I suppose the other variation could be to have another io_uring_enter()
> call here (but then probably really through defer_call()) to avoid
> waiting for another CPU to submit the request in its main loop. But I
> don't really have an intuition if that would make things better or worse
> in the common case.
> 
> Fiona, does this fix your case, too?

Yes, it does fix my issue [0] and the second patch gives another small
improvement :)

Would it be slightly cleaner to have aio_add_sqe() call aio_notify()
itself? Since aio-posix.c calls downwards into fdmon-io_uring.c, it
would feel nicer to me to not have fdmon-io_uring.c call "back up". I
guess it also depends on whether we expect another future fdmon
implementation with .add_sqe() to also benefit from it.
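
To make the layering question concrete, here is a rough sketch of the
second option. All names (struct mock_ctx, struct mock_fdmon_ops,
mock_aio_add_sqe() and so on) are hypothetical stand-ins, not the actual
QEMU definitions, and the counters only exist so the call order can be
observed. The generic wrapper calls down into the backend and then
issues the notify itself, so the backend never has to call back up.

```c
#include <assert.h>
#include <stddef.h>

struct mock_ctx;

/* Hypothetical backend ops table with a single .add_sqe() hook. */
struct mock_fdmon_ops {
    void (*add_sqe)(struct mock_ctx *ctx);
};

/* Hypothetical context: counters instead of a real ring and eventfd. */
struct mock_ctx {
    const struct mock_fdmon_ops *ops;
    unsigned queued;    /* SQEs queued by the backend */
    unsigned notified;  /* wakeups sent to the event loop */
};

/* Stand-in for aio_notify(). */
static void mock_notify(struct mock_ctx *ctx)
{
    ctx->notified++;
}

/* Stand-in for the backend: it only queues and does not notify. */
static void mock_backend_add_sqe(struct mock_ctx *ctx)
{
    ctx->queued++;
}

static const struct mock_fdmon_ops mock_io_uring_ops = {
    .add_sqe = mock_backend_add_sqe,
};

/* Generic layer: queue via the backend, then wake the event loop. */
static void mock_aio_add_sqe(struct mock_ctx *ctx)
{
    assert(ctx != NULL && ctx->ops != NULL && ctx->ops->add_sqe != NULL);
    ctx->ops->add_sqe(ctx);
    mock_notify(ctx);
}
```

Any other backend providing .add_sqe() would get the wakeup for free in
this arrangement, at the cost of notifying even for a backend that might
not need it.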

[0]:
https://lore.kernel.org/qemu-devel/9901305b-fbdf-4893-8e80-3bc0d1d645b0@proxmox.com/

Best Regards,
Fiona




* Re: [PATCHSET 0/2] io_uring fixes
  2026-02-13 14:26 [PATCHSET 0/2] io_uring fixes Jens Axboe
  2026-02-13 14:26 ` [PATCH 1/2] fdmon-io_uring: notify main loop when SQEs are queued Jens Axboe
  2026-02-13 14:26 ` [PATCH 2/2] fdmon-io_uring: check CQ ring directly in gsource_check Jens Axboe
@ 2026-02-18 10:07 ` Fiona Ebner
  2026-03-03 11:52 ` Fiona Ebner
  3 siblings, 0 replies; 22+ messages in thread
From: Fiona Ebner @ 2026-02-18 10:07 UTC
  To: Jens Axboe, qemu-block; +Cc: qemu-devel, fam, stefanha

On 13.02.26 at 3:33 PM, Jens Axboe wrote:
> Hi,
> 
> Patch 1 here is the real meat of this, patch 2 is just a slight
> improvement. For patch 1, it can literally yield a 50-80x improvement
> on the io_uring side for idle systems, where ppoll() ends up sleeping
> for 500 msec while there's IO to submit! I noticed this running the
> io_uring regression tests in a vm, where I use a variety of block
> devices for some of the tests. They would often randomly time out on
> AHCI devices, while running them on a virtio-blk or nvme device would
> finish in one second or so. I then wrote a reproducer to try and grok
> this and had Claude dive into this, which helped me better grasp the
> various event loops.
> 
> Please take a look and tell me what you think. Some variant of patch 1
> should definitely be considered, but let me know if this is the right
> approach. I can easily test anything.
> 
> Also note - this seems to trigger more easily or consistently on
> aarch64, which is where I run most of my local/immediate testing.
> 
>  util/fdmon-io_uring.c | 22 +++++++++++++++++++++-
>  1 file changed, 21 insertions(+), 1 deletion(-)
> 

Reviewed-by: Fiona Ebner <f.ebner@proxmox.com>
Tested-by: Fiona Ebner <f.ebner@proxmox.com>




* Re: [PATCH 1/2] fdmon-io_uring: notify main loop when SQEs are queued
  2026-02-13 16:04   ` Kevin Wolf
  2026-02-18  9:57     ` Fiona Ebner
@ 2026-02-18 15:56     ` Stefan Hajnoczi
  1 sibling, 0 replies; 22+ messages in thread
From: Stefan Hajnoczi @ 2026-02-18 15:56 UTC
  To: Kevin Wolf; +Cc: Jens Axboe, qemu-block, qemu-devel, fam, f.ebner

On Fri, Feb 13, 2026 at 05:04:31PM +0100, Kevin Wolf wrote:
> On 13.02.2026 at 15:26, Jens Axboe wrote:
> > When a vCPU thread handles MMIO (holding BQL), aio_co_enter() runs the
> > block I/O coroutine inline on the vCPU thread because
> > qemu_get_current_aio_context() returns the main AioContext when BQL is
> > held. The coroutine calls luring_co_submit() which queues an SQE via
> > fdmon_io_uring_add_sqe(), but the actual io_uring_submit() only happens
> > in gsource_prepare() on the main loop thread.
> 
> Ouch! Yes, looks like we completely missed I/O submitted in vCPU threads
> in the recent changes (or I guess worker threads in theory, but I don't
> > think there are any that actually make use of aio_add_sqe()).

Worker threads don't have an AioContext, so they cannot call
aio_add_sqe().

Stefan


* Re: [PATCH 1/2] fdmon-io_uring: notify main loop when SQEs are queued
  2026-02-18  9:57     ` Fiona Ebner
@ 2026-02-18 16:06       ` Stefan Hajnoczi
  2026-02-18 16:17         ` Jens Axboe
  2026-02-18 16:11       ` Stefan Hajnoczi
  1 sibling, 1 reply; 22+ messages in thread
From: Stefan Hajnoczi @ 2026-02-18 16:06 UTC
  To: Jens Axboe; +Cc: Kevin Wolf, qemu-block, qemu-devel, fam, Fiona Ebner

On Wed, Feb 18, 2026 at 10:57:02AM +0100, Fiona Ebner wrote:
> On 13.02.26 at 5:05 PM, Kevin Wolf wrote:
> > On 13.02.2026 at 15:26, Jens Axboe wrote:
> >> When a vCPU thread handles MMIO (holding BQL), aio_co_enter() runs the
> >> block I/O coroutine inline on the vCPU thread because
> >> qemu_get_current_aio_context() returns the main AioContext when BQL is
> >> held. The coroutine calls luring_co_submit() which queues an SQE via
> >> fdmon_io_uring_add_sqe(), but the actual io_uring_submit() only happens
> >> in gsource_prepare() on the main loop thread.
> > 
> > Ouch! Yes, looks like we completely missed I/O submitted in vCPU threads
> > in the recent changes (or I guess worker threads in theory, but I don't
> > > think there are any that actually make use of aio_add_sqe()).
> > 
> >> Since the coroutine ran inline (not via aio_co_schedule()), no BH is
> >> scheduled and aio_notify() is never called. The main loop remains asleep
> >> in ppoll() with up to a 499ms timeout, leaving the SQE unsubmitted until
> >> the next timer fires.
> >>
> >> Fix this by calling aio_notify() after queuing the SQE. This wakes the
> >> main loop via the eventfd so it can run gsource_prepare() and submit the
> >> pending SQE promptly.
> >>
> >> This is a generic fix that benefits all devices using aio=io_uring.
> >> Without it, AHCI/SATA devices see MUCH worse I/O latency since they use
> >> MMIO (not ioeventfd like virtio) and have no other mechanism to wake the
> >> main loop after queuing block I/O.
> >>
> >> This is usually a bit hard to detect, as it also relies on the ppoll
> >> loop not waking up for other activity, and micro benchmarks tend not to
> >> see it because they don't have any real processing time. With a
> >> synthetic test case that has a few usleep() to simulate processing of
> >> read data, it's very noticeable. The below example reads 128MB with
> >> O_DIRECT in 128KB chunks in batches of 16, and has a 1ms delay before
> >> each batch submit, and a 1ms delay after processing each completion.
> >> Running it on /dev/sda yields:
> >>
> >> time sudo ./iotest /dev/sda
> >>
> >> ________________________________________________________
> >> Executed in   25.76 secs      fish           external
> >>    usr time    6.19 millis  783.00 micros    5.41 millis
> >>    sys time   12.43 millis  642.00 micros   11.79 millis
> >>
> >> while on a virtio-blk or NVMe device we get:
> >>
> >> time sudo ./iotest /dev/vdb
> >>
> >> ________________________________________________________
> >> Executed in    1.25 secs      fish           external
> >>    usr time    1.40 millis    0.30 millis    1.10 millis
> >>    sys time   17.61 millis    1.43 millis   16.18 millis
> >>
> >> time sudo ./iotest /dev/nvme0n1
> >>
> >> ________________________________________________________
> >> Executed in    1.26 secs      fish           external
> >>    usr time    6.11 millis    0.52 millis    5.59 millis
> >>    sys time   13.94 millis    1.50 millis   12.43 millis
> >>
> >> where the latter are consistent. If we run the same test but keep the
> >> socket for the ssh connection active by having activity there, then
> >> the sda test looks as follows:
> >>
> >> time sudo ./iotest /dev/sda
> >>
> >> ________________________________________________________
> >> Executed in    1.23 secs      fish           external
> >>    usr time    2.70 millis   39.00 micros    2.66 millis
> >>    sys time    4.97 millis  977.00 micros    3.99 millis
> >>
> >> as now the ppoll loop is woken all the time anyway.
> >>
> >> After this fix, on an idle system:
> >>
> >> time sudo ./iotest /dev/sda
> >>
> >> ________________________________________________________
> >> Executed in    1.30 secs      fish           external
> >>    usr time    2.14 millis    0.14 millis    2.00 millis
> >>    sys time   16.93 millis    1.16 millis   15.76 millis
> >>
> >> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> >> ---
> >>  util/fdmon-io_uring.c | 8 ++++++++
> >>  1 file changed, 8 insertions(+)
> >>
> >> diff --git a/util/fdmon-io_uring.c b/util/fdmon-io_uring.c
> >> index d0b56127c670..96392876b490 100644
> >> --- a/util/fdmon-io_uring.c
> >> +++ b/util/fdmon-io_uring.c
> >> @@ -181,6 +181,14 @@ static void fdmon_io_uring_add_sqe(AioContext *ctx,
> >>  
> >>      trace_fdmon_io_uring_add_sqe(ctx, opaque, sqe->opcode, sqe->fd, sqe->off,
> >>                                   cqe_handler);
> >> +
> >> +    /*
> >> +     * Wake the main loop if it is sleeping in ppoll().  When a vCPU thread
> >> +     * runs a coroutine inline (holding BQL), it queues SQEs here but the
> >> +     * actual io_uring_submit() only happens in gsource_prepare().  Without
> >> +     * this notify, ppoll() can sleep up to 499ms before submitting.
> >> +     */
> >> +    aio_notify(ctx);
> >>  }
> > 
> > Makes sense to me.
> > 
> > At first I wondered if we should use defer_call() for the aio_notify()
> > to batch the submission, but of course holding the BQL will already take
> > care of that. And in iothreads where there is no BQL, the aio_notify()
> > shouldn't make a difference anyway because we're already in the right
> > thread.
> > 
> > > I suppose the other variation could be to have another io_uring_enter()
> > call here (but then probably really through defer_call()) to avoid
> > waiting for another CPU to submit the request in its main loop. But I
> > don't really have an intuition if that would make things better or worse
> > in the common case.

It's possible to call io_uring_enter(). QEMU currently doesn't use
IORING_SETUP_SINGLE_ISSUER, so it's okay for multiple threads to call
io_uring_enter() on the same io_uring fd.

I experimented with IORING_SETUP_SINGLE_ISSUER (as well as
IORING_SETUP_COOP_TASKRUN and IORING_SETUP_TASKRUN_FLAG) in the past and
didn't measure a performance improvement:
https://lore.kernel.org/qemu-devel/20250724204702.576637-1-stefanha@redhat.com/

Jens, any advice regarding these flags?

Stefan


* Re: [PATCH 1/2] fdmon-io_uring: notify main loop when SQEs are queued
  2026-02-18  9:57     ` Fiona Ebner
  2026-02-18 16:06       ` Stefan Hajnoczi
@ 2026-02-18 16:11       ` Stefan Hajnoczi
  2026-02-18 16:19         ` Jens Axboe
  1 sibling, 1 reply; 22+ messages in thread
From: Stefan Hajnoczi @ 2026-02-18 16:11 UTC
  To: Fiona Ebner; +Cc: Kevin Wolf, Jens Axboe, qemu-block, qemu-devel, fam

On Wed, Feb 18, 2026 at 10:57:02AM +0100, Fiona Ebner wrote:
> On 13.02.26 at 5:05 PM, Kevin Wolf wrote:
> > On 13.02.2026 at 15:26, Jens Axboe wrote:
> >> When a vCPU thread handles MMIO (holding BQL), aio_co_enter() runs the
> >> block I/O coroutine inline on the vCPU thread because
> >> qemu_get_current_aio_context() returns the main AioContext when BQL is
> >> held. The coroutine calls luring_co_submit() which queues an SQE via
> >> fdmon_io_uring_add_sqe(), but the actual io_uring_submit() only happens
> >> in gsource_prepare() on the main loop thread.
> > 
> > Ouch! Yes, looks like we completely missed I/O submitted in vCPU threads
> > in the recent changes (or I guess worker threads in theory, but I don't
> > > think there are any that actually make use of aio_add_sqe()).
> > 
> >> Since the coroutine ran inline (not via aio_co_schedule()), no BH is
> >> scheduled and aio_notify() is never called. The main loop remains asleep
> >> in ppoll() with up to a 499ms timeout, leaving the SQE unsubmitted until
> >> the next timer fires.
> >>
> >> Fix this by calling aio_notify() after queuing the SQE. This wakes the
> >> main loop via the eventfd so it can run gsource_prepare() and submit the
> >> pending SQE promptly.
> >>
> >> This is a generic fix that benefits all devices using aio=io_uring.
> >> Without it, AHCI/SATA devices see MUCH worse I/O latency since they use
> >> MMIO (not ioeventfd like virtio) and have no other mechanism to wake the
> >> main loop after queuing block I/O.
> >>
> >> This is usually a bit hard to detect, as it also relies on the ppoll
> >> loop not waking up for other activity, and micro benchmarks tend not to
> >> see it because they don't have any real processing time. With a
> >> synthetic test case that has a few usleep() to simulate processing of
> >> read data, it's very noticeable. The below example reads 128MB with
> >> O_DIRECT in 128KB chunks in batches of 16, and has a 1ms delay before
> >> each batch submit, and a 1ms delay after processing each completion.
> >> Running it on /dev/sda yields:
> >>
> >> time sudo ./iotest /dev/sda
> >>
> >> ________________________________________________________
> >> Executed in   25.76 secs      fish           external
> >>    usr time    6.19 millis  783.00 micros    5.41 millis
> >>    sys time   12.43 millis  642.00 micros   11.79 millis
> >>
> >> while on a virtio-blk or NVMe device we get:
> >>
> >> time sudo ./iotest /dev/vdb
> >>
> >> ________________________________________________________
> >> Executed in    1.25 secs      fish           external
> >>    usr time    1.40 millis    0.30 millis    1.10 millis
> >>    sys time   17.61 millis    1.43 millis   16.18 millis
> >>
> >> time sudo ./iotest /dev/nvme0n1
> >>
> >> ________________________________________________________
> >> Executed in    1.26 secs      fish           external
> >>    usr time    6.11 millis    0.52 millis    5.59 millis
> >>    sys time   13.94 millis    1.50 millis   12.43 millis
> >>
> >> where the latter are consistent. If we run the same test but keep the
> >> socket for the ssh connection active by having activity there, then
> >> the sda test looks as follows:
> >>
> >> time sudo ./iotest /dev/sda
> >>
> >> ________________________________________________________
> >> Executed in    1.23 secs      fish           external
> >>    usr time    2.70 millis   39.00 micros    2.66 millis
> >>    sys time    4.97 millis  977.00 micros    3.99 millis
> >>
> >> as now the ppoll loop is woken all the time anyway.
> >>
> >> After this fix, on an idle system:
> >>
> >> time sudo ./iotest /dev/sda
> >>
> >> ________________________________________________________
> >> Executed in    1.30 secs      fish           external
> >>    usr time    2.14 millis    0.14 millis    2.00 millis
> >>    sys time   16.93 millis    1.16 millis   15.76 millis
> >>
> >> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> >> ---
> >>  util/fdmon-io_uring.c | 8 ++++++++
> >>  1 file changed, 8 insertions(+)
> >>
> >> diff --git a/util/fdmon-io_uring.c b/util/fdmon-io_uring.c
> >> index d0b56127c670..96392876b490 100644
> >> --- a/util/fdmon-io_uring.c
> >> +++ b/util/fdmon-io_uring.c
> >> @@ -181,6 +181,14 @@ static void fdmon_io_uring_add_sqe(AioContext *ctx,
> >>  
> >>      trace_fdmon_io_uring_add_sqe(ctx, opaque, sqe->opcode, sqe->fd, sqe->off,
> >>                                   cqe_handler);
> >> +
> >> +    /*
> >> +     * Wake the main loop if it is sleeping in ppoll().  When a vCPU thread
> >> +     * runs a coroutine inline (holding BQL), it queues SQEs here but the
> >> +     * actual io_uring_submit() only happens in gsource_prepare().  Without
> >> +     * this notify, ppoll() can sleep up to 499ms before submitting.
> >> +     */
> >> +    aio_notify(ctx);
> >>  }
> > 
> > Makes sense to me.
> > 
> > At first I wondered if we should use defer_call() for the aio_notify()
> > to batch the submission, but of course holding the BQL will already take
> > care of that. And in iothreads where there is no BQL, the aio_notify()
> > shouldn't make a difference anyway because we're already in the right
> > thread.
> > 
> > > I suppose the other variation could be to have another io_uring_enter()
> > call here (but then probably really through defer_call()) to avoid
> > waiting for another CPU to submit the request in its main loop. But I
> > don't really have an intuition if that would make things better or worse
> > in the common case.
> > 
> > Fiona, does this fix your case, too?
> 
> Yes, it does fix my issue [0] and the second patch gives another small
> improvement :)
> 
> Would it be slightly cleaner to have aio_add_sqe() call aio_notify()
> itself? Since aio-posix.c calls downwards into fdmon-io_uring.c, it
> would feel nicer to me to not have fdmon-io_uring.c call "back up". I
> guess it also depends on whether we expect another future fdmon
> implementation with .add_sqe() to also benefit from it.

Calling aio_notify() from aio-posix.c:aio_add_sqe() sounds better to me
because fdmon-io_uring.c has to be careful about calling aio_*() APIs to
avoid loops.

Stefan

> 
> [0]:
> https://lore.kernel.org/qemu-devel/9901305b-fbdf-4893-8e80-3bc0d1d645b0@proxmox.com/
> 
> Best Regards,
> Fiona
> 


* Re: [PATCH 1/2] fdmon-io_uring: notify main loop when SQEs are queued
  2026-02-18 16:06       ` Stefan Hajnoczi
@ 2026-02-18 16:17         ` Jens Axboe
  2026-02-18 20:02           ` Stefan Hajnoczi
  0 siblings, 1 reply; 22+ messages in thread
From: Jens Axboe @ 2026-02-18 16:17 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Kevin Wolf, qemu-block, qemu-devel, fam, Fiona Ebner

On 2/18/26 9:06 AM, Stefan Hajnoczi wrote:
> On Wed, Feb 18, 2026 at 10:57:02AM +0100, Fiona Ebner wrote:
>> On 13.02.26 at 5:05 PM, Kevin Wolf wrote:
>>> On 13.02.2026 at 15:26, Jens Axboe wrote:
>>>> When a vCPU thread handles MMIO (holding BQL), aio_co_enter() runs the
>>>> block I/O coroutine inline on the vCPU thread because
>>>> qemu_get_current_aio_context() returns the main AioContext when BQL is
>>>> held. The coroutine calls luring_co_submit() which queues an SQE via
>>>> fdmon_io_uring_add_sqe(), but the actual io_uring_submit() only happens
>>>> in gsource_prepare() on the main loop thread.
>>>
>>> Ouch! Yes, looks like we completely missed I/O submitted in vCPU threads
>>> in the recent changes (or I guess worker threads in theory, but I don't
>>> think there are any that actually make use of aio_add_sqe()).
>>>
>>>> Since the coroutine ran inline (not via aio_co_schedule()), no BH is
>>>> scheduled and aio_notify() is never called. The main loop remains asleep
>>>> in ppoll() with up to a 499ms timeout, leaving the SQE unsubmitted until
>>>> the next timer fires.
>>>>
>>>> Fix this by calling aio_notify() after queuing the SQE. This wakes the
>>>> main loop via the eventfd so it can run gsource_prepare() and submit the
>>>> pending SQE promptly.
>>>>
>>>> This is a generic fix that benefits all devices using aio=io_uring.
>>>> Without it, AHCI/SATA devices see MUCH worse I/O latency since they use
>>>> MMIO (not ioeventfd like virtio) and have no other mechanism to wake the
>>>> main loop after queuing block I/O.
>>>>
>>>> This is usually a bit hard to detect, as it also relies on the ppoll
>>>> loop not waking up for other activity, and micro benchmarks tend not to
>>>> see it because they don't have any real processing time. With a
>>>> synthetic test case that has a few usleep() to simulate processing of
>>>> read data, it's very noticeable. The below example reads 128MB with
>>>> O_DIRECT in 128KB chunks in batches of 16, and has a 1ms delay before
>>>> each batch submit, and a 1ms delay after processing each completion.
>>>> Running it on /dev/sda yields:
>>>>
>>>> time sudo ./iotest /dev/sda
>>>>
>>>> ________________________________________________________
>>>> Executed in   25.76 secs      fish           external
>>>>    usr time    6.19 millis  783.00 micros    5.41 millis
>>>>    sys time   12.43 millis  642.00 micros   11.79 millis
>>>>
>>>> while on a virtio-blk or NVMe device we get:
>>>>
>>>> time sudo ./iotest /dev/vdb
>>>>
>>>> ________________________________________________________
>>>> Executed in    1.25 secs      fish           external
>>>>    usr time    1.40 millis    0.30 millis    1.10 millis
>>>>    sys time   17.61 millis    1.43 millis   16.18 millis
>>>>
>>>> time sudo ./iotest /dev/nvme0n1
>>>>
>>>> ________________________________________________________
>>>> Executed in    1.26 secs      fish           external
>>>>    usr time    6.11 millis    0.52 millis    5.59 millis
>>>>    sys time   13.94 millis    1.50 millis   12.43 millis
>>>>
>>>> where the latter are consistent. If we run the same test but keep the
>>>> socket for the ssh connection active by having activity there, then
>>>> the sda test looks as follows:
>>>>
>>>> time sudo ./iotest /dev/sda
>>>>
>>>> ________________________________________________________
>>>> Executed in    1.23 secs      fish           external
>>>>    usr time    2.70 millis   39.00 micros    2.66 millis
>>>>    sys time    4.97 millis  977.00 micros    3.99 millis
>>>>
>>>> as now the ppoll loop is woken all the time anyway.
>>>>
>>>> After this fix, on an idle system:
>>>>
>>>> time sudo ./iotest /dev/sda
>>>>
>>>> ________________________________________________________
>>>> Executed in    1.30 secs      fish           external
>>>>    usr time    2.14 millis    0.14 millis    2.00 millis
>>>>    sys time   16.93 millis    1.16 millis   15.76 millis
>>>>
>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>>> ---
>>>>  util/fdmon-io_uring.c | 8 ++++++++
>>>>  1 file changed, 8 insertions(+)
>>>>
>>>> diff --git a/util/fdmon-io_uring.c b/util/fdmon-io_uring.c
>>>> index d0b56127c670..96392876b490 100644
>>>> --- a/util/fdmon-io_uring.c
>>>> +++ b/util/fdmon-io_uring.c
>>>> @@ -181,6 +181,14 @@ static void fdmon_io_uring_add_sqe(AioContext *ctx,
>>>>  
>>>>      trace_fdmon_io_uring_add_sqe(ctx, opaque, sqe->opcode, sqe->fd, sqe->off,
>>>>                                   cqe_handler);
>>>> +
>>>> +    /*
>>>> +     * Wake the main loop if it is sleeping in ppoll().  When a vCPU thread
>>>> +     * runs a coroutine inline (holding BQL), it queues SQEs here but the
>>>> +     * actual io_uring_submit() only happens in gsource_prepare().  Without
>>>> +     * this notify, ppoll() can sleep up to 499ms before submitting.
>>>> +     */
>>>> +    aio_notify(ctx);
>>>>  }
>>>
>>> Makes sense to me.
>>>
>>> At first I wondered if we should use defer_call() for the aio_notify()
>>> to batch the submission, but of course holding the BQL will already take
>>> care of that. And in iothreads where there is no BQL, the aio_notify()
>>> shouldn't make a difference anyway because we're already in the right
>>> thread.
>>>
>>> I suppose the other variation could be to have another io_uring_enter()
>>> call here (but then probably really through defer_call()) to avoid
>>> waiting for another CPU to submit the request in its main loop. But I
>>> don't really have an intuition if that would make things better or worse
>>> in the common case.
> 
> It's possible to call io_uring_enter(). QEMU currently doesn't use
> IORING_SETUP_SINGLE_ISSUER, so it's okay for multiple threads to call
> io_uring_enter() on the same io_uring fd.

I would not recommend that, see below.

> I experimented with IORING_SETUP_SINGLE_ISSUER (as well as
> IORING_SETUP_COOP_TASKRUN and IORING_SETUP_TASKRUN_FLAG) in the past and
> didn't measure a performance improvement:
> https://lore.kernel.org/qemu-devel/20250724204702.576637-1-stefanha@redhat.com/
> 
> Jens, any advice regarding these flags?

None other than "yes you should use them" - it's an expanding area of
"let's make that faster", so if you tested something older, then that
may be why as we didn't have a lot earlier. We're toying with getting
rid of the uring_lock for SINGLE_ISSUER, for example.

Hence I think having multiple threads do enter is a design mistake, and
one that might snowball down the line and make it harder to step back
and make SINGLE_ISSUER work for you. Certain features also end up being
gated behind DEFER_TASKRUN, which requires SINGLE_ISSUER as well.

tldr - don't have multiple threads do enter on the same ring, ever, if
it can be avoided. It's a design mistake.
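
To make the flag dependency concrete, here is a minimal sketch of a
pre-flight check for the one rule mentioned above: DEFER_TASKRUN is only
accepted together with SINGLE_ISSUER. The SKETCH_* constants and
check_setup_flags() are hypothetical names used for illustration only. The
bit positions are assumed to match the IORING_SETUP_* definitions in
<linux/io_uring.h>, and the kernel validates more combinations than this
one.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Local stand-ins for the setup flag bits. Assumption: these mirror
 * IORING_SETUP_SINGLE_ISSUER and IORING_SETUP_DEFER_TASKRUN from the
 * kernel uapi header; they are redefined here so the sketch builds
 * without any particular kernel header version.
 */
#define SKETCH_SETUP_SINGLE_ISSUER (1U << 12)
#define SKETCH_SETUP_DEFER_TASKRUN (1U << 13)

/*
 * Return 0 if the flag combination satisfies the rule
 * "DEFER_TASKRUN requires SINGLE_ISSUER", -EINVAL otherwise.
 * Only this single rule is modelled.
 */
static int check_setup_flags(uint32_t flags)
{
    if ((flags & SKETCH_SETUP_DEFER_TASKRUN) &&
        !(flags & SKETCH_SETUP_SINGLE_ISSUER)) {
        return -EINVAL;
    }
    return 0;
}
```

A caller would run check_setup_flags() on the flags it intends to pass at
ring creation. The practical consequence for the design is the same as
above: once only one thread ever enters the ring, both flags become
available.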

-- 
Jens Axboe



* Re: [PATCH 1/2] fdmon-io_uring: notify main loop when SQEs are queued
  2026-02-18 16:11       ` Stefan Hajnoczi
@ 2026-02-18 16:19         ` Jens Axboe
  2026-02-18 16:41           ` [PATCH v2] aio-posix: " Jens Axboe
  0 siblings, 1 reply; 22+ messages in thread
From: Jens Axboe @ 2026-02-18 16:19 UTC (permalink / raw)
  To: Stefan Hajnoczi, Fiona Ebner; +Cc: Kevin Wolf, qemu-block, qemu-devel, fam

On 2/18/26 9:11 AM, Stefan Hajnoczi wrote:
> On Wed, Feb 18, 2026 at 10:57:02AM +0100, Fiona Ebner wrote:
>> On 13.02.26 at 5:05 PM, Kevin Wolf wrote:
>>> On 13.02.2026 at 15:26, Jens Axboe wrote:
>>>> When a vCPU thread handles MMIO (holding BQL), aio_co_enter() runs the
>>>> block I/O coroutine inline on the vCPU thread because
>>>> qemu_get_current_aio_context() returns the main AioContext when BQL is
>>>> held. The coroutine calls luring_co_submit() which queues an SQE via
>>>> fdmon_io_uring_add_sqe(), but the actual io_uring_submit() only happens
>>>> in gsource_prepare() on the main loop thread.
>>>
>>> Ouch! Yes, looks like we completely missed I/O submitted in vCPU threads
>>> in the recent changes (or I guess worker threads in theory, but I don't
>>> think there are any that actually make use of aio_add_sqe()).
>>>
>>>> Since the coroutine ran inline (not via aio_co_schedule()), no BH is
>>>> scheduled and aio_notify() is never called. The main loop remains asleep
>>>> in ppoll() with up to a 499ms timeout, leaving the SQE unsubmitted until
>>>> the next timer fires.
>>>>
>>>> Fix this by calling aio_notify() after queuing the SQE. This wakes the
>>>> main loop via the eventfd so it can run gsource_prepare() and submit the
>>>> pending SQE promptly.
>>>>
>>>> This is a generic fix that benefits all devices using aio=io_uring.
>>>> Without it, AHCI/SATA devices see MUCH worse I/O latency since they use
>>>> MMIO (not ioeventfd like virtio) and have no other mechanism to wake the
>>>> main loop after queuing block I/O.
>>>>
>>>> This is usually a bit hard to detect, as it also relies on the ppoll
>>>> loop not waking up for other activity, and micro benchmarks tend not to
>>>> see it because they don't have any real processing time. With a
>>>> synthetic test case that has a few usleep() to simulate processing of
>>>> read data, it's very noticeable. The below example reads 128MB with
>>>> O_DIRECT in 128KB chunks in batches of 16, and has a 1ms delay before
>>>> each batch submit, and a 1ms delay after processing each completion.
>>>> Running it on /dev/sda yields:
>>>>
>>>> time sudo ./iotest /dev/sda
>>>>
>>>> ________________________________________________________
>>>> Executed in   25.76 secs      fish           external
>>>>    usr time    6.19 millis  783.00 micros    5.41 millis
>>>>    sys time   12.43 millis  642.00 micros   11.79 millis
>>>>
>>>> while on a virtio-blk or NVMe device we get:
>>>>
>>>> time sudo ./iotest /dev/vdb
>>>>
>>>> ________________________________________________________
>>>> Executed in    1.25 secs      fish           external
>>>>    usr time    1.40 millis    0.30 millis    1.10 millis
>>>>    sys time   17.61 millis    1.43 millis   16.18 millis
>>>>
>>>> time sudo ./iotest /dev/nvme0n1
>>>>
>>>> ________________________________________________________
>>>> Executed in    1.26 secs      fish           external
>>>>    usr time    6.11 millis    0.52 millis    5.59 millis
>>>>    sys time   13.94 millis    1.50 millis   12.43 millis
>>>>
>>>> where the latter are consistent. If we run the same test but keep the
>>>> socket for the ssh connection active by having activity there, then
>>>> the sda test looks as follows:
>>>>
>>>> time sudo ./iotest /dev/sda
>>>>
>>>> ________________________________________________________
>>>> Executed in    1.23 secs      fish           external
>>>>    usr time    2.70 millis   39.00 micros    2.66 millis
>>>>    sys time    4.97 millis  977.00 micros    3.99 millis
>>>>
>>>> as now the ppoll loop is woken all the time anyway.
>>>>
>>>> After this fix, on an idle system:
>>>>
>>>> time sudo ./iotest /dev/sda
>>>>
>>>> ________________________________________________________
>>>> Executed in    1.30 secs      fish           external
>>>>    usr time    2.14 millis    0.14 millis    2.00 millis
>>>>    sys time   16.93 millis    1.16 millis   15.76 millis
>>>>
>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>>> ---
>>>>  util/fdmon-io_uring.c | 8 ++++++++
>>>>  1 file changed, 8 insertions(+)
>>>>
>>>> diff --git a/util/fdmon-io_uring.c b/util/fdmon-io_uring.c
>>>> index d0b56127c670..96392876b490 100644
>>>> --- a/util/fdmon-io_uring.c
>>>> +++ b/util/fdmon-io_uring.c
>>>> @@ -181,6 +181,14 @@ static void fdmon_io_uring_add_sqe(AioContext *ctx,
>>>>  
>>>>      trace_fdmon_io_uring_add_sqe(ctx, opaque, sqe->opcode, sqe->fd, sqe->off,
>>>>                                   cqe_handler);
>>>> +
>>>> +    /*
>>>> +     * Wake the main loop if it is sleeping in ppoll().  When a vCPU thread
>>>> +     * runs a coroutine inline (holding BQL), it queues SQEs here but the
>>>> +     * actual io_uring_submit() only happens in gsource_prepare().  Without
>>>> +     * this notify, ppoll() can sleep up to 499ms before submitting.
>>>> +     */
>>>> +    aio_notify(ctx);
>>>>  }
>>>
>>> Makes sense to me.
>>>
>>> At first I wondered if we should use defer_call() for the aio_notify()
>>> to batch the submission, but of course holding the BQL will already take
>>> care of that. And in iothreads where there is no BQL, the aio_notify()
>>> shouldn't make a difference anyway because we're already in the right
>>> thread.
>>>
>>> I suppose the other variation could be to have another io_uring_enter()
>>> call here (but then probably really through defer_call()) to avoid
>>> waiting for another CPU to submit the request in its main loop. But I
>>> don't really have an intuition if that would make things better or worse
>>> in the common case.
>>>
>>> Fiona, does this fix your case, too?
>>
>> Yes, it does fix my issue [0] and the second patch gives another small
>> improvement :)
>>
>> Would it be slightly cleaner to have aio_add_sqe() call aio_notify()
>> itself? Since aio-posix.c calls downwards into fdmon-io_uring.c, it
>> would feel nicer to me to not have fdmon-io_uring.c call "back up". I
>> guess it also depends on whether we expect another future fdmon
>> implementation with .add_sqe() to also benefit from it.
> 
> Calling aio_notify() from aio-posix.c:aio_add_sqe() sounds better to me
> because fdmon-io_uring.c has to be careful about calling aio_*() APIs to
> avoid loops.

Would anyone care to make that edit? I'm on a plane and gone for a bit,
so won't get back to this for the next week. But I would love to see a
fix go in, as this issue has been plaguing me with test timeouts for
quite a while on the CI front. And seems like I'm not alone, if the
patches fix Fiona's issues as well.

-- 
Jens Axboe



* Re: [PATCH 2/2] fdmon-io_uring: check CQ ring directly in gsource_check
  2026-02-13 14:26 ` [PATCH 2/2] fdmon-io_uring: check CQ ring directly in gsource_check Jens Axboe
  2026-02-13 16:22   ` Kevin Wolf
@ 2026-02-18 16:24   ` Stefan Hajnoczi
  1 sibling, 0 replies; 22+ messages in thread
From: Stefan Hajnoczi @ 2026-02-18 16:24 UTC (permalink / raw)
  To: Jens Axboe; +Cc: qemu-block, qemu-devel, fam

On Fri, Feb 13, 2026 at 07:26:37AM -0700, Jens Axboe wrote:
> gsource_check() only looks at the ppoll revents for the io_uring fd,
> but CQEs can be posted during gsource_prepare()'s io_uring_submit()
> call via kernel task_work processing on syscall exit. These completions
> are already sitting in the CQ ring but the ring fd may not be signaled
> yet, causing gsource_check() to return false.
> 
> Add a fallback io_uring_cq_ready() check so completions that arrive
> during submission are dispatched immediately rather than waiting for
> the next ppoll() cycle.
> 
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
>  util/fdmon-io_uring.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
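
For readers following along, the idea in the quoted commit message can be
sketched with a simplified model of the completion ring. This is not the
code from the patch: struct cq_model and the three helpers are hypothetical
names, and the head/tail pair only approximates what io_uring_cq_ready()
inspects on a real ring.

```c
#include <poll.h>
#include <stdbool.h>

/*
 * Hypothetical, simplified completion ring: the producer advances tail
 * when it posts a completion, the consumer advances head as it reaps.
 */
struct cq_model {
    unsigned head;
    unsigned tail;
};

/* Number of completions waiting; unsigned subtraction handles wraparound. */
static unsigned cq_model_ready(const struct cq_model *cq)
{
    return cq->tail - cq->head;
}

/* Check that trusts only the poll result for the ring fd. */
static bool check_revents_only(short revents)
{
    return (revents & POLLIN) != 0;
}

/*
 * Check with the fallback: also dispatch when completions are already
 * sitting in the ring even though the fd was not reported readable.
 */
static bool check_with_cq_fallback(short revents, const struct cq_model *cq)
{
    return check_revents_only(revents) || cq_model_ready(cq) > 0;
}
```

With revents == 0 and two entries between head and tail, the first check
reports nothing to do while the second reports work pending, which is the
case the patch description is about.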


* [PATCH v2] aio-posix: notify main loop when SQEs are queued
  2026-02-18 16:19         ` Jens Axboe
@ 2026-02-18 16:41           ` Jens Axboe
  2026-02-18 20:57             ` Stefan Hajnoczi
  2026-02-19 15:49             ` Kevin Wolf
  0 siblings, 2 replies; 22+ messages in thread
From: Jens Axboe @ 2026-02-18 16:41 UTC (permalink / raw)
  To: Stefan Hajnoczi, Fiona Ebner; +Cc: Kevin Wolf, qemu-block, qemu-devel, fam

On 2/18/26 9:19 AM, Jens Axboe wrote:
> On 2/18/26 9:11 AM, Stefan Hajnoczi wrote:
>> On Wed, Feb 18, 2026 at 10:57:02AM +0100, Fiona Ebner wrote:
>>> On 13.02.26 at 5:05 PM, Kevin Wolf wrote:
>>>> On 13.02.2026 at 15:26, Jens Axboe wrote:
>>>>> When a vCPU thread handles MMIO (holding BQL), aio_co_enter() runs the
>>>>> block I/O coroutine inline on the vCPU thread because
>>>>> qemu_get_current_aio_context() returns the main AioContext when BQL is
>>>>> held. The coroutine calls luring_co_submit() which queues an SQE via
>>>>> fdmon_io_uring_add_sqe(), but the actual io_uring_submit() only happens
>>>>> in gsource_prepare() on the main loop thread.
>>>>
>>>> Ouch! Yes, looks like we completely missed I/O submitted in vCPU threads
>>>> in the recent changes (or I guess worker threads in theory, but I don't
>>>> think there are any that actually make use of aio_add_sqe()).
>>>>
>>>>> Since the coroutine ran inline (not via aio_co_schedule()), no BH is
>>>>> scheduled and aio_notify() is never called. The main loop remains asleep
>>>>> in ppoll() with up to a 499ms timeout, leaving the SQE unsubmitted until
>>>>> the next timer fires.
>>>>>
>>>>> Fix this by calling aio_notify() after queuing the SQE. This wakes the
>>>>> main loop via the eventfd so it can run gsource_prepare() and submit the
>>>>> pending SQE promptly.
>>>>>
>>>>> This is a generic fix that benefits all devices using aio=io_uring.
>>>>> Without it, AHCI/SATA devices see MUCH worse I/O latency since they use
>>>>> MMIO (not ioeventfd like virtio) and have no other mechanism to wake the
>>>>> main loop after queuing block I/O.
>>>>>
>>>>> This is usually a bit hard to detect, as it also relies on the ppoll
>>>>> loop not waking up for other activity, and micro benchmarks tend not to
>>>>> see it because they don't have any real processing time. With a
>>>>> synthetic test case that has a few usleep() to simulate processing of
>>>>> read data, it's very noticeable. The below example reads 128MB with
>>>>> O_DIRECT in 128KB chunks in batches of 16, and has a 1ms delay before
>>>>> each batch submit, and a 1ms delay after processing each completion.
>>>>> Running it on /dev/sda yields:
>>>>>
>>>>> time sudo ./iotest /dev/sda
>>>>>
>>>>> ________________________________________________________
>>>>> Executed in   25.76 secs      fish           external
>>>>>    usr time    6.19 millis  783.00 micros    5.41 millis
>>>>>    sys time   12.43 millis  642.00 micros   11.79 millis
>>>>>
>>>>> while on a virtio-blk or NVMe device we get:
>>>>>
>>>>> time sudo ./iotest /dev/vdb
>>>>>
>>>>> ________________________________________________________
>>>>> Executed in    1.25 secs      fish           external
>>>>>    usr time    1.40 millis    0.30 millis    1.10 millis
>>>>>    sys time   17.61 millis    1.43 millis   16.18 millis
>>>>>
>>>>> time sudo ./iotest /dev/nvme0n1
>>>>>
>>>>> ________________________________________________________
>>>>> Executed in    1.26 secs      fish           external
>>>>>    usr time    6.11 millis    0.52 millis    5.59 millis
>>>>>    sys time   13.94 millis    1.50 millis   12.43 millis
>>>>>
>>>>> where the latter are consistent. If we run the same test but keep the
>>>>> socket for the ssh connection active by having activity there, then
>>>>> the sda test looks as follows:
>>>>>
>>>>> time sudo ./iotest /dev/sda
>>>>>
>>>>> ________________________________________________________
>>>>> Executed in    1.23 secs      fish           external
>>>>>    usr time    2.70 millis   39.00 micros    2.66 millis
>>>>>    sys time    4.97 millis  977.00 micros    3.99 millis
>>>>>
>>>>> as now the ppoll loop is woken all the time anyway.
>>>>>
>>>>> After this fix, on an idle system:
>>>>>
>>>>> time sudo ./iotest /dev/sda
>>>>>
>>>>> ________________________________________________________
>>>>> Executed in    1.30 secs      fish           external
>>>>>    usr time    2.14 millis    0.14 millis    2.00 millis
>>>>>    sys time   16.93 millis    1.16 millis   15.76 millis
>>>>>
>>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>>>> ---
>>>>>  util/fdmon-io_uring.c | 8 ++++++++
>>>>>  1 file changed, 8 insertions(+)
>>>>>
>>>>> diff --git a/util/fdmon-io_uring.c b/util/fdmon-io_uring.c
>>>>> index d0b56127c670..96392876b490 100644
>>>>> --- a/util/fdmon-io_uring.c
>>>>> +++ b/util/fdmon-io_uring.c
>>>>> @@ -181,6 +181,14 @@ static void fdmon_io_uring_add_sqe(AioContext *ctx,
>>>>>  
>>>>>      trace_fdmon_io_uring_add_sqe(ctx, opaque, sqe->opcode, sqe->fd, sqe->off,
>>>>>                                   cqe_handler);
>>>>> +
>>>>> +    /*
>>>>> +     * Wake the main loop if it is sleeping in ppoll().  When a vCPU thread
>>>>> +     * runs a coroutine inline (holding BQL), it queues SQEs here but the
>>>>> +     * actual io_uring_submit() only happens in gsource_prepare().  Without
>>>>> +     * this notify, ppoll() can sleep up to 499ms before submitting.
>>>>> +     */
>>>>> +    aio_notify(ctx);
>>>>>  }
>>>>
>>>> Makes sense to me.
>>>>
>>>> At first I wondered if we should use defer_call() for the aio_notify()
>>>> to batch the submission, but of course holding the BQL will already take
>>>> care of that. And in iothreads where there is no BQL, the aio_notify()
>>>> shouldn't make a difference anyway because we're already in the right
>>>> thread.
>>>>
>>>> I suppose the other variation could be to have another io_uring_enter()
>>>> call here (but then probably really through defer_call()) to avoid
>>>> waiting for another CPU to submit the request in its main loop. But I
>>>> don't really have an intuition if that would make things better or worse
>>>> in the common case.
>>>>
>>>> Fiona, does this fix your case, too?
>>>
>>> Yes, it does fix my issue [0] and the second patch gives another small
>>> improvement :)
>>>
>>> Would it be slightly cleaner to have aio_add_sqe() call aio_notify()
>>> itself? Since aio-posix.c calls downwards into fdmon-io_uring.c, it
>>> would feel nicer to me to not have fdmon-io_uring.c call "back up". I
>>> guess it also depends on whether we expect another future fdmon
>>> implementation with .add_sqe() to also benefit from it.
>>
>> Calling aio_notify() from aio-posix.c:aio_add_sqe() sounds better to me
>> because fdmon-io_uring.c has to be careful about calling aio_*() APIs to
>> avoid loops.
> 
> Would anyone care to make that edit? I'm on a plane and gone for a bit,
> so won't get back to this for the next week. But I would love to see a
> fix go in, as this issue has been plaguing me with test timeouts for
> quite a while on the CI front. And seems like I'm not alone, if the
> patches fix Fiona's issues as well.

Still on a plane but tested this one and it works for me too. Does seem
like a better approach, rather than stuff it in the fdmon part.

Feel free to run with this one and also to update the commit message if
you want. Thanks!


commit a8a94e7a05964d470b8fba50c9d4769489c21752
Author: Jens Axboe <axboe@kernel.dk>
Date:   Fri Feb 13 06:52:14 2026 -0700

    aio-posix: notify main loop when SQEs are queued
    
    When a vCPU thread handles MMIO (holding BQL), aio_co_enter() runs the
    block I/O coroutine inline on the vCPU thread because
    qemu_get_current_aio_context() returns the main AioContext when BQL is
    held. The coroutine calls luring_co_submit() which queues an SQE via
    fdmon_io_uring_add_sqe(), but the actual io_uring_submit() only happens
    in gsource_prepare() on the main loop thread.
    
    Since the coroutine ran inline (not via aio_co_schedule()), no BH is
    scheduled and aio_notify() is never called. The main loop remains asleep
    in ppoll() with up to a 499ms timeout, leaving the SQE unsubmitted until
    the next timer fires.
    
    Fix this by calling aio_notify() after queuing the SQE. This wakes the
    main loop via the eventfd so it can run gsource_prepare() and submit the
    pending SQE promptly.
    
    This is a generic fix that benefits all devices using aio=io_uring.
    Without it, AHCI/SATA devices see MUCH worse I/O latency since they use
    MMIO (not ioeventfd like virtio) and have no other mechanism to wake the
    main loop after queuing block I/O.
    
    This is usually a bit hard to detect, as it also relies on the ppoll
    loop not waking up for other activity, and micro benchmarks tend not to
    see it because they don't have any real processing time. With a
    synthetic test case that has a few usleep() to simulate processing of
    read data, it's very noticeable. The below example reads 128MB with
    O_DIRECT in 128KB chunks in batches of 16, and has a 1ms delay before
    each batch submit, and a 1ms delay after processing each completion.
    Running it on /dev/sda yields:
    
    time sudo ./iotest /dev/sda
    
    ________________________________________________________
    Executed in   25.76 secs      fish           external
       usr time    6.19 millis  783.00 micros    5.41 millis
       sys time   12.43 millis  642.00 micros   11.79 millis
    
    while on a virtio-blk or NVMe device we get:
    
    time sudo ./iotest /dev/vdb
    
    ________________________________________________________
    Executed in    1.25 secs      fish           external
       usr time    1.40 millis    0.30 millis    1.10 millis
       sys time   17.61 millis    1.43 millis   16.18 millis
    
    time sudo ./iotest /dev/nvme0n1
    
    ________________________________________________________
    Executed in    1.26 secs      fish           external
       usr time    6.11 millis    0.52 millis    5.59 millis
       sys time   13.94 millis    1.50 millis   12.43 millis
    
    where the latter are consistent. If we run the same test but keep the
    socket for the ssh connection active by having activity there, then
    the sda test looks as follows:
    
    time sudo ./iotest /dev/sda
    
    ________________________________________________________
    Executed in    1.23 secs      fish           external
       usr time    2.70 millis   39.00 micros    2.66 millis
       sys time    4.97 millis  977.00 micros    3.99 millis
    
    as now the ppoll loop is woken all the time anyway.
    
    After this fix, on an idle system:
    
    time sudo ./iotest /dev/sda
    
    ________________________________________________________
    Executed in    1.30 secs      fish           external
       usr time    2.14 millis    0.14 millis    2.00 millis
       sys time   16.93 millis    1.16 millis   15.76 millis
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

diff --git a/util/aio-posix.c b/util/aio-posix.c
index e24b955fd91a..8c7b3795c82d 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -813,5 +813,13 @@ void aio_add_sqe(void (*prep_sqe)(struct io_uring_sqe *sqe, void *opaque),
 {
     AioContext *ctx = qemu_get_current_aio_context();
     ctx->fdmon_ops->add_sqe(ctx, prep_sqe, opaque, cqe_handler);
+
+    /*
+     * Wake the main loop if it is sleeping in ppoll().  When a vCPU thread
+     * runs a coroutine inline (holding BQL), it queues SQEs here but the
+     * actual io_uring_submit() only happens in gsource_prepare().  Without
+     * this notify, ppoll() can sleep up to 499ms before submitting.
+     */
+    aio_notify(ctx);
 }
 #endif /* CONFIG_LINUX_IO_URING */

-- 
Jens Axboe
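
As a rough sanity check on the timings in the commit message above, the
sleeps alone put a floor on the runtime of the test. The sketch below redoes
that arithmetic. It assumes that MB and KB mean binary units, that the two
1ms delays dominate everything else, and that each batch suffers one stall
in the unfixed case. The struct and function names are made up for
illustration and are not taken from the iotest program.

```c
#include <stdint.h>

/* Hypothetical description of the synthetic workload from the commit message. */
struct toy_workload {
    uint64_t total_bytes;       /* total amount read */
    uint64_t chunk_bytes;       /* size of one read */
    unsigned batch_size;        /* reads submitted per batch */
    double pre_batch_ms;        /* delay before each batch submit */
    double per_completion_ms;   /* delay after each completion */
};

/* Parameters as stated: 128MB, 128KB chunks, batches of 16, 1ms delays. */
struct toy_workload toy_commit_workload(void)
{
    struct toy_workload w = {
        .total_bytes = 128ULL * 1024 * 1024,
        .chunk_bytes = 128ULL * 1024,
        .batch_size = 16,
        .pre_batch_ms = 1.0,
        .per_completion_ms = 1.0,
    };
    return w;
}

/* Number of individual reads needed to cover the whole range. */
unsigned toy_num_reads(struct toy_workload w)
{
    return (unsigned)(w.total_bytes / w.chunk_bytes);
}

/* Number of batches, rounding up if the last batch is partial. */
unsigned toy_num_batches(struct toy_workload w)
{
    unsigned reads = toy_num_reads(w);
    return (reads + w.batch_size - 1) / w.batch_size;
}

/* Time spent purely in the simulated processing delays. */
double toy_sleep_floor_ms(struct toy_workload w)
{
    return toy_num_batches(w) * w.pre_batch_ms +
           toy_num_reads(w) * w.per_completion_ms;
}

/* Average extra wait per batch implied by an observed wall-clock time. */
double toy_stall_per_batch_ms(struct toy_workload w, double observed_ms)
{
    return (observed_ms - toy_sleep_floor_ms(w)) / toy_num_batches(w);
}
```

With the stated parameters this gives 1024 reads in 64 batches and a sleep
floor of 1088 ms, which is in line with the 1.23 to 1.30 second runs. Feeding
in the 25.76 second run gives roughly 385 ms of extra wait per batch, which
is below the 499ms ppoll() timeout mentioned in the code comment, so the
numbers are at least consistent with one stalled submit per batch.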



* Re: [PATCH 1/2] fdmon-io_uring: notify main loop when SQEs are queued
  2026-02-18 16:17         ` Jens Axboe
@ 2026-02-18 20:02           ` Stefan Hajnoczi
  0 siblings, 0 replies; 22+ messages in thread
From: Stefan Hajnoczi @ 2026-02-18 20:02 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Kevin Wolf, qemu-block, qemu-devel, fam, Fiona Ebner

On Wed, Feb 18, 2026 at 09:17:57AM -0700, Jens Axboe wrote:
> On 2/18/26 9:06 AM, Stefan Hajnoczi wrote:
> > On Wed, Feb 18, 2026 at 10:57:02AM +0100, Fiona Ebner wrote:
> >> On 13.02.26 at 5:05 PM, Kevin Wolf wrote:
> >>> On 13.02.2026 at 15:26, Jens Axboe wrote:
> >>>> When a vCPU thread handles MMIO (holding BQL), aio_co_enter() runs the
> >>>> block I/O coroutine inline on the vCPU thread because
> >>>> qemu_get_current_aio_context() returns the main AioContext when BQL is
> >>>> held. The coroutine calls luring_co_submit() which queues an SQE via
> >>>> fdmon_io_uring_add_sqe(), but the actual io_uring_submit() only happens
> >>>> in gsource_prepare() on the main loop thread.
> >>>
> >>> Ouch! Yes, looks like we completely missed I/O submitted in vCPU threads
> >>> in the recent changes (or I guess worker threads in theory, but I don't
> >>> think there are any that actually make use of aio_add_sqe()).
> >>>
> >>>> Since the coroutine ran inline (not via aio_co_schedule()), no BH is
> >>>> scheduled and aio_notify() is never called. The main loop remains asleep
> >>>> in ppoll() with up to a 499ms timeout, leaving the SQE unsubmitted until
> >>>> the next timer fires.
> >>>>
> >>>> Fix this by calling aio_notify() after queuing the SQE. This wakes the
> >>>> main loop via the eventfd so it can run gsource_prepare() and submit the
> >>>> pending SQE promptly.
> >>>>
> >>>> This is a generic fix that benefits all devices using aio=io_uring.
> >>>> Without it, AHCI/SATA devices see MUCH worse I/O latency since they use
> >>>> MMIO (not ioeventfd like virtio) and have no other mechanism to wake the
> >>>> main loop after queuing block I/O.
> >>>>
> >>>> This is usually a bit hard to detect, as it also relies on the ppoll
> >>>> loop not waking up for other activity, and micro benchmarks tend not to
> >>>> see it because they don't have any real processing time. With a
> >>>> synthetic test case that has a few usleep() to simulate processing of
> >>>> read data, it's very noticeable. The below example reads 128MB with
> >>>> O_DIRECT in 128KB chunks in batches of 16, and has a 1ms delay before
> >>>> each batch submit, and a 1ms delay after processing each completion.
> >>>> Running it on /dev/sda yields:
> >>>>
> >>>> time sudo ./iotest /dev/sda
> >>>>
> >>>> ________________________________________________________
> >>>> Executed in   25.76 secs      fish           external
> >>>>    usr time    6.19 millis  783.00 micros    5.41 millis
> >>>>    sys time   12.43 millis  642.00 micros   11.79 millis
> >>>>
> >>>> while on a virtio-blk or NVMe device we get:
> >>>>
> >>>> time sudo ./iotest /dev/vdb
> >>>>
> >>>> ________________________________________________________
> >>>> Executed in    1.25 secs      fish           external
> >>>>    usr time    1.40 millis    0.30 millis    1.10 millis
> >>>>    sys time   17.61 millis    1.43 millis   16.18 millis
> >>>>
> >>>> time sudo ./iotest /dev/nvme0n1
> >>>>
> >>>> ________________________________________________________
> >>>> Executed in    1.26 secs      fish           external
> >>>>    usr time    6.11 millis    0.52 millis    5.59 millis
> >>>>    sys time   13.94 millis    1.50 millis   12.43 millis
> >>>>
> >>>> where the latter are consistent. If we run the same test but keep the
> >>>> socket for the ssh connection active by having activity there, then
> >>>> the sda test looks as follows:
> >>>>
> >>>> time sudo ./iotest /dev/sda
> >>>>
> >>>> ________________________________________________________
> >>>> Executed in    1.23 secs      fish           external
> >>>>    usr time    2.70 millis   39.00 micros    2.66 millis
> >>>>    sys time    4.97 millis  977.00 micros    3.99 millis
> >>>>
> >>>> as now the ppoll loop is woken all the time anyway.
> >>>>
> >>>> After this fix, on an idle system:
> >>>>
> >>>> time sudo ./iotest /dev/sda
> >>>>
> >>>> ________________________________________________________
> >>>> Executed in    1.30 secs      fish           external
> >>>>    usr time    2.14 millis    0.14 millis    2.00 millis
> >>>>    sys time   16.93 millis    1.16 millis   15.76 millis
> >>>>
> >>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> >>>> ---
> >>>>  util/fdmon-io_uring.c | 8 ++++++++
> >>>>  1 file changed, 8 insertions(+)
> >>>>
> >>>> diff --git a/util/fdmon-io_uring.c b/util/fdmon-io_uring.c
> >>>> index d0b56127c670..96392876b490 100644
> >>>> --- a/util/fdmon-io_uring.c
> >>>> +++ b/util/fdmon-io_uring.c
> >>>> @@ -181,6 +181,14 @@ static void fdmon_io_uring_add_sqe(AioContext *ctx,
> >>>>  
> >>>>      trace_fdmon_io_uring_add_sqe(ctx, opaque, sqe->opcode, sqe->fd, sqe->off,
> >>>>                                   cqe_handler);
> >>>> +
> >>>> +    /*
> >>>> +     * Wake the main loop if it is sleeping in ppoll().  When a vCPU thread
> >>>> +     * runs a coroutine inline (holding BQL), it queues SQEs here but the
> >>>> +     * actual io_uring_submit() only happens in gsource_prepare().  Without
> >>>> +     * this notify, ppoll() can sleep up to 499ms before submitting.
> >>>> +     */
> >>>> +    aio_notify(ctx);
> >>>>  }
> >>>
> >>> Makes sense to me.
> >>>
> >>> At first I wondered if we should use defer_call() for the aio_notify()
> >>> to batch the submission, but of course holding the BQL will already take
> >>> care of that. And in iothreads where there is no BQL, the aio_notify()
> >>> shouldn't make a difference anyway because we're already in the right
> >>> thread.
> >>>
> >>> I suppose the other variation could be to have another io_uring_enter()
> >>> call here (but then probably really through defer_call()) to avoid
> >>> waiting for another CPU to submit the request in its main loop. But I
> >>> don't really have an intuition if that would make things better or worse
> >>> in the common case.
> > 
> > It's possible to call io_uring_enter(). QEMU currently doesn't use
> > IORING_SETUP_SINGLE_ISSUER, so it's okay for multiple threads to call
> > io_uring_enter() on the same io_uring fd.
> 
> I would not recommend that, see below.
> 
> > I experimented with IORING_SETUP_SINGLE_ISSUER (as well as
> > IORING_SETUP_COOP_TASKRUN and IORING_SETUP_TASKRUN_FLAG) in the past and
> > didn't measure a performance improvement:
> > https://lore.kernel.org/qemu-devel/20250724204702.576637-1-stefanha@redhat.com/
> > 
> > Jens, any advice regarding these flags?
> 
> None other than "yes you should use them" - it's an expanding area of
> "let's make that faster", so if you tested something older, then that
> may be why as we didn't have a lot earlier. We're toying with getting
> rid of the uring_lock for SINGLE_ISSUER, for example.
> 
> Hence I think having multiple threads do enter is a design mistake, and
> one that might snowball down the line and make it harder to step back
> and make SINGLE_ISSUER work for you. Certain features also end up being
> gated behind DEFER_TASKRUN, which requires SINGLE_ISSUER as well.
> 
> tldr - don't have multiple threads do enter on the same ring, ever, if
> it can be avoided. It's a design mistake.

That's useful information, thanks. I will resurrect the patches to add
modern io_uring_setup() flags and we'll document the assumption that
only one thread invokes io_uring_enter().

Stefan
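
For readers following along, the single-submitter arrangement discussed here
(other threads only queue work and kick the loop, and the loop thread alone
submits) can be pictured with a plain eventfd and poll(). This is only a
minimal sketch under stated assumptions, not QEMU code: toy_ctx, toy_queue(),
toy_loop_once() and toy_demo() are hypothetical names, no io_uring is
involved, and the producer runs in the same thread before the loop iteration
purely to keep the example deterministic. In the scenario from this thread it
would be a vCPU thread.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical stand-in for an event loop context (not QEMU's AioContext). */
struct toy_ctx {
    int notify_fd;     /* eventfd the loop thread polls on */
    unsigned pending;  /* requests queued but not yet submitted */
};

/* Producer side: queue a request and, if asked, kick the loop thread. */
static void toy_queue(struct toy_ctx *ctx, int notify)
{
    ctx->pending++;
    if (notify) {
        uint64_t one = 1;
        ssize_t n = write(ctx->notify_fd, &one, sizeof(one));
        assert(n == (ssize_t)sizeof(one));
        (void)n;
    }
}

/*
 * Loop side: the only place that "submits".  Returns 1 if poll() was woken
 * by the eventfd, 0 if it ran into the timeout, -1 on error.
 */
static int toy_loop_once(struct toy_ctx *ctx, int timeout_ms)
{
    struct pollfd pfd = { .fd = ctx->notify_fd, .events = POLLIN };
    int ret = poll(&pfd, 1, timeout_ms);

    if (ret < 0) {
        return -1;
    }
    if (ret > 0) {
        /* Reading resets the eventfd counter so the next poll() can sleep. */
        uint64_t count;
        ssize_t n = read(ctx->notify_fd, &count, sizeof(count));
        (void)n;
    }
    ctx->pending = 0; /* stands in for the real submit call */
    return ret > 0;
}

/* Queue one request with or without a kick, then run one loop iteration. */
int toy_demo(int notify, int timeout_ms)
{
    struct toy_ctx ctx = { .notify_fd = eventfd(0, EFD_CLOEXEC), .pending = 0 };
    int woken;

    if (ctx.notify_fd < 0) {
        return -1;
    }
    toy_queue(&ctx, notify);
    woken = toy_loop_once(&ctx, timeout_ms);
    close(ctx.notify_fd);
    return woken;
}
```

Calling toy_demo(1, 499) returns as soon as poll() sees the eventfd, while
toy_demo(0, 499) sits out the full timeout before the queued request is
picked up, which is the latency pattern described earlier in the thread.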


* Re: [PATCH v2] aio-posix: notify main loop when SQEs are queued
  2026-02-18 16:41           ` [PATCH v2] aio-posix: " Jens Axboe
@ 2026-02-18 20:57             ` Stefan Hajnoczi
  2026-02-19 14:27               ` Jens Axboe
  2026-02-19 15:49             ` Kevin Wolf
  1 sibling, 1 reply; 22+ messages in thread
From: Stefan Hajnoczi @ 2026-02-18 20:57 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Fiona Ebner, Kevin Wolf, qemu-block, qemu-devel, fam

On Wed, Feb 18, 2026 at 09:41:49AM -0700, Jens Axboe wrote:
> On 2/18/26 9:19 AM, Jens Axboe wrote:
> > On 2/18/26 9:11 AM, Stefan Hajnoczi wrote:
> >> On Wed, Feb 18, 2026 at 10:57:02AM +0100, Fiona Ebner wrote:
> >>> On 13.02.26 at 5:05 PM, Kevin Wolf wrote:
> >>>> On 13.02.2026 at 15:26, Jens Axboe wrote:
> >>>>> When a vCPU thread handles MMIO (holding BQL), aio_co_enter() runs the
> >>>>> block I/O coroutine inline on the vCPU thread because
> >>>>> qemu_get_current_aio_context() returns the main AioContext when BQL is
> >>>>> held. The coroutine calls luring_co_submit() which queues an SQE via
> >>>>> fdmon_io_uring_add_sqe(), but the actual io_uring_submit() only happens
> >>>>> in gsource_prepare() on the main loop thread.
> >>>>
> >>>> Ouch! Yes, looks like we completely missed I/O submitted in vCPU threads
> >>>> in the recent changes (or I guess worker threads in theory, but I don't
> >>>> think there are any that actually make use of aio_add_sqe()).
> >>>>
> >>>>> Since the coroutine ran inline (not via aio_co_schedule()), no BH is
> >>>>> scheduled and aio_notify() is never called. The main loop remains asleep
> >>>>> in ppoll() with up to a 499ms timeout, leaving the SQE unsubmitted until
> >>>>> the next timer fires.
> >>>>>
> >>>>> Fix this by calling aio_notify() after queuing the SQE. This wakes the
> >>>>> main loop via the eventfd so it can run gsource_prepare() and submit the
> >>>>> pending SQE promptly.
> >>>>>
> >>>>> This is a generic fix that benefits all devices using aio=io_uring.
> >>>>> Without it, AHCI/SATA devices see MUCH worse I/O latency since they use
> >>>>> MMIO (not ioeventfd like virtio) and have no other mechanism to wake the
> >>>>> main loop after queuing block I/O.
> >>>>>
> >>>>> This is usually a bit hard to detect, as it also relies on the ppoll
> >>>>> loop not waking up for other activity, and micro benchmarks tend not to
> >>>>> see it because they don't have any real processing time. With a
> >>>>> synthetic test case that has a few usleep() to simulate processing of
> >>>>> read data, it's very noticeable. The below example reads 128MB with
> >>>>> O_DIRECT in 128KB chunks in batches of 16, and has a 1ms delay before
> >>>>> each batch submit, and a 1ms delay after processing each completion.
> >>>>> Running it on /dev/sda yields:
> >>>>>
> >>>>> time sudo ./iotest /dev/sda
> >>>>>
> >>>>> ________________________________________________________
> >>>>> Executed in   25.76 secs      fish           external
> >>>>>    usr time    6.19 millis  783.00 micros    5.41 millis
> >>>>>    sys time   12.43 millis  642.00 micros   11.79 millis
> >>>>>
> >>>>> while on a virtio-blk or NVMe device we get:
> >>>>>
> >>>>> time sudo ./iotest /dev/vdb
> >>>>>
> >>>>> ________________________________________________________
> >>>>> Executed in    1.25 secs      fish           external
> >>>>>    usr time    1.40 millis    0.30 millis    1.10 millis
> >>>>>    sys time   17.61 millis    1.43 millis   16.18 millis
> >>>>>
> >>>>> time sudo ./iotest /dev/nvme0n1
> >>>>>
> >>>>> ________________________________________________________
> >>>>> Executed in    1.26 secs      fish           external
> >>>>>    usr time    6.11 millis    0.52 millis    5.59 millis
> >>>>>    sys time   13.94 millis    1.50 millis   12.43 millis
> >>>>>
> >>>>> where the latter are consistent. If we run the same test but keep the
> >>>>> socket for the ssh connection active by having activity there, then
> >>>>> the sda test looks as follows:
> >>>>>
> >>>>> time sudo ./iotest /dev/sda
> >>>>>
> >>>>> ________________________________________________________
> >>>>> Executed in    1.23 secs      fish           external
> >>>>>    usr time    2.70 millis   39.00 micros    2.66 millis
> >>>>>    sys time    4.97 millis  977.00 micros    3.99 millis
> >>>>>
> >>>>> as now the ppoll loop is woken all the time anyway.
> >>>>>
> >>>>> After this fix, on an idle system:
> >>>>>
> >>>>> time sudo ./iotest /dev/sda
> >>>>>
> >>>>> ________________________________________________________
> >>>>> Executed in    1.30 secs      fish           external
> >>>>>    usr time    2.14 millis    0.14 millis    2.00 millis
> >>>>>    sys time   16.93 millis    1.16 millis   15.76 millis
> >>>>>
> >>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> >>>>> ---
> >>>>>  util/fdmon-io_uring.c | 8 ++++++++
> >>>>>  1 file changed, 8 insertions(+)
> >>>>>
> >>>>> diff --git a/util/fdmon-io_uring.c b/util/fdmon-io_uring.c
> >>>>> index d0b56127c670..96392876b490 100644
> >>>>> --- a/util/fdmon-io_uring.c
> >>>>> +++ b/util/fdmon-io_uring.c
> >>>>> @@ -181,6 +181,14 @@ static void fdmon_io_uring_add_sqe(AioContext *ctx,
> >>>>>  
> >>>>>      trace_fdmon_io_uring_add_sqe(ctx, opaque, sqe->opcode, sqe->fd, sqe->off,
> >>>>>                                   cqe_handler);
> >>>>> +
> >>>>> +    /*
> >>>>> +     * Wake the main loop if it is sleeping in ppoll().  When a vCPU thread
> >>>>> +     * runs a coroutine inline (holding BQL), it queues SQEs here but the
> >>>>> +     * actual io_uring_submit() only happens in gsource_prepare().  Without
> >>>>> +     * this notify, ppoll() can sleep up to 499ms before submitting.
> >>>>> +     */
> >>>>> +    aio_notify(ctx);
> >>>>>  }
> >>>>
> >>>> Makes sense to me.
> >>>>
> >>>> At first I wondered if we should use defer_call() for the aio_notify()
> >>>> to batch the submission, but of course holding the BQL will already take
> >>>> care of that. And in iothreads where there is no BQL, the aio_notify()
> >>>> shouldn't make a difference anyway because we're already in the right
> >>>> thread.
> >>>>
> >>>> I suppose the other variation could be to have another io_uring_enter()
> >>>> call here (but then probably really through defer_call()) to avoid
> >>>> waiting for another CPU to submit the request in its main loop. But I
> >>>> don't really have an intuition if that would make things better or worse
> >>>> in the common case.
> >>>>
> >>>> Fiona, does this fix your case, too?
> >>>
> >>> Yes, it does fix my issue [0] and the second patch gives another small
> >>> improvement :)
> >>>
> >>> Would it be slightly cleaner to have aio_add_sqe() call aio_notify()
> >>> itself? Since aio-posix.c calls downwards into fdmon-io_uring.c, it
> >>> would feel nicer to me to not have fdmon-io_uring.c call "back up". I
> >>> guess it also depends on whether we expect another future fdmon
> >>> implementation with .add_sqe() to also benefit from it.
> >>
> >> Calling aio_notify() from aio-posix.c:aio_add_sqe() sounds better to me
> >> because fdmon-io_uring.c has to be careful about calling aio_*() APIs to
> >> avoid loops.
> > 
> > Would anyone care to make that edit? I'm on a plane and gone for a bit,
> > so won't get back to this for the next week. But I would love to see a
> > fix go in, as this issue has been plaguing me with test timeouts for
> > quite a while on the CI front. And seems like I'm not alone, if the
> > patches fix Fiona's issues as well.
> 
> Still on a plane but tested this one and it works for me too. Does seem
> like a better approach, rather than stuff it in the fdmon part.
> 
> Feel free to run with this one and also to update the commit message if
> you want. Thanks!
> 
> 
> commit a8a94e7a05964d470b8fba50c9d4769489c21752
> Author: Jens Axboe <axboe@kernel.dk>
> Date:   Fri Feb 13 06:52:14 2026 -0700
> 
>     aio-posix: notify main loop when SQEs are queued
>     
>     When a vCPU thread handles MMIO (holding BQL), aio_co_enter() runs the
>     block I/O coroutine inline on the vCPU thread because
>     qemu_get_current_aio_context() returns the main AioContext when BQL is
>     held. The coroutine calls luring_co_submit() which queues an SQE via
>     fdmon_io_uring_add_sqe(), but the actual io_uring_submit() only happens
>     in gsource_prepare() on the main loop thread.
>     
>     Since the coroutine ran inline (not via aio_co_schedule()), no BH is
>     scheduled and aio_notify() is never called. The main loop remains asleep
>     in ppoll() with up to a 499ms timeout, leaving the SQE unsubmitted until
>     the next timer fires.
>     
>     Fix this by calling aio_notify() after queuing the SQE. This wakes the
>     main loop via the eventfd so it can run gsource_prepare() and submit the
>     pending SQE promptly.
>     
>     This is a generic fix that benefits all devices using aio=io_uring.
>     Without it, AHCI/SATA devices see MUCH worse I/O latency since they use
>     MMIO (not ioeventfd like virtio) and have no other mechanism to wake the
>     main loop after queuing block I/O.
>     
>     This is usually a bit hard to detect, as it also relies on the ppoll
>     loop not waking up for other activity, and micro benchmarks tend not to
>     see it because they don't have any real processing time. With a
>     synthetic test case that has a few usleep() to simulate processing of
>     read data, it's very noticeable. The below example reads 128MB with
>     O_DIRECT in 128KB chunks in batches of 16, and has a 1ms delay before
>     each batch submit, and a 1ms delay after processing each completion.
>     Running it on /dev/sda yields:
>     
>     time sudo ./iotest /dev/sda
>     
>     ________________________________________________________
>     Executed in   25.76 secs      fish           external
>        usr time    6.19 millis  783.00 micros    5.41 millis
>        sys time   12.43 millis  642.00 micros   11.79 millis
>     
>     while on a virtio-blk or NVMe device we get:
>     
>     time sudo ./iotest /dev/vdb
>     
>     ________________________________________________________
>     Executed in    1.25 secs      fish           external
>        usr time    1.40 millis    0.30 millis    1.10 millis
>        sys time   17.61 millis    1.43 millis   16.18 millis
>     
>     time sudo ./iotest /dev/nvme0n1
>     
>     ________________________________________________________
>     Executed in    1.26 secs      fish           external
>        usr time    6.11 millis    0.52 millis    5.59 millis
>        sys time   13.94 millis    1.50 millis   12.43 millis
>     
>     where the latter are consistent. If we run the same test but keep the
>     socket for the ssh connection active by having activity there, then
>     the sda test looks as follows:
>     
>     time sudo ./iotest /dev/sda
>     
>     ________________________________________________________
>     Executed in    1.23 secs      fish           external
>        usr time    2.70 millis   39.00 micros    2.66 millis
>        sys time    4.97 millis  977.00 micros    3.99 millis
>     
>     as now the ppoll loop is woken all the time anyway.
>     
>     After this fix, on an idle system:
>     
>     time sudo ./iotest /dev/sda
>     
>     ________________________________________________________
>     Executed in    1.30 secs      fish           external
>        usr time    2.14 millis    0.14 millis    2.00 millis
>        sys time   16.93 millis    1.16 millis   15.76 millis
>     
>     Signed-off-by: Jens Axboe <axboe@kernel.dk>

Thanks, applied to my block tree together with Patch 2 from v1:
https://gitlab.com/stefanha/qemu/commits/block

Stefan


* Re: [PATCH v2] aio-posix: notify main loop when SQEs are queued
  2026-02-18 20:57             ` Stefan Hajnoczi
@ 2026-02-19 14:27               ` Jens Axboe
  0 siblings, 0 replies; 22+ messages in thread
From: Jens Axboe @ 2026-02-19 14:27 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Fiona Ebner, Kevin Wolf, qemu-block, qemu-devel, fam

On 2/18/26 1:57 PM, Stefan Hajnoczi wrote:
> Thanks, applied to my block tree together with Patch 2 from v1:
> https://gitlab.com/stefanha/qemu/commits/block

Great, thank you!

-- 
Jens Axboe



* Re: [PATCH v2] aio-posix: notify main loop when SQEs are queued
  2026-02-18 16:41           ` [PATCH v2] aio-posix: " Jens Axboe
  2026-02-18 20:57             ` Stefan Hajnoczi
@ 2026-02-19 15:49             ` Kevin Wolf
  2026-02-23 13:53               ` Stefan Hajnoczi
  1 sibling, 1 reply; 22+ messages in thread
From: Kevin Wolf @ 2026-02-19 15:49 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Stefan Hajnoczi, Fiona Ebner, qemu-block, qemu-devel, fam

On 18.02.2026 at 17:41, Jens Axboe wrote:
> On 2/18/26 9:19 AM, Jens Axboe wrote:
> > On 2/18/26 9:11 AM, Stefan Hajnoczi wrote:
> >> On Wed, Feb 18, 2026 at 10:57:02AM +0100, Fiona Ebner wrote:
> >>> On 13.02.26 at 5:05 PM, Kevin Wolf wrote:
> >>>> On 13.02.2026 at 15:26, Jens Axboe wrote:
> >>>>> When a vCPU thread handles MMIO (holding BQL), aio_co_enter() runs the
> >>>>> block I/O coroutine inline on the vCPU thread because
> >>>>> qemu_get_current_aio_context() returns the main AioContext when BQL is
> >>>>> held. The coroutine calls luring_co_submit() which queues an SQE via
> >>>>> fdmon_io_uring_add_sqe(), but the actual io_uring_submit() only happens
> >>>>> in gsource_prepare() on the main loop thread.
> >>>>
> >>>> Ouch! Yes, looks like we completely missed I/O submitted in vCPU threads
> >>>> in the recent changes (or I guess worker threads in theory, but I don't
> >>>> think there are any that actually make use of aio_add_sqe()).
> >>>>
> >>>>> Since the coroutine ran inline (not via aio_co_schedule()), no BH is
> >>>>> scheduled and aio_notify() is never called. The main loop remains asleep
> >>>>> in ppoll() with up to a 499ms timeout, leaving the SQE unsubmitted until
> >>>>> the next timer fires.
> >>>>>
> >>>>> Fix this by calling aio_notify() after queuing the SQE. This wakes the
> >>>>> main loop via the eventfd so it can run gsource_prepare() and submit the
> >>>>> pending SQE promptly.
> >>>>>
> >>>>> This is a generic fix that benefits all devices using aio=io_uring.
> >>>>> Without it, AHCI/SATA devices see MUCH worse I/O latency since they use
> >>>>> MMIO (not ioeventfd like virtio) and have no other mechanism to wake the
> >>>>> main loop after queuing block I/O.
> >>>>>
> >>>>> This is usually a bit hard to detect, as it also relies on the ppoll
> >>>>> loop not waking up for other activity, and micro benchmarks tend not to
> >>>>> see it because they don't have any real processing time. With a
> >>>>> synthetic test case that has a few usleep() to simulate processing of
> >>>>> read data, it's very noticeable. The below example reads 128MB with
> >>>>> O_DIRECT in 128KB chunks in batches of 16, and has a 1ms delay before
> >>>>> each batch submit, and a 1ms delay after processing each completion.
> >>>>> Running it on /dev/sda yields:
> >>>>>
> >>>>> time sudo ./iotest /dev/sda
> >>>>>
> >>>>> ________________________________________________________
> >>>>> Executed in   25.76 secs      fish           external
> >>>>>    usr time    6.19 millis  783.00 micros    5.41 millis
> >>>>>    sys time   12.43 millis  642.00 micros   11.79 millis
> >>>>>
> >>>>> while on a virtio-blk or NVMe device we get:
> >>>>>
> >>>>> time sudo ./iotest /dev/vdb
> >>>>>
> >>>>> ________________________________________________________
> >>>>> Executed in    1.25 secs      fish           external
> >>>>>    usr time    1.40 millis    0.30 millis    1.10 millis
> >>>>>    sys time   17.61 millis    1.43 millis   16.18 millis
> >>>>>
> >>>>> time sudo ./iotest /dev/nvme0n1
> >>>>>
> >>>>> ________________________________________________________
> >>>>> Executed in    1.26 secs      fish           external
> >>>>>    usr time    6.11 millis    0.52 millis    5.59 millis
> >>>>>    sys time   13.94 millis    1.50 millis   12.43 millis
> >>>>>
> >>>>> where the latter are consistent. If we run the same test but keep the
> >>>>> socket for the ssh connection active by having activity there, then
> >>>>> the sda test looks as follows:
> >>>>>
> >>>>> time sudo ./iotest /dev/sda
> >>>>>
> >>>>> ________________________________________________________
> >>>>> Executed in    1.23 secs      fish           external
> >>>>>    usr time    2.70 millis   39.00 micros    2.66 millis
> >>>>>    sys time    4.97 millis  977.00 micros    3.99 millis
> >>>>>
> >>>>> as now the ppoll loop is woken all the time anyway.
> >>>>>
> >>>>> After this fix, on an idle system:
> >>>>>
> >>>>> time sudo ./iotest /dev/sda
> >>>>>
> >>>>> ________________________________________________________
> >>>>> Executed in    1.30 secs      fish           external
> >>>>>    usr time    2.14 millis    0.14 millis    2.00 millis
> >>>>>    sys time   16.93 millis    1.16 millis   15.76 millis
> >>>>>
> >>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> >>>>> ---
> >>>>>  util/fdmon-io_uring.c | 8 ++++++++
> >>>>>  1 file changed, 8 insertions(+)
> >>>>>
> >>>>> diff --git a/util/fdmon-io_uring.c b/util/fdmon-io_uring.c
> >>>>> index d0b56127c670..96392876b490 100644
> >>>>> --- a/util/fdmon-io_uring.c
> >>>>> +++ b/util/fdmon-io_uring.c
> >>>>> @@ -181,6 +181,14 @@ static void fdmon_io_uring_add_sqe(AioContext *ctx,
> >>>>>  
> >>>>>      trace_fdmon_io_uring_add_sqe(ctx, opaque, sqe->opcode, sqe->fd, sqe->off,
> >>>>>                                   cqe_handler);
> >>>>> +
> >>>>> +    /*
> >>>>> +     * Wake the main loop if it is sleeping in ppoll().  When a vCPU thread
> >>>>> +     * runs a coroutine inline (holding BQL), it queues SQEs here but the
> >>>>> +     * actual io_uring_submit() only happens in gsource_prepare().  Without
> >>>>> +     * this notify, ppoll() can sleep up to 499ms before submitting.
> >>>>> +     */
> >>>>> +    aio_notify(ctx);
> >>>>>  }
> >>>>
> >>>> Makes sense to me.
> >>>>
> >>>> At first I wondered if we should use defer_call() for the aio_notify()
> >>>> to batch the submission, but of course holding the BQL will already take
> >>>> care of that. And in iothreads where there is no BQL, the aio_notify()
> >>>> shouldn't make a difference anyway because we're already in the right
> >>>> thread.
> >>>>
> >>>> I suppose the other variation could be to have another io_uring_enter()
> >>>> call here (but then probably really through defer_call()) to avoid
> >>>> waiting for another CPU to submit the request in its main loop. But I
> >>>> don't really have an intuition if that would make things better or worse
> >>>> in the common case.
> >>>>
> >>>> Fiona, does this fix your case, too?
> >>>
> >>> Yes, it does fix my issue [0] and the second patch gives another small
> >>> improvement :)
> >>>
> >>> Would it be slightly cleaner to have aio_add_sqe() call aio_notify()
> >>> itself? Since aio-posix.c calls downwards into fdmon-io_uring.c, it
> >>> would feel nicer to me to not have fdmon-io_uring.c call "back up". I
> >>> guess it also depends on whether we expect another future fdmon
> >>> implementation with .add_sqe() to also benefit from it.
> >>
> >> Calling aio_notify() from aio-posix.c:aio_add_sqe() sounds better to me
> >> because fdmon-io_uring.c has to be careful about calling aio_*() APIs to
> >> avoid loops.
> > 
> > Would anyone care to make that edit? I'm on a plane and gone for a bit,
> > so won't get back to this for the next week. But I would love to see a
> > fix go in, as this issue has been plaguing me with test timeouts for
> > quite a while on the CI front. And seems like I'm not alone, if the
> > patches fix Fiona's issues as well.
> 
> Still on a plane but tested this one and it works for me too. Does seem
> like a better approach, rather than stuff it in the fdmon part.
> 
> Feel free to run with this one and also to update the commit message if
> you want. Thanks!
> 
> 
> commit a8a94e7a05964d470b8fba50c9d4769489c21752
> Author: Jens Axboe <axboe@kernel.dk>
> Date:   Fri Feb 13 06:52:14 2026 -0700
> 
>     aio-posix: notify main loop when SQEs are queued
>     
>     When a vCPU thread handles MMIO (holding BQL), aio_co_enter() runs the
>     block I/O coroutine inline on the vCPU thread because
>     qemu_get_current_aio_context() returns the main AioContext when BQL is
>     held. The coroutine calls luring_co_submit() which queues an SQE via
>     fdmon_io_uring_add_sqe(), but the actual io_uring_submit() only happens
>     in gsource_prepare() on the main loop thread.
>     
>     Since the coroutine ran inline (not via aio_co_schedule()), no BH is
>     scheduled and aio_notify() is never called. The main loop remains asleep
>     in ppoll() with up to a 499ms timeout, leaving the SQE unsubmitted until
>     the next timer fires.
>     
>     Fix this by calling aio_notify() after queuing the SQE. This wakes the
>     main loop via the eventfd so it can run gsource_prepare() and submit the
>     pending SQE promptly.
>     
>     This is a generic fix that benefits all devices using aio=io_uring.
>     Without it, AHCI/SATA devices see MUCH worse I/O latency since they use
>     MMIO (not ioeventfd like virtio) and have no other mechanism to wake the
>     main loop after queuing block I/O.
>     
>     This is usually a bit hard to detect, as it also relies on the ppoll
>     loop not waking up for other activity, and micro benchmarks tend not to
>     see it because they don't have any real processing time. With a
>     synthetic test case that has a few usleep() to simulate processing of
>     read data, it's very noticeable. The below example reads 128MB with
>     O_DIRECT in 128KB chunks in batches of 16, and has a 1ms delay before
>     each batch submit, and a 1ms delay after processing each completion.
>     Running it on /dev/sda yields:
>     
>     time sudo ./iotest /dev/sda
>     
>     ________________________________________________________
>     Executed in   25.76 secs      fish           external
>        usr time    6.19 millis  783.00 micros    5.41 millis
>        sys time   12.43 millis  642.00 micros   11.79 millis
>     
>     while on a virtio-blk or NVMe device we get:
>     
>     time sudo ./iotest /dev/vdb
>     
>     ________________________________________________________
>     Executed in    1.25 secs      fish           external
>        usr time    1.40 millis    0.30 millis    1.10 millis
>        sys time   17.61 millis    1.43 millis   16.18 millis
>     
>     time sudo ./iotest /dev/nvme0n1
>     
>     ________________________________________________________
>     Executed in    1.26 secs      fish           external
>        usr time    6.11 millis    0.52 millis    5.59 millis
>        sys time   13.94 millis    1.50 millis   12.43 millis
>     
>     where the latter are consistent. If we run the same test but keep the
>     socket for the ssh connection active by having activity there, then
>     the sda test looks as follows:
>     
>     time sudo ./iotest /dev/sda
>     
>     ________________________________________________________
>     Executed in    1.23 secs      fish           external
>        usr time    2.70 millis   39.00 micros    2.66 millis
>        sys time    4.97 millis  977.00 micros    3.99 millis
>     
>     as now the ppoll loop is woken all the time anyway.
>     
>     After this fix, on an idle system:
>     
>     time sudo ./iotest /dev/sda
>     
>     ________________________________________________________
>     Executed in    1.30 secs      fish           external
>        usr time    2.14 millis    0.14 millis    2.00 millis
>        sys time   16.93 millis    1.16 millis   15.76 millis
>     
>     Signed-off-by: Jens Axboe <axboe@kernel.dk>
> 
> diff --git a/util/aio-posix.c b/util/aio-posix.c
> index e24b955fd91a..8c7b3795c82d 100644
> --- a/util/aio-posix.c
> +++ b/util/aio-posix.c
> @@ -813,5 +813,13 @@ void aio_add_sqe(void (*prep_sqe)(struct io_uring_sqe *sqe, void *opaque),
>  {
>      AioContext *ctx = qemu_get_current_aio_context();
>      ctx->fdmon_ops->add_sqe(ctx, prep_sqe, opaque, cqe_handler);
> +
> +    /*
> +     * Wake the main loop if it is sleeping in ppoll().  When a vCPU thread
> +     * runs a coroutine inline (holding BQL), it queues SQEs here but the

I think the comment could even be more generic here. This is not
specific to coroutines, but the scenario is just that a vCPU thread
holding the BQL performs I/O.

> +     * actual io_uring_submit() only happens in gsource_prepare().  Without
> +     * this notify, ppoll() can sleep up to 499ms before submitting.
> +     */
> +    aio_notify(ctx);
>  }
>  #endif /* CONFIG_LINUX_IO_URING */

With or without a changed comment to that effect:

Reviewed-by: Kevin Wolf <kwolf@redhat.com>
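
For readers following along, the wake-up mechanism described in the
commit message can be sketched in isolation. This is a minimal
single-threaded sketch under stated assumptions, not QEMU code:
notify() and wait_for_wakeup_ms() are hypothetical stand-ins for
aio_notify() and for one main loop iteration sleeping in ppoll(), and
submit_latency_ms() is a hypothetical driver. The 499 ms timeout mirrors
the figure in the commit message, and plain poll() is used instead of
ppoll() for brevity.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-in for aio_notify(): make the eventfd readable. */
static void notify(int efd)
{
    uint64_t one = 1;
    ssize_t n = write(efd, &one, sizeof(one));
    assert(n == (ssize_t)sizeof(one));
}

/*
 * Hypothetical stand-in for one main loop iteration: sleep in poll()
 * until the eventfd becomes readable or timeout_ms expires.
 * Returns the time spent sleeping, in milliseconds.
 */
static long wait_for_wakeup_ms(int efd, int timeout_ms)
{
    struct pollfd pfd = { .fd = efd, .events = POLLIN };
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    int ret = poll(&pfd, 1, timeout_ms);
    clock_gettime(CLOCK_MONOTONIC, &end);
    assert(ret >= 0);

    return (end.tv_sec - start.tv_sec) * 1000L +
           (end.tv_nsec - start.tv_nsec) / 1000000L;
}

/*
 * Pretend that work was queued, optionally notify the loop, and report
 * how long the loop slept before it could act on that work.
 */
static long submit_latency_ms(int do_notify)
{
    int efd = eventfd(0, EFD_CLOEXEC);
    assert(efd >= 0);

    if (do_notify) {
        notify(efd);
    }

    long elapsed = wait_for_wakeup_ms(efd, 499);
    close(efd);
    return elapsed;
}
```

Calling submit_latency_ms(1) should return almost immediately, because
the eventfd is readable and poll() does not block on it, while
submit_latency_ms(0) sleeps for the full timeout. The second case is the
stall that the patch removes.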




* Re: [PATCH v2] aio-posix: notify main loop when SQEs are queued
  2026-02-19 15:49             ` Kevin Wolf
@ 2026-02-23 13:53               ` Stefan Hajnoczi
  0 siblings, 0 replies; 22+ messages in thread
From: Stefan Hajnoczi @ 2026-02-23 13:53 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Jens Axboe, Fiona Ebner, qemu-block, qemu-devel, fam


On Thu, Feb 19, 2026 at 04:49:11PM +0100, Kevin Wolf wrote:
> On 18.02.2026 at 17:41, Jens Axboe wrote:
> > On 2/18/26 9:19 AM, Jens Axboe wrote:
> > > On 2/18/26 9:11 AM, Stefan Hajnoczi wrote:
> > >> On Wed, Feb 18, 2026 at 10:57:02AM +0100, Fiona Ebner wrote:
> > >>> On 13.02.26 at 5:05 PM, Kevin Wolf wrote:
> > >>>> On 13.02.2026 at 15:26, Jens Axboe wrote:
> > >>>>> When a vCPU thread handles MMIO (holding BQL), aio_co_enter() runs the
> > >>>>> block I/O coroutine inline on the vCPU thread because
> > >>>>> qemu_get_current_aio_context() returns the main AioContext when BQL is
> > >>>>> held. The coroutine calls luring_co_submit() which queues an SQE via
> > >>>>> fdmon_io_uring_add_sqe(), but the actual io_uring_submit() only happens
> > >>>>> in gsource_prepare() on the main loop thread.
> > >>>>
> > >>>> Ouch! Yes, looks like we completely missed I/O submitted in vCPU threads
> > >>>> in the recent changes (or I guess worker threads in theory, but I don't
> > >>>> think there are any that actually make use of aio_add_sqe()).
> > >>>>
> > >>>>> Since the coroutine ran inline (not via aio_co_schedule()), no BH is
> > >>>>> scheduled and aio_notify() is never called. The main loop remains asleep
> > >>>>> in ppoll() with up to a 499ms timeout, leaving the SQE unsubmitted until
> > >>>>> the next timer fires.
> > >>>>>
> > >>>>> Fix this by calling aio_notify() after queuing the SQE. This wakes the
> > >>>>> main loop via the eventfd so it can run gsource_prepare() and submit the
> > >>>>> pending SQE promptly.
> > >>>>>
> > >>>>> This is a generic fix that benefits all devices using aio=io_uring.
> > >>>>> Without it, AHCI/SATA devices see MUCH worse I/O latency since they use
> > >>>>> MMIO (not ioeventfd like virtio) and have no other mechanism to wake the
> > >>>>> main loop after queuing block I/O.
> > >>>>>
> > >>>>> This is usually a bit hard to detect, as it also relies on the ppoll
> > >>>>> loop not waking up for other activity, and micro benchmarks tend not to
> > >>>>> see it because they don't have any real processing time. With a
> > >>>>> synthetic test case that has a few usleep() to simulate processing of
> > >>>>> read data, it's very noticeable. The below example reads 128MB with
> > >>>>> O_DIRECT in 128KB chunks in batches of 16, and has a 1ms delay before
> > >>>>> each batch submit, and a 1ms delay after processing each completion.
> > >>>>> Running it on /dev/sda yields:
> > >>>>>
> > >>>>> time sudo ./iotest /dev/sda
> > >>>>>
> > >>>>> ________________________________________________________
> > >>>>> Executed in   25.76 secs      fish           external
> > >>>>>    usr time    6.19 millis  783.00 micros    5.41 millis
> > >>>>>    sys time   12.43 millis  642.00 micros   11.79 millis
> > >>>>>
> > >>>>> while on a virtio-blk or NVMe device we get:
> > >>>>>
> > >>>>> time sudo ./iotest /dev/vdb
> > >>>>>
> > >>>>> ________________________________________________________
> > >>>>> Executed in    1.25 secs      fish           external
> > >>>>>    usr time    1.40 millis    0.30 millis    1.10 millis
> > >>>>>    sys time   17.61 millis    1.43 millis   16.18 millis
> > >>>>>
> > >>>>> time sudo ./iotest /dev/nvme0n1
> > >>>>>
> > >>>>> ________________________________________________________
> > >>>>> Executed in    1.26 secs      fish           external
> > >>>>>    usr time    6.11 millis    0.52 millis    5.59 millis
> > >>>>>    sys time   13.94 millis    1.50 millis   12.43 millis
> > >>>>>
> > >>>>> where the latter are consistent. If we run the same test but keep the
> > >>>>> socket for the ssh connection active by having activity there, then
> > >>>>> the sda test looks as follows:
> > >>>>>
> > >>>>> time sudo ./iotest /dev/sda
> > >>>>>
> > >>>>> ________________________________________________________
> > >>>>> Executed in    1.23 secs      fish           external
> > >>>>>    usr time    2.70 millis   39.00 micros    2.66 millis
> > >>>>>    sys time    4.97 millis  977.00 micros    3.99 millis
> > >>>>>
> > >>>>> as now the ppoll loop is woken all the time anyway.
> > >>>>>
> > >>>>> After this fix, on an idle system:
> > >>>>>
> > >>>>> time sudo ./iotest /dev/sda
> > >>>>>
> > >>>>> ________________________________________________________
> > >>>>> Executed in    1.30 secs      fish           external
> > >>>>>    usr time    2.14 millis    0.14 millis    2.00 millis
> > >>>>>    sys time   16.93 millis    1.16 millis   15.76 millis
> > >>>>>
> > >>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> > >>>>> ---
> > >>>>>  util/fdmon-io_uring.c | 8 ++++++++
> > >>>>>  1 file changed, 8 insertions(+)
> > >>>>>
> > >>>>> diff --git a/util/fdmon-io_uring.c b/util/fdmon-io_uring.c
> > >>>>> index d0b56127c670..96392876b490 100644
> > >>>>> --- a/util/fdmon-io_uring.c
> > >>>>> +++ b/util/fdmon-io_uring.c
> > >>>>> @@ -181,6 +181,14 @@ static void fdmon_io_uring_add_sqe(AioContext *ctx,
> > >>>>>  
> > >>>>>      trace_fdmon_io_uring_add_sqe(ctx, opaque, sqe->opcode, sqe->fd, sqe->off,
> > >>>>>                                   cqe_handler);
> > >>>>> +
> > >>>>> +    /*
> > >>>>> +     * Wake the main loop if it is sleeping in ppoll().  When a vCPU thread
> > >>>>> +     * runs a coroutine inline (holding BQL), it queues SQEs here but the
> > >>>>> +     * actual io_uring_submit() only happens in gsource_prepare().  Without
> > >>>>> +     * this notify, ppoll() can sleep up to 499ms before submitting.
> > >>>>> +     */
> > >>>>> +    aio_notify(ctx);
> > >>>>>  }
> > >>>>
> > >>>> Makes sense to me.
> > >>>>
> > >>>> At first I wondered if we should use defer_call() for the aio_notify()
> > >>>> to batch the submission, but of course holding the BQL will already take
> > >>>> care of that. And in iothreads where there is no BQL, the aio_notify()
> > >>>> shouldn't make a difference anyway because we're already in the right
> > >>>> thread.
> > >>>>
> > >>>> I suppose the other variation could be to have another io_uring_enter()
> > >>>> call here (but then probably really through defer_call()) to avoid
> > >>>> waiting for another CPU to submit the request in its main loop. But I
> > >>>> don't really have an intuition if that would make things better or worse
> > >>>> in the common case.
> > >>>>
> > >>>> Fiona, does this fix your case, too?
> > >>>
> > >>> Yes, it does fix my issue [0] and the second patch gives another small
> > >>> improvement :)
> > >>>
> > >>> Would it be slightly cleaner to have aio_add_sqe() call aio_notify()
> > >>> itself? Since aio-posix.c calls downwards into fdmon-io_uring.c, it
> > >>> would feel nicer to me to not have fdmon-io_uring.c call "back up". I
> > >>> guess it also depends on whether we expect another future fdmon
> > >>> implementation with .add_sqe() to also benefit from it.
> > >>
> > >> Calling aio_notify() from aio-posix.c:aio_add_sqe() sounds better to me
> > >> because fdmon-io_uring.c has to be careful about calling aio_*() APIs to
> > >> avoid loops.
> > > 
> > > Would anyone care to make that edit? I'm on a plane and gone for a bit,
> > > so won't get back to this for the next week. But I would love to see a
> > > fix go in, as this issue has been plaguing me with test timeouts for
> > > quite a while on the CI front. And seems like I'm not alone, if the
> > > patches fix Fiona's issues as well.
> > 
> > Still on a plane but tested this one and it works for me too. Does seem
> > like a better approach, rather than stuff it in the fdmon part.
> > 
> > Feel free to run with this one and also to update the commit message if
> > you want. Thanks!
> > 
> > 
> > commit a8a94e7a05964d470b8fba50c9d4769489c21752
> > Author: Jens Axboe <axboe@kernel.dk>
> > Date:   Fri Feb 13 06:52:14 2026 -0700
> > 
> >     aio-posix: notify main loop when SQEs are queued
> >     
> >     When a vCPU thread handles MMIO (holding BQL), aio_co_enter() runs the
> >     block I/O coroutine inline on the vCPU thread because
> >     qemu_get_current_aio_context() returns the main AioContext when BQL is
> >     held. The coroutine calls luring_co_submit() which queues an SQE via
> >     fdmon_io_uring_add_sqe(), but the actual io_uring_submit() only happens
> >     in gsource_prepare() on the main loop thread.
> >     
> >     Since the coroutine ran inline (not via aio_co_schedule()), no BH is
> >     scheduled and aio_notify() is never called. The main loop remains asleep
> >     in ppoll() with up to a 499ms timeout, leaving the SQE unsubmitted until
> >     the next timer fires.
> >     
> >     Fix this by calling aio_notify() after queuing the SQE. This wakes the
> >     main loop via the eventfd so it can run gsource_prepare() and submit the
> >     pending SQE promptly.
> >     
> >     This is a generic fix that benefits all devices using aio=io_uring.
> >     Without it, AHCI/SATA devices see MUCH worse I/O latency since they use
> >     MMIO (not ioeventfd like virtio) and have no other mechanism to wake the
> >     main loop after queuing block I/O.
> >     
> >     This is usually a bit hard to detect, as it also relies on the ppoll
> >     loop not waking up for other activity, and micro benchmarks tend not to
> >     see it because they don't have any real processing time. With a
> >     synthetic test case that has a few usleep() to simulate processing of
> >     read data, it's very noticeable. The below example reads 128MB with
> >     O_DIRECT in 128KB chunks in batches of 16, and has a 1ms delay before
> >     each batch submit, and a 1ms delay after processing each completion.
> >     Running it on /dev/sda yields:
> >     
> >     time sudo ./iotest /dev/sda
> >     
> >     ________________________________________________________
> >     Executed in   25.76 secs      fish           external
> >        usr time    6.19 millis  783.00 micros    5.41 millis
> >        sys time   12.43 millis  642.00 micros   11.79 millis
> >     
> >     while on a virtio-blk or NVMe device we get:
> >     
> >     time sudo ./iotest /dev/vdb
> >     
> >     ________________________________________________________
> >     Executed in    1.25 secs      fish           external
> >        usr time    1.40 millis    0.30 millis    1.10 millis
> >        sys time   17.61 millis    1.43 millis   16.18 millis
> >     
> >     time sudo ./iotest /dev/nvme0n1
> >     
> >     ________________________________________________________
> >     Executed in    1.26 secs      fish           external
> >        usr time    6.11 millis    0.52 millis    5.59 millis
> >        sys time   13.94 millis    1.50 millis   12.43 millis
> >     
> >     where the latter are consistent. If we run the same test but keep the
> >     socket for the ssh connection active by having activity there, then
> >     the sda test looks as follows:
> >     
> >     time sudo ./iotest /dev/sda
> >     
> >     ________________________________________________________
> >     Executed in    1.23 secs      fish           external
> >        usr time    2.70 millis   39.00 micros    2.66 millis
> >        sys time    4.97 millis  977.00 micros    3.99 millis
> >     
> >     as now the ppoll loop is woken all the time anyway.
> >     
> >     After this fix, on an idle system:
> >     
> >     time sudo ./iotest /dev/sda
> >     
> >     ________________________________________________________
> >     Executed in    1.30 secs      fish           external
> >        usr time    2.14 millis    0.14 millis    2.00 millis
> >        sys time   16.93 millis    1.16 millis   15.76 millis
> >     
> >     Signed-off-by: Jens Axboe <axboe@kernel.dk>
> > 
> > diff --git a/util/aio-posix.c b/util/aio-posix.c
> > index e24b955fd91a..8c7b3795c82d 100644
> > --- a/util/aio-posix.c
> > +++ b/util/aio-posix.c
> > @@ -813,5 +813,13 @@ void aio_add_sqe(void (*prep_sqe)(struct io_uring_sqe *sqe, void *opaque),
> >  {
> >      AioContext *ctx = qemu_get_current_aio_context();
> >      ctx->fdmon_ops->add_sqe(ctx, prep_sqe, opaque, cqe_handler);
> > +
> > +    /*
> > +     * Wake the main loop if it is sleeping in ppoll().  When a vCPU thread
> > +     * runs a coroutine inline (holding BQL), it queues SQEs here but the
> 
> I think the comment could even be more generic here. This is not
> specific to coroutines, but the scenario is just that a vCPU thread
> holding the BQL performs I/O.

Good idea, I generalized the comment when merging the patch.

Stefan



* Re: [PATCHSET 0/2] io_uring fixes
  2026-02-13 14:26 [PATCHSET 0/2] io_uring fixes Jens Axboe
                   ` (2 preceding siblings ...)
  2026-02-18 10:07 ` [PATCHSET 0/2] io_uring fixes Fiona Ebner
@ 2026-03-03 11:52 ` Fiona Ebner
  2026-03-03 16:51   ` Jens Axboe
  2026-03-08 12:11   ` Michael Tokarev
  3 siblings, 2 replies; 22+ messages in thread
From: Fiona Ebner @ 2026-03-03 11:52 UTC (permalink / raw)
  To: Jens Axboe, qemu-block; +Cc: qemu-devel, fam, stefanha, qemu-stable

On 13.02.26 at 3:33 PM, Jens Axboe wrote:
> Hi,
> 
> Patch 1 here is the real meat of this, patch 2 is just a slight
> improvement. For patch 1, it can literally yield a 50-80x improvement
> on the io_uring side for idle systems, where ppoll() ends up sleeping
> for 500 msec while there's IO to submit! I noticed this running the
> io_uring regression tests in a vm, where I use a variety of block
> devices for some of the tests. They would often randomly time out on
> AHCI devices, while running them on a virtio-blk or nvme device would
> finish in one second or so. I then wrote a reproducer to try and grok
> this and had claude dive into this, which helped me better grasp the
> various event loops.
> 
> Please take a look and tell me what you think. Some variant of patch 1
> should definitely be considered, but let me know if this is the right
> approach. I can easily test anything.
> 
> Also note - this seems to trigger more easily or consistently on
> aarch64, which is where I run most of my local/immediate testing.
> 
>  util/fdmon-io_uring.c | 22 +++++++++++++++++++++-
>  1 file changed, 21 insertions(+), 1 deletion(-)
> 

CC-ing qemu-stable since this affects the 10.2 branch. The fixes are
already applied to master as

2ae361ef1d aio-posix: notify main loop when SQEs are queued
961fcc0f22 fdmon-io_uring: check CQ ring directly in gsource_check

Best Regards,
Fiona




* Re: [PATCHSET 0/2] io_uring fixes
  2026-03-03 11:52 ` Fiona Ebner
@ 2026-03-03 16:51   ` Jens Axboe
  2026-03-08 12:11   ` Michael Tokarev
  1 sibling, 0 replies; 22+ messages in thread
From: Jens Axboe @ 2026-03-03 16:51 UTC (permalink / raw)
  To: Fiona Ebner, qemu-block; +Cc: qemu-devel, fam, stefanha, qemu-stable

On 3/3/26 4:52 AM, Fiona Ebner wrote:
> On 13.02.26 at 3:33 PM, Jens Axboe wrote:
>> Hi,
>>
>> Patch 1 here is the real meat of this, patch 2 is just a slight
>> improvement. For patch 1, it can literally yield a 50-80x improvement
>> on the io_uring side for idle systems, where ppoll() ends up sleeping
>> for 500 msec while there's IO to submit! I noticed this running the
>> io_uring regression tests in a vm, where I use a variety of block
>> devices for some of the tests. They would often randomly time out on
>> AHCI devices, while running them on a virtio-blk or nvme device would
>> finish in one second or so. I then wrote a reproducer to try and grok
>> this and had claude dive into this, which helped me better grasp the
>> various event loops.
>>
>> Please take a look and tell me what you think. Some variant of patch 1
>> should definitely be considered, but let me know if this is the right
>> approach. I can easily test anything.
>>
>> Also note - this seems to trigger more easily or consistently on
>> aarch64, which is where I run most of my local/immediate testing.
>>
>>  util/fdmon-io_uring.c | 22 +++++++++++++++++++++-
>>  1 file changed, 21 insertions(+), 1 deletion(-)
>>
> 
> CC-ing qemu-stable since this affects the 10.2 branch. The fixes are
> already applied to master as
> 
> 2ae361ef1d aio-posix: notify main loop when SQEs are queued
> 961fcc0f22 fdmon-io_uring: check CQ ring directly in gsource_check

Yes please, would be nice for them to hit the stable release(s) as
well. Thanks Fiona!

-- 
Jens Axboe




* Re: [PATCHSET 0/2] io_uring fixes
  2026-03-03 11:52 ` Fiona Ebner
  2026-03-03 16:51   ` Jens Axboe
@ 2026-03-08 12:11   ` Michael Tokarev
  1 sibling, 0 replies; 22+ messages in thread
From: Michael Tokarev @ 2026-03-08 12:11 UTC (permalink / raw)
  To: Fiona Ebner, Jens Axboe, qemu-block
  Cc: qemu-devel, fam, stefanha, qemu-stable

On 03.03.2026 14:52, Fiona Ebner wrote:

> CC-ing qemu-stable since this affects the 10.2 branch. The fixes are
> already applied to master as
> 
> 2ae361ef1d aio-posix: notify main loop when SQEs are queued
> 961fcc0f22 fdmon-io_uring: check CQ ring directly in gsource_check

Picked up.  Thank you!

/mjt



end of thread

Thread overview: 22+ messages
2026-02-13 14:26 [PATCHSET 0/2] io_uring fixes Jens Axboe
2026-02-13 14:26 ` [PATCH 1/2] fdmon-io_uring: notify main loop when SQEs are queued Jens Axboe
2026-02-13 16:04   ` Kevin Wolf
2026-02-18  9:57     ` Fiona Ebner
2026-02-18 16:06       ` Stefan Hajnoczi
2026-02-18 16:17         ` Jens Axboe
2026-02-18 20:02           ` Stefan Hajnoczi
2026-02-18 16:11       ` Stefan Hajnoczi
2026-02-18 16:19         ` Jens Axboe
2026-02-18 16:41           ` [PATCH v2] aio-posix: " Jens Axboe
2026-02-18 20:57             ` Stefan Hajnoczi
2026-02-19 14:27               ` Jens Axboe
2026-02-19 15:49             ` Kevin Wolf
2026-02-23 13:53               ` Stefan Hajnoczi
2026-02-18 15:56     ` [PATCH 1/2] fdmon-io_uring: " Stefan Hajnoczi
2026-02-13 14:26 ` [PATCH 2/2] fdmon-io_uring: check CQ ring directly in gsource_check Jens Axboe
2026-02-13 16:22   ` Kevin Wolf
2026-02-18 16:24   ` Stefan Hajnoczi
2026-02-18 10:07 ` [PATCHSET 0/2] io_uring fixes Fiona Ebner
2026-03-03 11:52 ` Fiona Ebner
2026-03-03 16:51   ` Jens Axboe
2026-03-08 12:11   ` Michael Tokarev
