* [PATCH v2 00/12] aio: add the aio_add_sqe() io_uring API
@ 2025-06-20  0:08 Stefan Hajnoczi
From: Stefan Hajnoczi @ 2025-06-20  0:08 UTC
  To: qemu-devel
  Cc: Hanna Reitz, Kevin Wolf, Stefan Weil, Paolo Bonzini, Fam Zheng,
	eblake, Stefano Garzarella, qemu-block, Stefan Hajnoczi,
	Aarushi Mehta, hibriansong

v2:
- Performance improvements
- Fix pre_sqe -> prep_sqe typo [Eric]
- Add #endif terminator comment [Eric]
- Fix spacing in aio_ctx_finalize() argument list [Eric]
- Add new "block/io_uring: use non-vectored read/write when possible" patch [Eric]
- Drop v1's Patch 1, which switched to multi-shot POLL_ADD. Multi-shot POLL_ADD
  has edge-triggered semantics instead of the level-triggered semantics
  required by QEMU's AioContext APIs. The qemu-iotests 308 test case was
  hanging because block/export/fuse.c relies on level-triggered semantics.
  Luckily the performance reason for switching from one-shot to multi-shot has
  been solved by Patch 2 ("aio-posix: keep polling enabled with
  fdmon-io_uring.c"), so it's okay to keep using one-shot.
- Add a new Patch 1. It's a bug fix for a use-after-free in fdmon-io_uring.c
  triggered by qemu-iotests iothreads-nbd-export.
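
To illustrate the edge-triggered vs level-triggered distinction from the
changelog above, here is a minimal sketch. It uses epoll(7) rather than
io_uring because epoll offers both modes behind a single flag, so it is only an
analogy. count_ready_reports() is a hypothetical helper that is not part of
this series, and multi-shot POLL_ADD is assumed to behave like the EPOLLET case
for the purpose of the example.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Hypothetical helper, not QEMU code.
 *
 * Make a pipe readable once, then ask epoll twice whether it is readable
 * without draining it in between. Returns how many of the two checks
 * reported the fd as ready, or -1 on setup failure.
 *
 *   extra_flags = 0        level-triggered (the epoll default)
 *   extra_flags = EPOLLET  edge-triggered
 */
static int count_ready_reports(uint32_t extra_flags)
{
    struct epoll_event ev = { .events = EPOLLIN | extra_flags };
    struct epoll_event out;
    int fds[2];
    int epfd;
    int reports = 0;
    int i;

    if (pipe(fds) < 0) {
        return -1;
    }

    epfd = epoll_create1(0);
    if (epfd < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    ev.data.fd = fds[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[0], &ev) < 0 ||
        write(fds[1], "x", 1) != 1) {
        reports = -1;
        goto out;
    }

    /* The byte is never read, so the fd stays readable the whole time */
    for (i = 0; i < 2; i++) {
        if (epoll_wait(epfd, &out, 1, 0) == 1) {
            reports++;
        }
    }

out:
    close(epfd);
    close(fds[0]);
    close(fds[1]);
    return reports;
}
```

count_ready_reports(0) returns 2, while count_ready_reports(EPOLLET) returns 1.
A handler that leaves data pending is only notified again in level-triggered
mode, which is the kind of behavior block/export/fuse.c is described as relying
on.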

This patch series contains io_uring improvements:

1. Support the glib event loop in fdmon-io_uring.
   - aio-posix: fix race between io_uring CQE and AioHandler deletion
   - aio-posix: keep polling enabled with fdmon-io_uring.c
   - tests/unit: skip test-nested-aio-poll with io_uring
   - aio-posix: integrate fdmon into glib event loop

2. Enable fdmon-io_uring on hosts where io_uring is available at runtime.
   Otherwise continue using ppoll(2) or epoll(7).
   - aio: remove aio_context_use_g_source()

3. Add the new aio_add_sqe() API for submitting io_uring requests in the QEMU
   event loop.
   - aio: free AioContext when aio_context_new() fails
   - aio: add errp argument to aio_context_setup()
   - aio-posix: gracefully handle io_uring_queue_init() failure
   - aio-posix: add aio_add_sqe() API for user-defined io_uring requests
   - aio-posix: avoid EventNotifier for cqe_handler_bh

4. Use aio_add_sqe() in block/io_uring.c instead of creating a dedicated
   io_uring context for --blockdev aio=io_uring. This simplifies the code,
   reduces the number of file descriptors, and demonstrates the aio_add_sqe()
   API.
   - block/io_uring: use aio_add_sqe()
   - block/io_uring: use non-vectored read/write when possible
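
The idea behind the last patch in item 4 can be sketched with plain POSIX
calls: when a request's I/O vector has exactly one element, the non-vectored
call is enough and the vectored one is only needed for genuinely scattered
buffers. This is an analogy using pread(2)/preadv(2), not the io_uring code from
the series. read_at() and read_at_selftest() are hypothetical names introduced
for illustration.

```c
#define _DEFAULT_SOURCE
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper, not QEMU code.
 *
 * Read into iov[0..niov-1] from fd at the given offset. A single-element
 * vector is passed to pread(2) directly; preadv(2) is only used when there
 * is more than one buffer.
 */
static ssize_t read_at(int fd, const struct iovec *iov, int niov, off_t offset)
{
    if (niov == 1) {
        return pread(fd, iov[0].iov_base, iov[0].iov_len, offset);
    }
    return preadv(fd, iov, niov, offset);
}

/*
 * Write 11 bytes to an unlinked temporary file and read them back through
 * both code paths. Returns 0 if both paths return the same data, -1 if not.
 */
static int read_at_selftest(void)
{
    char path[] = "/tmp/read_at_XXXXXX";
    const char data[] = "hello world";
    char whole[11], first[5], second[6];
    struct iovec one = { .iov_base = whole, .iov_len = sizeof(whole) };
    struct iovec two[2] = {
        { .iov_base = first, .iov_len = sizeof(first) },
        { .iov_base = second, .iov_len = sizeof(second) },
    };
    int fd = mkstemp(path);
    int ok;

    if (fd < 0) {
        return -1;
    }
    unlink(path);

    ok = write(fd, data, 11) == 11 &&
         read_at(fd, &one, 1, 0) == 11 &&   /* non-vectored path */
         read_at(fd, two, 2, 0) == 11 &&    /* vectored path */
         memcmp(whole, data, 11) == 0 &&
         memcmp(first, data, 5) == 0 &&
         memcmp(second, data + 5, 6) == 0;

    close(fd);
    return ok ? 0 : -1;
}
```

Both paths return the same bytes, so the choice is purely an optimization for
the common single-buffer case.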

The highlight is aio_add_sqe(), which is needed for the FUSE-over-io_uring
Google Summer of Code project and other future QEMU features that natively use
Linux io_uring functionality.
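
Since item 2 above makes fdmon-io_uring depend on io_uring being available at
runtime, here is a minimal sketch of what such a runtime probe with a fallback
could look like. It uses the raw io_uring_setup(2) system call so that it does
not depend on liburing. io_uring_usable() and pick_fd_monitor() are
hypothetical names and the returned strings are only labels; this is not how
the series implements the check.

```c
#define _GNU_SOURCE
#include <linux/io_uring.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper, not QEMU code.
 *
 * Returns 1 if an io_uring instance can be created right now, 0 otherwise
 * (for example on an old kernel, under a seccomp filter, or when io_uring
 * has been disabled by the administrator).
 */
static int io_uring_usable(void)
{
    struct io_uring_params params;
    long fd;

    memset(&params, 0, sizeof(params));
    fd = syscall(__NR_io_uring_setup, 8, &params);
    if (fd < 0) {
        return 0;
    }

    close((int)fd);
    return 1;
}

/* Pick a file descriptor monitoring backend, falling back gracefully */
static const char *pick_fd_monitor(void)
{
    return io_uring_usable() ? "io_uring" : "ppoll/epoll";
}
```

The point of the design is that failure to set up io_uring is treated as a
normal condition with a working fallback rather than as a fatal error.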

rw        bs iodepth aio    iothread before after  diff
randread  4k       1 native        0  78353  84860 +8.3%
randread  4k      64 native        0 262370 269823 +2.8%
randwrite 4k       1 native        0 142703 144348 +1.2%
randwrite 4k      64 native        0 259947 263895 +1.5%
randread  4k       1 io_uring      0  76883  78270 +1.8%
randread  4k      64 io_uring      0 269712 250513 -7.1%
randwrite 4k       1 io_uring      0 143657 131481 -8.5%
randwrite 4k      64 io_uring      0 274461 264785 -3.5%
randread  4k       1 native        1  84080  84097 0.0%
randread  4k      64 native        1 314650 311193 -1.1%
randwrite 4k       1 native        1 172463 159993 -7.2%
randwrite 4k      64 native        1 303091 299726 -1.1%
randread  4k       1 io_uring      1  83415  84081 +0.8%
randread  4k      64 io_uring      1 324797 318429 -2.0%
randwrite 4k       1 io_uring      1 174421 172809 -0.9%
randwrite 4k      64 io_uring      1 323394 312286 -3.4%

Performance is in the same ballpark as without fdmon-io_uring. Results vary
from run to run due to the timing/batching of requests (even with iodepth=1 due
to 8 vCPUs using a single IOThread).

Here is the performance from v1 for reference:
rw        bs iodepth aio    iothread before after  diff
randread  4k       1 native        0  76281  79707 +4.5%
randread  4k      64 native        0 255078 247293 -3.1%
randwrite 4k       1 native        0 132706 123337 -7.1%
randwrite 4k      64 native        0 275589 245192 -11%
randread  4k       1 io_uring      0  75284  78023 +3.5%
randread  4k      64 io_uring      0 254637 248222 -2.5%
randwrite 4k       1 io_uring      0 126519 128641 +1.7%
randwrite 4k      64 io_uring      0 258967 249266 -3.7%
randread  4k       1 native        1  90557  88436 -2.3%
randread  4k      64 native        1 290673 280456 -3.5%
randwrite 4k       1 native        1 183015 169106 -7.6%
randwrite 4k      64 native        1 281316 280078 -0.4%
randread  4k       1 io_uring      1  92479  86983 -5.9%
randread  4k      64 io_uring      1 304229 257730 -15.3%
randwrite 4k       1 io_uring      1 183983 157425 -14.4%
randwrite 4k      64 io_uring      1 299979 264156 -11.9%

This series replaces the following older series that were held off from merging
until the QEMU 10.1 development window opened and the performance results were
collected:
- "[PATCH 0/3] [RESEND] block: unify block and fdmon io_uring"
- "[PATCH 0/4] aio-posix: integrate fdmon into glib event loop"

Stefan Hajnoczi (12):
  aio-posix: fix race between io_uring CQE and AioHandler deletion
  aio-posix: keep polling enabled with fdmon-io_uring.c
  tests/unit: skip test-nested-aio-poll with io_uring
  aio-posix: integrate fdmon into glib event loop
  aio: remove aio_context_use_g_source()
  aio: free AioContext when aio_context_new() fails
  aio: add errp argument to aio_context_setup()
  aio-posix: gracefully handle io_uring_queue_init() failure
  aio-posix: add aio_add_sqe() API for user-defined io_uring requests
  aio-posix: avoid EventNotifier for cqe_handler_bh
  block/io_uring: use aio_add_sqe()
  block/io_uring: use non-vectored read/write when possible

 include/block/aio.h               | 136 +++++++-
 include/block/raw-aio.h           |   5 -
 util/aio-posix.h                  |  18 +-
 block/file-posix.c                |  40 +--
 block/io_uring.c                  | 508 ++++++++----------------------
 stubs/io_uring.c                  |  32 --
 tests/unit/test-aio.c             |   7 +-
 tests/unit/test-nested-aio-poll.c |  13 +-
 util/aio-posix.c                  | 143 +++++----
 util/aio-win32.c                  |   6 +-
 util/async.c                      |  55 +---
 util/fdmon-epoll.c                |  52 ++-
 util/fdmon-io_uring.c             | 219 ++++++++++---
 util/fdmon-poll.c                 |  88 +++++-
 block/trace-events                |  12 +-
 stubs/meson.build                 |   3 -
 util/trace-events                 |   4 +
 17 files changed, 703 insertions(+), 638 deletions(-)
 delete mode 100644 stubs/io_uring.c

-- 
2.49.0



Thread overview: 19+ messages
2025-06-20  0:08 [PATCH v2 00/12] aio: add the aio_add_sqe() io_uring API Stefan Hajnoczi
2025-06-20  0:08 ` [PATCH v2 01/12] aio-posix: fix race between io_uring CQE and AioHandler deletion Stefan Hajnoczi
2025-06-23 20:25   ` Eric Blake
2025-07-02 12:10   ` Kevin Wolf
2025-07-21 18:14     ` Stefan Hajnoczi
2025-07-21 20:47     ` Stefan Hajnoczi
2025-06-20  0:08 ` [PATCH v2 02/12] aio-posix: keep polling enabled with fdmon-io_uring.c Stefan Hajnoczi
2025-06-20  0:08 ` [PATCH v2 03/12] tests/unit: skip test-nested-aio-poll with io_uring Stefan Hajnoczi
2025-06-20  0:08 ` [PATCH v2 04/12] aio-posix: integrate fdmon into glib event loop Stefan Hajnoczi
2025-06-20  0:08 ` [PATCH v2 05/12] aio: remove aio_context_use_g_source() Stefan Hajnoczi
2025-06-20  0:08 ` [PATCH v2 06/12] aio: free AioContext when aio_context_new() fails Stefan Hajnoczi
2025-06-20  0:08 ` [PATCH v2 07/12] aio: add errp argument to aio_context_setup() Stefan Hajnoczi
2025-06-20  0:08 ` [PATCH v2 08/12] aio-posix: gracefully handle io_uring_queue_init() failure Stefan Hajnoczi
2025-06-23 20:39   ` Eric Blake
2025-06-20  0:08 ` [PATCH v2 09/12] aio-posix: add aio_add_sqe() API for user-defined io_uring requests Stefan Hajnoczi
2025-06-20  0:08 ` [PATCH v2 10/12] aio-posix: avoid EventNotifier for cqe_handler_bh Stefan Hajnoczi
2025-06-20  0:08 ` [PATCH v2 11/12] block/io_uring: use aio_add_sqe() Stefan Hajnoczi
2025-06-20  0:08 ` [PATCH v2 12/12] block/io_uring: use non-vectored read/write when possible Stefan Hajnoczi
2025-06-23 20:40   ` Eric Blake
