From: Stefan Hajnoczi <stefanha@redhat.com>
To: qemu-devel@nongnu.org
Cc: Hanna Reitz <hreitz@redhat.com>, Kevin Wolf <kwolf@redhat.com>,
Stefan Weil <sw@weilnetz.de>, Paolo Bonzini <pbonzini@redhat.com>,
Fam Zheng <fam@euphon.net>,
eblake@redhat.com, Stefano Garzarella <sgarzare@redhat.com>,
qemu-block@nongnu.org, Stefan Hajnoczi <stefanha@redhat.com>,
Aarushi Mehta <mehta.aaru20@gmail.com>,
hibriansong@gmail.com
Subject: [PATCH v2 00/12] aio: add the aio_add_sqe() io_uring API
Date: Thu, 19 Jun 2025 20:08:16 -0400
Message-ID: <20250620000829.1426291-1-stefanha@redhat.com>
v2:
- Performance improvements
- Fix pre_sqe -> prep_sqe typo [Eric]
- Add #endif terminator comment [Eric]
- Fix spacing in aio_ctx_finalize() argument list [Eric]
- Add new "block/io_uring: use non-vectored read/write when possible" patch [Eric]
- Drop Patch 1 because multi-shot POLL_ADD has edge-triggered semantics instead
  of the level-triggered semantics required by QEMU's AioContext APIs. The
  qemu-iotests 308 test case was hanging because block/export/fuse.c relies on
  level-triggered semantics. Luckily the performance reason for switching from
  one-shot to multi-shot has been solved by Patch 2 ("aio-posix: keep polling
  enabled with fdmon-io_uring.c"), so it's okay to use single-shot.
- Add a new Patch 1. It's a bug fix for a use-after-free in fdmon-io_uring.c
  triggered by qemu-iotests iothreads-nbd-export.
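
To illustrate the level-triggered vs edge-triggered distinction behind the
dropped patch, here is a minimal sketch. It uses epoll(7) as an analogy, not
multi-shot POLL_ADD itself, and count_wakeups() is a hypothetical helper that
is not part of this series. It leaves one unread byte in a pipe and counts how
many of two consecutive epoll_wait() calls report the fd as readable.

```c
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Count how many of two back-to-back epoll_wait() calls report a pipe as
 * readable while the pending byte is never consumed.
 *
 * trigger_flags: 0 for level-triggered, EPOLLET for edge-triggered.
 * Returns the number of wakeups, or -1 on setup failure.
 */
int count_wakeups(unsigned int trigger_flags)
{
    int fds[2];
    int epfd;
    int wakeups = -1;
    struct epoll_event ev = { .events = EPOLLIN | trigger_flags };
    struct epoll_event out;

    if (pipe(fds) < 0) {
        return -1;
    }
    epfd = epoll_create1(0);
    if (epfd < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    ev.data.fd = fds[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[0], &ev) == 0 &&
        write(fds[1], "x", 1) == 1) {
        wakeups = 0;
        for (int i = 0; i < 2; i++) {
            /* Timeout 0: just sample readiness, never block */
            int n = epoll_wait(epfd, &out, 1, 0);
            if (n < 0) {
                wakeups = -1;
                break;
            }
            wakeups += n;
        }
    }

    close(epfd);
    close(fds[0]);
    close(fds[1]);
    return wakeups;
}
```

With level-triggered monitoring the unread byte is reported on every call.
With EPOLLET it is reported only once, so a handler that does not drain the fd
in one go is never woken again. That is the kind of hang a caller relying on
level-triggered semantics can run into.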

This patch series contains io_uring improvements:

1. Support the glib event loop in fdmon-io_uring.
   - aio-posix: fix race between io_uring CQE and AioHandler deletion
   - aio-posix: keep polling enabled with fdmon-io_uring.c
   - tests/unit: skip test-nested-aio-poll with io_uring
   - aio-posix: integrate fdmon into glib event loop

2. Enable fdmon-io_uring on hosts where io_uring is available at runtime.
   Otherwise continue using ppoll(2) or epoll(7).
   - aio: remove aio_context_use_g_source()

3. Add the new aio_add_sqe() API for submitting io_uring requests in the QEMU
   event loop.
   - aio: free AioContext when aio_context_new() fails
   - aio: add errp argument to aio_context_setup()
   - aio-posix: gracefully handle io_uring_queue_init() failure
   - aio-posix: add aio_add_sqe() API for user-defined io_uring requests
   - aio-posix: avoid EventNotifier for cqe_handler_bh

4. Use aio_add_sqe() in block/io_uring.c instead of creating a dedicated
   io_uring context for --blockdev aio=io_uring. This simplifies the code,
   reduces the number of file descriptors, and demonstrates the aio_add_sqe()
   API.
   - block/io_uring: use aio_add_sqe()
   - block/io_uring: use non-vectored read/write when possible
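
The idea behind "use non-vectored read/write when possible" can be sketched
with plain POSIX calls. This is only an analogy: the patch itself works on
io_uring requests, and read_buffers() below is a hypothetical helper. When a
request has exactly one buffer, the scalar call is sufficient and the iovec
array does not need to be handed to the kernel.

```c
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Read into the buffers described by iov[0..niov-1].
 *
 * A single buffer is read with read(2); only requests that really have
 * several buffers take the vectored readv(2) path.
 */
ssize_t read_buffers(int fd, const struct iovec *iov, int niov)
{
    if (niov == 1) {
        return read(fd, iov[0].iov_base, iov[0].iov_len);
    }
    return readv(fd, iov, niov);
}
```

Both paths return the same byte count for the same data, so callers do not
need to know which one was taken.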
The highlight is aio_add_sqe(), which is needed for the FUSE-over-io_uring
Google Summer of Code project and other future QEMU features that natively use
Linux io_uring functionality.
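
For readers who have not looked at the patches yet, here is a toy model of the
submit/complete pattern that an API like aio_add_sqe() implies: the caller
supplies a callback that fills in a submission entry, plus a handler that runs
when the completion arrives. This cover letter does not show the real
signature, so every name below (ToySqe, ToyCqe, ToyCqeHandler, ToyRing,
toy_add_sqe(), toy_process_completions(), ReadReq) is hypothetical, and no
actual io_uring is involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for io_uring submission/completion entries */
typedef struct ToySqe { int opcode; int fd; void *user_data; } ToySqe;
typedef struct ToyCqe { int res; void *user_data; } ToyCqe;

typedef struct ToyCqeHandler ToyCqeHandler;
struct ToyCqeHandler {
    void (*cb)(ToyCqeHandler *handler); /* invoked on completion */
    ToyCqe cqe;                          /* filled in before cb() runs */
};

#define TOY_RING_SIZE 8

typedef struct ToyRing {
    ToySqe sq[TOY_RING_SIZE];
    unsigned sq_len;
} ToyRing;

/* Queue a request: prep_sqe() fills in the entry. Returns -1 if full. */
int toy_add_sqe(ToyRing *ring, void (*prep_sqe)(ToySqe *sqe, void *opaque),
                void *opaque, ToyCqeHandler *handler)
{
    ToySqe *sqe;

    assert(prep_sqe != NULL && handler != NULL);
    if (ring->sq_len == TOY_RING_SIZE) {
        return -1;
    }
    sqe = &ring->sq[ring->sq_len++];
    *sqe = (ToySqe){0};
    prep_sqe(sqe, opaque);
    sqe->user_data = handler; /* lets the completion find its handler */
    return 0;
}

/* Pretend every queued request completed with result res. */
unsigned toy_process_completions(ToyRing *ring, int res)
{
    unsigned n = ring->sq_len;

    for (unsigned i = 0; i < n; i++) {
        ToyCqeHandler *handler = ring->sq[i].user_data;
        handler->cqe = (ToyCqe){ .res = res, .user_data = handler };
        handler->cb(handler);
    }
    ring->sq_len = 0;
    return n;
}

/* Example user: a read request that embeds its handler as first member */
typedef struct ReadReq {
    ToyCqeHandler handler;
    int fd;
    int result;
    int done;
} ReadReq;

void read_prep(ToySqe *sqe, void *opaque)
{
    ReadReq *req = opaque;
    sqe->opcode = 1; /* arbitrary "read" opcode for the toy ring */
    sqe->fd = req->fd;
}

void read_done(ToyCqeHandler *handler)
{
    ReadReq *req = (ReadReq *)handler; /* valid: handler is first member */
    req->result = handler->cqe.res;
    req->done = 1;
}
```

A caller would initialize a ReadReq with .handler.cb = read_done, pass
read_prep and the request to toy_add_sqe(), and later find the result in
req.result once completions are processed. Embedding the handler in the
request avoids a separate allocation per request.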

rw        bs iodepth  aio      iothread  before   after    diff
randread  4k       1  native          0   78353   84860   +8.3%
randread  4k      64  native          0  262370  269823   +2.8%
randwrite 4k       1  native          0  142703  144348   +1.2%
randwrite 4k      64  native          0  259947  263895   +1.5%
randread  4k       1  io_uring        0   76883   78270   +1.8%
randread  4k      64  io_uring        0  269712  250513   -7.1%
randwrite 4k       1  io_uring        0  143657  131481   -8.5%
randwrite 4k      64  io_uring        0  274461  264785   -3.5%
randread  4k       1  native          1   84080   84097    0.0%
randread  4k      64  native          1  314650  311193   -1.1%
randwrite 4k       1  native          1  172463  159993   -7.2%
randwrite 4k      64  native          1  303091  299726   -1.1%
randread  4k       1  io_uring        1   83415   84081   +0.8%
randread  4k      64  io_uring        1  324797  318429   -2.0%
randwrite 4k       1  io_uring        1  174421  172809   -0.9%
randwrite 4k      64  io_uring        1  323394  312286   -3.4%

Performance is in the same ballpark as without fdmon-io_uring. Results vary
from run to run due to the timing/batching of requests (even with iodepth=1 due
to 8 vCPUs using a single IOThread).
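
For anyone re-running the benchmarks, the diff column appears to be the
relative change of "after" with respect to "before". A minimal sketch, where
percent_change() is a hypothetical helper and not something from the series:

```c
/* Relative change of after vs before, in percent (before must be non-zero) */
double percent_change(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

For example, before=78353 and after=84860 gives roughly +8.3, matching the
first row of the table.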
Here is the performance from v1 for reference:

rw        bs iodepth  aio      iothread  before   after    diff
randread  4k       1  native          0   76281   79707   +4.5%
randread  4k      64  native          0  255078  247293   -3.1%
randwrite 4k       1  native          0  132706  123337   -7.1%
randwrite 4k      64  native          0  275589  245192    -11%
randread  4k       1  io_uring        0   75284   78023   +3.5%
randread  4k      64  io_uring        0  254637  248222   -2.5%
randwrite 4k       1  io_uring        0  126519  128641   +1.7%
randwrite 4k      64  io_uring        0  258967  249266   -3.7%
randread  4k       1  native          1   90557   88436   -2.3%
randread  4k      64  native          1  290673  280456   -3.5%
randwrite 4k       1  native          1  183015  169106   -7.6%
randwrite 4k      64  native          1  281316  280078   -0.4%
randread  4k       1  io_uring        1   92479   86983   -5.9%
randread  4k      64  io_uring        1  304229  257730  -15.3%
randwrite 4k       1  io_uring        1  183983  157425  -14.4%
randwrite 4k      64  io_uring        1  299979  264156  -11.9%

This series replaces the following older series that were held off from merging
until the QEMU 10.1 development window opened and the performance results were
collected:
- "[PATCH 0/3] [RESEND] block: unify block and fdmon io_uring"
- "[PATCH 0/4] aio-posix: integrate fdmon into glib event loop"

Stefan Hajnoczi (12):
  aio-posix: fix race between io_uring CQE and AioHandler deletion
  aio-posix: keep polling enabled with fdmon-io_uring.c
  tests/unit: skip test-nested-aio-poll with io_uring
  aio-posix: integrate fdmon into glib event loop
  aio: remove aio_context_use_g_source()
  aio: free AioContext when aio_context_new() fails
  aio: add errp argument to aio_context_setup()
  aio-posix: gracefully handle io_uring_queue_init() failure
  aio-posix: add aio_add_sqe() API for user-defined io_uring requests
  aio-posix: avoid EventNotifier for cqe_handler_bh
  block/io_uring: use aio_add_sqe()
  block/io_uring: use non-vectored read/write when possible

 include/block/aio.h               | 136 +++++++-
 include/block/raw-aio.h           |   5 -
 util/aio-posix.h                  |  18 +-
 block/file-posix.c                |  40 +--
 block/io_uring.c                  | 508 ++++++++----------------------
 stubs/io_uring.c                  |  32 --
 tests/unit/test-aio.c             |   7 +-
 tests/unit/test-nested-aio-poll.c |  13 +-
 util/aio-posix.c                  | 143 +++++----
 util/aio-win32.c                  |   6 +-
 util/async.c                      |  55 +---
 util/fdmon-epoll.c                |  52 ++-
 util/fdmon-io_uring.c             | 219 ++++++++++---
 util/fdmon-poll.c                 |  88 +++++-
 block/trace-events                |  12 +-
 stubs/meson.build                 |   3 -
 util/trace-events                 |   4 +
 17 files changed, 703 insertions(+), 638 deletions(-)
 delete mode 100644 stubs/io_uring.c

--
2.49.0

Thread overview: 19+ messages
2025-06-20 0:08 Stefan Hajnoczi [this message]
2025-06-20 0:08 ` [PATCH v2 01/12] aio-posix: fix race between io_uring CQE and AioHandler deletion Stefan Hajnoczi
2025-06-23 20:25 ` Eric Blake
2025-07-02 12:10 ` Kevin Wolf
2025-07-21 18:14 ` Stefan Hajnoczi
2025-07-21 20:47 ` Stefan Hajnoczi
2025-06-20 0:08 ` [PATCH v2 02/12] aio-posix: keep polling enabled with fdmon-io_uring.c Stefan Hajnoczi
2025-06-20 0:08 ` [PATCH v2 03/12] tests/unit: skip test-nested-aio-poll with io_uring Stefan Hajnoczi
2025-06-20 0:08 ` [PATCH v2 04/12] aio-posix: integrate fdmon into glib event loop Stefan Hajnoczi
2025-06-20 0:08 ` [PATCH v2 05/12] aio: remove aio_context_use_g_source() Stefan Hajnoczi
2025-06-20 0:08 ` [PATCH v2 06/12] aio: free AioContext when aio_context_new() fails Stefan Hajnoczi
2025-06-20 0:08 ` [PATCH v2 07/12] aio: add errp argument to aio_context_setup() Stefan Hajnoczi
2025-06-20 0:08 ` [PATCH v2 08/12] aio-posix: gracefully handle io_uring_queue_init() failure Stefan Hajnoczi
2025-06-23 20:39 ` Eric Blake
2025-06-20 0:08 ` [PATCH v2 09/12] aio-posix: add aio_add_sqe() API for user-defined io_uring requests Stefan Hajnoczi
2025-06-20 0:08 ` [PATCH v2 10/12] aio-posix: avoid EventNotifier for cqe_handler_bh Stefan Hajnoczi
2025-06-20 0:08 ` [PATCH v2 11/12] block/io_uring: use aio_add_sqe() Stefan Hajnoczi
2025-06-20 0:08 ` [PATCH v2 12/12] block/io_uring: use non-vectored read/write when possible Stefan Hajnoczi
2025-06-23 20:40 ` Eric Blake