From: Stefan Hajnoczi <stefanha@redhat.com>
To: qemu-devel@nongnu.org
Cc: Fam Zheng <fam@euphon.net>, Paolo Bonzini <pbonzini@redhat.com>,
    Kevin Wolf <kwolf@redhat.com>, qemu-block@nongnu.org,
    Max Reitz <mreitz@redhat.com>
Subject: Re: [PATCH 0/7] aio-posix: polling scalability improvements
Date: Mon, 9 Mar 2020 16:47:01 +0000
Message-ID: <20200309164701.GA46812@stefanha-x1.localdomain>
In-Reply-To: <20200305170806.1313245-1-stefanha@redhat.com>
On Thu, Mar 05, 2020 at 05:07:59PM +0000, Stefan Hajnoczi wrote:
> A guest with 100 virtio-blk-pci,num-queues=32 devices only reaches 10k IOPS
> while a guest with a single device reaches 105k IOPS
> (rw=randread,bs=4k,iodepth=1,ioengine=libaio).
>
> The bottleneck is that aio_poll() userspace polling iterates over all
> AioHandlers to invoke their ->io_poll() callbacks. All AioHandlers are polled
> even if only one of them was recently active. Therefore a guest with many
> disks is slower than a guest with a single disk even when the workload only
> accesses a single disk.
>
> This patch series solves this scalability problem so that IOPS is unaffected by
> the number of devices. The trick is to poll only AioHandlers that were
> recently active so that userspace polling scales well.
>
> Unfortunately it's not possible to accomplish this with the existing epoll(7)
> fd monitoring implementation. This patch series adds a Linux io_uring fd
> monitoring implementation. The critical feature is that io_uring can check the
> readiness of file descriptors through userspace polling. This makes it
> possible to safely poll a subset of AioHandlers from userspace without risk of
> starving the other AioHandlers.
>
> Stefan Hajnoczi (7):
>   aio-posix: completely stop polling when disabled
>   aio-posix: move RCU_READ_LOCK() into run_poll_handlers()
>   aio-posix: extract ppoll(2) and epoll(7) fd monitoring
>   aio-posix: simplify FDMonOps->update() prototype
>   aio-posix: add io_uring fd monitoring implementation
>   aio-posix: support userspace polling of fd monitoring
>   aio-posix: remove idle poll handlers to improve scalability
>
>  MAINTAINERS           |   2 +
>  configure             |   5 +
>  include/block/aio.h   |  70 ++++++-
>  util/Makefile.objs    |   3 +
>  util/aio-posix.c      | 449 ++++++++++++++----------------------------
>  util/aio-posix.h      |  81 ++++++++
>  util/fdmon-epoll.c    | 155 +++++++++++++++
>  util/fdmon-io_uring.c | 332 +++++++++++++++++++++++++++++++
>  util/fdmon-poll.c     | 107 ++++++++++
>  util/trace-events     |   2 +
>  10 files changed, 898 insertions(+), 308 deletions(-)
>  create mode 100644 util/aio-posix.h
>  create mode 100644 util/fdmon-epoll.c
>  create mode 100644 util/fdmon-io_uring.c
>  create mode 100644 util/fdmon-poll.c
>
> --
> 2.24.1
>
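For readers following the thread, here is a minimal sketch of the "poll only recently active handlers" idea from the cover letter. It is an illustration under stated assumptions, not the code from the series: Handler, PollList, poll_list_add(), poll_list_run(), poll_pending() and the IDLE_TIMEOUT_NS value are all hypothetical, and the fd monitoring side (io_uring in the series) that keeps dropped handlers from starving is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an AioHandler; not QEMU's actual struct. */
typedef struct Handler Handler;
struct Handler {
    bool (*io_poll)(void *opaque); /* returns true if progress was made */
    void *opaque;
    int64_t last_active_ns;        /* last time the handler saw activity */
    bool on_poll_list;             /* currently polled from userspace? */
    Handler *next_poll;            /* link in the poll list */
};

/* List of handlers that are worth polling from userspace. */
typedef struct PollList {
    Handler *head;
} PollList;

/* Assumed idle threshold, chosen only for this sketch. */
#define IDLE_TIMEOUT_NS INT64_C(10000000) /* 10 ms */

/* Example callback: opaque points to an int counting pending events. */
static bool poll_pending(void *opaque)
{
    int *pending = opaque;
    if (*pending > 0) {
        (*pending)--;
        return true;
    }
    return false;
}

/* Call when fd monitoring reports activity: (re)start polling the handler. */
static void poll_list_add(PollList *list, Handler *h, int64_t now_ns)
{
    assert(h->io_poll != NULL);
    h->last_active_ns = now_ns;
    if (!h->on_poll_list) {
        h->next_poll = list->head;
        list->head = h;
        h->on_poll_list = true;
    }
}

/*
 * Poll every handler on the list once. A handler that makes no progress
 * and has been idle for longer than IDLE_TIMEOUT_NS is unlinked, so the
 * cost of a pass depends on the number of recently active handlers and
 * not on the total number of handlers. Returns how many were polled.
 */
static size_t poll_list_run(PollList *list, int64_t now_ns, bool *progress)
{
    size_t polled = 0;
    Handler **link = &list->head;

    *progress = false;
    while (*link) {
        Handler *h = *link;

        polled++;
        if (h->io_poll(h->opaque)) {
            h->last_active_ns = now_ns;
            *progress = true;
        } else if (now_ns - h->last_active_ns > IDLE_TIMEOUT_NS) {
            *link = h->next_poll; /* unlink the idle handler */
            h->next_poll = NULL;
            h->on_poll_list = false;
            continue;
        }
        link = &h->next_poll;
    }
    return polled;
}
```

In this sketch a caller would invoke poll_list_add() whenever the fd monitor signals that a handler's file descriptor became ready, and poll_list_run() from the polling loop. A dropped handler is only safe to ignore because something else still watches its fd, which is the role the cover letter assigns to io_uring.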
Thanks, applied to my block tree:
https://github.com/stefanha/qemu/commits/block
Stefan