From: Stefan Hajnoczi <stefanha@redhat.com>
To: qemu-devel@nongnu.org
Cc: Fam Zheng <fam@euphon.net>, Paolo Bonzini <pbonzini@redhat.com>,
	Kevin Wolf <kwolf@redhat.com>,
	qemu-block@nongnu.org, Max Reitz <mreitz@redhat.com>
Subject: Re: [PATCH 0/7] aio-posix: polling scalability improvements
Date: Mon, 9 Mar 2020 16:47:01 +0000
Message-ID: <20200309164701.GA46812@stefanha-x1.localdomain>
In-Reply-To: <20200305170806.1313245-1-stefanha@redhat.com>

On Thu, Mar 05, 2020 at 05:07:59PM +0000, Stefan Hajnoczi wrote:
> A guest with 100 virtio-blk-pci,num-queues=32 devices only reaches 10k IOPS
> while a guest with a single device reaches 105k IOPS
> (rw=randread,bs=4k,iodepth=1,ioengine=libaio).
> 
> The bottleneck is that aio_poll() userspace polling iterates over all
> AioHandlers to invoke their ->io_poll() callbacks.  All AioHandlers are polled
> even if only one of them was recently active.  Therefore a guest with many
> disks is slower than a guest with a single disk even when the workload only
> accesses a single disk.
> 
> This patch series solves this scalability problem so that IOPS is unaffected by
> the number of devices.  The trick is to poll only AioHandlers that were
> recently active so that userspace polling scales well.
> 
> Unfortunately it's not possible to accomplish this with the existing epoll(7)
> fd monitoring implementation.  This patch series adds a Linux io_uring fd
> monitoring implementation.  The critical feature is that io_uring can check the
> readiness of file descriptors through userspace polling.  This makes it
> possible to safely poll a subset of AioHandlers from userspace without risk of
> starving the other AioHandlers.
> 
> Stefan Hajnoczi (7):
>   aio-posix: completely stop polling when disabled
>   aio-posix: move RCU_READ_LOCK() into run_poll_handlers()
>   aio-posix: extract ppoll(2) and epoll(7) fd monitoring
>   aio-posix: simplify FDMonOps->update() prototype
>   aio-posix: add io_uring fd monitoring implementation
>   aio-posix: support userspace polling of fd monitoring
>   aio-posix: remove idle poll handlers to improve scalability
> 
>  MAINTAINERS           |   2 +
>  configure             |   5 +
>  include/block/aio.h   |  70 ++++++-
>  util/Makefile.objs    |   3 +
>  util/aio-posix.c      | 449 ++++++++++++++----------------------------
>  util/aio-posix.h      |  81 ++++++++
>  util/fdmon-epoll.c    | 155 +++++++++++++++
>  util/fdmon-io_uring.c | 332 +++++++++++++++++++++++++++++++
>  util/fdmon-poll.c     | 107 ++++++++++
>  util/trace-events     |   2 +
>  10 files changed, 898 insertions(+), 308 deletions(-)
>  create mode 100644 util/aio-posix.h
>  create mode 100644 util/fdmon-epoll.c
>  create mode 100644 util/fdmon-io_uring.c
>  create mode 100644 util/fdmon-poll.c
> 
> -- 
> 2.24.1
> 
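
For readers skimming the cover letter, here is a minimal sketch of the
"poll only recently active handlers" idea.  It is not the code from the
series: Handler, PollSet, poll_set_add(), poll_set_run_once(), flag_poll()
and POLL_IDLE_TIMEOUT_NS are hypothetical names chosen for illustration,
and the fd monitoring that keeps dropped handlers from starving (io_uring
in the series) is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical idle threshold; the value is an assumption, not from the series. */
#define POLL_IDLE_TIMEOUT_NS 1000000

typedef struct Handler Handler;
struct Handler {
    bool (*io_poll)(void *opaque); /* returns true if progress was made */
    void *opaque;
    int64_t last_active_ns;        /* last time io_poll() made progress */
    bool on_poll_list;
    Handler *next;
};

/* Only handlers on this list are polled from userspace. */
typedef struct {
    Handler *poll_list;
    size_t poll_count;
} PollSet;

/* Mark a handler as active at time 'now' and make sure it is polled. */
static void poll_set_add(PollSet *s, Handler *h, int64_t now)
{
    assert(h->io_poll != NULL);
    h->last_active_ns = now;
    if (!h->on_poll_list) {
        h->next = s->poll_list;
        s->poll_list = h;
        h->on_poll_list = true;
        s->poll_count++;
    }
}

/*
 * One polling pass.  Handlers that made progress have their timestamp
 * refreshed.  Handlers that have been idle for POLL_IDLE_TIMEOUT_NS or
 * longer are unlinked, so later passes cost O(active), not O(all).
 * A real implementation must still watch the unlinked handlers' fds
 * some other way; that part is omitted here.
 */
static bool poll_set_run_once(PollSet *s, int64_t now)
{
    bool progress = false;
    Handler **pp = &s->poll_list;

    while (*pp) {
        Handler *h = *pp;

        if (h->io_poll(h->opaque)) {
            h->last_active_ns = now;
            progress = true;
        } else if (now - h->last_active_ns >= POLL_IDLE_TIMEOUT_NS) {
            *pp = h->next;          /* unlink without advancing pp */
            h->next = NULL;
            h->on_poll_list = false;
            s->poll_count--;
            continue;
        }
        pp = &h->next;
    }
    return progress;
}

/* Hypothetical io_poll callback: consumes one unit of pending work. */
static bool flag_poll(void *opaque)
{
    int *pending = opaque;

    if (*pending > 0) {
        (*pending)--;
        return true;
    }
    return false;
}
```

A caller would add a handler with poll_set_add() whenever its fd becomes
ready, then call poll_set_run_once() in the polling loop with a monotonic
timestamp.  With 100 devices and one busy disk, the list shrinks to the
busy handler after the idle timeout expires.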

Thanks, applied to my block tree:
https://github.com/stefanha/qemu/commits/block

Stefan
