From: "Michael S. Tsirkin" <mst@redhat.com>
To: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
Anthony Liguori <aliguori@us.ibm.com>,
qemu-devel@nongnu.org, Blue Swirl <blauwirbel@gmail.com>,
khoa@us.ibm.com, Paolo Bonzini <pbonzini@redhat.com>,
asias@redhat.com
Subject: Re: [Qemu-devel] [PATCH v5 00/11] virtio: virtio-blk data plane
Date: Thu, 6 Dec 2012 13:38:28 +0200
Message-ID: <20121206113828.GN10837@redhat.com>
In-Reply-To: <1354740430-22452-1-git-send-email-stefanha@redhat.com>
On Wed, Dec 05, 2012 at 09:46:59PM +0100, Stefan Hajnoczi wrote:
> This series adds the -device virtio-blk-pci,x-data-plane=on property, which
> enables a high-performance I/O code path. A dedicated thread is used to process
> virtio-blk requests outside the global mutex and without going through the QEMU
> block layer.
>
> Khoa Huynh <khoa@us.ibm.com> reported an increase from 140,000 IOPS to 600,000
> IOPS for a single VM using virtio-blk-data-plane in July:
>
> http://comments.gmane.org/gmane.comp.emulators.kvm.devel/94580
>
> The virtio-blk-data-plane approach was originally presented at Linux Plumbers
> Conference 2010. The following slides contain a brief overview:
>
> http://linuxplumbersconf.org/2010/ocw/system/presentations/651/original/Optimizing_the_QEMU_Storage_Stack.pdf
>
> The basic approach is:
> 1. Each virtio-blk device has a thread dedicated to handling ioeventfd
> signalling when the guest kicks the virtqueue.
> 2. Requests are processed without going through the QEMU block layer using
> Linux AIO directly.
> 3. Completion interrupts are injected via irqfd from the dedicated thread.
>
> To try it out:
>
> qemu -drive if=none,id=drive0,cache=none,aio=native,format=raw,file=...
> -device virtio-blk-pci,drive=drive0,scsi=off,x-data-plane=on
>
> Limitations:
> * Only format=raw is supported
> * Live migration is not supported
> * Block jobs, hot unplug, and other operations fail with -EBUSY
> * I/O throttling limits are ignored
> * Only Linux hosts are supported due to Linux AIO usage
>
> The code has reached a stage where I feel it is ready to merge. Users have
> been playing with it for some time and want the significant performance boost.
>
> We are refactoring QEMU to get rid of the global mutex. I believe that
> virtio-blk-data-plane can eventually become the default mode of operation.
>
> Instead of waiting for global mutex removal efforts to finish, I want to use
> virtio-blk-data-plane as an example device for AioContext and threaded hw
> dispatch refactoring. This means:
>
> 1. When the block layer can bind to an AioContext and execute I/O outside the
> global mutex, virtio-blk-data-plane can use this (and gain image format
> support).
>
> 2. When hw dispatch no longer needs the global mutex we can use hw/virtio.c
> again and perhaps run a pool of iothreads instead of dedicated data plane
> threads.
>
> But in the meantime, I have cleaned up the virtio-blk-data-plane code so that
> it can be merged as an experimental feature.

I mostly looked at the virtio side of the patchset.
I don't see any bugs here. I sent some improvement suggestions, but
we can do them in-tree as well.
> v5:
> * Omit memory regions with dirty logging enabled from hostmem [Michael]
> * Add a doc comment about quiescing requests across memory hot unplug [Michael]
> * Clarify which Linux vhost version the vring code originates from [Michael]
> * Break up indirect vring buffers into 1 hostmem_lookup() per descriptor [Michael]
> * Add barriers in hw/dataplane/vring.c to force fields to be loaded [Michael]
> * Split vring_set_notification() into enable/disable functions [Paolo]
> * Put barriers in vring.c instead of virtio-blk.c [Michael]
> * Move setup code from hw/virtio-blk.c into hw/dataplane/virtio-blk.c [Michael]
>
> * Note I did not get rid of the mutex+condvar approach to draining requests.
>   I've had good feedback on the performance of the patch series, so I'm not
>   worried about eliminating the lock (it's very rarely contended). I hope
>   Michael and Paolo are okay with this approach.
>
> v4:
> * Add qemu_iovec_concat_iov() [Paolo]
> * Use QEMUIOVector to copy out virtio_blk_inhdr [Michael, Paolo]
>
> v3:
> * Don't assume iovec layout [Michael]
> * Better naming for hostmem.c MemoryListener callbacks [Don]
> * Quarantine the vring more aggressively when commands are bogus, instead of exiting [Blue]
>
> v2:
> * Use MemoryListener for thread-safe memory mapping [Paolo, Anthony, and everyone else pointed this out ;-)]
> * Quarantine invalid vring instead of exiting [Blue]
> * Replace __u16 kernel types with uint16_t [Blue]
>
> Changes from the RFC v9:
> * Add x-data-plane=on|off option and coexist with regular virtio-blk code
> * Create thread from BH so it inherits iothread cpusets
> * Drain requests on vm_stop() so stopped guest does not access image file
> * Add migration blocker
> * Add bdrv_in_use() to prevent block jobs and other operations that can interfere
> * Drop IOQueue request merging for simplicity
> * Drop ioctl interrupt injection and always use irqfd for simplicity
> * Major cleanup to split up source files
> * Rebase from qemu-kvm.git onto qemu.git
> * Address Michael Tsirkin's review comments
>
> Stefan Hajnoczi (11):
> raw-posix: add raw_get_aio_fd() for virtio-blk-data-plane
> configure: add CONFIG_VIRTIO_BLK_DATA_PLANE
> dataplane: add host memory mapping code
> dataplane: add virtqueue vring code
> dataplane: add event loop
> dataplane: add Linux AIO request queue
> iov: add iov_discard() to remove data
> test-iov: add iov_discard() testcase
> iov: add qemu_iovec_concat_iov()
> dataplane: add virtio-blk data plane code
> virtio-blk: add x-data-plane=on|off performance feature
>
> block.h | 9 +
> block/raw-posix.c | 34 ++++
> configure | 21 ++
> hw/Makefile.objs | 2 +-
> hw/dataplane/Makefile.objs | 3 +
> hw/dataplane/event-poll.c | 109 +++++++++++
> hw/dataplane/event-poll.h | 40 ++++
> hw/dataplane/hostmem.c | 173 +++++++++++++++++
> hw/dataplane/hostmem.h | 57 ++++++
> hw/dataplane/ioq.c | 118 ++++++++++++
> hw/dataplane/ioq.h | 57 ++++++
> hw/dataplane/virtio-blk.c | 463 +++++++++++++++++++++++++++++++++++++++++++++
> hw/dataplane/virtio-blk.h | 43 +++++
> hw/dataplane/vring.c | 361 +++++++++++++++++++++++++++++++++++
> hw/dataplane/vring.h | 63 ++++++
> hw/virtio-blk.c | 28 ++-
> hw/virtio-blk.h | 1 +
> hw/virtio-pci.c | 3 +
> iov.c | 80 ++++++--
> iov.h | 13 ++
> qemu-common.h | 3 +
> tests/test-iov.c | 129 +++++++++++++
> trace-events | 9 +
> 23 files changed, 1805 insertions(+), 14 deletions(-)
> create mode 100644 hw/dataplane/Makefile.objs
> create mode 100644 hw/dataplane/event-poll.c
> create mode 100644 hw/dataplane/event-poll.h
> create mode 100644 hw/dataplane/hostmem.c
> create mode 100644 hw/dataplane/hostmem.h
> create mode 100644 hw/dataplane/ioq.c
> create mode 100644 hw/dataplane/ioq.h
> create mode 100644 hw/dataplane/virtio-blk.c
> create mode 100644 hw/dataplane/virtio-blk.h
> create mode 100644 hw/dataplane/vring.c
> create mode 100644 hw/dataplane/vring.h
>
> --
> 1.8.0.1