qemu-devel.nongnu.org archive mirror
From: Paolo Bonzini <pbonzini@redhat.com>
To: Ming Lin <mlin@kernel.org>
Cc: qemu-devel@nongnu.org, Rob Nelson <rlnelson@google.com>,
	Christoph Hellwig <hch@lst.de>,
	linux-nvme@lists.infradead.org,
	virtualization@lists.linux-foundation.org
Subject: Re: [Qemu-devel] [RFC PATCH 0/9] vhost-nvme: new qemu nvme backend using nvme target
Date: Wed, 2 Dec 2015 11:07:36 +0100
Message-ID: <565EC2E8.9010508@redhat.com>
In-Reply-To: <1449033191.3041.11.camel@hasee>



On 02/12/2015 06:13, Ming Lin wrote:
> On Tue, 2015-12-01 at 11:59 -0500, Paolo Bonzini wrote:
>>> What do you think about virtio-nvme+vhost-nvme?
>>
>> What would be the advantage over virtio-blk?  Multiqueue is not supported
>> by QEMU but it's already supported by Linux (commit 6a27b656fc).
> 
> I expect performance would be better.

Why?  nvme and virtio-blk are almost the same, even more so with the
doorbell extension.  virtio is designed to only hit paths that are not
slowed down by virtualization.  It's really hard to do better, except
perhaps with VFIO (and then you don't need your vendor extension).

>> To me, the advantage of nvme is that it provides more than decent performance on
>> unmodified Windows guests, and thanks to your vendor extension can be used
>> on Linux as well with speeds comparable to virtio-blk.  So it's potentially
>> a very good choice for a cloud provider that wants to support Windows guests
>> (together with e.g. a fast SAS emulated controller to replace virtio-scsi,
>> and emulated igb or ixgbe to replace virtio-net).
> 
> The vhost-nvme patches are modeled on rts-megasas, which could possibly
> serve as a fast emulated SAS controller.
> https://github.com/Datera/rts-megasas

Why the hate for userspace? :)

I don't see a reason why vhost-nvme would be faster than a userspace
implementation.  vhost-blk was never committed upstream for similar
reasons: it lost all the userspace features (snapshots, storage
migration, etc.), which are nice to have and cost no performance when
unused, without offering any compelling performance gain.

Without the doorbell extension you'd have to go back to userspace on
every write and ioctl to vhost (see MEGASAS_IOC_FRAME in rts-megasas).
With the doorbell extension you're doing exactly the same work, and then
kernel thread vs. userspace thread shouldn't matter much given similar
optimization effort.  A userspace NVMe implementation, however, gets
every optimization made to QEMU's block layer for free.  We have done
a lot and have more planned.
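
A rough illustration of the doorbell argument, as a toy model rather than
anything taken from the patches: it assumes the doorbell extension works as
a shadow doorbell in shared memory plus a host-written event index, in the
style of virtio's event index.  The class and function names are
hypothetical, and queue wraparound is ignored.

```python
class DoorbellQueue:
    """Toy model of one submission queue that counts trapped doorbell writes."""

    def __init__(self, use_shadow):
        self.use_shadow = use_shadow
        self.tail = 0       # guest's submission tail (shared memory when shadowed)
        self.head = 0       # how far the host has consumed
        self.event_idx = 0  # host-written: "notify me once the tail moves past this"
        self.exits = 0      # doorbell writes that trap to the host

    def guest_submit(self):
        old_tail = self.tail
        self.tail += 1
        if not self.use_shadow:
            # Plain MMIO doorbell: every tail update is a trapped write.
            self.exits += 1
        elif old_tail <= self.event_idx < self.tail:
            # Assumed shadow doorbell: trap only when the tail crosses the
            # index at which the host said it went idle.
            self.exits += 1

    def host_drain(self):
        # Host consumes everything queued, then asks to be woken on the next entry.
        self.head = self.tail
        self.event_idx = self.tail


def count_exits(use_shadow, bursts, burst_size):
    """Submit `bursts` bursts of `burst_size` commands, draining after each burst."""
    q = DoorbellQueue(use_shadow)
    for _ in range(bursts):
        for _ in range(burst_size):
            q.guest_submit()
        q.host_drain()
    return q.exits
```

In this model, two bursts of four commands cost eight trapped writes with
the plain doorbell and two with the shadow doorbell.  Whether the consumer
of the queue is a kernel thread or a userspace thread does not enter into
the count, which is the point made above.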

>> Which features are supported by NVMe and not virtio-blk?

Having read the driver, the main improvements of NVMe compared to
virtio-blk are support for discard and FUA.  Discard is easy to add to
virtio-blk.  In the past the idea was "just use virtio-scsi", but it may
be worth adding it now that SSDs are more common.

Thus, FUA is pretty much the only reason for a kernel-based
implementation, because it is not exposed to userspace.  However, does
it actually make a difference on real-world workloads?  Local SSDs on
Google Cloud are not even persistent, so you never need to flush to them.
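
To make the FUA point concrete: an ordinary userspace write call carries no
per-request FUA flag, so one common workaround is a write followed by a
data sync.  The sketch below shows that workaround only; the function name
is hypothetical and this is not QEMU's code.  Note that the sync covers
all of the file's pending data, not only this one request, which is the
cost a true FUA write would avoid.

```python
import os
import tempfile


def write_fua_emulated(fd, data, offset):
    """Write `data` at `offset`, returning only after a data sync completes."""
    view = memoryview(data)
    done = 0
    # pwrite may write fewer bytes than requested, so loop until finished.
    while done < len(view):
        done += os.pwrite(fd, view[done:], offset + done)
    # No per-write FUA flag here: sync the file's data instead.
    os.fdatasync(fd)
    return done


if __name__ == "__main__":
    with tempfile.TemporaryFile() as f:
        n = write_fua_emulated(f.fileno(), b"commit record", 4096)
        print(n, os.pread(f.fileno(), n, 4096))
```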

Paolo

Thread overview: 30+ messages
2015-11-20  0:20 [Qemu-devel] [RFC PATCH 0/9] vhost-nvme: new qemu nvme backend using nvme target Ming Lin
2015-11-20  0:21 ` [Qemu-devel] [RFC PATCH 1/9] nvme-vhost: add initial commit Ming Lin
2015-11-20  0:21 ` [Qemu-devel] [RFC PATCH 2/9] nvme-vhost: add basic ioctl handlers Ming Lin
2015-11-20  0:21 ` [Qemu-devel] [RFC PATCH 3/9] nvme-vhost: add basic nvme bar read/write Ming Lin
2015-11-20  0:21 ` [Qemu-devel] [RFC PATCH 4/9] nvmet: add a controller "start" hook Ming Lin
2015-11-20  5:13   ` Christoph Hellwig
2015-11-20  5:31     ` Ming Lin
2015-11-20  0:21 ` [Qemu-devel] [RFC PATCH 5/9] nvme-vhost: add controller "start" callback Ming Lin
2015-11-20  0:21 ` [Qemu-devel] [RFC PATCH 6/9] nvmet: add a "parse_extra_admin_cmd" hook Ming Lin
2015-11-20  0:21 ` [Qemu-devel] [RFC PATCH 7/9] nvme-vhost: add "parse_extra_admin_cmd" callback Ming Lin
2015-11-20  0:21 ` [Qemu-devel] [RFC PATCH 8/9] nvme-vhost: add vhost memory helpers Ming Lin
2015-11-20  0:21 ` [Qemu-devel] [RFC PATCH 9/9] nvme-vhost: add nvme queue handlers Ming Lin
2015-11-20  5:16 ` [Qemu-devel] [RFC PATCH 0/9] vhost-nvme: new qemu nvme backend using nvme target Christoph Hellwig
2015-11-20  5:33   ` Ming Lin
2015-11-21 13:11 ` Paolo Bonzini
2015-11-23  8:17   ` Ming Lin
2015-11-23 14:14     ` Paolo Bonzini
2015-11-24  7:27       ` Ming Lin
2015-11-24  8:23         ` Ming Lin
2015-11-24 10:51         ` Paolo Bonzini
2015-11-24 19:25           ` Ming Lin
2015-11-25 11:27             ` Paolo Bonzini
2015-11-25 18:51               ` Ming Lin
2015-11-25 19:32                 ` Paolo Bonzini
2015-11-30 23:20       ` Ming Lin
2015-12-01 16:02         ` Paolo Bonzini
2015-12-01 16:26           ` Ming Lin
2015-12-01 16:59             ` Paolo Bonzini
2015-12-02  5:13               ` Ming Lin
2015-12-02 10:07                 ` Paolo Bonzini [this message]
