From: mlin@kernel.org (Ming Lin)
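To: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
Cc: "Minturn, Dave B" <dave.b.minturn@intel.com>,
    linux-nvme@lists.infradead.org,
    Linux Virtualization <virtualization@lists.linux-foundation.org>,
    target-devel <target-devel@vger.kernel.org>,
    Christoph Hellwig <hch@lst.de>
Subject: Re: [RFC PATCH 0/2] virtio nvme
Date: Wed, 23 Sep 2015 15:58:17 -0700
Message-ID: <1443049097.28503.13.camel@ssi>
In-Reply-To: <1442610544.10492.33.camel@haakon3.risingtidesystems.com>
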
On Fri, 2015-09-18 at 14:09 -0700, Nicholas A. Bellinger wrote:
> On Fri, 2015-09-18 at 11:12 -0700, Ming Lin wrote:
> > On Thu, 2015-09-17 at 17:55 -0700, Nicholas A. Bellinger wrote:
> > > On Thu, 2015-09-17 at 16:31 -0700, Ming Lin wrote:
> > > > On Wed, 2015-09-16 at 23:10 -0700, Nicholas A. Bellinger wrote:
> > > > > Hi Ming & Co,
>
> <SNIP>
>
> > > > > > I think the future "LIO NVMe target" only speaks NVMe protocol.
> > > > > >
> > > > > > Nick(CCed), could you correct me if I'm wrong?
> > > > > >
> > > > > > For SCSI stack, we have:
> > > > > > virtio-scsi(guest)
> > > > > > tcm_vhost(or vhost_scsi, host)
> > > > > > LIO-scsi-target
> > > > > >
> > > > > > For NVMe stack, we'll have similar components:
> > > > > > virtio-nvme(guest)
> > > > > > vhost_nvme(host)
> > > > > > LIO-NVMe-target
> > > > > >
> > > > >
> > > > > I think it's more interesting to consider a 'vhost style' driver that
> > > > > can be used with unmodified nvme host OS drivers.
> > > > >
> > > > > Dr. Hannes (CC'ed) had done something like this for megasas a few years
> > > > > back using specialized QEMU emulation + eventfd based LIO fabric driver,
> > > > > and got it working with Linux + MSFT guests.
> > > > >
> > > > > Doing something similar for nvme would (potentially) be on par with
> > > > > current virtio-scsi+vhost-scsi small-block performance for scsi-mq
> > > > > guests, without the extra burden of a new command set specific virtio
> > > > > driver.
> > > >
> > > > Trying to understand it.
> > > > Is it like below?
> > > >
> > > > .------------------------.   MMIO   .---------------------------------------.
> > > > | Guest                  |--------> | Qemu                                  |
> > > > | Unmodified NVMe driver |<-------- | NVMe device simulation(eventfd based) |
> > > > '------------------------'          '---------------------------------------'
> > > >                                                   |          ^
> > > >                                        write NVMe |          | notify command
> > > >                                        command    |          | completion
> > > >                                        to eventfd |          | to eventfd
> > > >                                                   v          |
> > > >                                     .--------------------------------------.
> > > >                                     | Host:                                |
> > > >                                     | eventfd based LIO NVMe fabric driver |
> > > >                                     '--------------------------------------'
> > > >                                                        |
> > > >                                                        | nvme_queue_rq()
> > > >                                                        v
> > > >                                     .--------------------------------------.
> > > >                                     | NVMe driver                          |
> > > >                                     '--------------------------------------'
> > > >                                                        |
> > > >                                                        |
> > > >                                                        v
> > > >                                     .--------------------------------------.
> > > >                                     | NVMe device                          |
> > > >                                     '--------------------------------------'
> > > >
> > >
> > > Correct. The LIO driver on KVM host would be handling some amount of
> > > NVMe host interface emulation in kernel code, and would be able to
> > > decode nvme Read/Write/Flush operations and translate -> submit to
> > > existing backend drivers.
> >
> > Let me call the "eventfd based LIO NVMe fabric driver" as
> > "tcm_eventfd_nvme"
> >
> > Currently, LIO frontend driver(iscsi, fc, vhost-scsi etc) talk to LIO
> > backend driver(fileio, iblock etc) with SCSI commands.
> >
> > Did you mean the "tcm_eventfd_nvme" driver need to translate NVMe
> > commands to SCSI commands and then submit to backend driver?
> >
>
> IBLOCK + FILEIO + RD_MCP don't speak SCSI, they simply process I/Os with
> LBA + length based on SGL memory or pass along a FLUSH with LBA +
> length.
>
> So once the 'tcm_eventfd_nvme' driver on KVM host receives a nvme host
> hardware frame via eventfd, it would decode the frame and send along the
> Read/Write/Flush when exposing existing (non nvme native) backend
> drivers.
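To make the "decode the frame" step concrete, here is a minimal C sketch of how a fabric driver such as the proposed tcm_eventfd_nvme might turn a 64-byte NVMe submission queue entry into the (operation, LBA, length) tuple that IBLOCK/FILEIO work with. The opcodes (Flush 00h, Write 01h, Read 02h), the SLBA in CDW10/CDW11 and the 0-based NLB in CDW12 bits 15:0 follow the NVMe 1.x specification; the struct and function names are hypothetical, not LIO code, and mapping PRP1/PRP2 into an SGL is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 64-byte NVMe submission queue entry (NVMe 1.x layout; little-endian host assumed). */
struct nvme_sqe {
    uint32_t cdw0;          /* bits 7:0 opcode, 31:16 command identifier */
    uint32_t nsid;          /* namespace ID */
    uint32_t cdw2, cdw3;
    uint64_t mptr;          /* metadata pointer */
    uint64_t prp1, prp2;    /* data pointer (not decoded in this sketch) */
    uint32_t cdw10, cdw11, cdw12, cdw13, cdw14, cdw15;
};
static_assert(sizeof(struct nvme_sqe) == 64, "SQE must be 64 bytes");

/* Hypothetical backend-neutral request: roughly what IBLOCK/FILEIO consume. */
enum io_op { IO_FLUSH, IO_WRITE, IO_READ, IO_UNSUPPORTED };

struct io_req {
    enum io_op op;
    uint32_t   nsid;
    uint64_t   slba;     /* starting LBA */
    uint32_t   nblocks;  /* block count, 1-based */
};

/* Decode an NVM command set I/O command. Returns 0 on success, -1 if the
 * opcode is not Flush/Write/Read (a real target would fail the command). */
int decode_sqe(const struct nvme_sqe *sqe, struct io_req *req)
{
    uint8_t opcode = (uint8_t)(sqe->cdw0 & 0xff);

    memset(req, 0, sizeof(*req));
    req->nsid = sqe->nsid;

    switch (opcode) {
    case 0x00:                                   /* Flush: no LBA range */
        req->op = IO_FLUSH;
        return 0;
    case 0x01:                                   /* Write */
    case 0x02:                                   /* Read */
        req->op = (opcode == 0x01) ? IO_WRITE : IO_READ;
        req->slba = ((uint64_t)sqe->cdw11 << 32) | sqe->cdw10;  /* SLBA in CDW10/11 */
        req->nblocks = (sqe->cdw12 & 0xffff) + 1;               /* NLB is 0-based */
        return 0;
    default:
        req->op = IO_UNSUPPORTED;
        return -1;
    }
}
```

A Read with CDW10 = 10h, CDW11 = 1 and CDW12 = 7 decodes to SLBA 100000010h and 8 blocks, which is exactly the LBA + length pair the backends expect.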

I read up on the vhost architecture:
http://blog.vmsplice.net/2011/09/qemu-internals-vhost-architecture.html
The nice thing is that it is not tied to KVM in any way.
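Much of that KVM independence comes from vhost talking to the outside world only through eventfd(2) file descriptors: a "kick" fd (registered with VHOST_SET_VRING_KICK) tells the backend new work is queued, and a "call" fd (VHOST_SET_VRING_CALL) signals completions. With KVM, ioeventfd and irqfd wire those fds to guest register writes and interrupts, but vhost itself only sees file descriptors. The C sketch below shows just the eventfd counter semantics in one process; the helper names efd_notify/efd_drain are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical helper: signal the other side (a vhost "kick" or "call").
 * write() adds the value to the eventfd's 64-bit counter. */
int efd_notify(int efd)
{
    uint64_t one = 1;

    assert(efd >= 0);
    return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/* Hypothetical helper: consume pending notifications.
 * Without EFD_SEMAPHORE, read() returns the whole counter and resets it to 0,
 * so several notifications sent before the reader wakes up coalesce into one.
 * On an EFD_NONBLOCK eventfd with nothing pending, read() fails with EAGAIN. */
int64_t efd_drain(int efd)
{
    uint64_t count;

    assert(efd >= 0);
    if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        return -1;
    return (int64_t)count;
}
```

The coalescing behaviour is the useful property here: three kicks issued before the backend thread runs are observed as a single read returning 3, so the backend wakes once and then drains the queue.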

For SCSI, there is "virtio-scsi" in the guest kernel and "vhost-scsi" in the
host kernel.

For NVMe, there is no "virtio-nvme" in the guest kernel (just the unmodified
NVMe driver), but I'll do a similar thing in QEMU using the vhost
infrastructure, with a "vhost_nvme" driver in the host kernel.
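Because the guest keeps its stock NVMe driver, QEMU's device emulation has to recognise that driver's doorbell writes before it can kick anything. Per the NVMe specification, the Submission Queue y Tail doorbell sits at BAR0 offset 1000h + (2y * (4 << CAP.DSTRD)) and the Completion Queue y Head doorbell at 1000h + ((2y + 1) * (4 << CAP.DSTRD)). The C sketch below decodes a write offset under that layout; the names are hypothetical, and turning an SQ tail write into a virtqueue kick toward vhost_nvme is an assumption about how this design could work, not something settled in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NVME_DB_BASE 0x1000u   /* doorbell registers start at BAR0 offset 1000h */

/* Hypothetical result of decoding one guest MMIO write. */
struct db_hit {
    uint16_t qid;    /* queue identifier y */
    bool     is_sq;  /* true: SQ y Tail doorbell; false: CQ y Head doorbell */
};

/* Map a BAR0 write offset to a doorbell, given CAP.DSTRD (a 4-bit field).
 * SQ y tail: 1000h + (2y)     * (4 << DSTRD)
 * CQ y head: 1000h + (2y + 1) * (4 << DSTRD)
 * Returns false if the offset is not a doorbell register. */
bool decode_doorbell(uint64_t offset, unsigned dstrd, struct db_hit *hit)
{
    uint64_t stride, idx;

    assert(dstrd <= 15);
    stride = 4ull << dstrd;
    if (offset < NVME_DB_BASE || (offset - NVME_DB_BASE) % stride != 0)
        return false;

    idx = (offset - NVME_DB_BASE) / stride;   /* 2y for SQ, 2y + 1 for CQ */
    if (idx / 2 > UINT16_MAX)
        return false;
    hit->qid = (uint16_t)(idx / 2);
    hit->is_sq = (idx % 2) == 0;
    return true;
}
```

With DSTRD = 0, a write to 1008h is the SQ 1 tail doorbell (the first I/O queue), which is the point where the emulation would notify the host side.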

For the "virtqueue" implementation in qemu-nvme, I may just use/copy
drivers/virtio/virtio_ring.c, the same way linux/tools/virtio/virtio_test.c
does.
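For reference, what virtio_ring.c manages is a split virtqueue: a descriptor table, an "available" ring that the driver side fills, and a "used" ring that the device side fills, each ring indexed by a free-running 16-bit counter. The single-process C sketch below models only that index bookkeeping. It omits guest-physical addresses, descriptor chaining, notification suppression and the memory barriers real code needs, and the vq_* helpers are hypothetical rather than the virtio_ring.c API.

```c
#include <assert.h>
#include <stdint.h>

#define QSZ 8  /* queue size; must be a power of two */

struct vring_desc      { uint64_t addr; uint32_t len; uint16_t flags; uint16_t next; };
struct vring_avail     { uint16_t flags; uint16_t idx; uint16_t ring[QSZ]; };
struct vring_used_elem { uint32_t id; uint32_t len; };
struct vring_used      { uint16_t flags; uint16_t idx; struct vring_used_elem ring[QSZ]; };

struct vq {
    struct vring_desc  desc[QSZ];
    struct vring_avail avail;
    struct vring_used  used;
    uint16_t last_avail;  /* device side: next avail entry to consume */
    uint16_t last_used;   /* driver side: next used entry to reap */
};

/* Driver side (QEMU in this design): publish descriptor 'head' for a buffer. */
void vq_add(struct vq *q, uint16_t head, uint64_t addr, uint32_t len)
{
    assert(head < QSZ);
    q->desc[head] = (struct vring_desc){ .addr = addr, .len = len };
    q->avail.ring[q->avail.idx % QSZ] = head;
    q->avail.idx++;                       /* free-running; wraps at 65536 */
}

/* Device side (vhost_nvme in this design): next available head, or -1. */
int vq_pop(struct vq *q)
{
    if (q->last_avail == q->avail.idx)
        return -1;
    return q->avail.ring[q->last_avail++ % QSZ];
}

/* Device side: report 'head' as completed with 'written' bytes. */
void vq_push_used(struct vq *q, uint16_t head, uint32_t written)
{
    q->used.ring[q->used.idx % QSZ] = (struct vring_used_elem){ head, written };
    q->used.idx++;
}

/* Driver side: reap one completion, or -1 if none is pending. */
int vq_get_used(struct vq *q, uint32_t *written)
{
    struct vring_used_elem e;

    if (q->last_used == q->used.idx)
        return -1;
    e = q->used.ring[q->last_used++ % QSZ];
    *written = e.len;
    return (int)e.id;
}
```

Because QSZ divides 65536, the 16-bit indices can wrap freely and "index % QSZ" still lands on the right slot; that is why virtio requires power-of-two split queue sizes.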

Below is a more detailed diagram. What do you think?

.-----------------------------------------.           .------------------------.
| Guest(Linux, Windows, FreeBSD, Solaris) |   NVMe    | qemu                   |
| unmodified NVMe driver                  |  command  | NVMe device emulation  |
|                                         | ------->  | vhost + virtqueue      |
'-----------------------------------------'           '------------------------'
                                                          |   |         ^
                                              passthrough |   |         | kick/notify
                                             NVMe command |   |         | via eventfd
userspace                                   via virtqueue |   |         |
                                                          v   v         |
----------------------------------------------------------------------------------
          .----------------------------------------------------------------------.
kernel    | LIO frontend driver                                                  |
          |   - vhost_nvme                                                       |
          '----------------------------------------------------------------------'
                              | translate                     ^
                              | (NVMe command)                |
                              | to                            |
                              v (LBA, length)                 |
          .----------------------------------------------------------------------.
          | LIO backend driver                                                   |
          |   - fileio (/mnt/xxx.file)                                           |
          |   - iblock (/dev/sda1, /dev/nvme0n1, ...)                            |
          '----------------------------------------------------------------------'
                              |                               ^
                              | submit_bio()                  |
                              v                               |
          .----------------------------------------------------------------------.
          | block layer                                                          |
          |                                                                      |
          '----------------------------------------------------------------------'
                              |                               ^
                              |                               |
                              v                               |
          .----------------------------------------------------------------------.
          | block device driver                                                  |
          |                                                                      |
          '----------------------------------------------------------------------'
                |                  |                  |                  |
                |                  |                  |                  |
                v                  v                  v                  v
          .------------.     .------------.     .------------.     .------------.
          |    SATA    |     |    SCSI    |     |    NVMe    |     |    ....    |
          '------------'     '------------'     '------------'     '------------'
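For a file-backed backend, the "translate (NVMe command) to (LBA, length)" arrow ends in simple arithmetic: byte offset = LBA << block shift, byte length = blocks << block shift. The C sketch below shows that for a FILEIO-style backend using pread(2)/pwrite(2) on an ordinary file. The 512-byte block size, the lba_req struct and the fileio_* names are assumptions for illustration, not LIO's fileio implementation.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SHIFT 9   /* assumption: 512-byte logical blocks */

/* Hypothetical decoded request handed from the frontend to the backend. */
struct lba_req {
    int      is_write;
    uint64_t slba;      /* starting LBA */
    uint32_t nblocks;   /* number of blocks, 1-based */
};

/* Open (or create) the backing file, e.g. /mnt/xxx.file in the diagram. */
int fileio_open(const char *path)
{
    return open(path, O_RDWR | O_CREAT, 0600);
}

/* Translate (LBA, length) into a byte range and perform the I/O.
 * 'buf' must hold nblocks << BLOCK_SHIFT bytes.
 * Returns the number of bytes transferred (short at end of file) or -1. */
ssize_t fileio_submit(int fd, const struct lba_req *r, void *buf)
{
    off_t  off = (off_t)(r->slba << BLOCK_SHIFT);
    size_t len = (size_t)r->nblocks << BLOCK_SHIFT;
    size_t done = 0;

    assert(fd >= 0 && buf != NULL);
    while (done < len) {            /* pread/pwrite may transfer fewer bytes */
        ssize_t n = r->is_write
            ? pwrite(fd, (char *)buf + done, len - done, off + (off_t)done)
            : pread(fd, (char *)buf + done, len - done, off + (off_t)done);
        if (n < 0)
            return -1;
        if (n == 0)                 /* read hit end of the backing file */
            break;
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```

For example, a 2-block write at LBA 2 becomes a 1024-byte pwrite at file offset 1024. An IBLOCK-style backend would instead build a bio and call submit_bio(), as in the diagram.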