* [Qemu-devel] [RFC v1] Introduce a new NVMe host device type to QEMU
@ 2018-01-15  8:01 Changpeng Liu
  2018-01-15  8:01 ` [Qemu-devel] [RFC v1] block/NVMe: introduce a new vhost NVMe host device " Changpeng Liu
  0 siblings, 1 reply; 10+ messages in thread
From: Changpeng Liu @ 2018-01-15  8:01 UTC (permalink / raw)
  To: qemu-devel, changpeng.liu
  Cc: james.r.harris, keith.busch, famz, stefanha, pbonzini, mst

The NVMe 1.3 specification (http://nvmexpress.org/resources/specifications/) introduced a new Admin command,
Doorbell Buffer Config, which is designed for emulated NVMe controllers only; Linux kernel 4.12 added support
for it. With this feature, when the NVMe driver issues new requests to the controller, it writes the doorbell
value to a shadow doorbell buffer in memory instead of doing MMIO doorbell writes, so the NVMe specification
itself can become a good para-virtualization protocol.
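
For illustration, here is a rough C sketch of the shadow doorbell idea (this is not the actual Linux or QEMU
code; the helper names and the EventIdx check are assumptions modeled on the virtio EVENT_IDX convention):

    /* Hypothetical sketch: with Doorbell Buffer Config enabled, the guest
     * driver updates a shadow doorbell in memory and only falls back to an
     * MMIO doorbell write when the controller (here: the slave I/O target)
     * asks for it via the EventIdx buffer. */
    #include <stdbool.h>
    #include <stdint.h>

    static bool need_mmio_event(uint16_t event_idx, uint16_t new_idx, uint16_t old_idx)
    {
        /* wrap-around comparison in the style of virtio EVENT_IDX */
        return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old_idx);
    }

    static void ring_sq_doorbell(volatile uint32_t *shadow_db,
                                 volatile uint32_t *event_idx,
                                 volatile uint32_t *mmio_db,
                                 uint16_t new_tail)
    {
        uint16_t old_tail = (uint16_t)*shadow_db;

        *shadow_db = new_tail;        /* visible to the slave I/O target */
        __sync_synchronize();         /* order the shadow write first */

        if (need_mmio_event((uint16_t)*event_idx, new_tail, old_tail)) {
            *mmio_db = new_tail;      /* rare slow path: MMIO write, VM exit */
        }
    }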

Similar to the existing vhost-user-scsi idea, we can set up a slave I/O target that serves Guest I/Os
directly via the NVMe I/O queues. The NVMe queue information, such as queue size and queue address, is
routed to a separate slave I/O target via a UNIX domain socket. I took the existing QEMU vhost-user
protocol as a reference and designed several completely new socket messages to enable this function.
With this idea, an emulated virtual NVMe controller is presented to the Guest, and the native NVMe
driver inside the Guest can be used.

--------------------------------------------------------------------------------------------------------------
| UNIX Domain Socket Message       | Description                                                             |
--------------------------------------------------------------------------------------------------------------
| Get Controller Capabilities      | Controller Capabilities (CAP) register of the NVMe specification       |
--------------------------------------------------------------------------------------------------------------
| Get/Set Controller Configuration | Enable/disable the NVMe controller                                      |
--------------------------------------------------------------------------------------------------------------
| Admin passthrough                | Mandatory NVMe Admin commands routed to the slave I/O target           |
--------------------------------------------------------------------------------------------------------------
| IO passthrough                   | I/O commands issued before the shadow doorbell buffer is configured    |
--------------------------------------------------------------------------------------------------------------
| Set memory table                 | Same as the existing vhost-user message, used for memory translation   |
--------------------------------------------------------------------------------------------------------------
| Set Guest Notifier               | Completion queue interrupt; interrupts the Guest when an I/O completes |
--------------------------------------------------------------------------------------------------------------
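
For illustration only, the messages above could be carried as a small request header plus payload over the
UNIX domain socket; the enum and struct below are a hypothetical sketch, not the actual message IDs or
layout used in the patch:

    /* Hypothetical wire-format sketch; the real patch defines its own
     * message IDs and payload layouts. */
    #include <stdint.h>

    typedef enum {
        VHOST_USER_NVME_GET_CAP            = 1,  /* Get Controller Capabilities   */
        VHOST_USER_NVME_GET_CONFIG         = 2,  /* Get Controller Configuration  */
        VHOST_USER_NVME_SET_CONFIG         = 3,  /* Set Controller Configuration  */
        VHOST_USER_NVME_ADMIN_PASSTHRU     = 4,  /* Admin command passthrough     */
        VHOST_USER_NVME_IO_PASSTHRU        = 5,  /* I/O passthrough before dbbuf  */
        VHOST_USER_NVME_SET_MEM_TABLE      = 6,  /* guest memory regions          */
        VHOST_USER_NVME_SET_GUEST_NOTIFIER = 7,  /* CQ interrupt eventfd          */
    } VhostUserNvmeRequest;

    typedef struct {
        uint32_t request;    /* one of VhostUserNvmeRequest */
        uint32_t flags;
        uint32_t size;       /* payload size that follows */
        uint8_t  payload[];  /* e.g. a 64-byte NVMe command for passthrough */
    } VhostUserNvmeMsg;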

With those messages, the slave I/O target can access all the NVMe I/O queues, including the submission
queues and completion queues. After the Doorbell Buffer Config Admin command has completed, the slave
I/O target can start processing the I/O requests sent by the Guest.
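
As a rough sketch of what the slave side does once the shadow doorbell buffer is in place (names and
structure here are assumptions, not code from the patch or from SPDK):

    /* Hypothetical polling loop on the slave I/O target: read the guest's
     * shadow doorbell and consume any new 64-byte submission queue entries. */
    #include <stddef.h>
    #include <stdint.h>

    struct nvme_sq {
        uint8_t  *sq_base;      /* SQ memory, mapped via Set memory table */
        uint32_t *shadow_tail;  /* guest-written shadow doorbell entry */
        uint16_t  head;         /* next entry to consume */
        uint16_t  size;         /* number of entries in the queue */
    };

    static int poll_sq(struct nvme_sq *sq, void (*submit)(const uint8_t *cmd))
    {
        int n = 0;
        uint16_t tail = (uint16_t)*sq->shadow_tail;

        while (sq->head != tail) {
            submit(sq->sq_base + (size_t)sq->head * 64);  /* hand off to backend */
            sq->head = (uint16_t)((sq->head + 1) % sq->size);
            n++;
        }
        return n;   /* number of commands processed in this pass */
    }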

For performance evaluation, I implemented both the QEMU driver and a slave I/O target, largely reusing
code from the QEMU NVMe driver and the vhost-user driver:

Optional slave I/O target (SPDK vhost target) patches: https://review.gerrithub.io/#/c/384213/

A user space NVMe driver is implemented in the slave I/O target so that one NVMe controller can be shared
by multiple VMs, and the namespaces presented to the guest VM are virtual namespaces, meaning the slave
I/O target can back them with any kind of storage. The Guest OS kernel must be 4.12 or later (with
Doorbell Buffer Config support); my tests used Fedora 27 with a 4.13 kernel.

Currently this is still ongoing work, and there are some open issues that need to be addressed:
-A lot of code is reused from the QEMU NVMe driver; we need to think about abstracting a common NVMe library;
-A lot of code is reused from the QEMU vhost-user driver; for this idea, we only want to use the UNIX domain
 socket to deliver the mandatory messages. Of course, Set memory table and Set Guest Notifier are exactly
 the same as in the vhost-user driver (a translation sketch follows this list);
-Guest OS kernels >= 4.12 with the Doorbell Buffer Config feature enabled inside the Guest are supported. For
 BIOS-stage I/O requests and older Linux kernels without Doorbell Buffer Config support, the I/O requests can
 be forwarded through socket messages, but this comes with a huge performance drop;
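
For reference, a minimal sketch of how the slave side could use the memory table to translate a guest
physical address (for example a queue or PRP address) into a local pointer; the field names here are
assumptions modeled on the vhost-user memory regions, not the actual structures in the patch:

    /* Hypothetical GPA -> local VA translation over mmap'd guest regions
     * received via the Set memory table message. */
    #include <stddef.h>
    #include <stdint.h>

    struct mem_region {
        uint64_t guest_phys_addr;
        uint64_t size;
        void    *mmap_addr;     /* where the region is mapped in the target */
    };

    static void *gpa_to_va(const struct mem_region *regions, int nregions,
                           uint64_t gpa, uint64_t len)
    {
        for (int i = 0; i < nregions; i++) {
            const struct mem_region *r = &regions[i];
            if (gpa >= r->guest_phys_addr &&
                gpa + len <= r->guest_phys_addr + r->size) {
                return (uint8_t *)r->mmap_addr + (gpa - r->guest_phys_addr);
            }
        }
        return NULL;  /* address not covered by the memory table */
    }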

Any feedback is appreciated.

Changpeng Liu (1):
  block/NVMe: introduce a new vhost NVMe host device to QEMU

 hw/block/Makefile.objs     |   3 +
 hw/block/nvme.h            |  28 ++
 hw/block/vhost.c           | 439 ++++++++++++++++++++++
 hw/block/vhost_user.c      | 588 +++++++++++++++++++++++++++++
 hw/block/vhost_user_nvme.c | 902 +++++++++++++++++++++++++++++++++++++++++++++
 hw/block/vhost_user_nvme.h |  38 ++
 6 files changed, 1998 insertions(+)
 create mode 100644 hw/block/vhost.c
 create mode 100644 hw/block/vhost_user.c
 create mode 100644 hw/block/vhost_user_nvme.c
 create mode 100644 hw/block/vhost_user_nvme.h

-- 
1.9.3

Thread overview: 10+ messages
2018-01-15  8:01 [Qemu-devel] [RFC v1] Introduce a new NVMe host device type to QEMU Changpeng Liu
2018-01-15  8:01 ` [Qemu-devel] [RFC v1] block/NVMe: introduce a new vhost NVMe host device " Changpeng Liu
2018-01-16 17:06   ` Paolo Bonzini
2018-01-17  0:53     ` Liu, Changpeng
2018-01-17  7:10       ` Paolo Bonzini
2018-10-23 23:39     ` Michael S. Tsirkin
2018-10-24  8:23       ` Liu, Changpeng
2018-01-29 15:29   ` Stefan Hajnoczi
2018-01-29 15:40     ` Harris, James R
2018-01-30  1:19     ` Liu, Changpeng
