From: Changpeng Liu <changpeng.liu@intel.com>
To: qemu-devel@nongnu.org, changpeng.liu@intel.com
Cc: james.r.harris@intel.com, keith.busch@intel.com, famz@redhat.com,
stefanha@gmail.com, pbonzini@redhat.com, mst@redhat.com
Subject: [Qemu-devel] [RFC v1] Introduce a new NVMe host device type to QEMU
Date: Mon, 15 Jan 2018 16:01:54 +0800
Message-ID: <1516003315-17878-1-git-send-email-changpeng.liu@intel.com>
The NVMe 1.3 specification (http://nvmexpress.org/resources/specifications/) introduced a new Admin
command, Doorbell Buffer Config, which is designed for emulated NVMe controllers only; Linux kernel 4.12
added support for it. With this feature, when the NVMe driver issues new requests to the controller, it
writes to the shadow doorbell buffer instead of performing MMIO writes, so the NVMe specification itself
can serve as an efficient para-virtualization protocol.
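For reference, here is a minimal sketch (not part of this patch) of how the guest driver updates the
shadow doorbell and EventIdx buffers once Doorbell Buffer Config has been issued; the names are
illustrative, loosely following the Linux 4.12+ NVMe driver, and memory barriers are omitted for brevity:

#include <stdbool.h>
#include <stdint.h>

/* True when the new tail has moved past the event index published by the
 * slave I/O target, i.e. a real MMIO doorbell write is still required. */
static bool nvme_dbbuf_need_event(uint16_t event_idx, uint16_t new, uint16_t old)
{
    return (uint16_t)(new - event_idx - 1) < (uint16_t)(new - old);
}

/* Called instead of an unconditional MMIO write when ringing an SQ doorbell. */
static void nvme_ring_sq_doorbell(volatile uint32_t *shadow_db,
                                  const volatile uint32_t *event_idx,
                                  volatile uint32_t *mmio_db,
                                  uint16_t new_tail)
{
    uint16_t old_tail = *shadow_db;

    *shadow_db = new_tail;                    /* visible to the slave target */
    if (nvme_dbbuf_need_event(*event_idx, new_tail, old_tail)) {
        *mmio_db = new_tail;                  /* fall back to MMIO           */
    }
}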
Similar to the existing vhost-user-scsi idea, we can set up a slave I/O target that serves Guest I/Os
directly via the NVMe I/O queues. The NVMe queue information, such as queue size and queue address, is
routed to a separate slave I/O target over a UNIX domain socket. Taking the existing QEMU vhost-user
protocol as a reference, I designed several new socket messages to enable this. With this approach, an
emulated virtual NVMe controller is presented to the Guest, and the native NVMe driver inside the Guest
can be used.
--------------------------------------------------------------------------------------------------------------
| Unix Domain Socket Messages      | Description                                                             |
--------------------------------------------------------------------------------------------------------------
| Get Controller Capabilities      | Controller Capabilities register of the NVMe specification             |
--------------------------------------------------------------------------------------------------------------
| Get/Set Controller Configuration | Enable/disable the NVMe controller                                      |
--------------------------------------------------------------------------------------------------------------
| Admin passthrough                | Mandatory NVMe Admin commands routed to the slave I/O target           |
--------------------------------------------------------------------------------------------------------------
| IO passthrough                   | I/O commands sent before the shadow doorbell buffer is configured      |
--------------------------------------------------------------------------------------------------------------
| Set memory table                 | Same as the existing vhost-user message; used for memory translation   |
--------------------------------------------------------------------------------------------------------------
| Set Guest Notifier               | Completion queue interrupt; interrupts the Guest when an I/O completes |
--------------------------------------------------------------------------------------------------------------
With these messages, the slave I/O target can access all of the NVMe I/O queues, both submission and
completion queues. Once the Doorbell Buffer Config Admin command has completed, the slave I/O target can
start processing the I/O requests sent from the Guest.
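To make the framing concrete, here is a rough sketch of what such a message could look like on the wire.
It reuses the vhost-user header layout (request, flags, size, payload), but the request names and values
below are placeholders rather than the definitions used in the patch:

#include <stdint.h>

/* Rough sketch of the socket message framing, modelled on the vhost-user
 * header layout. Request codes and payload fields are placeholders; the
 * real definitions live in hw/block/vhost_user_nvme.c. */
enum VhostUserNvmeRequest {
    VHOST_USER_NVME_GET_CAP = 1,          /* Get Controller Capabilities      */
    VHOST_USER_NVME_GET_CONFIG,           /* Get Controller Configuration     */
    VHOST_USER_NVME_SET_CONFIG,           /* Set Controller Configuration     */
    VHOST_USER_NVME_ADMIN_PASSTHROUGH,    /* Admin command passthrough        */
    VHOST_USER_NVME_IO_PASSTHROUGH,       /* I/O before shadow doorbell setup */
    VHOST_USER_NVME_SET_MEM_TABLE,        /* same as vhost-user               */
    VHOST_USER_NVME_SET_GUEST_NOTIFIER,   /* CQ interrupt eventfd             */
};

typedef struct VhostUserNvmeMsg {
    uint32_t request;                     /* one of VhostUserNvmeRequest      */
    uint32_t flags;                       /* version and need-reply bits      */
    uint32_t size;                        /* payload size in bytes            */
    union {
        uint64_t u64;                     /* e.g. the controller CAP register */
        uint8_t  cmd[64];                 /* a raw 64-byte NVMe command       */
    } payload;
} VhostUserNvmeMsg;

File descriptors (for example the memory region fds and the Guest notifier eventfd) would travel as
SCM_RIGHTS ancillary data on the socket, exactly as in vhost-user.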
For performance evaluation I implemented both the QEMU driver and a slave I/O target, largely reusing
code from the QEMU NVMe and vhost-user drivers:
Optional slave I/O target (SPDK vhost target) patches: https://review.gerrithub.io/#/c/384213/
A user-space NVMe driver is implemented in the slave I/O target so that one NVMe controller can be shared
by multiple VMs, and the namespaces presented to the guest VM are virtual namespaces, meaning the slave
I/O target can back them with any kind of storage. The Guest OS kernel must be 4.12 or later (with
Doorbell Buffer Config support); my tests used Fedora 27 with a 4.13 kernel.
This is still work in progress, and there are some open issues that need to be addressed:
- A lot of code is reused from the QEMU NVMe driver; we need to think about abstracting it into a common
NVMe library.
- A lot of code is reused from the QEMU vhost-user driver; for this idea we only want to use the UNIX
domain socket to deliver the mandatory messages, and of course the Set memory table and Set Guest
Notifier messages are exactly the same as in the vhost-user driver (the memory table payload is sketched
after this list).
- Guest OS kernels 4.12 or later with the Doorbell Buffer Config feature enabled inside the Guest are
fully supported; for BIOS-stage I/O requests and older Linux kernels without Doorbell Buffer Config
support, the I/O requests can still be forwarded through socket messages, but with a large performance
drop.
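Since the Set memory table message is identical to vhost-user's, its payload layout (taken from the
vhost-user protocol, with one mmap-able fd per region passed as SCM_RIGHTS ancillary data) looks like
this; the region-count macro name here is illustrative:

#include <stdint.h>

#define VHOST_USER_NVME_MAX_NREGIONS 8    /* same limit as vhost-user         */

typedef struct VhostUserMemoryRegion {
    uint64_t guest_phys_addr;             /* guest physical address           */
    uint64_t memory_size;                 /* region length in bytes           */
    uint64_t userspace_addr;              /* QEMU virtual address             */
    uint64_t mmap_offset;                 /* offset into the passed fd        */
} VhostUserMemoryRegion;

typedef struct VhostUserMemory {
    uint32_t nregions;
    uint32_t padding;
    VhostUserMemoryRegion regions[VHOST_USER_NVME_MAX_NREGIONS];
} VhostUserMemory;

The slave I/O target uses this table to translate the guest physical addresses found in the NVMe queues
and PRP lists into its own mmap'd virtual addresses.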
Any feedback is appreciated.
Changpeng Liu (1):
block/NVMe: introduce a new vhost NVMe host device to QEMU
hw/block/Makefile.objs | 3 +
hw/block/nvme.h | 28 ++
hw/block/vhost.c | 439 ++++++++++++++++++++++
hw/block/vhost_user.c | 588 +++++++++++++++++++++++++++++
hw/block/vhost_user_nvme.c | 902 +++++++++++++++++++++++++++++++++++++++++++++
hw/block/vhost_user_nvme.h | 38 ++
6 files changed, 1998 insertions(+)
create mode 100644 hw/block/vhost.c
create mode 100644 hw/block/vhost_user.c
create mode 100644 hw/block/vhost_user_nvme.c
create mode 100644 hw/block/vhost_user_nvme.h
--
1.9.3