* [Qemu-devel] [RFC 0/3] Extend vhost-user to support VFIO based accelerators
@ 2017-12-22 6:41 Tiwei Bie
From: Tiwei Bie @ 2017-12-22 6:41 UTC (permalink / raw)
To: virtio-dev, qemu-devel, mst, alex.williamson, pbonzini, stefanha
Cc: cunming.liang, dan.daly, jianfeng.tan, zhihong.wang, xiao.w.wang,
tiwei.bie
This RFC patch set makes some small extensions to the vhost-user protocol
to support VFIO based accelerators, and makes it possible to get performance
similar to VFIO passthrough while keeping the virtio device emulation in
QEMU.
When we have virtio ring compatible devices, it's possible to set up
the device (DMA mapping, PCI config, etc) based on the existing info
(memory-table, features, vring info, etc) which is available on the
vhost-backend (e.g. the DPDK vhost library). Then, we will be able to
use such devices to accelerate the emulated device for the VM. We call
this vDPA: vhost DataPath Acceleration. The key difference between VFIO
passthrough and vDPA is that in vDPA only the data path (e.g. ring,
notify and queue interrupt) is passed through; the device control path
(e.g. PCI configuration space and MMIO regions) is still defined and
emulated by QEMU.
The benefits of keeping the virtio device emulation in QEMU compared
with virtio device VFIO passthrough include (but are not limited to):
- consistent device interface from guest OS;
- max flexibility on control path and hardware design;
- leveraging the existing virtio live-migration framework;
But the critical issue in vDPA is that the data path performance is
relatively low and some host threads are needed for the data path,
because the mechanisms necessary to support the following are missing:
1) the guest driver notifying the device directly;
2) the device interrupting the guest directly;
So this patch set makes some small extensions to the vhost-user protocol
to make both of them possible. It leverages the same mechanisms (e.g.
EPT and Posted-Interrupt on Intel platforms) as VFIO passthrough to
achieve the data path pass-through.
A new protocol feature bit is added to negotiate the accelerator feature
support. Two new slave message types are added to enable the notify and
interrupt pass-through for each queue. From the vhost-user protocol design
point of view, this is very flexible: the pass-through can be enabled or
disabled for each queue individually, and it's possible to accelerate each
queue with a different device. More design and implementation details can
be found in the last patch.
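To give a feel for the backend (slave) side, here is a minimal sketch of how
the two new messages could be used. It is illustrative only and not part of
this series: the helper slave_send() is a placeholder for whatever routine the
backend uses to build the VhostUserMsg header and write it to the slave
channel with the file descriptor attached as SCM_RIGHTS ancillary data, while
the message IDs and payload layout are the ones defined in the last patch.

/* Vring area description used by VHOST_USER_SLAVE_VFIO_SET_VRING_NOTIFY_AREA */
typedef struct VhostUserVringArea {
    uint64_t u64;      /* queue index (NOFD flag set when unmapping) */
    uint64_t size;     /* size of the notify region */
    uint64_t offset;   /* mmap offset within the VFIO device fd */
} VhostUserVringArea;

static int announce_queue_accel(int qid, int group_fd, int device_fd,
                                uint64_t notify_offset, uint64_t notify_size)
{
    uint64_t idx = qid;
    VhostUserVringArea area = {
        .u64 = qid,
        .size = notify_size,
        .offset = notify_offset,
    };
    int ret;

    /* Tell QEMU which VFIO group backs this queue; the group fd travels
     * as ancillary data and QEMU adds it to the KVM VFIO device. */
    ret = slave_send(VHOST_USER_SLAVE_VFIO_SET_VRING_GROUP_FD,
                     &idx, sizeof(idx), group_fd);
    if (ret < 0) {
        return ret;
    }

    /* Tell QEMU where to mmap this queue's notify region from the VFIO
     * device fd, so guest doorbell writes reach the device directly. */
    return slave_send(VHOST_USER_SLAVE_VFIO_SET_VRING_NOTIFY_AREA,
                      &area, sizeof(area), device_fd);
}

Both messages are only valid after VHOST_USER_PROTOCOL_F_VFIO has been
negotiated, and sending either of them without a file descriptor asks QEMU
to tear down the corresponding group or notify mapping.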
There are some rough edges in this patch set (so it is an RFC patch
set for now), but it's never too early to hear thoughts from the
community! Any comments and suggestions would be really appreciated!
Tiwei Bie (3):
vhost-user: support receiving file descriptors in slave_read
vhost-user: introduce shared vhost-user state
vhost-user: add VFIO based accelerators support
docs/interop/vhost-user.txt | 57 ++++++
hw/scsi/vhost-user-scsi.c | 6 +-
hw/vfio/common.c | 2 +-
hw/virtio/vhost-user.c | 430 +++++++++++++++++++++++++++++++++++++++-
hw/virtio/vhost.c | 3 +-
hw/virtio/virtio-pci.c | 8 -
hw/virtio/virtio-pci.h | 8 +
include/hw/vfio/vfio.h | 2 +
include/hw/virtio/vhost-user.h | 43 ++++
include/hw/virtio/virtio-scsi.h | 6 +-
net/vhost-user.c | 30 +--
11 files changed, 561 insertions(+), 34 deletions(-)
create mode 100644 include/hw/virtio/vhost-user.h
--
2.13.3
* [Qemu-devel] [RFC 1/3] vhost-user: support receiving file descriptors in slave_read
From: Tiwei Bie @ 2017-12-22 6:41 UTC (permalink / raw)
To: virtio-dev, qemu-devel, mst, alex.williamson, pbonzini, stefanha
Cc: cunming.liang, dan.daly, jianfeng.tan, zhihong.wang, xiao.w.wang,
tiwei.bie
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
hw/virtio/vhost-user.c | 40 +++++++++++++++++++++++++++++++++++++++-
1 file changed, 39 insertions(+), 1 deletion(-)
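As context for the change below: the peer on the other end of the slave
channel attaches the file descriptor as SCM_RIGHTS ancillary data when it
sends a request. A minimal sketch of such a send path (illustrative only,
not part of this patch) could look like:

#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

static ssize_t send_with_fd(int sock, const void *buf, size_t len, int fd)
{
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
    char control[CMSG_SPACE(sizeof(fd))] = { 0 };
    struct msghdr msgh = {
        .msg_iov = &iov,
        .msg_iovlen = 1,
        .msg_control = control,
        .msg_controllen = sizeof(control),
    };
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msgh);

    /* A single file descriptor in one SCM_RIGHTS control message. */
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(fd));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(fd));

    return sendmsg(sock, &msgh, 0);
}

The patch below teaches slave_read() to recover such a descriptor with
recvmsg() and hand it to the message handler, which becomes responsible
for closing it.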
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 093675ed98..e7108138fd 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -614,14 +614,43 @@ static void slave_read(void *opaque)
struct vhost_user *u = dev->opaque;
VhostUserMsg msg = { 0, };
int size, ret = 0;
+ struct iovec iov;
+ struct msghdr msgh;
+ int fd = -1;
+ size_t fdsize = sizeof(fd);
+ char control[CMSG_SPACE(fdsize)];
+ struct cmsghdr *cmsg;
+
+ memset(&msgh, 0, sizeof(msgh));
+ msgh.msg_iov = &iov;
+ msgh.msg_iovlen = 1;
+ msgh.msg_control = control;
+ msgh.msg_controllen = sizeof(control);
/* Read header */
- size = read(u->slave_fd, &msg, VHOST_USER_HDR_SIZE);
+ iov.iov_base = &msg;
+ iov.iov_len = VHOST_USER_HDR_SIZE;
+
+ size = recvmsg(u->slave_fd, &msgh, 0);
if (size != VHOST_USER_HDR_SIZE) {
error_report("Failed to read from slave.");
goto err;
}
+ if (msgh.msg_flags & MSG_CTRUNC) {
+ error_report("Truncated message.");
+ goto err;
+ }
+
+ for (cmsg = CMSG_FIRSTHDR(&msgh); cmsg != NULL;
+ cmsg = CMSG_NXTHDR(&msgh, cmsg)) {
+ if (cmsg->cmsg_level == SOL_SOCKET &&
+ cmsg->cmsg_type == SCM_RIGHTS) {
+ memcpy(&fd, CMSG_DATA(cmsg), fdsize);
+ break;
+ }
+ }
+
if (msg.size > VHOST_USER_PAYLOAD_SIZE) {
error_report("Failed to read msg header."
" Size %d exceeds the maximum %zu.", msg.size,
@@ -642,9 +671,15 @@ static void slave_read(void *opaque)
break;
default:
error_report("Received unexpected msg type.");
+ if (fd != -1) {
+ close(fd);
+ }
ret = -EINVAL;
}
+ /* Message handlers need to make sure that fd will be consumed. */
+ fd = -1;
+
/*
* REPLY_ACK feature handling. Other reply types has to be managed
* directly in their request handlers.
@@ -669,6 +704,9 @@ err:
qemu_set_fd_handler(u->slave_fd, NULL, NULL, NULL);
close(u->slave_fd);
u->slave_fd = -1;
+ if (fd != -1) {
+ close(fd);
+ }
return;
}
--
2.13.3
* [Qemu-devel] [RFC 2/3] vhost-user: introduce shared vhost-user state
From: Tiwei Bie @ 2017-12-22 6:41 UTC (permalink / raw)
To: virtio-dev, qemu-devel, mst, alex.williamson, pbonzini, stefanha
Cc: cunming.liang, dan.daly, jianfeng.tan, zhihong.wang, xiao.w.wang,
tiwei.bie
When multi-queue is enabled for virtio-net, each virtio
queue pair has its own vhost_dev, and the only thing they
currently share is the chardev. This patch introduces a
vhost-user state structure which will be shared by all
virtio queue pairs of the same virtio device.
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
hw/scsi/vhost-user-scsi.c | 6 +++---
hw/virtio/vhost-user.c | 9 +++++----
include/hw/virtio/vhost-user.h | 17 +++++++++++++++++
include/hw/virtio/virtio-scsi.h | 6 +++++-
net/vhost-user.c | 30 ++++++++++++++++--------------
5 files changed, 46 insertions(+), 22 deletions(-)
create mode 100644 include/hw/virtio/vhost-user.h
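To illustrate the intended ownership before the diff: one VhostUser instance
per device owns the chardev, and every per-queue-pair vhost_dev is initialized
with a pointer to that shared state as its backend opaque. This is only a
sketch (QUEUE_PAIRS, chr and err are placeholders), not code from the patch:

VhostUser shared;                        /* owns the CharBackend */
struct vhost_dev devs[QUEUE_PAIRS];      /* one vhost_dev per queue pair */

qemu_chr_fe_init(&shared.chr, chr, &err);
for (i = 0; i < QUEUE_PAIRS; i++) {
    /* vhost_user_init() stores the opaque pointer, so every queue pair's
     * u->shared points at the same VhostUser instead of each vhost_dev
     * holding its own CharBackend pointer. */
    vhost_dev_init(&devs[i], &shared, VHOST_BACKEND_TYPE_USER, 0);
}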
diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
index f7561e23fa..2c46c74128 100644
--- a/hw/scsi/vhost-user-scsi.c
+++ b/hw/scsi/vhost-user-scsi.c
@@ -73,7 +73,7 @@ static void vhost_user_scsi_realize(DeviceState *dev, Error **errp)
Error *err = NULL;
int ret;
- if (!vs->conf.chardev.chr) {
+ if (!vs->conf.vhost_user.chr.chr) {
error_setg(errp, "vhost-user-scsi: missing chardev");
return;
}
@@ -91,7 +91,7 @@ static void vhost_user_scsi_realize(DeviceState *dev, Error **errp)
vsc->dev.vq_index = 0;
vsc->dev.backend_features = 0;
- ret = vhost_dev_init(&vsc->dev, (void *)&vs->conf.chardev,
+ ret = vhost_dev_init(&vsc->dev, (void *)&vs->conf.vhost_user,
VHOST_BACKEND_TYPE_USER, 0);
if (ret < 0) {
error_setg(errp, "vhost-user-scsi: vhost initialization failed: %s",
@@ -132,7 +132,7 @@ static uint64_t vhost_user_scsi_get_features(VirtIODevice *vdev,
}
static Property vhost_user_scsi_properties[] = {
- DEFINE_PROP_CHR("chardev", VirtIOSCSICommon, conf.chardev),
+ DEFINE_PROP_CHR("chardev", VirtIOSCSICommon, conf.vhost_user.chr),
DEFINE_PROP_UINT32("boot_tpgt", VirtIOSCSICommon, conf.boot_tpgt, 0),
DEFINE_PROP_UINT32("num_queues", VirtIOSCSICommon, conf.num_queues, 1),
DEFINE_PROP_UINT32("virtqueue_size", VirtIOSCSICommon, conf.virtqueue_size,
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index e7108138fd..3e308d0a62 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -12,6 +12,7 @@
#include "qapi/error.h"
#include "hw/virtio/vhost.h"
#include "hw/virtio/vhost-backend.h"
+#include "hw/virtio/vhost-user.h"
#include "hw/virtio/virtio-net.h"
#include "chardev/char-fe.h"
#include "sysemu/kvm.h"
@@ -123,7 +124,7 @@ static VhostUserMsg m __attribute__ ((unused));
#define VHOST_USER_VERSION (0x1)
struct vhost_user {
- CharBackend *chr;
+ VhostUser *shared;
int slave_fd;
};
@@ -135,7 +136,7 @@ static bool ioeventfd_enabled(void)
static int vhost_user_read(struct vhost_dev *dev, VhostUserMsg *msg)
{
struct vhost_user *u = dev->opaque;
- CharBackend *chr = u->chr;
+ CharBackend *chr = &u->shared->chr;
uint8_t *p = (uint8_t *) msg;
int r, size = VHOST_USER_HDR_SIZE;
@@ -221,7 +222,7 @@ static int vhost_user_write(struct vhost_dev *dev, VhostUserMsg *msg,
int *fds, int fd_num)
{
struct vhost_user *u = dev->opaque;
- CharBackend *chr = u->chr;
+ CharBackend *chr = &u->shared->chr;
int ret, size = VHOST_USER_HDR_SIZE + msg->size;
/*
@@ -767,7 +768,7 @@ static int vhost_user_init(struct vhost_dev *dev, void *opaque)
assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER);
u = g_new0(struct vhost_user, 1);
- u->chr = opaque;
+ u->shared = opaque;
u->slave_fd = -1;
dev->opaque = u;
diff --git a/include/hw/virtio/vhost-user.h b/include/hw/virtio/vhost-user.h
new file mode 100644
index 0000000000..10d698abe2
--- /dev/null
+++ b/include/hw/virtio/vhost-user.h
@@ -0,0 +1,17 @@
+/*
+ * Copyright (c) 2017 Intel Corporation
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef HW_VIRTIO_VHOST_USER_H
+#define HW_VIRTIO_VHOST_USER_H
+
+#include "chardev/char-fe.h"
+
+typedef struct VhostUser {
+ CharBackend chr;
+} VhostUser;
+
+#endif
diff --git a/include/hw/virtio/virtio-scsi.h b/include/hw/virtio/virtio-scsi.h
index 4c0bcdb788..885c3e84b5 100644
--- a/include/hw/virtio/virtio-scsi.h
+++ b/include/hw/virtio/virtio-scsi.h
@@ -19,6 +19,7 @@
#define VIRTIO_SCSI_SENSE_SIZE 0
#include "standard-headers/linux/virtio_scsi.h"
#include "hw/virtio/virtio.h"
+#include "hw/virtio/vhost-user.h"
#include "hw/pci/pci.h"
#include "hw/scsi/scsi.h"
#include "chardev/char-fe.h"
@@ -54,7 +55,10 @@ struct VirtIOSCSIConf {
char *vhostfd;
char *wwpn;
#endif
- CharBackend chardev;
+ union {
+ VhostUser vhost_user;
+ CharBackend chardev;
+ };
uint32_t boot_tpgt;
IOThread *iothread;
};
diff --git a/net/vhost-user.c b/net/vhost-user.c
index c23927c912..b398294074 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -12,6 +12,7 @@
#include "clients.h"
#include "net/vhost_net.h"
#include "net/vhost-user.h"
+#include "hw/virtio/vhost-user.h"
#include "chardev/char-fe.h"
#include "qemu/config-file.h"
#include "qemu/error-report.h"
@@ -20,7 +21,7 @@
typedef struct VhostUserState {
NetClientState nc;
- CharBackend chr; /* only queue index 0 */
+ VhostUser vhost_user; /* only queue index 0 */
VHostNetState *vhost_net;
guint watch;
uint64_t acked_features;
@@ -62,7 +63,7 @@ static void vhost_user_stop(int queues, NetClientState *ncs[])
}
}
-static int vhost_user_start(int queues, NetClientState *ncs[], CharBackend *be)
+static int vhost_user_start(int queues, NetClientState *ncs[], void *be)
{
VhostNetOptions options;
struct vhost_net *net = NULL;
@@ -155,7 +156,7 @@ static void vhost_user_cleanup(NetClientState *nc)
g_source_remove(s->watch);
s->watch = 0;
}
- qemu_chr_fe_deinit(&s->chr, true);
+ qemu_chr_fe_deinit(&s->vhost_user.chr, true);
}
qemu_purge_queued_packets(nc);
@@ -189,7 +190,7 @@ static gboolean net_vhost_user_watch(GIOChannel *chan, GIOCondition cond,
{
VhostUserState *s = opaque;
- qemu_chr_fe_disconnect(&s->chr);
+ qemu_chr_fe_disconnect(&s->vhost_user.chr);
return TRUE;
}
@@ -214,7 +215,8 @@ static void chr_closed_bh(void *opaque)
qmp_set_link(name, false, &err);
vhost_user_stop(queues, ncs);
- qemu_chr_fe_set_handlers(&s->chr, NULL, NULL, net_vhost_user_event,
+ qemu_chr_fe_set_handlers(&s->vhost_user.chr, NULL, NULL,
+ net_vhost_user_event,
NULL, opaque, NULL, true);
if (err) {
@@ -237,15 +239,15 @@ static void net_vhost_user_event(void *opaque, int event)
assert(queues < MAX_QUEUE_NUM);
s = DO_UPCAST(VhostUserState, nc, ncs[0]);
- chr = qemu_chr_fe_get_driver(&s->chr);
+ chr = qemu_chr_fe_get_driver(&s->vhost_user.chr);
trace_vhost_user_event(chr->label, event);
switch (event) {
case CHR_EVENT_OPENED:
- if (vhost_user_start(queues, ncs, &s->chr) < 0) {
- qemu_chr_fe_disconnect(&s->chr);
+ if (vhost_user_start(queues, ncs, &s->vhost_user) < 0) {
+ qemu_chr_fe_disconnect(&s->vhost_user.chr);
return;
}
- s->watch = qemu_chr_fe_add_watch(&s->chr, G_IO_HUP,
+ s->watch = qemu_chr_fe_add_watch(&s->vhost_user.chr, G_IO_HUP,
net_vhost_user_watch, s);
qmp_set_link(name, true, &err);
s->started = true;
@@ -261,8 +263,8 @@ static void net_vhost_user_event(void *opaque, int event)
g_source_remove(s->watch);
s->watch = 0;
- qemu_chr_fe_set_handlers(&s->chr, NULL, NULL, NULL, NULL,
- NULL, NULL, false);
+ qemu_chr_fe_set_handlers(&s->vhost_user.chr, NULL, NULL, NULL,
+ NULL, NULL, NULL, false);
aio_bh_schedule_oneshot(ctx, chr_closed_bh, opaque);
}
@@ -294,7 +296,7 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
if (!nc0) {
nc0 = nc;
s = DO_UPCAST(VhostUserState, nc, nc);
- if (!qemu_chr_fe_init(&s->chr, chr, &err)) {
+ if (!qemu_chr_fe_init(&s->vhost_user.chr, chr, &err)) {
error_report_err(err);
return -1;
}
@@ -304,11 +306,11 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
s = DO_UPCAST(VhostUserState, nc, nc0);
do {
- if (qemu_chr_fe_wait_connected(&s->chr, &err) < 0) {
+ if (qemu_chr_fe_wait_connected(&s->vhost_user.chr, &err) < 0) {
error_report_err(err);
return -1;
}
- qemu_chr_fe_set_handlers(&s->chr, NULL, NULL,
+ qemu_chr_fe_set_handlers(&s->vhost_user.chr, NULL, NULL,
net_vhost_user_event, NULL, nc0->name, NULL,
true);
} while (!s->started);
--
2.13.3
* [Qemu-devel] [RFC 3/3] vhost-user: add VFIO based accelerators support
From: Tiwei Bie @ 2017-12-22 6:41 UTC (permalink / raw)
To: virtio-dev, qemu-devel, mst, alex.williamson, pbonzini, stefanha
Cc: cunming.liang, dan.daly, jianfeng.tan, zhihong.wang, xiao.w.wang,
tiwei.bie
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
docs/interop/vhost-user.txt | 57 ++++++
hw/vfio/common.c | 2 +-
hw/virtio/vhost-user.c | 381 ++++++++++++++++++++++++++++++++++++++++-
hw/virtio/vhost.c | 3 +-
hw/virtio/virtio-pci.c | 8 -
hw/virtio/virtio-pci.h | 8 +
include/hw/vfio/vfio.h | 2 +
include/hw/virtio/vhost-user.h | 26 +++
8 files changed, 476 insertions(+), 11 deletions(-)
diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
index 954771d0d8..dd029e4b9d 100644
--- a/docs/interop/vhost-user.txt
+++ b/docs/interop/vhost-user.txt
@@ -116,6 +116,15 @@ Depending on the request type, payload can be:
- 3: IOTLB invalidate
- 4: IOTLB access fail
+ * Vring area description
+ -----------------------
+ | u64 | size | offset |
+ -----------------------
+
+ u64: a 64-bit unsigned integer
+ Size: a 64-bit size
+ Offset: a 64-bit offset
+
In QEMU the vhost-user message is implemented with the following struct:
typedef struct VhostUserMsg {
@@ -129,6 +138,7 @@ typedef struct VhostUserMsg {
VhostUserMemory memory;
VhostUserLog log;
struct vhost_iotlb_msg iotlb;
+ VhostUserVringArea area;
};
} QEMU_PACKED VhostUserMsg;
@@ -317,6 +327,17 @@ The fd is provided via VHOST_USER_SET_SLAVE_REQ_FD ancillary data.
A slave may then send VHOST_USER_SLAVE_* messages to the master
using this fd communication channel.
+VFIO based accelerators
+-----------------------
+
+The VFIO based accelerators feature is a protocol extension. It is supported
+when the protocol feature VHOST_USER_PROTOCOL_F_VFIO (bit 7) is set.
+
+The vhost-user backend will set the accelerator context via slave channel,
+and QEMU just needs to handle those messages passively. The accelerator
+context will be set for each queue independently. So the page-per-vq property
+should also be enabled.
+
Protocol features
-----------------
@@ -327,6 +348,7 @@ Protocol features
#define VHOST_USER_PROTOCOL_F_MTU 4
#define VHOST_USER_PROTOCOL_F_SLAVE_REQ 5
#define VHOST_USER_PROTOCOL_F_CROSS_ENDIAN 6
+#define VHOST_USER_PROTOCOL_F_VFIO 7
Master message types
--------------------
@@ -614,6 +636,41 @@ Slave message types
This request should be send only when VIRTIO_F_IOMMU_PLATFORM feature
has been successfully negotiated.
+ * VHOST_USER_SLAVE_VFIO_SET_VRING_GROUP_FD
+
+ Id: 2
+ Equivalent ioctl: N/A
+ Slave payload: u64
+ Master payload: N/A
+
+ Sets the VFIO group file descriptor which is passed as ancillary data
+ for a specified queue (queue index is carried in the u64 payload).
+ Slave sends this request to tell QEMU to add or delete a VFIO group.
+ QEMU will delete the current group if any for the specified queue when the
+ message is sent without a file descriptor. A VFIO group will be actually
+ deleted when its reference count reaches zero.
+ This request should be sent only when VHOST_USER_PROTOCOL_F_VFIO protocol
+ feature has been successfully negotiated.
+
+ * VHOST_USER_SLAVE_VFIO_SET_VRING_NOTIFY_AREA
+
+ Id: 3
+ Equivalent ioctl: N/A
+ Slave payload: vring area description
+ Master payload: N/A
+
+ Sets the notify area for a specified queue (queue index is carried
+ in the u64 field of the vring area description). A file descriptor is
+ passed as ancillary data (typically it's a VFIO device fd). QEMU can
+ mmap the file descriptor based on the information carried in the vring
+ area description.
+ Slave sends this request to tell QEMU to add or delete a MemoryRegion
+ for a specified queue's notify MMIO region. QEMU will delete the current
+ MemoryRegion if any for the specified queue when the message is sent
+ without a file descriptor.
+ This request should be sent only when VHOST_USER_PROTOCOL_F_VFIO protocol
+ feature and VIRTIO_F_VERSION_1 feature have been successfully negotiated.
+
VHOST_USER_PROTOCOL_F_REPLY_ACK:
-------------------------------
The original vhost-user specification only demands replies for certain
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 7b2924c0ef..53d8700581 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -49,7 +49,7 @@ struct vfio_as_head vfio_address_spaces =
* initialized, this file descriptor is only released on QEMU exit and
* we'll re-use it should another vfio device be attached before then.
*/
-static int vfio_kvm_device_fd = -1;
+int vfio_kvm_device_fd = -1;
#endif
/*
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 3e308d0a62..22d7dd5729 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -14,6 +14,8 @@
#include "hw/virtio/vhost-backend.h"
#include "hw/virtio/vhost-user.h"
#include "hw/virtio/virtio-net.h"
+#include "hw/virtio/virtio-pci.h"
+#include "hw/vfio/vfio.h"
#include "chardev/char-fe.h"
#include "sysemu/kvm.h"
#include "qemu/error-report.h"
@@ -35,6 +37,7 @@ enum VhostUserProtocolFeature {
VHOST_USER_PROTOCOL_F_NET_MTU = 4,
VHOST_USER_PROTOCOL_F_SLAVE_REQ = 5,
VHOST_USER_PROTOCOL_F_CROSS_ENDIAN = 6,
+ VHOST_USER_PROTOCOL_F_VFIO = 7,
VHOST_USER_PROTOCOL_F_MAX
};
@@ -72,6 +75,8 @@ typedef enum VhostUserRequest {
typedef enum VhostUserSlaveRequest {
VHOST_USER_SLAVE_NONE = 0,
VHOST_USER_SLAVE_IOTLB_MSG = 1,
+ VHOST_USER_SLAVE_VFIO_SET_VRING_GROUP_FD = 2,
+ VHOST_USER_SLAVE_VFIO_SET_VRING_NOTIFY_AREA = 3,
VHOST_USER_SLAVE_MAX
} VhostUserSlaveRequest;
@@ -93,6 +98,12 @@ typedef struct VhostUserLog {
uint64_t mmap_offset;
} VhostUserLog;
+typedef struct VhostUserVringArea {
+ uint64_t u64;
+ uint64_t size;
+ uint64_t offset;
+} VhostUserVringArea;
+
typedef struct VhostUserMsg {
VhostUserRequest request;
@@ -110,6 +121,7 @@ typedef struct VhostUserMsg {
VhostUserMemory memory;
VhostUserLog log;
struct vhost_iotlb_msg iotlb;
+ VhostUserVringArea area;
} payload;
} QEMU_PACKED VhostUserMsg;
@@ -609,6 +621,342 @@ static int vhost_user_reset_device(struct vhost_dev *dev)
return 0;
}
+#ifdef CONFIG_KVM
+static int vfio_group_fd_to_id(int group_fd)
+{
+ char linkname[PATH_MAX];
+ char pathname[PATH_MAX];
+ char *filename;
+ int group_id, ret;
+
+ snprintf(linkname, sizeof(linkname), "/proc/self/fd/%d", group_fd);
+
+ ret = readlink(linkname, pathname, sizeof(pathname));
+ if (ret < 0) {
+ return -1;
+ }
+
+ filename = g_path_get_basename(pathname);
+ group_id = atoi(filename);
+ g_free(filename);
+
+ return group_id;
+}
+
+static int vhost_user_kvm_add_vfio_group(struct vhost_dev *dev,
+ int group_id, int group_fd)
+{
+ struct vhost_user *u = dev->opaque;
+ struct vhost_user_vfio_state *vfio = &u->shared->vfio;
+ struct kvm_device_attr attr = {
+ .group = KVM_DEV_VFIO_GROUP,
+ .attr = KVM_DEV_VFIO_GROUP_ADD,
+ };
+ bool found = false;
+ int i, ret;
+
+ for (i = 0; i < vfio->nr_group; i++) {
+ if (vfio->group[i].id == group_id) {
+ found = true;
+ break;
+ }
+ }
+
+ if (found) {
+ close(group_fd);
+ vfio->group[i].refcnt++;
+ return 0;
+ }
+
+ if (vfio->nr_group >= VIRTIO_QUEUE_MAX) {
+ return -1;
+ }
+
+ vfio->group[i].id = group_id;
+ vfio->group[i].fd = group_fd;
+ vfio->group[i].refcnt = 1;
+
+ attr.addr = (uint64_t)(uintptr_t)&vfio->group[i].fd;
+
+again:
+ /* XXX: improve this */
+ if (vfio_kvm_device_fd < 0) {
+ struct kvm_create_device cd = {
+ .type = KVM_DEV_TYPE_VFIO,
+ };
+
+ ret = kvm_vm_ioctl(kvm_state, KVM_CREATE_DEVICE, &cd);
+ if (ret < 0) {
+ if (errno == EBUSY) {
+ goto again;
+ }
+ error_report("Failed to create KVM VFIO device.");
+ return -1;
+ }
+
+ vfio_kvm_device_fd = cd.fd;
+ }
+
+ ret = ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr);
+ if (ret < 0) {
+ error_report("Failed to add group %d to KVM VFIO device.",
+ group_id);
+ return -1;
+ }
+
+ vfio->nr_group++;
+
+ return 0;
+}
+
+static int vhost_user_kvm_del_vfio_group(struct vhost_dev *dev, int group_id)
+{
+ struct vhost_user *u = dev->opaque;
+ struct vhost_user_vfio_state *vfio = &u->shared->vfio;
+ struct kvm_device_attr attr = {
+ .group = KVM_DEV_VFIO_GROUP,
+ .attr = KVM_DEV_VFIO_GROUP_DEL,
+ };
+ bool found = false;
+ int i, ret;
+
+ kvm_irqchip_commit_routes(kvm_state);
+
+ for (i = 0; i < vfio->nr_group; i++) {
+ if (vfio->group[i].id == group_id) {
+ found = true;
+ break;
+ }
+ }
+
+ if (!found) {
+ return 0;
+ }
+
+ vfio->group[i].refcnt--;
+
+ if (vfio->group[i].refcnt == 0) {
+ attr.addr = (uint64_t)(uintptr_t)&vfio->group[i].fd;
+ ret = ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr);
+ if (ret < 0) {
+ error_report("Failed to remove group %d from KVM VFIO device.",
+ group_id);
+ vfio->group[i].refcnt++;
+ return -1;
+ }
+
+ close(vfio->group[i].fd);
+
+ for (; i + 1 < vfio->nr_group; i++) {
+ vfio->group[i] = vfio->group[i + 1];
+ }
+ vfio->nr_group--;
+ }
+
+ return 0;
+}
+
+static int vhost_user_handle_vfio_set_vring_group_fd(struct vhost_dev *dev,
+ uint64_t u64,
+ int group_fd)
+{
+ struct vhost_user *u = dev->opaque;
+ struct vhost_user_vfio_state *vfio = &u->shared->vfio;
+ int qid = u64 & VHOST_USER_VRING_IDX_MASK;
+ int group_id, nvqs, ret = 0;
+
+ qemu_mutex_lock(&vfio->lock);
+
+ if (!virtio_has_feature(dev->protocol_features,
+ VHOST_USER_PROTOCOL_F_VFIO)) {
+ ret = -1;
+ goto out;
+ }
+
+ if (dev->vdev == NULL) {
+ error_report("vhost_dev isn't available.");
+ ret = -1;
+ goto out;
+ }
+
+ nvqs = virtio_get_num_queues(dev->vdev);
+ if (qid >= nvqs) {
+ error_report("invalid queue index.");
+ ret = -1;
+ goto out;
+ }
+
+ if (u64 & VHOST_USER_VRING_NOFD_MASK) {
+ group_id = vfio->group_id[qid];
+ if (group_id != -1) {
+ if (vhost_user_kvm_del_vfio_group(dev, group_id) < 0) {
+ ret = -1;
+ goto out;
+ }
+ vfio->group_id[qid] = -1;
+ }
+ goto out;
+ }
+
+ group_id = vfio_group_fd_to_id(group_fd);
+ if (group_id == -1) {
+ ret = -1;
+ goto out;
+ }
+
+ if (vfio->group_id[qid] == group_id) {
+ close(group_fd);
+ goto out;
+ }
+
+ if (vfio->group_id[qid] != -1) {
+ if (vhost_user_kvm_del_vfio_group(dev, vfio->group_id[qid]) < 0) {
+ ret = -1;
+ goto out;
+ }
+ vfio->group_id[qid] = -1;
+ }
+
+ if (vhost_user_kvm_add_vfio_group(dev, group_id, group_fd) < 0) {
+ ret = -1;
+ goto out;
+ }
+ vfio->group_id[qid] = group_id;
+
+out:
+ kvm_irqchip_commit_routes(kvm_state);
+ qemu_mutex_unlock(&vfio->lock);
+
+ if (ret != 0 && group_fd != -1) {
+ close(group_fd);
+ }
+
+ return ret;
+}
+#else
+static int vhost_user_handle_vfio_set_vring_group_fd(struct vhost_dev *dev,
+ uint64_t u64,
+ int group_fd)
+{
+ if (group_fd != -1) {
+ close(group_fd);
+ }
+
+ return 0;
+}
+#endif
+
+static int vhost_user_add_mapping(struct vhost_dev *dev, int qid, int fd,
+ uint64_t size, uint64_t offset)
+{
+ struct vhost_user *u = dev->opaque;
+ struct vhost_user_vfio_state *vfio = &u->shared->vfio;
+ MemoryRegion *sysmem = get_system_memory();
+ VirtIONetPCI *d;
+ VirtIOPCIProxy *proxy; /* XXX: handle non-PCI case */
+ uint64_t paddr;
+ void *addr;
+ char *name;
+
+ d = container_of(dev->vdev, VirtIONetPCI, vdev.parent_obj);
+ proxy = &d->parent_obj;
+
+ if ((proxy->flags & VIRTIO_PCI_FLAG_PAGE_PER_VQ) == 0 ||
+ size != virtio_pci_queue_mem_mult(proxy)) {
+ return -1;
+ }
+
+ addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset);
+ if (addr == MAP_FAILED) {
+ error_report("Can't map notify region.");
+ return -1;
+ }
+
+ vfio->notify[qid].mmap.addr = addr;
+ vfio->notify[qid].mmap.size = size;
+
+ /* The notify_offset of each queue is queue_select */
+ paddr = proxy->modern_bar.addr + proxy->notify.offset +
+ virtio_pci_queue_mem_mult(proxy) * qid;
+
+ name = g_strdup_printf("vhost-user/vfio@%p mmaps[%d]", vfio, qid);
+ memory_region_init_ram_device_ptr(&vfio->notify[qid].mr,
+ memory_region_owner(sysmem),
+ name, size, addr);
+ g_free(name);
+ memory_region_add_subregion(sysmem, paddr, &vfio->notify[qid].mr);
+
+ return 0;
+}
+
+static int vhost_user_del_mapping(struct vhost_dev *dev, int qid)
+{
+ struct vhost_user *u = dev->opaque;
+ struct vhost_user_vfio_state *vfio = &u->shared->vfio;
+ MemoryRegion *sysmem = get_system_memory();
+
+ if (vfio->notify[qid].mmap.addr == NULL) {
+ return 0;
+ }
+
+ memory_region_del_subregion(sysmem, &vfio->notify[qid].mr);
+ object_unparent(OBJECT(&vfio->notify[qid].mr));
+
+ munmap(vfio->notify[qid].mmap.addr, vfio->notify[qid].mmap.size);
+ vfio->notify[qid].mmap.addr = NULL;
+ vfio->notify[qid].mmap.size = 0;
+
+ return 0;
+}
+
+static int vhost_user_handle_vfio_set_vring_notify_area(struct vhost_dev *dev,
+ VhostUserVringArea *notify_area, int fd)
+{
+ struct vhost_user *u = dev->opaque;
+ struct vhost_user_vfio_state *vfio = &u->shared->vfio;
+ int qid = notify_area->u64 & VHOST_USER_VRING_IDX_MASK;
+ int nvqs, ret = 0;
+
+ qemu_mutex_lock(&vfio->lock);
+
+ if (!virtio_has_feature(dev->protocol_features,
+ VHOST_USER_PROTOCOL_F_VFIO)) {
+ ret = -1;
+ goto out;
+ }
+
+ if (dev->vdev == NULL) {
+ error_report("vhost_dev isn't available.");
+ ret = -1;
+ goto out;
+ }
+
+ nvqs = virtio_get_num_queues(dev->vdev);
+ if (qid >= nvqs) {
+ error_report("invalid queue index.");
+ ret = -1;
+ goto out;
+ }
+
+ if (vfio->notify[qid].mmap.addr != NULL) {
+ vhost_user_del_mapping(dev, qid);
+ }
+
+ if (notify_area->u64 & VHOST_USER_VRING_NOFD_MASK) {
+ goto out;
+ }
+
+ ret = vhost_user_add_mapping(dev, qid, fd, notify_area->size,
+ notify_area->offset);
+
+out:
+ if (fd != -1) {
+ close(fd);
+ }
+ qemu_mutex_unlock(&vfio->lock);
+ return ret;
+}
+
static void slave_read(void *opaque)
{
struct vhost_dev *dev = opaque;
@@ -670,6 +1018,14 @@ static void slave_read(void *opaque)
case VHOST_USER_SLAVE_IOTLB_MSG:
ret = vhost_backend_handle_iotlb_msg(dev, &msg.payload.iotlb);
break;
+ case VHOST_USER_SLAVE_VFIO_SET_VRING_GROUP_FD:
+ ret = vhost_user_handle_vfio_set_vring_group_fd(dev,
+ msg.payload.u64, fd);
+ break;
+ case VHOST_USER_SLAVE_VFIO_SET_VRING_NOTIFY_AREA:
+ ret = vhost_user_handle_vfio_set_vring_notify_area(dev,
+ &msg.payload.area, fd);
+ break;
default:
error_report("Received unexpected msg type.");
if (fd != -1) {
@@ -763,7 +1119,7 @@ static int vhost_user_init(struct vhost_dev *dev, void *opaque)
{
uint64_t features, protocol_features;
struct vhost_user *u;
- int err;
+ int i, err;
assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER);
@@ -772,6 +1128,13 @@ static int vhost_user_init(struct vhost_dev *dev, void *opaque)
u->slave_fd = -1;
dev->opaque = u;
+ if (dev->vq_index == 0) {
+ for (i = 0; i < VIRTIO_QUEUE_MAX; i++) {
+ u->shared->vfio.group_id[i] = -1;
+ }
+ qemu_mutex_init(&u->shared->vfio.lock);
+ }
+
err = vhost_user_get_features(dev, &features);
if (err < 0) {
return err;
@@ -832,6 +1195,7 @@ static int vhost_user_init(struct vhost_dev *dev, void *opaque)
static int vhost_user_cleanup(struct vhost_dev *dev)
{
struct vhost_user *u;
+ int i;
assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER);
@@ -841,6 +1205,21 @@ static int vhost_user_cleanup(struct vhost_dev *dev)
close(u->slave_fd);
u->slave_fd = -1;
}
+
+ if (dev->vq_index == 0) {
+ for (i = 0; i < VIRTIO_QUEUE_MAX; i++) {
+ vhost_user_del_mapping(dev, i);
+ }
+
+#ifdef CONFIG_KVM
+ while (u->shared->vfio.nr_group > 0) {
+ int group_id;
+ group_id = u->shared->vfio.group[0].id;
+ vhost_user_kvm_del_vfio_group(dev, group_id);
+ }
+#endif
+ }
+
g_free(u);
dev->opaque = 0;
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index e4290ce93d..a001a0936a 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -612,7 +612,8 @@ static void vhost_set_memory(MemoryListener *listener,
static bool vhost_section(MemoryRegionSection *section)
{
return memory_region_is_ram(section->mr) &&
- !memory_region_is_rom(section->mr);
+ !memory_region_is_rom(section->mr) &&
+ !memory_region_is_ram_device(section->mr);
}
static void vhost_begin(MemoryListener *listener)
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index e92837c42b..c28fed8676 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -219,14 +219,6 @@ static bool virtio_pci_ioeventfd_enabled(DeviceState *d)
return (proxy->flags & VIRTIO_PCI_FLAG_USE_IOEVENTFD) != 0;
}
-#define QEMU_VIRTIO_PCI_QUEUE_MEM_MULT 0x1000
-
-static inline int virtio_pci_queue_mem_mult(struct VirtIOPCIProxy *proxy)
-{
- return (proxy->flags & VIRTIO_PCI_FLAG_PAGE_PER_VQ) ?
- QEMU_VIRTIO_PCI_QUEUE_MEM_MULT : 4;
-}
-
static int virtio_pci_ioeventfd_assign(DeviceState *d, EventNotifier *notifier,
int n, bool assign)
{
diff --git a/hw/virtio/virtio-pci.h b/hw/virtio/virtio-pci.h
index 12d3a90686..f2a613569b 100644
--- a/hw/virtio/virtio-pci.h
+++ b/hw/virtio/virtio-pci.h
@@ -209,6 +209,14 @@ static inline void virtio_pci_disable_modern(VirtIOPCIProxy *proxy)
proxy->disable_modern = true;
}
+#define QEMU_VIRTIO_PCI_QUEUE_MEM_MULT 0x1000
+
+static inline int virtio_pci_queue_mem_mult(struct VirtIOPCIProxy *proxy)
+{
+ return (proxy->flags & VIRTIO_PCI_FLAG_PAGE_PER_VQ) ?
+ QEMU_VIRTIO_PCI_QUEUE_MEM_MULT : 4;
+}
+
/*
* virtio-scsi-pci: This extends VirtioPCIProxy.
*/
diff --git a/include/hw/vfio/vfio.h b/include/hw/vfio/vfio.h
index 86248f5436..7425fcd90c 100644
--- a/include/hw/vfio/vfio.h
+++ b/include/hw/vfio/vfio.h
@@ -1,6 +1,8 @@
#ifndef HW_VFIO_H
#define HW_VFIO_H
+extern int vfio_kvm_device_fd;
+
bool vfio_eeh_as_ok(AddressSpace *as);
int vfio_eeh_as_op(AddressSpace *as, uint32_t op);
diff --git a/include/hw/virtio/vhost-user.h b/include/hw/virtio/vhost-user.h
index 10d698abe2..cc998f4f43 100644
--- a/include/hw/virtio/vhost-user.h
+++ b/include/hw/virtio/vhost-user.h
@@ -9,9 +9,35 @@
#define HW_VIRTIO_VHOST_USER_H
#include "chardev/char-fe.h"
+#include "hw/virtio/virtio.h"
+
+struct vhost_user_vfio_state {
+ /* The group ID associated with each queue */
+ int group_id[VIRTIO_QUEUE_MAX];
+
+ /* The notify context of each queue */
+ struct {
+ struct {
+ uint64_t size;
+ void *addr;
+ } mmap;
+ MemoryRegion mr;
+ } notify[VIRTIO_QUEUE_MAX];
+
+ /* The vfio groups associated with this vhost user */
+ struct {
+ int fd;
+ int id;
+ int refcnt;
+ } group[VIRTIO_QUEUE_MAX];
+ int nr_group;
+
+ QemuMutex lock;
+};
typedef struct VhostUser {
CharBackend chr;
+ struct vhost_user_vfio_state vfio;
} VhostUser;
#endif
--
2.13.3
* Re: [Qemu-devel] [RFC 0/3] Extend vhost-user to support VFIO based accelerators
From: Alexey Kardashevskiy @ 2018-01-02 2:42 UTC (permalink / raw)
To: Tiwei Bie, virtio-dev, qemu-devel, mst, alex.williamson, pbonzini,
stefanha
Cc: jianfeng.tan, cunming.liang, xiao.w.wang, zhihong.wang, dan.daly
On 22/12/17 17:41, Tiwei Bie wrote:
> This RFC patch set does some small extensions to vhost-user protocol
> to support VFIO based accelerators, and makes it possible to get the
> similar performance of VFIO passthru while keeping the virtio device
> emulation in QEMU.
>
> When we have virtio ring compatible devices, it's possible to setup
> the device (DMA mapping, PCI config, etc) based on the existing info
> (memory-table, features, vring info, etc) which is available on the
> vhost-backend (e.g. DPDK vhost library). Then, we will be able to
> use such devices to accelerate the emulated device for the VM. And
> we call it vDPA: vhost DataPath Acceleration. The key difference
> between VFIO passthru and vDPA is that, in vDPA only the data path
> (e.g. ring, notify and queue interrupt) is pass-throughed, the device
> control path (e.g. PCI configuration space and MMIO regions) is still
> defined and emulated by QEMU.
>
> The benefits of keeping virtio device emulation in QEMU compared
> with virtio device VFIO passthru include (but not limit to):
>
> - consistent device interface from guest OS;
> - max flexibility on control path and hardware design;
> - leveraging the existing virtio live-migration framework;
>
> But the critical issue in vDPA is that the data path performance is
> relatively low and some host threads are needed for the data path,
> because some necessary mechanisms are missing to support:
>
> 1) guest driver notifies the device directly;
> 2) device interrupts the guest directly;
>
> So this patch set does some small extensions to vhost-user protocol
> to make both of them possible. It leverages the same mechanisms (e.g.
> EPT and Posted-Interrupt on Intel platform) as the VFIO passthru to
> achieve the data path pass through.
>
> A new protocol feature bit is added to negotiate the accelerator feature
> support. Two new slave message types are added to enable the notify and
> interrupt passthru for each queue. From the view of vhost-user protocol
> design, it's very flexible. The passthru can be enabled/disabled for
> each queue individually, and it's possible to accelerate each queue by
> different devices. More design and implementation details can be found
> from the last patch.
>
> There are some rough edges in this patch set (so this is a RFC patch
> set for now), but it's never too early to hear the thoughts from the
> community! So any comments and suggestions would be really appreciated!
I am missing a lot of context here. Out of curiosity - how is this all
supposed to work? A QEMU command line example would be useful. What will the
guest see - a virtio device (i.e. Red Hat vendor ID) or an actual PCI device
(since VFIO is mentioned)? Thanks.
--
Alexey
* Re: [Qemu-devel] [RFC 0/3] Extend vhost-user to support VFIO based accelerators
From: Liang, Cunming @ 2018-01-02 5:49 UTC (permalink / raw)
To: Alexey Kardashevskiy, Bie, Tiwei, virtio-dev@lists.oasis-open.org,
qemu-devel@nongnu.org, mst@redhat.com, alex.williamson@redhat.com,
pbonzini@redhat.com, stefanha@redhat.com
Cc: Tan, Jianfeng, Wang, Xiao W, Wang, Zhihong, Daly, Dan
> -----Original Message-----
> From: Alexey Kardashevskiy [mailto:aik@ozlabs.ru]
> Sent: Tuesday, January 2, 2018 10:42 AM
> To: Bie, Tiwei <tiwei.bie@intel.com>; virtio-dev@lists.oasis-open.org; qemu-
> devel@nongnu.org; mst@redhat.com; alex.williamson@redhat.com;
> pbonzini@redhat.com; stefanha@redhat.com
> Cc: Tan, Jianfeng <jianfeng.tan@intel.com>; Liang, Cunming
> <cunming.liang@intel.com>; Wang, Xiao W <xiao.w.wang@intel.com>; Wang,
> Zhihong <zhihong.wang@intel.com>; Daly, Dan <dan.daly@intel.com>
> Subject: Re: [Qemu-devel] [RFC 0/3] Extend vhost-user to support VFIO based
> accelerators
>
> On 22/12/17 17:41, Tiwei Bie wrote:
> > This RFC patch set does some small extensions to vhost-user protocol
> > to support VFIO based accelerators, and makes it possible to get the
> > similar performance of VFIO passthru while keeping the virtio device
> > emulation in QEMU.
> >
> > When we have virtio ring compatible devices, it's possible to setup
> > the device (DMA mapping, PCI config, etc) based on the existing info
> > (memory-table, features, vring info, etc) which is available on the
> > vhost-backend (e.g. DPDK vhost library). Then, we will be able to use
> > such devices to accelerate the emulated device for the VM. And we call
> > it vDPA: vhost DataPath Acceleration. The key difference between VFIO
> > passthru and vDPA is that, in vDPA only the data path (e.g. ring,
> > notify and queue interrupt) is pass-throughed, the device control path
> > (e.g. PCI configuration space and MMIO regions) is still defined and
> > emulated by QEMU.
> >
> > The benefits of keeping virtio device emulation in QEMU compared with
> > virtio device VFIO passthru include (but not limit to):
> >
> > - consistent device interface from guest OS;
> > - max flexibility on control path and hardware design;
> > - leveraging the existing virtio live-migration framework;
> >
> > But the critical issue in vDPA is that the data path performance is
> > relatively low and some host threads are needed for the data path,
> > because some necessary mechanisms are missing to support:
> >
> > 1) guest driver notifies the device directly;
> > 2) device interrupts the guest directly;
> >
> > So this patch set does some small extensions to vhost-user protocol to
> > make both of them possible. It leverages the same mechanisms (e.g.
> > EPT and Posted-Interrupt on Intel platform) as the VFIO passthru to
> > achieve the data path pass through.
> >
> > A new protocol feature bit is added to negotiate the accelerator
> > feature support. Two new slave message types are added to enable the
> > notify and interrupt passthru for each queue. From the view of
> > vhost-user protocol design, it's very flexible. The passthru can be
> > enabled/disabled for each queue individually, and it's possible to
> > accelerate each queue by different devices. More design and
> > implementation details can be found from the last patch.
> >
> > There are some rough edges in this patch set (so this is a RFC patch
> > set for now), but it's never too early to hear the thoughts from the
> > community! So any comments and suggestions would be really appreciated!
>
> I am missing a lot of context here. Out of curiosity - how is this all supposed to
> work? QEMU command line example would be useful, what will the guest see? A
> virtio device (i.e. Redhat vendor ID) or an actual PCI device (since VFIO is
> mentioned)? Thanks.
It's a normal virtio PCIe device in the guest. The extensions on the host are transparent to the guest.
In terms of usage, there's a sample that may help:
http://dpdk.org/ml/archives/dev/2017-December/085044.html
The sample uses a virtio-net device in a VM as the data path accelerator for virtio-net in a nested VM.
When a physical device on bare metal is used instead, it accelerates virtio-net in a VM in the same way.
No additional QEMU command line parameters are needed for vhost-user itself.
One more piece of context, covering the vDPA enabling in the DPDK vhost-user library:
http://dpdk.org/ml/archives/dev/2017-December/084792.html
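For reference, a generic vhost-user invocation (placeholder socket path and
sizes, nothing specific to this series) looks roughly like:

qemu-system-x86_64 ... \
    -object memory-backend-file,id=mem0,size=4G,mem-path=/dev/hugepages,share=on \
    -numa node,memdev=mem0 \
    -chardev socket,id=chr0,path=/tmp/vhost-user.sock \
    -netdev vhost-user,id=net0,chardev=chr0 \
    -device virtio-net-pci,netdev=net0,page-per-vq=on

The guest only sees the virtio-net-pci device; whether the backend behind the
socket uses a VFIO based accelerator is invisible to it. (The page-per-vq=on
property is the one the documentation in patch 3 says should be enabled for
the per-queue notify mapping.)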
>
>
>
> --
> Alexey
* Re: [Qemu-devel] [RFC 0/3] Extend vhost-user to support VFIO based accelerators
From: Alexey Kardashevskiy @ 2018-01-02 6:01 UTC (permalink / raw)
To: Liang, Cunming, Bie, Tiwei, virtio-dev@lists.oasis-open.org,
qemu-devel@nongnu.org, mst@redhat.com, alex.williamson@redhat.com,
pbonzini@redhat.com, stefanha@redhat.com
Cc: Tan, Jianfeng, Wang, Xiao W, Wang, Zhihong, Daly, Dan
On 02/01/18 16:49, Liang, Cunming wrote:
>
>
>> -----Original Message-----
>> From: Alexey Kardashevskiy [mailto:aik@ozlabs.ru]
>> Sent: Tuesday, January 2, 2018 10:42 AM
>> To: Bie, Tiwei <tiwei.bie@intel.com>; virtio-dev@lists.oasis-open.org; qemu-
>> devel@nongnu.org; mst@redhat.com; alex.williamson@redhat.com;
>> pbonzini@redhat.com; stefanha@redhat.com
>> Cc: Tan, Jianfeng <jianfeng.tan@intel.com>; Liang, Cunming
>> <cunming.liang@intel.com>; Wang, Xiao W <xiao.w.wang@intel.com>; Wang,
>> Zhihong <zhihong.wang@intel.com>; Daly, Dan <dan.daly@intel.com>
>> Subject: Re: [Qemu-devel] [RFC 0/3] Extend vhost-user to support VFIO based
>> accelerators
>>
>> On 22/12/17 17:41, Tiwei Bie wrote:
>>> This RFC patch set does some small extensions to vhost-user protocol
>>> to support VFIO based accelerators, and makes it possible to get the
>>> similar performance of VFIO passthru while keeping the virtio device
>>> emulation in QEMU.
>>>
>>> When we have virtio ring compatible devices, it's possible to setup
>>> the device (DMA mapping, PCI config, etc) based on the existing info
>>> (memory-table, features, vring info, etc) which is available on the
>>> vhost-backend (e.g. DPDK vhost library). Then, we will be able to use
>>> such devices to accelerate the emulated device for the VM. And we call
>>> it vDPA: vhost DataPath Acceleration. The key difference between VFIO
>>> passthru and vDPA is that, in vDPA only the data path (e.g. ring,
>>> notify and queue interrupt) is pass-throughed, the device control path
>>> (e.g. PCI configuration space and MMIO regions) is still defined and
>>> emulated by QEMU.
>>>
>>> The benefits of keeping virtio device emulation in QEMU compared with
>>> virtio device VFIO passthru include (but not limit to):
>>>
>>> - consistent device interface from guest OS;
>>> - max flexibility on control path and hardware design;
>>> - leveraging the existing virtio live-migration framework;
>>>
>>> But the critical issue in vDPA is that the data path performance is
>>> relatively low and some host threads are needed for the data path,
>>> because some necessary mechanisms are missing to support:
>>>
>>> 1) guest driver notifies the device directly;
>>> 2) device interrupts the guest directly;
>>>
>>> So this patch set does some small extensions to vhost-user protocol to
>>> make both of them possible. It leverages the same mechanisms (e.g.
>>> EPT and Posted-Interrupt on Intel platform) as the VFIO passthru to
>>> achieve the data path pass through.
>>>
>>> A new protocol feature bit is added to negotiate the accelerator
>>> feature support. Two new slave message types are added to enable the
>>> notify and interrupt passthru for each queue. From the view of
>>> vhost-user protocol design, it's very flexible. The passthru can be
>>> enabled/disabled for each queue individually, and it's possible to
>>> accelerate each queue by different devices. More design and
>>> implementation details can be found from the last patch.
>>>
>>> There are some rough edges in this patch set (so this is a RFC patch
>>> set for now), but it's never too early to hear the thoughts from the
>>> community! So any comments and suggestions would be really appreciated!
>>
>> I am missing a lot of context here. Out of curiosity - how is this all supposed to
>> work? QEMU command line example would be useful, what will the guest see? A
>> virtio device (i.e. Redhat vendor ID) or an actual PCI device (since VFIO is
>> mentioned)? Thanks.
>
> It's a normal virtio PCIe devices in the guest. Extensions on the host are transparent to the guest.
>
> In terms of the usage, there's a sample may help.
> http://dpdk.org/ml/archives/dev/2017-December/085044.html
> The sample takes virtio-net device in VM as data path accelerator of virtio-net in nested VM.
Aaah, this is for nested VMs; the original description was not clear about
this. I get it now, thanks.
> When taking physical device on bare metal, it accelerates virtio-net in VM equivalently.
> There's no additional params of QEMU command line needed for vhost-user.
>
> One more context, including vDPA enabling in DPDK vhost-user library.
> http://dpdk.org/ml/archives/dev/2017-December/084792.html
--
Alexey
* Re: [Qemu-devel] [RFC 0/3] Extend vhost-user to support VFIO based accelerators
From: Liang, Cunming @ 2018-01-02 6:48 UTC (permalink / raw)
To: Alexey Kardashevskiy, Bie, Tiwei, virtio-dev@lists.oasis-open.org,
qemu-devel@nongnu.org, mst@redhat.com, alex.williamson@redhat.com,
pbonzini@redhat.com, stefanha@redhat.com
Cc: Tan, Jianfeng, Wang, Xiao W, Wang, Zhihong, Daly, Dan
> -----Original Message-----
> From: Alexey Kardashevskiy [mailto:aik@ozlabs.ru]
> Sent: Tuesday, January 2, 2018 2:01 PM
> To: Liang, Cunming <cunming.liang@intel.com>; Bie, Tiwei <tiwei.bie@intel.com>;
> virtio-dev@lists.oasis-open.org; qemu-devel@nongnu.org; mst@redhat.com;
> alex.williamson@redhat.com; pbonzini@redhat.com; stefanha@redhat.com
> Cc: Tan, Jianfeng <jianfeng.tan@intel.com>; Wang, Xiao W
> <xiao.w.wang@intel.com>; Wang, Zhihong <zhihong.wang@intel.com>; Daly,
> Dan <dan.daly@intel.com>
> Subject: Re: [Qemu-devel] [RFC 0/3] Extend vhost-user to support VFIO based
> accelerators
>
> On 02/01/18 16:49, Liang, Cunming wrote:
> >
> >
> >> -----Original Message-----
> >> From: Alexey Kardashevskiy [mailto:aik@ozlabs.ru]
> >> Sent: Tuesday, January 2, 2018 10:42 AM
> >> To: Bie, Tiwei <tiwei.bie@intel.com>;
> >> virtio-dev@lists.oasis-open.org; qemu- devel@nongnu.org;
> >> mst@redhat.com; alex.williamson@redhat.com; pbonzini@redhat.com;
> >> stefanha@redhat.com
> >> Cc: Tan, Jianfeng <jianfeng.tan@intel.com>; Liang, Cunming
> >> <cunming.liang@intel.com>; Wang, Xiao W <xiao.w.wang@intel.com>;
> >> Wang, Zhihong <zhihong.wang@intel.com>; Daly, Dan
> >> <dan.daly@intel.com>
> >> Subject: Re: [Qemu-devel] [RFC 0/3] Extend vhost-user to support VFIO
> >> based accelerators
> >>
> >> On 22/12/17 17:41, Tiwei Bie wrote:
> >>> This RFC patch set does some small extensions to vhost-user protocol
> >>> to support VFIO based accelerators, and makes it possible to get the
> >>> similar performance of VFIO passthru while keeping the virtio device
> >>> emulation in QEMU.
> >>>
> >>> When we have virtio ring compatible devices, it's possible to setup
> >>> the device (DMA mapping, PCI config, etc) based on the existing info
> >>> (memory-table, features, vring info, etc) which is available on the
> >>> vhost-backend (e.g. DPDK vhost library). Then, we will be able to
> >>> use such devices to accelerate the emulated device for the VM. And
> >>> we call it vDPA: vhost DataPath Acceleration. The key difference
> >>> between VFIO passthru and vDPA is that, in vDPA only the data path
> >>> (e.g. ring, notify and queue interrupt) is pass-throughed, the
> >>> device control path (e.g. PCI configuration space and MMIO regions)
> >>> is still defined and emulated by QEMU.
> >>>
> >>> The benefits of keeping virtio device emulation in QEMU compared
> >>> with virtio device VFIO passthru include (but not limit to):
> >>>
> >>> - consistent device interface from guest OS;
> >>> - max flexibility on control path and hardware design;
> >>> - leveraging the existing virtio live-migration framework;
> >>>
> >>> But the critical issue in vDPA is that the data path performance is
> >>> relatively low and some host threads are needed for the data path,
> >>> because some necessary mechanisms are missing to support:
> >>>
> >>> 1) guest driver notifies the device directly;
> >>> 2) device interrupts the guest directly;
> >>>
> >>> So this patch set does some small extensions to vhost-user protocol
> >>> to make both of them possible. It leverages the same mechanisms (e.g.
> >>> EPT and Posted-Interrupt on Intel platform) as the VFIO passthru to
> >>> achieve the data path pass through.
> >>>
> >>> A new protocol feature bit is added to negotiate the accelerator
> >>> feature support. Two new slave message types are added to enable the
> >>> notify and interrupt passthru for each queue. From the view of
> >>> vhost-user protocol design, it's very flexible. The passthru can be
> >>> enabled/disabled for each queue individually, and it's possible to
> >>> accelerate each queue by different devices. More design and
> >>> implementation details can be found from the last patch.
> >>>
> >>> There are some rough edges in this patch set (so this is a RFC patch
> >>> set for now), but it's never too early to hear the thoughts from the
> >>> community! So any comments and suggestions would be really appreciated!
> >>
> >> I am missing a lot of context here. Out of curiosity - how is this
> >> all supposed to work? QEMU command line example would be useful, what
> >> will the guest see? A virtio device (i.e. Redhat vendor ID) or an
> >> actual PCI device (since VFIO is mentioned)? Thanks.
> >
> > It's a normal virtio PCIe devices in the guest. Extensions on the host are
> transparent to the guest.
> >
> > In terms of the usage, there's a sample may help.
> > http://dpdk.org/ml/archives/dev/2017-December/085044.html
> > The sample takes virtio-net device in VM as data path accelerator of virtio-net
> in nested VM.
>
>
> Aaah, this is for nested VMs, the original description was not clear about this. I
> get it now, thanks.
BTW, the patch set is not only for nested VMs, even though the sample is.
Once you have a virtio ring compatible device, it helps a normal VM too.
Basically, it gives a para-virtualized device the extra ability to be associated with an accelerator that can talk to the guest PV device driver directly.
>
>
> > When taking physical device on bare metal, it accelerates virtio-net in VM
> equivalently.
> > There's no additional params of QEMU command line needed for vhost-user.
> >
> > One more context, including vDPA enabling in DPDK vhost-user library.
> > http://dpdk.org/ml/archives/dev/2017-December/084792.html
>
>
>
> --
> Alexey
* Re: [Qemu-devel] [virtio-dev] [RFC 0/3] Extend vhost-user to support VFIO based accelerators
From: Jason Wang @ 2018-01-03 14:34 UTC (permalink / raw)
To: Tiwei Bie, virtio-dev, qemu-devel, mst, alex.williamson, pbonzini,
stefanha
Cc: cunming.liang, dan.daly, jianfeng.tan, zhihong.wang, xiao.w.wang
On 2017-12-22 14:41, Tiwei Bie wrote:
> This RFC patch set does some small extensions to vhost-user protocol
> to support VFIO based accelerators, and makes it possible to get the
> similar performance of VFIO passthru while keeping the virtio device
> emulation in QEMU.
>
> When we have virtio ring compatible devices, it's possible to setup
> the device (DMA mapping, PCI config, etc) based on the existing info
> (memory-table, features, vring info, etc) which is available on the
> vhost-backend (e.g. DPDK vhost library). Then, we will be able to
> use such devices to accelerate the emulated device for the VM. And
> we call it vDPA: vhost DataPath Acceleration. The key difference
> between VFIO passthru and vDPA is that, in vDPA only the data path
> (e.g. ring, notify and queue interrupt) is pass-throughed, the device
> control path (e.g. PCI configuration space and MMIO regions) is still
> defined and emulated by QEMU.
>
> The benefits of keeping virtio device emulation in QEMU compared
> with virtio device VFIO passthru include (but not limit to):
>
> - consistent device interface from guest OS;
> - max flexibility on control path and hardware design;
> - leveraging the existing virtio live-migration framework;
>
> But the critical issue in vDPA is that the data path performance is
> relatively low and some host threads are needed for the data path,
> because some necessary mechanisms are missing to support:
>
> 1) guest driver notifies the device directly;
> 2) device interrupts the guest directly;
>
> So this patch set does some small extensions to vhost-user protocol
> to make both of them possible. It leverages the same mechanisms (e.g.
> EPT and Posted-Interrupt on Intel platform) as the VFIO passthru to
> achieve the data path pass through.
>
> A new protocol feature bit is added to negotiate the accelerator feature
> support. Two new slave message types are added to enable the notify and
> interrupt passthru for each queue. From the view of vhost-user protocol
> design, it's very flexible. The passthru can be enabled/disabled for
> each queue individually, and it's possible to accelerate each queue by
> different devices. More design and implementation details can be found
> from the last patch.
>
> There are some rough edges in this patch set (so this is a RFC patch
> set for now), but it's never too early to hear the thoughts from the
> community! So any comments and suggestions would be really appreciated!
>
> Tiwei Bie (3):
> vhost-user: support receiving file descriptors in slave_read
> vhost-user: introduce shared vhost-user state
> vhost-user: add VFIO based accelerators support
>
> docs/interop/vhost-user.txt | 57 ++++++
> hw/scsi/vhost-user-scsi.c | 6 +-
> hw/vfio/common.c | 2 +-
> hw/virtio/vhost-user.c | 430 +++++++++++++++++++++++++++++++++++++++-
> hw/virtio/vhost.c | 3 +-
> hw/virtio/virtio-pci.c | 8 -
> hw/virtio/virtio-pci.h | 8 +
> include/hw/vfio/vfio.h | 2 +
> include/hw/virtio/vhost-user.h | 43 ++++
> include/hw/virtio/virtio-scsi.h | 6 +-
> net/vhost-user.c | 30 +--
> 11 files changed, 561 insertions(+), 34 deletions(-)
> create mode 100644 include/hw/virtio/vhost-user.h
>
I may be missing something, but may I ask why you must implement this through
vhost-user/DPDK? It looks to me like you could put all of this in QEMU, which
could simplify a lot of things (just like the userspace NVMe driver written
by Fam).
Thanks
* Re: [Qemu-devel] [virtio-dev] [RFC 0/3] Extend vhost-user to support VFIO based accelerators
From: Tiwei Bie @ 2018-01-04 6:18 UTC (permalink / raw)
To: Jason Wang
Cc: virtio-dev, qemu-devel, mst, alex.williamson, pbonzini, stefanha,
cunming.liang, dan.daly, jianfeng.tan, zhihong.wang, xiao.w.wang
On Wed, Jan 03, 2018 at 10:34:36PM +0800, Jason Wang wrote:
> On 2017年12月22日 14:41, Tiwei Bie wrote:
[...]
>
> I may have missed something, but may I ask why you must implement them
> through vhost-user/dpdk? It looks to me like you could put all of this in
> qemu, which could simplify a lot of things (just like the userspace NVMe
> driver written by Fam).
>
Thanks for your comments! :-)
Yeah, you're right. We can also implement everything in QEMU
like the userspace NVMe driver by Fam. It was also described
by Cunming at KVM Forum 2017. Below is the link to the
slides:
https://events.static.linuxfound.org/sites/events/files/slides/KVM17%27-vDPA.pdf
We're also working on it (including defining a standard device
for vhost data path acceleration based on mdev to hide vendor
specific details).
And IMO it's also not a bad idea to extend the vhost-user protocol
to support the accelerators if possible. And it could be more
flexible, because it could easily support (for example) the things
below without introducing any complex command line options or
monitor commands to QEMU:
- the switching among different accelerators and software versions
can be done at runtime in the vhost process;
- different accelerators can be used to accelerate different queue
pairs, or only some (instead of all) queue pairs;
Best regards,
Tiwei Bie
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [virtio-dev] [RFC 0/3] Extend vhost-user to support VFIO based accelerators
2018-01-04 6:18 ` Tiwei Bie
@ 2018-01-04 7:21 ` Jason Wang
2018-01-05 6:58 ` Liang, Cunming
0 siblings, 1 reply; 19+ messages in thread
From: Jason Wang @ 2018-01-04 7:21 UTC (permalink / raw)
To: Tiwei Bie
Cc: jianfeng.tan, virtio-dev, mst, cunming.liang, qemu-devel,
alex.williamson, xiao.w.wang, stefanha, zhihong.wang, pbonzini,
dan.daly
On 2018年01月04日 14:18, Tiwei Bie wrote:
[...]
> Thanks for your comments! :-)
>
> Yeah, you're right. We can also implement everything in QEMU
> like the userspace NVMe driver by Fam. It was also described
> by Cunming at KVM Forum 2017. Below is the link to the
> slides:
>
> https://events.static.linuxfound.org/sites/events/files/slides/KVM17%27-vDPA.pdf
Thanks for the pointer. Looks rather interesting.
>
> We're also working on it (including defining a standard device
> for vhost data path acceleration based on mdev to hide vendor
> specific details).
This is exactly what I mean. From my point of view, there's no need for
any extension to the vhost protocol; we just need to reuse a qemu iothread
to implement a userspace vhost dataplane and do the mdev inside that thread.
>
> And IMO it's also not a bad idea to extend vhost-user protocol
> to support the accelerators if possible. And it could be more
> flexible because it could support (for example) below things
> easily without introducing any complex command line options or
> monitor commands to QEMU:
Maybe I'm wrong, but I don't think we care about the complexity of
command line options or monitor commands in this case.
>
> - the switching among different accelerators and software version
> can be done at runtime in vhost process;
> - use different accelerators to accelerate different queue pairs
> or just accelerate some (instead of all) queue pairs;
Well, technically, if we want, these could be implemented in qemu too.
And here are some more advantages if you implement it in qemu:
1) Avoid extra dependencies like dpdk
2) More flexible, mdev could even choose to not use VFIO or not depend
on vDPA
3) More efficient guest IOMMU integration, especially for dynamic
mappings (device IOTLB transactions could be done by function calls
instead of slow socket messages)
4) Zerocopy (for non-Intel vDPA) is easier to implement
5) Compared to vhost-user, tight coupling with device emulation can
simplify lots of things (an example is programmable flow director/RSS
implementation). And any future enhancement to virtio does not need to
introduce new types of vhost-user messages.
I don't object to the vhost-user/dpdk method, but I'm in favor of
implementing all of this in qemu.
Thanks
>
> Best regards,
> Tiwei Bie
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [virtio-dev] [RFC 0/3] Extend vhost-user to support VFIO based accelerators
2018-01-04 7:21 ` Jason Wang
@ 2018-01-05 6:58 ` Liang, Cunming
2018-01-05 8:38 ` Jason Wang
0 siblings, 1 reply; 19+ messages in thread
From: Liang, Cunming @ 2018-01-05 6:58 UTC (permalink / raw)
To: Jason Wang, Bie, Tiwei
Cc: Tan, Jianfeng, virtio-dev@lists.oasis-open.org, mst@redhat.com,
qemu-devel@nongnu.org, alex.williamson@redhat.com, Wang, Xiao W,
stefanha@redhat.com, Wang, Zhihong, pbonzini@redhat.com,
Daly, Dan
[...]
> >
> > We're also working on it (including defining a standard device for
> > vhost data path acceleration based on mdev to hide vendor specific
> > details).
>
> This is exactly what I mean. From my point of view, there's no need for any
> extension to the vhost protocol; we just need to reuse a qemu iothread to
> implement a userspace vhost dataplane and do the mdev inside that thread.
From a functional perspective, it makes sense for qemu to have native support for those usages. However, qemu doesn't have to take responsibility for the dataplane. There are already huge amounts of code for emulating different devices; leveraging an external dataplane library is an effective way to introduce more. The beauty of vhost-user is that it opens a door for various userland workloads (e.g. a vswitch). The dataplane connected with the VM usually needs to be closely integrated with those userland workloads, so a control plane interface (vhost-user) is better than a datapath interface (e.g. one provided by a dataplane in a qemu iothread). From the workload's point of view, becoming part of the qemu process is not attractive.
That comes up with the idea of a vhost-user extension: the userland workload decides whether or not to enable accelerators, and qemu provides the common control plane infrastructure.
[...]
> Well, technically, if we want, these could be implemented in qemu too.
You're right if we just consider the I/O. The way that I/O is consumed is another perspective.
Simply associating a guest virtio-net 1:1 with an accelerator w/ a SW datapath fallback is not the whole picture. There are various usages on the workload side to abstract the device (e.g. a port representor for a vswitch), etc. I don't think qemu is interested in that whole bunch of things.
[...]
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [virtio-dev] [RFC 0/3] Extend vhost-user to support VFIO based accelerators
2018-01-05 6:58 ` Liang, Cunming
@ 2018-01-05 8:38 ` Jason Wang
2018-01-05 10:25 ` Liang, Cunming
0 siblings, 1 reply; 19+ messages in thread
From: Jason Wang @ 2018-01-05 8:38 UTC (permalink / raw)
To: Liang, Cunming, Bie, Tiwei
Cc: Tan, Jianfeng, virtio-dev@lists.oasis-open.org, mst@redhat.com,
qemu-devel@nongnu.org, alex.williamson@redhat.com, Wang, Xiao W,
stefanha@redhat.com, Wang, Zhihong, pbonzini@redhat.com,
Daly, Dan
On 2018年01月05日 14:58, Liang, Cunming wrote:
>> Thanks for the pointer. Looks rather interesting.
>>
>>> We're also working on it (including defining a standard device for
>>> vhost data path acceleration based on mdev to hide vendor specific
>>> details).
>> This is exactly what I mean. From my point of view, there's no need for any
>> extension to the vhost protocol; we just need to reuse a qemu iothread to
>> implement a userspace vhost dataplane and do the mdev inside that thread.
> On functional perspective, it makes sense to have qemu native support of those certain usage. However, qemu doesn't have to take responsibility for dataplane. There're already huge amounts of codes for different devices emulation, leveraging external dataplane library is an effective way to introduce more.
This does not mean to drop external dataplane library. Actually, you can
link dpdk to qemu directly.
> The beauty of vhost_user is to open a door for variable userland workloads(e.g. vswitch). The dataplane connected with VM usually need to be closely integrated with those userland workloads, a control plane interface(vhost-user) is better than a datapath interface(e.g. provided by a dataplane in qemu iothread).
Do we really need vswitch for vDPA?
> On workloads point of view, it's not excited to be part of qemu process.
Don't see why; qemu has a dataplane for virtio-blk/scsi.
> That comes up with the idea of vhost-user extension. Userland workloads decides to enable accelerators or not, qemu provides the common control plane infrastructure.
It brings extra complexity: endless new types of messages and a huge
bunch of bugs. And what's more important, the split model tends to be
less efficient in some cases, e.g. guest IOMMU integration. I'm pretty
sure we will meet more in the future.
[...]
>> Well, technically, if we want, these could be implemented in qemu too.
> You're right if just considering I/O. The ways to consume those I/O is another perspective.
> Simply 1:1 associating guest virtio-net and accelerator w/ SW datapath fallback is not the whole picture.
Pay attention:
1) What I mean is not a fallback here. You can still do a lot of tricks,
e.g. offloading the datapath to hardware or a doorbell map.
2) Qemu supports a (very old and inefficient) split model of device
emulation and network backend. This means we can switch between backends
(though not implemented).
> It's variable usages on workload side to abstract the device (e.g. port re-presenter for vswitch) and etc. I don't think qemu is interested for all bunch of things there.
>
Again, you can link any dataplane to qemu directly instead of using
vhost-user, if vhost-user tends to be less useful in some cases (vDPA is
one of those cases, I think).
Thanks
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [virtio-dev] [RFC 0/3] Extend vhost-user to support VFIO based accelerators
2018-01-05 8:38 ` Jason Wang
@ 2018-01-05 10:25 ` Liang, Cunming
2018-01-08 3:23 ` Jason Wang
0 siblings, 1 reply; 19+ messages in thread
From: Liang, Cunming @ 2018-01-05 10:25 UTC (permalink / raw)
To: Jason Wang, Bie, Tiwei
Cc: Tan, Jianfeng, virtio-dev@lists.oasis-open.org, mst@redhat.com,
qemu-devel@nongnu.org, alex.williamson@redhat.com, Wang, Xiao W,
stefanha@redhat.com, Wang, Zhihong, pbonzini@redhat.com,
Daly, Dan, Maxime Coquelin
[...]
> > On functional perspective, it makes sense to have qemu native support of
> those certain usage. However, qemu doesn't have to take responsibility for
> dataplane. There're already huge amounts of codes for different devices
> emulation, leveraging external dataplane library is an effective way to
> introduce more.
>
> This does not mean to drop external dataplane library. Actually, you can link
> dpdk to qemu directly.
It's not a bad idea; then the interface becomes a new API/ABI definition for an external dataplane library instead of the existing vhost protocol. dpdk as a library is not a big deal to link with, but a customized application is.
In addition, it will then require qemu to provide a flexible process model. Lots of application-level features (e.g. hot upgrade/fix) become a burden.
I'm open to that option, and will keep an eye on any proposal there.
>
> > The beauty of vhost_user is to open a door for variable userland
> workloads(e.g. vswitch). The dataplane connected with VM usually need to
> be closely integrated with those userland workloads, a control plane
> interface(vhost-user) is better than a datapath interface(e.g. provided by
> a dataplane in qemu iothread).
>
> Do we really need vswitch for vDPA?
Accelerators come into the picture of the vswitch, which usually provides an on-chip EMC for early classification. It gives a fast path for those throughput-sensitive (SLA) VNFs to bypass the further table lookups. It co-exists with other VNFs whose SLA level is best effort but which require more functions (e.g. stateful conntrack, security checks, even higher-layer WAF support); a DPDK-based datapath still boosts the throughput there. It's usually not a single choice of dedicated or shared datapath; they co-exist.
>
> > On workloads point of view, it's not excited to be part of qemu process.
>
> Don't see why, qemu have dataplane for virtio-blk/scsi.
Qemu has vhost-user for scsi too. I'm not saying which one is bad, just pointing out that sometimes it's very workload driven. Networking is different from blk/scsi/crypto.
>
> > That comes up with the idea of vhost-user extension. Userland workloads
> decides to enable accelerators or not, qemu provides the common control
> plane infrastructure.
>
> It brings extra complexity: endless new types of messages and a huge bunch
> of bugs. And what's more important, the split model tends to be less efficient
> in some cases, e.g guest IOMMU integration. I'm pretty sure we will meet
> more in the future.
The vIOMMU-relevant messages are already supported by the vhost protocol. That's an independent effort.
I don't see this patch introducing endless new types. My take is that your fundamental concern is about continuously adding new features to vhost-user.
Feel free to correct me if I misunderstood your point.
>
> >>> And IMO it's also not a bad idea to extend vhost-user protocol to
> >>> support the accelerators if possible. And it could be more flexible
> >>> because it could support (for example) below things easily without
> >>> introducing any complex command line options or monitor commands to
> >>> QEMU:
> >> Maybe I was wrong but I don't think we care about the complexity of
> >> command line or monitor command in this case.
> >>
> >>> - the switching among different accelerators and software version
> >>> can be done at runtime in vhost process;
> >>> - use different accelerators to accelerate different queue pairs
> >>> or just accelerate some (instead of all) queue pairs;
> >> Well, technically, if we want, these could be implemented in qemu too.
> > You're right if just considering I/O. The ways to consume those I/O is
> another perspective.
> > Simply 1:1 associating guest virtio-net and accelerator w/ SW datapath
> fallback is not the whole picture.
>
> Pay attention:
>
> 1) What I mean is not a fallback here. You can still do a lot of tricks e.g
> offloading datapath to hardware or doorbell map.
> 2) Qemu supports (very old and inefficient) a split model of device emulation
> and network backend. This means we can switch between backends (though
> not implemented).
An accelerator won't be defined with the same device layout, which means there are different kinds of drivers.
Qemu definitely won't like to have HW-specific drivers there; that ends up as another vhost-vfio in my slides. A mediated device can help to unify the device layout definition and leave the driver part in its own place.
This approach is quite good when the application doesn't need to put the userland SW datapath and the accelerator datapath in the same picture, as in the vswitch cases I mentioned.
>
> > It's variable usages on workload side to abstract the device (e.g. port re-
> presenter for vswitch) and etc. I don't think qemu is interested for all bunch
> of things there.
> >
>
> Again, you can link any dataplane to qemu directly instead of using vhost-
> user if vhost-user tends to be less useful in some cases (vDPA is one of the
> case I think).
See my previous words.
>
> Thanks
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [virtio-dev] [RFC 0/3] Extend vhost-user to support VFIO based accelerators
2018-01-05 10:25 ` Liang, Cunming
@ 2018-01-08 3:23 ` Jason Wang
2018-01-08 8:23 ` [Qemu-devel] [virtio-dev] " Liang, Cunming
0 siblings, 1 reply; 19+ messages in thread
From: Jason Wang @ 2018-01-08 3:23 UTC (permalink / raw)
To: Liang, Cunming, Bie, Tiwei
Cc: Tan, Jianfeng, virtio-dev@lists.oasis-open.org, mst@redhat.com,
Daly, Dan, qemu-devel@nongnu.org, alex.williamson@redhat.com,
Wang, Xiao W, Maxime Coquelin, stefanha@redhat.com, Wang, Zhihong,
pbonzini@redhat.com
On 2018年01月05日 18:25, Liang, Cunming wrote:
[...]
>>
>> This does not mean to drop external dataplane library. Actually, you can link
>> dpdk to qemu directly.
> It's not a bad idea, then the interface comes to be new API/ABI definition of external dataplane library instead of existing vhost protocol.
These API/ABI would be qemu-internal, which should be much more flexible
than vhost-user.
> dpdk as a library is not a big deal to link with, customized application is.
> In addition, it will ask for qemu to provide flexible process model then. Lots of application level features (e.g. hot upgrade/fix) becomes burden.
I don't quite get this; I think we can solve this by migration. Even if a
dpdk userspace backend can do this, it can only do upgrades and fixes for
the network datapath. This is obviously not a complete solution.
It's nice to discuss this, but it's a little bit off topic.
> I'm open to that option, keep eyes on any proposal there.
>
>>> The beauty of vhost_user is to open a door for variable userland
>> workloads(e.g. vswitch). The dataplane connected with VM usually need to
>> be closely integrated with those userland workloads, a control plane
>> interface(vhost-user) is better than a datapath interface(e.g. provided by
>> a dataplane in qemu iothread).
>>
>> Do we really need vswitch for vDPA?
> Accelerators come into the picture of vswitch, which usually provides in-chip EMC for early classification. It gives a fast path for those throughput sensitive(SLA) VNF to bypass the further table lookup. It co-exists other VNF whose SLA level is best effort but requires more functions(e.g. stateful conntrack, security check, even higher layer WAF support) support, DPDK based datapath still boost the throughput there. It's not used to be a single choice of dedicated or shared datapath, usually they're co-exist.
So if I understand this correctly, the "vswitch" here is a hardware
function (something like a smart NIC or OVS offload). So the question
still stands: is vhost-user a must in this case?
>
>>> On workloads point of view, it's not excited to be part of qemu process.
>> Don't see why, qemu have dataplane for virtio-blk/scsi.
> Qemu has vhost-user for scsi too. I'm not saying which one is bad, just point out sometime it's very workloads driven. Network is different with blk/scsi/crypto.
What's the main difference from your point of view which makes
vhost-user a must in this case?
>>> That comes up with the idea of vhost-user extension. Userland workloads
>> decides to enable accelerators or not, qemu provides the common control
>> plane infrastructure.
>>
>> It brings extra complexity: endless new types of messages and a huge bunch
>> of bugs. And what's more important, the split model tends to be less efficient
>> in some cases, e.g guest IOMMU integration. I'm pretty sure we will meet
>> more in the future.
> vIOMMU relevant message has been supported by vhost protocol. It's independent effort there.
The point is that vIOMMU integration is very inefficient in vhost-user in
some cases. If you have lots of dynamic mappings, you can get only
5%-10% of the performance compared to vIOMMU disabled. A huge amount of
translation requests will be generated in this case. The main issue here
is that you cannot completely offload the datapath to vhost-user backends:
IOMMU translations are still done in qemu. This is one of the defects of
vhost-user when the datapath needs to access device state.
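To make the cost concrete, here is a rough backend-side sketch (illustrative
only: the real protocol wraps a struct vhost_iotlb_msg in the normal
vhost-user framing, and error handling is omitted). The point is that every
IOTLB miss is a blocking socket round trip to qemu:

    #include <stdint.h>
    #include <unistd.h>

    /* Simplified IOTLB entry; the real messages carry iova/size/uaddr/perm/type. */
    struct iotlb_entry { uint64_t iova, size, uaddr; };

    static struct iotlb_entry cache[64];
    static int cache_n;

    /* Backend-side IOVA->HVA translation with a vIOMMU enabled. */
    static void *translate_iova(int sock_fd, uint64_t iova, uint64_t len)
    {
        for (int i = 0; i < cache_n; i++) {
            if (iova >= cache[i].iova &&
                iova + len <= cache[i].iova + cache[i].size) {
                return (void *)(uintptr_t)(cache[i].uaddr + (iova - cache[i].iova));
            }
        }

        struct iotlb_entry msg = { .iova = iova, .size = len };
        write(sock_fd, &msg, sizeof(msg));  /* IOTLB miss request (framing simplified) */
        read(sock_fd, &msg, sizeof(msg));   /* block until qemu's IOTLB update arrives */

        if (cache_n < 64) {
            cache[cache_n++] = msg;
        }
        return (void *)(uintptr_t)(msg.uaddr + (iova - msg.iova));
    }

With many short-lived mappings, almost every access takes the miss path
above, which is the overhead being described here.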
> I don't see this patch introduce endless new types.
Not this patch but we can imagine vhost-user protocol will become
complex in the future.
> My taking of your fundamental concern is about continues adding new features on vhost-user.
> Feel free to correct me if I misunderstood your point.
Unfortunately not. Being endless is not itself a problem, but we'd better
try to extend it only when it is really needed. The main questions are:
1) whether or not we need to split things like what you suggested here?
2) if needed, is vhost-user the best method?
>
>>>>> And IMO it's also not a bad idea to extend vhost-user protocol to
>>>>> support the accelerators if possible. And it could be more flexible
>>>>> because it could support (for example) below things easily without
>>>>> introducing any complex command line options or monitor commands to
>>>>> QEMU:
>>>> Maybe I was wrong but I don't think we care about the complexity of
>>>> command line or monitor command in this case.
>>>>
>>>>> - the switching among different accelerators and software version
>>>>> can be done at runtime in vhost process;
>>>>> - use different accelerators to accelerate different queue pairs
>>>>> or just accelerate some (instead of all) queue pairs;
>>>> Well, technically, if we want, these could be implemented in qemu too.
>>> You're right if just considering I/O. The ways to consume those I/O is
>> another perspective.
>>> Simply 1:1 associating guest virtio-net and accelerator w/ SW datapath
>> fallback is not the whole picture.
>>
>> Pay attention:
>>
>> 1) What I mean is not a fallback here. You can still do a lot of tricks e.g
>> offloading datapath to hardware or doorbell map.
>> 2) Qemu supports (very old and inefficient) a split model of device emulation
>> and network backend. This means we can switch between backends (though
>> not implemented).
> Accelerator won't be defined in the same device layout, it means there're different kinds of drivers.
Well, you can still use different drivers if you link dpdk or whatever
other dataplane library to qemu.
> Qemu definitely won't like to have HW relevant driver there,
Why not? We already have a userspace NVMe driver.
> that's end up with another vhost-vfio in my slides.
I don't get why we can't implement it purely through a userspace driver
inside qemu.
> A mediated device can help to unify the device layout definition, and leave the driver part in its own place.
The point is not about mediated device but why you must use vhost-user
to do it.
Thanks
[...]
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [virtio-dev] Re: [virtio-dev] [RFC 0/3] Extend vhost-user to support VFIO based accelerators
2018-01-08 3:23 ` Jason Wang
@ 2018-01-08 8:23 ` Liang, Cunming
2018-01-08 10:06 ` Jason Wang
0 siblings, 1 reply; 19+ messages in thread
From: Liang, Cunming @ 2018-01-08 8:23 UTC (permalink / raw)
To: Jason Wang, Bie, Tiwei
Cc: Tan, Jianfeng, virtio-dev@lists.oasis-open.org, mst@redhat.com,
Daly, Dan, qemu-devel@nongnu.org, alex.williamson@redhat.com,
Wang, Xiao W, Maxime Coquelin, stefanha@redhat.com, Wang, Zhihong,
pbonzini@redhat.com
[...]
>
> So if I understand this correctly, the "vswitch" here is a hardware
> function (something like a smart NIC or OVS offload). So the question
> still stands: is vhost-user a must in this case?
"vswitch" point to SW vswitch(e.g. OVS-DPDK). Accelerators stands for different offloading IPs on the device(e.g. smart NIC) which can be used from a userland driver.
EMC IP used to offload OVS fastpath, so as move traffic to VM directly. Either SRIOV device assignment or vDPA helps to build datapath pass-thru context which represented by a virtual interface on management perspective. For entire "vswitch", there still co-exist none pass-thru interface(SW backend) which uses vhost-user for virtual interface.
Both of them shall be able to replace each other.
There's currently no other user space choice for networking except vhost-user. The vhost-user extension patch has a low impact on qemu.
If you read this patch, it's really about reducing the doorbell and interrupt overhead. Basic vDPA works even without any qemu change. As vhost-user is well recognized as the vhost interface for userland backends, it's reasonable to properly support the usage of a userland backend w/ an I/O accelerator.
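For the interrupt half, this boils down to the usual eventfd/irqfd plumbing.
A conceptual sketch with the raw KVM ioctl (the actual patch goes through
qemu's internal irqfd/MSI-route helpers, and how the fd is handed over is
defined by the new slave messages):

    #include <linux/kvm.h>
    #include <sys/ioctl.h>

    /* Conceptual sketch only (not code from the patch): the slave passes qemu an
     * eventfd that the VFIO device signals for a given queue. Attaching that fd
     * to the guest interrupt route with KVM_IRQFD lets the device interrupt the
     * guest directly, without waking a vhost thread or qemu. */
    static int attach_queue_irqfd(int kvm_vm_fd, int vfio_irq_eventfd, unsigned int gsi)
    {
        struct kvm_irqfd irqfd = {
            .fd  = vfio_irq_eventfd,   /* signalled by the device via VFIO */
            .gsi = gsi,                /* guest interrupt (e.g. the queue's MSI-X route) */
        };
        return ioctl(kvm_vm_fd, KVM_IRQFD, &irqfd);
    }

The doorbell half works the other way around: the device's notify area is
mapped into the guest so that queue kicks don't trap into qemu either.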
Before moving forward, it's necessary to get some alignment on two basic things.
- Do you agree that providing a userland backend via vhost-user is the right way to go for the vswitch workload?
Otherwise, we probably should go back and revisit vhost-user itself rather than talk about anything new happening on vhost-user.
- Do you agree that vhost-user is a right way for qemu to allow multi-process?
Please refer to https://www.linux-kvm.org/images/f/fc/KVM_FORUM_multi-process.pdf
>
> >
> >>> On workloads point of view, it's not excited to be part of qemu process.
> >> Don't see why, qemu have dataplane for virtio-blk/scsi.
> > Qemu has vhost-user for scsi too. I'm not saying which one is bad, just
> point out sometime it's very workloads driven. Network is different with
> blk/scsi/crypto.
>
> What's the main difference from your point of view which makes
> vhost-user a must in this case?
Network devices (a NIC or a smart NIC) usually have vendor-specific drivers. DPDK takes over the devices with its user space drivers to run OVS. The virtual interfaces all talk with qemu via vhost-user. For some virtual interfaces, it now tries to bypass the traffic, and we're looking for a consistent vhost-user interface there. Linking OVS-DPDK with qemu is, TBH, far away from today's usage.
>
> >>> That comes up with the idea of vhost-user extension. Userland
> workloads
> >> decides to enable accelerators or not, qemu provides the common control
> >> plane infrastructure.
> >>
> >> It brings extra complexity: endless new types of messages and a huge
> brunch
> >> of bugs. And what's more important, the split model tends to be less
> efficient
> >> in some cases, e.g guest IOMMU integration. I'm pretty sure we will meet
> >> more in the future.
> > vIOMMU relevant message has been supported by vhost protocol. It's
> independent effort there.
>
> The point is vIOMMU integration is very inefficient in vhost-user for
> some cases. If you have lots of dynamic mappings, it can have only
> 5%-10% performance compared to vIOMMU disabled. A huge amount of
> translation request will be generated in this case. The main issue here
> is you can not offload datapath completely to vhost-user backends
> completely, IOMMU translations were still done in qemu. This is one of
> the defect of vhost-user when datapath need to access the device state.
The dynamic mapping issue is a vIOMMU challenge; besides vhost-user, kernel vhost faces the same situation. Static mapping w/ DPDK looks much better. It's not fair to blame vhost-user for the vIOMMU overhead.
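To make the contrast concrete, a minimal sketch of the static case (field
names are illustrative; in the real protocol the layout arrives once via
VHOST_USER_SET_MEM_TABLE, with one mmap'able fd per region):

    #include <stddef.h>
    #include <stdint.h>

    /* With a static layout, every later GPA->HVA translation is a purely
     * local table walk in the backend, with no IPC at all. */
    struct guest_mem_region {
        uint64_t gpa;   /* guest physical address of the region */
        uint64_t size;  /* region length                        */
        uint64_t hva;   /* where the region's fd was mmap'ed    */
    };

    static void *gpa_to_hva(const struct guest_mem_region *regions, int nregions,
                            uint64_t gpa, uint64_t len)
    {
        for (int i = 0; i < nregions; i++) {
            if (gpa >= regions[i].gpa &&
                gpa + len <= regions[i].gpa + regions[i].size) {
                return (void *)(uintptr_t)(regions[i].hva + (gpa - regions[i].gpa));
            }
        }
        return NULL;  /* not backed by any announced region */
    }

That is the case DPDK runs in today when the vIOMMU is disabled or the
mappings are static.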
>
> > I don't see this patch introduce endless new types.
>
> Not this patch but we can imagine vhost-user protocol will become
> complex in the future.
>
> > My taking of your fundamental concern is about continues adding new
> features on vhost-user.
> > Feel free to correct me if I misunderstood your point.
>
> Unfortunately not, endless itself is not a problem but we'd better only
> try to extend it only when it was really needed. The main questions are:
>
> 1) whether or not we need to split things like what you suggested here?
> 2) if needed, is vhost-user the best method?
Sounds good. BTW, this patch (the vhost-user extension) is a performance improvement patch for the DPDK vDPA usage (refer to the DPDK patches). Another RFC patch for the kernel space usage is coming; it will propose a qemu-native vhost adaptor for an in-kernel mediated device driver.
[...]
> > Accelerator won't be defined in the same device layout, it means there're
> different kinds of drivers.
>
> Well, you can still use different drivers if you link dpdk or whatever
> other dataplane library to qemu.
>
> > Qemu definitely won't like to have HW relevant driver there,
>
> Why not? We already have a userspace NVMe driver.
There's a huge amount of vendor-specific drivers for networking; NVMe is much more generalized than a NIC.
The idea of linking an external dataplane sounds interesting, but it's not used in the real world yet. Looking forward to the progress.
>
> > that's end up with another vhost-vfio in my slides.
>
> I don't get why we can't implement it purely through a userspace driver
> inside qemu.
TBH, we thought about this before. There are a few reasons stopping us.
- qemu doesn't have an abstraction layer for network devices (HW NICs) for userspace drivers
- for the qemu launch process, linking dpdk w/ qemu is not a problem; the gap is in the ovs integration, where the effort/impact is not small
- the qemu-native virtio SW backend lacks efficient ways to talk with an external process, and the change effort/impact is not small
- a qemu-native userspace driver is only used by qemu, while a userspace driver in DPDK can be used by others
[...]
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [virtio-dev] Re: [virtio-dev] [RFC 0/3] Extend vhost-user to support VFIO based accelerators
2018-01-08 8:23 ` [Qemu-devel] [virtio-dev] " Liang, Cunming
@ 2018-01-08 10:06 ` Jason Wang
0 siblings, 0 replies; 19+ messages in thread
From: Jason Wang @ 2018-01-08 10:06 UTC (permalink / raw)
To: Liang, Cunming, Bie, Tiwei
Cc: Tan, Jianfeng, virtio-dev@lists.oasis-open.org,
pbonzini@redhat.com, mst@redhat.com, Daly, Dan,
qemu-devel@nongnu.org, alex.williamson@redhat.com, Wang, Xiao W,
stefanha@redhat.com, Wang, Zhihong, Maxime Coquelin
[...]
>> chip EMC for early classification. It gives a fast path for those throughput
>> sensitive(SLA) VNF to bypass the further table lookup. It co-exists other VNF
>> whose SLA level is best effort but requires more functions(e.g. stateful
>> conntrack, security check, even higher layer WAF support) support, DPDK
>> based datapath still boost the throughput there. It's not used to be a single
>> choice of dedicated or shared datapath, usually they're co-exist.
>>
>> So if I understand this correctly, the "vswitch" here is a hardware function
>> (something like a smart NIC or OVS offload). So the question still stands: is
>> vhost-user a must in this case?
> "vswitch" point to SW vswitch(e.g. OVS-DPDK). Accelerators stands for different offloading IPs on the device(e.g. smart NIC) which can be used from a userland driver.
> EMC IP used to offload OVS fastpath, so as move traffic to VM directly. Either SRIOV device assignment or vDPA helps to build datapath pass-thru context which represented by a virtual interface on management perspective. For entire "vswitch", there still co-exist none pass-thru interface(SW backend) which uses vhost-user for virtual interface.
> Both of them shall be able to replace each other.
Thanks, I kind of get the picture here.
A question about the software backend: e.g. what's the software
counterpart for SRIOV or vDPA? E.g. is there a VF or vDPA pmd connected
to OVS-dpdk that can switch to offload if required?
>
> There's no other user space choice yet recently for network except vhost-user. The patch of vhost-user extension has lower impact for qemu.
> If you read this patch, it's really about to reduce the doorbell and interrupt overhead.
For this patch, you need to decouple the pci-specific stuff out of
vhost-user, which is transport independent (at least for now).
> Basic vDPA works even without any qemu change. As vhost-user is well-recognized as the vhost interface for userland backend, it's reasonable to well-support the usage of userland backend w/ I/O accelerator.
Right, so you could do all the offloads in qemu while vhost-user is still
there. And qemu could switch between the two like a transparent bond or team?
>
> Before moving forward, it's necessary to get some alignment on two basic things.
> - Do you agree that providing userland backend via vhost-user is the right way to do with vswitch workload.
> Otherwise, we probably shall go back to revisit vhost-user itself rather than talking anything new happening on vhost-user.
I agree.
> - Do you agree vhost-user is a right way for qemu to allow multi-process?
> Please refer to https://www.linux-kvm.org/images/f/fc/KVM_FORUM_multi-process.pdf
This is questionable, from both the performance and the security points of
view. We had an example for performance (vIOMMU). For security, e.g. in
this patch, qemu sets up a memory region based on a request from the
vhost-user slave; does this increase the attack surface?
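(For readers following along, this is roughly the shape of the operation in
question; a simplified sketch against qemu's memory API rather than the
actual patch code. Bounds checking of the slave-supplied offset/size and the
real message parsing are omitted, which is exactly where the attack-surface
question comes in.)

    #include "qemu/osdep.h"
    #include <sys/mman.h>
    #include "exec/memory.h"

    /* The slave passes an fd plus offset/size describing the device's per-queue
     * notify (doorbell) area; qemu maps it and overlays it on the emulated
     * notify region, so guest doorbell writes reach the hardware without a
     * VM exit. */
    static void map_queue_notify_area(MemoryRegion *notify_mr, unsigned int queue_idx,
                                      uint64_t per_queue_stride,
                                      int fd, uint64_t offset, uint64_t size)
    {
        void *addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset);
        if (addr == MAP_FAILED) {
            return;
        }

        MemoryRegion *mr = g_new0(MemoryRegion, 1);
        memory_region_init_ram_device_ptr(mr, NULL, "vhost-user-queue-notify",
                                          size, addr);
        memory_region_add_subregion(notify_mr, queue_idx * per_queue_stride, mr);
    }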
I think you missed my point somehow. As I replied in the previous thread, I
didn't object to what you propose here. I just want to understand the reason
you chose vhost-user. And in the cover letter, the vswitch case is not
mentioned at all; instead it compares vDPA with VFIO. That makes it easy for
a reader to think that qemu will monopolize the device, so it's rather
natural to ask why not do it inside qemu.
>
>>>>> On workloads point of view, it's not excited to be part of qemu process.
>>>> Don't see why, qemu have dataplane for virtio-blk/scsi.
>>> Qemu has vhost-user for scsi too. I'm not saying which one is bad, just
>> point out sometime it's very workloads driven. Network is different with
>> blk/scsi/crypto.
>>
>> What's the main difference from your point of view which makes
>> vhost-user a must in this case?
> Network devices, a NIC or a Smart NIC usually has vendor specific driver. DPDK takes devices by its user space drivers to run OVS. Virtual interface is all vhost-user based talking with qemu. For some virtual interface, it now tries to bypass the traffic. It's looking forward a consistent vhost-user interface there.
So the point is probably that you can keep vhost-user for the sw path while
implementing the offloaded path completely in qemu?
> Linking OVS-DPDK with qemu, TBH, it's far away from today's usage.
>
>>>>> That comes up with the idea of a vhost-user extension. Userland
>>>>> workloads decide whether to enable accelerators or not, and qemu
>>>>> provides the common control plane infrastructure.
>>>>
>>>> It brings extra complexity: endless new types of messages and a huge
>>>> bunch of bugs. And what's more important, the split model tends to be
>>>> less efficient in some cases, e.g. guest IOMMU integration. I'm pretty
>>>> sure we will meet more such cases in the future.
>>> The vIOMMU-relevant messages are already supported by the vhost
>>> protocol. That's an independent effort.
>>
>> The point is that vIOMMU integration is very inefficient in vhost-user in
>> some cases. With lots of dynamic mappings you may get only 5%-10% of the
>> performance compared to running with the vIOMMU disabled, because a huge
>> number of translation requests is generated. The main issue is that you
>> cannot offload the datapath completely to the vhost-user backend; the
>> IOMMU translations are still done in qemu. This is one of the defects of
>> vhost-user when the datapath needs to access device state.
> Dynamic mapping is a vIOMMU challenge; besides vhost-user, kernel vhost faces the same situation. Static mapping w/ DPDK looks much better. It's not fair to blame vhost-user for the vIOMMU overhead.
Yes, that's why I want a vhost dataplane inside qemu (btw vhost-user
should be even worse, considering a syscall is less expensive than IPC).
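To illustrate where that per-translation cost comes from, here is a simplified sketch of what a vhost-user backend does on an IOTLB miss when the guest IOMMU uses dynamic mapping: every miss is a synchronous round trip to qemu over the socket. The message structure is a simplified stand-in for the vhost IOTLB message rather than the actual wire format, and the cache helpers are assumed to exist in the backend.

#include <stdint.h>
#include <unistd.h>

enum { IOTLB_MISS = 1, IOTLB_UPDATE = 2 };   /* simplified message types */

/* Simplified stand-in for the vhost IOTLB message payload. */
struct iotlb_msg {
    uint64_t iova;    /* guest I/O virtual address */
    uint64_t size;
    uint64_t uaddr;   /* backend virtual address, filled in by qemu */
    uint8_t  perm;
    uint8_t  type;
};

/* Assumed backend-side translation cache helpers (not shown here). */
struct iotlb_cache;
void *iotlb_cache_lookup(struct iotlb_cache *c, uint64_t iova, uint64_t len);
void  iotlb_cache_insert(struct iotlb_cache *c, const struct iotlb_msg *m);

/* Translate one IOVA; on a miss, block until qemu replies with an update. */
static void *iotlb_translate(struct iotlb_cache *cache, int slave_fd,
                             int master_fd, uint64_t iova, uint64_t len)
{
    void *uaddr = iotlb_cache_lookup(cache, iova, len);

    if (uaddr) {
        return uaddr;                  /* cache hit, no IPC needed */
    }

    /* Miss: ask qemu, which owns the vIOMMU emulation, for a translation. */
    struct iotlb_msg miss = { .iova = iova, .size = len, .type = IOTLB_MISS };
    if (write(slave_fd, &miss, sizeof(miss)) != sizeof(miss)) {
        return NULL;
    }

    /*
     * Wait for the matching update.  With many short-lived mappings nearly
     * every descriptor can pay this round trip, which is where the 5%-10%
     * figure above comes from.
     */
    struct iotlb_msg update;
    if (read(master_fd, &update, sizeof(update)) != sizeof(update) ||
        update.type != IOTLB_UPDATE) {
        return NULL;
    }
    iotlb_cache_insert(cache, &update);
    return iotlb_cache_lookup(cache, iova, len);
}

With static mapping (e.g. DPDK using a fixed IOVA layout), only the cache-hit path runs, which is why the overhead largely disappears in that mode.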
>
>>> I don't see this patch introducing endless new types.
>> Not this patch, but we can imagine the vhost-user protocol becoming
>> complex in the future.
>>
>>> My take on your fundamental concern is that it is about continuously
>>> adding new features to vhost-user.
>>> Feel free to correct me if I misunderstood your point.
>> Unfortunately not; endlessness itself is not a problem, but we'd better
>> extend it only when it is really needed. The main questions are:
>>
>> 1) whether or not we need to split things the way you suggested here?
>> 2) if needed, is vhost-user the best method?
> Sounds good. BTW, this patch (the vhost-user extension) is a performance improvement for the DPDK vDPA usage (refer to the DPDK patches). Another RFC patch for kernel-space usage is coming; it will propose a qemu-native vhost adaptor for an in-kernel mediated device driver.
Any pointer to this patch?
[...]
>> Why not? We already have a userspace NVMe driver.
> There is a huge number of vendor-specific drivers for networking; NVMe is much more generalized than NICs.
> The idea of linking an external dataplane sounds interesting, but it's not used in the real world. Looking forward to the progress.
>
>>> That ends up with another vhost-vfio in my slides.
>> I don't get why we can't implement it purely through a userspace driver
>> inside qemu.
> TBH, we thought about this before. There are a few reasons stopping us:
> - qemu doesn't have an abstraction layer for network devices (HW NICs) usable by userspace drivers
Well, you can still use vhost (but not vhost-user).
> - for the qemu launch process, linking DPDK w/ qemu is not a problem; the gap is in the OVS integration, where the effort/impact is not small
We can keep vhost-user datapath.
> - a qemu-native virtio SW backend lacks efficient ways to talk with an external process; the change effort/impact is not small
By keeping the vhost-user datapath there are no such worries. Btw, we will
probably need a direct channel between qemu and ovs which can negotiate
more offloads.
> - a qemu-native userspace driver would only be used by qemu, while a userspace driver in DPDK can be used by others
>
Thanks
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [RFC 3/3] vhost-user: add VFIO based accelerators support
2017-12-22 6:41 ` [Qemu-devel] [RFC 3/3] vhost-user: add VFIO based accelerators support Tiwei Bie
@ 2018-01-16 17:23 ` Alex Williamson
2018-01-17 5:00 ` Tiwei Bie
0 siblings, 1 reply; 19+ messages in thread
From: Alex Williamson @ 2018-01-16 17:23 UTC (permalink / raw)
To: Tiwei Bie
Cc: virtio-dev, qemu-devel, mst, pbonzini, stefanha, cunming.liang,
dan.daly, jianfeng.tan, zhihong.wang, xiao.w.wang
On Fri, 22 Dec 2017 14:41:51 +0800
Tiwei Bie <tiwei.bie@intel.com> wrote:
> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> ---
> docs/interop/vhost-user.txt | 57 ++++++
> hw/vfio/common.c | 2 +-
> hw/virtio/vhost-user.c | 381 ++++++++++++++++++++++++++++++++++++++++-
> hw/virtio/vhost.c | 3 +-
> hw/virtio/virtio-pci.c | 8 -
> hw/virtio/virtio-pci.h | 8 +
> include/hw/vfio/vfio.h | 2 +
> include/hw/virtio/vhost-user.h | 26 +++
> 8 files changed, 476 insertions(+), 11 deletions(-)
>
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 7b2924c0ef..53d8700581 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -49,7 +49,7 @@ struct vfio_as_head vfio_address_spaces =
> * initialized, this file descriptor is only released on QEMU exit and
> * we'll re-use it should another vfio device be attached before then.
> */
> -static int vfio_kvm_device_fd = -1;
> +int vfio_kvm_device_fd = -1;
> #endif
It seems troublesome for vhost to maintain its own list of groups and
register them with the vfio-kvm device. These should likely be made
into services provided by vfio/common.c such that we can have a single
group list and interfaces for adding and deleting them. Thanks,
Alex
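For reference, a minimal sketch of the kind of shared service being suggested: vfio/common.c would keep the single group list and own the vfio-kvm device, exposing a helper that other subsystems call instead of touching vfio_kvm_device_fd directly. The function name and the omission of the group-list bookkeeping are simplifications, not the actual QEMU interface.

#include "qemu/osdep.h"
#include "sysemu/kvm.h"
#include "qapi/error.h"
#include <linux/kvm.h>
#include <sys/ioctl.h>

extern int vfio_kvm_device_fd;   /* still owned by hw/vfio/common.c */

/* Hypothetical shared helper: register one VFIO group with the KVM device. */
int vfio_kvm_device_register_group(int group_fd, Error **errp)
{
    struct kvm_device_attr attr = {
        .group = KVM_DEV_VFIO_GROUP,
        .attr  = KVM_DEV_VFIO_GROUP_ADD,
        .addr  = (uint64_t)(unsigned long)&group_fd,
    };

    /* Lazily create the vfio-kvm device the first time a group shows up. */
    if (vfio_kvm_device_fd < 0) {
        struct kvm_create_device cd = { .type = KVM_DEV_TYPE_VFIO };

        if (kvm_vm_ioctl(kvm_state, KVM_CREATE_DEVICE, &cd)) {
            error_setg_errno(errp, errno, "Failed to create KVM VFIO device");
            return -1;
        }
        vfio_kvm_device_fd = cd.fd;
    }

    if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
        error_setg_errno(errp, errno,
                         "Failed to add group fd %d to KVM VFIO device",
                         group_fd);
        return -1;
    }
    return 0;
}

A matching helper using KVM_DEV_VFIO_GROUP_DEL would cover removal, so that vfio itself and vhost-user could share one group list and one registration path.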
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [RFC 3/3] vhost-user: add VFIO based accelerators support
2018-01-16 17:23 ` Alex Williamson
@ 2018-01-17 5:00 ` Tiwei Bie
0 siblings, 0 replies; 19+ messages in thread
From: Tiwei Bie @ 2018-01-17 5:00 UTC (permalink / raw)
To: Alex Williamson
Cc: virtio-dev, qemu-devel, mst, pbonzini, stefanha, cunming.liang,
dan.daly, jianfeng.tan, zhihong.wang, xiao.w.wang
On Tue, Jan 16, 2018 at 10:23:39AM -0700, Alex Williamson wrote:
> On Fri, 22 Dec 2017 14:41:51 +0800
> Tiwei Bie <tiwei.bie@intel.com> wrote:
>
> > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > ---
> > docs/interop/vhost-user.txt | 57 ++++++
> > hw/vfio/common.c | 2 +-
> > hw/virtio/vhost-user.c | 381 ++++++++++++++++++++++++++++++++++++++++-
> > hw/virtio/vhost.c | 3 +-
> > hw/virtio/virtio-pci.c | 8 -
> > hw/virtio/virtio-pci.h | 8 +
> > include/hw/vfio/vfio.h | 2 +
> > include/hw/virtio/vhost-user.h | 26 +++
> > 8 files changed, 476 insertions(+), 11 deletions(-)
> >
> > diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> > index 7b2924c0ef..53d8700581 100644
> > --- a/hw/vfio/common.c
> > +++ b/hw/vfio/common.c
> > @@ -49,7 +49,7 @@ struct vfio_as_head vfio_address_spaces =
> > * initialized, this file descriptor is only released on QEMU exit and
> > * we'll re-use it should another vfio device be attached before then.
> > */
> > -static int vfio_kvm_device_fd = -1;
> > +int vfio_kvm_device_fd = -1;
> > #endif
>
> It seems troublesome for vhost to maintain its own list of groups and
> register them with the vfio-kvm device. These should likely be made
> into services provided by vfio/common.c such that we can have a single
> group list and interfaces for adding and deleting them. Thanks,
Thank you very much for the comments! I'll fix this.
Best regards,
Tiwei Bie
^ permalink raw reply [flat|nested] 19+ messages in thread
End of thread.
Thread overview: 19+ messages
2017-12-22 6:41 [Qemu-devel] [RFC 0/3] Extend vhost-user to support VFIO based accelerators Tiwei Bie
2017-12-22 6:41 ` [Qemu-devel] [RFC 1/3] vhost-user: support receiving file descriptors in slave_read Tiwei Bie
2017-12-22 6:41 ` [Qemu-devel] [RFC 2/3] vhost-user: introduce shared vhost-user state Tiwei Bie
2017-12-22 6:41 ` [Qemu-devel] [RFC 3/3] vhost-user: add VFIO based accelerators support Tiwei Bie
2018-01-16 17:23 ` Alex Williamson
2018-01-17 5:00 ` Tiwei Bie
2018-01-02 2:42 ` [Qemu-devel] [RFC 0/3] Extend vhost-user to support VFIO based accelerators Alexey Kardashevskiy
2018-01-02 5:49 ` Liang, Cunming
2018-01-02 6:01 ` Alexey Kardashevskiy
2018-01-02 6:48 ` Liang, Cunming
2018-01-03 14:34 ` [Qemu-devel] [virtio-dev] " Jason Wang
2018-01-04 6:18 ` Tiwei Bie
2018-01-04 7:21 ` Jason Wang
2018-01-05 6:58 ` Liang, Cunming
2018-01-05 8:38 ` Jason Wang
2018-01-05 10:25 ` Liang, Cunming
2018-01-08 3:23 ` Jason Wang
2018-01-08 8:23 ` [Qemu-devel] [virtio-dev] " Liang, Cunming
2018-01-08 10:06 ` Jason Wang