* [Qemu-devel] [PATCH v3 for-4.0 0/7] vhost-user-blk: Add support for backend reconnecting
@ 2019-01-03 10:18 elohimes
  2019-01-03 10:18 ` [Qemu-devel] [PATCH v3 for-4.0 1/7] char-socket: Enable "nowait" option on client sockets elohimes
                   ` (6 more replies)
  0 siblings, 7 replies; 14+ messages in thread
From: elohimes @ 2019-01-03 10:18 UTC (permalink / raw)
  To: mst, marcandre.lureau, berrange, jasowang, maxime.coquelin,
	yury-kotov, wrfsh
  Cc: qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji

From: Xie Yongji <xieyongji@baidu.com>

This patchset adds support for QEMU to reconnect to the
vhost-user-blk backend after the backend crashes or
restarts.

Patch 1 uses the existing wait/nowait options to make QEMU skip
connecting on client sockets during initialization of the chardev.

Patch 2 introduces two new messages, VHOST_USER_GET_SHM_SIZE
and VHOST_USER_SET_SHM_FD, to support providing shared
memory to the backend.

Patches 3 and 4 are the corresponding libvhost-user changes for
patch 2: they make libvhost-user support VHOST_USER_GET_SHM_SIZE
and VHOST_USER_SET_SHM_FD.

Patch 5 allows vhost-user-blk to use the two new messages
to provide shared memory to the backend.

Patch 6 makes vhost-user-blk reconnect to the backend when the
connection is closed.

Patch 7 advertises VHOST_USER_PROTOCOL_F_SLAVE_SHMFD in the
contrib vhost-user-blk backend, which tells QEMU that the backend
now supports reconnecting.
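
In rough outline, the extra handshake this series adds on each
(re)connect looks like the following sketch (message names are taken
from the patches below; error handling omitted):

    /*
     * master (QEMU)                        slave (backend)
     * -------------                        ---------------
     * VHOST_USER_GET_PROTOCOL_FEATURES -->
     *                                  <-- ...F_SLAVE_SHMFD...
     * VHOST_USER_GET_SHM_SIZE          -->
     *                                  <-- dev_size, vq_size, align, version
     * allocate and zero a memfd
     * VHOST_USER_SET_SHM_FD (+ fd)     -->
     *                                      mmap() the fd, scan the per-queue
     *                                      inflight bitmaps and resubmit any
     *                                      recorded requests
     */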

To use it, we can start QEMU with:

qemu-system-x86_64 \
        -chardev socket,id=char0,path=/path/vhost.socket,nowait,reconnect=1 \
        -device vhost-user-blk-pci,chardev=char0

and start vhost-user-blk backend with:

vhost-user-blk -b /path/file -s /path/vhost.socket

Then we can restart vhost-user-blk at any time while the VM is running.

V2 to V3:
- Use the existing wait/nowait options to control connecting on
  client sockets instead of introducing a "disconnected" option.
- Support the case where the vhost-user backend restarts during
  initialization of the vhost-user-blk device.

V1 to V2:
- Introduce "disconnected" option for chardev instead of reuse "wait"
  option
- Support the case that QEMU starts before vhost-user backend
- Drop message VHOST_USER_SET_VRING_INFLIGHT
- Introduce two new messages VHOST_USER_GET_SHM_SIZE
  and VHOST_USER_SET_SHM_FD

Xie Yongji (7):
  char-socket: Enable "nowait" option on client sockets
  vhost-user: Support providing shared memory to backend
  libvhost-user: Introduce vu_queue_map_desc()
  libvhost-user: Support recording inflight I/O in shared memory
  vhost-user-blk: Add support to provide shared memory to backend
  vhost-user-blk: Add support to reconnect backend
  contrib/vhost-user-blk: enable inflight I/O recording

 chardev/char-socket.c                   |  56 ++---
 contrib/libvhost-user/libvhost-user.c   | 309 ++++++++++++++++++++----
 contrib/libvhost-user/libvhost-user.h   |  33 +++
 contrib/vhost-user-blk/vhost-user-blk.c |   3 +-
 docs/interop/vhost-user.txt             |  41 ++++
 hw/block/vhost-user-blk.c               | 223 ++++++++++++++---
 hw/virtio/vhost-user.c                  |  86 +++++++
 hw/virtio/vhost.c                       | 117 +++++++++
 include/hw/virtio/vhost-backend.h       |   9 +
 include/hw/virtio/vhost-user-blk.h      |   5 +
 include/hw/virtio/vhost.h               |  19 ++
 qapi/char.json                          |   3 +-
 qemu-options.hx                         |   9 +-
 13 files changed, 799 insertions(+), 114 deletions(-)

-- 
2.17.1

* [Qemu-devel] [PATCH v3 for-4.0 1/7] char-socket: Enable "nowait" option on client sockets
  2019-01-03 10:18 [Qemu-devel] [PATCH v3 for-4.0 0/7] vhost-user-blk: Add support for backend reconnecting elohimes
@ 2019-01-03 10:18 ` elohimes
  2019-01-03 10:18 ` [Qemu-devel] [PATCH v3 for-4.0 2/7] vhost-user: Support providing shared memory to backend elohimes
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: elohimes @ 2019-01-03 10:18 UTC (permalink / raw)
  To: mst, marcandre.lureau, berrange, jasowang, maxime.coquelin,
	yury-kotov, wrfsh
  Cc: qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji

From: Xie Yongji <xieyongji@baidu.com>

Enable "nowait" option to make QEMU not do a connect
on client sockets during initialization of the chardev.
Then we can use qemu_chr_fe_wait_connected() to connect
when necessary. Now it would be used for unix domain
socket of vhost-user-blk device to support reconnect.
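
For illustration, a minimal sketch (not part of this patch) of how a
device can connect lazily once the chardev was created with "nowait";
it assumes "chr" is a CharBackend wired to such a client socket:

    Error *err = NULL;

    /* no connect was attempted at chardev creation time ("nowait"),
     * so connect synchronously now that the device needs the socket */
    if (qemu_chr_fe_wait_connected(chr, &err) < 0) {
        error_report_err(err);   /* still disconnected; retry later */
    }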

Signed-off-by: Xie Yongji <xieyongji@baidu.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
---
 chardev/char-socket.c | 56 +++++++++++++++++++++----------------------
 qapi/char.json        |  3 +--
 qemu-options.hx       |  9 ++++---
 3 files changed, 35 insertions(+), 33 deletions(-)

diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index eaa8e8b68f..f803f4f7d3 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -1072,37 +1072,37 @@ static void qmp_chardev_open_socket(Chardev *chr,
         s->reconnect_time = reconnect;
     }
 
-    if (s->reconnect_time) {
-        tcp_chr_connect_async(chr);
-    } else {
-        if (s->is_listen) {
-            char *name;
-            s->listener = qio_net_listener_new();
+    if (s->is_listen) {
+        char *name;
+        s->listener = qio_net_listener_new();
 
-            name = g_strdup_printf("chardev-tcp-listener-%s", chr->label);
-            qio_net_listener_set_name(s->listener, name);
-            g_free(name);
+        name = g_strdup_printf("chardev-tcp-listener-%s", chr->label);
+        qio_net_listener_set_name(s->listener, name);
+        g_free(name);
 
-            if (qio_net_listener_open_sync(s->listener, s->addr, errp) < 0) {
-                object_unref(OBJECT(s->listener));
-                s->listener = NULL;
-                goto error;
-            }
+        if (qio_net_listener_open_sync(s->listener, s->addr, errp) < 0) {
+            object_unref(OBJECT(s->listener));
+            s->listener = NULL;
+            goto error;
+        }
 
-            qapi_free_SocketAddress(s->addr);
-            s->addr = socket_local_address(s->listener->sioc[0]->fd, errp);
-            update_disconnected_filename(s);
+        qapi_free_SocketAddress(s->addr);
+        s->addr = socket_local_address(s->listener->sioc[0]->fd, errp);
+        update_disconnected_filename(s);
 
-            if (is_waitconnect &&
-                qemu_chr_wait_connected(chr, errp) < 0) {
-                return;
-            }
-            if (!s->ioc) {
-                qio_net_listener_set_client_func_full(s->listener,
-                                                      tcp_chr_accept,
-                                                      chr, NULL,
-                                                      chr->gcontext);
-            }
+        if (is_waitconnect &&
+            qemu_chr_wait_connected(chr, errp) < 0) {
+            return;
+        }
+        if (!s->ioc) {
+            qio_net_listener_set_client_func_full(s->listener,
+                                                  tcp_chr_accept,
+                                                  chr, NULL,
+                                                  chr->gcontext);
+        }
+    } else if (is_waitconnect) {
+        if (s->reconnect_time) {
+            tcp_chr_connect_async(chr);
         } else if (qemu_chr_wait_connected(chr, errp) < 0) {
             goto error;
         }
@@ -1120,7 +1120,7 @@ static void qemu_chr_parse_socket(QemuOpts *opts, ChardevBackend *backend,
                                   Error **errp)
 {
     bool is_listen      = qemu_opt_get_bool(opts, "server", false);
-    bool is_waitconnect = is_listen && qemu_opt_get_bool(opts, "wait", true);
+    bool is_waitconnect = qemu_opt_get_bool(opts, "wait", true);
     bool is_telnet      = qemu_opt_get_bool(opts, "telnet", false);
     bool is_tn3270      = qemu_opt_get_bool(opts, "tn3270", false);
     bool is_websock     = qemu_opt_get_bool(opts, "websocket", false);
diff --git a/qapi/char.json b/qapi/char.json
index 77ed847972..6a3b5bcd71 100644
--- a/qapi/char.json
+++ b/qapi/char.json
@@ -249,8 +249,7 @@
 #        or connect to (server=false)
 # @tls-creds: the ID of the TLS credentials object (since 2.6)
 # @server: create server socket (default: true)
-# @wait: wait for incoming connection on server
-#        sockets (default: false).
+# @wait: wait for a connection (server) or to connect (client) (default: false)
 # @nodelay: set TCP_NODELAY socket option (default: false)
 # @telnet: enable telnet protocol on server
 #          sockets (default: false)
diff --git a/qemu-options.hx b/qemu-options.hx
index df42116ecc..66d99c6e83 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2556,8 +2556,9 @@ undefined if TCP options are specified for a unix socket.
 
 @option{server} specifies that the socket shall be a listening socket.
 
-@option{nowait} specifies that QEMU should not block waiting for a client to
-connect to a listening socket.
+@option{nowait} specifies that QEMU should not wait for a client to connect
+on server sockets, nor try a synchronous/asynchronous connect on client
+sockets, during initialization of the chardev.
 
 @option{telnet} specifies that traffic on the socket should interpret telnet
 escape sequences.
@@ -3093,7 +3094,9 @@ I/O to a location or wait for a connection from a location.  By default
 the TCP Net Console is sent to @var{host} at the @var{port}.  If you use
 the @var{server} option QEMU will wait for a client socket application
 to connect to the port before continuing, unless the @code{nowait}
-option was specified.  The @code{nodelay} option disables the Nagle buffering
+option was specified. The @code{nowait} option can also be used when
+@var{noserver} is set, to prevent QEMU from connecting during
+initialization.  The @code{nodelay} option disables the Nagle buffering
 algorithm.  The @code{reconnect} option only applies if @var{noserver} is
 set, if the connection goes down it will attempt to reconnect at the
 given interval.  If @var{host} is omitted, 0.0.0.0 is assumed. Only
-- 
2.17.1

* [Qemu-devel] [PATCH v3 for-4.0 2/7] vhost-user: Support providing shared memory to backend
  2019-01-03 10:18 [Qemu-devel] [PATCH v3 for-4.0 0/7] vhost-user-blk: Add support for backend reconnecting elohimes
  2019-01-03 10:18 ` [Qemu-devel] [PATCH v3 for-4.0 1/7] char-socket: Enable "nowait" option on client sockets elohimes
@ 2019-01-03 10:18 ` elohimes
  2019-01-03 17:02   ` Michael S. Tsirkin
  2019-01-03 17:13   ` Michael S. Tsirkin
  2019-01-03 10:18 ` [Qemu-devel] [PATCH v3 for-4.0 3/7] libvhost-user: Introduce vu_queue_map_desc() elohimes
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 14+ messages in thread
From: elohimes @ 2019-01-03 10:18 UTC (permalink / raw)
  To: mst, marcandre.lureau, berrange, jasowang, maxime.coquelin,
	yury-kotov, wrfsh
  Cc: qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji

From: Xie Yongji <xieyongji@baidu.com>

This patch introduces two new messages, VHOST_USER_GET_SHM_SIZE
and VHOST_USER_SET_SHM_FD, to support providing shared
memory to the backend.

First, QEMU uses VHOST_USER_GET_SHM_SIZE to get the required
size of the shared memory from the backend. Then QEMU allocates
that memory and passes it back to the backend through
VHOST_USER_SET_SHM_FD.

Note that the backend should use the shared memory to record
inflight I/O. QEMU will clear it on VM reset.
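
A condensed sketch of the master-side logic (mirroring
vhost_dev_init_shm() in the diff below; error handling omitted):

    /* ask the slave how much memory it wants per region */
    dev->vhost_ops->vhost_get_shm_size(dev, shm);

    /* one device region plus one region per virtqueue, each aligned */
    shm->mmap_size = QEMU_ALIGN_UP(shm->dev_size, shm->align) +
                     dev->nvqs * QEMU_ALIGN_UP(shm->vq_size, shm->align);

    vhost_dev_alloc_shm(shm);   /* sealed memfd via qemu_memfd_alloc() */
    vhost_dev_reset_shm(shm);   /* zero-fill before handing it out */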

Signed-off-by: Xie Yongji <xieyongji@baidu.com>
Signed-off-by: Chai Wen <chaiwen@baidu.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
---
 docs/interop/vhost-user.txt       |  41 +++++++++++
 hw/virtio/vhost-user.c            |  86 ++++++++++++++++++++++
 hw/virtio/vhost.c                 | 117 ++++++++++++++++++++++++++++++
 include/hw/virtio/vhost-backend.h |   9 +++
 include/hw/virtio/vhost.h         |  19 +++++
 5 files changed, 272 insertions(+)

diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
index c2194711d9..5ee9c28ab0 100644
--- a/docs/interop/vhost-user.txt
+++ b/docs/interop/vhost-user.txt
@@ -142,6 +142,19 @@ Depending on the request type, payload can be:
    Offset: a 64-bit offset of this area from the start of the
        supplied file descriptor
 
+ * Shm description
+   ------------------------------------------------------------------
+   | mmap_size | mmap_offset | dev_size | vq_size | align | version |
+   ------------------------------------------------------------------
+
+   Mmap_size: a 64-bit size of the shared memory
+   Mmap_offset: a 64-bit offset of the shared memory from the start
+                of the supplied file descriptor
+   Dev_size: a 32-bit size of the device region in shared memory
+   Vq_size: a 32-bit size of each virtqueue region in shared memory
+   Align: a 32-bit alignment of each region in shared memory
+   Version: a 32-bit version of this shared memory
+
 In QEMU the vhost-user message is implemented with the following struct:
 
 typedef struct VhostUserMsg {
@@ -157,6 +170,7 @@ typedef struct VhostUserMsg {
         struct vhost_iotlb_msg iotlb;
         VhostUserConfig config;
         VhostUserVringArea area;
+        VhostUserShm shm;
     };
 } QEMU_PACKED VhostUserMsg;
 
@@ -175,6 +189,7 @@ the ones that do:
  * VHOST_USER_GET_PROTOCOL_FEATURES
  * VHOST_USER_GET_VRING_BASE
  * VHOST_USER_SET_LOG_BASE (if VHOST_USER_PROTOCOL_F_LOG_SHMFD)
+ * VHOST_USER_GET_SHM_SIZE (if VHOST_USER_PROTOCOL_F_SLAVE_SHMFD)
 
 [ Also see the section on REPLY_ACK protocol extension. ]
 
@@ -188,6 +203,7 @@ in the ancillary data:
  * VHOST_USER_SET_VRING_CALL
  * VHOST_USER_SET_VRING_ERR
  * VHOST_USER_SET_SLAVE_REQ_FD
+ * VHOST_USER_SET_SHM_FD (if VHOST_USER_PROTOCOL_F_SLAVE_SHMFD)
 
 If Master is unable to send the full message or receives a wrong reply it will
 close the connection. An optional reconnection mechanism can be implemented.
@@ -397,6 +413,7 @@ Protocol features
 #define VHOST_USER_PROTOCOL_F_CONFIG         9
 #define VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD  10
 #define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER  11
+#define VHOST_USER_PROTOCOL_F_SLAVE_SHMFD    12
 
 Master message types
 --------------------
@@ -761,6 +778,30 @@ Master message types
       was previously sent.
       The value returned is an error indication; 0 is success.
 
+ * VHOST_USER_GET_SHM_SIZE
+      Id: 31
+      Equivalent ioctl: N/A
+      Master payload: shm description
+
+      When the VHOST_USER_PROTOCOL_F_SLAVE_SHMFD protocol feature has
+      been successfully negotiated, the master needs to provide shared
+      memory to the slave. This message is used by the master to get the
+      required size from the slave. The shared memory contains one region
+      for the device and one region per virtqueue. The sizes of these two
+      kinds of regions are specified by the dev_size and vq_size fields,
+      and the align field specifies the alignment of each region.
+
+ * VHOST_USER_SET_SHM_FD
+      Id: 32
+      Equivalent ioctl: N/A
+      Master payload: shm description
+
+      When the VHOST_USER_PROTOCOL_F_SLAVE_SHMFD protocol feature has
+      been successfully negotiated, the master uses this message to pass
+      the shared memory to the slave. The memory fd is passed in the
+      ancillary data. The slave should use the shared memory to record
+      inflight I/O. The master will clear it on VM reset.
+
 Slave message types
 -------------------
 
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index e09bed0e4a..8cdf3b5121 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -52,6 +52,7 @@ enum VhostUserProtocolFeature {
     VHOST_USER_PROTOCOL_F_CONFIG = 9,
     VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD = 10,
     VHOST_USER_PROTOCOL_F_HOST_NOTIFIER = 11,
+    VHOST_USER_PROTOCOL_F_SLAVE_SHMFD = 12,
     VHOST_USER_PROTOCOL_F_MAX
 };
 
@@ -89,6 +90,8 @@ typedef enum VhostUserRequest {
     VHOST_USER_POSTCOPY_ADVISE  = 28,
     VHOST_USER_POSTCOPY_LISTEN  = 29,
     VHOST_USER_POSTCOPY_END     = 30,
+    VHOST_USER_GET_SHM_SIZE     = 31,
+    VHOST_USER_SET_SHM_FD       = 32,
     VHOST_USER_MAX
 } VhostUserRequest;
 
@@ -147,6 +150,15 @@ typedef struct VhostUserVringArea {
     uint64_t offset;
 } VhostUserVringArea;
 
+typedef struct VhostUserShm {
+    uint64_t mmap_size;
+    uint64_t mmap_offset;
+    uint32_t dev_size;
+    uint32_t vq_size;
+    uint32_t align;
+    uint32_t version;
+} VhostUserShm;
+
 typedef struct {
     VhostUserRequest request;
 
@@ -169,6 +181,7 @@ typedef union {
         VhostUserConfig config;
         VhostUserCryptoSession session;
         VhostUserVringArea area;
+        VhostUserShm shm;
 } VhostUserPayload;
 
 typedef struct VhostUserMsg {
@@ -1739,6 +1752,77 @@ static bool vhost_user_mem_section_filter(struct vhost_dev *dev,
     return result;
 }
 
+static int vhost_user_get_shm_size(struct vhost_dev *dev,
+                                   struct vhost_shm *shm)
+{
+    VhostUserMsg msg = {
+        .hdr.request = VHOST_USER_GET_SHM_SIZE,
+        .hdr.flags = VHOST_USER_VERSION,
+        .hdr.size = sizeof(msg.payload.shm),
+    };
+
+    if (!virtio_has_feature(dev->protocol_features,
+                            VHOST_USER_PROTOCOL_F_SLAVE_SHMFD)) {
+        shm->dev_size = 0;
+        shm->vq_size = 0;
+        return 0;
+    }
+
+    if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
+        return -1;
+    }
+
+    if (vhost_user_read(dev, &msg) < 0) {
+        return -1;
+    }
+
+    if (msg.hdr.request != VHOST_USER_GET_SHM_SIZE) {
+        error_report("Received unexpected msg type. "
+                     "Expected %d received %d",
+                     VHOST_USER_GET_SHM_SIZE, msg.hdr.request);
+        return -1;
+    }
+
+    if (msg.hdr.size != sizeof(msg.payload.shm)) {
+        error_report("Received bad msg size.");
+        return -1;
+    }
+
+    shm->dev_size = msg.payload.shm.dev_size;
+    shm->vq_size = msg.payload.shm.vq_size;
+    shm->align = msg.payload.shm.align;
+    shm->version = msg.payload.shm.version;
+
+    return 0;
+}
+
+static int vhost_user_set_shm_fd(struct vhost_dev *dev,
+                                 struct vhost_shm *shm)
+{
+    VhostUserMsg msg = {
+        .hdr.request = VHOST_USER_SET_SHM_FD,
+        .hdr.flags = VHOST_USER_VERSION,
+        .payload.shm.mmap_size = shm->mmap_size,
+        .payload.shm.mmap_offset = 0,
+        .payload.shm.dev_size = shm->dev_size,
+        .payload.shm.vq_size = shm->vq_size,
+        .payload.shm.align = shm->align,
+        .payload.shm.version = shm->version,
+        .hdr.size = sizeof(msg.payload.shm),
+    };
+
+    if (!virtio_has_feature(dev->protocol_features,
+                            VHOST_USER_PROTOCOL_F_SLAVE_SHMFD)) {
+        return 0;
+    }
+
+    if (vhost_user_write(dev, &msg, &shm->fd, 1) < 0) {
+        return -1;
+    }
+
+    return 0;
+}
+
 VhostUserState *vhost_user_init(void)
 {
     VhostUserState *user = g_new0(struct VhostUserState, 1);
@@ -1790,4 +1874,6 @@ const VhostOps user_ops = {
         .vhost_crypto_create_session = vhost_user_crypto_create_session,
         .vhost_crypto_close_session = vhost_user_crypto_close_session,
         .vhost_backend_mem_section_filter = vhost_user_mem_section_filter,
+        .vhost_get_shm_size = vhost_user_get_shm_size,
+        .vhost_set_shm_fd = vhost_user_set_shm_fd,
 };
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 569c4053ea..7a38fed50f 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1481,6 +1481,123 @@ void vhost_dev_set_config_notifier(struct vhost_dev *hdev,
     hdev->config_ops = ops;
 }
 
+void vhost_dev_reset_shm(struct vhost_shm *shm)
+{
+    if (shm->addr) {
+        memset(shm->addr, 0, shm->mmap_size);
+    }
+}
+
+void vhost_dev_free_shm(struct vhost_shm *shm)
+{
+    if (shm->addr) {
+        qemu_memfd_free(shm->addr, shm->mmap_size, shm->fd);
+        shm->addr = NULL;
+        shm->fd = -1;
+    }
+}
+
+int vhost_dev_alloc_shm(struct vhost_shm *shm)
+{
+    Error *err = NULL;
+    int fd = -1;
+    void *addr = qemu_memfd_alloc("vhost-shm", shm->mmap_size,
+                                  F_SEAL_GROW | F_SEAL_SHRINK | F_SEAL_SEAL,
+                                  &fd, &err);
+    if (err) {
+        error_report_err(err);
+        return -1;
+    }
+
+    shm->addr = addr;
+    shm->fd = fd;
+
+    return 0;
+}
+
+void vhost_dev_save_shm(struct vhost_shm *shm, QEMUFile *f)
+{
+    if (shm->addr) {
+        qemu_put_be64(f, shm->mmap_size);
+        qemu_put_be32(f, shm->dev_size);
+        qemu_put_be32(f, shm->vq_size);
+        qemu_put_be32(f, shm->align);
+        qemu_put_be32(f, shm->version);
+        qemu_put_buffer(f, shm->addr, shm->mmap_size);
+    } else {
+        qemu_put_be64(f, 0);
+    }
+}
+
+int vhost_dev_load_shm(struct vhost_shm *shm, QEMUFile *f)
+{
+    uint64_t mmap_size;
+
+    mmap_size = qemu_get_be64(f);
+    if (!mmap_size) {
+        return 0;
+    }
+
+    vhost_dev_free_shm(shm);
+
+    shm->mmap_size = mmap_size;
+    shm->dev_size = qemu_get_be32(f);
+    shm->vq_size = qemu_get_be32(f);
+    shm->align = qemu_get_be32(f);
+    shm->version = qemu_get_be32(f);
+
+    if (vhost_dev_alloc_shm(shm)) {
+        return -ENOMEM;
+    }
+
+    qemu_get_buffer(f, shm->addr, mmap_size);
+
+    return 0;
+}
+
+int vhost_dev_set_shm(struct vhost_dev *dev, struct vhost_shm *shm)
+{
+    int r;
+
+    if (dev->vhost_ops->vhost_set_shm_fd && shm->addr) {
+        r = dev->vhost_ops->vhost_set_shm_fd(dev, shm);
+        if (r) {
+            VHOST_OPS_DEBUG("vhost_set_vring_shm_fd failed");
+            return -errno;
+        }
+    }
+
+    return 0;
+}
+
+int vhost_dev_init_shm(struct vhost_dev *dev, struct vhost_shm *shm)
+{
+    int r;
+
+    if (dev->vhost_ops->vhost_get_shm_size) {
+        r = dev->vhost_ops->vhost_get_shm_size(dev, shm);
+        if (r) {
+            VHOST_OPS_DEBUG("vhost_get_vring_shm_size failed");
+            return -errno;
+        }
+
+        if (!shm->dev_size && !shm->vq_size) {
+            return 0;
+        }
+
+        shm->mmap_size = QEMU_ALIGN_UP(shm->dev_size, shm->align) +
+                         dev->nvqs * QEMU_ALIGN_UP(shm->vq_size, shm->align);
+
+        if (vhost_dev_alloc_shm(shm)) {
+            return -ENOMEM;
+        }
+
+        vhost_dev_reset_shm(shm);
+    }
+
+    return 0;
+}
+
 /* Host notifiers must be enabled at this point. */
 int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
 {
diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
index 81283ec50f..4e7f13c9e9 100644
--- a/include/hw/virtio/vhost-backend.h
+++ b/include/hw/virtio/vhost-backend.h
@@ -25,6 +25,7 @@ typedef enum VhostSetConfigType {
     VHOST_SET_CONFIG_TYPE_MIGRATION = 1,
 } VhostSetConfigType;
 
+struct vhost_shm;
 struct vhost_dev;
 struct vhost_log;
 struct vhost_memory;
@@ -104,6 +105,12 @@ typedef int (*vhost_crypto_close_session_op)(struct vhost_dev *dev,
 typedef bool (*vhost_backend_mem_section_filter_op)(struct vhost_dev *dev,
                                                 MemoryRegionSection *section);
 
+typedef int (*vhost_get_shm_size_op)(struct vhost_dev *dev,
+                                     struct vhost_shm *shm);
+
+typedef int (*vhost_set_shm_fd_op)(struct vhost_dev *dev,
+                                   struct vhost_shm *shm);
+
 typedef struct VhostOps {
     VhostBackendType backend_type;
     vhost_backend_init vhost_backend_init;
@@ -142,6 +149,8 @@ typedef struct VhostOps {
     vhost_crypto_create_session_op vhost_crypto_create_session;
     vhost_crypto_close_session_op vhost_crypto_close_session;
     vhost_backend_mem_section_filter_op vhost_backend_mem_section_filter;
+    vhost_get_shm_size_op vhost_get_shm_size;
+    vhost_set_shm_fd_op vhost_set_shm_fd;
 } VhostOps;
 
 extern const VhostOps user_ops;
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index a7f449fa87..b6e3d6ab56 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -7,6 +7,17 @@
 #include "exec/memory.h"
 
 /* Generic structures common for any vhost based device. */
+
+struct vhost_shm {
+    void *addr;
+    uint64_t mmap_size;
+    uint32_t dev_size;
+    uint32_t vq_size;
+    uint32_t align;
+    uint32_t version;
+    int fd;
+};
+
 struct vhost_virtqueue {
     int kick;
     int call;
@@ -120,4 +131,12 @@ int vhost_dev_set_config(struct vhost_dev *dev, const uint8_t *data,
  */
 void vhost_dev_set_config_notifier(struct vhost_dev *dev,
                                    const VhostDevConfigOps *ops);
+
+void vhost_dev_reset_shm(struct vhost_shm *shm);
+void vhost_dev_free_shm(struct vhost_shm *shm);
+int vhost_dev_alloc_shm(struct vhost_shm *shm);
+void vhost_dev_save_shm(struct vhost_shm *shm, QEMUFile *f);
+int vhost_dev_load_shm(struct vhost_shm *shm, QEMUFile *f);
+int vhost_dev_set_shm(struct vhost_dev *dev, struct vhost_shm *shm);
+int vhost_dev_init_shm(struct vhost_dev *dev, struct vhost_shm *shm);
 #endif
-- 
2.17.1

* [Qemu-devel] [PATCH v3 for-4.0 3/7] libvhost-user: Introduce vu_queue_map_desc()
  2019-01-03 10:18 [Qemu-devel] [PATCH v3 for-4.0 0/7] vhost-user-blk: Add support for backend reconnecting elohimes
  2019-01-03 10:18 ` [Qemu-devel] [PATCH v3 for-4.0 1/7] char-socket: Enable "nowait" option on client sockets elohimes
  2019-01-03 10:18 ` [Qemu-devel] [PATCH v3 for-4.0 2/7] vhost-user: Support providing shared memory to backend elohimes
@ 2019-01-03 10:18 ` elohimes
  2019-01-03 10:18 ` [Qemu-devel] [PATCH v3 for-4.0 4/7] libvhost-user: Support recording inflight I/O in shared memory elohimes
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: elohimes @ 2019-01-03 10:18 UTC (permalink / raw)
  To: mst, marcandre.lureau, berrange, jasowang, maxime.coquelin,
	yury-kotov, wrfsh
  Cc: qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji

From: Xie Yongji <xieyongji@baidu.com>

Introduce vu_queue_map_desc(), factoring the descriptor-mapping
logic out of vu_queue_pop() so it can be used independently.
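
After this refactoring, the tail of vu_queue_pop() reduces to roughly
the following sketch (see the diff below for the real code):

    /* consume one avail-ring entry, then map that descriptor chain */
    if (!virtqueue_get_head(dev, vq, vq->last_avail_idx++, &head)) {
        return NULL;
    }
    elem = vu_queue_map_desc(dev, vq, head, sz);

The point of the split is that a later patch can map an arbitrary
descriptor index (e.g. one recorded as inflight) without consuming a
new avail-ring entry.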

Signed-off-by: Xie Yongji <xieyongji@baidu.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 contrib/libvhost-user/libvhost-user.c | 88 ++++++++++++++++-----------
 1 file changed, 51 insertions(+), 37 deletions(-)

diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
index a6b46cdc03..23bd52264c 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -1853,49 +1853,20 @@ virtqueue_alloc_element(size_t sz,
     return elem;
 }
 
-void *
-vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz)
+static void *
+vu_queue_map_desc(VuDev *dev, VuVirtq *vq, unsigned int idx, size_t sz)
 {
-    unsigned int i, head, max, desc_len;
+    struct vring_desc *desc = vq->vring.desc;
     uint64_t desc_addr, read_len;
+    unsigned int desc_len;
+    unsigned int max = vq->vring.num;
+    unsigned int i = idx;
     VuVirtqElement *elem;
-    unsigned out_num, in_num;
+    unsigned int out_num = 0, in_num = 0;
     struct iovec iov[VIRTQUEUE_MAX_SIZE];
     struct vring_desc desc_buf[VIRTQUEUE_MAX_SIZE];
-    struct vring_desc *desc;
     int rc;
 
-    if (unlikely(dev->broken) ||
-        unlikely(!vq->vring.avail)) {
-        return NULL;
-    }
-
-    if (vu_queue_empty(dev, vq)) {
-        return NULL;
-    }
-    /* Needed after virtio_queue_empty(), see comment in
-     * virtqueue_num_heads(). */
-    smp_rmb();
-
-    /* When we start there are none of either input nor output. */
-    out_num = in_num = 0;
-
-    max = vq->vring.num;
-    if (vq->inuse >= vq->vring.num) {
-        vu_panic(dev, "Virtqueue size exceeded");
-        return NULL;
-    }
-
-    if (!virtqueue_get_head(dev, vq, vq->last_avail_idx++, &head)) {
-        return NULL;
-    }
-
-    if (vu_has_feature(dev, VIRTIO_RING_F_EVENT_IDX)) {
-        vring_set_avail_event(vq, vq->last_avail_idx);
-    }
-
-    i = head;
-    desc = vq->vring.desc;
     if (desc[i].flags & VRING_DESC_F_INDIRECT) {
         if (desc[i].len % sizeof(struct vring_desc)) {
             vu_panic(dev, "Invalid size for indirect buffer table");
@@ -1947,12 +1918,13 @@ vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz)
     } while (rc == VIRTQUEUE_READ_DESC_MORE);
 
     if (rc == VIRTQUEUE_READ_DESC_ERROR) {
+        vu_panic(dev, "read descriptor error");
         return NULL;
     }
 
     /* Now copy what we have collected and mapped */
     elem = virtqueue_alloc_element(sz, out_num, in_num);
-    elem->index = head;
+    elem->index = idx;
     for (i = 0; i < out_num; i++) {
         elem->out_sg[i] = iov[i];
     }
@@ -1960,6 +1932,48 @@ vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz)
         elem->in_sg[i] = iov[out_num + i];
     }
 
+    return elem;
+}
+
+void *
+vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz)
+{
+    unsigned int head;
+    VuVirtqElement *elem;
+
+    if (unlikely(dev->broken) ||
+        unlikely(!vq->vring.avail)) {
+        return NULL;
+    }
+
+    if (vu_queue_empty(dev, vq)) {
+        return NULL;
+    }
+    /*
+     * Needed after virtio_queue_empty(), see comment in
+     * virtqueue_num_heads().
+     */
+    smp_rmb();
+
+    if (vq->inuse >= vq->vring.num) {
+        vu_panic(dev, "Virtqueue size exceeded");
+        return NULL;
+    }
+
+    if (!virtqueue_get_head(dev, vq, vq->last_avail_idx++, &head)) {
+        return NULL;
+    }
+
+    if (vu_has_feature(dev, VIRTIO_RING_F_EVENT_IDX)) {
+        vring_set_avail_event(vq, vq->last_avail_idx);
+    }
+
+    elem = vu_queue_map_desc(dev, vq, head, sz);
+
+    if (!elem) {
+        return NULL;
+    }
+
     vq->inuse++;
 
     return elem;
-- 
2.17.1

* [Qemu-devel] [PATCH v3 for-4.0 4/7] libvhost-user: Support recording inflight I/O in shared memory
  2019-01-03 10:18 [Qemu-devel] [PATCH v3 for-4.0 0/7] vhost-user-blk: Add support for backend reconnecting elohimes
                   ` (2 preceding siblings ...)
  2019-01-03 10:18 ` [Qemu-devel] [PATCH v3 for-4.0 3/7] libvhost-user: Introduce vu_queue_map_desc() elohimes
@ 2019-01-03 10:18 ` elohimes
  2019-01-03 10:18 ` [Qemu-devel] [PATCH v3 for-4.0 5/7] vhost-user-blk: Add support to provide shared memory to backend elohimes
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: elohimes @ 2019-01-03 10:18 UTC (permalink / raw)
  To: mst, marcandre.lureau, berrange, jasowang, maxime.coquelin,
	yury-kotov, wrfsh
  Cc: qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji

From: Xie Yongji <xieyongji@baidu.com>

This patch adds support for the VHOST_USER_GET_SHM_SIZE and
VHOST_USER_SET_SHM_FD messages to get shared memory from QEMU.
We then maintain a "bitmap" of all descriptors in the shared
memory for each queue to record inflight I/O.
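
The per-queue region is simply one byte per descriptor; conceptually
(condensed from the code below):

    typedef struct VuVirtqShm {
        char inflight[VIRTQUEUE_MAX_SIZE];   /* 1 = request in flight */
    } VuVirtqShm;

    /* on pop:  vq->shm->inflight[desc_idx] = 1;
     * on push: vq->shm->inflight[desc_idx] = 0;
     * on reconnect (vu_check_queue_inflights): collect every index that
     * is still 1 into vq->inflight_desc[] and hand those descriptors
     * back to the device code before touching the avail ring again.
     */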

Signed-off-by: Xie Yongji <xieyongji@baidu.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
---
 contrib/libvhost-user/libvhost-user.c | 221 +++++++++++++++++++++++++-
 contrib/libvhost-user/libvhost-user.h |  33 ++++
 2 files changed, 248 insertions(+), 6 deletions(-)

diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
index 23bd52264c..f18f5e6e62 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -53,6 +53,18 @@
             _min1 < _min2 ? _min1 : _min2; })
 #endif
 
+/* Round number down to multiple */
+#define ALIGN_DOWN(n, m) ((n) / (m) * (m))
+
+/* Round number up to multiple */
+#define ALIGN_UP(n, m) ALIGN_DOWN((n) + (m) - 1, (m))
+
+/* Align each region to cache line size in shared memory */
+#define SHM_ALIGNMENT 64
+
+/* The version of shared memory */
+#define SHM_VERSION 1
+
 #define VHOST_USER_HDR_SIZE offsetof(VhostUserMsg, payload.u64)
 
 /* The version of the protocol we support */
@@ -100,6 +112,8 @@ vu_request_to_string(unsigned int req)
         REQ(VHOST_USER_POSTCOPY_ADVISE),
         REQ(VHOST_USER_POSTCOPY_LISTEN),
         REQ(VHOST_USER_POSTCOPY_END),
+        REQ(VHOST_USER_GET_SHM_SIZE),
+        REQ(VHOST_USER_SET_SHM_FD),
         REQ(VHOST_USER_MAX),
     };
 #undef REQ
@@ -890,6 +904,41 @@ vu_check_queue_msg_file(VuDev *dev, VhostUserMsg *vmsg)
     return true;
 }
 
+static int
+vu_check_queue_inflights(VuDev *dev, VuVirtq *vq)
+{
+    int i = 0;
+
+    if ((dev->protocol_features &
+        VHOST_USER_PROTOCOL_F_SLAVE_SHMFD) == 0) {
+        return 0;
+    }
+
+    if (unlikely(!vq->shm)) {
+        return -1;
+    }
+
+    vq->used_idx = vq->vring.used->idx;
+    vq->inflight_num = 0;
+    for (i = 0; i < vq->vring.num; i++) {
+        if (vq->shm->inflight[i] == 0) {
+            continue;
+        }
+
+        vq->inflight_desc[vq->inflight_num++] = i;
+        vq->inuse++;
+    }
+    vq->shadow_avail_idx = vq->last_avail_idx = vq->inuse + vq->used_idx;
+
+    /* in case of I/O hang after reconnecting */
+    if (eventfd_write(vq->kick_fd, 1) ||
+        eventfd_write(vq->call_fd, 1)) {
+        return -1;
+    }
+
+    return 0;
+}
+
 static bool
 vu_set_vring_kick_exec(VuDev *dev, VhostUserMsg *vmsg)
 {
@@ -925,6 +974,10 @@ vu_set_vring_kick_exec(VuDev *dev, VhostUserMsg *vmsg)
                dev->vq[index].kick_fd, index);
     }
 
+    if (vu_check_queue_inflights(dev, &dev->vq[index])) {
+        vu_panic(dev, "Failed to check inflights for vq: %d\n", index);
+    }
+
     return false;
 }
 
@@ -1215,6 +1268,115 @@ vu_set_postcopy_end(VuDev *dev, VhostUserMsg *vmsg)
     return true;
 }
 
+static int
+vu_setup_shm(VuDev *dev)
+{
+    int i;
+    char *addr = (char *)dev->shm_info.addr;
+    uint64_t size = 0;
+    uint32_t vq_size = ALIGN_UP(dev->shm_info.vq_size, dev->shm_info.align);
+
+    if (dev->shm_info.version != SHM_VERSION) {
+        DPRINT("Invalid version for shm: %d", dev->shm_info.version);
+        return -1;
+    }
+
+    if (dev->shm_info.dev_size != 0) {
+        DPRINT("Invalid dev_size for shm: %d", dev->shm_info.dev_size);
+        return -1;
+    }
+
+    if (dev->shm_info.vq_size != sizeof(VuVirtqShm)) {
+        DPRINT("Invalid vq_size for shm: %d", dev->shm_info.vq_size);
+        return -1;
+    }
+
+    for (i = 0; i < VHOST_MAX_NR_VIRTQUEUE; i++) {
+        size += vq_size;
+        if (size > dev->shm_info.mmap_size) {
+            break;
+        }
+        dev->vq[i].shm = (VuVirtqShm *)addr;
+        addr += vq_size;
+    }
+
+    return 0;
+}
+
+static bool
+vu_get_shm_size(VuDev *dev, VhostUserMsg *vmsg)
+{
+    if (vmsg->size != sizeof(vmsg->payload.shm)) {
+        vu_panic(dev, "Invalid get_shm_size message:%d", vmsg->size);
+        vmsg->size = 0;
+        return true;
+    }
+
+    vmsg->payload.shm.dev_size = 0;
+    vmsg->payload.shm.vq_size = sizeof(VuVirtqShm);
+    vmsg->payload.shm.align = SHM_ALIGNMENT;
+    vmsg->payload.shm.version = SHM_VERSION;
+
+    DPRINT("send shm dev_size: %"PRId32"\n", vmsg->payload.shm.dev_size);
+    DPRINT("send shm vq_size: %"PRId32"\n", vmsg->payload.shm.vq_size);
+    DPRINT("send shm align: %"PRId32"\n", vmsg->payload.shm.align);
+    DPRINT("send shm version: %"PRId32"\n", vmsg->payload.shm.version);
+
+    return true;
+}
+
+static bool
+vu_set_shm_fd(VuDev *dev, VhostUserMsg *vmsg)
+{
+    int fd;
+    uint64_t mmap_size, mmap_offset;
+    void *rc;
+
+    if (vmsg->fd_num != 1 ||
+        vmsg->size != sizeof(vmsg->payload.shm)) {
+        vu_panic(dev, "Invalid set_shm_fd message size:%d fds:%d",
+                 vmsg->size, vmsg->fd_num);
+        return false;
+    }
+
+    fd = vmsg->fds[0];
+    mmap_size = vmsg->payload.shm.mmap_size;
+    mmap_offset = vmsg->payload.shm.mmap_offset;
+    DPRINT("set_shm_fd mmap_size: %"PRId64"\n", mmap_size);
+    DPRINT("set_shm_fd mmap_offset: %"PRId64"\n", mmap_offset);
+    DPRINT("set_shm_fd dev_size: %"PRId32"\n", vmsg->payload.shm.dev_size);
+    DPRINT("set_shm_fd vq_size: %"PRId32"\n", vmsg->payload.shm.vq_size);
+    DPRINT("set_shm_fd align: %"PRId32"\n", vmsg->payload.shm.align);
+    DPRINT("set_shm_fd version: %"PRId32"\n", vmsg->payload.shm.version);
+
+    rc = mmap(0, mmap_size, PROT_READ | PROT_WRITE, MAP_SHARED,
+              fd, mmap_offset);
+
+    close(fd);
+
+    if (rc == MAP_FAILED) {
+        vu_panic(dev, "set_shm_fd mmap error: %s", strerror(errno));
+        return false;
+    }
+
+    if (dev->shm_info.addr) {
+        munmap(dev->shm_info.addr, dev->shm_info.mmap_size);
+    }
+    dev->shm_info.addr = rc;
+    dev->shm_info.mmap_size = mmap_size;
+    dev->shm_info.dev_size = vmsg->payload.shm.dev_size;
+    dev->shm_info.vq_size = vmsg->payload.shm.vq_size;
+    dev->shm_info.align = vmsg->payload.shm.align;
+    dev->shm_info.version = vmsg->payload.shm.version;
+
+    if (vu_setup_shm(dev)) {
+        vu_panic(dev, "setup shm failed");
+        return false;
+    }
+
+    return false;
+}
+
 static bool
 vu_process_message(VuDev *dev, VhostUserMsg *vmsg)
 {
@@ -1292,6 +1454,10 @@ vu_process_message(VuDev *dev, VhostUserMsg *vmsg)
         return vu_set_postcopy_listen(dev, vmsg);
     case VHOST_USER_POSTCOPY_END:
         return vu_set_postcopy_end(dev, vmsg);
+    case VHOST_USER_GET_SHM_SIZE:
+        return vu_get_shm_size(dev, vmsg);
+    case VHOST_USER_SET_SHM_FD:
+        return vu_set_shm_fd(dev, vmsg);
     default:
         vmsg_close_fds(vmsg);
         vu_panic(dev, "Unhandled request: %d", vmsg->request);
@@ -1359,8 +1525,13 @@ vu_deinit(VuDev *dev)
             close(vq->err_fd);
             vq->err_fd = -1;
         }
+        vq->shm = NULL;
     }
 
+    if (dev->shm_info.addr) {
+        munmap(dev->shm_info.addr, dev->shm_info.mmap_size);
+        dev->shm_info.addr = NULL;
+    }
 
     vu_close_log(dev);
     if (dev->slave_fd != -1) {
@@ -1829,12 +2000,6 @@ virtqueue_map_desc(VuDev *dev,
     *p_num_sg = num_sg;
 }
 
-/* Round number down to multiple */
-#define ALIGN_DOWN(n, m) ((n) / (m) * (m))
-
-/* Round number up to multiple */
-#define ALIGN_UP(n, m) ALIGN_DOWN((n) + (m) - 1, (m))
-
 static void *
 virtqueue_alloc_element(size_t sz,
                                      unsigned out_num, unsigned in_num)
@@ -1935,9 +2100,44 @@ vu_queue_map_desc(VuDev *dev, VuVirtq *vq, unsigned int idx, size_t sz)
     return elem;
 }
 
+static int
+vu_queue_inflight_get(VuDev *dev, VuVirtq *vq, int desc_idx)
+{
+    if ((dev->protocol_features &
+        VHOST_USER_PROTOCOL_F_SLAVE_SHMFD) == 0) {
+        return 0;
+    }
+
+    if (unlikely(!vq->shm)) {
+        return -1;
+    }
+
+    vq->shm->inflight[desc_idx] = 1;
+
+    return 0;
+}
+
+static int
+vu_queue_inflight_put(VuDev *dev, VuVirtq *vq, int desc_idx)
+{
+    if ((dev->protocol_features &
+        VHOST_USER_PROTOCOL_F_SLAVE_SHMFD) == 0) {
+        return 0;
+    }
+
+    if (unlikely(!vq->shm)) {
+        return -1;
+    }
+
+    vq->shm->inflight[desc_idx] = 0;
+
+    return 0;
+}
+
 void *
 vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz)
 {
+    int i;
     unsigned int head;
     VuVirtqElement *elem;
 
@@ -1946,6 +2146,12 @@ vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz)
         return NULL;
     }
 
+    if (unlikely(vq->inflight_num > 0)) {
+        i = (--vq->inflight_num);
+        elem = vu_queue_map_desc(dev, vq, vq->inflight_desc[i], sz);
+        return elem;
+    }
+
     if (vu_queue_empty(dev, vq)) {
         return NULL;
     }
@@ -1976,6 +2182,8 @@ vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz)
 
     vq->inuse++;
 
+    vu_queue_inflight_get(dev, vq, head);
+
     return elem;
 }
 
@@ -2121,4 +2329,5 @@ vu_queue_push(VuDev *dev, VuVirtq *vq,
 {
     vu_queue_fill(dev, vq, elem, len, 0);
     vu_queue_flush(dev, vq, 1);
+    vu_queue_inflight_put(dev, vq, elem->index);
 }
diff --git a/contrib/libvhost-user/libvhost-user.h b/contrib/libvhost-user/libvhost-user.h
index 4aa55b4d2d..fdfda688d2 100644
--- a/contrib/libvhost-user/libvhost-user.h
+++ b/contrib/libvhost-user/libvhost-user.h
@@ -53,6 +53,7 @@ enum VhostUserProtocolFeature {
     VHOST_USER_PROTOCOL_F_CONFIG = 9,
     VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD = 10,
     VHOST_USER_PROTOCOL_F_HOST_NOTIFIER = 11,
+    VHOST_USER_PROTOCOL_F_SLAVE_SHMFD = 12,
 
     VHOST_USER_PROTOCOL_F_MAX
 };
@@ -91,6 +92,8 @@ typedef enum VhostUserRequest {
     VHOST_USER_POSTCOPY_ADVISE  = 28,
     VHOST_USER_POSTCOPY_LISTEN  = 29,
     VHOST_USER_POSTCOPY_END     = 30,
+    VHOST_USER_GET_SHM_SIZE     = 31,
+    VHOST_USER_SET_SHM_FD       = 32,
     VHOST_USER_MAX
 } VhostUserRequest;
 
@@ -138,6 +141,15 @@ typedef struct VhostUserVringArea {
     uint64_t offset;
 } VhostUserVringArea;
 
+typedef struct VhostUserShm {
+    uint64_t mmap_size;
+    uint64_t mmap_offset;
+    uint32_t dev_size;
+    uint32_t vq_size;
+    uint32_t align;
+    uint32_t version;
+} VhostUserShm;
+
 #if defined(_WIN32)
 # define VU_PACKED __attribute__((gcc_struct, packed))
 #else
@@ -163,6 +175,7 @@ typedef struct VhostUserMsg {
         VhostUserLog log;
         VhostUserConfig config;
         VhostUserVringArea area;
+        VhostUserShm shm;
     } payload;
 
     int fds[VHOST_MEMORY_MAX_NREGIONS];
@@ -234,9 +247,19 @@ typedef struct VuRing {
     uint32_t flags;
 } VuRing;
 
+typedef struct VuVirtqShm {
+    char inflight[VIRTQUEUE_MAX_SIZE];
+} VuVirtqShm;
+
 typedef struct VuVirtq {
     VuRing vring;
 
+    VuVirtqShm *shm;
+
+    uint16_t inflight_desc[VIRTQUEUE_MAX_SIZE];
+
+    uint16_t inflight_num;
+
     /* Next head to pop */
     uint16_t last_avail_idx;
 
@@ -279,11 +302,21 @@ typedef void (*vu_set_watch_cb) (VuDev *dev, int fd, int condition,
                                  vu_watch_cb cb, void *data);
 typedef void (*vu_remove_watch_cb) (VuDev *dev, int fd);
 
+typedef struct VuDevShmInfo {
+    void *addr;
+    uint64_t mmap_size;
+    uint32_t dev_size;
+    uint32_t vq_size;
+    uint32_t align;
+    uint32_t version;
+} VuDevShmInfo;
+
 struct VuDev {
     int sock;
     uint32_t nregions;
     VuDevRegion regions[VHOST_MEMORY_MAX_NREGIONS];
     VuVirtq vq[VHOST_MAX_NR_VIRTQUEUE];
+    VuDevShmInfo shm_info;
     int log_call_fd;
     int slave_fd;
     uint64_t log_size;
-- 
2.17.1

* [Qemu-devel] [PATCH v3 for-4.0 5/7] vhost-user-blk: Add support to provide shared memory to backend
  2019-01-03 10:18 [Qemu-devel] [PATCH v3 for-4.0 0/7] vhost-user-blk: Add support for backend reconnecting elohimes
                   ` (3 preceding siblings ...)
  2019-01-03 10:18 ` [Qemu-devel] [PATCH v3 for-4.0 4/7] libvhost-user: Support recording inflight I/O in shared memory elohimes
@ 2019-01-03 10:18 ` elohimes
  2019-01-03 10:18 ` [Qemu-devel] [PATCH v3 for-4.0 6/7] vhost-user-blk: Add support to reconnect backend elohimes
  2019-01-03 10:18 ` [Qemu-devel] [PATCH v3 for-4.0 7/7] contrib/vhost-user-blk: enable inflight I/O recording elohimes
  6 siblings, 0 replies; 14+ messages in thread
From: elohimes @ 2019-01-03 10:18 UTC (permalink / raw)
  To: mst, marcandre.lureau, berrange, jasowang, maxime.coquelin,
	yury-kotov, wrfsh
  Cc: qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji

From: Xie Yongji <xieyongji@baidu.com>

This patch adds support for the vhost-user-blk device to provide
shared memory to the backend.
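
In outline, the device wires the new vhost_dev shm helpers into its
lifecycle as follows (sketch of the hooks added below):

    /* realize:    negotiate the size and allocate the memfd once */
    vhost_dev_init_shm(&s->dev, s->shm);

    /* start:      pass the fd to the backend before vhost_dev_start() */
    vhost_dev_set_shm(&s->dev, s->shm);

    /* vdc->reset: drop stale inflight state on VM reset */
    vhost_dev_reset_shm(s->shm);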

Signed-off-by: Xie Yongji <xieyongji@baidu.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
---
 hw/block/vhost-user-blk.c          | 26 ++++++++++++++++++++++++++
 include/hw/virtio/vhost-user-blk.h |  1 +
 2 files changed, 27 insertions(+)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 1451940845..27028cf996 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -126,6 +126,13 @@ static void vhost_user_blk_start(VirtIODevice *vdev)
     }
 
     s->dev.acked_features = vdev->guest_features;
+
+    ret = vhost_dev_set_shm(&s->dev, s->shm);
+    if (ret < 0) {
+        error_report("Error set shared memory: %d", -ret);
+        goto err_guest_notifiers;
+    }
+
     ret = vhost_dev_start(&s->dev, vdev);
     if (ret < 0) {
         error_report("Error starting vhost: %d", -ret);
@@ -245,6 +252,13 @@ static void vhost_user_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
     }
 }
 
+static void vhost_user_blk_reset(VirtIODevice *vdev)
+{
+    VHostUserBlk *s = VHOST_USER_BLK(vdev);
+
+    vhost_dev_reset_shm(s->shm);
+}
+
 static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
 {
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
@@ -284,6 +298,8 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
                          vhost_user_blk_handle_output);
     }
 
+    s->shm = g_new0(struct vhost_shm, 1);
+
     s->dev.nvqs = s->num_queues;
     s->dev.vqs = g_new(struct vhost_virtqueue, s->dev.nvqs);
     s->dev.vq_index = 0;
@@ -309,12 +325,19 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
         s->blkcfg.num_queues = s->num_queues;
     }
 
+    ret = vhost_dev_init_shm(&s->dev, s->shm);
+    if (ret < 0) {
+        error_setg(errp, "vhost-user-blk: init shared memory failed");
+        goto vhost_err;
+    }
+
     return;
 
 vhost_err:
     vhost_dev_cleanup(&s->dev);
 virtio_err:
     g_free(s->dev.vqs);
+    g_free(s->shm);
     virtio_cleanup(vdev);
 
     vhost_user_cleanup(user);
@@ -329,7 +352,9 @@ static void vhost_user_blk_device_unrealize(DeviceState *dev, Error **errp)
 
     vhost_user_blk_set_status(vdev, 0);
     vhost_dev_cleanup(&s->dev);
+    vhost_dev_free_shm(s->shm);
     g_free(s->dev.vqs);
+    g_free(s->shm);
     virtio_cleanup(vdev);
 
     if (s->vhost_user) {
@@ -379,6 +404,7 @@ static void vhost_user_blk_class_init(ObjectClass *klass, void *data)
     vdc->set_config = vhost_user_blk_set_config;
     vdc->get_features = vhost_user_blk_get_features;
     vdc->set_status = vhost_user_blk_set_status;
+    vdc->reset = vhost_user_blk_reset;
 }
 
 static const TypeInfo vhost_user_blk_info = {
diff --git a/include/hw/virtio/vhost-user-blk.h b/include/hw/virtio/vhost-user-blk.h
index d52944aeeb..bb706d70b3 100644
--- a/include/hw/virtio/vhost-user-blk.h
+++ b/include/hw/virtio/vhost-user-blk.h
@@ -36,6 +36,7 @@ typedef struct VHostUserBlk {
     uint32_t queue_size;
     uint32_t config_wce;
     struct vhost_dev dev;
+    struct vhost_shm *shm;
     VhostUserState *vhost_user;
 } VHostUserBlk;
 
-- 
2.17.1

* [Qemu-devel] [PATCH v3 for-4.0 6/7] vhost-user-blk: Add support to reconnect backend
  2019-01-03 10:18 [Qemu-devel] [PATCH v3 for-4.0 0/7] vhost-user-blk: Add support for backend reconnecting elohimes
                   ` (4 preceding siblings ...)
  2019-01-03 10:18 ` [Qemu-devel] [PATCH v3 for-4.0 5/7] vhost-user-blk: Add support to provide shared memory to backend elohimes
@ 2019-01-03 10:18 ` elohimes
  2019-01-03 10:18 ` [Qemu-devel] [PATCH v3 for-4.0 7/7] contrib/vhost-user-blk: enable inflight I/O recording elohimes
  6 siblings, 0 replies; 14+ messages in thread
From: elohimes @ 2019-01-03 10:18 UTC (permalink / raw)
  To: mst, marcandre.lureau, berrange, jasowang, maxime.coquelin,
	yury-kotov, wrfsh
  Cc: qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji

From: Xie Yongji <xieyongji@baidu.com>

Since we now support the VHOST_USER_GET_SHM_SIZE and
VHOST_USER_SET_SHM_FD messages, the backend is able to restart
safely because it can record inflight I/O in shared memory.
This patch allows QEMU to reconnect to the backend after the
connection is closed.
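
Reconnection is driven by chardev events; in outline (sketch of the
handler added below, details and error handling omitted):

    static void vhost_user_blk_event(void *opaque, int event)
    {
        switch (event) {
        case CHR_EVENT_OPENED:
            /* re-run vhost_dev_init(); if the guest had already set
             * DRIVER_OK (s->should_start), restart the device so the
             * backend can resubmit the recorded inflight requests */
            break;
        case CHR_EVENT_CLOSED:
            /* stop the device and clean up; inflight requests stay
             * recorded in the shared memory for the next backend */
            break;
        }
    }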

Signed-off-by: Xie Yongji <xieyongji@baidu.com>
Signed-off-by: Ni Xun <nixun@baidu.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
---
 hw/block/vhost-user-blk.c          | 205 +++++++++++++++++++++++------
 include/hw/virtio/vhost-user-blk.h |   4 +
 2 files changed, 168 insertions(+), 41 deletions(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 27028cf996..c0725e8e4a 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -101,7 +101,7 @@ const VhostDevConfigOps blk_ops = {
     .vhost_dev_config_notifier = vhost_user_blk_handle_config_change,
 };
 
-static void vhost_user_blk_start(VirtIODevice *vdev)
+static int vhost_user_blk_start(VirtIODevice *vdev)
 {
     VHostUserBlk *s = VHOST_USER_BLK(vdev);
     BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
@@ -110,13 +110,13 @@ static void vhost_user_blk_start(VirtIODevice *vdev)
 
     if (!k->set_guest_notifiers) {
         error_report("binding does not support guest notifiers");
-        return;
+        return -ENOSYS;
     }
 
     ret = vhost_dev_enable_notifiers(&s->dev, vdev);
     if (ret < 0) {
         error_report("Error enabling host notifiers: %d", -ret);
-        return;
+        return ret;
     }
 
     ret = k->set_guest_notifiers(qbus->parent, s->dev.nvqs, true);
@@ -147,12 +147,13 @@ static void vhost_user_blk_start(VirtIODevice *vdev)
         vhost_virtqueue_mask(&s->dev, vdev, i, false);
     }
 
-    return;
+    return ret;
 
 err_guest_notifiers:
     k->set_guest_notifiers(qbus->parent, s->dev.nvqs, false);
 err_host_notifiers:
     vhost_dev_disable_notifiers(&s->dev, vdev);
+    return ret;
 }
 
 static void vhost_user_blk_stop(VirtIODevice *vdev)
@@ -171,7 +172,6 @@ static void vhost_user_blk_stop(VirtIODevice *vdev)
     ret = k->set_guest_notifiers(qbus->parent, s->dev.nvqs, false);
     if (ret < 0) {
         error_report("vhost guest notifier cleanup failed: %d", ret);
-        return;
     }
 
     vhost_dev_disable_notifiers(&s->dev, vdev);
@@ -181,21 +181,43 @@ static void vhost_user_blk_set_status(VirtIODevice *vdev, uint8_t status)
 {
     VHostUserBlk *s = VHOST_USER_BLK(vdev);
     bool should_start = status & VIRTIO_CONFIG_S_DRIVER_OK;
+    int ret;
 
     if (!vdev->vm_running) {
         should_start = false;
     }
 
-    if (s->dev.started == should_start) {
+    if (s->should_start == should_start) {
+        return;
+    }
+
+    if (!s->connected || s->dev.started == should_start) {
+        s->should_start = should_start;
         return;
     }
 
     if (should_start) {
-        vhost_user_blk_start(vdev);
+        s->should_start = true;
+        /*
+         * make sure vhost_user_blk_handle_output() ignores fake
+         * guest kick by vhost_dev_enable_notifiers()
+         */
+        barrier();
+        ret = vhost_user_blk_start(vdev);
+        if (ret < 0) {
+            error_report("vhost-user-blk: vhost start failed: %s",
+                         strerror(-ret));
+            qemu_chr_fe_disconnect(&s->chardev);
+        }
     } else {
         vhost_user_blk_stop(vdev);
+        /*
+         * make sure vhost_user_blk_handle_output() ignores fake
+         * guest kick by vhost_dev_disable_notifiers()
+         */
+        barrier();
+        s->should_start = false;
     }
-
 }
 
 static uint64_t vhost_user_blk_get_features(VirtIODevice *vdev,
@@ -225,13 +247,22 @@ static uint64_t vhost_user_blk_get_features(VirtIODevice *vdev,
 static void vhost_user_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
 {
     VHostUserBlk *s = VHOST_USER_BLK(vdev);
-    int i;
+    int i, ret;
 
     if (!(virtio_host_has_feature(vdev, VIRTIO_F_VERSION_1) &&
         !virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1))) {
         return;
     }
 
+    if (s->should_start) {
+        return;
+    }
+    s->should_start = true;
+
+    if (!s->connected) {
+        return;
+    }
+
     if (s->dev.started) {
         return;
     }
@@ -239,7 +270,13 @@ static void vhost_user_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
     /* Some guests kick before setting VIRTIO_CONFIG_S_DRIVER_OK so start
      * vhost here instead of waiting for .set_status().
      */
-    vhost_user_blk_start(vdev);
+    ret = vhost_user_blk_start(vdev);
+    if (ret < 0) {
+        error_report("vhost-user-blk: vhost start failed: %s",
+                     strerror(-ret));
+        qemu_chr_fe_disconnect(&s->chardev);
+        return;
+    }
 
     /* Kick right away to begin processing requests already in vring */
     for (i = 0; i < s->dev.nvqs; i++) {
@@ -259,12 +296,105 @@ static void vhost_user_blk_reset(VirtIODevice *vdev)
     vhost_dev_reset_shm(s->shm);
 }
 
+static int vhost_user_blk_connect(DeviceState *dev)
+{
+    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+    VHostUserBlk *s = VHOST_USER_BLK(vdev);
+    int ret = 0;
+
+    if (s->connected) {
+        return 0;
+    }
+    s->connected = true;
+
+    s->dev.nvqs = s->num_queues;
+    s->dev.vqs = s->vqs;
+    s->dev.vq_index = 0;
+    s->dev.backend_features = 0;
+
+    vhost_dev_set_config_notifier(&s->dev, &blk_ops);
+
+    ret = vhost_dev_init(&s->dev, s->vhost_user, VHOST_BACKEND_TYPE_USER, 0);
+    if (ret < 0) {
+        error_report("vhost-user-blk: vhost initialization failed: %s",
+                     strerror(-ret));
+        return ret;
+    }
+
+    /* restore vhost state */
+    if (s->should_start) {
+        ret = vhost_user_blk_start(vdev);
+        if (ret < 0) {
+            error_report("vhost-user-blk: vhost start failed: %s",
+                         strerror(-ret));
+            return ret;
+        }
+    }
+
+    return 0;
+}
+
+static void vhost_user_blk_disconnect(DeviceState *dev)
+{
+    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+    VHostUserBlk *s = VHOST_USER_BLK(vdev);
+
+    if (!s->connected) {
+        return;
+    }
+    s->connected = false;
+
+    if (s->dev.started) {
+        vhost_user_blk_stop(vdev);
+    }
+
+    vhost_dev_cleanup(&s->dev);
+}
+
+static gboolean vhost_user_blk_watch(GIOChannel *chan, GIOCondition cond,
+                                     void *opaque)
+{
+    DeviceState *dev = opaque;
+    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+    VHostUserBlk *s = VHOST_USER_BLK(vdev);
+
+    qemu_chr_fe_disconnect(&s->chardev);
+
+    return true;
+}
+
+static void vhost_user_blk_event(void *opaque, int event)
+{
+    DeviceState *dev = opaque;
+    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+    VHostUserBlk *s = VHOST_USER_BLK(vdev);
+
+    switch (event) {
+    case CHR_EVENT_OPENED:
+        if (vhost_user_blk_connect(dev) < 0) {
+            qemu_chr_fe_disconnect(&s->chardev);
+            return;
+        }
+        s->watch = qemu_chr_fe_add_watch(&s->chardev, G_IO_HUP,
+                                         vhost_user_blk_watch, dev);
+        break;
+    case CHR_EVENT_CLOSED:
+        vhost_user_blk_disconnect(dev);
+        if (s->watch) {
+            g_source_remove(s->watch);
+            s->watch = 0;
+        }
+        break;
+    }
+}
+
 static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
 {
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
     VHostUserBlk *s = VHOST_USER_BLK(vdev);
     VhostUserState *user;
     int i, ret;
+    Error *err = NULL;
 
     if (!s->chardev.chr) {
         error_setg(errp, "vhost-user-blk: chardev is mandatory");
@@ -299,26 +429,28 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
     }
 
     s->shm = g_new0(struct vhost_shm, 1);
-
-    s->dev.nvqs = s->num_queues;
-    s->dev.vqs = g_new(struct vhost_virtqueue, s->dev.nvqs);
-    s->dev.vq_index = 0;
-    s->dev.backend_features = 0;
-
-    vhost_dev_set_config_notifier(&s->dev, &blk_ops);
-
-    ret = vhost_dev_init(&s->dev, s->vhost_user, VHOST_BACKEND_TYPE_USER, 0);
-    if (ret < 0) {
-        error_setg(errp, "vhost-user-blk: vhost initialization failed: %s",
-                   strerror(-ret));
-        goto virtio_err;
-    }
+    s->vqs = g_new(struct vhost_virtqueue, s->num_queues);
+    s->watch = 0;
+    s->should_start = false;
+    s->connected = false;
+
+    qemu_chr_fe_set_handlers(&s->chardev,  NULL, NULL, vhost_user_blk_event,
+                             NULL, (void *)dev, NULL, true);
+
+reconnect:
+    do {
+        if (qemu_chr_fe_wait_connected(&s->chardev, &err) < 0) {
+            error_report_err(err);
+            err = NULL;
+            sleep(1);
+        }
+    } while (!s->connected);
 
     ret = vhost_dev_get_config(&s->dev, (uint8_t *)&s->blkcfg,
-                              sizeof(struct virtio_blk_config));
+                               sizeof(struct virtio_blk_config));
     if (ret < 0) {
-        error_setg(errp, "vhost-user-blk: get block config failed");
-        goto vhost_err;
+        error_report("vhost-user-blk: get block config failed");
+        goto reconnect;
     }
 
     if (s->blkcfg.num_queues != s->num_queues) {
@@ -327,22 +459,11 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
 
     ret = vhost_dev_init_shm(&s->dev, s->shm);
     if (ret < 0) {
-        error_setg(errp, "vhost-user-blk: init shared memory failed");
-        goto vhost_err;
+        error_report("vhost-user-blk: init shared memory failed");
+        goto reconnect;
     }
 
     return;
-
-vhost_err:
-    vhost_dev_cleanup(&s->dev);
-virtio_err:
-    g_free(s->dev.vqs);
-    g_free(s->shm);
-    virtio_cleanup(vdev);
-
-    vhost_user_cleanup(user);
-    g_free(user);
-    s->vhost_user = NULL;
 }
 
 static void vhost_user_blk_device_unrealize(DeviceState *dev, Error **errp)
@@ -351,9 +472,11 @@ static void vhost_user_blk_device_unrealize(DeviceState *dev, Error **errp)
     VHostUserBlk *s = VHOST_USER_BLK(dev);
 
     vhost_user_blk_set_status(vdev, 0);
+    qemu_chr_fe_set_handlers(&s->chardev,  NULL, NULL, NULL,
+                             NULL, NULL, NULL, false);
     vhost_dev_cleanup(&s->dev);
     vhost_dev_free_shm(s->shm);
-    g_free(s->dev.vqs);
+    g_free(s->vqs);
     g_free(s->shm);
     virtio_cleanup(vdev);
 
diff --git a/include/hw/virtio/vhost-user-blk.h b/include/hw/virtio/vhost-user-blk.h
index bb706d70b3..c17d47402b 100644
--- a/include/hw/virtio/vhost-user-blk.h
+++ b/include/hw/virtio/vhost-user-blk.h
@@ -38,6 +38,10 @@ typedef struct VHostUserBlk {
     struct vhost_dev dev;
     struct vhost_shm *shm;
     VhostUserState *vhost_user;
+    struct vhost_virtqueue *vqs;
+    guint watch;
+    bool should_start;
+    bool connected;
 } VHostUserBlk;
 
 #endif
-- 
2.17.1
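
In short, the reconnect cycle the hunks above set up looks like this
(a sketch of the control flow, assuming the chardev was created with
reconnect=1 so that a closed socket is re-dialed):

    /*
     * realize():       qemu_chr_fe_wait_connected() loops until the
     *                  socket opens; CHR_EVENT_OPENED then runs
     *                  vhost_user_blk_connect() and adds a G_IO_HUP watch.
     * backend dies:    the watch fires -> qemu_chr_fe_disconnect() ->
     *                  CHR_EVENT_CLOSED -> vhost_user_blk_disconnect()
     *                  stops the device and the watch is removed.
     * backend returns: the chardev re-dials, CHR_EVENT_OPENED fires
     *                  again, and the device is brought back up using
     *                  the in-flight state recorded in shared memory.
     */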


* [Qemu-devel] [PATCH v3 for-4.0 7/7] contrib/vhost-user-blk: enable inflight I/O recording
  2019-01-03 10:18 [Qemu-devel] [PATCH v3 for-4.0 0/7] vhost-user-blk: Add support for backend reconnecting elohimes
                   ` (5 preceding siblings ...)
  2019-01-03 10:18 ` [Qemu-devel] [PATCH v3 for-4.0 6/7] vhost-user-blk: Add support to reconnect backend elohimes
@ 2019-01-03 10:18 ` elohimes
  6 siblings, 0 replies; 14+ messages in thread
From: elohimes @ 2019-01-03 10:18 UTC (permalink / raw)
  To: mst, marcandre.lureau, berrange, jasowang, maxime.coquelin,
	yury-kotov, wrfsh
  Cc: qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji

From: Xie Yongji <xieyongji@baidu.com>

This patch enables inflight I/O recording for the
vhost-user-blk backend so that we can restart it safely.

Signed-off-by: Xie Yongji <xieyongji@baidu.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
---
 contrib/vhost-user-blk/vhost-user-blk.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/contrib/vhost-user-blk/vhost-user-blk.c b/contrib/vhost-user-blk/vhost-user-blk.c
index 858221ad95..6d9a417f4a 100644
--- a/contrib/vhost-user-blk/vhost-user-blk.c
+++ b/contrib/vhost-user-blk/vhost-user-blk.c
@@ -327,7 +327,8 @@ vub_get_features(VuDev *dev)
 static uint64_t
 vub_get_protocol_features(VuDev *dev)
 {
-    return 1ull << VHOST_USER_PROTOCOL_F_CONFIG;
+    return 1ull << VHOST_USER_PROTOCOL_F_CONFIG |
+           1ull << VHOST_USER_PROTOCOL_F_SLAVE_SHMFD;
 }
 
 static int
-- 
2.17.1
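
A note on the hunk above, in case the missing parentheses look
suspicious: << binds tighter than |, so the return value is
equivalent to

    (1ull << VHOST_USER_PROTOCOL_F_CONFIG) |
        (1ull << VHOST_USER_PROTOCOL_F_SLAVE_SHMFD)

i.e. both protocol feature bits are advertised to the master.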


* Re: [Qemu-devel] [PATCH v3 for-4.0 2/7] vhost-user: Support providing shared memory to backend
  2019-01-03 10:18 ` [Qemu-devel] [PATCH v3 for-4.0 2/7] vhost-user: Support providing shared memory to backend elohimes
@ 2019-01-03 17:02   ` Michael S. Tsirkin
  2019-01-04  2:31     ` Yongji Xie
  2019-01-03 17:13   ` Michael S. Tsirkin
  1 sibling, 1 reply; 14+ messages in thread
From: Michael S. Tsirkin @ 2019-01-03 17:02 UTC (permalink / raw)
  To: elohimes
  Cc: marcandre.lureau, berrange, jasowang, maxime.coquelin, yury-kotov,
	wrfsh, qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji

On Thu, Jan 03, 2019 at 06:18:14PM +0800, elohimes@gmail.com wrote:
> From: Xie Yongji <xieyongji@baidu.com>
> 
> This patch introduces two new messages VHOST_USER_GET_SHM_SIZE
> and VHOST_USER_SET_SHM_FD to support providing shared
> memory to backend.

So this seems a bit vague. Since we are going to use it
for tracking in-flight I/O, I would prefer that we
actually call it that.


> 
> Firstly, qemu uses VHOST_USER_GET_SHM_SIZE to get the
> required size of shared memory from backend. Then, qemu
> allocates memory and sends them

s/them/it/ ?

> back to backend through
> VHOST_USER_SET_SHM_FD.
> 
> Note that the shared memory should be used by the backend
> to record inflight I/O. QEMU will clear it on VM reset.
> 
> Signed-off-by: Xie Yongji <xieyongji@baidu.com>
> Signed-off-by: Chai Wen <chaiwen@baidu.com>
> Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
> ---
>  docs/interop/vhost-user.txt       |  41 +++++++++++
>  hw/virtio/vhost-user.c            |  86 ++++++++++++++++++++++
>  hw/virtio/vhost.c                 | 117 ++++++++++++++++++++++++++++++
>  include/hw/virtio/vhost-backend.h |   9 +++
>  include/hw/virtio/vhost.h         |  19 +++++
>  5 files changed, 272 insertions(+)
> 
> diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
> index c2194711d9..5ee9c28ab0 100644
> --- a/docs/interop/vhost-user.txt
> +++ b/docs/interop/vhost-user.txt
> @@ -142,6 +142,19 @@ Depending on the request type, payload can be:
>     Offset: a 64-bit offset of this area from the start of the
>         supplied file descriptor
>  
> + * Shm description
> +   -----------------------------------
> +   | mmap_size | mmap_offset | dev_size | vq_size | align | version |
> +   -----------------------------------
> +
> +   Mmap_size: a 64-bit size of the shared memory
> +   Mmap_offset: a 64-bit offset of the shared memory from the start
> +                of the supplied file descriptor
> +   Dev_size: a 32-bit size of device region in shared memory
> +   Vq_size: a 32-bit size of each virtqueue region in shared memory
> +   Align: a 32-bit align of each region in shared memory
> +   Version: a 32-bit version of this shared memory
> +

This is an informal description so please avoid _ in field
names, just put a space in there. See e.g. log description.


>  In QEMU the vhost-user message is implemented with the following struct:
>  
>  typedef struct VhostUserMsg {


For things to work, in-flight format must not change when
backend reconnects.

To encourage consistency, how about including a recommended format for
this buffer in this document?
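
For illustration only, one possible shape such a documented format
could take for each virtqueue region -- every name below is invented
for this sketch and is not taken from the series:

    #include <stdint.h>

    /* Hypothetical per-virtqueue in-flight region, living inside the
     * vq_size bytes requested by the backend. The backend could set
     * inflight[i] = 1 when descriptor i is submitted, clear it on
     * completion, and rescan the array after a restart to recover
     * pending requests. */
    typedef struct VqInflightRegion {
        uint32_t version;    /* layout version, checked on reconnect */
        uint16_t desc_num;   /* number of entries in inflight[] */
        uint16_t padding;
        uint8_t  inflight[]; /* 1 = request in flight, 0 = idle */
    } VqInflightRegion;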





> @@ -157,6 +170,7 @@ typedef struct VhostUserMsg {
>          struct vhost_iotlb_msg iotlb;
>          VhostUserConfig config;
>          VhostUserVringArea area;
> +        VhostUserShm shm;
>      };
>  } QEMU_PACKED VhostUserMsg;
>  
> @@ -175,6 +189,7 @@ the ones that do:
>   * VHOST_USER_GET_PROTOCOL_FEATURES
>   * VHOST_USER_GET_VRING_BASE
>   * VHOST_USER_SET_LOG_BASE (if VHOST_USER_PROTOCOL_F_LOG_SHMFD)
> + * VHOST_USER_GET_SHM_SIZE (if VHOST_USER_PROTOCOL_F_SLAVE_SHMFD)
>  
>  [ Also see the section on REPLY_ACK protocol extension. ]
>  
> @@ -188,6 +203,7 @@ in the ancillary data:
>   * VHOST_USER_SET_VRING_CALL
>   * VHOST_USER_SET_VRING_ERR
>   * VHOST_USER_SET_SLAVE_REQ_FD
> + * VHOST_USER_SET_SHM_FD (if VHOST_USER_PROTOCOL_F_SLAVE_SHMFD)
>  
>  If Master is unable to send the full message or receives a wrong reply it will
>  close the connection. An optional reconnection mechanism can be implemented.
> @@ -397,6 +413,7 @@ Protocol features
>  #define VHOST_USER_PROTOCOL_F_CONFIG         9
>  #define VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD  10
>  #define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER  11
> +#define VHOST_USER_PROTOCOL_F_SLAVE_SHMFD 12
>  
>  Master message types
>  --------------------
> @@ -761,6 +778,30 @@ Master message types
>        was previously sent.
>        The value returned is an error indication; 0 is success.
>  
> + * VHOST_USER_GET_SHM_SIZE
> +      Id: 31
> +      Equivalent ioctl: N/A
> +      Master payload: shm description
> +
> +      When the VHOST_USER_PROTOCOL_F_SLAVE_SHMFD protocol feature has been
> +      successfully negotiated, the master needs to provide shared memory
> +      to the slave. This message is used by the master to get the required
> +      size from the slave. The shared memory contains one region for the
> +      device and one region per virtqueue. The size of those two kinds of
> +      regions is specified by the dev_size and vq_size fields. The align
> +      field specifies the alignment of those regions.
> +
> + * VHOST_USER_SET_SHM_FD
> +      Id: 32
> +      Equivalent ioctl: N/A
> +      Master payload: shm description
> +
> +      When the VHOST_USER_PROTOCOL_F_SLAVE_SHMFD protocol feature has been
> +      successfully negotiated, the master uses this message to set shared
> +      memory for the slave. The memory fd is passed in the ancillary data.
> +      The slave should use the shared memory to record inflight I/O. The
> +      master will clear it on VM reset.
> +
>  Slave message types
>  -------------------
>  
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index e09bed0e4a..8cdf3b5121 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -52,6 +52,7 @@ enum VhostUserProtocolFeature {
>      VHOST_USER_PROTOCOL_F_CONFIG = 9,
>      VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD = 10,
>      VHOST_USER_PROTOCOL_F_HOST_NOTIFIER = 11,
> +    VHOST_USER_PROTOCOL_F_SLAVE_SHMFD = 12,
>      VHOST_USER_PROTOCOL_F_MAX
>  };
>  
> @@ -89,6 +90,8 @@ typedef enum VhostUserRequest {
>      VHOST_USER_POSTCOPY_ADVISE  = 28,
>      VHOST_USER_POSTCOPY_LISTEN  = 29,
>      VHOST_USER_POSTCOPY_END     = 30,
> +    VHOST_USER_GET_SHM_SIZE = 31,
> +    VHOST_USER_SET_SHM_FD = 32,
>      VHOST_USER_MAX
>  } VhostUserRequest;
>  
> @@ -147,6 +150,15 @@ typedef struct VhostUserVringArea {
>      uint64_t offset;
>  } VhostUserVringArea;
>  
> +typedef struct VhostUserShm {
> +    uint64_t mmap_size;
> +    uint64_t mmap_offset;
> +    uint32_t dev_size;
> +    uint32_t vq_size;
> +    uint32_t align;
> +    uint32_t version;
> +} VhostUserShm;
> +
>  typedef struct {
>      VhostUserRequest request;
>  
> @@ -169,6 +181,7 @@ typedef union {
>          VhostUserConfig config;
>          VhostUserCryptoSession session;
>          VhostUserVringArea area;
> +        VhostUserShm shm;
>  } VhostUserPayload;
>  
>  typedef struct VhostUserMsg {
> @@ -1739,6 +1752,77 @@ static bool vhost_user_mem_section_filter(struct vhost_dev *dev,
>      return result;
>  }
>  
> +static int vhost_user_get_shm_size(struct vhost_dev *dev,
> +                                   struct vhost_shm *shm)
> +{
> +    VhostUserMsg msg = {
> +        .hdr.request = VHOST_USER_GET_SHM_SIZE,
> +        .hdr.flags = VHOST_USER_VERSION,
> +        .hdr.size = sizeof(msg.payload.shm),
> +    };
> +
> +    if (!virtio_has_feature(dev->protocol_features,
> +                            VHOST_USER_PROTOCOL_F_SLAVE_SHMFD)) {
> +        shm->dev_size = 0;
> +        shm->vq_size = 0;
> +        return 0;
> +    }
> +
> +    if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
> +        return -1;
> +    }
> +
> +    if (vhost_user_read(dev, &msg) < 0) {
> +        return -1;
> +    }
> +
> +    if (msg.hdr.request != VHOST_USER_GET_SHM_SIZE) {
> +        error_report("Received unexpected msg type. "
> +                     "Expected %d received %d",
> +                     VHOST_USER_GET_SHM_SIZE, msg.hdr.request);
> +        return -1;
> +    }
> +
> +    if (msg.hdr.size != sizeof(msg.payload.shm)) {
> +        error_report("Received bad msg size.");
> +        return -1;
> +    }
> +
> +    shm->dev_size = msg.payload.shm.dev_size;
> +    shm->vq_size = msg.payload.shm.vq_size;
> +    shm->align = msg.payload.shm.align;
> +    shm->version = msg.payload.shm.version;
> +
> +    return 0;
> +}
> +
> +static int vhost_user_set_shm_fd(struct vhost_dev *dev,
> +                                 struct vhost_shm *shm)
> +{
> +    VhostUserMsg msg = {
> +        .hdr.request = VHOST_USER_SET_SHM_FD,
> +        .hdr.flags = VHOST_USER_VERSION,
> +        .payload.shm.mmap_size = shm->mmap_size,
> +        .payload.shm.mmap_offset = 0,
> +        .payload.shm.dev_size = shm->dev_size,
> +        .payload.shm.vq_size = shm->vq_size,
> +        .payload.shm.align = shm->align,
> +        .payload.shm.version = shm->version,
> +        .hdr.size = sizeof(msg.payload.shm),
> +    };
> +
> +    if (!virtio_has_feature(dev->protocol_features,
> +                            VHOST_USER_PROTOCOL_F_SLAVE_SHMFD)) {
> +        return 0;
> +    }
> +
> +    if (vhost_user_write(dev, &msg, &shm->fd, 1) < 0) {
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
>  VhostUserState *vhost_user_init(void)
>  {
>      VhostUserState *user = g_new0(struct VhostUserState, 1);
> @@ -1790,4 +1874,6 @@ const VhostOps user_ops = {
>          .vhost_crypto_create_session = vhost_user_crypto_create_session,
>          .vhost_crypto_close_session = vhost_user_crypto_close_session,
>          .vhost_backend_mem_section_filter = vhost_user_mem_section_filter,
> +        .vhost_get_shm_size = vhost_user_get_shm_size,
> +        .vhost_set_shm_fd = vhost_user_set_shm_fd,
>  };
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index 569c4053ea..7a38fed50f 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -1481,6 +1481,123 @@ void vhost_dev_set_config_notifier(struct vhost_dev *hdev,
>      hdev->config_ops = ops;
>  }
>  
> +void vhost_dev_reset_shm(struct vhost_shm *shm)
> +{
> +    if (shm->addr) {
> +        memset(shm->addr, 0, shm->mmap_size);
> +    }
> +}
> +
> +void vhost_dev_free_shm(struct vhost_shm *shm)
> +{
> +    if (shm->addr) {
> +        qemu_memfd_free(shm->addr, shm->mmap_size, shm->fd);
> +        shm->addr = NULL;
> +        shm->fd = -1;
> +    }
> +}
> +
> +int vhost_dev_alloc_shm(struct vhost_shm *shm)
> +{
> +    Error *err = NULL;
> +    int fd = -1;
> +    void *addr = qemu_memfd_alloc("vhost-shm", shm->mmap_size,
> +                                  F_SEAL_GROW | F_SEAL_SHRINK | F_SEAL_SEAL,
> +                                  &fd, &err);
> +    if (err) {
> +        error_report_err(err);
> +        return -1;
> +    }
> +
> +    shm->addr = addr;
> +    shm->fd = fd;
> +
> +    return 0;
> +}
> +
> +void vhost_dev_save_shm(struct vhost_shm *shm, QEMUFile *f)
> +{
> +    if (shm->addr) {
> +        qemu_put_be64(f, shm->mmap_size);
> +        qemu_put_be32(f, shm->dev_size);
> +        qemu_put_be32(f, shm->vq_size);
> +        qemu_put_be32(f, shm->align);
> +        qemu_put_be32(f, shm->version);
> +        qemu_put_buffer(f, shm->addr, shm->mmap_size);
> +    } else {
> +        qemu_put_be64(f, 0);
> +    }
> +}
> +
> +int vhost_dev_load_shm(struct vhost_shm *shm, QEMUFile *f)
> +{
> +    uint64_t mmap_size;
> +
> +    mmap_size = qemu_get_be64(f);
> +    if (!mmap_size) {
> +        return 0;
> +    }
> +
> +    vhost_dev_free_shm(shm);
> +
> +    shm->mmap_size = mmap_size;
> +    shm->dev_size = qemu_get_be32(f);
> +    shm->vq_size = qemu_get_be32(f);
> +    shm->align = qemu_get_be32(f);
> +    shm->version = qemu_get_be32(f);
> +
> +    if (vhost_dev_alloc_shm(shm)) {
> +        return -ENOMEM;
> +    }
> +
> +    qemu_get_buffer(f, shm->addr, mmap_size);
> +
> +    return 0;
> +}
> +
> +int vhost_dev_set_shm(struct vhost_dev *dev, struct vhost_shm *shm)
> +{
> +    int r;
> +
> +    if (dev->vhost_ops->vhost_set_shm_fd && shm->addr) {
> +        r = dev->vhost_ops->vhost_set_shm_fd(dev, shm);
> +        if (r) {
> +            VHOST_OPS_DEBUG("vhost_set_vring_shm_fd failed");
> +            return -errno;
> +        }
> +    }
> +
> +    return 0;
> +}
> +
> +int vhost_dev_init_shm(struct vhost_dev *dev, struct vhost_shm *shm)
> +{
> +    int r;
> +
> +    if (dev->vhost_ops->vhost_get_shm_size) {
> +        r = dev->vhost_ops->vhost_get_shm_size(dev, shm);
> +        if (r) {
> +            VHOST_OPS_DEBUG("vhost_get_vring_shm_size failed");
> +            return -errno;
> +        }
> +
> +        if (!shm->dev_size && !shm->vq_size) {
> +            return 0;
> +        }
> +
> +        shm->mmap_size = QEMU_ALIGN_UP(shm->dev_size, shm->align) +
> +                         dev->nvqs * QEMU_ALIGN_UP(shm->vq_size, shm->align);
> +
> +        if (vhost_dev_alloc_shm(shm)) {
> +            return -ENOMEM;
> +        }
> +
> +        vhost_dev_reset_shm(shm);
> +    }
> +
> +    return 0;
> +}
> +
>  /* Host notifiers must be enabled at this point. */
>  int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
>  {
> diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
> index 81283ec50f..4e7f13c9e9 100644
> --- a/include/hw/virtio/vhost-backend.h
> +++ b/include/hw/virtio/vhost-backend.h
> @@ -25,6 +25,7 @@ typedef enum VhostSetConfigType {
>      VHOST_SET_CONFIG_TYPE_MIGRATION = 1,
>  } VhostSetConfigType;
>  
> +struct vhost_shm;
>  struct vhost_dev;
>  struct vhost_log;
>  struct vhost_memory;
> @@ -104,6 +105,12 @@ typedef int (*vhost_crypto_close_session_op)(struct vhost_dev *dev,
>  typedef bool (*vhost_backend_mem_section_filter_op)(struct vhost_dev *dev,
>                                                  MemoryRegionSection *section);
>  
> +typedef int (*vhost_get_shm_size_op)(struct vhost_dev *dev,
> +                                     struct vhost_shm *shm);
> +
> +typedef int (*vhost_set_shm_fd_op)(struct vhost_dev *dev,
> +                                   struct vhost_shm *shm);
> +
>  typedef struct VhostOps {
>      VhostBackendType backend_type;
>      vhost_backend_init vhost_backend_init;
> @@ -142,6 +149,8 @@ typedef struct VhostOps {
>      vhost_crypto_create_session_op vhost_crypto_create_session;
>      vhost_crypto_close_session_op vhost_crypto_close_session;
>      vhost_backend_mem_section_filter_op vhost_backend_mem_section_filter;
> +    vhost_get_shm_size_op vhost_get_shm_size;
> +    vhost_set_shm_fd_op vhost_set_shm_fd;
>  } VhostOps;
>  
>  extern const VhostOps user_ops;
> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> index a7f449fa87..b6e3d6ab56 100644
> --- a/include/hw/virtio/vhost.h
> +++ b/include/hw/virtio/vhost.h
> @@ -7,6 +7,17 @@
>  #include "exec/memory.h"
>  
>  /* Generic structures common for any vhost based device. */
> +
> +struct vhost_shm {
> +    void *addr;
> +    uint64_t mmap_size;
> +    uint32_t dev_size;
> +    uint32_t vq_size;
> +    uint32_t align;
> +    uint32_t version;
> +    int fd;
> +};
> +
>  struct vhost_virtqueue {
>      int kick;
>      int call;
> @@ -120,4 +131,12 @@ int vhost_dev_set_config(struct vhost_dev *dev, const uint8_t *data,
>   */
>  void vhost_dev_set_config_notifier(struct vhost_dev *dev,
>                                     const VhostDevConfigOps *ops);
> +
> +void vhost_dev_reset_shm(struct vhost_shm *shm);
> +void vhost_dev_free_shm(struct vhost_shm *shm);
> +int vhost_dev_alloc_shm(struct vhost_shm *shm);
> +void vhost_dev_save_shm(struct vhost_shm *shm, QEMUFile *f);
> +int vhost_dev_load_shm(struct vhost_shm *shm, QEMUFile *f);
> +int vhost_dev_set_shm(struct vhost_dev *dev, struct vhost_shm *shm);
> +int vhost_dev_init_shm(struct vhost_dev *dev, struct vhost_shm *shm);
>  #endif
> -- 
> 2.17.1
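
As an aside, the layout arithmetic in vhost_dev_init_shm() above is
easy to check with concrete numbers (sizes invented for illustration):

    /* dev_size = 120, vq_size = 152, align = 64, nvqs = 2:
     *
     *   mmap_size = QEMU_ALIGN_UP(120, 64) + 2 * QEMU_ALIGN_UP(152, 64)
     *             = 128                    + 2 * 192
     *             = 512 bytes
     *
     * i.e. one 128-byte device region followed by two 192-byte
     * per-virtqueue regions, each starting on a 64-byte boundary. */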


* Re: [Qemu-devel] [PATCH v3 for-4.0 2/7] vhost-user: Support providing shared memory to backend
  2019-01-03 10:18 ` [Qemu-devel] [PATCH v3 for-4.0 2/7] vhost-user: Support providing shared memory to backend elohimes
  2019-01-03 17:02   ` Michael S. Tsirkin
@ 2019-01-03 17:13   ` Michael S. Tsirkin
  2019-01-04  3:20     ` Yongji Xie
  1 sibling, 1 reply; 14+ messages in thread
From: Michael S. Tsirkin @ 2019-01-03 17:13 UTC (permalink / raw)
  To: elohimes
  Cc: marcandre.lureau, berrange, jasowang, maxime.coquelin, yury-kotov,
	wrfsh, qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji

On Thu, Jan 03, 2019 at 06:18:14PM +0800, elohimes@gmail.com wrote:
> From: Xie Yongji <xieyongji@baidu.com>
> 
> This patch introduces two new messages VHOST_USER_GET_SHM_SIZE
> and VHOST_USER_SET_SHM_FD to support providing shared
> memory to backend.
> 
> Firstly, qemu uses VHOST_USER_GET_SHM_SIZE to get the
> required size of shared memory from backend. Then, qemu
> allocates memory and sends them back to backend through
> VHOST_USER_SET_SHM_FD.

So this does create a security concern that the remote
can supply a very big area.
How about returning a buffer from client to qemu?
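
Independent of that, a minimal sketch of the kind of bound the master
could enforce before trusting the returned sizes -- the limit and the
function are assumptions for this sketch, not part of the series, and
struct vhost_shm is the one from the patch above:

    /* Refuse implausibly large or bogus requests before allocating. */
    #define VHOST_SHM_MAX_REGION_SIZE (64 * 1024) /* invented bound */

    static int vhost_shm_validate_sizes(const struct vhost_shm *shm)
    {
        if (shm->dev_size > VHOST_SHM_MAX_REGION_SIZE ||
            shm->vq_size > VHOST_SHM_MAX_REGION_SIZE ||
            !shm->align || (shm->align & (shm->align - 1))) {
            return -EINVAL; /* oversized, zero, or non-power-of-2 align */
        }
        return 0;
    }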


> [...]


* Re: [Qemu-devel] [PATCH v3 for-4.0 2/7] vhost-user: Support providing shared memory to backend
  2019-01-03 17:02   ` Michael S. Tsirkin
@ 2019-01-04  2:31     ` Yongji Xie
  2019-01-04  2:41       ` Michael S. Tsirkin
  0 siblings, 1 reply; 14+ messages in thread
From: Yongji Xie @ 2019-01-04  2:31 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Marc-André Lureau, Daniel P. Berrangé, Jason Wang,
	Coquelin, Maxime, Yury Kotov,
	Евгений Яковлев,
	qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji

On Fri, 4 Jan 2019 at 01:02, Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Thu, Jan 03, 2019 at 06:18:14PM +0800, elohimes@gmail.com wrote:
> > From: Xie Yongji <xieyongji@baidu.com>
> >
> > This patch introduces two new messages VHOST_USER_GET_SHM_SIZE
> > and VHOST_USER_SET_SHM_FD to support providing shared
> > memory to backend.
>
> So this seems a bit vague. Since we are going to use it
> for tracking in-flight I/O, I would prefer that we
> actually call it that.
>
>

So how about VHOST_USER_GET_INFLIGHT_SIZE and VHOST_USER_SET_INFLIGHT_FD?

> >
> > Firstly, qemu uses VHOST_USER_GET_SHM_SIZE to get the
> > required size of shared memory from backend. Then, qemu
> > allocates memory and sends them
>
> s/them/it/ ?
>

Will fix it in v4.

> > back to backend through
> > VHOST_USER_SET_SHM_FD.
> >
> > Note that the shared memory should be used by the backend
> > to record inflight I/O. QEMU will clear it on VM reset.
> >
> > [...]
> >
> > diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
> > index c2194711d9..5ee9c28ab0 100644
> > --- a/docs/interop/vhost-user.txt
> > +++ b/docs/interop/vhost-user.txt
> > @@ -142,6 +142,19 @@ Depending on the request type, payload can be:
> >     Offset: a 64-bit offset of this area from the start of the
> >         supplied file descriptor
> >
> > + * Shm description
> > +   -----------------------------------
> > +   | mmap_size | mmap_offset | dev_size | vq_size | align | version |
> > +   -----------------------------------
> > +
> > +   Mmap_size: a 64-bit size of the shared memory
> > +   Mmap_offset: a 64-bit offset of the shared memory from the start
> > +                of the supplied file descriptor
> > +   Dev_size: a 32-bit size of device region in shared memory
> > +   Vq_size: a 32-bit size of each virtqueue region in shared memory
> > +   Align: a 32-bit align of each region in shared memory
> > +   Version: a 32-bit version of this shared memory
> > +
>
> This is an informal description so please avoid _ in field
> names, just put a space in there. See e.g. log description.
>
>
Got it!

> >  In QEMU the vhost-user message is implemented with the following struct:
> >
> >  typedef struct VhostUserMsg {
>
>
> For things to work, in-flight format must not change when
> backend reconnects.
>

I'm not sure whether there will be cases where we want to add fields to
the inflight area without stopping the VM.

> To encourage consistency, how about including a recommended format for
> this buffer in this document?
>
>

Sure. Will add it in v4.

Thanks,
Yongji


* Re: [Qemu-devel] [PATCH v3 for-4.0 2/7] vhost-user: Support providing shared memory to backend
  2019-01-04  2:31     ` Yongji Xie
@ 2019-01-04  2:41       ` Michael S. Tsirkin
  2019-01-04  3:16         ` Yongji Xie
  0 siblings, 1 reply; 14+ messages in thread
From: Michael S. Tsirkin @ 2019-01-04  2:41 UTC (permalink / raw)
  To: Yongji Xie
  Cc: Marc-André Lureau, Daniel P. Berrangé, Jason Wang,
	Coquelin, Maxime, Yury Kotov,
	Евгений Яковлев,
	qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji

On Fri, Jan 04, 2019 at 10:31:34AM +0800, Yongji Xie wrote:
> On Fri, 4 Jan 2019 at 01:02, Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Thu, Jan 03, 2019 at 06:18:14PM +0800, elohimes@gmail.com wrote:
> > > From: Xie Yongji <xieyongji@baidu.com>
> > >
> > > This patch introduces two new messages VHOST_USER_GET_SHM_SIZE
> > > and VHOST_USER_SET_SHM_FD to support providing shared
> > > memory to backend.
> >
> > So this seems a bit vague. Since we are going to use it
> > for tracking in-flight I/O, I would prefer that we
> > actually call it that.
> >
> >
> 
> So how about VHOST_USER_GET_INFLIGHT_SIZE and VHOST_USER_SET_INFLIGHT_FD?

Sounds good.

> [...]
>
> > >  In QEMU the vhost-user message is implemented with the following struct:
> > >
> > >  typedef struct VhostUserMsg {
> >
> >
> > For things to work, in-flight format must not change when
> > backend reconnects.
> >
> 
> I'm not sure whether there will be cases where we want to add fields to
> the inflight area without stopping the VM.

Sorry, I'm not sure I understand this comment. All I am saying is that
when one backend disconnects and another reconnects they must agree on
the format, so it's a good idea to document it.
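
For instance, a reconnecting backend could simply refuse a region
whose recorded layout it does not recognize -- a sketch with invented
names, not code from the series:

    #include <stdint.h>

    #define EXPECTED_SHM_VERSION 1 /* hypothetical layout version */

    /* Backend-side check when handling VHOST_USER_SET_SHM_FD after a
     * reconnect: if a different backend build wrote this region, better
     * to reject it than to misread the recorded in-flight state. */
    static int vub_check_shm_layout(uint32_t version, uint32_t vq_size,
                                    uint32_t expected_vq_size)
    {
        if (version != EXPECTED_SHM_VERSION || vq_size != expected_vq_size) {
            return -1;
        }
        return 0;
    }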

> > To encourage consistency, how about including a recommended format for
> > this buffer in this document?
> >
> >
> 
> Sure. Will add it in v4.
> 
> Thanks,
> Yongji


* Re: [Qemu-devel] [PATCH v3 for-4.0 2/7] vhost-user: Support providing shared memory to backend
  2019-01-04  2:41       ` Michael S. Tsirkin
@ 2019-01-04  3:16         ` Yongji Xie
  0 siblings, 0 replies; 14+ messages in thread
From: Yongji Xie @ 2019-01-04  3:16 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Marc-André Lureau, Daniel P. Berrangé, Jason Wang,
	Coquelin, Maxime, Yury Kotov,
	Евгений Яковлев,
	qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji

On Fri, 4 Jan 2019 at 10:41, Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Fri, Jan 04, 2019 at 10:31:34AM +0800, Yongji Xie wrote:
> > On Fri, 4 Jan 2019 at 01:02, Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Thu, Jan 03, 2019 at 06:18:14PM +0800, elohimes@gmail.com wrote:
> [...]
>
> > > >  In QEMU the vhost-user message is implemented with the following struct:
> > > >
> > > >  typedef struct VhostUserMsg {
> > >
> > >
> > > For things to work, in-flight format must not change when
> > > backend reconnects.
> > >
> >
> > I'm not sure whether there will be cases where we want to add fields to
> > the inflight area without stopping the VM.
>
> Sorry, I'm not sure I understand this comment. All I am saying is that
> when one backend disconnects and another reconnects they must agree on
> the format, so it's a good idea to document it.
>

Oh, sorry. I may have misunderstood. I will document the format in v4.
Thank you.

Thanks,
Yongji


* Re: [Qemu-devel] [PATCH v3 for-4.0 2/7] vhost-user: Support providing shared memory to backend
  2019-01-03 17:13   ` Michael S. Tsirkin
@ 2019-01-04  3:20     ` Yongji Xie
  0 siblings, 0 replies; 14+ messages in thread
From: Yongji Xie @ 2019-01-04  3:20 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Marc-André Lureau, Daniel P. Berrangé, Jason Wang,
	Coquelin, Maxime, Yury Kotov,
	Евгений Яковлев,
	qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji

On Fri, 4 Jan 2019 at 01:13, Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Thu, Jan 03, 2019 at 06:18:14PM +0800, elohimes@gmail.com wrote:
> > From: Xie Yongji <xieyongji@baidu.com>
> >
> > This patch introduces two new messages VHOST_USER_GET_SHM_SIZE
> > and VHOST_USER_SET_SHM_FD to support providing shared
> > memory to backend.
> >
> > Firstly, qemu uses VHOST_USER_GET_SHM_SIZE to get the
> > required size of shared memory from backend. Then, qemu
> > allocates memory and sends them back to backend through
> > VHOST_USER_SET_SHM_FD.
>
> So this does create a security concern that the remote
> can supply a very big area.
> How about returning a buffer from client to qemu?
>

That's a good idea! Will do it in v4.

Thanks,
Yongji

