* [Qemu-devel] [PATCH v3 for-4.0 1/7] char-socket: Enable "nowait" option on client sockets
From: elohimes @ 2019-01-03 10:18 UTC
To: mst, marcandre.lureau, berrange, jasowang, maxime.coquelin,
yury-kotov, wrfsh
Cc: qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji
From: Xie Yongji <xieyongji@baidu.com>
Enable "nowait" option to make QEMU not do a connect
on client sockets during initialization of the chardev.
Then we can use qemu_chr_fe_wait_connected() to connect
when necessary. Now it would be used for unix domain
socket of vhost-user-blk device to support reconnect.
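As an illustration, a hypothetical command line relying on this (the
socket path and chardev id here are made up):

  # no connect during chardev init; retry every 1s after a disconnect
  -chardev socket,id=char0,path=/tmp/vhost-user-blk.sock,reconnect=1,nowait \
  -device vhost-user-blk-pci,chardev=char0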
Signed-off-by: Xie Yongji <xieyongji@baidu.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
---
chardev/char-socket.c | 56 +++++++++++++++++++++----------------------
qapi/char.json | 3 +--
qemu-options.hx | 9 ++++---
3 files changed, 35 insertions(+), 33 deletions(-)
diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index eaa8e8b68f..f803f4f7d3 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -1072,37 +1072,37 @@ static void qmp_chardev_open_socket(Chardev *chr,
s->reconnect_time = reconnect;
}
- if (s->reconnect_time) {
- tcp_chr_connect_async(chr);
- } else {
- if (s->is_listen) {
- char *name;
- s->listener = qio_net_listener_new();
+ if (s->is_listen) {
+ char *name;
+ s->listener = qio_net_listener_new();
- name = g_strdup_printf("chardev-tcp-listener-%s", chr->label);
- qio_net_listener_set_name(s->listener, name);
- g_free(name);
+ name = g_strdup_printf("chardev-tcp-listener-%s", chr->label);
+ qio_net_listener_set_name(s->listener, name);
+ g_free(name);
- if (qio_net_listener_open_sync(s->listener, s->addr, errp) < 0) {
- object_unref(OBJECT(s->listener));
- s->listener = NULL;
- goto error;
- }
+ if (qio_net_listener_open_sync(s->listener, s->addr, errp) < 0) {
+ object_unref(OBJECT(s->listener));
+ s->listener = NULL;
+ goto error;
+ }
- qapi_free_SocketAddress(s->addr);
- s->addr = socket_local_address(s->listener->sioc[0]->fd, errp);
- update_disconnected_filename(s);
+ qapi_free_SocketAddress(s->addr);
+ s->addr = socket_local_address(s->listener->sioc[0]->fd, errp);
+ update_disconnected_filename(s);
- if (is_waitconnect &&
- qemu_chr_wait_connected(chr, errp) < 0) {
- return;
- }
- if (!s->ioc) {
- qio_net_listener_set_client_func_full(s->listener,
- tcp_chr_accept,
- chr, NULL,
- chr->gcontext);
- }
+ if (is_waitconnect &&
+ qemu_chr_wait_connected(chr, errp) < 0) {
+ return;
+ }
+ if (!s->ioc) {
+ qio_net_listener_set_client_func_full(s->listener,
+ tcp_chr_accept,
+ chr, NULL,
+ chr->gcontext);
+ }
+ } else if (is_waitconnect) {
+ if (s->reconnect_time) {
+ tcp_chr_connect_async(chr);
} else if (qemu_chr_wait_connected(chr, errp) < 0) {
goto error;
}
@@ -1120,7 +1120,7 @@ static void qemu_chr_parse_socket(QemuOpts *opts, ChardevBackend *backend,
Error **errp)
{
bool is_listen = qemu_opt_get_bool(opts, "server", false);
- bool is_waitconnect = is_listen && qemu_opt_get_bool(opts, "wait", true);
+ bool is_waitconnect = qemu_opt_get_bool(opts, "wait", true);
bool is_telnet = qemu_opt_get_bool(opts, "telnet", false);
bool is_tn3270 = qemu_opt_get_bool(opts, "tn3270", false);
bool is_websock = qemu_opt_get_bool(opts, "websocket", false);
diff --git a/qapi/char.json b/qapi/char.json
index 77ed847972..6a3b5bcd71 100644
--- a/qapi/char.json
+++ b/qapi/char.json
@@ -249,8 +249,7 @@
# or connect to (server=false)
# @tls-creds: the ID of the TLS credentials object (since 2.6)
# @server: create server socket (default: true)
-# @wait: wait for incoming connection on server
-# sockets (default: false).
+# @wait: wait for an incoming connection, or connect to the remote, during initialization (default: false)
# @nodelay: set TCP_NODELAY socket option (default: false)
# @telnet: enable telnet protocol on server
# sockets (default: false)
diff --git a/qemu-options.hx b/qemu-options.hx
index df42116ecc..66d99c6e83 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2556,8 +2556,9 @@ undefined if TCP options are specified for a unix socket.
@option{server} specifies that the socket shall be a listening socket.
-@option{nowait} specifies that QEMU should not block waiting for a client to
-connect to a listening socket.
+@option{nowait} specifies that QEMU should not wait for a client to connect
+on server sockets, and should not initiate a connection on client sockets,
+during initialization of the chardev.
@option{telnet} specifies that traffic on the socket should interpret telnet
escape sequences.
@@ -3093,7 +3094,9 @@ I/O to a location or wait for a connection from a location. By default
the TCP Net Console is sent to @var{host} at the @var{port}. If you use
the @var{server} option QEMU will wait for a client socket application
to connect to the port before continuing, unless the @code{nowait}
-option was specified. The @code{nodelay} option disables the Nagle buffering
+option was specified. The @code{nowait} option can also be
+used when @var{noserver} is set, to prevent QEMU from connecting during
+initialization. The @code{nodelay} option disables the Nagle buffering
algorithm. The @code{reconnect} option only applies if @var{noserver} is
set, if the connection goes down it will attempt to reconnect at the
given interval. If @var{host} is omitted, 0.0.0.0 is assumed. Only
--
2.17.1
* [Qemu-devel] [PATCH v3 for-4.0 2/7] vhost-user: Support providing shared memory to backend
From: elohimes @ 2019-01-03 10:18 UTC
To: mst, marcandre.lureau, berrange, jasowang, maxime.coquelin,
yury-kotov, wrfsh
Cc: qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji
From: Xie Yongji <xieyongji@baidu.com>
This patch introduces two new messages VHOST_USER_GET_SHM_SIZE
and VHOST_USER_SET_SHM_FD to support providing shared
memory to backend.
Firstly, qemu uses VHOST_USER_GET_SHM_SIZE to get the
required size of shared memory from backend. Then, qemu
allocates memory and sends them back to backend through
VHOST_USER_SET_SHM_FD.
Note that the shared memory should be used to record
inflight I/O by backend. Qemu will clear it when vm reset.
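As a sketch of the layout this implies (not part of the patch; the
helper and macro below are invented for illustration), a backend could
locate the device region at offset 0 and each virtqueue region after it:

    #include <stdint.h>

    /* Mirrors the mmap_size computation in vhost_dev_init_shm():
     * one device region followed by one region per virtqueue,
     * each rounded up to "align" bytes. */
    #define SHM_ALIGN_UP(x, a) (((uint64_t)(x) + (a) - 1) / (a) * (a))

    static inline void *shm_vq_region(void *base, uint32_t dev_size,
                                      uint32_t vq_size, uint32_t align,
                                      unsigned int vq_idx)
    {
        uint64_t off = SHM_ALIGN_UP(dev_size, align) +
                       (uint64_t)vq_idx * SHM_ALIGN_UP(vq_size, align);
        return (char *)base + off;
    }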
Signed-off-by: Xie Yongji <xieyongji@baidu.com>
Signed-off-by: Chai Wen <chaiwen@baidu.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
---
docs/interop/vhost-user.txt | 41 +++++++++++
hw/virtio/vhost-user.c | 86 ++++++++++++++++++++++
hw/virtio/vhost.c | 117 ++++++++++++++++++++++++++++++
include/hw/virtio/vhost-backend.h | 9 +++
include/hw/virtio/vhost.h | 19 +++++
5 files changed, 272 insertions(+)
diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
index c2194711d9..5ee9c28ab0 100644
--- a/docs/interop/vhost-user.txt
+++ b/docs/interop/vhost-user.txt
@@ -142,6 +142,19 @@ Depending on the request type, payload can be:
Offset: a 64-bit offset of this area from the start of the
supplied file descriptor
+ * Shm description
+ -----------------------------------
+ | mmap_size | mmap_offset | dev_size | vq_size | align | version |
+ -----------------------------------
+
+ Mmap_size: a 64-bit size of the shared memory
+ Mmap_offset: a 64-bit offset of the shared memory from the start
+ of the supplied file descriptor
+ Dev_size: a 32-bit size of device region in shared memory
+ Vq_size: a 32-bit size of each virtqueue region in shared memory
+ Align: a 32-bit align of each region in shared memory
+ Version: a 32-bit version of this shared memory
+
In QEMU the vhost-user message is implemented with the following struct:
typedef struct VhostUserMsg {
@@ -157,6 +170,7 @@ typedef struct VhostUserMsg {
struct vhost_iotlb_msg iotlb;
VhostUserConfig config;
VhostUserVringArea area;
+ VhostUserShm shm;
};
} QEMU_PACKED VhostUserMsg;
@@ -175,6 +189,7 @@ the ones that do:
* VHOST_USER_GET_PROTOCOL_FEATURES
* VHOST_USER_GET_VRING_BASE
* VHOST_USER_SET_LOG_BASE (if VHOST_USER_PROTOCOL_F_LOG_SHMFD)
+ * VHOST_USER_GET_SHM_SIZE (if VHOST_USER_PROTOCOL_F_SLAVE_SHMFD)
[ Also see the section on REPLY_ACK protocol extension. ]
@@ -188,6 +203,7 @@ in the ancillary data:
* VHOST_USER_SET_VRING_CALL
* VHOST_USER_SET_VRING_ERR
* VHOST_USER_SET_SLAVE_REQ_FD
+ * VHOST_USER_SET_SHM_FD (if VHOST_USER_PROTOCOL_F_SLAVE_SHMFD)
If Master is unable to send the full message or receives a wrong reply it will
close the connection. An optional reconnection mechanism can be implemented.
@@ -397,6 +413,7 @@ Protocol features
#define VHOST_USER_PROTOCOL_F_CONFIG 9
#define VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD 10
#define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER 11
+#define VHOST_USER_PROTOCOL_F_SLAVE_SHMFD 12
Master message types
--------------------
@@ -761,6 +778,30 @@ Master message types
was previously sent.
The value returned is an error indication; 0 is success.
+ * VHOST_USER_GET_SHM_SIZE
+ Id: 31
+ Equivalent ioctl: N/A
+ Master payload: shm description
+
+ When the VHOST_USER_PROTOCOL_F_SLAVE_SHMFD protocol feature has been
+ successfully negotiated, the master needs to provide shared memory to
+ the slave. This message is used by the master to get the required size
+ from the slave. The shared memory contains one region for the device
+ and several regions for the virtqueues. The sizes of those regions are
+ specified by the dev_size field and the vq_size field. The align field
+ specifies the alignment of those regions.
+
+ * VHOST_USER_SET_SHM_FD
+ Id: 32
+ Equivalent ioctl: N/A
+ Master payload: shm description
+
+ When the VHOST_USER_PROTOCOL_F_SLAVE_SHMFD protocol feature has been
+ successfully negotiated, the master uses this message to set the shared
+ memory for the slave. The memory fd is passed in the ancillary data.
+ The shared memory should be used by the slave to record inflight I/O;
+ the master will clear it on VM reset.
+
Slave message types
-------------------
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index e09bed0e4a..8cdf3b5121 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -52,6 +52,7 @@ enum VhostUserProtocolFeature {
VHOST_USER_PROTOCOL_F_CONFIG = 9,
VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD = 10,
VHOST_USER_PROTOCOL_F_HOST_NOTIFIER = 11,
+ VHOST_USER_PROTOCOL_F_SLAVE_SHMFD = 12,
VHOST_USER_PROTOCOL_F_MAX
};
@@ -89,6 +90,8 @@ typedef enum VhostUserRequest {
VHOST_USER_POSTCOPY_ADVISE = 28,
VHOST_USER_POSTCOPY_LISTEN = 29,
VHOST_USER_POSTCOPY_END = 30,
+ VHOST_USER_GET_SHM_SIZE = 31,
+ VHOST_USER_SET_SHM_FD = 32,
VHOST_USER_MAX
} VhostUserRequest;
@@ -147,6 +150,15 @@ typedef struct VhostUserVringArea {
uint64_t offset;
} VhostUserVringArea;
+typedef struct VhostUserShm {
+ uint64_t mmap_size;
+ uint64_t mmap_offset;
+ uint32_t dev_size;
+ uint32_t vq_size;
+ uint32_t align;
+ uint32_t version;
+} VhostUserShm;
+
typedef struct {
VhostUserRequest request;
@@ -169,6 +181,7 @@ typedef union {
VhostUserConfig config;
VhostUserCryptoSession session;
VhostUserVringArea area;
+ VhostUserShm shm;
} VhostUserPayload;
typedef struct VhostUserMsg {
@@ -1739,6 +1752,77 @@ static bool vhost_user_mem_section_filter(struct vhost_dev *dev,
return result;
}
+static int vhost_user_get_shm_size(struct vhost_dev *dev,
+ struct vhost_shm *shm)
+{
+ VhostUserMsg msg = {
+ .hdr.request = VHOST_USER_GET_SHM_SIZE,
+ .hdr.flags = VHOST_USER_VERSION,
+ .hdr.size = sizeof(msg.payload.shm),
+ };
+
+ if (!virtio_has_feature(dev->protocol_features,
+ VHOST_USER_PROTOCOL_F_SLAVE_SHMFD)) {
+ shm->dev_size = 0;
+ shm->vq_size = 0;
+ return 0;
+ }
+
+ if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
+ return -1;
+ }
+
+ if (vhost_user_read(dev, &msg) < 0) {
+ return -1;
+ }
+
+ if (msg.hdr.request != VHOST_USER_GET_SHM_SIZE) {
+ error_report("Received unexpected msg type. "
+ "Expected %d received %d",
+ VHOST_USER_GET_SHM_SIZE, msg.hdr.request);
+ return -1;
+ }
+
+ if (msg.hdr.size != sizeof(msg.payload.shm)) {
+ error_report("Received bad msg size.");
+ return -1;
+ }
+
+ shm->dev_size = msg.payload.shm.dev_size;
+ shm->vq_size = msg.payload.shm.vq_size;
+ shm->align = msg.payload.shm.align;
+ shm->version = msg.payload.shm.version;
+
+ return 0;
+}
+
+static int vhost_user_set_shm_fd(struct vhost_dev *dev,
+ struct vhost_shm *shm)
+{
+ VhostUserMsg msg = {
+ .hdr.request = VHOST_USER_SET_SHM_FD,
+ .hdr.flags = VHOST_USER_VERSION,
+ .payload.shm.mmap_size = shm->mmap_size,
+ .payload.shm.mmap_offset = 0,
+ .payload.shm.dev_size = shm->dev_size,
+ .payload.shm.vq_size = shm->vq_size,
+ .payload.shm.align = shm->align,
+ .payload.shm.version = shm->version,
+ .hdr.size = sizeof(msg.payload.shm),
+ };
+
+ if (!virtio_has_feature(dev->protocol_features,
+ VHOST_USER_PROTOCOL_F_SLAVE_SHMFD)) {
+ return 0;
+ }
+
+ if (vhost_user_write(dev, &msg, &shm->fd, 1) < 0) {
+ return -1;
+ }
+
+ return 0;
+}
+
VhostUserState *vhost_user_init(void)
{
VhostUserState *user = g_new0(struct VhostUserState, 1);
@@ -1790,4 +1874,6 @@ const VhostOps user_ops = {
.vhost_crypto_create_session = vhost_user_crypto_create_session,
.vhost_crypto_close_session = vhost_user_crypto_close_session,
.vhost_backend_mem_section_filter = vhost_user_mem_section_filter,
+ .vhost_get_shm_size = vhost_user_get_shm_size,
+ .vhost_set_shm_fd = vhost_user_set_shm_fd,
};
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 569c4053ea..7a38fed50f 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1481,6 +1481,123 @@ void vhost_dev_set_config_notifier(struct vhost_dev *hdev,
hdev->config_ops = ops;
}
+void vhost_dev_reset_shm(struct vhost_shm *shm)
+{
+ if (shm->addr) {
+ memset(shm->addr, 0, shm->mmap_size);
+ }
+}
+
+void vhost_dev_free_shm(struct vhost_shm *shm)
+{
+ if (shm->addr) {
+ qemu_memfd_free(shm->addr, shm->mmap_size, shm->fd);
+ shm->addr = NULL;
+ shm->fd = -1;
+ }
+}
+
+int vhost_dev_alloc_shm(struct vhost_shm *shm)
+{
+ Error *err = NULL;
+ int fd = -1;
+ void *addr = qemu_memfd_alloc("vhost-shm", shm->mmap_size,
+ F_SEAL_GROW | F_SEAL_SHRINK | F_SEAL_SEAL,
+ &fd, &err);
+ if (err) {
+ error_report_err(err);
+ return -1;
+ }
+
+ shm->addr = addr;
+ shm->fd = fd;
+
+ return 0;
+}
+
+void vhost_dev_save_shm(struct vhost_shm *shm, QEMUFile *f)
+{
+ if (shm->addr) {
+ qemu_put_be64(f, shm->mmap_size);
+ qemu_put_be32(f, shm->dev_size);
+ qemu_put_be32(f, shm->vq_size);
+ qemu_put_be32(f, shm->align);
+ qemu_put_be32(f, shm->version);
+ qemu_put_buffer(f, shm->addr, shm->mmap_size);
+ } else {
+ qemu_put_be64(f, 0);
+ }
+}
+
+int vhost_dev_load_shm(struct vhost_shm *shm, QEMUFile *f)
+{
+ uint64_t mmap_size;
+
+ mmap_size = qemu_get_be64(f);
+ if (!mmap_size) {
+ return 0;
+ }
+
+ vhost_dev_free_shm(shm);
+
+ shm->mmap_size = mmap_size;
+ shm->dev_size = qemu_get_be32(f);
+ shm->vq_size = qemu_get_be32(f);
+ shm->align = qemu_get_be32(f);
+ shm->version = qemu_get_be32(f);
+
+ if (vhost_dev_alloc_shm(shm)) {
+ return -ENOMEM;
+ }
+
+ qemu_get_buffer(f, shm->addr, mmap_size);
+
+ return 0;
+}
+
+int vhost_dev_set_shm(struct vhost_dev *dev, struct vhost_shm *shm)
+{
+ int r;
+
+ if (dev->vhost_ops->vhost_set_shm_fd && shm->addr) {
+ r = dev->vhost_ops->vhost_set_shm_fd(dev, shm);
+ if (r) {
+ VHOST_OPS_DEBUG("vhost_set_vring_shm_fd failed");
+ return -errno;
+ }
+ }
+
+ return 0;
+}
+
+int vhost_dev_init_shm(struct vhost_dev *dev, struct vhost_shm *shm)
+{
+ int r;
+
+ if (dev->vhost_ops->vhost_get_shm_size) {
+ r = dev->vhost_ops->vhost_get_shm_size(dev, shm);
+ if (r) {
+ VHOST_OPS_DEBUG("vhost_get_vring_shm_size failed");
+ return -errno;
+ }
+
+ if (!shm->dev_size && !shm->vq_size) {
+ return 0;
+ }
+
+ shm->mmap_size = QEMU_ALIGN_UP(shm->dev_size, shm->align) +
+ dev->nvqs * QEMU_ALIGN_UP(shm->vq_size, shm->align);
+
+ if (vhost_dev_alloc_shm(shm)) {
+ return -ENOMEM;
+ }
+
+ vhost_dev_reset_shm(shm);
+ }
+
+ return 0;
+}
+
/* Host notifiers must be enabled at this point. */
int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
{
diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
index 81283ec50f..4e7f13c9e9 100644
--- a/include/hw/virtio/vhost-backend.h
+++ b/include/hw/virtio/vhost-backend.h
@@ -25,6 +25,7 @@ typedef enum VhostSetConfigType {
VHOST_SET_CONFIG_TYPE_MIGRATION = 1,
} VhostSetConfigType;
+struct vhost_shm;
struct vhost_dev;
struct vhost_log;
struct vhost_memory;
@@ -104,6 +105,12 @@ typedef int (*vhost_crypto_close_session_op)(struct vhost_dev *dev,
typedef bool (*vhost_backend_mem_section_filter_op)(struct vhost_dev *dev,
MemoryRegionSection *section);
+typedef int (*vhost_get_shm_size_op)(struct vhost_dev *dev,
+ struct vhost_shm *shm);
+
+typedef int (*vhost_set_shm_fd_op)(struct vhost_dev *dev,
+ struct vhost_shm *shm);
+
typedef struct VhostOps {
VhostBackendType backend_type;
vhost_backend_init vhost_backend_init;
@@ -142,6 +149,8 @@ typedef struct VhostOps {
vhost_crypto_create_session_op vhost_crypto_create_session;
vhost_crypto_close_session_op vhost_crypto_close_session;
vhost_backend_mem_section_filter_op vhost_backend_mem_section_filter;
+ vhost_get_shm_size_op vhost_get_shm_size;
+ vhost_set_shm_fd_op vhost_set_shm_fd;
} VhostOps;
extern const VhostOps user_ops;
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index a7f449fa87..b6e3d6ab56 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -7,6 +7,17 @@
#include "exec/memory.h"
/* Generic structures common for any vhost based device. */
+
+struct vhost_shm {
+ void *addr;
+ uint64_t mmap_size;
+ uint32_t dev_size;
+ uint32_t vq_size;
+ uint32_t align;
+ uint32_t version;
+ int fd;
+};
+
struct vhost_virtqueue {
int kick;
int call;
@@ -120,4 +131,12 @@ int vhost_dev_set_config(struct vhost_dev *dev, const uint8_t *data,
*/
void vhost_dev_set_config_notifier(struct vhost_dev *dev,
const VhostDevConfigOps *ops);
+
+void vhost_dev_reset_shm(struct vhost_shm *shm);
+void vhost_dev_free_shm(struct vhost_shm *shm);
+int vhost_dev_alloc_shm(struct vhost_shm *shm);
+void vhost_dev_save_shm(struct vhost_shm *shm, QEMUFile *f);
+int vhost_dev_load_shm(struct vhost_shm *shm, QEMUFile *f);
+int vhost_dev_set_shm(struct vhost_dev *dev, struct vhost_shm *shm);
+int vhost_dev_init_shm(struct vhost_dev *dev, struct vhost_shm *shm);
#endif
--
2.17.1
* Re: [Qemu-devel] [PATCH v3 for-4.0 2/7] vhost-user: Support providing shared memory to backend
From: Michael S. Tsirkin @ 2019-01-03 17:02 UTC
To: elohimes
Cc: marcandre.lureau, berrange, jasowang, maxime.coquelin, yury-kotov,
wrfsh, qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji
On Thu, Jan 03, 2019 at 06:18:14PM +0800, elohimes@gmail.com wrote:
> From: Xie Yongji <xieyongji@baidu.com>
>
> This patch introduces two new messages VHOST_USER_GET_SHM_SIZE
> and VHOST_USER_SET_SHM_FD to support providing shared
> memory to backend.
So this seems a bit vague. Since we are going to use it
for tracking in-flight I/O, I would prefer that we
actually call it that.
>
> Firstly, qemu uses VHOST_USER_GET_SHM_SIZE to get the
> required size of shared memory from backend. Then, qemu
> allocates memory and sends them
s/them/it/ ?
> back to backend through
> VHOST_USER_SET_SHM_FD.
>
> Note that the shared memory should be used to record
> inflight I/O by backend. Qemu will clear it when vm reset.
>
> Signed-off-by: Xie Yongji <xieyongji@baidu.com>
> Signed-off-by: Chai Wen <chaiwen@baidu.com>
> Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
> ---
> docs/interop/vhost-user.txt | 41 +++++++++++
> hw/virtio/vhost-user.c | 86 ++++++++++++++++++++++
> hw/virtio/vhost.c | 117 ++++++++++++++++++++++++++++++
> include/hw/virtio/vhost-backend.h | 9 +++
> include/hw/virtio/vhost.h | 19 +++++
> 5 files changed, 272 insertions(+)
>
> diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
> index c2194711d9..5ee9c28ab0 100644
> --- a/docs/interop/vhost-user.txt
> +++ b/docs/interop/vhost-user.txt
> @@ -142,6 +142,19 @@ Depending on the request type, payload can be:
> Offset: a 64-bit offset of this area from the start of the
> supplied file descriptor
>
> + * Shm description
> + -----------------------------------
> + | mmap_size | mmap_offset | dev_size | vq_size | align | version |
> + -----------------------------------
> +
> + Mmap_size: a 64-bit size of the shared memory
> + Mmap_offset: a 64-bit offset of the shared memory from the start
> + of the supplied file descriptor
> + Dev_size: a 32-bit size of device region in shared memory
> + Vq_size: a 32-bit size of each virtqueue region in shared memory
> + Align: a 32-bit align of each region in shared memory
> + Version: a 32-bit version of this shared memory
> +
This is an informal description so please avoid _ in field
names, just put a space in there. See e.g. log description.
> In QEMU the vhost-user message is implemented with the following struct:
>
> typedef struct VhostUserMsg {
For things to work, in-flight format must not change when
backend reconnects.
To encourage consistency, how about including a recommended format for
this buffer in this document?
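For example (purely illustrative; the record layout and names below are
invented here, not from the spec), each virtqueue region could be an
array of fixed-size per-descriptor records:

    /* one record per descriptor, indexed by descriptor head */
    typedef struct InflightDesc {
        uint8_t inflight;   /* set before a request is started,
                             * cleared once it completes */
        uint8_t padding[7]; /* keep records 8-byte aligned */
    } InflightDesc;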
> @@ -157,6 +170,7 @@ typedef struct VhostUserMsg {
> struct vhost_iotlb_msg iotlb;
> VhostUserConfig config;
> VhostUserVringArea area;
> + VhostUserShm shm;
> };
> } QEMU_PACKED VhostUserMsg;
>
> @@ -175,6 +189,7 @@ the ones that do:
> * VHOST_USER_GET_PROTOCOL_FEATURES
> * VHOST_USER_GET_VRING_BASE
> * VHOST_USER_SET_LOG_BASE (if VHOST_USER_PROTOCOL_F_LOG_SHMFD)
> + * VHOST_USER_GET_SHM_SIZE (if VHOST_USER_PROTOCOL_F_SLAVE_SHMFD)
>
> [ Also see the section on REPLY_ACK protocol extension. ]
>
> @@ -188,6 +203,7 @@ in the ancillary data:
> * VHOST_USER_SET_VRING_CALL
> * VHOST_USER_SET_VRING_ERR
> * VHOST_USER_SET_SLAVE_REQ_FD
> + * VHOST_USER_SET_SHM_FD (if VHOST_USER_PROTOCOL_F_SLAVE_SHMFD)
>
> If Master is unable to send the full message or receives a wrong reply it will
> close the connection. An optional reconnection mechanism can be implemented.
> @@ -397,6 +413,7 @@ Protocol features
> #define VHOST_USER_PROTOCOL_F_CONFIG 9
> #define VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD 10
> #define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER 11
> +#define VHOST_USER_PROTOCOL_F_SLAVE_SHMFD 12
>
> Master message types
> --------------------
> @@ -761,6 +778,30 @@ Master message types
> was previously sent.
> The value returned is an error indication; 0 is success.
>
> + * VHOST_USER_GET_SHM_SIZE
> + Id: 31
> + Equivalent ioctl: N/A
> + Master payload: shm description
> +
> + When the VHOST_USER_PROTOCOL_F_SLAVE_SHMFD protocol feature has been
> + successfully negotiated, the master needs to provide shared memory to
> + the slave. This message is used by the master to get the required size
> + from the slave. The shared memory contains one region for the device
> + and several regions for the virtqueues. The sizes of those regions are
> + specified by the dev_size field and the vq_size field. The align field
> + specifies the alignment of those regions.
> +
> + * VHOST_USER_SET_SHM_FD
> + Id: 32
> + Equivalent ioctl: N/A
> + Master payload: shm description
> +
> + When the VHOST_USER_PROTOCOL_F_SLAVE_SHMFD protocol feature has been
> + successfully negotiated, the master uses this message to set the shared
> + memory for the slave. The memory fd is passed in the ancillary data.
> + The shared memory should be used by the slave to record inflight I/O;
> + the master will clear it on VM reset.
> +
> Slave message types
> -------------------
>
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index e09bed0e4a..8cdf3b5121 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -52,6 +52,7 @@ enum VhostUserProtocolFeature {
> VHOST_USER_PROTOCOL_F_CONFIG = 9,
> VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD = 10,
> VHOST_USER_PROTOCOL_F_HOST_NOTIFIER = 11,
> + VHOST_USER_PROTOCOL_F_SLAVE_SHMFD = 12,
> VHOST_USER_PROTOCOL_F_MAX
> };
>
> @@ -89,6 +90,8 @@ typedef enum VhostUserRequest {
> VHOST_USER_POSTCOPY_ADVISE = 28,
> VHOST_USER_POSTCOPY_LISTEN = 29,
> VHOST_USER_POSTCOPY_END = 30,
> + VHOST_USER_GET_SHM_SIZE = 31,
> + VHOST_USER_SET_SHM_FD = 32,
> VHOST_USER_MAX
> } VhostUserRequest;
>
> @@ -147,6 +150,15 @@ typedef struct VhostUserVringArea {
> uint64_t offset;
> } VhostUserVringArea;
>
> +typedef struct VhostUserShm {
> + uint64_t mmap_size;
> + uint64_t mmap_offset;
> + uint32_t dev_size;
> + uint32_t vq_size;
> + uint32_t align;
> + uint32_t version;
> +} VhostUserShm;
> +
> typedef struct {
> VhostUserRequest request;
>
> @@ -169,6 +181,7 @@ typedef union {
> VhostUserConfig config;
> VhostUserCryptoSession session;
> VhostUserVringArea area;
> + VhostUserShm shm;
> } VhostUserPayload;
>
> typedef struct VhostUserMsg {
> @@ -1739,6 +1752,77 @@ static bool vhost_user_mem_section_filter(struct vhost_dev *dev,
> return result;
> }
>
> +static int vhost_user_get_shm_size(struct vhost_dev *dev,
> + struct vhost_shm *shm)
> +{
> + VhostUserMsg msg = {
> + .hdr.request = VHOST_USER_GET_SHM_SIZE,
> + .hdr.flags = VHOST_USER_VERSION,
> + .hdr.size = sizeof(msg.payload.shm),
> + };
> +
> + if (!virtio_has_feature(dev->protocol_features,
> + VHOST_USER_PROTOCOL_F_SLAVE_SHMFD)) {
> + shm->dev_size = 0;
> + shm->vq_size = 0;
> + return 0;
> + }
> +
> + if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
> + return -1;
> + }
> +
> + if (vhost_user_read(dev, &msg) < 0) {
> + return -1;
> + }
> +
> + if (msg.hdr.request != VHOST_USER_GET_SHM_SIZE) {
> + error_report("Received unexpected msg type. "
> + "Expected %d received %d",
> + VHOST_USER_GET_SHM_SIZE, msg.hdr.request);
> + return -1;
> + }
> +
> + if (msg.hdr.size != sizeof(msg.payload.shm)) {
> + error_report("Received bad msg size.");
> + return -1;
> + }
> +
> + shm->dev_size = msg.payload.shm.dev_size;
> + shm->vq_size = msg.payload.shm.vq_size;
> + shm->align = msg.payload.shm.align;
> + shm->version = msg.payload.shm.version;
> +
> + return 0;
> +}
> +
> +static int vhost_user_set_shm_fd(struct vhost_dev *dev,
> + struct vhost_shm *shm)
> +{
> + VhostUserMsg msg = {
> + .hdr.request = VHOST_USER_SET_SHM_FD,
> + .hdr.flags = VHOST_USER_VERSION,
> + .payload.shm.mmap_size = shm->mmap_size,
> + .payload.shm.mmap_offset = 0,
> + .payload.shm.dev_size = shm->dev_size,
> + .payload.shm.vq_size = shm->vq_size,
> + .payload.shm.align = shm->align,
> + .payload.shm.version = shm->version,
> + .hdr.size = sizeof(msg.payload.shm),
> + };
> +
> + if (!virtio_has_feature(dev->protocol_features,
> + VHOST_USER_PROTOCOL_F_SLAVE_SHMFD)) {
> + return 0;
> + }
> +
> + if (vhost_user_write(dev, &msg, &shm->fd, 1) < 0) {
> + return -1;
> + }
> +
> + return 0;
> +}
> +
> VhostUserState *vhost_user_init(void)
> {
> VhostUserState *user = g_new0(struct VhostUserState, 1);
> @@ -1790,4 +1874,6 @@ const VhostOps user_ops = {
> .vhost_crypto_create_session = vhost_user_crypto_create_session,
> .vhost_crypto_close_session = vhost_user_crypto_close_session,
> .vhost_backend_mem_section_filter = vhost_user_mem_section_filter,
> + .vhost_get_shm_size = vhost_user_get_shm_size,
> + .vhost_set_shm_fd = vhost_user_set_shm_fd,
> };
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index 569c4053ea..7a38fed50f 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -1481,6 +1481,123 @@ void vhost_dev_set_config_notifier(struct vhost_dev *hdev,
> hdev->config_ops = ops;
> }
>
> +void vhost_dev_reset_shm(struct vhost_shm *shm)
> +{
> + if (shm->addr) {
> + memset(shm->addr, 0, shm->mmap_size);
> + }
> +}
> +
> +void vhost_dev_free_shm(struct vhost_shm *shm)
> +{
> + if (shm->addr) {
> + qemu_memfd_free(shm->addr, shm->mmap_size, shm->fd);
> + shm->addr = NULL;
> + shm->fd = -1;
> + }
> +}
> +
> +int vhost_dev_alloc_shm(struct vhost_shm *shm)
> +{
> + Error *err = NULL;
> + int fd = -1;
> + void *addr = qemu_memfd_alloc("vhost-shm", shm->mmap_size,
> + F_SEAL_GROW | F_SEAL_SHRINK | F_SEAL_SEAL,
> + &fd, &err);
> + if (err) {
> + error_report_err(err);
> + return -1;
> + }
> +
> + shm->addr = addr;
> + shm->fd = fd;
> +
> + return 0;
> +}
> +
> +void vhost_dev_save_shm(struct vhost_shm *shm, QEMUFile *f)
> +{
> + if (shm->addr) {
> + qemu_put_be64(f, shm->mmap_size);
> + qemu_put_be32(f, shm->dev_size);
> + qemu_put_be32(f, shm->vq_size);
> + qemu_put_be32(f, shm->align);
> + qemu_put_be32(f, shm->version);
> + qemu_put_buffer(f, shm->addr, shm->mmap_size);
> + } else {
> + qemu_put_be64(f, 0);
> + }
> +}
> +
> +int vhost_dev_load_shm(struct vhost_shm *shm, QEMUFile *f)
> +{
> + uint64_t mmap_size;
> +
> + mmap_size = qemu_get_be64(f);
> + if (!mmap_size) {
> + return 0;
> + }
> +
> + vhost_dev_free_shm(shm);
> +
> + shm->mmap_size = mmap_size;
> + shm->dev_size = qemu_get_be32(f);
> + shm->vq_size = qemu_get_be32(f);
> + shm->align = qemu_get_be32(f);
> + shm->version = qemu_get_be32(f);
> +
> + if (vhost_dev_alloc_shm(shm)) {
> + return -ENOMEM;
> + }
> +
> + qemu_get_buffer(f, shm->addr, mmap_size);
> +
> + return 0;
> +}
> +
> +int vhost_dev_set_shm(struct vhost_dev *dev, struct vhost_shm *shm)
> +{
> + int r;
> +
> + if (dev->vhost_ops->vhost_set_shm_fd && shm->addr) {
> + r = dev->vhost_ops->vhost_set_shm_fd(dev, shm);
> + if (r) {
> + VHOST_OPS_DEBUG("vhost_set_vring_shm_fd failed");
> + return -errno;
> + }
> + }
> +
> + return 0;
> +}
> +
> +int vhost_dev_init_shm(struct vhost_dev *dev, struct vhost_shm *shm)
> +{
> + int r;
> +
> + if (dev->vhost_ops->vhost_get_shm_size) {
> + r = dev->vhost_ops->vhost_get_shm_size(dev, shm);
> + if (r) {
> + VHOST_OPS_DEBUG("vhost_get_vring_shm_size failed");
> + return -errno;
> + }
> +
> + if (!shm->dev_size && !shm->vq_size) {
> + return 0;
> + }
> +
> + shm->mmap_size = QEMU_ALIGN_UP(shm->dev_size, shm->align) +
> + dev->nvqs * QEMU_ALIGN_UP(shm->vq_size, shm->align);
> +
> + if (vhost_dev_alloc_shm(shm)) {
> + return -ENOMEM;
> + }
> +
> + vhost_dev_reset_shm(shm);
> + }
> +
> + return 0;
> +}
> +
> /* Host notifiers must be enabled at this point. */
> int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
> {
> diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
> index 81283ec50f..4e7f13c9e9 100644
> --- a/include/hw/virtio/vhost-backend.h
> +++ b/include/hw/virtio/vhost-backend.h
> @@ -25,6 +25,7 @@ typedef enum VhostSetConfigType {
> VHOST_SET_CONFIG_TYPE_MIGRATION = 1,
> } VhostSetConfigType;
>
> +struct vhost_shm;
> struct vhost_dev;
> struct vhost_log;
> struct vhost_memory;
> @@ -104,6 +105,12 @@ typedef int (*vhost_crypto_close_session_op)(struct vhost_dev *dev,
> typedef bool (*vhost_backend_mem_section_filter_op)(struct vhost_dev *dev,
> MemoryRegionSection *section);
>
> +typedef int (*vhost_get_shm_size_op)(struct vhost_dev *dev,
> + struct vhost_shm *shm);
> +
> +typedef int (*vhost_set_shm_fd_op)(struct vhost_dev *dev,
> + struct vhost_shm *shm);
> +
> typedef struct VhostOps {
> VhostBackendType backend_type;
> vhost_backend_init vhost_backend_init;
> @@ -142,6 +149,8 @@ typedef struct VhostOps {
> vhost_crypto_create_session_op vhost_crypto_create_session;
> vhost_crypto_close_session_op vhost_crypto_close_session;
> vhost_backend_mem_section_filter_op vhost_backend_mem_section_filter;
> + vhost_get_shm_size_op vhost_get_shm_size;
> + vhost_set_shm_fd_op vhost_set_shm_fd;
> } VhostOps;
>
> extern const VhostOps user_ops;
> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> index a7f449fa87..b6e3d6ab56 100644
> --- a/include/hw/virtio/vhost.h
> +++ b/include/hw/virtio/vhost.h
> @@ -7,6 +7,17 @@
> #include "exec/memory.h"
>
> /* Generic structures common for any vhost based device. */
> +
> +struct vhost_shm {
> + void *addr;
> + uint64_t mmap_size;
> + uint32_t dev_size;
> + uint32_t vq_size;
> + uint32_t align;
> + uint32_t version;
> + int fd;
> +};
> +
> struct vhost_virtqueue {
> int kick;
> int call;
> @@ -120,4 +131,12 @@ int vhost_dev_set_config(struct vhost_dev *dev, const uint8_t *data,
> */
> void vhost_dev_set_config_notifier(struct vhost_dev *dev,
> const VhostDevConfigOps *ops);
> +
> +void vhost_dev_reset_shm(struct vhost_shm *shm);
> +void vhost_dev_free_shm(struct vhost_shm *shm);
> +int vhost_dev_alloc_shm(struct vhost_shm *shm);
> +void vhost_dev_save_shm(struct vhost_shm *shm, QEMUFile *f);
> +int vhost_dev_load_shm(struct vhost_shm *shm, QEMUFile *f);
> +int vhost_dev_set_shm(struct vhost_dev *dev, struct vhost_shm *shm);
> +int vhost_dev_init_shm(struct vhost_dev *dev, struct vhost_shm *shm);
> #endif
> --
> 2.17.1
* Re: [Qemu-devel] [PATCH v3 for-4.0 2/7] vhost-user: Support providing shared memory to backend
From: Yongji Xie @ 2019-01-04 2:31 UTC
To: Michael S. Tsirkin
Cc: Marc-André Lureau, Daniel P. Berrangé, Jason Wang,
Coquelin, Maxime, Yury Kotov,
Евгений Яковлев,
qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji
On Fri, 4 Jan 2019 at 01:02, Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Thu, Jan 03, 2019 at 06:18:14PM +0800, elohimes@gmail.com wrote:
> > From: Xie Yongji <xieyongji@baidu.com>
> >
> > This patch introduces two new messages VHOST_USER_GET_SHM_SIZE
> > and VHOST_USER_SET_SHM_FD to support providing shared
> > memory to backend.
>
> So this seems a bit vague. Since we are going to use it
> for tracking in-flight I/O, I would prefer that we
> actually call it that.
>
>
So how about VHOST_USER_GET_INFLIGHT_SIZE and VHOST_USER_SET_INFLIGHT_FD?
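I.e. (a sketch of the renamed constants, keeping the ids used in this
series; names as proposed above):

    VHOST_USER_GET_INFLIGHT_SIZE = 31,
    VHOST_USER_SET_INFLIGHT_FD = 32,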
> >
> > Firstly, qemu uses VHOST_USER_GET_SHM_SIZE to get the
> > required size of shared memory from backend. Then, qemu
> > allocates memory and sends them
>
> s/them/it/ ?
>
Will fix it in v4.
> > back to backend through
> > VHOST_USER_SET_SHM_FD.
> >
> > Note that the shared memory should be used to record
> > inflight I/O by backend. Qemu will clear it when vm reset.
> >
> > Signed-off-by: Xie Yongji <xieyongji@baidu.com>
> > Signed-off-by: Chai Wen <chaiwen@baidu.com>
> > Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
> > ---
> > docs/interop/vhost-user.txt | 41 +++++++++++
> > hw/virtio/vhost-user.c | 86 ++++++++++++++++++++++
> > hw/virtio/vhost.c | 117 ++++++++++++++++++++++++++++++
> > include/hw/virtio/vhost-backend.h | 9 +++
> > include/hw/virtio/vhost.h | 19 +++++
> > 5 files changed, 272 insertions(+)
> >
> > diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
> > index c2194711d9..5ee9c28ab0 100644
> > --- a/docs/interop/vhost-user.txt
> > +++ b/docs/interop/vhost-user.txt
> > @@ -142,6 +142,19 @@ Depending on the request type, payload can be:
> > Offset: a 64-bit offset of this area from the start of the
> > supplied file descriptor
> >
> > + * Shm description
> > + -----------------------------------
> > + | mmap_size | mmap_offset | dev_size | vq_size | align | version |
> > + -----------------------------------
> > +
> > + Mmap_size: a 64-bit size of the shared memory
> > + Mmap_offset: a 64-bit offset of the shared memory from the start
> > + of the supplied file descriptor
> > + Dev_size: a 32-bit size of device region in shared memory
> > + Vq_size: a 32-bit size of each virtqueue region in shared memory
> > + Align: a 32-bit align of each region in shared memory
> > + Version: a 32-bit version of this shared memory
> > +
>
> This is an informal description so please avoid _ in field
> names, just put a space in there. See e.g. log description.
>
>
Got it!
> > In QEMU the vhost-user message is implemented with the following struct:
> >
> > typedef struct VhostUserMsg {
>
>
> For things to work, in-flight format must not change when
> backend reconnects.
>
I'm not sure whether there will be cases where we want to add fields to
the inflight area without stopping the VM.
> To encourage consistency, how about including a recommended format for
> this buffer in this document?
>
>
Sure. Will add it in v4.
Thanks,
Yongji
* Re: [Qemu-devel] [PATCH v3 for-4.0 2/7] vhost-user: Support providing shared memory to backend
From: Michael S. Tsirkin @ 2019-01-04 2:41 UTC
To: Yongji Xie
Cc: Marc-André Lureau, Daniel P. Berrangé, Jason Wang,
Coquelin, Maxime, Yury Kotov,
Евгений Яковлев,
qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji
On Fri, Jan 04, 2019 at 10:31:34AM +0800, Yongji Xie wrote:
> On Fri, 4 Jan 2019 at 01:02, Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Thu, Jan 03, 2019 at 06:18:14PM +0800, elohimes@gmail.com wrote:
> > > From: Xie Yongji <xieyongji@baidu.com>
> > >
> > > This patch introduces two new messages VHOST_USER_GET_SHM_SIZE
> > > and VHOST_USER_SET_SHM_FD to support providing shared
> > > memory to backend.
> >
> > So this seems a bit vague. Since we are going to use it
> > for tracking in-flight I/O, I would prefer that we
> > actually call it that.
> >
> >
>
> So how about VHOST_USER_GET_INFLIGHT_SIZE and VHOST_USER_SET_INFLIGHT_FD?
Sounds good.
> > >
> > > Firstly, qemu uses VHOST_USER_GET_SHM_SIZE to get the
> > > required size of shared memory from backend. Then, qemu
> > > allocates memory and sends them
> >
> > s/them/it/ ?
> >
>
> Will fix it in v4.
>
> > > back to backend through
> > > VHOST_USER_SET_SHM_FD.
> > >
> > > Note that the shared memory should be used to record
> > > inflight I/O by backend. Qemu will clear it when vm reset.
> > >
> > > Signed-off-by: Xie Yongji <xieyongji@baidu.com>
> > > Signed-off-by: Chai Wen <chaiwen@baidu.com>
> > > Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
> > > ---
> > > docs/interop/vhost-user.txt | 41 +++++++++++
> > > hw/virtio/vhost-user.c | 86 ++++++++++++++++++++++
> > > hw/virtio/vhost.c | 117 ++++++++++++++++++++++++++++++
> > > include/hw/virtio/vhost-backend.h | 9 +++
> > > include/hw/virtio/vhost.h | 19 +++++
> > > 5 files changed, 272 insertions(+)
> > >
> > > diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
> > > index c2194711d9..5ee9c28ab0 100644
> > > --- a/docs/interop/vhost-user.txt
> > > +++ b/docs/interop/vhost-user.txt
> > > @@ -142,6 +142,19 @@ Depending on the request type, payload can be:
> > > Offset: a 64-bit offset of this area from the start of the
> > > supplied file descriptor
> > >
> > > + * Shm description
> > > + -----------------------------------
> > > + | mmap_size | mmap_offset | dev_size | vq_size | align | version |
> > > + -----------------------------------
> > > +
> > > + Mmap_size: a 64-bit size of the shared memory
> > > + Mmap_offset: a 64-bit offset of the shared memory from the start
> > > + of the supplied file descriptor
> > > + Dev_size: a 32-bit size of device region in shared memory
> > > + Vq_size: a 32-bit size of each virtqueue region in shared memory
> > > + Align: a 32-bit align of each region in shared memory
> > > + Version: a 32-bit version of this shared memory
> > > +
> >
> > This is an informal description so please avoid _ in field
> > names, just put a space in there. See e.g. log description.
> >
> >
> Got it!
>
> > > In QEMU the vhost-user message is implemented with the following struct:
> > >
> > > typedef struct VhostUserMsg {
> >
> >
> > For things to work, in-flight format must not change when
> > backend reconnects.
> >
>
> I'm not sure whether there will be some cases that we want to add some fields to
> the inflight area without stopping vm.
Sorry, I'm not sure I understand this comment. All I am saying is that
when one backend disconnects and another reconnects, they must agree on
the format, so it's a good idea to document it.
> > To encourage consistency, how about including a recommended format for
> > this buffer in this document?
> >
> >
>
> Sure. Will add it in v4.
>
> Thanks,
> Yongji
* Re: [Qemu-devel] [PATCH v3 for-4.0 2/7] vhost-user: Support providing shared memory to backend
From: Yongji Xie @ 2019-01-04 3:16 UTC
To: Michael S. Tsirkin
Cc: Marc-André Lureau, Daniel P. Berrangé, Jason Wang,
Coquelin, Maxime, Yury Kotov,
Евгений Яковлев,
qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji
On Fri, 4 Jan 2019 at 10:41, Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Fri, Jan 04, 2019 at 10:31:34AM +0800, Yongji Xie wrote:
> > On Fri, 4 Jan 2019 at 01:02, Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Thu, Jan 03, 2019 at 06:18:14PM +0800, elohimes@gmail.com wrote:
> > > > From: Xie Yongji <xieyongji@baidu.com>
> > > >
> > > > This patch introduces two new messages VHOST_USER_GET_SHM_SIZE
> > > > and VHOST_USER_SET_SHM_FD to support providing shared
> > > > memory to backend.
> > >
> > > So this seems a bit vague. Since we are going to use it
> > > for tracking in-flight I/O, I would prefer that we
> > > actually call it that.
> > >
> > >
> >
> > So how about VHOST_USER_GET_INFLIGHT_SIZE and VHOST_USER_SET_INFLIGHT_FD?
>
> Sounds good.
>
> > > >
> > > > Firstly, qemu uses VHOST_USER_GET_SHM_SIZE to get the
> > > > required size of shared memory from backend. Then, qemu
> > > > allocates memory and sends them
> > >
> > > s/them/it/ ?
> > >
> >
> > Will fix it in v4.
> >
> > > > back to backend through
> > > > VHOST_USER_SET_SHM_FD.
> > > >
> > > > Note that the shared memory should be used to record
> > > > inflight I/O by backend. Qemu will clear it when vm reset.
> > > >
> > > > Signed-off-by: Xie Yongji <xieyongji@baidu.com>
> > > > Signed-off-by: Chai Wen <chaiwen@baidu.com>
> > > > Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
> > > > ---
> > > > docs/interop/vhost-user.txt | 41 +++++++++++
> > > > hw/virtio/vhost-user.c | 86 ++++++++++++++++++++++
> > > > hw/virtio/vhost.c | 117 ++++++++++++++++++++++++++++++
> > > > include/hw/virtio/vhost-backend.h | 9 +++
> > > > include/hw/virtio/vhost.h | 19 +++++
> > > > 5 files changed, 272 insertions(+)
> > > >
> > > > diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
> > > > index c2194711d9..5ee9c28ab0 100644
> > > > --- a/docs/interop/vhost-user.txt
> > > > +++ b/docs/interop/vhost-user.txt
> > > > @@ -142,6 +142,19 @@ Depending on the request type, payload can be:
> > > > Offset: a 64-bit offset of this area from the start of the
> > > > supplied file descriptor
> > > >
> > > > + * Shm description
> > > > + -----------------------------------
> > > > + | mmap_size | mmap_offset | dev_size | vq_size | align | version |
> > > > + -----------------------------------
> > > > +
> > > > + Mmap_size: a 64-bit size of the shared memory
> > > > + Mmap_offset: a 64-bit offset of the shared memory from the start
> > > > + of the supplied file descriptor
> > > > + Dev_size: a 32-bit size of device region in shared memory
> > > > + Vq_size: a 32-bit size of each virtqueue region in shared memory
> > > > + Align: a 32-bit align of each region in shared memory
> > > > + Version: a 32-bit version of this shared memory
> > > > +
> > >
> > > This is an informal description so please avoid _ in field
> > > names, just put a space in there. See e.g. log description.
> > >
> > >
> > Got it!
> >
> > > > In QEMU the vhost-user message is implemented with the following struct:
> > > >
> > > > typedef struct VhostUserMsg {
> > >
> > >
> > > For things to work, in-flight format must not change when
> > > backend reconnects.
> > >
> >
> > I'm not sure whether there will be cases where we want to add fields to
> > the inflight area without stopping the VM.
>
> Sorry I'm not sure I understand this comment. All I am saying is that
> when one backend disconnects and another reconnects they must agree on
> the format, so it's a good idea to document it.
>
Oh, sorry. I may have misunderstood. I will document the format in v4.
Thank you.
Thanks,
Yongji
* Re: [Qemu-devel] [PATCH v3 for-4.0 2/7] vhost-user: Support providing shared memory to backend
From: Michael S. Tsirkin @ 2019-01-03 17:13 UTC
To: elohimes
Cc: marcandre.lureau, berrange, jasowang, maxime.coquelin, yury-kotov,
wrfsh, qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji
On Thu, Jan 03, 2019 at 06:18:14PM +0800, elohimes@gmail.com wrote:
> From: Xie Yongji <xieyongji@baidu.com>
>
> This patch introduces two new messages VHOST_USER_GET_SHM_SIZE
> and VHOST_USER_SET_SHM_FD to support providing shared
> memory to backend.
>
> Firstly, qemu uses VHOST_USER_GET_SHM_SIZE to get the
> required size of shared memory from backend. Then, qemu
> allocates memory and sends them back to backend through
> VHOST_USER_SET_SHM_FD.
So this does create a security concern: the remote
can supply a very big area.
How about returning a buffer from client to qemu?
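Or at least clamp what the slave reports, e.g. (a sketch only; the
limit macro below is made up, not an existing QEMU constant):

    /* hypothetical guard in vhost_user_get_shm_size() */
    if (msg.payload.shm.dev_size > VHOST_USER_SHM_MAX_REGION_SIZE ||
        msg.payload.shm.vq_size > VHOST_USER_SHM_MAX_REGION_SIZE) {
        error_report("Slave requested too much shared memory");
        return -1;
    }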
> Note that the shared memory should be used to record
> inflight I/O by backend. Qemu will clear it when vm reset.
>
> Signed-off-by: Xie Yongji <xieyongji@baidu.com>
> Signed-off-by: Chai Wen <chaiwen@baidu.com>
> Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
> ---
> docs/interop/vhost-user.txt | 41 +++++++++++
> hw/virtio/vhost-user.c | 86 ++++++++++++++++++++++
> hw/virtio/vhost.c | 117 ++++++++++++++++++++++++++++++
> include/hw/virtio/vhost-backend.h | 9 +++
> include/hw/virtio/vhost.h | 19 +++++
> 5 files changed, 272 insertions(+)
>
> diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
> index c2194711d9..5ee9c28ab0 100644
> --- a/docs/interop/vhost-user.txt
> +++ b/docs/interop/vhost-user.txt
> @@ -142,6 +142,19 @@ Depending on the request type, payload can be:
> Offset: a 64-bit offset of this area from the start of the
> supplied file descriptor
>
> + * Shm description
> + -----------------------------------
> + | mmap_size | mmap_offset | dev_size | vq_size | align | version |
> + -----------------------------------
> +
> + Mmap_size: a 64-bit size of the shared memory
> + Mmap_offset: a 64-bit offset of the shared memory from the start
> + of the supplied file descriptor
> + Dev_size: a 32-bit size of device region in shared memory
> + Vq_size: a 32-bit size of each virtqueue region in shared memory
> + Align: a 32-bit align of each region in shared memory
> + Version: a 32-bit version of this shared memory
> +
> In QEMU the vhost-user message is implemented with the following struct:
>
> typedef struct VhostUserMsg {
> @@ -157,6 +170,7 @@ typedef struct VhostUserMsg {
> struct vhost_iotlb_msg iotlb;
> VhostUserConfig config;
> VhostUserVringArea area;
> + VhostUserShm shm;
> };
> } QEMU_PACKED VhostUserMsg;
>
> @@ -175,6 +189,7 @@ the ones that do:
> * VHOST_USER_GET_PROTOCOL_FEATURES
> * VHOST_USER_GET_VRING_BASE
> * VHOST_USER_SET_LOG_BASE (if VHOST_USER_PROTOCOL_F_LOG_SHMFD)
> + * VHOST_USER_GET_SHM_SIZE (if VHOST_USER_PROTOCOL_F_SLAVE_SHMFD)
>
> [ Also see the section on REPLY_ACK protocol extension. ]
>
> @@ -188,6 +203,7 @@ in the ancillary data:
> * VHOST_USER_SET_VRING_CALL
> * VHOST_USER_SET_VRING_ERR
> * VHOST_USER_SET_SLAVE_REQ_FD
> + * VHOST_USER_SET_SHM_FD (if VHOST_USER_PROTOCOL_F_SLAVE_SHMFD)
>
> If Master is unable to send the full message or receives a wrong reply it will
> close the connection. An optional reconnection mechanism can be implemented.
> @@ -397,6 +413,7 @@ Protocol features
> #define VHOST_USER_PROTOCOL_F_CONFIG 9
> #define VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD 10
> #define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER 11
> +#define VHOST_USER_PROTOCOL_F_SLAVE_SHMFD 12
>
> Master message types
> --------------------
> @@ -761,6 +778,30 @@ Master message types
> was previously sent.
> The value returned is an error indication; 0 is success.
>
> + * VHOST_USER_GET_SHM_SIZE
> + Id: 31
> + Equivalent ioctl: N/A
> + Master payload: shm description
> +
> + When the VHOST_USER_PROTOCOL_F_SLAVE_SHMFD protocol feature has been
> + successfully negotiated, the master needs to provide shared memory to
> + the slave. This message is used by the master to get the required size
> + from the slave. The shared memory contains one region for the device
> + and several regions for the virtqueues. The sizes of those regions are
> + specified by the dev_size field and the vq_size field. The align field
> + specifies the alignment of those regions.
> +
> + * VHOST_USER_SET_SHM_FD
> + Id: 32
> + Equivalent ioctl: N/A
> + Master payload: shm description
> +
> + When the VHOST_USER_PROTOCOL_F_SLAVE_SHMFD protocol feature has been
> + successfully negotiated, the master uses this message to set the shared
> + memory for the slave. The memory fd is passed in the ancillary data.
> + The shared memory should be used by the slave to record inflight I/O;
> + the master will clear it on VM reset.
> +
> Slave message types
> -------------------
>
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index e09bed0e4a..8cdf3b5121 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -52,6 +52,7 @@ enum VhostUserProtocolFeature {
> VHOST_USER_PROTOCOL_F_CONFIG = 9,
> VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD = 10,
> VHOST_USER_PROTOCOL_F_HOST_NOTIFIER = 11,
> + VHOST_USER_PROTOCOL_F_SLAVE_SHMFD = 12,
> VHOST_USER_PROTOCOL_F_MAX
> };
>
> @@ -89,6 +90,8 @@ typedef enum VhostUserRequest {
> VHOST_USER_POSTCOPY_ADVISE = 28,
> VHOST_USER_POSTCOPY_LISTEN = 29,
> VHOST_USER_POSTCOPY_END = 30,
> + VHOST_USER_GET_SHM_SIZE = 31,
> + VHOST_USER_SET_SHM_FD = 32,
> VHOST_USER_MAX
> } VhostUserRequest;
>
> @@ -147,6 +150,15 @@ typedef struct VhostUserVringArea {
> uint64_t offset;
> } VhostUserVringArea;
>
> +typedef struct VhostUserShm {
> + uint64_t mmap_size;
> + uint64_t mmap_offset;
> + uint32_t dev_size;
> + uint32_t vq_size;
> + uint32_t align;
> + uint32_t version;
> +} VhostUserShm;
> +
> typedef struct {
> VhostUserRequest request;
>
> @@ -169,6 +181,7 @@ typedef union {
> VhostUserConfig config;
> VhostUserCryptoSession session;
> VhostUserVringArea area;
> + VhostUserShm shm;
> } VhostUserPayload;
>
> typedef struct VhostUserMsg {
> @@ -1739,6 +1752,77 @@ static bool vhost_user_mem_section_filter(struct vhost_dev *dev,
> return result;
> }
>
> +static int vhost_user_get_shm_size(struct vhost_dev *dev,
> + struct vhost_shm *shm)
> +{
> + VhostUserMsg msg = {
> + .hdr.request = VHOST_USER_GET_SHM_SIZE,
> + .hdr.flags = VHOST_USER_VERSION,
> + .hdr.size = sizeof(msg.payload.shm),
> + };
> +
> + if (!virtio_has_feature(dev->protocol_features,
> + VHOST_USER_PROTOCOL_F_SLAVE_SHMFD)) {
> + shm->dev_size = 0;
> + shm->vq_size = 0;
> + return 0;
> + }
> +
> + if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
> + return -1;
> + }
> +
> + if (vhost_user_read(dev, &msg) < 0) {
> + return -1;
> + }
> +
> + if (msg.hdr.request != VHOST_USER_GET_SHM_SIZE) {
> + error_report("Received unexpected msg type. "
> + "Expected %d received %d",
> + VHOST_USER_GET_SHM_SIZE, msg.hdr.request);
> + return -1;
> + }
> +
> + if (msg.hdr.size != sizeof(msg.payload.shm)) {
> + error_report("Received bad msg size.");
> + return -1;
> + }
> +
> + shm->dev_size = msg.payload.shm.dev_size;
> + shm->vq_size = msg.payload.shm.vq_size;
> + shm->align = msg.payload.shm.align;
> + shm->version = msg.payload.shm.version;
> +
> + return 0;
> +}
> +
> +static int vhost_user_set_shm_fd(struct vhost_dev *dev,
> + struct vhost_shm *shm)
> +{
> + VhostUserMsg msg = {
> + .hdr.request = VHOST_USER_SET_SHM_FD,
> + .hdr.flags = VHOST_USER_VERSION,
> + .payload.shm.mmap_size = shm->mmap_size,
> + .payload.shm.mmap_offset = 0,
> + .payload.shm.dev_size = shm->dev_size,
> + .payload.shm.vq_size = shm->vq_size,
> + .payload.shm.align = shm->align,
> + .payload.shm.version = shm->version,
> + .hdr.size = sizeof(msg.payload.shm),
> + };
> +
> + if (!virtio_has_feature(dev->protocol_features,
> + VHOST_USER_PROTOCOL_F_SLAVE_SHMFD)) {
> + return 0;
> + }
> +
> + if (vhost_user_write(dev, &msg, &shm->fd, 1) < 0) {
> + return -1;
> + }
> +
> + return 0;
> +}
> +
> VhostUserState *vhost_user_init(void)
> {
> VhostUserState *user = g_new0(struct VhostUserState, 1);
> @@ -1790,4 +1874,6 @@ const VhostOps user_ops = {
> .vhost_crypto_create_session = vhost_user_crypto_create_session,
> .vhost_crypto_close_session = vhost_user_crypto_close_session,
> .vhost_backend_mem_section_filter = vhost_user_mem_section_filter,
> + .vhost_get_shm_size = vhost_user_get_shm_size,
> + .vhost_set_shm_fd = vhost_user_set_shm_fd,
> };
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index 569c4053ea..7a38fed50f 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -1481,6 +1481,123 @@ void vhost_dev_set_config_notifier(struct vhost_dev *hdev,
> hdev->config_ops = ops;
> }
>
> +void vhost_dev_reset_shm(struct vhost_shm *shm)
> +{
> + if (shm->addr) {
> + memset(shm->addr, 0, shm->mmap_size);
> + }
> +}
> +
> +void vhost_dev_free_shm(struct vhost_shm *shm)
> +{
> + if (shm->addr) {
> + qemu_memfd_free(shm->addr, shm->mmap_size, shm->fd);
> + shm->addr = NULL;
> + shm->fd = -1;
> + }
> +}
> +
> +int vhost_dev_alloc_shm(struct vhost_shm *shm)
> +{
> + Error *err = NULL;
> + int fd = -1;
> + void *addr = qemu_memfd_alloc("vhost-shm", shm->mmap_size,
> + F_SEAL_GROW | F_SEAL_SHRINK | F_SEAL_SEAL,
> + &fd, &err);
> + if (err) {
> + error_report_err(err);
> + return -1;
> + }
> +
> + shm->addr = addr;
> + shm->fd = fd;
> +
> + return 0;
> +}
> +
> +void vhost_dev_save_shm(struct vhost_shm *shm, QEMUFile *f)
> +{
> + if (shm->addr) {
> + qemu_put_be64(f, shm->mmap_size);
> + qemu_put_be32(f, shm->dev_size);
> + qemu_put_be32(f, shm->vq_size);
> + qemu_put_be32(f, shm->align);
> + qemu_put_be32(f, shm->version);
> + qemu_put_buffer(f, shm->addr, shm->mmap_size);
> + } else {
> + qemu_put_be64(f, 0);
> + }
> +}
> +
> +int vhost_dev_load_shm(struct vhost_shm *shm, QEMUFile *f)
> +{
> + uint64_t mmap_size;
> +
> + mmap_size = qemu_get_be64(f);
> + if (!mmap_size) {
> + return 0;
> + }
> +
> + vhost_dev_free_shm(shm);
> +
> + shm->mmap_size = mmap_size;
> + shm->dev_size = qemu_get_be32(f);
> + shm->vq_size = qemu_get_be32(f);
> + shm->align = qemu_get_be32(f);
> + shm->version = qemu_get_be32(f);
> +
> + if (vhost_dev_alloc_shm(shm)) {
> + return -ENOMEM;
> + }
> +
> + qemu_get_buffer(f, shm->addr, mmap_size);
> +
> + return 0;
> +}
> +
> +int vhost_dev_set_shm(struct vhost_dev *dev, struct vhost_shm *shm)
> +{
> + int r;
> +
> + if (dev->vhost_ops->vhost_set_shm_fd && shm->addr) {
> + r = dev->vhost_ops->vhost_set_shm_fd(dev, shm);
> + if (r) {
> + VHOST_OPS_DEBUG("vhost_set_vring_shm_fd failed");
> + return -errno;
> + }
> + }
> +
> + return 0;
> +}
> +
> +int vhost_dev_init_shm(struct vhost_dev *dev, struct vhost_shm *shm)
> +{
> + int r;
> +
> + if (dev->vhost_ops->vhost_get_shm_size) {
> + r = dev->vhost_ops->vhost_get_shm_size(dev, shm);
> + if (r) {
> + VHOST_OPS_DEBUG("vhost_get_vring_shm_size failed");
> + return -errno;
> + }
> +
> + if (!shm->dev_size && !shm->vq_size) {
> + return 0;
> + }
> +
> + shm->mmap_size = QEMU_ALIGN_UP(shm->dev_size, shm->align) +
> + dev->nvqs * QEMU_ALIGN_UP(shm->vq_size, shm->align);
> +
> + if (vhost_dev_alloc_shm(shm)) {
> + return -ENOMEM;
> + }
> +
> + vhost_dev_reset_shm(shm);
> + }
> +
> + return 0;
> +}
> +
> /* Host notifiers must be enabled at this point. */
> int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
> {
> diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
> index 81283ec50f..4e7f13c9e9 100644
> --- a/include/hw/virtio/vhost-backend.h
> +++ b/include/hw/virtio/vhost-backend.h
> @@ -25,6 +25,7 @@ typedef enum VhostSetConfigType {
> VHOST_SET_CONFIG_TYPE_MIGRATION = 1,
> } VhostSetConfigType;
>
> +struct vhost_shm;
> struct vhost_dev;
> struct vhost_log;
> struct vhost_memory;
> @@ -104,6 +105,12 @@ typedef int (*vhost_crypto_close_session_op)(struct vhost_dev *dev,
> typedef bool (*vhost_backend_mem_section_filter_op)(struct vhost_dev *dev,
> MemoryRegionSection *section);
>
> +typedef int (*vhost_get_shm_size_op)(struct vhost_dev *dev,
> + struct vhost_shm *shm);
> +
> +typedef int (*vhost_set_shm_fd_op)(struct vhost_dev *dev,
> + struct vhost_shm *shm);
> +
> typedef struct VhostOps {
> VhostBackendType backend_type;
> vhost_backend_init vhost_backend_init;
> @@ -142,6 +149,8 @@ typedef struct VhostOps {
> vhost_crypto_create_session_op vhost_crypto_create_session;
> vhost_crypto_close_session_op vhost_crypto_close_session;
> vhost_backend_mem_section_filter_op vhost_backend_mem_section_filter;
> + vhost_get_shm_size_op vhost_get_shm_size;
> + vhost_set_shm_fd_op vhost_set_shm_fd;
> } VhostOps;
>
> extern const VhostOps user_ops;
> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> index a7f449fa87..b6e3d6ab56 100644
> --- a/include/hw/virtio/vhost.h
> +++ b/include/hw/virtio/vhost.h
> @@ -7,6 +7,17 @@
> #include "exec/memory.h"
>
> /* Generic structures common for any vhost based device. */
> +
> +struct vhost_shm {
> + void *addr;
> + uint64_t mmap_size;
> + uint32_t dev_size;
> + uint32_t vq_size;
> + uint32_t align;
> + uint32_t version;
> + int fd;
> +};
> +
> struct vhost_virtqueue {
> int kick;
> int call;
> @@ -120,4 +131,12 @@ int vhost_dev_set_config(struct vhost_dev *dev, const uint8_t *data,
> */
> void vhost_dev_set_config_notifier(struct vhost_dev *dev,
> const VhostDevConfigOps *ops);
> +
> +void vhost_dev_reset_shm(struct vhost_shm *shm);
> +void vhost_dev_free_shm(struct vhost_shm *shm);
> +int vhost_dev_alloc_shm(struct vhost_shm *shm);
> +void vhost_dev_save_shm(struct vhost_shm *shm, QEMUFile *f);
> +int vhost_dev_load_shm(struct vhost_shm *shm, QEMUFile *f);
> +int vhost_dev_set_shm(struct vhost_dev *dev, struct vhost_shm *shm);
> +int vhost_dev_init_shm(struct vhost_dev *dev, struct vhost_shm *shm);
> #endif
> --
> 2.17.1
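For illustration, the mmap_size computation in vhost_dev_init_shm()
above, restated as a standalone sketch; the formula is from the patch,
while the concrete values are arbitrary:

    #include <inttypes.h>
    #include <stdio.h>

    /* Same rounding that QEMU_ALIGN_UP(n, m) performs. */
    #define ALIGN_UP(n, m) (((n) + (m) - 1) / (m) * (m))

    int main(void)
    {
        uint32_t dev_size = 0;   /* example values only, not from the patch */
        uint32_t vq_size  = 128;
        uint32_t align    = 64;
        int      nvqs     = 2;

        uint64_t mmap_size = ALIGN_UP(dev_size, align) +
                             (uint64_t)nvqs * ALIGN_UP(vq_size, align);

        printf("mmap_size = %" PRIu64 "\n", mmap_size);   /* prints 256 */
        return 0;
    }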
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Qemu-devel] [PATCH v3 for-4.0 2/7] vhost-user: Support providing shared memory to backend
2019-01-03 17:13 ` Michael S. Tsirkin
@ 2019-01-04 3:20 ` Yongji Xie
0 siblings, 0 replies; 14+ messages in thread
From: Yongji Xie @ 2019-01-04 3:20 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Marc-André Lureau, Daniel P. Berrangé, Jason Wang,
Coquelin, Maxime, Yury Kotov,
Евгений Яковлев,
qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji
On Fri, 4 Jan 2019 at 01:13, Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Thu, Jan 03, 2019 at 06:18:14PM +0800, elohimes@gmail.com wrote:
> > From: Xie Yongji <xieyongji@baidu.com>
> >
> > This patch introduces two new messages, VHOST_USER_GET_SHM_SIZE
> > and VHOST_USER_SET_SHM_FD, to support providing shared
> > memory to the backend.
> >
> > First, qemu uses VHOST_USER_GET_SHM_SIZE to get the
> > required size of shared memory from the backend. Then qemu
> > allocates the memory and sends it back to the backend through
> > VHOST_USER_SET_SHM_FD.
>
> So this does create a security concern: the remote
> can supply a very big area.
> How about returning a buffer from the client to qemu?
>
That's a good idea! Will do it in v4.
Thanks,
Yongji
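A minimal sketch of the kind of size validation this points at, with a
hypothetical cap (neither the constant nor the function exists in this
series):

    #include <errno.h>
    #include <stdint.h>

    /* Hypothetical upper bound on a backend-requested area. */
    #define VHOST_SHM_MAX_SIZE (16ULL * 1024 * 1024)

    static int check_backend_shm_size(uint64_t mmap_size)
    {
        /* Reject empty or absurdly large replies from the remote. */
        if (mmap_size == 0 || mmap_size > VHOST_SHM_MAX_SIZE) {
            return -EINVAL;
        }
        return 0;
    }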
^ permalink raw reply [flat|nested] 14+ messages in thread
* [Qemu-devel] [PATCH v3 for-4.0 3/7] libvhost-user: Introduce vu_queue_map_desc()
2019-01-03 10:18 [Qemu-devel] [PATCH v3 for-4.0 0/7] vhost-user-blk: Add support for backend reconnecting elohimes
2019-01-03 10:18 ` [Qemu-devel] [PATCH v3 for-4.0 1/7] char-socket: Enable "nowait" option on client sockets elohimes
2019-01-03 10:18 ` [Qemu-devel] [PATCH v3 for-4.0 2/7] vhost-user: Support providing shared memory to backend elohimes
@ 2019-01-03 10:18 ` elohimes
2019-01-03 10:18 ` [Qemu-devel] [PATCH v3 for-4.0 4/7] libvhost-user: Support recording inflight I/O in shared memory elohimes
` (3 subsequent siblings)
6 siblings, 0 replies; 14+ messages in thread
From: elohimes @ 2019-01-03 10:18 UTC (permalink / raw)
To: mst, marcandre.lureau, berrange, jasowang, maxime.coquelin,
yury-kotov, wrfsh
Cc: qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji
From: Xie Yongji <xieyongji@baidu.com>
Introduce vu_queue_map_desc(), which is
independent of vu_queue_pop().
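Schematically, the value of the split is that mapping a descriptor
chain becomes callable with a bare index, which a later patch in this
series uses to re-map recorded inflight descriptors. A toy sketch with
simplified stand-in types (nothing here is the real libvhost-user API):

    #include <stdint.h>
    #include <stdlib.h>

    typedef struct Elem { uint16_t index; } Elem;

    /* Stand-in for vu_queue_map_desc(): maps a chain given only its index. */
    static Elem *map_desc(uint16_t idx)
    {
        Elem *e = calloc(1, sizeof(*e));
        if (e) {
            e->index = idx;
        }
        return e;
    }

    /* Stand-in for vu_queue_pop(): ring bookkeeping, then delegation. */
    static Elem *pop(uint16_t *last_avail_idx)
    {
        return map_desc((*last_avail_idx)++);
    }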
Signed-off-by: Xie Yongji <xieyongji@baidu.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
contrib/libvhost-user/libvhost-user.c | 88 ++++++++++++++++-----------
1 file changed, 51 insertions(+), 37 deletions(-)
diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
index a6b46cdc03..23bd52264c 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -1853,49 +1853,20 @@ virtqueue_alloc_element(size_t sz,
return elem;
}
-void *
-vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz)
+static void *
+vu_queue_map_desc(VuDev *dev, VuVirtq *vq, unsigned int idx, size_t sz)
{
- unsigned int i, head, max, desc_len;
+ struct vring_desc *desc = vq->vring.desc;
uint64_t desc_addr, read_len;
+ unsigned int desc_len;
+ unsigned int max = vq->vring.num;
+ unsigned int i = idx;
VuVirtqElement *elem;
- unsigned out_num, in_num;
+ unsigned int out_num = 0, in_num = 0;
struct iovec iov[VIRTQUEUE_MAX_SIZE];
struct vring_desc desc_buf[VIRTQUEUE_MAX_SIZE];
- struct vring_desc *desc;
int rc;
- if (unlikely(dev->broken) ||
- unlikely(!vq->vring.avail)) {
- return NULL;
- }
-
- if (vu_queue_empty(dev, vq)) {
- return NULL;
- }
- /* Needed after virtio_queue_empty(), see comment in
- * virtqueue_num_heads(). */
- smp_rmb();
-
- /* When we start there are none of either input nor output. */
- out_num = in_num = 0;
-
- max = vq->vring.num;
- if (vq->inuse >= vq->vring.num) {
- vu_panic(dev, "Virtqueue size exceeded");
- return NULL;
- }
-
- if (!virtqueue_get_head(dev, vq, vq->last_avail_idx++, &head)) {
- return NULL;
- }
-
- if (vu_has_feature(dev, VIRTIO_RING_F_EVENT_IDX)) {
- vring_set_avail_event(vq, vq->last_avail_idx);
- }
-
- i = head;
- desc = vq->vring.desc;
if (desc[i].flags & VRING_DESC_F_INDIRECT) {
if (desc[i].len % sizeof(struct vring_desc)) {
vu_panic(dev, "Invalid size for indirect buffer table");
@@ -1947,12 +1918,13 @@ vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz)
} while (rc == VIRTQUEUE_READ_DESC_MORE);
if (rc == VIRTQUEUE_READ_DESC_ERROR) {
+ vu_panic(dev, "read descriptor error");
return NULL;
}
/* Now copy what we have collected and mapped */
elem = virtqueue_alloc_element(sz, out_num, in_num);
- elem->index = head;
+ elem->index = idx;
for (i = 0; i < out_num; i++) {
elem->out_sg[i] = iov[i];
}
@@ -1960,6 +1932,48 @@ vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz)
elem->in_sg[i] = iov[out_num + i];
}
+ return elem;
+}
+
+void *
+vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz)
+{
+ unsigned int head;
+ VuVirtqElement *elem;
+
+ if (unlikely(dev->broken) ||
+ unlikely(!vq->vring.avail)) {
+ return NULL;
+ }
+
+ if (vu_queue_empty(dev, vq)) {
+ return NULL;
+ }
+ /*
+ * Needed after virtio_queue_empty(), see comment in
+ * virtqueue_num_heads().
+ */
+ smp_rmb();
+
+ if (vq->inuse >= vq->vring.num) {
+ vu_panic(dev, "Virtqueue size exceeded");
+ return NULL;
+ }
+
+ if (!virtqueue_get_head(dev, vq, vq->last_avail_idx++, &head)) {
+ return NULL;
+ }
+
+ if (vu_has_feature(dev, VIRTIO_RING_F_EVENT_IDX)) {
+ vring_set_avail_event(vq, vq->last_avail_idx);
+ }
+
+ elem = vu_queue_map_desc(dev, vq, head, sz);
+
+ if (!elem) {
+ return NULL;
+ }
+
vq->inuse++;
return elem;
--
2.17.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [Qemu-devel] [PATCH v3 for-4.0 4/7] libvhost-user: Support recording inflight I/O in shared memory
2019-01-03 10:18 [Qemu-devel] [PATCH v3 for-4.0 0/7] vhost-user-blk: Add support for backend reconnecting elohimes
` (2 preceding siblings ...)
2019-01-03 10:18 ` [Qemu-devel] [PATCH v3 for-4.0 3/7] libvhost-user: Introduce vu_queue_map_desc() elohimes
@ 2019-01-03 10:18 ` elohimes
2019-01-03 10:18 ` [Qemu-devel] [PATCH v3 for-4.0 5/7] vhost-user-blk: Add support to provide shared memory to backend elohimes
` (2 subsequent siblings)
6 siblings, 0 replies; 14+ messages in thread
From: elohimes @ 2019-01-03 10:18 UTC (permalink / raw)
To: mst, marcandre.lureau, berrange, jasowang, maxime.coquelin,
yury-kotov, wrfsh
Cc: qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji
From: Xie Yongji <xieyongji@baidu.com>
This patch adds support for the VHOST_USER_GET_SHM_SIZE and
VHOST_USER_SET_SHM_FD messages to get shared memory from qemu.
We then maintain a "bitmap" of all descriptors in
the shared memory for each queue to record inflight I/O.
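As a standalone sketch of the recovery idea (the queue size and names
are illustrative; the real layout is VuVirtqShm in the diff below):

    #include <stddef.h>
    #include <stdint.h>

    #define QUEUE_SIZE 1024   /* stands in for VIRTQUEUE_MAX_SIZE */

    /*
     * After reconnecting, any descriptor whose inflight byte is still
     * set was submitted but never completed; collect those indices so
     * they can be resubmitted before new requests are served.
     */
    static size_t collect_inflight(const char inflight[QUEUE_SIZE],
                                   uint16_t out[QUEUE_SIZE])
    {
        size_t n = 0;

        for (size_t i = 0; i < QUEUE_SIZE; i++) {
            if (inflight[i]) {
                out[n++] = (uint16_t)i;
            }
        }
        return n;
    }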
Signed-off-by: Xie Yongji <xieyongji@baidu.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
---
contrib/libvhost-user/libvhost-user.c | 221 +++++++++++++++++++++++++-
contrib/libvhost-user/libvhost-user.h | 33 ++++
2 files changed, 248 insertions(+), 6 deletions(-)
diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
index 23bd52264c..f18f5e6e62 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -53,6 +53,18 @@
_min1 < _min2 ? _min1 : _min2; })
#endif
+/* Round number down to multiple */
+#define ALIGN_DOWN(n, m) ((n) / (m) * (m))
+
+/* Round number up to multiple */
+#define ALIGN_UP(n, m) ALIGN_DOWN((n) + (m) - 1, (m))
+
+/* Align each region to cache line size in shared memory */
+#define SHM_ALIGNMENT 64
+
+/* The version of shared memory */
+#define SHM_VERSION 1
+
#define VHOST_USER_HDR_SIZE offsetof(VhostUserMsg, payload.u64)
/* The version of the protocol we support */
@@ -100,6 +112,8 @@ vu_request_to_string(unsigned int req)
REQ(VHOST_USER_POSTCOPY_ADVISE),
REQ(VHOST_USER_POSTCOPY_LISTEN),
REQ(VHOST_USER_POSTCOPY_END),
+ REQ(VHOST_USER_GET_SHM_SIZE),
+ REQ(VHOST_USER_SET_SHM_FD),
REQ(VHOST_USER_MAX),
};
#undef REQ
@@ -890,6 +904,41 @@ vu_check_queue_msg_file(VuDev *dev, VhostUserMsg *vmsg)
return true;
}
+static int
+vu_check_queue_inflights(VuDev *dev, VuVirtq *vq)
+{
+ int i = 0;
+
+ if ((dev->protocol_features &
+ VHOST_USER_PROTOCOL_F_SLAVE_SHMFD) == 0) {
+ return 0;
+ }
+
+ if (unlikely(!vq->shm)) {
+ return -1;
+ }
+
+ vq->used_idx = vq->vring.used->idx;
+ vq->inflight_num = 0;
+ for (i = 0; i < vq->vring.num; i++) {
+ if (vq->shm->inflight[i] == 0) {
+ continue;
+ }
+
+ vq->inflight_desc[vq->inflight_num++] = i;
+ vq->inuse++;
+ }
+ vq->shadow_avail_idx = vq->last_avail_idx = vq->inuse + vq->used_idx;
+
+ /* in case of I/O hang after reconnecting */
+ if (eventfd_write(vq->kick_fd, 1) ||
+ eventfd_write(vq->call_fd, 1)) {
+ return -1;
+ }
+
+ return 0;
+}
+
static bool
vu_set_vring_kick_exec(VuDev *dev, VhostUserMsg *vmsg)
{
@@ -925,6 +974,10 @@ vu_set_vring_kick_exec(VuDev *dev, VhostUserMsg *vmsg)
dev->vq[index].kick_fd, index);
}
+ if (vu_check_queue_inflights(dev, &dev->vq[index])) {
+ vu_panic(dev, "Failed to check inflights for vq: %d\n", index);
+ }
+
return false;
}
@@ -1215,6 +1268,115 @@ vu_set_postcopy_end(VuDev *dev, VhostUserMsg *vmsg)
return true;
}
+static int
+vu_setup_shm(VuDev *dev)
+{
+ int i;
+ char *addr = (char *)dev->shm_info.addr;
+ uint64_t size = 0;
+ uint32_t vq_size = ALIGN_UP(dev->shm_info.vq_size, dev->shm_info.align);
+
+ if (dev->shm_info.version != SHM_VERSION) {
+ DPRINT("Invalid version for shm: %d", dev->shm_info.version);
+ return -1;
+ }
+
+ if (dev->shm_info.dev_size != 0) {
+ DPRINT("Invalid dev_size for shm: %d", dev->shm_info.dev_size);
+ return -1;
+ }
+
+ if (dev->shm_info.vq_size != sizeof(VuVirtqShm)) {
+ DPRINT("Invalid vq_size for shm: %d", dev->shm_info.vq_size);
+ return -1;
+ }
+
+ for (i = 0; i < VHOST_MAX_NR_VIRTQUEUE; i++) {
+ size += vq_size;
+ if (size > dev->shm_info.mmap_size) {
+ break;
+ }
+ dev->vq[i].shm = (VuVirtqShm *)addr;
+ addr += vq_size;
+ }
+
+ return 0;
+}
+
+static bool
+vu_get_shm_size(VuDev *dev, VhostUserMsg *vmsg)
+{
+ if (vmsg->size != sizeof(vmsg->payload.shm)) {
+ vu_panic(dev, "Invalid get_shm_size message:%d", vmsg->size);
+ vmsg->size = 0;
+ return true;
+ }
+
+ vmsg->payload.shm.dev_size = 0;
+ vmsg->payload.shm.vq_size = sizeof(VuVirtqShm);
+ vmsg->payload.shm.align = SHM_ALIGNMENT;
+ vmsg->payload.shm.version = SHM_VERSION;
+
+ DPRINT("send shm dev_size: %"PRId32"\n", vmsg->payload.shm.dev_size);
+ DPRINT("send shm vq_size: %"PRId32"\n", vmsg->payload.shm.vq_size);
+ DPRINT("send shm align: %"PRId32"\n", vmsg->payload.shm.align);
+ DPRINT("send shm version: %"PRId32"\n", vmsg->payload.shm.version);
+
+ return true;
+}
+
+static bool
+vu_set_shm_fd(VuDev *dev, VhostUserMsg *vmsg)
+{
+ int fd;
+ uint64_t mmap_size, mmap_offset;
+ void *rc;
+
+ if (vmsg->fd_num != 1 ||
+ vmsg->size != sizeof(vmsg->payload.shm)) {
+ vu_panic(dev, "Invalid set_shm_fd message size:%d fds:%d",
+ vmsg->size, vmsg->fd_num);
+ return false;
+ }
+
+ fd = vmsg->fds[0];
+ mmap_size = vmsg->payload.shm.mmap_size;
+ mmap_offset = vmsg->payload.shm.mmap_offset;
+ DPRINT("set_shm_fd mmap_size: %"PRId64"\n", mmap_size);
+ DPRINT("set_shm_fd mmap_offset: %"PRId64"\n", mmap_offset);
+ DPRINT("set_shm_fd dev_size: %"PRId32"\n", vmsg->payload.shm.dev_size);
+ DPRINT("set_shm_fd vq_size: %"PRId32"\n", vmsg->payload.shm.vq_size);
+ DPRINT("set_shm_fd align: %"PRId32"\n", vmsg->payload.shm.align);
+ DPRINT("set_shm_fd version: %"PRId32"\n", vmsg->payload.shm.version);
+
+ rc = mmap(0, mmap_size, PROT_READ | PROT_WRITE, MAP_SHARED,
+ fd, mmap_offset);
+
+ close(fd);
+
+ if (rc == MAP_FAILED) {
+ vu_panic(dev, "set_shm_fd mmap error: %s", strerror(errno));
+ return false;
+ }
+
+ if (dev->shm_info.addr) {
+ munmap(dev->shm_info.addr, dev->shm_info.mmap_size);
+ }
+ dev->shm_info.addr = rc;
+ dev->shm_info.mmap_size = mmap_size;
+ dev->shm_info.dev_size = vmsg->payload.shm.dev_size;
+ dev->shm_info.vq_size = vmsg->payload.shm.vq_size;
+ dev->shm_info.align = vmsg->payload.shm.align;
+ dev->shm_info.version = vmsg->payload.shm.version;
+
+ if (vu_setup_shm(dev)) {
+ vu_panic(dev, "setup shm failed");
+ return false;
+ }
+
+ return false;
+}
+
static bool
vu_process_message(VuDev *dev, VhostUserMsg *vmsg)
{
@@ -1292,6 +1454,10 @@ vu_process_message(VuDev *dev, VhostUserMsg *vmsg)
return vu_set_postcopy_listen(dev, vmsg);
case VHOST_USER_POSTCOPY_END:
return vu_set_postcopy_end(dev, vmsg);
+ case VHOST_USER_GET_SHM_SIZE:
+ return vu_get_shm_size(dev, vmsg);
+ case VHOST_USER_SET_SHM_FD:
+ return vu_set_shm_fd(dev, vmsg);
default:
vmsg_close_fds(vmsg);
vu_panic(dev, "Unhandled request: %d", vmsg->request);
@@ -1359,8 +1525,13 @@ vu_deinit(VuDev *dev)
close(vq->err_fd);
vq->err_fd = -1;
}
+ vq->shm = NULL;
}
+ if (dev->shm_info.addr) {
+ munmap(dev->shm_info.addr, dev->shm_info.mmap_size);
+ dev->shm_info.addr = NULL;
+ }
vu_close_log(dev);
if (dev->slave_fd != -1) {
@@ -1829,12 +2000,6 @@ virtqueue_map_desc(VuDev *dev,
*p_num_sg = num_sg;
}
-/* Round number down to multiple */
-#define ALIGN_DOWN(n, m) ((n) / (m) * (m))
-
-/* Round number up to multiple */
-#define ALIGN_UP(n, m) ALIGN_DOWN((n) + (m) - 1, (m))
-
static void *
virtqueue_alloc_element(size_t sz,
unsigned out_num, unsigned in_num)
@@ -1935,9 +2100,44 @@ vu_queue_map_desc(VuDev *dev, VuVirtq *vq, unsigned int idx, size_t sz)
return elem;
}
+static int
+vu_queue_inflight_get(VuDev *dev, VuVirtq *vq, int desc_idx)
+{
+ if ((dev->protocol_features &
+ VHOST_USER_PROTOCOL_F_SLAVE_SHMFD) == 0) {
+ return 0;
+ }
+
+ if (unlikely(!vq->shm)) {
+ return -1;
+ }
+
+ vq->shm->inflight[desc_idx] = 1;
+
+ return 0;
+}
+
+static int
+vu_queue_inflight_put(VuDev *dev, VuVirtq *vq, int desc_idx)
+{
+ if ((dev->protocol_features &
+ VHOST_USER_PROTOCOL_F_SLAVE_SHMFD) == 0) {
+ return 0;
+ }
+
+ if (unlikely(!vq->shm)) {
+ return -1;
+ }
+
+ vq->shm->inflight[desc_idx] = 0;
+
+ return 0;
+}
+
void *
vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz)
{
+ int i;
unsigned int head;
VuVirtqElement *elem;
@@ -1946,6 +2146,12 @@ vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz)
return NULL;
}
+ if (unlikely(vq->inflight_num > 0)) {
+ i = (--vq->inflight_num);
+ elem = vu_queue_map_desc(dev, vq, vq->inflight_desc[i], sz);
+ return elem;
+ }
+
if (vu_queue_empty(dev, vq)) {
return NULL;
}
@@ -1976,6 +2182,8 @@ vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz)
vq->inuse++;
+ vu_queue_inflight_get(dev, vq, head);
+
return elem;
}
@@ -2121,4 +2329,5 @@ vu_queue_push(VuDev *dev, VuVirtq *vq,
{
vu_queue_fill(dev, vq, elem, len, 0);
vu_queue_flush(dev, vq, 1);
+ vu_queue_inflight_put(dev, vq, elem->index);
}
diff --git a/contrib/libvhost-user/libvhost-user.h b/contrib/libvhost-user/libvhost-user.h
index 4aa55b4d2d..fdfda688d2 100644
--- a/contrib/libvhost-user/libvhost-user.h
+++ b/contrib/libvhost-user/libvhost-user.h
@@ -53,6 +53,7 @@ enum VhostUserProtocolFeature {
VHOST_USER_PROTOCOL_F_CONFIG = 9,
VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD = 10,
VHOST_USER_PROTOCOL_F_HOST_NOTIFIER = 11,
+ VHOST_USER_PROTOCOL_F_SLAVE_SHMFD = 12,
VHOST_USER_PROTOCOL_F_MAX
};
@@ -91,6 +92,8 @@ typedef enum VhostUserRequest {
VHOST_USER_POSTCOPY_ADVISE = 28,
VHOST_USER_POSTCOPY_LISTEN = 29,
VHOST_USER_POSTCOPY_END = 30,
+ VHOST_USER_GET_SHM_SIZE = 31,
+ VHOST_USER_SET_SHM_FD = 32,
VHOST_USER_MAX
} VhostUserRequest;
@@ -138,6 +141,15 @@ typedef struct VhostUserVringArea {
uint64_t offset;
} VhostUserVringArea;
+typedef struct VhostUserShm {
+ uint64_t mmap_size;
+ uint64_t mmap_offset;
+ uint32_t dev_size;
+ uint32_t vq_size;
+ uint32_t align;
+ uint32_t version;
+} VhostUserShm;
+
#if defined(_WIN32)
# define VU_PACKED __attribute__((gcc_struct, packed))
#else
@@ -163,6 +175,7 @@ typedef struct VhostUserMsg {
VhostUserLog log;
VhostUserConfig config;
VhostUserVringArea area;
+ VhostUserShm shm;
} payload;
int fds[VHOST_MEMORY_MAX_NREGIONS];
@@ -234,9 +247,19 @@ typedef struct VuRing {
uint32_t flags;
} VuRing;
+typedef struct VuVirtqShm {
+ char inflight[VIRTQUEUE_MAX_SIZE];
+} VuVirtqShm;
+
typedef struct VuVirtq {
VuRing vring;
+ VuVirtqShm *shm;
+
+ uint16_t inflight_desc[VIRTQUEUE_MAX_SIZE];
+
+ uint16_t inflight_num;
+
/* Next head to pop */
uint16_t last_avail_idx;
@@ -279,11 +302,21 @@ typedef void (*vu_set_watch_cb) (VuDev *dev, int fd, int condition,
vu_watch_cb cb, void *data);
typedef void (*vu_remove_watch_cb) (VuDev *dev, int fd);
+typedef struct VuDevShmInfo {
+ void *addr;
+ uint64_t mmap_size;
+ uint32_t dev_size;
+ uint32_t vq_size;
+ uint32_t align;
+ uint32_t version;
+} VuDevShmInfo;
+
struct VuDev {
int sock;
uint32_t nregions;
VuDevRegion regions[VHOST_MEMORY_MAX_NREGIONS];
VuVirtq vq[VHOST_MAX_NR_VIRTQUEUE];
+ VuDevShmInfo shm_info;
int log_call_fd;
int slave_fd;
uint64_t log_size;
--
2.17.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [Qemu-devel] [PATCH v3 for-4.0 5/7] vhost-user-blk: Add support to provide shared memory to backend
2019-01-03 10:18 [Qemu-devel] [PATCH v3 for-4.0 0/7] vhost-user-blk: Add support for backend reconnecting elohimes
` (3 preceding siblings ...)
2019-01-03 10:18 ` [Qemu-devel] [PATCH v3 for-4.0 4/7] libvhost-user: Support recording inflight I/O in shared memory elohimes
@ 2019-01-03 10:18 ` elohimes
2019-01-03 10:18 ` [Qemu-devel] [PATCH v3 for-4.0 6/7] vhost-user-blk: Add support to reconnect backend elohimes
2019-01-03 10:18 ` [Qemu-devel] [PATCH v3 for-4.0 7/7] contrib/vhost-user-blk: enable inflight I/O recording elohimes
6 siblings, 0 replies; 14+ messages in thread
From: elohimes @ 2019-01-03 10:18 UTC (permalink / raw)
To: mst, marcandre.lureau, berrange, jasowang, maxime.coquelin,
yury-kotov, wrfsh
Cc: qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji
From: Xie Yongji <xieyongji@baidu.com>
This patch adds support for the vhost-user-blk device to provide
shared memory to the backend.
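The life cycle this wires up, condensed into a sketch; the stubs below
only stand in for the vhost shm helpers from patch 2/7:

    #include <stdio.h>

    static int  init_shm(void)  { puts("realize: query size, allocate memfd"); return 0; }
    static int  set_shm(void)   { puts("start: pass the fd to the backend");   return 0; }
    static void reset_shm(void) { puts("reset: zero the region"); }
    static void free_shm(void)  { puts("unrealize: unmap and close the fd"); }

    int main(void)
    {
        init_shm();    /* vhost_user_blk_device_realize() */
        set_shm();     /* vhost_user_blk_start(), before vhost_dev_start() */
        reset_shm();   /* vhost_user_blk_reset() */
        free_shm();    /* vhost_user_blk_device_unrealize() */
        return 0;
    }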
Signed-off-by: Xie Yongji <xieyongji@baidu.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
---
hw/block/vhost-user-blk.c | 26 ++++++++++++++++++++++++++
include/hw/virtio/vhost-user-blk.h | 1 +
2 files changed, 27 insertions(+)
diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 1451940845..27028cf996 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -126,6 +126,13 @@ static void vhost_user_blk_start(VirtIODevice *vdev)
}
s->dev.acked_features = vdev->guest_features;
+
+ ret = vhost_dev_set_shm(&s->dev, s->shm);
+ if (ret < 0) {
+ error_report("Error set shared memory: %d", -ret);
+ goto err_guest_notifiers;
+ }
+
ret = vhost_dev_start(&s->dev, vdev);
if (ret < 0) {
error_report("Error starting vhost: %d", -ret);
@@ -245,6 +252,13 @@ static void vhost_user_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
}
}
+static void vhost_user_blk_reset(VirtIODevice *vdev)
+{
+ VHostUserBlk *s = VHOST_USER_BLK(vdev);
+
+ vhost_dev_reset_shm(s->shm);
+}
+
static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
{
VirtIODevice *vdev = VIRTIO_DEVICE(dev);
@@ -284,6 +298,8 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
vhost_user_blk_handle_output);
}
+ s->shm = g_new0(struct vhost_shm, 1);
+
s->dev.nvqs = s->num_queues;
s->dev.vqs = g_new(struct vhost_virtqueue, s->dev.nvqs);
s->dev.vq_index = 0;
@@ -309,12 +325,19 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
s->blkcfg.num_queues = s->num_queues;
}
+ ret = vhost_dev_init_shm(&s->dev, s->shm);
+ if (ret < 0) {
+ error_setg(errp, "vhost-user-blk: init shared memory failed");
+ goto vhost_err;
+ }
+
return;
vhost_err:
vhost_dev_cleanup(&s->dev);
virtio_err:
g_free(s->dev.vqs);
+ g_free(s->shm);
virtio_cleanup(vdev);
vhost_user_cleanup(user);
@@ -329,7 +352,9 @@ static void vhost_user_blk_device_unrealize(DeviceState *dev, Error **errp)
vhost_user_blk_set_status(vdev, 0);
vhost_dev_cleanup(&s->dev);
+ vhost_dev_free_shm(s->shm);
g_free(s->dev.vqs);
+ g_free(s->shm);
virtio_cleanup(vdev);
if (s->vhost_user) {
@@ -379,6 +404,7 @@ static void vhost_user_blk_class_init(ObjectClass *klass, void *data)
vdc->set_config = vhost_user_blk_set_config;
vdc->get_features = vhost_user_blk_get_features;
vdc->set_status = vhost_user_blk_set_status;
+ vdc->reset = vhost_user_blk_reset;
}
static const TypeInfo vhost_user_blk_info = {
diff --git a/include/hw/virtio/vhost-user-blk.h b/include/hw/virtio/vhost-user-blk.h
index d52944aeeb..bb706d70b3 100644
--- a/include/hw/virtio/vhost-user-blk.h
+++ b/include/hw/virtio/vhost-user-blk.h
@@ -36,6 +36,7 @@ typedef struct VHostUserBlk {
uint32_t queue_size;
uint32_t config_wce;
struct vhost_dev dev;
+ struct vhost_shm *shm;
VhostUserState *vhost_user;
} VHostUserBlk;
--
2.17.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [Qemu-devel] [PATCH v3 for-4.0 6/7] vhost-user-blk: Add support to reconnect backend
2019-01-03 10:18 [Qemu-devel] [PATCH v3 for-4.0 0/7] vhost-user-blk: Add support for backend reconnecting elohimes
` (4 preceding siblings ...)
2019-01-03 10:18 ` [Qemu-devel] [PATCH v3 for-4.0 5/7] vhost-user-blk: Add support to provide shared memory to backend elohimes
@ 2019-01-03 10:18 ` elohimes
2019-01-03 10:18 ` [Qemu-devel] [PATCH v3 for-4.0 7/7] contrib/vhost-user-blk: enable inflight I/O recording elohimes
6 siblings, 0 replies; 14+ messages in thread
From: elohimes @ 2019-01-03 10:18 UTC (permalink / raw)
To: mst, marcandre.lureau, berrange, jasowang, maxime.coquelin,
yury-kotov, wrfsh
Cc: qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji
From: Xie Yongji <xieyongji@baidu.com>
Since we now support the messages VHOST_USER_GET_SHM_SIZE
and VHOST_USER_SET_SHM_FD, the backend is able to restart
safely because it can record inflight I/O in shared memory.
This patch allows qemu to reconnect to the backend after
the connection is closed.
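Condensed from vhost_user_blk_event() in the diff below, the reconnect
handling amounts to this standalone sketch (the event codes here are
illustrative; QEMU's chardev layer defines the real ones):

    #include <stdbool.h>
    #include <stdio.h>

    enum { EV_OPENED, EV_CLOSED };

    static bool connected;

    static void on_chardev_event(int event)
    {
        switch (event) {
        case EV_OPENED:
            if (!connected) {
                connected = true;    /* re-run vhost_dev_init(), restart if needed */
                puts("backend connected");
            }
            break;
        case EV_CLOSED:
            if (connected) {
                connected = false;   /* stop vhost, drop the watch, await retry */
                puts("backend disconnected, waiting for reconnect");
            }
            break;
        }
    }

    int main(void)
    {
        on_chardev_event(EV_OPENED);
        on_chardev_event(EV_CLOSED);
        return 0;
    }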
Signed-off-by: Xie Yongji <xieyongji@baidu.com>
Signed-off-by: Ni Xun <nixun@baidu.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
---
hw/block/vhost-user-blk.c | 205 +++++++++++++++++++++++------
include/hw/virtio/vhost-user-blk.h | 4 +
2 files changed, 168 insertions(+), 41 deletions(-)
diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 27028cf996..c0725e8e4a 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -101,7 +101,7 @@ const VhostDevConfigOps blk_ops = {
.vhost_dev_config_notifier = vhost_user_blk_handle_config_change,
};
-static void vhost_user_blk_start(VirtIODevice *vdev)
+static int vhost_user_blk_start(VirtIODevice *vdev)
{
VHostUserBlk *s = VHOST_USER_BLK(vdev);
BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
@@ -110,13 +110,13 @@ static void vhost_user_blk_start(VirtIODevice *vdev)
if (!k->set_guest_notifiers) {
error_report("binding does not support guest notifiers");
- return;
+ return -ENOSYS;
}
ret = vhost_dev_enable_notifiers(&s->dev, vdev);
if (ret < 0) {
error_report("Error enabling host notifiers: %d", -ret);
- return;
+ return ret;
}
ret = k->set_guest_notifiers(qbus->parent, s->dev.nvqs, true);
@@ -147,12 +147,13 @@ static void vhost_user_blk_start(VirtIODevice *vdev)
vhost_virtqueue_mask(&s->dev, vdev, i, false);
}
- return;
+ return ret;
err_guest_notifiers:
k->set_guest_notifiers(qbus->parent, s->dev.nvqs, false);
err_host_notifiers:
vhost_dev_disable_notifiers(&s->dev, vdev);
+ return ret;
}
static void vhost_user_blk_stop(VirtIODevice *vdev)
@@ -171,7 +172,6 @@ static void vhost_user_blk_stop(VirtIODevice *vdev)
ret = k->set_guest_notifiers(qbus->parent, s->dev.nvqs, false);
if (ret < 0) {
error_report("vhost guest notifier cleanup failed: %d", ret);
- return;
}
vhost_dev_disable_notifiers(&s->dev, vdev);
@@ -181,21 +181,43 @@ static void vhost_user_blk_set_status(VirtIODevice *vdev, uint8_t status)
{
VHostUserBlk *s = VHOST_USER_BLK(vdev);
bool should_start = status & VIRTIO_CONFIG_S_DRIVER_OK;
+ int ret;
if (!vdev->vm_running) {
should_start = false;
}
- if (s->dev.started == should_start) {
+ if (s->should_start == should_start) {
+ return;
+ }
+
+ if (!s->connected || s->dev.started == should_start) {
+ s->should_start = should_start;
return;
}
if (should_start) {
- vhost_user_blk_start(vdev);
+ s->should_start = true;
+ /*
+ * make sure vhost_user_blk_handle_output() ignores fake
+ * guest kick by vhost_dev_enable_notifiers()
+ */
+ barrier();
+ ret = vhost_user_blk_start(vdev);
+ if (ret < 0) {
+ error_report("vhost-user-blk: vhost start failed: %s",
+ strerror(-ret));
+ qemu_chr_fe_disconnect(&s->chardev);
+ }
} else {
vhost_user_blk_stop(vdev);
+ /*
+ * make sure vhost_user_blk_handle_output() ignores fake
+ * guest kick by vhost_dev_disable_notifiers()
+ */
+ barrier();
+ s->should_start = false;
}
-
}
static uint64_t vhost_user_blk_get_features(VirtIODevice *vdev,
@@ -225,13 +247,22 @@ static uint64_t vhost_user_blk_get_features(VirtIODevice *vdev,
static void vhost_user_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
{
VHostUserBlk *s = VHOST_USER_BLK(vdev);
- int i;
+ int i, ret;
if (!(virtio_host_has_feature(vdev, VIRTIO_F_VERSION_1) &&
!virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1))) {
return;
}
+ if (s->should_start) {
+ return;
+ }
+ s->should_start = true;
+
+ if (!s->connected) {
+ return;
+ }
+
if (s->dev.started) {
return;
}
@@ -239,7 +270,13 @@ static void vhost_user_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
/* Some guests kick before setting VIRTIO_CONFIG_S_DRIVER_OK so start
* vhost here instead of waiting for .set_status().
*/
- vhost_user_blk_start(vdev);
+ ret = vhost_user_blk_start(vdev);
+ if (ret < 0) {
+ error_report("vhost-user-blk: vhost start failed: %s",
+ strerror(-ret));
+ qemu_chr_fe_disconnect(&s->chardev);
+ return;
+ }
/* Kick right away to begin processing requests already in vring */
for (i = 0; i < s->dev.nvqs; i++) {
@@ -259,12 +296,105 @@ static void vhost_user_blk_reset(VirtIODevice *vdev)
vhost_dev_reset_shm(s->shm);
}
+static int vhost_user_blk_connect(DeviceState *dev)
+{
+ VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+ VHostUserBlk *s = VHOST_USER_BLK(vdev);
+ int ret = 0;
+
+ if (s->connected) {
+ return 0;
+ }
+ s->connected = true;
+
+ s->dev.nvqs = s->num_queues;
+ s->dev.vqs = s->vqs;
+ s->dev.vq_index = 0;
+ s->dev.backend_features = 0;
+
+ vhost_dev_set_config_notifier(&s->dev, &blk_ops);
+
+ ret = vhost_dev_init(&s->dev, s->vhost_user, VHOST_BACKEND_TYPE_USER, 0);
+ if (ret < 0) {
+ error_report("vhost-user-blk: vhost initialization failed: %s",
+ strerror(-ret));
+ return ret;
+ }
+
+ /* restore vhost state */
+ if (s->should_start) {
+ ret = vhost_user_blk_start(vdev);
+ if (ret < 0) {
+ error_report("vhost-user-blk: vhost start failed: %s",
+ strerror(-ret));
+ return ret;
+ }
+ }
+
+ return 0;
+}
+
+static void vhost_user_blk_disconnect(DeviceState *dev)
+{
+ VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+ VHostUserBlk *s = VHOST_USER_BLK(vdev);
+
+ if (!s->connected) {
+ return;
+ }
+ s->connected = false;
+
+ if (s->dev.started) {
+ vhost_user_blk_stop(vdev);
+ }
+
+ vhost_dev_cleanup(&s->dev);
+}
+
+static gboolean vhost_user_blk_watch(GIOChannel *chan, GIOCondition cond,
+ void *opaque)
+{
+ DeviceState *dev = opaque;
+ VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+ VHostUserBlk *s = VHOST_USER_BLK(vdev);
+
+ qemu_chr_fe_disconnect(&s->chardev);
+
+ return true;
+}
+
+static void vhost_user_blk_event(void *opaque, int event)
+{
+ DeviceState *dev = opaque;
+ VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+ VHostUserBlk *s = VHOST_USER_BLK(vdev);
+
+ switch (event) {
+ case CHR_EVENT_OPENED:
+ if (vhost_user_blk_connect(dev) < 0) {
+ qemu_chr_fe_disconnect(&s->chardev);
+ return;
+ }
+ s->watch = qemu_chr_fe_add_watch(&s->chardev, G_IO_HUP,
+ vhost_user_blk_watch, dev);
+ break;
+ case CHR_EVENT_CLOSED:
+ vhost_user_blk_disconnect(dev);
+ if (s->watch) {
+ g_source_remove(s->watch);
+ s->watch = 0;
+ }
+ break;
+ }
+}
+
static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
{
VirtIODevice *vdev = VIRTIO_DEVICE(dev);
VHostUserBlk *s = VHOST_USER_BLK(vdev);
VhostUserState *user;
int i, ret;
+ Error *err = NULL;
if (!s->chardev.chr) {
error_setg(errp, "vhost-user-blk: chardev is mandatory");
@@ -299,26 +429,28 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
}
s->shm = g_new0(struct vhost_shm, 1);
-
- s->dev.nvqs = s->num_queues;
- s->dev.vqs = g_new(struct vhost_virtqueue, s->dev.nvqs);
- s->dev.vq_index = 0;
- s->dev.backend_features = 0;
-
- vhost_dev_set_config_notifier(&s->dev, &blk_ops);
-
- ret = vhost_dev_init(&s->dev, s->vhost_user, VHOST_BACKEND_TYPE_USER, 0);
- if (ret < 0) {
- error_setg(errp, "vhost-user-blk: vhost initialization failed: %s",
- strerror(-ret));
- goto virtio_err;
- }
+ s->vqs = g_new(struct vhost_virtqueue, s->num_queues);
+ s->watch = 0;
+ s->should_start = false;
+ s->connected = false;
+
+ qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, vhost_user_blk_event,
+ NULL, (void *)dev, NULL, true);
+
+reconnect:
+ do {
+ if (qemu_chr_fe_wait_connected(&s->chardev, &err) < 0) {
+ error_report_err(err);
+ err = NULL;
+ sleep(1);
+ }
+ } while (!s->connected);
ret = vhost_dev_get_config(&s->dev, (uint8_t *)&s->blkcfg,
- sizeof(struct virtio_blk_config));
+ sizeof(struct virtio_blk_config));
if (ret < 0) {
- error_setg(errp, "vhost-user-blk: get block config failed");
- goto vhost_err;
+ error_report("vhost-user-blk: get block config failed");
+ goto reconnect;
}
if (s->blkcfg.num_queues != s->num_queues) {
@@ -327,22 +459,11 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
ret = vhost_dev_init_shm(&s->dev, s->shm);
if (ret < 0) {
- error_setg(errp, "vhost-user-blk: init shared memory failed");
- goto vhost_err;
+ error_report("vhost-user-blk: init shared memory failed");
+ goto reconnect;
}
return;
-
-vhost_err:
- vhost_dev_cleanup(&s->dev);
-virtio_err:
- g_free(s->dev.vqs);
- g_free(s->shm);
- virtio_cleanup(vdev);
-
- vhost_user_cleanup(user);
- g_free(user);
- s->vhost_user = NULL;
}
static void vhost_user_blk_device_unrealize(DeviceState *dev, Error **errp)
@@ -351,9 +472,11 @@ static void vhost_user_blk_device_unrealize(DeviceState *dev, Error **errp)
VHostUserBlk *s = VHOST_USER_BLK(dev);
vhost_user_blk_set_status(vdev, 0);
+ qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, NULL,
+ NULL, NULL, NULL, false);
vhost_dev_cleanup(&s->dev);
vhost_dev_free_shm(s->shm);
- g_free(s->dev.vqs);
+ g_free(s->vqs);
g_free(s->shm);
virtio_cleanup(vdev);
diff --git a/include/hw/virtio/vhost-user-blk.h b/include/hw/virtio/vhost-user-blk.h
index bb706d70b3..c17d47402b 100644
--- a/include/hw/virtio/vhost-user-blk.h
+++ b/include/hw/virtio/vhost-user-blk.h
@@ -38,6 +38,10 @@ typedef struct VHostUserBlk {
struct vhost_dev dev;
struct vhost_shm *shm;
VhostUserState *vhost_user;
+ struct vhost_virtqueue *vqs;
+ guint watch;
+ bool should_start;
+ bool connected;
} VHostUserBlk;
#endif
--
2.17.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [Qemu-devel] [PATCH v3 for-4.0 7/7] contrib/vhost-user-blk: enable inflight I/O recording
2019-01-03 10:18 [Qemu-devel] [PATCH v3 for-4.0 0/7] vhost-user-blk: Add support for backend reconnecting elohimes
` (5 preceding siblings ...)
2019-01-03 10:18 ` [Qemu-devel] [PATCH v3 for-4.0 6/7] vhost-user-blk: Add support to reconnect backend elohimes
@ 2019-01-03 10:18 ` elohimes
6 siblings, 0 replies; 14+ messages in thread
From: elohimes @ 2019-01-03 10:18 UTC (permalink / raw)
To: mst, marcandre.lureau, berrange, jasowang, maxime.coquelin,
yury-kotov, wrfsh
Cc: qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji
From: Xie Yongji <xieyongji@baidu.com>
This patch enables inflight I/O recording for the
vhost-user-blk backend so that it can be restarted safely.
Signed-off-by: Xie Yongji <xieyongji@baidu.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
---
contrib/vhost-user-blk/vhost-user-blk.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/contrib/vhost-user-blk/vhost-user-blk.c b/contrib/vhost-user-blk/vhost-user-blk.c
index 858221ad95..6d9a417f4a 100644
--- a/contrib/vhost-user-blk/vhost-user-blk.c
+++ b/contrib/vhost-user-blk/vhost-user-blk.c
@@ -327,7 +327,8 @@ vub_get_features(VuDev *dev)
static uint64_t
vub_get_protocol_features(VuDev *dev)
{
- return 1ull << VHOST_USER_PROTOCOL_F_CONFIG;
+ return 1ull << VHOST_USER_PROTOCOL_F_CONFIG |
+ 1ull << VHOST_USER_PROTOCOL_F_SLAVE_SHMFD;
}
static int
--
2.17.1
^ permalink raw reply related [flat|nested] 14+ messages in thread