* [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication
@ 2017-12-05 3:33 Wei Wang
2017-12-05 3:33 ` [virtio-dev] [PATCH v3 1/7] vhost-user: share the vhost-user protocol related structures Wei Wang
` (12 more replies)
0 siblings, 13 replies; 77+ messages in thread
From: Wei Wang @ 2017-12-05 3:33 UTC
To: virtio-dev, qemu-devel, mst, marcandre.lureau, jasowang, stefanha,
pbonzini
Cc: jan.kiszka, avi.cohen, zhiyong.yang, Wei Wang
Vhost-pci is a point-to-point based inter-VM communication solution. This
patch series implements the vhost-pci-net device setup and emulation. The
device is implemented as a virtio device, and it is set up via the
vhost-user protocol to get the necessary info (e.g. the memory info of the
remote VM and the vring info).
Currently, only the fundamental functions are implemented. More features,
such as MQ and live migration, will be added in the future.
The DPDK PMD for vhost-pci has been posted to the DPDK mailing list here:
http://dpdk.org/ml/archives/dev/2017-November/082615.html
v2->v3 changes:
1) static device creation: instead of creating and hot-plugging the
device when receiving a vhost-user msg, the device is now created
via the QEMU boot command line.
2) remove vqs: rq and ctrlq are removed in this version.
- receive vq: the receive vq is not needed anymore. The PMD directly
shares the remote txq and rxq: it takes packets from the remote txq
to receive, and places packets on the remote rxq to send.
- ctrlq: the ctrlq is replaced by the first 4KB metadata area of the
device Bar-2.
3) simpler implementation: the entire implementation has been tailored
from ~1800 LOC to ~850 LOC.
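
For readers following item 2 above, here is a minimal sketch of how a
driver might use the 4KB metadata area to locate remote memory inside
BAR 2. The struct layouts are copied from vhost_pci_net.h in patch 2/7
(with the trailing vq array left out). The helper name
vpnet_gpa_to_bar_offset is hypothetical, and the code assumes that the
remote regions are mapped back to back, in table order, right after the
metadata area. The cover letter does not state that layout, so treat it
as an illustration only.

```c
#include <stddef.h>
#include <stdint.h>

#define METADATA_SIZE 4096
#define MAX_REMOTE_REGION 8

/* Layouts copied from vhost_pci_net.h in patch 2/7 (vq array omitted). */
struct vpnet_remote_mem {
    uint64_t gpa;
    uint64_t size;
};

struct vpnet_metadata {
    uint32_t nregions;
    uint32_t nvqs;
    struct vpnet_remote_mem mem[MAX_REMOTE_REGION];
};

/*
 * Hypothetical helper: translate a remote guest-physical address into an
 * offset inside BAR 2.
 * ASSUMPTION: remote regions are mapped back to back, in table order,
 * starting right after the 4KB metadata area.
 * Returns UINT64_MAX if no region covers the address.
 */
static uint64_t vpnet_gpa_to_bar_offset(const struct vpnet_metadata *md,
                                        uint64_t gpa)
{
    uint64_t base = METADATA_SIZE;
    uint32_t i;

    for (i = 0; i < md->nregions && i < MAX_REMOTE_REGION; i++) {
        const struct vpnet_remote_mem *r = &md->mem[i];

        /* Written as a subtraction so that gpa + size cannot overflow. */
        if (gpa >= r->gpa && gpa - r->gpa < r->size) {
            return base + (gpa - r->gpa);
        }
        base += r->size;
    }
    return UINT64_MAX;
}
```

A PMD could apply the same translation to the desc_gpa, avail_gpa and
used_gpa fields of each vpnet_remote_vq entry to reach the remote vrings
through the BAR.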
Wei Wang (7):
vhost-user: share the vhost-user protocol related structures
vhost-pci-net: add vhost-pci-net
virtio/virtio-pci.c: add vhost-pci-net-pci
vhost-pci-slave: add vhost-pci slave implementation
vhost-user: VHOST_USER_SET_VHOST_PCI msg
vhost-pci-slave: handle VHOST_USER_SET_VHOST_PCI
virtio/vhost.c: vhost-pci needs remote gpa
hw/net/Makefile.objs | 2 +-
hw/net/vhost_net.c | 37 +++
hw/net/vhost_pci_net.c | 137 +++++++++++
hw/virtio/Makefile.objs | 1 +
hw/virtio/vhost-pci-slave.c | 310 +++++++++++++++++++++++++
hw/virtio/vhost-user.c | 117 ++--------
hw/virtio/vhost.c | 63 +++--
hw/virtio/virtio-pci.c | 55 +++++
hw/virtio/virtio-pci.h | 14 ++
include/hw/pci/pci.h | 1 +
include/hw/virtio/vhost-backend.h | 2 +
include/hw/virtio/vhost-pci-net.h | 42 ++++
include/hw/virtio/vhost-pci-slave.h | 12 +
include/hw/virtio/vhost-user.h | 108 +++++++++
include/hw/virtio/vhost.h | 2 +
include/net/vhost_net.h | 2 +
include/standard-headers/linux/vhost_pci_net.h | 65 ++++++
include/standard-headers/linux/virtio_ids.h | 1 +
18 files changed, 851 insertions(+), 120 deletions(-)
create mode 100644 hw/net/vhost_pci_net.c
create mode 100644 hw/virtio/vhost-pci-slave.c
create mode 100644 include/hw/virtio/vhost-pci-net.h
create mode 100644 include/hw/virtio/vhost-pci-slave.h
create mode 100644 include/hw/virtio/vhost-user.h
create mode 100644 include/standard-headers/linux/vhost_pci_net.h
--
2.7.4
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
^ permalink raw reply [flat|nested] 77+ messages in thread* [virtio-dev] [PATCH v3 1/7] vhost-user: share the vhost-user protocol related structures 2017-12-05 3:33 [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication Wei Wang @ 2017-12-05 3:33 ` Wei Wang 2017-12-05 3:33 ` [virtio-dev] [PATCH v3 2/7] vhost-pci-net: add vhost-pci-net Wei Wang ` (11 subsequent siblings) 12 siblings, 0 replies; 77+ messages in thread From: Wei Wang @ 2017-12-05 3:33 UTC (permalink / raw) To: virtio-dev, qemu-devel, mst, marcandre.lureau, jasowang, stefanha, pbonzini Cc: jan.kiszka, avi.cohen, zhiyong.yang, Wei Wang Put the vhost-user protocol related data structures to vhost-user.h, so that they can be used in other implementations (e.g. a slave implementation). Signed-off-by: Wei Wang <wei.w.wang@intel.com> --- hw/virtio/vhost-user.c | 100 +------------------------------------- include/hw/virtio/vhost-user.h | 106 +++++++++++++++++++++++++++++++++++++++++ 2 files changed, 107 insertions(+), 99 deletions(-) create mode 100644 include/hw/virtio/vhost-user.h diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index 093675e..e512f5a 100644 --- a/hw/virtio/vhost-user.c +++ b/hw/virtio/vhost-user.c @@ -12,6 +12,7 @@ #include "qapi/error.h" #include "hw/virtio/vhost.h" #include "hw/virtio/vhost-backend.h" +#include "hw/virtio/vhost-user.h" #include "hw/virtio/virtio-net.h" #include "chardev/char-fe.h" #include "sysemu/kvm.h" @@ -23,105 +24,6 @@ #include <sys/un.h> #include <linux/vhost.h> -#define VHOST_MEMORY_MAX_NREGIONS 8 -#define VHOST_USER_F_PROTOCOL_FEATURES 30 - -enum VhostUserProtocolFeature { - VHOST_USER_PROTOCOL_F_MQ = 0, - VHOST_USER_PROTOCOL_F_LOG_SHMFD = 1, - VHOST_USER_PROTOCOL_F_RARP = 2, - VHOST_USER_PROTOCOL_F_REPLY_ACK = 3, - VHOST_USER_PROTOCOL_F_NET_MTU = 4, - VHOST_USER_PROTOCOL_F_SLAVE_REQ = 5, - VHOST_USER_PROTOCOL_F_CROSS_ENDIAN = 6, - - VHOST_USER_PROTOCOL_F_MAX -}; - -#define VHOST_USER_PROTOCOL_FEATURE_MASK ((1 << VHOST_USER_PROTOCOL_F_MAX) - 
1) - -typedef enum VhostUserRequest { - VHOST_USER_NONE = 0, - VHOST_USER_GET_FEATURES = 1, - VHOST_USER_SET_FEATURES = 2, - VHOST_USER_SET_OWNER = 3, - VHOST_USER_RESET_OWNER = 4, - VHOST_USER_SET_MEM_TABLE = 5, - VHOST_USER_SET_LOG_BASE = 6, - VHOST_USER_SET_LOG_FD = 7, - VHOST_USER_SET_VRING_NUM = 8, - VHOST_USER_SET_VRING_ADDR = 9, - VHOST_USER_SET_VRING_BASE = 10, - VHOST_USER_GET_VRING_BASE = 11, - VHOST_USER_SET_VRING_KICK = 12, - VHOST_USER_SET_VRING_CALL = 13, - VHOST_USER_SET_VRING_ERR = 14, - VHOST_USER_GET_PROTOCOL_FEATURES = 15, - VHOST_USER_SET_PROTOCOL_FEATURES = 16, - VHOST_USER_GET_QUEUE_NUM = 17, - VHOST_USER_SET_VRING_ENABLE = 18, - VHOST_USER_SEND_RARP = 19, - VHOST_USER_NET_SET_MTU = 20, - VHOST_USER_SET_SLAVE_REQ_FD = 21, - VHOST_USER_IOTLB_MSG = 22, - VHOST_USER_SET_VRING_ENDIAN = 23, - VHOST_USER_MAX -} VhostUserRequest; - -typedef enum VhostUserSlaveRequest { - VHOST_USER_SLAVE_NONE = 0, - VHOST_USER_SLAVE_IOTLB_MSG = 1, - VHOST_USER_SLAVE_MAX -} VhostUserSlaveRequest; - -typedef struct VhostUserMemoryRegion { - uint64_t guest_phys_addr; - uint64_t memory_size; - uint64_t userspace_addr; - uint64_t mmap_offset; -} VhostUserMemoryRegion; - -typedef struct VhostUserMemory { - uint32_t nregions; - uint32_t padding; - VhostUserMemoryRegion regions[VHOST_MEMORY_MAX_NREGIONS]; -} VhostUserMemory; - -typedef struct VhostUserLog { - uint64_t mmap_size; - uint64_t mmap_offset; -} VhostUserLog; - -typedef struct VhostUserMsg { - VhostUserRequest request; - -#define VHOST_USER_VERSION_MASK (0x3) -#define VHOST_USER_REPLY_MASK (0x1<<2) -#define VHOST_USER_NEED_REPLY_MASK (0x1 << 3) - uint32_t flags; - uint32_t size; /* the following payload size */ - union { -#define VHOST_USER_VRING_IDX_MASK (0xff) -#define VHOST_USER_VRING_NOFD_MASK (0x1<<8) - uint64_t u64; - struct vhost_vring_state state; - struct vhost_vring_addr addr; - VhostUserMemory memory; - VhostUserLog log; - struct vhost_iotlb_msg iotlb; - } payload; -} QEMU_PACKED VhostUserMsg; - -static 
VhostUserMsg m __attribute__ ((unused)); -#define VHOST_USER_HDR_SIZE (sizeof(m.request) \ - + sizeof(m.flags) \ - + sizeof(m.size)) - -#define VHOST_USER_PAYLOAD_SIZE (sizeof(m) - VHOST_USER_HDR_SIZE) - -/* The version of the protocol we support */ -#define VHOST_USER_VERSION (0x1) - struct vhost_user { CharBackend *chr; int slave_fd; diff --git a/include/hw/virtio/vhost-user.h b/include/hw/virtio/vhost-user.h new file mode 100644 index 0000000..d76e9ad --- /dev/null +++ b/include/hw/virtio/vhost-user.h @@ -0,0 +1,106 @@ +#ifndef VHOST_USER_H +#define VHOST_USER_H + +#include <linux/vhost.h> + +#define VHOST_MEMORY_MAX_NREGIONS 8 +#define VHOST_USER_F_PROTOCOL_FEATURES 30 + +enum VhostUserProtocolFeature { + VHOST_USER_PROTOCOL_F_MQ = 0, + VHOST_USER_PROTOCOL_F_LOG_SHMFD = 1, + VHOST_USER_PROTOCOL_F_RARP = 2, + VHOST_USER_PROTOCOL_F_REPLY_ACK = 3, + VHOST_USER_PROTOCOL_F_NET_MTU = 4, + VHOST_USER_PROTOCOL_F_SLAVE_REQ = 5, + VHOST_USER_PROTOCOL_F_CROSS_ENDIAN = 6, + + VHOST_USER_PROTOCOL_F_MAX +}; + +#define VHOST_USER_PROTOCOL_FEATURE_MASK ((1 << VHOST_USER_PROTOCOL_F_MAX) - 1) + +typedef enum VhostUserRequest { + VHOST_USER_NONE = 0, + VHOST_USER_GET_FEATURES = 1, + VHOST_USER_SET_FEATURES = 2, + VHOST_USER_SET_OWNER = 3, + VHOST_USER_RESET_OWNER = 4, + VHOST_USER_SET_MEM_TABLE = 5, + VHOST_USER_SET_LOG_BASE = 6, + VHOST_USER_SET_LOG_FD = 7, + VHOST_USER_SET_VRING_NUM = 8, + VHOST_USER_SET_VRING_ADDR = 9, + VHOST_USER_SET_VRING_BASE = 10, + VHOST_USER_GET_VRING_BASE = 11, + VHOST_USER_SET_VRING_KICK = 12, + VHOST_USER_SET_VRING_CALL = 13, + VHOST_USER_SET_VRING_ERR = 14, + VHOST_USER_GET_PROTOCOL_FEATURES = 15, + VHOST_USER_SET_PROTOCOL_FEATURES = 16, + VHOST_USER_GET_QUEUE_NUM = 17, + VHOST_USER_SET_VRING_ENABLE = 18, + VHOST_USER_SEND_RARP = 19, + VHOST_USER_NET_SET_MTU = 20, + VHOST_USER_SET_SLAVE_REQ_FD = 21, + VHOST_USER_IOTLB_MSG = 22, + VHOST_USER_SET_VRING_ENDIAN = 23, + VHOST_USER_MAX +} VhostUserRequest; + +typedef enum VhostUserSlaveRequest { + 
VHOST_USER_SLAVE_NONE = 0, + VHOST_USER_SLAVE_IOTLB_MSG = 1, + VHOST_USER_SLAVE_MAX +} VhostUserSlaveRequest; + +typedef struct VhostUserMemoryRegion { + uint64_t guest_phys_addr; + uint64_t memory_size; + uint64_t userspace_addr; + uint64_t mmap_offset; +} VhostUserMemoryRegion; + +typedef struct VhostUserMemory { + uint32_t nregions; + uint32_t padding; + VhostUserMemoryRegion regions[VHOST_MEMORY_MAX_NREGIONS]; +} VhostUserMemory; + +typedef struct VhostUserLog { + uint64_t mmap_size; + uint64_t mmap_offset; +} VhostUserLog; + +typedef struct VhostUserMsg { + VhostUserRequest request; + +#define VHOST_USER_VERSION_MASK (0x3) +#define VHOST_USER_REPLY_MASK (0x1 << 2) +#define VHOST_USER_NEED_REPLY_MASK (0x1 << 3) + uint32_t flags; + /* The following payload size */ + uint32_t size; + union { +#define VHOST_USER_VRING_IDX_MASK (0xff) +#define VHOST_USER_VRING_NOFD_MASK (0x1 << 8) + uint64_t u64; + struct vhost_vring_state state; + struct vhost_vring_addr addr; + VhostUserMemory memory; + VhostUserLog log; + struct vhost_iotlb_msg iotlb; + } payload; +} QEMU_PACKED VhostUserMsg; + +static VhostUserMsg m __attribute__ ((unused)); +#define VHOST_USER_HDR_SIZE (sizeof(m.request) \ + + sizeof(m.flags) \ + + sizeof(m.size)) + +#define VHOST_USER_PAYLOAD_SIZE (sizeof(m) - VHOST_USER_HDR_SIZE) + +/* The version of the protocol we support */ +#define VHOST_USER_VERSION (0x1) + +#endif -- 2.7.4 --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply related [flat|nested] 77+ messages in thread
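
The header and payload size macros moved by patch 1/7 can be checked
with a small standalone sketch. This is not the QEMU code: the Demo*
names are made up, the request field is written as uint32_t (the patch
uses an enum, which is 4 bytes with the usual GCC/Clang ABIs), and the
payload union is reduced to its u64 member so that no Linux headers are
needed. The packed attribute is the GCC/Clang spelling and stands in
for QEMU_PACKED.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for VhostUserMsg; see the assumptions above. */
typedef struct DemoVhostUserMsg {
    uint32_t request;
    uint32_t flags;
    uint32_t size;          /* size of the payload that follows */
    union {
        uint64_t u64;
    } payload;
} __attribute__((packed)) DemoVhostUserMsg;

/* Header = request + flags + size, i.e. everything before the payload. */
#define DEMO_HDR_SIZE      offsetof(DemoVhostUserMsg, payload)
#define DEMO_PAYLOAD_SIZE  (sizeof(DemoVhostUserMsg) - DEMO_HDR_SIZE)

/*
 * Hypothetical receive-side check: after reading the fixed header, only
 * accept a message whose announced payload fits the payload union.
 * Returns 1 if the size is acceptable, 0 otherwise.
 */
static int demo_msg_size_ok(const DemoVhostUserMsg *msg)
{
    return msg->size <= DEMO_PAYLOAD_SIZE;
}
```

Because the struct is packed, the three 32-bit header fields occupy
exactly 12 bytes with no padding before the payload, which is what lets
a slave implementation read the header first and the payload second.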
* [virtio-dev] [PATCH v3 2/7] vhost-pci-net: add vhost-pci-net 2017-12-05 3:33 [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication Wei Wang 2017-12-05 3:33 ` [virtio-dev] [PATCH v3 1/7] vhost-user: share the vhost-user protocol related structures Wei Wang @ 2017-12-05 3:33 ` Wei Wang 2017-12-05 14:59 ` Stefan Hajnoczi 2017-12-05 3:33 ` [virtio-dev] [PATCH v3 3/7] virtio/virtio-pci.c: add vhost-pci-net-pci Wei Wang ` (10 subsequent siblings) 12 siblings, 1 reply; 77+ messages in thread From: Wei Wang @ 2017-12-05 3:33 UTC (permalink / raw) To: virtio-dev, qemu-devel, mst, marcandre.lureau, jasowang, stefanha, pbonzini Cc: jan.kiszka, avi.cohen, zhiyong.yang, Wei Wang Add the vhost-pci-net device emulation. The device uses bar 2 to expose the remote VM's memory to the guest. The first 4KB of the the bar area stores the metadata which describes the remote memory and vring info. Signed-off-by: Wei Wang <wei.w.wang@intel.com> --- hw/net/Makefile.objs | 2 +- hw/net/vhost_pci_net.c | 106 +++++++++++++++++++++++++ include/hw/virtio/vhost-pci-net.h | 37 +++++++++ include/standard-headers/linux/vhost_pci_net.h | 60 ++++++++++++++ include/standard-headers/linux/virtio_ids.h | 1 + 5 files changed, 205 insertions(+), 1 deletion(-) create mode 100644 hw/net/vhost_pci_net.c create mode 100644 include/hw/virtio/vhost-pci-net.h create mode 100644 include/standard-headers/linux/vhost_pci_net.h diff --git a/hw/net/Makefile.objs b/hw/net/Makefile.objs index 4171af0..8b392fb 100644 --- a/hw/net/Makefile.objs +++ b/hw/net/Makefile.objs @@ -36,7 +36,7 @@ obj-$(CONFIG_MILKYMIST) += milkymist-minimac2.o obj-$(CONFIG_PSERIES) += spapr_llan.o obj-$(CONFIG_XILINX_ETHLITE) += xilinx_ethlite.o -obj-$(CONFIG_VIRTIO) += virtio-net.o +obj-$(CONFIG_VIRTIO) += virtio-net.o vhost_pci_net.o obj-y += vhost_net.o obj-$(CONFIG_ETSEC) += fsl_etsec/etsec.o fsl_etsec/registers.o \ diff --git a/hw/net/vhost_pci_net.c b/hw/net/vhost_pci_net.c new file mode 100644 index 0000000..db8f954 --- 
/dev/null +++ b/hw/net/vhost_pci_net.c @@ -0,0 +1,106 @@ +/* + * vhost-pci-net support + * + * Copyright Intel, Inc. 2017 + * + * Authors: + * Wei Wang <wei.w.wang@intel.com> + * Zhiyong Yang <zhiyong.yang@intel.com> + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. + * + * Contributions after 2012-01-13 are licensed under the terms of the + * GNU GPL, version 2 or (at your option) any later version. + */ + +#include "qemu/osdep.h" +#include "qemu/iov.h" +#include "qemu/error-report.h" +#include "hw/pci/pci.h" +#include "hw/virtio/virtio-access.h" +#include "hw/virtio/vhost-pci-net.h" +#include "hw/virtio/virtio-net.h" + +static uint64_t vpnet_get_features(VirtIODevice *vdev, uint64_t features, + Error **errp) +{ + VhostPCINet *vpnet = VHOST_PCI_NET(vdev); + features |= vpnet->host_features; + + return features; +} + +static void vpnet_get_config(VirtIODevice *vdev, uint8_t *config) +{ + VhostPCINet *vpnet = VHOST_PCI_NET(vdev); + struct vpnet_config netcfg; + + virtio_stw_p(vdev, &netcfg.status, vpnet->status); + memcpy(config, &netcfg, vpnet->config_size); +} + +static void vpnet_device_realize(DeviceState *dev, Error **errp) +{ + VirtIODevice *vdev = VIRTIO_DEVICE(dev); + VhostPCINet *vpnet = VHOST_PCI_NET(vdev); + + virtio_init(vdev, "vhost-pci-net", VIRTIO_ID_VHOST_PCI_NET, + vpnet->config_size); + + memory_region_init_ram(&vpnet->metadata_region, NULL, + "Metadata", METADATA_SIZE, NULL); + memory_region_add_subregion(&vpnet->bar_region, 0, + &vpnet->metadata_region); + vpnet->metadata = memory_region_get_ram_ptr(&vpnet->metadata_region); + memset(vpnet->metadata, 0, METADATA_SIZE); +} + +static void vpnet_device_unrealize(DeviceState *dev, Error **errp) +{ + VirtIODevice *vdev = VIRTIO_DEVICE(dev); + + virtio_cleanup(vdev); +} + +static Property vpnet_properties[] = { + DEFINE_PROP_BIT("mrg_rxbuf", VhostPCINet, host_features, + VIRTIO_NET_F_MRG_RXBUF, true), + 
DEFINE_PROP_CHR("chardev", VhostPCINet, chr_be), + DEFINE_PROP_END_OF_LIST(), +}; + +static void vpnet_instance_init(Object *obj) +{ + VhostPCINet *vpnet = VHOST_PCI_NET(obj); + + vpnet->config_size = sizeof(struct vpnet_config); +} + +static void vpnet_class_init(ObjectClass *klass, void *data) +{ + DeviceClass *dc = DEVICE_CLASS(klass); + VirtioDeviceClass *vdc = VIRTIO_DEVICE_CLASS(klass); + + dc->props = vpnet_properties; + set_bit(DEVICE_CATEGORY_NETWORK, dc->categories); + vdc->realize = vpnet_device_realize; + vdc->unrealize = vpnet_device_unrealize; + vdc->get_config = vpnet_get_config; + vdc->get_features = vpnet_get_features; +} + +static const TypeInfo vpnet_info = { + .name = TYPE_VHOST_PCI_NET, + .parent = TYPE_VIRTIO_DEVICE, + .instance_size = sizeof(VhostPCINet), + .instance_init = vpnet_instance_init, + .class_init = vpnet_class_init, +}; + +static void virtio_register_types(void) +{ + type_register_static(&vpnet_info); +} + +type_init(virtio_register_types) diff --git a/include/hw/virtio/vhost-pci-net.h b/include/hw/virtio/vhost-pci-net.h new file mode 100644 index 0000000..0c6886d --- /dev/null +++ b/include/hw/virtio/vhost-pci-net.h @@ -0,0 +1,37 @@ +/* + * Virtio Network Device + * + * Copyright Intel, Corp. 2017 + * + * Authors: + * Wei Wang <wei.w.wang@intel.com> + * Zhiyong Yang <zhiyong.yang@intel.com> + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. 
+ * + */ + +#ifndef _QEMU_VHOST_PCI_NET_H +#define _QEMU_VHOST_PCI_NET_H + +#include "standard-headers/linux/vhost_pci_net.h" +#include "hw/virtio/virtio.h" +#include "chardev/char-fe.h" + +#define TYPE_VHOST_PCI_NET "vhost-pci-net-device" +#define VHOST_PCI_NET(obj) \ + OBJECT_CHECK(VhostPCINet, (obj), TYPE_VHOST_PCI_NET) + +typedef struct VhostPCINet { + VirtIODevice parent_obj; + MemoryRegion bar_region; + MemoryRegion metadata_region; + struct vpnet_metadata *metadata; + uint32_t host_features; + size_t config_size; + uint16_t status; + CharBackend chr_be; +} VhostPCINet; + +#endif diff --git a/include/standard-headers/linux/vhost_pci_net.h b/include/standard-headers/linux/vhost_pci_net.h new file mode 100644 index 0000000..cfb2413 --- /dev/null +++ b/include/standard-headers/linux/vhost_pci_net.h @@ -0,0 +1,60 @@ +#ifndef _LINUX_VHOST_PCI_NET_H +#define _LINUX_VHOST_PCI_NET_H + +/* This header is BSD licensed so anyone can use the definitions to implement + * compatible drivers/servers. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 3. Neither the name of Intel nor the names of its contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. 
IN NO EVENT SHALL Intel OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. */ + +#include "standard-headers/linux/virtio_ids.h" + +#define METADATA_SIZE 4096 +#define MAX_REMOTE_REGION 8 + +struct vpnet_config { + uint16_t status; +}; + +struct vpnet_remote_mem { + uint64_t gpa; + uint64_t size; +}; + +struct vpnet_remote_vq { + uint16_t last_avail_idx; + int32_t vring_enabled; + uint32_t vring_num; + uint64_t desc_gpa; + uint64_t avail_gpa; + uint64_t used_gpa; +}; + +struct vpnet_metadata { + uint32_t nregions; + uint32_t nvqs; + struct vpnet_remote_mem mem[MAX_REMOTE_REGION]; + struct vpnet_remote_vq vq[0]; +}; + +#endif diff --git a/include/standard-headers/linux/virtio_ids.h b/include/standard-headers/linux/virtio_ids.h index 6d5c3b2..333bbd1 100644 --- a/include/standard-headers/linux/virtio_ids.h +++ b/include/standard-headers/linux/virtio_ids.h @@ -43,5 +43,6 @@ #define VIRTIO_ID_INPUT 18 /* virtio input */ #define VIRTIO_ID_VSOCK 19 /* virtio vsock transport */ #define VIRTIO_ID_CRYPTO 20 /* virtio crypto */ +#define VIRTIO_ID_VHOST_PCI_NET 21 /* vhost-pci-net */ #endif /* _LINUX_VIRTIO_IDS_H */ -- 2.7.4 --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply related [flat|nested] 77+ messages in thread
* Re: [virtio-dev] [PATCH v3 2/7] vhost-pci-net: add vhost-pci-net 2017-12-05 3:33 ` [virtio-dev] [PATCH v3 2/7] vhost-pci-net: add vhost-pci-net Wei Wang @ 2017-12-05 14:59 ` Stefan Hajnoczi 2017-12-05 15:17 ` Michael S. Tsirkin ` (2 more replies) 0 siblings, 3 replies; 77+ messages in thread From: Stefan Hajnoczi @ 2017-12-05 14:59 UTC (permalink / raw) To: Wei Wang Cc: virtio-dev, qemu-devel, mst, marcandre.lureau, jasowang, pbonzini, jan.kiszka, avi.cohen, zhiyong.yang [-- Attachment #1: Type: text/plain, Size: 1006 bytes --] On Tue, Dec 05, 2017 at 11:33:11AM +0800, Wei Wang wrote: > Add the vhost-pci-net device emulation. The device uses bar 2 to expose > the remote VM's memory to the guest. The first 4KB of the the bar area > stores the metadata which describes the remote memory and vring info. This device looks like the beginning of a new "vhost-pci" virtio device type. There are layering violations: 1. This has nothing to do with virtio-net or networking, it's purely vhost-pci. Why is it called vhost-pci-net instead of vhost-pci? 2. VirtIODevice does not know about PCI. It should work over virtio-ccw or virtio-mmio. This patch talks about BARs inside a VirtIODevice so there is a problem here. I'm concerned that there is no clear architecture and elements of the virtio architecture are being mixed up with no justification. Can you explain what you're trying to do? Please post a specification for the vhost-pci device so the operation of the device can be discussed and is clear to reviewers. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 455 bytes --] ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [virtio-dev] [PATCH v3 2/7] vhost-pci-net: add vhost-pci-net 2017-12-05 14:59 ` Stefan Hajnoczi @ 2017-12-05 15:17 ` Michael S. Tsirkin 2017-12-05 15:55 ` Michael S. Tsirkin 2017-12-06 10:17 ` Wei Wang 2 siblings, 0 replies; 77+ messages in thread From: Michael S. Tsirkin @ 2017-12-05 15:17 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Wei Wang, virtio-dev, qemu-devel, marcandre.lureau, jasowang, pbonzini, jan.kiszka, avi.cohen, zhiyong.yang On Tue, Dec 05, 2017 at 02:59:50PM +0000, Stefan Hajnoczi wrote: > On Tue, Dec 05, 2017 at 11:33:11AM +0800, Wei Wang wrote: > > Add the vhost-pci-net device emulation. The device uses bar 2 to expose > > the remote VM's memory to the guest. The first 4KB of the the bar area > > stores the metadata which describes the remote memory and vring info. > > This device looks like the beginning of a new "vhost-pci" virtio device > type. There are layering violations: > > 1. This has nothing to do with virtio-net or networking, it's purely > vhost-pci. Why is it called vhost-pci-net instead of vhost-pci? > > 2. VirtIODevice does not know about PCI. It should work over virtio-ccw > or virtio-mmio. This patch talks about BARs inside a VirtIODevice so > there is a problem here. > > I'm concerned that there is no clear architecture and elements of the > virtio architecture are being mixed up with no justification. > > Can you explain what you're trying to do? A specification was posted here: https://lists.oasis-open.org/archives/virtio-comment/201606/msg00009.html I gather there have been some changes since. > Please post a specification for the vhost-pci device so the operation of > the device can be discussed and is clear to reviewers. I'm not sure a full respin of the spec is strictly required at this point. A list of differences with last spec posted would be appreciated. 
-- MST
* Re: [virtio-dev] [PATCH v3 2/7] vhost-pci-net: add vhost-pci-net 2017-12-05 14:59 ` Stefan Hajnoczi 2017-12-05 15:17 ` Michael S. Tsirkin @ 2017-12-05 15:55 ` Michael S. Tsirkin 2017-12-05 16:41 ` Stefan Hajnoczi 2017-12-06 10:17 ` Wei Wang 2 siblings, 1 reply; 77+ messages in thread From: Michael S. Tsirkin @ 2017-12-05 15:55 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Wei Wang, virtio-dev, qemu-devel, marcandre.lureau, jasowang, pbonzini, jan.kiszka, avi.cohen, zhiyong.yang On Tue, Dec 05, 2017 at 02:59:50PM +0000, Stefan Hajnoczi wrote: > On Tue, Dec 05, 2017 at 11:33:11AM +0800, Wei Wang wrote: > > Add the vhost-pci-net device emulation. The device uses bar 2 to expose > > the remote VM's memory to the guest. The first 4KB of the the bar area > > stores the metadata which describes the remote memory and vring info. > > This device looks like the beginning of a new "vhost-pci" virtio device > type. There are layering violations: > > 1. This has nothing to do with virtio-net or networking, it's purely > vhost-pci. Why is it called vhost-pci-net instead of vhost-pci? > > 2. VirtIODevice does not know about PCI. It should work over virtio-ccw > or virtio-mmio. This patch talks about BARs inside a VirtIODevice so > there is a problem here. I think the point is how memory is exposed to another guest. This device exposes it as a pci bar. I don't think e.g. ccw can do this, it's all hypercall-based. > I'm concerned that there is no clear architecture and elements of the > virtio architecture are being mixed up with no justification. > > Can you explain what you're trying to do? > > Please post a specification for the vhost-pci device so the operation of > the device can be discussed and is clear to reviewers. 
* Re: [virtio-dev] [PATCH v3 2/7] vhost-pci-net: add vhost-pci-net 2017-12-05 15:55 ` Michael S. Tsirkin @ 2017-12-05 16:41 ` Stefan Hajnoczi 2017-12-05 16:53 ` Michael S. Tsirkin 0 siblings, 1 reply; 77+ messages in thread From: Stefan Hajnoczi @ 2017-12-05 16:41 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Wei Wang, virtio-dev, qemu-devel, marcandre.lureau, jasowang, pbonzini, jan.kiszka, avi.cohen, zhiyong.yang [-- Attachment #1: Type: text/plain, Size: 1880 bytes --] On Tue, Dec 05, 2017 at 05:55:45PM +0200, Michael S. Tsirkin wrote: > On Tue, Dec 05, 2017 at 02:59:50PM +0000, Stefan Hajnoczi wrote: > > On Tue, Dec 05, 2017 at 11:33:11AM +0800, Wei Wang wrote: > > > Add the vhost-pci-net device emulation. The device uses bar 2 to expose > > > the remote VM's memory to the guest. The first 4KB of the the bar area > > > stores the metadata which describes the remote memory and vring info. > > > > This device looks like the beginning of a new "vhost-pci" virtio device > > type. There are layering violations: > > > > 1. This has nothing to do with virtio-net or networking, it's purely > > vhost-pci. Why is it called vhost-pci-net instead of vhost-pci? > > > > 2. VirtIODevice does not know about PCI. It should work over virtio-ccw > > or virtio-mmio. This patch talks about BARs inside a VirtIODevice so > > there is a problem here. > > I think the point is how memory is exposed to another guest. This > device exposes it as a pci bar. I don't think e.g. ccw can do this, > it's all hypercall-based. Yes, that's why the BAR issue needs to be discussed. In terms of the patches, the clean way to do it is for the vhost-pci device to have a memory region that is not called "BAR". The virtio-pci transport can expose it as a BAR but the device doesn't need to know about it. Other transports that support memory mapping could then work with this device too. 
The VIRTIO specification needs to capture this transport requirement somehow too so it's clear that the vhost device can only run over transports that support memory mapping. That said, it's not clear to me why the vhost-pci device is a VIRTIO device. It doesn't use virtqueues or the configuration space. It only uses the vhost-user chardev and the mapped memory. Isn't it better to make it a PCI device? Stefan [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 455 bytes --] ^ permalink raw reply [flat|nested] 77+ messages in thread
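
The layering suggested in the message above (the device owns a memory
region, and the transport decides how to expose it) can be illustrated
with a toy model. None of the Toy* names are QEMU APIs; they are
invented here. The model assumes a transport advertises memory mapping
through an optional callback, and that a transport without one cannot
host the device.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model only: none of these names exist in QEMU. */
typedef struct ToyRegion {
    const char *name;
    uint64_t size;
} ToyRegion;

/* Device side: owns a shareable memory window, knows nothing about PCI. */
typedef struct ToyVhostDev {
    ToyRegion remote_mem;
} ToyVhostDev;

typedef struct ToyTransport ToyTransport;
struct ToyTransport {
    const char *name;
    /* NULL when the transport has no way to map memory into the guest. */
    int (*map_region)(ToyTransport *t, const ToyRegion *r);
    /* Only meaningful for the PCI-like transport in this toy. */
    const ToyRegion *bar2;
};

/* PCI-like transport: it alone decides that the region becomes BAR 2. */
static int toy_pci_map_region(ToyTransport *t, const ToyRegion *r)
{
    t->bar2 = r;
    return 0;
}

/* Plugging fails cleanly on transports that cannot map memory. */
static int toy_dev_plug(ToyVhostDev *dev, ToyTransport *t)
{
    if (t->map_region == NULL) {
        return -1;
    }
    return t->map_region(t, &dev->remote_mem);
}
```

In this shape the word "BAR" appears only in the PCI transport, and the
requirement that the device needs a memory-mapping transport is
expressed as a single check at plug time.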
* Re: [virtio-dev] [PATCH v3 2/7] vhost-pci-net: add vhost-pci-net 2017-12-05 16:41 ` Stefan Hajnoczi @ 2017-12-05 16:53 ` Michael S. Tsirkin 2017-12-05 17:00 ` [virtio-dev] Re: [Qemu-devel] " Cornelia Huck 0 siblings, 1 reply; 77+ messages in thread From: Michael S. Tsirkin @ 2017-12-05 16:53 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Wei Wang, virtio-dev, qemu-devel, marcandre.lureau, jasowang, pbonzini, jan.kiszka, avi.cohen, zhiyong.yang On Tue, Dec 05, 2017 at 04:41:54PM +0000, Stefan Hajnoczi wrote: > On Tue, Dec 05, 2017 at 05:55:45PM +0200, Michael S. Tsirkin wrote: > > On Tue, Dec 05, 2017 at 02:59:50PM +0000, Stefan Hajnoczi wrote: > > > On Tue, Dec 05, 2017 at 11:33:11AM +0800, Wei Wang wrote: > > > > Add the vhost-pci-net device emulation. The device uses bar 2 to expose > > > > the remote VM's memory to the guest. The first 4KB of the the bar area > > > > stores the metadata which describes the remote memory and vring info. > > > > > > This device looks like the beginning of a new "vhost-pci" virtio device > > > type. There are layering violations: > > > > > > 1. This has nothing to do with virtio-net or networking, it's purely > > > vhost-pci. Why is it called vhost-pci-net instead of vhost-pci? > > > > > > 2. VirtIODevice does not know about PCI. It should work over virtio-ccw > > > or virtio-mmio. This patch talks about BARs inside a VirtIODevice so > > > there is a problem here. > > > > I think the point is how memory is exposed to another guest. This > > device exposes it as a pci bar. I don't think e.g. ccw can do this, > > it's all hypercall-based. > > Yes, that's why the BAR issue needs to be discussed. > > In terms of the patches, the clean way to do it is for the > vhost-pci device to have a memory region that is not called "BAR". The > virtio-pci transport can expose it as a BAR but the device doesn't need > to know about it. Other transports that support memory mapping could > then work with this device too. 
True, though mmio is pretty much a legacy transport at this point at least from qemu perspective as arm devs don't seem to be working on virtio 1.0 support in qemu. So I am not sure how much of a priority should transport isolation be. > The VIRTIO specification needs to capture this transport requirement > somehow too so it's clear that the vhost device can only run over > transports that support memory mapping. > > That said, it's not clear to me why the vhost-pci device is a VIRTIO > device. It doesn't use virtqueues or the configuration space. It only > uses the vhost-user chardev and the mapped memory. Isn't it better to > make it a PCI device? > > Stefan Seems similar enough to me, except The roles of device and driver are reversed here. -- MST --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 77+ messages in thread
* [virtio-dev] Re: [Qemu-devel] [virtio-dev] [PATCH v3 2/7] vhost-pci-net: add vhost-pci-net 2017-12-05 16:53 ` Michael S. Tsirkin @ 2017-12-05 17:00 ` Cornelia Huck 2017-12-05 18:06 ` Michael S. Tsirkin 0 siblings, 1 reply; 77+ messages in thread From: Cornelia Huck @ 2017-12-05 17:00 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Stefan Hajnoczi, virtio-dev, avi.cohen, zhiyong.yang, jan.kiszka, jasowang, qemu-devel, Wei Wang, marcandre.lureau, pbonzini On Tue, 5 Dec 2017 18:53:29 +0200 "Michael S. Tsirkin" <mst@redhat.com> wrote: > On Tue, Dec 05, 2017 at 04:41:54PM +0000, Stefan Hajnoczi wrote: > > On Tue, Dec 05, 2017 at 05:55:45PM +0200, Michael S. Tsirkin wrote: > > > On Tue, Dec 05, 2017 at 02:59:50PM +0000, Stefan Hajnoczi wrote: > > > > On Tue, Dec 05, 2017 at 11:33:11AM +0800, Wei Wang wrote: > > > > > Add the vhost-pci-net device emulation. The device uses bar 2 to expose > > > > > the remote VM's memory to the guest. The first 4KB of the the bar area > > > > > stores the metadata which describes the remote memory and vring info. > > > > > > > > This device looks like the beginning of a new "vhost-pci" virtio device > > > > type. There are layering violations: > > > > > > > > 1. This has nothing to do with virtio-net or networking, it's purely > > > > vhost-pci. Why is it called vhost-pci-net instead of vhost-pci? > > > > > > > > 2. VirtIODevice does not know about PCI. It should work over virtio-ccw > > > > or virtio-mmio. This patch talks about BARs inside a VirtIODevice so > > > > there is a problem here. > > > > > > I think the point is how memory is exposed to another guest. This > > > device exposes it as a pci bar. I don't think e.g. ccw can do this, > > > it's all hypercall-based. > > > > Yes, that's why the BAR issue needs to be discussed. > > > > In terms of the patches, the clean way to do it is for the > > vhost-pci device to have a memory region that is not called "BAR". 
The > > virtio-pci transport can expose it as a BAR but the device doesn't need > > to know about it. Other transports that support memory mapping could > > then work with this device too. > > True, though mmio is pretty much a legacy transport at this point > at least from qemu perspective as arm devs don't seem to be working > on virtio 1.0 support in qemu. So I am not sure how much > of a priority should transport isolation be. I currently don't see an easy way to make this work via ccw, FWIW. We would need a dedicated mechanism for it, and I'm not sure what the gain would be. > > > The VIRTIO specification needs to capture this transport requirement > > somehow too so it's clear that the vhost device can only run over > > transports that support memory mapping. > > > > That said, it's not clear to me why the vhost-pci device is a VIRTIO > > device. It doesn't use virtqueues or the configuration space. It only > > uses the vhost-user chardev and the mapped memory. Isn't it better to > > make it a PCI device? > > > > Stefan > > Seems similar enough to me, except The roles of device and driver are > reversed here. > But will anything other than pci ever make use of this? --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 77+ messages in thread
* [virtio-dev] Re: [Qemu-devel] [virtio-dev] [PATCH v3 2/7] vhost-pci-net: add vhost-pci-net 2017-12-05 17:00 ` [virtio-dev] Re: [Qemu-devel] " Cornelia Huck @ 2017-12-05 18:06 ` Michael S. Tsirkin 0 siblings, 0 replies; 77+ messages in thread From: Michael S. Tsirkin @ 2017-12-05 18:06 UTC (permalink / raw) To: Cornelia Huck Cc: Stefan Hajnoczi, virtio-dev, avi.cohen, zhiyong.yang, jan.kiszka, jasowang, qemu-devel, Wei Wang, marcandre.lureau, pbonzini On Tue, Dec 05, 2017 at 06:00:10PM +0100, Cornelia Huck wrote: > On Tue, 5 Dec 2017 18:53:29 +0200 > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > On Tue, Dec 05, 2017 at 04:41:54PM +0000, Stefan Hajnoczi wrote: > > > On Tue, Dec 05, 2017 at 05:55:45PM +0200, Michael S. Tsirkin wrote: > > > > On Tue, Dec 05, 2017 at 02:59:50PM +0000, Stefan Hajnoczi wrote: > > > > > On Tue, Dec 05, 2017 at 11:33:11AM +0800, Wei Wang wrote: > > > > > > Add the vhost-pci-net device emulation. The device uses bar 2 to expose > > > > > > the remote VM's memory to the guest. The first 4KB of the the bar area > > > > > > stores the metadata which describes the remote memory and vring info. > > > > > > > > > > This device looks like the beginning of a new "vhost-pci" virtio device > > > > > type. There are layering violations: > > > > > > > > > > 1. This has nothing to do with virtio-net or networking, it's purely > > > > > vhost-pci. Why is it called vhost-pci-net instead of vhost-pci? > > > > > > > > > > 2. VirtIODevice does not know about PCI. It should work over virtio-ccw > > > > > or virtio-mmio. This patch talks about BARs inside a VirtIODevice so > > > > > there is a problem here. > > > > > > > > I think the point is how memory is exposed to another guest. This > > > > device exposes it as a pci bar. I don't think e.g. ccw can do this, > > > > it's all hypercall-based. > > > > > > Yes, that's why the BAR issue needs to be discussed. 
> > > > > > In terms of the patches, the clean way to do it is for the > > > vhost-pci device to have a memory region that is not called "BAR". The > > > virtio-pci transport can expose it as a BAR but the device doesn't need > > > to know about it. Other transports that support memory mapping could > > > then work with this device too. > > > > True, though mmio is pretty much a legacy transport at this point > > at least from qemu perspective as arm devs don't seem to be working > > on virtio 1.0 support in qemu. So I am not sure how much > > of a priority should transport isolation be. > > I currently don't see an easy way to make this work via ccw, FWIW. We > would need a dedicated mechanism for it, and I'm not sure what the gain > would be. > > > > > > The VIRTIO specification needs to capture this transport requirement > > > somehow too so it's clear that the vhost device can only run over > > > transports that support memory mapping. > > > > > > That said, it's not clear to me why the vhost-pci device is a VIRTIO > > > device. It doesn't use virtqueues or the configuration space. It only > > > uses the vhost-user chardev and the mapped memory. Isn't it better to > > > make it a PCI device? > > > > > > Stefan > > > > Seems similar enough to me, except The roles of device and driver are > > reversed here. > > > > But will anything other than pci ever make use of this? That's just it, I am not entirely sure. So IMHO it's fine to make it a pci specific thing for now. virtio started like that too. -- MST --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [virtio-dev] [PATCH v3 2/7] vhost-pci-net: add vhost-pci-net
2017-12-05 14:59 ` Stefan Hajnoczi
2017-12-05 15:17 ` Michael S. Tsirkin
2017-12-05 15:55 ` Michael S. Tsirkin
@ 2017-12-06 10:17 ` Wei Wang
2017-12-06 12:01 ` Stefan Hajnoczi
2 siblings, 1 reply; 77+ messages in thread
From: Wei Wang @ 2017-12-06 10:17 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: virtio-dev, qemu-devel, mst, marcandre.lureau, jasowang,
pbonzini, jan.kiszka, avi.cohen, zhiyong.yang
On 12/05/2017 10:59 PM, Stefan Hajnoczi wrote:
> On Tue, Dec 05, 2017 at 11:33:11AM +0800, Wei Wang wrote:
>> Add the vhost-pci-net device emulation. The device uses bar 2 to expose
>> the remote VM's memory to the guest. The first 4KB of the bar area
>> stores the metadata which describes the remote memory and vring info.
> This device looks like the beginning of a new "vhost-pci" virtio device
> type. There are layering violations:
>
> 1. This has nothing to do with virtio-net or networking, it's purely
> vhost-pci. Why is it called vhost-pci-net instead of vhost-pci?

Here are a few things that are specific to vhost-pci-net:

1) The device category here is set to NETWORK.
2) vhost-pci-net related features (e.g. the future MQ feature) will be
   added to the property here.

Right now, we only have vhost-pci-net. How about we all focus on
vhost-pci-net and ignore vhost-pci for now? When other types of devices
are added in the future, we can abstract out a common vhost-pci layer.

Best,
Wei

^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [virtio-dev] [PATCH v3 2/7] vhost-pci-net: add vhost-pci-net 2017-12-06 10:17 ` Wei Wang @ 2017-12-06 12:01 ` Stefan Hajnoczi 0 siblings, 0 replies; 77+ messages in thread From: Stefan Hajnoczi @ 2017-12-06 12:01 UTC (permalink / raw) To: Wei Wang Cc: virtio-dev, qemu-devel, mst, marcandre.lureau, jasowang, pbonzini, jan.kiszka, avi.cohen, zhiyong.yang, felipe, changpeng.liu [-- Attachment #1: Type: text/plain, Size: 1810 bytes --] On Wed, Dec 06, 2017 at 06:17:27PM +0800, Wei Wang wrote: > On 12/05/2017 10:59 PM, Stefan Hajnoczi wrote: > > On Tue, Dec 05, 2017 at 11:33:11AM +0800, Wei Wang wrote: > > > Add the vhost-pci-net device emulation. The device uses bar 2 to expose > > > the remote VM's memory to the guest. The first 4KB of the the bar area > > > stores the metadata which describes the remote memory and vring info. > > This device looks like the beginning of a new "vhost-pci" virtio device > > type. There are layering violations: > > > > 1. This has nothing to do with virtio-net or networking, it's purely > > vhost-pci. Why is it called vhost-pci-net instead of vhost-pci? > > Here are a few things that are specific to vhost-pci-net here: > > 1) The device category here is set to NETWORK. > 2) vhost-pci-net related features (e.g. future MQ feature) will be added to > the property here. > > Right now, we only have vhost-pci-net. How about all focusing on the > vhost-pci-net, and ignore vhost-pci for now? When future other types of > devices are addded, we can abstract out a common vhost-pci layer? That won't work well for a new device. It would be fine if this code was internal to QEMU, but it's a guest interface and changing it is relatively costly. Once this code ships the hardware interface is fixed and drivers depend on it. Changing the hardware interface requires driver upgrades inside the guest. All information needed to design vhost-pci for multiple device types is already available. 
The VIRTIO specification defines the device model and vhost-user
supports at least virtio-net, virtio-scsi, and virtio-blk (in
development).

I have CCed Felipe (vhost-user-scsi) and Changpeng (vhost-user-blk) to
see if they are interested in vhost-pci.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply [flat|nested] 77+ messages in thread
* [virtio-dev] [PATCH v3 3/7] virtio/virtio-pci.c: add vhost-pci-net-pci 2017-12-05 3:33 [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication Wei Wang 2017-12-05 3:33 ` [virtio-dev] [PATCH v3 1/7] vhost-user: share the vhost-user protocol related structures Wei Wang 2017-12-05 3:33 ` [virtio-dev] [PATCH v3 2/7] vhost-pci-net: add vhost-pci-net Wei Wang @ 2017-12-05 3:33 ` Wei Wang 2017-12-05 3:33 ` [virtio-dev] [PATCH v3 4/7] vhost-pci-slave: add vhost-pci slave implementation Wei Wang ` (9 subsequent siblings) 12 siblings, 0 replies; 77+ messages in thread From: Wei Wang @ 2017-12-05 3:33 UTC (permalink / raw) To: virtio-dev, qemu-devel, mst, marcandre.lureau, jasowang, stefanha, pbonzini Cc: jan.kiszka, avi.cohen, zhiyong.yang, Wei Wang Add the virtio-pci emulation part of the vhost-pci device. BAR2 is used to expose the remote VM's memory to the guest, and its default size is set to 64GB. Signed-off-by: Wei Wang <wei.w.wang@intel.com> --- hw/virtio/virtio-pci.c | 55 ++++++++++++++++++++++++++ hw/virtio/virtio-pci.h | 14 +++++++ include/hw/pci/pci.h | 1 + include/standard-headers/linux/vhost_pci_net.h | 3 ++ 4 files changed, 73 insertions(+) diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c index e92837c..2a614b8 100644 --- a/hw/virtio/virtio-pci.c +++ b/hw/virtio/virtio-pci.c @@ -2386,6 +2386,60 @@ static const TypeInfo virtio_net_pci_info = { .class_init = virtio_net_pci_class_init, }; +/* vhost-pci-net */ + +static Property vpnet_pci_properties[] = { + DEFINE_PROP_END_OF_LIST(), +}; + +static void vpnet_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp) +{ + VhostPCINetPCI *dev = VHOST_PCI_NET_PCI(vpci_dev); + DeviceState *vdev = DEVICE(&dev->vdev); + + memory_region_init(&dev->vdev.bar_region, NULL, "RemoteMemory", + REMOTE_MEM_BAR_SIZE); + pci_register_bar(&vpci_dev->pci_dev, REMOTE_MEM_BAR_ID, + PCI_BASE_ADDRESS_SPACE_MEMORY | + PCI_BASE_ADDRESS_MEM_PREFETCH | + PCI_BASE_ADDRESS_MEM_TYPE_64, + &dev->vdev.bar_region); + + 
qdev_set_parent_bus(vdev, BUS(&vpci_dev->bus)); + virtio_pci_force_virtio_1(vpci_dev); + object_property_set_bool(OBJECT(vdev), true, "realized", errp); +} + +static void vpnet_pci_class_init(ObjectClass *klass, void *data) +{ + DeviceClass *dc = DEVICE_CLASS(klass); + PCIDeviceClass *k = PCI_DEVICE_CLASS(klass); + VirtioPCIClass *vpciklass = VIRTIO_PCI_CLASS(klass); + + k->vendor_id = PCI_VENDOR_ID_REDHAT_QUMRANET; + k->device_id = PCI_DEVICE_ID_VHOST_PCI_NET; + k->class_id = PCI_CLASS_NETWORK_ETHERNET; + set_bit(DEVICE_CATEGORY_NETWORK, dc->categories); + dc->props = vpnet_pci_properties; + vpciklass->realize = vpnet_pci_realize; +} + +static void vpnet_pci_instance_init(Object *obj) +{ + VhostPCINetPCI *dev = VHOST_PCI_NET_PCI(obj); + + virtio_instance_init_common(obj, &dev->vdev, sizeof(dev->vdev), + TYPE_VHOST_PCI_NET); +} + +static const TypeInfo vpnet_pci_info = { + .name = TYPE_VHOST_PCI_NET_PCI, + .parent = TYPE_VIRTIO_PCI, + .instance_size = sizeof(VhostPCINetPCI), + .instance_init = vpnet_pci_instance_init, + .class_init = vpnet_pci_class_init, +}; + /* virtio-rng-pci */ static void virtio_rng_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp) @@ -2626,6 +2680,7 @@ static void virtio_pci_register_types(void) type_register_static(&virtio_balloon_pci_info); type_register_static(&virtio_serial_pci_info); type_register_static(&virtio_net_pci_info); + type_register_static(&vpnet_pci_info); #ifdef CONFIG_VHOST_SCSI type_register_static(&vhost_scsi_pci_info); #endif diff --git a/hw/virtio/virtio-pci.h b/hw/virtio/virtio-pci.h index 12d3a90..cc58c17 100644 --- a/hw/virtio/virtio-pci.h +++ b/hw/virtio/virtio-pci.h @@ -18,6 +18,7 @@ #include "hw/pci/msi.h" #include "hw/virtio/virtio-blk.h" #include "hw/virtio/virtio-net.h" +#include "hw/virtio/vhost-pci-net.h" #include "hw/virtio/virtio-rng.h" #include "hw/virtio/virtio-serial.h" #include "hw/virtio/virtio-scsi.h" @@ -44,6 +45,7 @@ typedef struct VirtIOSCSIPCI VirtIOSCSIPCI; typedef struct VirtIOBalloonPCI 
VirtIOBalloonPCI; typedef struct VirtIOSerialPCI VirtIOSerialPCI; typedef struct VirtIONetPCI VirtIONetPCI; +typedef struct VhostPCINetPCI VhostPCINetPCI; typedef struct VHostSCSIPCI VHostSCSIPCI; typedef struct VHostUserSCSIPCI VHostUserSCSIPCI; typedef struct VirtIORngPCI VirtIORngPCI; @@ -293,6 +295,18 @@ struct VirtIONetPCI { }; /* + * vhost-pci-net-pci: This extends VirtioPCIProxy. + */ +#define TYPE_VHOST_PCI_NET_PCI "vhost-pci-net-pci" +#define VHOST_PCI_NET_PCI(obj) \ + OBJECT_CHECK(VhostPCINetPCI, (obj), TYPE_VHOST_PCI_NET_PCI) + +struct VhostPCINetPCI { + VirtIOPCIProxy parent_obj; + VhostPCINet vdev; +}; + +/* * virtio-9p-pci: This extends VirtioPCIProxy. */ diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h index 8d02a0a..7911fa8 100644 --- a/include/hw/pci/pci.h +++ b/include/hw/pci/pci.h @@ -85,6 +85,7 @@ extern bool pci_available; #define PCI_DEVICE_ID_VIRTIO_RNG 0x1005 #define PCI_DEVICE_ID_VIRTIO_9P 0x1009 #define PCI_DEVICE_ID_VIRTIO_VSOCK 0x1012 +#define PCI_DEVICE_ID_VHOST_PCI_NET 0x1014 #define PCI_VENDOR_ID_REDHAT 0x1b36 #define PCI_DEVICE_ID_REDHAT_BRIDGE 0x0001 diff --git a/include/standard-headers/linux/vhost_pci_net.h b/include/standard-headers/linux/vhost_pci_net.h index cfb2413..792261e 100644 --- a/include/standard-headers/linux/vhost_pci_net.h +++ b/include/standard-headers/linux/vhost_pci_net.h @@ -29,7 +29,10 @@ #include "standard-headers/linux/virtio_ids.h" +#define REMOTE_MEM_BAR_ID 2 +#define REMOTE_MEM_BAR_SIZE 0x1000000000 #define METADATA_SIZE 4096 + #define MAX_REMOTE_REGION 8 struct vpnet_config { -- 2.7.4 --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply related [flat|nested] 77+ messages in thread
* [virtio-dev] [PATCH v3 4/7] vhost-pci-slave: add vhost-pci slave implementation 2017-12-05 3:33 [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication Wei Wang ` (2 preceding siblings ...) 2017-12-05 3:33 ` [virtio-dev] [PATCH v3 3/7] virtio/virtio-pci.c: add vhost-pci-net-pci Wei Wang @ 2017-12-05 3:33 ` Wei Wang 2017-12-05 15:56 ` [virtio-dev] " Stefan Hajnoczi ` (2 more replies) 2017-12-05 3:33 ` [virtio-dev] [PATCH v3 5/7] vhost-user: VHOST_USER_SET_VHOST_PCI msg Wei Wang ` (8 subsequent siblings) 12 siblings, 3 replies; 77+ messages in thread From: Wei Wang @ 2017-12-05 3:33 UTC (permalink / raw) To: virtio-dev, qemu-devel, mst, marcandre.lureau, jasowang, stefanha, pbonzini Cc: jan.kiszka, avi.cohen, zhiyong.yang, Wei Wang The vhost-pci slave implementation is added to support the creation of the vhost-pci-net device. It follows the vhost-user protocol to get the master VM's info (e.g. memory regions, vring address). Signed-off-by: Wei Wang <wei.w.wang@intel.com> --- hw/net/vhost_pci_net.c | 20 ++- hw/virtio/Makefile.objs | 1 + hw/virtio/vhost-pci-slave.c | 307 ++++++++++++++++++++++++++++++++++++ include/hw/virtio/vhost-pci-net.h | 3 + include/hw/virtio/vhost-pci-slave.h | 12 ++ 5 files changed, 342 insertions(+), 1 deletion(-) create mode 100644 hw/virtio/vhost-pci-slave.c create mode 100644 include/hw/virtio/vhost-pci-slave.h diff --git a/hw/net/vhost_pci_net.c b/hw/net/vhost_pci_net.c index db8f954..11184c3 100644 --- a/hw/net/vhost_pci_net.c +++ b/hw/net/vhost_pci_net.c @@ -21,6 +21,7 @@ #include "hw/virtio/virtio-access.h" #include "hw/virtio/vhost-pci-net.h" #include "hw/virtio/virtio-net.h" +#include "hw/virtio/vhost-pci-slave.h" static uint64_t vpnet_get_features(VirtIODevice *vdev, uint64_t features, Error **errp) @@ -45,6 +46,10 @@ static void vpnet_device_realize(DeviceState *dev, Error **errp) VirtIODevice *vdev = VIRTIO_DEVICE(dev); VhostPCINet *vpnet = VHOST_PCI_NET(vdev); + qemu_chr_fe_set_handlers(&vpnet->chr_be, 
vp_slave_can_read, + vp_slave_read, vp_slave_event, NULL, + vpnet, NULL, true); + virtio_init(vdev, "vhost-pci-net", VIRTIO_ID_VHOST_PCI_NET, vpnet->config_size); @@ -59,7 +64,20 @@ static void vpnet_device_realize(DeviceState *dev, Error **errp) static void vpnet_device_unrealize(DeviceState *dev, Error **errp) { VirtIODevice *vdev = VIRTIO_DEVICE(dev); - + VhostPCINet *vpnet = VHOST_PCI_NET(vdev); + int i, ret, nregions = vpnet->metadata->nregions; + + for (i = 0; i < nregions; i++) { + ret = munmap(vpnet->remote_mem_base[i], vpnet->remote_mem_map_size[i]); + if (ret < 0) { + error_report("%s: failed to unmap mr[%d]", __func__, i); + continue; + } + memory_region_del_subregion(&vpnet->bar_region, + &vpnet->remote_mem_region[i]); + } + + qemu_chr_fe_deinit(&vpnet->chr_be, true); virtio_cleanup(vdev); } diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs index 765d363..5e81f2f 100644 --- a/hw/virtio/Makefile.objs +++ b/hw/virtio/Makefile.objs @@ -3,6 +3,7 @@ common-obj-y += virtio-rng.o common-obj-$(CONFIG_VIRTIO_PCI) += virtio-pci.o common-obj-y += virtio-bus.o common-obj-y += virtio-mmio.o +common-obj-y += vhost-pci-slave.o obj-y += virtio.o virtio-balloon.o obj-$(CONFIG_LINUX) += vhost.o vhost-backend.o vhost-user.o diff --git a/hw/virtio/vhost-pci-slave.c b/hw/virtio/vhost-pci-slave.c new file mode 100644 index 0000000..b1a8620 --- /dev/null +++ b/hw/virtio/vhost-pci-slave.c @@ -0,0 +1,307 @@ +/* + * Vhost-pci Slave + * + * Copyright Intel Corp. 2017 + * + * Authors: + * Wei Wang <wei.w.wang@intel.com> + * Zhiyong Yang <zhiyong.yang@intel.com> + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. 
+ */ + +#include <qemu/osdep.h> +#include <qemu/sockets.h> + +#include "monitor/qdev.h" +#include "qapi/error.h" +#include "qemu/config-file.h" +#include "qemu/error-report.h" +#include "hw/virtio/virtio-pci.h" +#include "hw/virtio/vhost-pci-slave.h" +#include "hw/virtio/vhost-user.h" +#include "hw/virtio/vhost-pci-net.h" + +#define VHOST_USER_PROTOCOL_FEATURES 0 + +static int vp_slave_write(CharBackend *chr_be, VhostUserMsg *msg) +{ + int size; + + if (!msg) { + return 0; + } + + /* The payload size has been assigned, plus the header size here */ + size = msg->size + VHOST_USER_HDR_SIZE; + msg->flags &= ~VHOST_USER_VERSION_MASK; + msg->flags |= VHOST_USER_VERSION; + + return qemu_chr_fe_write_all(chr_be, (const uint8_t *)msg, size) + == size ? 0 : -1; +} + +static int vp_slave_get_features(VhostPCINet *vpnet, CharBackend *chr_be, + VhostUserMsg *msg) +{ + /* Offer the initial features, which have the protocol feature bit set */ + msg->payload.u64 = (uint64_t)vpnet->host_features | + (1 << VHOST_USER_F_PROTOCOL_FEATURES); + msg->size = sizeof(msg->payload.u64); + msg->flags |= VHOST_USER_REPLY_MASK; + + return vp_slave_write(chr_be, msg); +} + +static void vp_slave_set_features(VhostPCINet *vpnet, VhostUserMsg *msg) +{ + vpnet->host_features = msg->payload.u64 & + ~(1 << VHOST_USER_F_PROTOCOL_FEATURES); +} + +void vp_slave_event(void *opaque, int event) +{ + switch (event) { + case CHR_EVENT_OPENED: + break; + case CHR_EVENT_CLOSED: + break; + } +} + +static int vp_slave_get_protocol_features(CharBackend *chr_be, + VhostUserMsg *msg) +{ + msg->payload.u64 = VHOST_USER_PROTOCOL_FEATURES; + msg->size = sizeof(msg->payload.u64); + msg->flags |= VHOST_USER_REPLY_MASK; + + return vp_slave_write(chr_be, msg); +} + +static int vp_slave_get_queue_num(CharBackend *chr_be, VhostUserMsg *msg) +{ + msg->payload.u64 = VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX; + msg->size = sizeof(msg->payload.u64); + msg->flags |= VHOST_USER_REPLY_MASK; + + return vp_slave_write(chr_be, msg); +} + +/* 
Set up the vhost-pci-net device bar to map the remote memory */ +static int vp_slave_set_mem_table(VhostPCINet *vpnet, VhostUserMsg *msg, + int *fds, int fd_num) +{ + VhostUserMemory *mem = &msg->payload.memory; + VhostUserMemoryRegion *region = mem->regions; + uint32_t i, nregions = mem->nregions; + uint64_t bar_map_offset = METADATA_SIZE; + void *remote_mem_ptr; + + /* Sanity Check */ + if (fd_num != nregions) { + error_report("%s: fd num doesn't match region num", __func__); + return -1; + } + + vpnet->metadata->nregions = nregions; + vpnet->remote_mem_region = g_malloc(nregions * sizeof(MemoryRegion)); + + for (i = 0; i < nregions; i++) { + vpnet->remote_mem_map_size[i] = region[i].memory_size + + region[i].mmap_offset; + /* + * Map the remote memory by QEMU. They will then be exposed to the + * guest via a vhost-pci device BAR. The mapped base addr and size + * are recorded and will be used when cleaning up the device. + */ + vpnet->remote_mem_base[i] = mmap(NULL, vpnet->remote_mem_map_size[i], + PROT_READ | PROT_WRITE, MAP_SHARED, + fds[i], 0); + if (vpnet->remote_mem_base[i] == MAP_FAILED) { + error_report("%s: map peer memory region %d failed", __func__, i); + return -1; + } + + remote_mem_ptr = vpnet->remote_mem_base[i] + region[i].mmap_offset; + /* + * The BAR MMIO is different from the traditional one, because the + * it is set up as a regular RAM. Guest will be able to directly + * access it without VMExits, just like accessing its RAM memory. + */ + memory_region_init_ram_ptr(&vpnet->remote_mem_region[i], NULL, + "RemoteMemory", region[i].memory_size, + remote_mem_ptr); + /* + * The remote memory regions, which are scattered in the remote VM's + * address space, are put continuous in the BAR. 
+ */ + memory_region_add_subregion(&vpnet->bar_region, bar_map_offset, + &vpnet->remote_mem_region[i]); + bar_map_offset += region[i].memory_size; + + vpnet->metadata->mem[i].gpa = region[i].guest_phys_addr; + vpnet->metadata->mem[i].size = region[i].memory_size; + } + + return 0; +} + +static void vp_slave_set_vring_num(VhostPCINet *vpnet, VhostUserMsg *msg) +{ + struct vhost_vring_state *state = &msg->payload.state; + + vpnet->metadata->vq[state->index].vring_num = state->num; +} + +static void vp_slave_set_vring_base(VhostPCINet *vpnet, VhostUserMsg *msg) +{ + struct vhost_vring_state *state = &msg->payload.state; + + vpnet->metadata->vq[state->index].last_avail_idx = state->num; +} + +static int vp_slave_get_vring_base(CharBackend *chr_be, VhostUserMsg *msg) +{ + msg->flags |= VHOST_USER_REPLY_MASK; + msg->size = sizeof(msg->payload.state); + /* Send back the last_avail_idx, which is 0 here */ + msg->payload.state.num = 0; + + return vp_slave_write(chr_be, msg); +} + +static void vp_slave_set_vring_addr(VhostPCINet *vpnet, VhostUserMsg *msg) +{ + uint32_t index = msg->payload.addr.index; + + vpnet->metadata->vq[index].desc_gpa = msg->payload.addr.desc_user_addr; + vpnet->metadata->vq[index].avail_gpa = msg->payload.addr.avail_user_addr; + vpnet->metadata->vq[index].used_gpa = msg->payload.addr.used_user_addr; + vpnet->metadata->nvqs = msg->payload.addr.index + 1; +} + +static void vp_slave_set_vring_enable(VhostPCINet *vpnet, VhostUserMsg *msg) +{ + struct vhost_vring_state *state = &msg->payload.state; + + vpnet->metadata->vq[state->index].vring_enabled = (int)state->num; +} + +int vp_slave_can_read(void *opaque) +{ + return VHOST_USER_HDR_SIZE; +} + +void vp_slave_read(void *opaque, const uint8_t *buf, int size) +{ + int ret, fd_num, fds[VHOST_MEMORY_MAX_NREGIONS]; + VhostUserMsg msg; + uint8_t *p = (uint8_t *) &msg; + VhostPCINet *vpnet = (VhostPCINet *)opaque; + CharBackend *chr_be = &vpnet->chr_be; + + if (size != VHOST_USER_HDR_SIZE) { + error_report("%s: 
wrong message size received %d", __func__, size); + return; + } + + memcpy(p, buf, VHOST_USER_HDR_SIZE); + + if (msg.size) { + p += VHOST_USER_HDR_SIZE; + size = qemu_chr_fe_read_all(chr_be, p, msg.size); + if (size != msg.size) { + error_report("%s: wrong message size received %d != %d", __func__, + size, msg.size); + return; + } + } + + if (msg.request > VHOST_USER_MAX) { + error_report("%s: read an incorrect msg %d", __func__, msg.request); + return; + } + + switch (msg.request) { + case VHOST_USER_GET_FEATURES: + ret = vp_slave_get_features(vpnet, chr_be, &msg); + if (ret < 0) { + goto err_handling; + } + break; + case VHOST_USER_SET_FEATURES: + vp_slave_set_features(vpnet, &msg); + break; + case VHOST_USER_GET_PROTOCOL_FEATURES: + ret = vp_slave_get_protocol_features(chr_be, &msg); + if (ret < 0) { + goto err_handling; + } + break; + case VHOST_USER_SET_PROTOCOL_FEATURES: + break; + case VHOST_USER_GET_QUEUE_NUM: + ret = vp_slave_get_queue_num(chr_be, &msg); + if (ret < 0) { + goto err_handling; + } + break; + case VHOST_USER_SET_OWNER: + break; + case VHOST_USER_SET_MEM_TABLE: + fd_num = qemu_chr_fe_get_msgfds(chr_be, fds, sizeof(fds) / sizeof(int)); + vp_slave_set_mem_table(vpnet, &msg, fds, fd_num); + break; + case VHOST_USER_SET_VRING_NUM: + vp_slave_set_vring_num(vpnet, &msg); + break; + case VHOST_USER_SET_VRING_BASE: + vp_slave_set_vring_base(vpnet, &msg); + break; + case VHOST_USER_GET_VRING_BASE: + ret = vp_slave_get_vring_base(chr_be, &msg); + if (ret < 0) { + goto err_handling; + } + break; + case VHOST_USER_SET_VRING_ADDR: + vp_slave_set_vring_addr(vpnet, &msg); + break; + case VHOST_USER_SET_VRING_KICK: + /* Consume the fd */ + qemu_chr_fe_get_msgfds(chr_be, fds, 1); + /* + * This is a non-blocking eventfd. + * The receive function forces it to be blocking, + * so revert it back to non-blocking. + */ + qemu_set_nonblock(fds[0]); + break; + case VHOST_USER_SET_VRING_CALL: + /* Consume the fd, and revert it to non-blocking. 
*/ + qemu_chr_fe_get_msgfds(chr_be, fds, 1); + qemu_set_nonblock(fds[0]); + break; + case VHOST_USER_SET_VRING_ENABLE: + vp_slave_set_vring_enable(vpnet, &msg); + break; + case VHOST_USER_SET_LOG_BASE: + break; + case VHOST_USER_SET_LOG_FD: + qemu_chr_fe_get_msgfds(chr_be, fds, 1); + close(fds[0]); + break; + case VHOST_USER_SEND_RARP: + break; + default: + error_report("vhost-pci-slave does not support msg request = %d", + msg.request); + break; + } + return; + +err_handling: + error_report("%s: handle request %d failed", __func__, msg.request); +} diff --git a/include/hw/virtio/vhost-pci-net.h b/include/hw/virtio/vhost-pci-net.h index 0c6886d..6f4ab6a 100644 --- a/include/hw/virtio/vhost-pci-net.h +++ b/include/hw/virtio/vhost-pci-net.h @@ -27,7 +27,10 @@ typedef struct VhostPCINet { VirtIODevice parent_obj; MemoryRegion bar_region; MemoryRegion metadata_region; + MemoryRegion *remote_mem_region; struct vpnet_metadata *metadata; + void *remote_mem_base[MAX_REMOTE_REGION]; + uint64_t remote_mem_map_size[MAX_REMOTE_REGION]; uint32_t host_features; size_t config_size; uint16_t status; diff --git a/include/hw/virtio/vhost-pci-slave.h b/include/hw/virtio/vhost-pci-slave.h new file mode 100644 index 0000000..1be6a87 --- /dev/null +++ b/include/hw/virtio/vhost-pci-slave.h @@ -0,0 +1,12 @@ +#ifndef QEMU_VHOST_PCI_SLAVE_H +#define QEMU_VHOST_PCI_SLAVE_H + +#include "linux-headers/linux/vhost.h" + +extern int vp_slave_can_read(void *opaque); + +extern void vp_slave_read(void *opaque, const uint8_t *buf, int size); + +extern void vp_slave_event(void *opaque, int event); + +#endif -- 2.7.4 --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply related [flat|nested] 77+ messages in thread
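The BAR layout built by vp_slave_set_mem_table() in the patch above places
the remote regions back to back, starting at METADATA_SIZE, and records each
region's gpa and size in the metadata area. Assuming a guest-side driver
reads those fields, it could translate a remote guest-physical address into
a BAR offset roughly as sketched below. The struct vp_remote_mem and
vp_gpa_to_bar_offset() names are hypothetical and are not taken from this
series or from the DPDK PMD.

```c
#include <stdint.h>

#define METADATA_SIZE 4096  /* same value as in vhost_pci_net.h above */

/* Hypothetical mirror of one metadata->mem[] entry (gpa and size only). */
struct vp_remote_mem {
    uint64_t gpa;   /* region start in the remote VM's address space */
    uint64_t size;  /* region length in bytes */
};

/*
 * Translate a remote guest-physical address into an offset inside the BAR.
 * Regions are assumed to be laid out contiguously after the metadata area,
 * in table order. Returns 0 on success, -1 if no region contains the gpa.
 */
static int vp_gpa_to_bar_offset(const struct vp_remote_mem *mem,
                                uint32_t nregions, uint64_t gpa,
                                uint64_t *offset)
{
    uint64_t base = METADATA_SIZE;

    for (uint32_t i = 0; i < nregions; i++) {
        /* Subtract before comparing so the range check cannot overflow. */
        if (gpa >= mem[i].gpa && gpa - mem[i].gpa < mem[i].size) {
            *offset = base + (gpa - mem[i].gpa);
            return 0;
        }
        base += mem[i].size;
    }
    return -1;
}
```

A driver would add the returned offset to the address at which it mapped
BAR 2; an address that falls in a hole between regions is rejected rather
than mapped to a neighbouring region.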
* [virtio-dev] Re: [PATCH v3 4/7] vhost-pci-slave: add vhost-pci slave implementation 2017-12-05 3:33 ` [virtio-dev] [PATCH v3 4/7] vhost-pci-slave: add vhost-pci slave implementation Wei Wang @ 2017-12-05 15:56 ` Stefan Hajnoczi 2017-12-14 17:30 ` Stefan Hajnoczi 2017-12-14 17:48 ` Stefan Hajnoczi 2 siblings, 0 replies; 77+ messages in thread From: Stefan Hajnoczi @ 2017-12-05 15:56 UTC (permalink / raw) To: Wei Wang Cc: virtio-dev, qemu-devel, mst, marcandre.lureau, jasowang, pbonzini, jan.kiszka, avi.cohen, zhiyong.yang [-- Attachment #1: Type: text/plain, Size: 2862 bytes --] On Tue, Dec 05, 2017 at 11:33:13AM +0800, Wei Wang wrote: > +static int vp_slave_write(CharBackend *chr_be, VhostUserMsg *msg) > +{ > + int size; > + > + if (!msg) { > + return 0; > + } > + > + /* The payload size has been assigned, plus the header size here */ > + size = msg->size + VHOST_USER_HDR_SIZE; > + msg->flags &= ~VHOST_USER_VERSION_MASK; > + msg->flags |= VHOST_USER_VERSION; > + > + return qemu_chr_fe_write_all(chr_be, (const uint8_t *)msg, size) > + == size ? 0 : -1; > +} qemu_chr_fe_write_all() is a blocking operation. If the socket fd cannot accept more data then this thread will block! This is a reliability problem. If the vhost-user master process hangs then the vhost-pci vhost-user slave will also hang :(. Please implement vhost-pci so that it does not hang. A guest with multiple vhost-pci devices should work even if one or more of them cannot communicate with the vhost-pci master. This is necessary for preventing denial-of-service on a software-defined network switch, for example. > +static void vp_slave_set_vring_num(VhostPCINet *vpnet, VhostUserMsg *msg) > +{ > + struct vhost_vring_state *state = &msg->payload.state; > + > + vpnet->metadata->vq[state->index].vring_num = state->num; The vhost-pci code cannot trust vhost-user input. This function allows the vhost-user master to perform out-of-bounds memory stores by setting state->index outside the vq[] array. 
All input must be validated!

The security model is:

1. Guest must not be able to corrupt QEMU memory or execute arbitrary
   code.

2. vhost-user master guest must not be able to corrupt vhost-user slave
   guest memory or execute arbitrary code.

3. vhost-user master must not be able to corrupt vhost-user memory or
   execute arbitrary code in vhost-user slave.

4. vhost-user slave must not be able to corrupt vhost-user memory or
   execute arbitrary code in vhost-user master.

The only thing that is allowed is that the vhost-user slave QEMU and
guest may write to vhost-user master guest memory.

> +void vp_slave_read(void *opaque, const uint8_t *buf, int size)
> +{
> +    int ret, fd_num, fds[VHOST_MEMORY_MAX_NREGIONS];
> +    VhostUserMsg msg;
> +    uint8_t *p = (uint8_t *) &msg;
> +    VhostPCINet *vpnet = (VhostPCINet *)opaque;
> +    CharBackend *chr_be = &vpnet->chr_be;
> +
> +    if (size != VHOST_USER_HDR_SIZE) {
> +        error_report("%s: wrong message size received %d", __func__, size);
> +        return;
> +    }
> +
> +    memcpy(p, buf, VHOST_USER_HDR_SIZE);
> +
> +    if (msg.size) {
> +        p += VHOST_USER_HDR_SIZE;
> +        size = qemu_chr_fe_read_all(chr_be, p, msg.size);

This is a blocking operation. See my comment about
qemu_chr_fe_write_all().

This is also a buffer overflow since msg.size is not validated. All
input from the vhost-user master needs to be validated.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply [flat|nested] 77+ messages in thread
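The two validation problems raised in this review (an unchecked vring index
and an unchecked payload size) can both be handled by rejecting
out-of-range values before they are used. The following is only a minimal
sketch under stated assumptions: VP_MAX_QUEUES, VP_MAX_PAYLOAD and the
simplified structures are hypothetical stand-ins, not QEMU's real
VhostUserMsg or vpnet_metadata definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limits, chosen only for illustration. */
#define VP_MAX_QUEUES  8u
#define VP_MAX_PAYLOAD 256u

/* Simplified stand-in for the per-queue metadata used in the patch. */
struct vp_vq_meta {
    uint32_t vring_num;
};

struct vp_metadata {
    struct vp_vq_meta vq[VP_MAX_QUEUES];
};

/* Accept a payload only if it fits the buffer reserved for it. */
static bool vp_payload_size_ok(uint32_t msg_size)
{
    return msg_size <= VP_MAX_PAYLOAD;
}

/*
 * Store vring_num only when the master-supplied index lies inside vq[].
 * Returns 0 on success, -1 when the index is out of bounds.
 */
static int vp_set_vring_num_checked(struct vp_metadata *md,
                                    uint32_t index, uint32_t num)
{
    if (index >= VP_MAX_QUEUES) {
        return -1;
    }
    md->vq[index].vring_num = num;
    return 0;
}
```

In a real handler the size check would run right after the header is copied
and before any payload bytes are read, and the same index check would be
applied to every message that carries a vring index.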
* [virtio-dev] Re: [PATCH v3 4/7] vhost-pci-slave: add vhost-pci slave implementation 2017-12-05 3:33 ` [virtio-dev] [PATCH v3 4/7] vhost-pci-slave: add vhost-pci slave implementation Wei Wang 2017-12-05 15:56 ` [virtio-dev] " Stefan Hajnoczi @ 2017-12-14 17:30 ` Stefan Hajnoczi 2017-12-14 17:48 ` Stefan Hajnoczi 2 siblings, 0 replies; 77+ messages in thread From: Stefan Hajnoczi @ 2017-12-14 17:30 UTC (permalink / raw) To: Wei Wang Cc: virtio-dev, qemu-devel, mst, marcandre.lureau, jasowang, pbonzini, jan.kiszka, avi.cohen, zhiyong.yang [-- Attachment #1: Type: text/plain, Size: 744 bytes --] On Tue, Dec 05, 2017 at 11:33:13AM +0800, Wei Wang wrote: > +static void vp_slave_set_vring_addr(VhostPCINet *vpnet, VhostUserMsg *msg) > +{ > + uint32_t index = msg->payload.addr.index; > + > + vpnet->metadata->vq[index].desc_gpa = msg->payload.addr.desc_user_addr; > + vpnet->metadata->vq[index].avail_gpa = msg->payload.addr.avail_user_addr; > + vpnet->metadata->vq[index].used_gpa = msg->payload.addr.used_user_addr; Do user addresses need to be converted to guest physical addresses via a mem table lookup first? > + vpnet->metadata->nvqs = msg->payload.addr.index + 1; In case the vhost-user master sends messages in an unexpected order: vpnet->metadata->nvqs = MAX(vpnet->metadata->nvqs, msg->payload.addr.index + 1); [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 455 bytes --] ^ permalink raw reply [flat|nested] 77+ messages in thread
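If the answer to the question above is yes, one way to do the conversion is
to look up the user address in the memory table received via
VHOST_USER_SET_MEM_TABLE, whose regions carry a guest physical address, a
size and a userspace address. The sketch below assumes exactly that; struct
vp_mem_region, vp_uaddr_to_gpa() and vp_update_nvqs() are hypothetical
names, and the second helper is just the MAX() suggestion written out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the fields in a vhost-user memory region. */
struct vp_mem_region {
    uint64_t guest_phys_addr;
    uint64_t memory_size;
    uint64_t userspace_addr;
};

/*
 * Convert a master userspace address to a guest physical address.
 * Returns 0 and stores the result on success, -1 if no region matches.
 */
static int vp_uaddr_to_gpa(const struct vp_mem_region *regions, size_t n,
                           uint64_t uaddr, uint64_t *gpa)
{
    for (size_t i = 0; i < n; i++) {
        const struct vp_mem_region *r = &regions[i];

        /* Subtract before comparing so the range check cannot overflow. */
        if (uaddr >= r->userspace_addr &&
            uaddr - r->userspace_addr < r->memory_size) {
            *gpa = r->guest_phys_addr + (uaddr - r->userspace_addr);
            return 0;
        }
    }
    return -1;
}

/*
 * Keep nvqs monotonic even if SET_VRING_ADDR arrives out of order.
 * The index is assumed to have been bounds-checked by the caller.
 */
static uint32_t vp_update_nvqs(uint32_t nvqs, uint32_t index)
{
    uint32_t want = index + 1;

    return want > nvqs ? want : nvqs;
}
```

An address that matches no region is reported as an error instead of being
stored, which also covers the case where SET_VRING_ADDR arrives before the
memory table.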
* [virtio-dev] Re: [PATCH v3 4/7] vhost-pci-slave: add vhost-pci slave implementation
2017-12-05 3:33 ` [virtio-dev] [PATCH v3 4/7] vhost-pci-slave: add vhost-pci slave implementation Wei Wang
2017-12-05 15:56 ` [virtio-dev] " Stefan Hajnoczi
2017-12-14 17:30 ` Stefan Hajnoczi
@ 2017-12-14 17:48 ` Stefan Hajnoczi
2 siblings, 0 replies; 77+ messages in thread
From: Stefan Hajnoczi @ 2017-12-14 17:48 UTC (permalink / raw)
To: Wei Wang
Cc: virtio-dev, qemu-devel, mst, marcandre.lureau, jasowang, pbonzini,
jan.kiszka, avi.cohen, zhiyong.yang

[-- Attachment #1: Type: text/plain, Size: 1427 bytes --]

On Tue, Dec 05, 2017 at 11:33:13AM +0800, Wei Wang wrote:
> The vhost-pci slave implementation is added to support the creation of
> the vhost-pci-net device. It follows the vhost-user protocol to get the
> master VM's info (e.g. memory regions, vring address).

How does the guest know when the QEMU vhost-user slave has finished
initializing everything?  It seems like a guest driver could access the
device before things are initialized.

How will reconnection work?

> +static int vp_slave_get_features(VhostPCINet *vpnet, CharBackend *chr_be,
> +                                 VhostUserMsg *msg)
> +{
> +    /* Offer the initial features, which have the protocol feature bit set */
> +    msg->payload.u64 = (uint64_t)vpnet->host_features |
> +                       (1 << VHOST_USER_F_PROTOCOL_FEATURES);

How can the vhost-user slave inside the guest participate in feature
negotiation?  It must be able to participate, otherwise slaves cannot
disable features that QEMU supports but they don't want to support.

It's not feasible to pass in host_features as a QEMU parameter because
that would require libvirt, OpenStack, cloud providers, etc to add
support so users can manually set the bits for their slave
implementation.

> +static int vp_slave_get_queue_num(CharBackend *chr_be, VhostUserMsg *msg)
> +{
> +    msg->payload.u64 = VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX;

The guest cannot limit the number of virtqueues?

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply [flat|nested] 77+ messages in thread
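
One way to read the feature-negotiation concern above is that the set
offered to the master should be the intersection of what QEMU supports
and what the slave driver in the guest is willing to support. The sketch
below only illustrates that arithmetic. driver_features and
sketch_offered_features() are hypothetical, and how the guest driver
would report its bits to QEMU is exactly the open question in the review.
Bit 30 is assumed here to be VHOST_USER_F_PROTOCOL_FEATURES.

```c
#include <stdint.h>

/* Assumed value of VHOST_USER_F_PROTOCOL_FEATURES (bit 30). */
#define SKETCH_F_PROTOCOL_FEATURES 30

/*
 * Offer only the features that both QEMU (host_features) and the guest
 * slave driver (driver_features, hypothetical) support, plus the
 * protocol-features bit. 1ULL keeps the shift in 64-bit arithmetic.
 */
static uint64_t sketch_offered_features(uint64_t host_features,
                                        uint64_t driver_features)
{
    return (host_features & driver_features) |
           (1ULL << SKETCH_F_PROTOCOL_FEATURES);
}
```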
* [virtio-dev] [PATCH v3 5/7] vhost-user: VHOST_USER_SET_VHOST_PCI msg
2017-12-05 3:33 [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication Wei Wang
` (3 preceding siblings ...)
2017-12-05 3:33 ` [virtio-dev] [PATCH v3 4/7] vhost-pci-slave: add vhost-pci slave implementation Wei Wang
@ 2017-12-05 3:33 ` Wei Wang
2017-12-05 16:00 ` [virtio-dev] " Stefan Hajnoczi
2017-12-05 3:33 ` [virtio-dev] [PATCH v3 6/7] vhost-pci-slave: handle VHOST_USER_SET_VHOST_PCI Wei Wang
` (7 subsequent siblings)
12 siblings, 1 reply; 77+ messages in thread
From: Wei Wang @ 2017-12-05 3:33 UTC (permalink / raw)
To: virtio-dev, qemu-devel, mst, marcandre.lureau, jasowang, stefanha,
pbonzini
Cc: jan.kiszka, avi.cohen, zhiyong.yang, Wei Wang

Add a new vhost-user protocol msg, VHOST_USER_SET_VHOST_PCI. This msg is
used to signal the vhost-pci device to start/stop working.

Signed-off-by: Wei Wang <wei.w.wang@intel.com>
---
 hw/net/vhost_net.c                | 37 +++++++++++++++++++++++++++++++++++++
 hw/virtio/vhost-pci-slave.c       |  2 +-
 hw/virtio/vhost-user.c            | 17 +++++++++++++++++
 hw/virtio/vhost.c                 |  7 +++++++
 include/hw/virtio/vhost-backend.h |  2 ++
 include/hw/virtio/vhost-user.h    |  2 ++
 include/hw/virtio/vhost.h         |  2 ++
 include/net/vhost_net.h           |  2 ++
 8 files changed, 70 insertions(+), 1 deletion(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index e037db6..107aa13 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -296,6 +296,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
     BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(dev)));
     VirtioBusState *vbus = VIRTIO_BUS(qbus);
     VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
+    struct vhost_net *last_net;
     int r, e, i;
 
     if (!k->set_guest_notifiers) {
@@ -341,6 +342,18 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
         }
     }
 
+    last_net = get_vhost_net(ncs[total_queues - 1].peer);
+    if (vhost_pci_enabled(&last_net->dev)) {
+        /*
+         * All the msgs have been sync-ed to the vhost-pci device. This is the
+         * last one to signal the vhost-pci device to link up.
+         */
+        r = vhost_set_vhost_pci(ncs[total_queues - 1].peer, true);
+        if (r < 0) {
+            goto err_start;
+        }
+    }
+
     return 0;
 
 err_start:
@@ -362,8 +375,15 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
     BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(dev)));
     VirtioBusState *vbus = VIRTIO_BUS(qbus);
     VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
+    struct vhost_net *last_net;
     int i, r;
 
+    last_net = get_vhost_net(ncs[total_queues - 1].peer);
+    if (vhost_pci_enabled(&last_net->dev)) {
+        /* Signal the vhost-pci device to stop. */
+        vhost_set_vhost_pci(ncs[total_queues - 1].peer, false);
+    }
+
     for (i = 0; i < total_queues; i++) {
         vhost_net_stop_one(get_vhost_net(ncs[i].peer), dev);
     }
@@ -450,6 +470,18 @@ int vhost_net_set_mtu(struct vhost_net *net, uint16_t mtu)
     return vhost_ops->vhost_net_set_mtu(&net->dev, mtu);
 }
 
+int vhost_set_vhost_pci(NetClientState *nc, bool up)
+{
+    VHostNetState *net = get_vhost_net(nc);
+    const VhostOps *vhost_ops = net->dev.vhost_ops;
+
+    if (vhost_ops && vhost_ops->vhost_set_vhost_pci) {
+        return vhost_ops->vhost_set_vhost_pci(&net->dev, up);
+    }
+
+    return 0;
+}
+
 #else
 uint64_t vhost_net_get_max_queues(VHostNetState *net)
 {
@@ -521,4 +553,9 @@ int vhost_net_set_mtu(struct vhost_net *net, uint16_t mtu)
 {
     return 0;
 }
+
+int vhost_set_vhost_pci(NetClientState *nc, bool up)
+{
+    return 0;
+}
 #endif
diff --git a/hw/virtio/vhost-pci-slave.c b/hw/virtio/vhost-pci-slave.c
index b1a8620..8052884 100644
--- a/hw/virtio/vhost-pci-slave.c
+++ b/hw/virtio/vhost-pci-slave.c
@@ -23,7 +23,7 @@
 #include "hw/virtio/vhost-user.h"
 #include "hw/virtio/vhost-pci-net.h"
 
-#define VHOST_USER_PROTOCOL_FEATURES 0
+#define VHOST_USER_PROTOCOL_FEATURES (1UL << VHOST_USER_PROTOCOL_F_VHOST_PCI)
 
 static int vp_slave_write(CharBackend *chr_be, VhostUserMsg *msg)
 {
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index e512f5a..a48003a 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -342,6 +342,22 @@ static int vhost_user_set_vring_enable(struct vhost_dev *dev, int enable)
     return 0;
 }
 
+static int vhost_user_set_vhost_pci(struct vhost_dev *dev, bool up)
+{
+    VhostUserMsg msg = {
+        .request = VHOST_USER_SET_VHOST_PCI,
+        .flags = VHOST_USER_VERSION,
+        .payload.u64 = (uint64_t)up,
+        .size = sizeof(msg.payload.u64),
+    };
+
+    if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
+        return -1;
+    }
+
+    return 0;
+}
+
 static int vhost_user_get_vring_base(struct vhost_dev *dev,
                                      struct vhost_vring_state *ring)
 {
@@ -844,6 +860,7 @@ const VhostOps user_ops = {
         .vhost_reset_device = vhost_user_reset_device,
         .vhost_get_vq_index = vhost_user_get_vq_index,
         .vhost_set_vring_enable = vhost_user_set_vring_enable,
+        .vhost_set_vhost_pci = vhost_user_set_vhost_pci,
         .vhost_requires_shm_log = vhost_user_requires_shm_log,
         .vhost_migration_done = vhost_user_migration_done,
         .vhost_backend_can_merge = vhost_user_can_merge,
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index e4290ce..dda7b8f 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -25,6 +25,7 @@
 #include "exec/address-spaces.h"
 #include "hw/virtio/virtio-bus.h"
 #include "hw/virtio/virtio-access.h"
+#include "hw/virtio/vhost-user.h"
 #include "migration/blocker.h"
 #include "sysemu/dma.h"
 
@@ -1007,6 +1008,12 @@ out:
     return ret;
 }
 
+bool vhost_pci_enabled(struct vhost_dev *dev)
+{
+    return ((dev->protocol_features &
+            (1ULL << VHOST_USER_PROTOCOL_F_VHOST_PCI)) != 0);
+}
+
 static int vhost_virtqueue_start(struct vhost_dev *dev,
                                  struct VirtIODevice *vdev,
                                  struct vhost_virtqueue *vq,
diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
index a7a5f22..e9d23dd 100644
--- a/include/hw/virtio/vhost-backend.h
+++ b/include/hw/virtio/vhost-backend.h
@@ -71,6 +71,7 @@ typedef int (*vhost_reset_device_op)(struct vhost_dev *dev);
 typedef int (*vhost_get_vq_index_op)(struct vhost_dev *dev, int idx);
 typedef int (*vhost_set_vring_enable_op)(struct vhost_dev *dev,
                                          int enable);
+typedef int (*vhost_set_vhost_pci_op)(struct vhost_dev *dev, bool up);
 typedef bool (*vhost_requires_shm_log_op)(struct vhost_dev *dev);
 typedef int (*vhost_migration_done_op)(struct vhost_dev *dev,
                                        char *mac_addr);
@@ -111,6 +112,7 @@ typedef struct VhostOps {
     vhost_reset_device_op vhost_reset_device;
     vhost_get_vq_index_op vhost_get_vq_index;
     vhost_set_vring_enable_op vhost_set_vring_enable;
+    vhost_set_vhost_pci_op vhost_set_vhost_pci;
     vhost_requires_shm_log_op vhost_requires_shm_log;
     vhost_migration_done_op vhost_migration_done;
     vhost_backend_can_merge_op vhost_backend_can_merge;
diff --git a/include/hw/virtio/vhost-user.h b/include/hw/virtio/vhost-user.h
index d76e9ad..d5bcee7 100644
--- a/include/hw/virtio/vhost-user.h
+++ b/include/hw/virtio/vhost-user.h
@@ -14,6 +14,7 @@ enum VhostUserProtocolFeature {
     VHOST_USER_PROTOCOL_F_NET_MTU = 4,
     VHOST_USER_PROTOCOL_F_SLAVE_REQ = 5,
     VHOST_USER_PROTOCOL_F_CROSS_ENDIAN = 6,
+    VHOST_USER_PROTOCOL_F_VHOST_PCI = 7,
 
     VHOST_USER_PROTOCOL_F_MAX
 };
@@ -45,6 +46,7 @@ typedef enum VhostUserRequest {
     VHOST_USER_SET_SLAVE_REQ_FD = 21,
     VHOST_USER_IOTLB_MSG = 22,
     VHOST_USER_SET_VRING_ENDIAN = 23,
+    VHOST_USER_SET_VHOST_PCI = 24,
     VHOST_USER_MAX
 } VhostUserRequest;
 
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index 467dc77..f173e81 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -106,4 +106,6 @@ int vhost_net_set_backend(struct vhost_dev *hdev,
                           struct vhost_vring_file *file);
 
 int vhost_device_iotlb_miss(struct vhost_dev *dev, uint64_t iova, int write);
+
+bool vhost_pci_enabled(struct vhost_dev *dev);
 #endif
diff --git a/include/net/vhost_net.h b/include/net/vhost_net.h
index afc1499..257e8cc 100644
--- a/include/net/vhost_net.h
+++ b/include/net/vhost_net.h
@@ -37,4 +37,6 @@ uint64_t vhost_net_get_acked_features(VHostNetState *net);
 
 int vhost_net_set_mtu(struct vhost_net *net, uint16_t mtu);
 
+int vhost_set_vhost_pci(NetClientState *nc, bool up);
+
 #endif
-- 
2.7.4

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org

^ permalink raw reply related [flat|nested] 77+ messages in thread
* [virtio-dev] Re: [PATCH v3 5/7] vhost-user: VHOST_USER_SET_VHOST_PCI msg
2017-12-05 3:33 ` [virtio-dev] [PATCH v3 5/7] vhost-user: VHOST_USER_SET_VHOST_PCI msg Wei Wang
@ 2017-12-05 16:00 ` Stefan Hajnoczi
2017-12-06 10:32 ` Wei Wang
0 siblings, 1 reply; 77+ messages in thread
From: Stefan Hajnoczi @ 2017-12-05 16:00 UTC (permalink / raw)
To: Wei Wang
Cc: virtio-dev, qemu-devel, mst, marcandre.lureau, jasowang, pbonzini,
jan.kiszka, avi.cohen, zhiyong.yang

[-- Attachment #1: Type: text/plain, Size: 889 bytes --]

On Tue, Dec 05, 2017 at 11:33:14AM +0800, Wei Wang wrote:
> Add a new vhost-user protocol msg, VHOST_USER_SET_VHOST_PCI. This msg is
> used to signal the vhost-pci device to start/stop working.
>
> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> ---
>  hw/net/vhost_net.c                | 37 +++++++++++++++++++++++++++++++++++++
>  hw/virtio/vhost-pci-slave.c       |  2 +-
>  hw/virtio/vhost-user.c            | 17 +++++++++++++++++
>  hw/virtio/vhost.c                 |  7 +++++++
>  include/hw/virtio/vhost-backend.h |  2 ++
>  include/hw/virtio/vhost-user.h    |  2 ++
>  include/hw/virtio/vhost.h         |  2 ++
>  include/net/vhost_net.h           |  2 ++
>  8 files changed, 70 insertions(+), 1 deletion(-)

New protocol messages must be documented in docs/interop/vhost-user.txt.

Why is a new message needed?  I'm not sure why it is specific to
vhost-pci.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [virtio-dev] Re: [PATCH v3 5/7] vhost-user: VHOST_USER_SET_VHOST_PCI msg
2017-12-05 16:00 ` [virtio-dev] " Stefan Hajnoczi
@ 2017-12-06 10:32 ` Wei Wang
2017-12-15 12:40 ` Stefan Hajnoczi
0 siblings, 1 reply; 77+ messages in thread
From: Wei Wang @ 2017-12-06 10:32 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: virtio-dev, qemu-devel, mst, marcandre.lureau, jasowang, pbonzini,
jan.kiszka, avi.cohen, zhiyong.yang

On 12/06/2017 12:00 AM, Stefan Hajnoczi wrote:
> On Tue, Dec 05, 2017 at 11:33:14AM +0800, Wei Wang wrote:
>> Add a new vhost-user protocol msg, VHOST_USER_SET_VHOST_PCI. This msg is
>> used to signal the vhost-pci device to start/stop working.
>>
>> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
>> ---
>>  hw/net/vhost_net.c                | 37 +++++++++++++++++++++++++++++++++++++
>>  hw/virtio/vhost-pci-slave.c       |  2 +-
>>  hw/virtio/vhost-user.c            | 17 +++++++++++++++++
>>  hw/virtio/vhost.c                 |  7 +++++++
>>  include/hw/virtio/vhost-backend.h |  2 ++
>>  include/hw/virtio/vhost-user.h    |  2 ++
>>  include/hw/virtio/vhost.h         |  2 ++
>>  include/net/vhost_net.h           |  2 ++
>>  8 files changed, 70 insertions(+), 1 deletion(-)
> New protocol messages must be documented in docs/interop/vhost-user.txt.

OK, I'll add it to the doc after the discussion.

>
> Why is a new message needed?  I'm not sure why it is specific to
> vhost-pci.

Yes, it might be useful for other vhost-user slave implementations.
Perhaps we could name it "VHOST_USER_SET_SLAVE"?
The message is used to "link up" or "link down" the slave device. For
example, when virtio-net leaves, it sends a "VHOST_USER_SET_SLAVE" msg to
the slave to link down the slave device.
(A similar msg is VHOST_USER_SET_VRING_ENABLE, but that is for virtqueue
enable/disable, not for a device-level enable/disable.)

Best,
Wei

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org

^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [virtio-dev] Re: [PATCH v3 5/7] vhost-user: VHOST_USER_SET_VHOST_PCI msg
2017-12-06 10:32 ` Wei Wang
@ 2017-12-15 12:40 ` Stefan Hajnoczi
0 siblings, 0 replies; 77+ messages in thread
From: Stefan Hajnoczi @ 2017-12-15 12:40 UTC (permalink / raw)
To: Wei Wang
Cc: virtio-dev, qemu-devel, mst, marcandre.lureau, jasowang, pbonzini,
jan.kiszka, avi.cohen, zhiyong.yang

[-- Attachment #1: Type: text/plain, Size: 1740 bytes --]

On Wed, Dec 06, 2017 at 06:32:50PM +0800, Wei Wang wrote:
> On 12/06/2017 12:00 AM, Stefan Hajnoczi wrote:
> > On Tue, Dec 05, 2017 at 11:33:14AM +0800, Wei Wang wrote:
> > > Add a new vhost-user protocol msg, VHOST_USER_SET_VHOST_PCI. This msg is
> > > used to signal the vhost-pci device to start/stop working.
> > >
> > > Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> > > ---
> > >  hw/net/vhost_net.c                | 37 +++++++++++++++++++++++++++++++++++++
> > >  hw/virtio/vhost-pci-slave.c       |  2 +-
> > >  hw/virtio/vhost-user.c            | 17 +++++++++++++++++
> > >  hw/virtio/vhost.c                 |  7 +++++++
> > >  include/hw/virtio/vhost-backend.h |  2 ++
> > >  include/hw/virtio/vhost-user.h    |  2 ++
> > >  include/hw/virtio/vhost.h         |  2 ++
> > >  include/net/vhost_net.h           |  2 ++
> > >  8 files changed, 70 insertions(+), 1 deletion(-)
> > New protocol messages must be documented in docs/interop/vhost-user.txt.
>
> OK, I'll add it to the doc after the discussion.
>
> >
> > Why is a new message needed?  I'm not sure why it is specific to
> > vhost-pci.
>
> Yes, it might be useful for other vhost-user slave implementations. Probably
> we can name it "VHOST_USER_SET_SLAVE"?
> The message is used to "link up" or "link down" the slave device. For
> example, when virtio-net leaves, it sends a "VHOST_USER_SET_SLAVE" msg to
> the slave to link down the slave device.
> (a similar msg is VHOST_USER_SET_VRING_ENABLE, but that is for virtqueue
> enable/disable, not for a device level enable/disable)

Why is VHOST_USER_SET_VHOST_PCI necessary when DPDK and other vhost-user
net device slaves already exist today and didn't need it?

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply [flat|nested] 77+ messages in thread
* [virtio-dev] [PATCH v3 6/7] vhost-pci-slave: handle VHOST_USER_SET_VHOST_PCI
2017-12-05 3:33 [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication Wei Wang
` (4 preceding siblings ...)
2017-12-05 3:33 ` [virtio-dev] [PATCH v3 5/7] vhost-user: VHOST_USER_SET_VHOST_PCI msg Wei Wang
@ 2017-12-05 3:33 ` Wei Wang
2017-12-05 3:33 ` [virtio-dev] [PATCH v3 7/7] virtio/vhost.c: vhost-pci needs remote gpa Wei Wang
` (6 subsequent siblings)
12 siblings, 0 replies; 77+ messages in thread
From: Wei Wang @ 2017-12-05 3:33 UTC (permalink / raw)
To: virtio-dev, qemu-devel, mst, marcandre.lureau, jasowang, stefanha,
pbonzini
Cc: jan.kiszka, avi.cohen, zhiyong.yang, Wei Wang

This patch implements the slave-side handling of the
VHOST_USER_SET_VHOST_PCI msg. Receiving a "true" from the master sets
the LINK_UP bit in the vhost-pci device config status, and a config
interrupt is injected into the guest to notify it that the device is
ready to use. The driver is expected to start reading the metadata for
the remote memory and vring setup after this LINK_UP interrupt is
received. Receiving a "false" from the master clears the LINK_UP bit and
injects a config interrupt into the guest to notify the driver to stop
sending and receiving packets.

Signed-off-by: Wei Wang <wei.w.wang@intel.com>
---
 hw/net/vhost_pci_net.c                         | 13 +++++++++++++
 hw/virtio/vhost-pci-slave.c                    |  3 +++
 include/hw/virtio/vhost-pci-net.h              |  2 ++
 include/standard-headers/linux/vhost_pci_net.h |  2 ++
 4 files changed, 20 insertions(+)

diff --git a/hw/net/vhost_pci_net.c b/hw/net/vhost_pci_net.c
index 11184c3..39eb28d 100644
--- a/hw/net/vhost_pci_net.c
+++ b/hw/net/vhost_pci_net.c
@@ -32,6 +32,19 @@ static uint64_t vpnet_get_features(VirtIODevice *vdev, uint64_t features,
     return features;
 }
 
+void vpnet_set_link_up(VhostPCINet *vpnet, bool up)
+{
+    VirtIODevice *vdev = VIRTIO_DEVICE(vpnet);
+
+    if (up) {
+        vpnet->status |= VPNET_S_LINK_UP;
+    } else {
+        vpnet->status &= ~VPNET_S_LINK_UP;
+    }
+
+    virtio_notify_config(vdev);
+}
+
 static void vpnet_get_config(VirtIODevice *vdev, uint8_t *config)
 {
     VhostPCINet *vpnet = VHOST_PCI_NET(vdev);
diff --git a/hw/virtio/vhost-pci-slave.c b/hw/virtio/vhost-pci-slave.c
index 8052884..6554efb 100644
--- a/hw/virtio/vhost-pci-slave.c
+++ b/hw/virtio/vhost-pci-slave.c
@@ -287,6 +287,9 @@ void vp_slave_read(void *opaque, const uint8_t *buf, int size)
     case VHOST_USER_SET_VRING_ENABLE:
         vp_slave_set_vring_enable(vpnet, &msg);
         break;
+    case VHOST_USER_SET_VHOST_PCI:
+        vpnet_set_link_up(vpnet, (bool)msg.payload.u64);
+        break;
     case VHOST_USER_SET_LOG_BASE:
         break;
     case VHOST_USER_SET_LOG_FD:
diff --git a/include/hw/virtio/vhost-pci-net.h b/include/hw/virtio/vhost-pci-net.h
index 6f4ab6a..2d0e94c 100644
--- a/include/hw/virtio/vhost-pci-net.h
+++ b/include/hw/virtio/vhost-pci-net.h
@@ -37,4 +37,6 @@ typedef struct VhostPCINet {
     CharBackend chr_be;
 } VhostPCINet;
 
+void vpnet_set_link_up(VhostPCINet *vpnet, bool up);
+
 #endif
diff --git a/include/standard-headers/linux/vhost_pci_net.h b/include/standard-headers/linux/vhost_pci_net.h
index 792261e..ab91989 100644
--- a/include/standard-headers/linux/vhost_pci_net.h
+++ b/include/standard-headers/linux/vhost_pci_net.h
@@ -35,6 +35,8 @@
 
 #define MAX_REMOTE_REGION 8
 
+/* Set by the device to indicate that the device (e.g. metadata) is ready */
+#define VPNET_S_LINK_UP 1
 struct vpnet_config {
     uint16_t status;
 };
-- 
2.7.4

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org

^ permalink raw reply related [flat|nested] 77+ messages in thread
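
The driver side of the LINK_UP handshake described in the commit message
above might be sketched as follows. This is an illustration under
assumptions, not the posted guest driver or the DPDK PMD:
sketch_on_config_change() and SketchAction are hypothetical, and
SKETCH_S_LINK_UP simply mirrors VPNET_S_LINK_UP from the patch. On each
config interrupt the driver would re-read the status field and act only
when the link state actually changed.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_S_LINK_UP 1   /* mirrors VPNET_S_LINK_UP from the patch */

typedef enum {
    SKETCH_NO_CHANGE,        /* status unchanged, nothing to do */
    SKETCH_START,            /* link came up: read metadata, start datapath */
    SKETCH_STOP              /* link went down: stop sending/receiving */
} SketchAction;

/*
 * Called from a (hypothetical) config-change handler with the status
 * value just read from the device config space. *running tracks whether
 * the datapath is currently active.
 */
static SketchAction sketch_on_config_change(bool *running, uint16_t status)
{
    bool link_up = (status & SKETCH_S_LINK_UP) != 0;

    if (link_up == *running) {
        return SKETCH_NO_CHANGE;
    }
    *running = link_up;
    return link_up ? SKETCH_START : SKETCH_STOP;
}
```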
* [virtio-dev] [PATCH v3 7/7] virtio/vhost.c: vhost-pci needs remote gpa
2017-12-05 3:33 [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication Wei Wang
` (5 preceding siblings ...)
2017-12-05 3:33 ` [virtio-dev] [PATCH v3 6/7] vhost-pci-slave: handle VHOST_USER_SET_VHOST_PCI Wei Wang
@ 2017-12-05 3:33 ` Wei Wang
2017-12-05 16:05 ` [virtio-dev] " Stefan Hajnoczi
2017-12-05 7:01 ` [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication Jason Wang
` (5 subsequent siblings)
12 siblings, 1 reply; 77+ messages in thread
From: Wei Wang @ 2017-12-05 3:33 UTC (permalink / raw)
To: virtio-dev, qemu-devel, mst, marcandre.lureau, jasowang, stefanha,
pbonzini
Cc: jan.kiszka, avi.cohen, zhiyong.yang, Wei Wang

The vhost-pci driver uses the remote guest physical address to send/receive
packets from the remote guest, so when sending the vring info to the
vhost-pci device, send the guest physical address directly.

Signed-off-by: Wei Wang <wei.w.wang@intel.com>
---
 hw/virtio/vhost.c | 56 +++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 36 insertions(+), 20 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index dda7b8f..ffbda6c 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1057,26 +1057,38 @@ static int vhost_virtqueue_start(struct vhost_dev *dev,
         }
     }
 
-    vq->desc_size = s = l = virtio_queue_get_desc_size(vdev, idx);
     vq->desc_phys = a = virtio_queue_get_desc_addr(vdev, idx);
-    vq->desc = vhost_memory_map(dev, a, &l, 0);
-    if (!vq->desc || l != s) {
-        r = -ENOMEM;
-        goto fail_alloc_desc;
+    if (vhost_pci_enabled(dev)) {
+        vq->desc = (void *)a;
+    } else {
+        vq->desc_size = s = l = virtio_queue_get_desc_size(vdev, idx);
+        vq->desc = vhost_memory_map(dev, a, &l, 0);
+        if (!vq->desc || l != s) {
+            r = -ENOMEM;
+            goto fail_alloc_desc;
+        }
     }
     vq->avail_size = s = l = virtio_queue_get_avail_size(vdev, idx);
     vq->avail_phys = a = virtio_queue_get_avail_addr(vdev, idx);
-    vq->avail = vhost_memory_map(dev, a, &l, 0);
-    if (!vq->avail || l != s) {
-        r = -ENOMEM;
-        goto fail_alloc_avail;
+    if (vhost_pci_enabled(dev)) {
+        vq->avail = (void *)a;
+    } else {
+        vq->avail = vhost_memory_map(dev, a, &l, 0);
+        if (!vq->avail || l != s) {
+            r = -ENOMEM;
+            goto fail_alloc_avail;
+        }
     }
     vq->used_size = s = l = virtio_queue_get_used_size(vdev, idx);
     vq->used_phys = a = virtio_queue_get_used_addr(vdev, idx);
-    vq->used = vhost_memory_map(dev, a, &l, 1);
-    if (!vq->used || l != s) {
-        r = -ENOMEM;
-        goto fail_alloc_used;
+    if (vhost_pci_enabled(dev)) {
+        vq->used = (void *)a;
+    } else {
+        vq->used = vhost_memory_map(dev, a, &l, 1);
+        if (!vq->used || l != s) {
+            r = -ENOMEM;
+            goto fail_alloc_used;
+        }
     }
 
     r = vhost_virtqueue_set_addr(dev, vq, vhost_vq_index, dev->log_enabled);
@@ -1163,13 +1175,17 @@ static void vhost_virtqueue_stop(struct vhost_dev *dev,
                                               !virtio_is_big_endian(vdev),
                                               vhost_vq_index);
     }
-
-    vhost_memory_unmap(dev, vq->used, virtio_queue_get_used_size(vdev, idx),
-                       1, virtio_queue_get_used_size(vdev, idx));
-    vhost_memory_unmap(dev, vq->avail, virtio_queue_get_avail_size(vdev, idx),
-                       0, virtio_queue_get_avail_size(vdev, idx));
-    vhost_memory_unmap(dev, vq->desc, virtio_queue_get_desc_size(vdev, idx),
-                       0, virtio_queue_get_desc_size(vdev, idx));
+    if (!vhost_pci_enabled(dev)) {
+        vhost_memory_unmap(dev, vq->used,
+                           virtio_queue_get_used_size(vdev, idx),
+                           1, virtio_queue_get_used_size(vdev, idx));
+        vhost_memory_unmap(dev, vq->avail,
+                           virtio_queue_get_avail_size(vdev, idx),
+                           0, virtio_queue_get_avail_size(vdev, idx));
+        vhost_memory_unmap(dev, vq->desc,
+                           virtio_queue_get_desc_size(vdev, idx),
+                           0, virtio_queue_get_desc_size(vdev, idx));
+    }
 }
 
 static void vhost_eventfd_add(MemoryListener *listener,
-- 
2.7.4

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org

^ permalink raw reply related [flat|nested] 77+ messages in thread
* [virtio-dev] Re: [PATCH v3 7/7] virtio/vhost.c: vhost-pci needs remote gpa
2017-12-05 3:33 ` [virtio-dev] [PATCH v3 7/7] virtio/vhost.c: vhost-pci needs remote gpa Wei Wang
@ 2017-12-05 16:05 ` Stefan Hajnoczi
2017-12-06 10:46 ` Wei Wang
0 siblings, 1 reply; 77+ messages in thread
From: Stefan Hajnoczi @ 2017-12-05 16:05 UTC (permalink / raw)
To: Wei Wang
Cc: virtio-dev, qemu-devel, mst, marcandre.lureau, jasowang, pbonzini,
jan.kiszka, avi.cohen, zhiyong.yang

[-- Attachment #1: Type: text/plain, Size: 1262 bytes --]

On Tue, Dec 05, 2017 at 11:33:16AM +0800, Wei Wang wrote:
> The vhost-pci driver uses the remote guest physical address to send/receive
> packets from the remote guest, so when sending the vring info to the vhost-pci
> device, send the guest physical address directly.
>
> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> ---
>  hw/virtio/vhost.c | 56 +++++++++++++++++++++++++++++++++++--------------------
>  1 file changed, 36 insertions(+), 20 deletions(-)

Can you do it inside vhost_memory_map()/vhost_memory_unmap() instead of
modifying callers?

Looks like vhost_dev_has_iommu() already takes this approach:

  static void *vhost_memory_map(struct vhost_dev *dev, hwaddr addr,
                                hwaddr *plen, int is_write)
  {
      if (!vhost_dev_has_iommu(dev)) {
          return cpu_physical_memory_map(addr, plen, is_write);
      } else {
          return (void *)(uintptr_t)addr;
      }
  }

  static void vhost_memory_unmap(struct vhost_dev *dev, void *buffer,
                                 hwaddr len, int is_write,
                                 hwaddr access_len)
  {
      if (!vhost_dev_has_iommu(dev)) {
          cpu_physical_memory_unmap(buffer, len, is_write, access_len);
      }
  }

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply [flat|nested] 77+ messages in thread
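
The suggestion above, to keep the address pass-through decision inside
the map helper rather than in every caller, can be sketched in isolation
as below. Everything here is a hypothetical stand-in: SketchDev,
sketch_ram and sketch_memory_map() are not QEMU APIs, and the RAM array
merely plays the role of cpu_physical_memory_map().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device state; in QEMU this would come from struct vhost_dev. */
typedef struct {
    bool has_iommu;   /* stands in for vhost_dev_has_iommu() */
    bool vhost_pci;   /* stands in for vhost_pci_enabled() */
} SketchDev;

/* Toy "guest RAM" standing in for cpu_physical_memory_map(). */
static uint8_t sketch_ram[4096];

static void *sketch_map_ram(uint64_t addr)
{
    return addr < sizeof(sketch_ram) ? &sketch_ram[addr] : NULL;
}

/*
 * One place decides whether the address is passed through untouched
 * (IOMMU or vhost-pci case) or mapped to a host pointer, so callers
 * such as the virtqueue start path need no per-device branches.
 */
static void *sketch_memory_map(const SketchDev *dev, uint64_t addr)
{
    if (dev->has_iommu || dev->vhost_pci) {
        return (void *)(uintptr_t)addr;
    }
    return sketch_map_ram(addr);
}
```

A matching unmap helper would return early under the same condition.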
* Re: [virtio-dev] Re: [PATCH v3 7/7] virtio/vhost.c: vhost-pci needs remote gpa
2017-12-05 16:05 ` [virtio-dev] " Stefan Hajnoczi
@ 2017-12-06 10:46 ` Wei Wang
0 siblings, 0 replies; 77+ messages in thread
From: Wei Wang @ 2017-12-06 10:46 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: virtio-dev, qemu-devel, mst, marcandre.lureau, jasowang, pbonzini,
jan.kiszka, avi.cohen, zhiyong.yang

On 12/06/2017 12:05 AM, Stefan Hajnoczi wrote:
> On Tue, Dec 05, 2017 at 11:33:16AM +0800, Wei Wang wrote:
>> The vhost-pci driver uses the remote guest physical address to send/receive
>> packets from the remote guest, so when sending the vring info to the vhost-pci
>> device, send the guest physical address directly.
>>
>> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
>> ---
>>  hw/virtio/vhost.c | 56 +++++++++++++++++++++++++++++++++++--------------------
>>  1 file changed, 36 insertions(+), 20 deletions(-)
> Can you do it inside vhost_memory_map()/vhost_memory_unmap() instead of
> modifying callers?
>
> Looks like vhost_dev_has_iommu() already takes this approach:
>
>   static void *vhost_memory_map(struct vhost_dev *dev, hwaddr addr,
>                                 hwaddr *plen, int is_write)
>   {
>       if (!vhost_dev_has_iommu(dev)) {
>           return cpu_physical_memory_map(addr, plen, is_write);
>       } else {
>           return (void *)(uintptr_t)addr;
>       }
>   }
>
>   static void vhost_memory_unmap(struct vhost_dev *dev, void *buffer,
>                                  hwaddr len, int is_write,
>                                  hwaddr access_len)
>   {
>       if (!vhost_dev_has_iommu(dev)) {
>           cpu_physical_memory_unmap(buffer, len, is_write, access_len);
>       }
>   }

Thanks for the reminder. I think this patch may not be needed once the
F_IOMMU_PLATFORM feature is added.

Best,
Wei

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org

^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication
2017-12-05 3:33 [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication Wei Wang
` (6 preceding siblings ...)
2017-12-05 3:33 ` [virtio-dev] [PATCH v3 7/7] virtio/vhost.c: vhost-pci needs remote gpa Wei Wang
@ 2017-12-05 7:01 ` Jason Wang
2017-12-05 7:15 ` Wei Wang
2017-12-05 14:30 ` Stefan Hajnoczi
` (4 subsequent siblings)
12 siblings, 1 reply; 77+ messages in thread
From: Jason Wang @ 2017-12-05 7:01 UTC (permalink / raw)
To: Wei Wang, virtio-dev, qemu-devel, mst, marcandre.lureau, stefanha,
pbonzini
Cc: jan.kiszka, avi.cohen, zhiyong.yang

On 2017-12-05 11:33, Wei Wang wrote:
> Vhost-pci is a point-to-point based inter-VM communication solution. This
> patch series implements the vhost-pci-net device setup and emulation. The
> device is implemented as a virtio device, and it is set up via the
> vhost-user protocol to get the necessary info (e.g. the memory info of the
> remote VM, vring info).
>
> Currently, only the fundamental functions are implemented. More features,
> such as MQ and live migration, will be updated in the future.
>
> The DPDK PMD of vhost-pci has been posted to the dpdk mailinglist here:
> http://dpdk.org/ml/archives/dev/2017-November/082615.html
>
> v2->v3 changes:
> 1) static device creation: instead of creating and hot-plugging the
>    device when receiving a vhost-user msg, the device is now created
>    via the qemu booting command line.
> 2) remove vqs: rq and ctrlq are removed in this version.
>     - receive vq: the receive vq is not needed anymore. The PMD directly
>                   shares the remote txq and rxq - grab from remote txq to
>                   receive packets, and put to rxq to send packets.
>     - ctrlq: the ctrlq is replaced by the first 4KB metadata area of the
>              device Bar-2.
> 3) simpler implementation: the entire implementation has been tailored
>    from ~1800 LOC to ~850 LOC.

Hi:

Any performance numbers you can share?

Thanks

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org

^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication
2017-12-05 7:01 ` [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication Jason Wang
@ 2017-12-05 7:15 ` Wei Wang
2017-12-05 7:19 ` Jason Wang
0 siblings, 1 reply; 77+ messages in thread
From: Wei Wang @ 2017-12-05 7:15 UTC (permalink / raw)
To: Jason Wang, virtio-dev, qemu-devel, mst, marcandre.lureau, stefanha,
pbonzini
Cc: jan.kiszka, avi.cohen, zhiyong.yang

On 12/05/2017 03:01 PM, Jason Wang wrote:
>
>
> On 2017-12-05 11:33, Wei Wang wrote:
>> Vhost-pci is a point-to-point based inter-VM communication solution. This
>> patch series implements the vhost-pci-net device setup and emulation. The
>> device is implemented as a virtio device, and it is set up via the
>> vhost-user protocol to get the necessary info (e.g. the memory info of the
>> remote VM, vring info).
>>
>> Currently, only the fundamental functions are implemented. More features,
>> such as MQ and live migration, will be updated in the future.
>>
>> The DPDK PMD of vhost-pci has been posted to the dpdk mailinglist here:
>> http://dpdk.org/ml/archives/dev/2017-November/082615.html
>>
>> v2->v3 changes:
>> 1) static device creation: instead of creating and hot-plugging the
>>    device when receiving a vhost-user msg, the device is now created
>>    via the qemu booting command line.
>> 2) remove vqs: rq and ctrlq are removed in this version.
>>     - receive vq: the receive vq is not needed anymore. The PMD directly
>>                   shares the remote txq and rxq - grab from remote txq to
>>                   receive packets, and put to rxq to send packets.
>>     - ctrlq: the ctrlq is replaced by the first 4KB metadata area of the
>>              device Bar-2.
>> 3) simpler implementation: the entire implementation has been tailored
>>    from ~1800 LOC to ~850 LOC.
>
> Hi:
>
> Any performance numbers you can share?
>

Hi Jason,

Performance testing and tuning on the data plane is in progress (btw,
that wouldn't affect the device part patches).
If possible, could we start the device part patch review in the meantime?

Best,
Wei

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org

^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication 2017-12-05 7:15 ` Wei Wang @ 2017-12-05 7:19 ` Jason Wang 2017-12-05 8:49 ` Avi Cohen (A) 0 siblings, 1 reply; 77+ messages in thread From: Jason Wang @ 2017-12-05 7:19 UTC (permalink / raw) To: Wei Wang, virtio-dev, qemu-devel, mst, marcandre.lureau, stefanha, pbonzini Cc: jan.kiszka, avi.cohen, zhiyong.yang On 2017年12月05日 15:15, Wei Wang wrote: > On 12/05/2017 03:01 PM, Jason Wang wrote: >> >> >> On 2017年12月05日 11:33, Wei Wang wrote: >>> Vhost-pci is a point-to-point based inter-VM communication solution. >>> This >>> patch series implements the vhost-pci-net device setup and >>> emulation. The >>> device is implemented as a virtio device, and it is set up via the >>> vhost-user protocol to get the neessary info (e.g the memory info of >>> the >>> remote VM, vring info). >>> >>> Currently, only the fundamental functions are implemented. More >>> features, >>> such as MQ and live migration, will be updated in the future. >>> >>> The DPDK PMD of vhost-pci has been posted to the dpdk mailinglist here: >>> http://dpdk.org/ml/archives/dev/2017-November/082615.html >>> >>> v2->v3 changes: >>> 1) static device creation: instead of creating and hot-plugging the >>> device when receiving a vhost-user msg, the device is not created >>> via the qemu booting command line. >>> 2) remove vqs: rq and ctrlq are removed in this version. >>> - receive vq: the receive vq is not needed anymore. The PMD >>> directly >>> shares the remote txq and rxq - grab from remote >>> txq to >>> receive packets, and put to rxq to send packets. >>> - ctrlq: the ctrlq is replaced by the first 4KB metadata area >>> of the >>> device Bar-2. >>> 3) simpler implementation: the entire implementation has been tailored >>> from ~1800 LOC to ~850 LOC. >> >> Hi: >> >> Any performance numbers you can share? 
>>
>
> Hi Jason,
>
> Performance testing and tuning on the data plane is in progress (btw,
> that wouldn't affect the device part patches).
> If possible, could we start the device part patch review in the meantime?
>
> Best,
> Wei
>

Hi Wei:

Will do, but basically, the cover letter lacks the motivation for
vhost-pci, and I want to see some numbers first since I suspect it can
outperform the existing data path.

Thanks

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
^ permalink raw reply [flat|nested] 77+ messages in thread
* RE: [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication 2017-12-05 7:19 ` Jason Wang @ 2017-12-05 8:49 ` Avi Cohen (A) 2017-12-05 10:36 ` Wei Wang 0 siblings, 1 reply; 77+ messages in thread From: Avi Cohen (A) @ 2017-12-05 8:49 UTC (permalink / raw) To: Jason Wang, Wei Wang, virtio-dev@lists.oasis-open.org, qemu-devel@nongnu.org, mst@redhat.com, marcandre.lureau@redhat.com, stefanha@redhat.com, pbonzini@redhat.com Cc: jan.kiszka@siemens.com, zhiyong.yang@intel.com > -----Original Message----- > From: Jason Wang [mailto:jasowang@redhat.com] > Sent: Tuesday, 05 December, 2017 9:19 AM > To: Wei Wang; virtio-dev@lists.oasis-open.org; qemu-devel@nongnu.org; > mst@redhat.com; marcandre.lureau@redhat.com; stefanha@redhat.com; > pbonzini@redhat.com > Cc: jan.kiszka@siemens.com; Avi Cohen (A); zhiyong.yang@intel.com > Subject: Re: [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication > > > > On 2017年12月05日 15:15, Wei Wang wrote: > > On 12/05/2017 03:01 PM, Jason Wang wrote: > >> > >> > >> On 2017年12月05日 11:33, Wei Wang wrote: > >>> Vhost-pci is a point-to-point based inter-VM communication solution. > >>> This > >>> patch series implements the vhost-pci-net device setup and > >>> emulation. The device is implemented as a virtio device, and it is > >>> set up via the vhost-user protocol to get the neessary info (e.g the > >>> memory info of the remote VM, vring info). > >>> > >>> Currently, only the fundamental functions are implemented. More > >>> features, such as MQ and live migration, will be updated in the > >>> future. > >>> > >>> The DPDK PMD of vhost-pci has been posted to the dpdk mailinglist here: > >>> http://dpdk.org/ml/archives/dev/2017-November/082615.html > >>> > >>> v2->v3 changes: > >>> 1) static device creation: instead of creating and hot-plugging the > >>> device when receiving a vhost-user msg, the device is not > >>> created > >>> via the qemu booting command line. > >>> 2) remove vqs: rq and ctrlq are removed in this version. 
> >>> - receive vq: the receive vq is not needed anymore. The PMD directly
> >>> shares the remote txq and rxq - grab from remote txq to
> >>> receive packets, and put to rxq to send packets.
> >>> - ctrlq: the ctrlq is replaced by the first 4KB metadata area of the
> >>> device Bar-2.
> >>> 3) simpler implementation: the entire implementation has been tailored
> >>> from ~1800 LOC to ~850 LOC.
> >>
> >> Hi:
> >>
> >> Any performance numbers you can share?
> >>
> >
> > Hi Jason,
> >
> > Performance testing and tuning on the data plane is in progress (btw,
> > that wouldn't affect the device part patches).
> > If possible, could we start the device part patch review in the meantime?
> >
> > Best,
> > Wei
> >
>
> Hi Wei:
>
> Will do, but basically, the cover lacks of the motivation for vhost-pci and I want
> to see some numbers first since I suspect it can over-perform exist data-path.
>
> Thanks

[Avi Cohen (A)]
Hi Wei,

I can try testing to get **numbers**. I can do it now, but I need a
little help from you.

I've started by downloading, building and installing the driver
(https://github.com/wei-w-wang/vhost-pci-driver) in the guest kernel,
**without** downloading the 2nd patch, the device
(https://github.com/wei-w-wang/vhost-pci-device).

But my guest kernel was corrupted after reboot (kernel panic/out of mem ..) -
can you tell me the steps to apply these patches?

Best Regards
Avi

^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication 2017-12-05 8:49 ` Avi Cohen (A) @ 2017-12-05 10:36 ` Wei Wang 0 siblings, 0 replies; 77+ messages in thread From: Wei Wang @ 2017-12-05 10:36 UTC (permalink / raw) To: Avi Cohen (A), Jason Wang, virtio-dev@lists.oasis-open.org, qemu-devel@nongnu.org, mst@redhat.com, marcandre.lureau@redhat.com, stefanha@redhat.com, pbonzini@redhat.com Cc: jan.kiszka@siemens.com, zhiyong.yang@intel.com On 12/05/2017 04:49 PM, Avi Cohen (A) wrote: > >> -----Original Message----- >> From: Jason Wang [mailto:jasowang@redhat.com] >> Sent: Tuesday, 05 December, 2017 9:19 AM >> To: Wei Wang; virtio-dev@lists.oasis-open.org; qemu-devel@nongnu.org; >> mst@redhat.com; marcandre.lureau@redhat.com; stefanha@redhat.com; >> pbonzini@redhat.com >> Cc: jan.kiszka@siemens.com; Avi Cohen (A); zhiyong.yang@intel.com >> Subject: Re: [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication >> >> >> >> On 2017年12月05日 15:15, Wei Wang wrote: >>> On 12/05/2017 03:01 PM, Jason Wang wrote: >>>> >>>> On 2017年12月05日 11:33, Wei Wang wrote: >>>>> Vhost-pci is a point-to-point based inter-VM communication solution. >>>>> This >>>>> patch series implements the vhost-pci-net device setup and >>>>> emulation. The device is implemented as a virtio device, and it is >>>>> set up via the vhost-user protocol to get the neessary info (e.g the >>>>> memory info of the remote VM, vring info). >>>>> >>>>> Currently, only the fundamental functions are implemented. More >>>>> features, such as MQ and live migration, will be updated in the >>>>> future. >>>>> >>>>> The DPDK PMD of vhost-pci has been posted to the dpdk mailinglist here: >>>>> http://dpdk.org/ml/archives/dev/2017-November/082615.html >>>>> >>>>> v2->v3 changes: >>>>> 1) static device creation: instead of creating and hot-plugging the >>>>> device when receiving a vhost-user msg, the device is not >>>>> created >>>>> via the qemu booting command line. 
>>>>> 2) remove vqs: rq and ctrlq are removed in this version.
>>>>> - receive vq: the receive vq is not needed anymore. The PMD directly
>>>>> shares the remote txq and rxq - grab from remote txq to
>>>>> receive packets, and put to rxq to send packets.
>>>>> - ctrlq: the ctrlq is replaced by the first 4KB metadata area of the
>>>>> device Bar-2.
>>>>> 3) simpler implementation: the entire implementation has been tailored
>>>>> from ~1800 LOC to ~850 LOC.
>>>> Hi:
>>>>
>>>> Any performance numbers you can share?
>>>>
>>> Hi Jason,
>>>
>>> Performance testing and tuning on the data plane is in progress (btw,
>>> that wouldn't affect the device part patches).
>>> If possible, could we start the device part patch review in the meantime?
>>>
>>> Best,
>>> Wei
>>>
>> Hi Wei:
>>
>> Will do, but basically, the cover lacks of the motivation for vhost-pci and I want
>> to see some numbers first since I suspect it can over-perform exist data-path.
>>
>> Thanks
> [Avi Cohen (A)]
> Hi Wei
> I can try testing to get **numbers** - I can do it now , need a little help from you
> I've started with downloading/building and installing of
> driver: https://github.com/wei-w-wang/vhost-pci-driver to the kernel guest,
> **without** downloading the 2nd patch
> device: https://github.com/wei-w-wang/vhost-pci-device
> But my guest kernel was corrupted after reboot (kernel panic/out of mem ..) - can you tell me the steps to apply these patches ?
> Best Regards
> Avi

The kernel driver does have some bugs in some environments, so it might
be a good source for getting a feeling of how it works, but I wouldn't
recommend testing it yourself at this point. We are currently focusing
on the DPDK PMD, and won't get back to the kernel driver until
everything is merged. Sorry about that.
Best, Wei --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication 2017-12-05 3:33 [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication Wei Wang ` (7 preceding siblings ...) 2017-12-05 7:01 ` [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication Jason Wang @ 2017-12-05 14:30 ` Stefan Hajnoczi 2017-12-05 15:20 ` [virtio-dev] " Michael S. Tsirkin ` (3 subsequent siblings) 12 siblings, 0 replies; 77+ messages in thread From: Stefan Hajnoczi @ 2017-12-05 14:30 UTC (permalink / raw) To: Wei Wang Cc: virtio-dev, qemu-devel, mst, marcandre.lureau, jasowang, pbonzini, jan.kiszka, avi.cohen, zhiyong.yang [-- Attachment #1: Type: text/plain, Size: 1339 bytes --] On Tue, Dec 05, 2017 at 11:33:09AM +0800, Wei Wang wrote: > Vhost-pci is a point-to-point based inter-VM communication solution. This > patch series implements the vhost-pci-net device setup and emulation. The > device is implemented as a virtio device, and it is set up via the > vhost-user protocol to get the neessary info (e.g the memory info of the > remote VM, vring info). > > Currently, only the fundamental functions are implemented. More features, > such as MQ and live migration, will be updated in the future. > > The DPDK PMD of vhost-pci has been posted to the dpdk mailinglist here: > http://dpdk.org/ml/archives/dev/2017-November/082615.html Please be explicit about "vhost-pci" vs "vhost-pci-net" in future emails. In this email you use "vhost-pci" to mean both things and it's likely to cause confusion. For example, MQ and DPDK PMD are related to vhost-pci-net. This is a general problem with all of vhost-user, not just vhost-pci. People sometimes focus solely on virtio-net. It causes layering violations and protocol limitations because they are not thinking about the virtio device model as a whole. Eventually virtio-gpu, virtio-scsi, etc should all work over vhost-pci so it's important to keep the virtio architecture in mind with a separation between the transport and devices. 
[-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 455 bytes --] ^ permalink raw reply [flat|nested] 77+ messages in thread
* [virtio-dev] Re: [PATCH v3 0/7] Vhost-pci for inter-VM communication 2017-12-05 3:33 [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication Wei Wang ` (8 preceding siblings ...) 2017-12-05 14:30 ` Stefan Hajnoczi @ 2017-12-05 15:20 ` Michael S. Tsirkin 2017-12-05 16:06 ` [virtio-dev] " Stefan Hajnoczi ` (2 subsequent siblings) 12 siblings, 0 replies; 77+ messages in thread From: Michael S. Tsirkin @ 2017-12-05 15:20 UTC (permalink / raw) To: Wei Wang Cc: virtio-dev, qemu-devel, marcandre.lureau, jasowang, stefanha, pbonzini, jan.kiszka, avi.cohen, zhiyong.yang On Tue, Dec 05, 2017 at 11:33:09AM +0800, Wei Wang wrote: > Vhost-pci is a point-to-point based inter-VM communication solution. This > patch series implements the vhost-pci-net device setup and emulation. The > device is implemented as a virtio device, and it is set up via the > vhost-user protocol to get the neessary info (e.g the memory info of the > remote VM, vring info). > > Currently, only the fundamental functions are implemented. More features, > such as MQ and live migration, will be updated in the future. > > The DPDK PMD of vhost-pci has been posted to the dpdk mailinglist here: > http://dpdk.org/ml/archives/dev/2017-November/082615.html > > v2->v3 changes: > 1) static device creation: instead of creating and hot-plugging the > device when receiving a vhost-user msg, the device is not created > via the qemu booting command line. > 2) remove vqs: rq and ctrlq are removed in this version. > - receive vq: the receive vq is not needed anymore. The PMD directly > shares the remote txq and rxq - grab from remote txq to > receive packets, and put to rxq to send packets. > - ctrlq: the ctrlq is replaced by the first 4KB metadata area of the > device Bar-2. > 3) simpler implementation: the entire implementation has been tailored > from ~1800 LOC to ~850 LOC. Pls always include full change log, so pls add v1->v2 as well. 
Also pls include the link to the spec you posted previously (not required to be 100% in sync, but pls list main differences). > Wei Wang (7): > vhost-user: share the vhost-user protocol related structures > vhost-pci-net: add vhost-pci-net > virtio/virtio-pci.c: add vhost-pci-net-pci > vhost-pci-slave: add vhost-pci slave implementation > vhost-user: VHOST_USER_SET_VHOST_PCI msg > vhost-pci-slave: handle VHOST_USER_SET_VHOST_PCI > virtio/vhost.c: vhost-pci needs remote gpa > > hw/net/Makefile.objs | 2 +- > hw/net/vhost_net.c | 37 +++ > hw/net/vhost_pci_net.c | 137 +++++++++++ > hw/virtio/Makefile.objs | 1 + > hw/virtio/vhost-pci-slave.c | 310 +++++++++++++++++++++++++ > hw/virtio/vhost-user.c | 117 ++-------- > hw/virtio/vhost.c | 63 +++-- > hw/virtio/virtio-pci.c | 55 +++++ > hw/virtio/virtio-pci.h | 14 ++ > include/hw/pci/pci.h | 1 + > include/hw/virtio/vhost-backend.h | 2 + > include/hw/virtio/vhost-pci-net.h | 42 ++++ > include/hw/virtio/vhost-pci-slave.h | 12 + > include/hw/virtio/vhost-user.h | 108 +++++++++ > include/hw/virtio/vhost.h | 2 + > include/net/vhost_net.h | 2 + > include/standard-headers/linux/vhost_pci_net.h | 65 ++++++ > include/standard-headers/linux/virtio_ids.h | 1 + > 18 files changed, 851 insertions(+), 120 deletions(-) > create mode 100644 hw/net/vhost_pci_net.c > create mode 100644 hw/virtio/vhost-pci-slave.c > create mode 100644 include/hw/virtio/vhost-pci-net.h > create mode 100644 include/hw/virtio/vhost-pci-slave.h > create mode 100644 include/hw/virtio/vhost-user.h > create mode 100644 include/standard-headers/linux/vhost_pci_net.h > > -- > 2.7.4 --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication 2017-12-05 3:33 [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication Wei Wang ` (9 preceding siblings ...) 2017-12-05 15:20 ` [virtio-dev] " Michael S. Tsirkin @ 2017-12-05 16:06 ` Stefan Hajnoczi 2017-12-06 13:49 ` Stefan Hajnoczi 2017-12-19 11:35 ` Stefan Hajnoczi 12 siblings, 0 replies; 77+ messages in thread From: Stefan Hajnoczi @ 2017-12-05 16:06 UTC (permalink / raw) To: Wei Wang Cc: virtio-dev, qemu-devel, mst, marcandre.lureau, jasowang, pbonzini, jan.kiszka, avi.cohen, zhiyong.yang [-- Attachment #1: Type: text/plain, Size: 1506 bytes --] On Tue, Dec 05, 2017 at 11:33:09AM +0800, Wei Wang wrote: > Vhost-pci is a point-to-point based inter-VM communication solution. This > patch series implements the vhost-pci-net device setup and emulation. The > device is implemented as a virtio device, and it is set up via the > vhost-user protocol to get the neessary info (e.g the memory info of the > remote VM, vring info). > > Currently, only the fundamental functions are implemented. More features, > such as MQ and live migration, will be updated in the future. > > The DPDK PMD of vhost-pci has been posted to the dpdk mailinglist here: > http://dpdk.org/ml/archives/dev/2017-November/082615.html > > v2->v3 changes: > 1) static device creation: instead of creating and hot-plugging the > device when receiving a vhost-user msg, the device is not created > via the qemu booting command line. > 2) remove vqs: rq and ctrlq are removed in this version. > - receive vq: the receive vq is not needed anymore. The PMD directly > shares the remote txq and rxq - grab from remote txq to > receive packets, and put to rxq to send packets. > - ctrlq: the ctrlq is replaced by the first 4KB metadata area of the > device Bar-2. > 3) simpler implementation: the entire implementation has been tailored > from ~1800 LOC to ~850 LOC. Please write qtest test cases for vhost-pci. 
See tests/virtio-net-test.c and tests/vhost-user-test.c for examples. Stefan [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 455 bytes --] ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication 2017-12-05 3:33 [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication Wei Wang ` (10 preceding siblings ...) 2017-12-05 16:06 ` [virtio-dev] " Stefan Hajnoczi @ 2017-12-06 13:49 ` Stefan Hajnoczi 2017-12-06 16:09 ` Wang, Wei W 2017-12-19 11:35 ` Stefan Hajnoczi 12 siblings, 1 reply; 77+ messages in thread From: Stefan Hajnoczi @ 2017-12-06 13:49 UTC (permalink / raw) To: Wei Wang Cc: virtio-dev, qemu-devel, mst, marcandre.lureau, jasowang, pbonzini, jan.kiszka, avi.cohen, zhiyong.yang [-- Attachment #1: Type: text/plain, Size: 6733 bytes --] On Tue, Dec 05, 2017 at 11:33:09AM +0800, Wei Wang wrote: > Vhost-pci is a point-to-point based inter-VM communication solution. This > patch series implements the vhost-pci-net device setup and emulation. The > device is implemented as a virtio device, and it is set up via the > vhost-user protocol to get the neessary info (e.g the memory info of the > remote VM, vring info). > > Currently, only the fundamental functions are implemented. More features, > such as MQ and live migration, will be updated in the future. > > The DPDK PMD of vhost-pci has been posted to the dpdk mailinglist here: > http://dpdk.org/ml/archives/dev/2017-November/082615.html I have asked questions about the scope of this feature. In particular, I think it's best to support all device types rather than just virtio-net. Here is a design document that shows how this can be achieved. What I'm proposing is different from the current approach: 1. It's a PCI adapter (see below for justification) 2. The vhost-user protocol is exposed by the device (not handled 100% in QEMU). Ultimately I think your approach would also need to do this. I'm not implementing this and not asking you to implement it. Let's just use this for discussion so we can figure out what the final vhost-pci will look like. Please let me know what you think, Wei, Michael, and others. 
---
vhost-pci device specification
-------------------------------
The vhost-pci device allows guests to act as vhost-user slaves.  This
enables appliance VMs like network switches or storage targets to back
devices in other VMs.  VM-to-VM communication is possible without
vmexits using polling mode drivers.

The vhost-user protocol has been used to implement virtio devices in
userspace processes on the host.  vhost-pci maps the vhost-user protocol
to a PCI adapter so guest software can perform virtio device emulation.
This is useful in environments where high-performance VM-to-VM
communication is necessary or where it is preferable to deploy device
emulation as VMs instead of host userspace processes.

The vhost-user protocol involves file descriptor passing and shared
memory.  This precludes vhost-user slave implementations over
virtio-vsock, virtio-serial, or TCP/IP.  Therefore a new device type is
needed to expose the vhost-user protocol to guests.

The vhost-pci PCI adapter has the following resources:

Queues (used for vhost-user protocol communication):
1. Master-to-slave messages
2. Slave-to-master messages

Doorbells (used for slave->guest/master events):
1. Vring call (one doorbell per virtqueue)
2. Vring err (one doorbell per virtqueue)
3. Log changed

Interrupts (used for guest->slave events):
1. Vring kick (one MSI per virtqueue)

Shared Memory BARs:
1. Guest memory
2. Log

Master-to-slave queue:
The following vhost-user protocol messages are relayed from the
vhost-user master.  Each message follows the vhost-user protocol
VhostUserMsg layout.

Messages that include file descriptor passing are relayed but do not
carry file descriptors.  The relevant resources (doorbells, interrupts,
or shared memory BARs) are initialized from the file descriptors prior
to the message becoming available on the Master-to-Slave queue.

Resources must only be used after the corresponding vhost-user message
has been received.  For example, the Vring call doorbell can only be
used after VHOST_USER_SET_VRING_CALL becomes available on the
Master-to-Slave queue.

Messages must be processed in order.

The following vhost-user protocol messages are relayed:
 * VHOST_USER_GET_FEATURES
 * VHOST_USER_SET_FEATURES
 * VHOST_USER_GET_PROTOCOL_FEATURES
 * VHOST_USER_SET_PROTOCOL_FEATURES
 * VHOST_USER_SET_OWNER
 * VHOST_USER_SET_MEM_TABLE
   The shared memory is available in the corresponding BAR.
 * VHOST_USER_SET_LOG_BASE
   The shared memory is available in the corresponding BAR.
 * VHOST_USER_SET_LOG_FD
   The logging file descriptor can be signalled through the logging
   virtqueue.
 * VHOST_USER_SET_VRING_NUM
 * VHOST_USER_SET_VRING_ADDR
 * VHOST_USER_SET_VRING_BASE
 * VHOST_USER_GET_VRING_BASE
 * VHOST_USER_SET_VRING_KICK
   This message is still needed because it may indicate only polling
   mode is supported.
 * VHOST_USER_SET_VRING_CALL
   This message is still needed because it may indicate only polling
   mode is supported.
 * VHOST_USER_SET_VRING_ERR
 * VHOST_USER_GET_QUEUE_NUM
 * VHOST_USER_SET_VRING_ENABLE
 * VHOST_USER_SEND_RARP
 * VHOST_USER_NET_SET_MTU
 * VHOST_USER_SET_SLAVE_REQ_FD
 * VHOST_USER_IOTLB_MSG
 * VHOST_USER_SET_VRING_ENDIAN

Slave-to-Master queue:
Messages added to the Slave-to-Master queue are sent to the vhost-user
master.  Each message follows the vhost-user protocol VhostUserMsg
layout.

The following vhost-user protocol messages are relayed:

 * VHOST_USER_SLAVE_IOTLB_MSG

Theory of Operation:
When the vhost-pci adapter is detected the queues must be set up by the
driver.  Once the driver is ready the vhost-pci device begins relaying
vhost-user protocol messages over the Master-to-Slave queue.  The driver
must follow the vhost-user protocol specification to implement virtio
device initialization and virtqueue processing.

Notes:
The vhost-user UNIX domain socket connects two host processes.  The
slave process interprets messages and initializes vhost-pci resources
(doorbells, interrupts, shared memory BARs) based on them before
relaying via the Master-to-Slave queue.  All messages are relayed, even
if they only pass a file descriptor, because the message itself may act
as a signal (e.g. virtqueue is now enabled).

vhost-pci is a PCI adapter instead of a virtio device to allow doorbells
and interrupts to be connected to the virtio device in the master VM in
the most efficient way possible.  This means the Vring call doorbell can
be an ioeventfd that signals an irqfd inside the host kernel without
host userspace involvement.  The Vring kick interrupt can be an irqfd
that is signalled by the master VM's virtqueue ioeventfd.

It may be possible to write a Linux vhost-pci driver that implements the
drivers/vhost/ API.  That way existing vhost drivers could work with
vhost-pci in the kernel.

Guest userspace vhost-pci drivers will be similar to QEMU's
contrib/libvhost-user/ except they will probably use vfio to access the
vhost-pci device directly from userspace.

TODO:
 * Queue memory layout and hardware registers
 * vhost-pci-level negotiation and configuration so the hardware
   interface can be extended in the future.
 * vhost-pci <-> driver initialization procedure
 * Master<->Slave disconnected & reconnect

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]
^ permalink raw reply [flat|nested] 77+ messages in thread
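Editorial sketch (not part of the posted specification): the rule in the
design text above, that a resource must only be used after the
corresponding vhost-user message has arrived on the Master-to-Slave
queue, could be modelled as a small bookkeeping layer in a guest driver.
All names below (sketch_msg, sketch_resource, sketch_slave_*) are
hypothetical and exist only for illustration. The enum values are not
the real vhost-user request codes, and only three of the relayed
messages are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical identifiers for this sketch only. They are NOT the numeric
 * request codes defined by the vhost-user protocol. */
enum sketch_msg {
    SKETCH_SET_MEM_TABLE,
    SKETCH_SET_VRING_KICK,
    SKETCH_SET_VRING_CALL,
    SKETCH_MSG_MAX
};

/* Hypothetical names for adapter resources listed in the design text. */
enum sketch_resource {
    SKETCH_RES_GUEST_MEMORY_BAR,
    SKETCH_RES_KICK_INTERRUPT,
    SKETCH_RES_CALL_DOORBELL
};

struct sketch_slave {
    uint32_t seen; /* bit i is set once message i has been processed */
};

static void sketch_slave_init(struct sketch_slave *s)
{
    assert(s != NULL);
    s->seen = 0;
}

/* Record one message taken off the Master-to-Slave queue.
 * Returns false for identifiers this sketch does not know. */
static bool sketch_slave_process(struct sketch_slave *s, enum sketch_msg m)
{
    assert(s != NULL);
    if ((unsigned)m >= (unsigned)SKETCH_MSG_MAX) {
        return false;
    }
    s->seen |= UINT32_C(1) << (unsigned)m;
    return true;
}

/* A resource is usable only after its corresponding message was processed. */
static bool sketch_resource_usable(const struct sketch_slave *s,
                                   enum sketch_resource r)
{
    enum sketch_msg needed;

    assert(s != NULL);
    switch (r) {
    case SKETCH_RES_GUEST_MEMORY_BAR:
        needed = SKETCH_SET_MEM_TABLE;
        break;
    case SKETCH_RES_KICK_INTERRUPT:
        needed = SKETCH_SET_VRING_KICK;
        break;
    case SKETCH_RES_CALL_DOORBELL:
        needed = SKETCH_SET_VRING_CALL;
        break;
    default:
        return false;
    }
    return ((s->seen >> (unsigned)needed) & 1u) != 0;
}
```

A driver built this way would call sketch_slave_process() for each
message in queue order and check sketch_resource_usable() before ringing
a doorbell or touching a BAR. A real implementation would also need
per-virtqueue state, which this sketch leaves out.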
* RE: [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication 2017-12-06 13:49 ` Stefan Hajnoczi @ 2017-12-06 16:09 ` Wang, Wei W [not found] ` <CAJSP0QWugAKQy6hYEJfy_XHEg-Q2swAzZMNcWBqn-r9Yi7yiEg@mail.gmail.com> 0 siblings, 1 reply; 77+ messages in thread From: Wang, Wei W @ 2017-12-06 16:09 UTC (permalink / raw) To: Stefan Hajnoczi Cc: virtio-dev@lists.oasis-open.org, qemu-devel@nongnu.org, mst@redhat.com, marcandre.lureau@redhat.com, jasowang@redhat.com, pbonzini@redhat.com, jan.kiszka@siemens.com, avi.cohen@huawei.com, Yang, Zhiyong On Wednesday, December 6, 2017 9:50 PM, Stefan Hajnoczi wrote: > On Tue, Dec 05, 2017 at 11:33:09AM +0800, Wei Wang wrote: > > Vhost-pci is a point-to-point based inter-VM communication solution. > > This patch series implements the vhost-pci-net device setup and > > emulation. The device is implemented as a virtio device, and it is set > > up via the vhost-user protocol to get the neessary info (e.g the > > memory info of the remote VM, vring info). > > > > Currently, only the fundamental functions are implemented. More > > features, such as MQ and live migration, will be updated in the future. > > > > The DPDK PMD of vhost-pci has been posted to the dpdk mailinglist here: > > http://dpdk.org/ml/archives/dev/2017-November/082615.html > > I have asked questions about the scope of this feature. In particular, I think > it's best to support all device types rather than just virtio-net. Here is a > design document that shows how this can be achieved. > > What I'm proposing is different from the current approach: > 1. It's a PCI adapter (see below for justification) 2. The vhost-user protocol is > exposed by the device (not handled 100% in > QEMU). Ultimately I think your approach would also need to do this. > > I'm not implementing this and not asking you to implement it. Let's just use > this for discussion so we can figure out what the final vhost-pci will look like. 
> > Please let me know what you think, Wei, Michael, and others. > Thanks for sharing the thoughts. If I understand it correctly, the key difference is that this approach tries to relay every vhost-user msg to the guest. I'm not sure about the benefits of doing this. To make data plane (i.e. driver to send/receive packets) work, I think, mostly, the memory info and vring info are enough. Other things like callfd, kickfd don't need to be sent to the guest, they are needed by QEMU only for the eventfd and irqfd setup. > --- > vhost-pci device specification > ------------------------------- > The vhost-pci device allows guests to act as vhost-user slaves. This enables > appliance VMs like network switches or storage targets to back devices in > other VMs. VM-to-VM communication is possible without vmexits using > polling mode drivers. > > The vhost-user protocol has been used to implement virtio devices in > userspace processes on the host. vhost-pci maps the vhost-user protocol to > a PCI adapter so guest software can perform virtio device emulation. > This is useful in environments where high-performance VM-to-VM > communication is necessary or where it is preferrable to deploy device > emulation as VMs instead of host userspace processes. > > The vhost-user protocol involves file descriptor passing and shared memory. > This precludes vhost-user slave implementations over virtio-vsock, virtio- > serial, or TCP/IP. Therefore a new device type is needed to expose the > vhost-user protocol to guests. > > The vhost-pci PCI adapter has the following resources: > > Queues (used for vhost-user protocol communication): > 1. Master-to-slave messages > 2. Slave-to-master messages > > Doorbells (used for slave->guest/master events): > 1. Vring call (one doorbell per virtqueue) 2. Vring err (one doorbell per > virtqueue) 3. Log changed > > Interrupts (used for guest->slave events): > 1. Vring kick (one MSI per virtqueue) > > Shared Memory BARs: > 1. Guest memory > 2. 
Log > > Master-to-slave queue: > The following vhost-user protocol messages are relayed from the vhost-user > master. Each message follows the vhost-user protocol VhostUserMsg layout. > > Messages that include file descriptor passing are relayed but do not carry file > descriptors. The relevant resources (doorbells, interrupts, or shared memory > BARs) are initialized from the file descriptors prior to the message becoming > available on the Master-to-Slave queue. > > Resources must only be used after the corresponding vhost-user message > has been received. For example, the Vring call doorbell can only be used > after VHOST_USER_SET_VRING_CALL becomes available on the Master-to- > Slave queue. > > Messages must be processed in order. > > The following vhost-user protocol messages are relayed: > * VHOST_USER_GET_FEATURES > * VHOST_USER_SET_FEATURES > * VHOST_USER_GET_PROTOCOL_FEATURES > * VHOST_USER_SET_PROTOCOL_FEATURES > * VHOST_USER_SET_OWNER > * VHOST_USER_SET_MEM_TABLE > The shared memory is available in the corresponding BAR. > * VHOST_USER_SET_LOG_BASE > The shared memory is available in the corresponding BAR. > * VHOST_USER_SET_LOG_FD > The logging file descriptor can be signalled through the logging > virtqueue. > * VHOST_USER_SET_VRING_NUM > * VHOST_USER_SET_VRING_ADDR > * VHOST_USER_SET_VRING_BASE > * VHOST_USER_GET_VRING_BASE > * VHOST_USER_SET_VRING_KICK > This message is still needed because it may indicate only polling > mode is supported. > * VHOST_USER_SET_VRING_CALL > This message is still needed because it may indicate only polling > mode is supported. > * VHOST_USER_SET_VRING_ERR > * VHOST_USER_GET_QUEUE_NUM > * VHOST_USER_SET_VRING_ENABLE > * VHOST_USER_SEND_RARP > * VHOST_USER_NET_SET_MTU > * VHOST_USER_SET_SLAVE_REQ_FD > * VHOST_USER_IOTLB_MSG > * VHOST_USER_SET_VRING_ENDIAN > > Slave-to-Master queue: > Messages added to the Slave-to-Master queue are sent to the vhost-user > master. 
Each message follows the vhost-user protocol VhostUserMsg layout. > > The following vhost-user protocol messages are relayed: > > * VHOST_USER_SLAVE_IOTLB_MSG > > Theory of Operation: > When the vhost-pci adapter is detected the queues must be set up by the > driver. Once the driver is ready the vhost-pci device begins relaying vhost- > user protocol messages over the Master-to-Slave queue. The driver must > follow the vhost-user protocol specification to implement virtio device > initialization and virtqueue processing. > > Notes: > The vhost-user UNIX domain socket connects two host processes. The slave > process interprets messages and initializes vhost-pci resources (doorbells, > interrupts, shared memory BARs) based on them before relaying via the > Master-to-Slave queue. All messages are relayed, even if they only pass a > file descriptor, because the message itself may act as a signal (e.g. virtqueue > is now enabled). > > vhost-pci is a PCI adapter instead of a virtio device to allow doorbells and > interrupts to be connected to the virtio device in the master VM in the most > efficient way possible. This means the Vring call doorbell can be an > ioeventfd that signals an irqfd inside the host kernel without host userspace > involvement. The Vring kick interrupt can be an irqfd that is signalled by the > master VM's virtqueue ioeventfd. > This looks the same as the implementation of inter-VM notification in v2: https://www.mail-archive.com/qemu-devel@nongnu.org/msg450005.html which is fig. 4 here: https://github.com/wei-w-wang/vhost-pci-discussion/blob/master/vhost-pci-rfc2.0.pdf When the vhost-pci driver kicks its tx, the host signals the irqfd of virtio-net's rx. 
I think this has already bypassed the host userspace (thanks to the fast mmio implementation) Best, Wei --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 77+ messages in thread
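Editorial sketch (not from the patch series): both ioeventfd and irqfd
in KVM are built on the Linux eventfd primitive, so the kick/notify path
discussed above has eventfd counter semantics. The Linux-only userspace
example below shows just that primitive: several "kicks" written before
the consumer runs are coalesced into a single read. It does not perform
any KVM wiring, and the function name sketch_kick_and_drain is
hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Signal an eventfd `kicks` times, then drain it with a single read.
 * Returns the drained counter value, 0 if nothing was pending,
 * or -1 on error. */
static int64_t sketch_kick_and_drain(unsigned kicks)
{
    uint64_t one = 1;
    uint64_t count = 0;
    ssize_t n;
    unsigned i;
    int fd = eventfd(0, EFD_NONBLOCK);

    if (fd < 0) {
        return -1;
    }
    for (i = 0; i < kicks; i++) {
        /* Each write adds 1 to the kernel-side 64-bit counter. */
        if (write(fd, &one, sizeof(one)) != (ssize_t)sizeof(one)) {
            close(fd);
            return -1;
        }
    }
    /* One read returns the accumulated count and resets it to zero. */
    n = read(fd, &count, sizeof(count));
    if (n < 0 && errno == EAGAIN) {
        close(fd);
        return 0; /* counter was zero: no kick pending */
    }
    close(fd);
    if (n != (ssize_t)sizeof(count)) {
        return -1;
    }
    assert(count == kicks);
    return (int64_t)count;
}
```

For example, sketch_kick_and_drain(3) drains a count of 3 with one read,
which is why a polling or interrupt-driven consumer sees one
notification for a burst of kicks.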
[parent not found: <CAJSP0QWugAKQy6hYEJfy_XHEg-Q2swAzZMNcWBqn-r9Yi7yiEg@mail.gmail.com>]
* [virtio-dev] Re: [Qemu-devel] [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication [not found] ` <CAJSP0QWugAKQy6hYEJfy_XHEg-Q2swAzZMNcWBqn-r9Yi7yiEg@mail.gmail.com> @ 2017-12-07 3:57 ` Wei Wang 2017-12-07 5:11 ` Michael S. Tsirkin [not found] ` <CAJSP0QUxRv9LNb1+McYxV0KY4Ss3NkaSjwO6fXiJd+oU2+zJSQ@mail.gmail.com> 0 siblings, 2 replies; 77+ messages in thread From: Wei Wang @ 2017-12-07 3:57 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Stefan Hajnoczi, virtio-dev@lists.oasis-open.org, mst@redhat.com, Yang, Zhiyong, jan.kiszka@siemens.com, jasowang@redhat.com, avi.cohen@huawei.com, qemu-devel@nongnu.org, marcandre.lureau@redhat.com, pbonzini@redhat.com On 12/07/2017 12:27 AM, Stefan Hajnoczi wrote: > On Wed, Dec 6, 2017 at 4:09 PM, Wang, Wei W <wei.w.wang@intel.com> wrote: >> On Wednesday, December 6, 2017 9:50 PM, Stefan Hajnoczi wrote: >>> On Tue, Dec 05, 2017 at 11:33:09AM +0800, Wei Wang wrote: >>>> Vhost-pci is a point-to-point based inter-VM communication solution. >>>> This patch series implements the vhost-pci-net device setup and >>>> emulation. The device is implemented as a virtio device, and it is set >>>> up via the vhost-user protocol to get the neessary info (e.g the >>>> memory info of the remote VM, vring info). >>>> >>>> Currently, only the fundamental functions are implemented. More >>>> features, such as MQ and live migration, will be updated in the future. >>>> >>>> The DPDK PMD of vhost-pci has been posted to the dpdk mailinglist here: >>>> http://dpdk.org/ml/archives/dev/2017-November/082615.html >>> I have asked questions about the scope of this feature. In particular, I think >>> it's best to support all device types rather than just virtio-net. Here is a >>> design document that shows how this can be achieved. >>> >>> What I'm proposing is different from the current approach: >>> 1. It's a PCI adapter (see below for justification) 2. The vhost-user protocol is >>> exposed by the device (not handled 100% in >>> QEMU). 
>>> Ultimately I think your approach would also need to do this.
>>>
>>> I'm not implementing this and not asking you to implement it. Let's just use
>>> this for discussion so we can figure out what the final vhost-pci will look like.
>>>
>>> Please let me know what you think, Wei, Michael, and others.
>>>
>> Thanks for sharing the thoughts. If I understand it correctly, the key difference is that this approach tries to relay every vhost-user msg to the guest. I'm not sure about the benefits of doing this.
>> To make data plane (i.e. driver to send/receive packets) work, I think, mostly, the memory info and vring info are enough. Other things like callfd, kickfd don't need to be sent to the guest, they are needed by QEMU only for the eventfd and irqfd setup.
> Handling the vhost-user protocol inside QEMU and exposing a different
> interface to the guest makes the interface device-specific. This will
> cause extra work to support new devices (vhost-user-scsi,
> vhost-user-blk). It also makes development harder because you might
> have to learn 3 separate specifications to debug the system (virtio,
> vhost-user, vhost-pci-net).
>
> If vhost-user is mapped to a PCI device then these issues are solved.

I tend to have a different opinion about this:

1) Even when relaying the msgs to the guest, QEMU still needs to handle the msg first. For example, it needs to decode the msg to see if it is one of those (e.g. SET_MEM_TABLE, SET_VRING_KICK, SET_VRING_CALL) that should be used for the device setup (e.g. mmap the memory given via SET_MEM_TABLE). In this case, we are likely to end up with 2 slave handlers - one in the guest, another in the QEMU device.

2) If people already understand the vhost-user protocol, it would be natural for them to understand the vhost-pci metadata - the obtained memory and vring info are simply put into the metadata area (nothing new).
Inspired by your sharing, how about the following:
we can factor out a common vhost-pci layer, which handles all the features that are common to all the vhost-pci series of devices (vhost-pci-net, vhost-pci-blk, ...).
Coming to the implementation, we can have a VhostpciDeviceClass (similar to VirtioDeviceClass); the device realize sequence will be
virtio_device_realize()-->vhost_pci_device_realize()-->vhost_pci_net_device_realize()

>
>>> vhost-pci is a PCI adapter instead of a virtio device to allow doorbells and
>>> interrupts to be connected to the virtio device in the master VM in the most
>>> efficient way possible. This means the Vring call doorbell can be an
>>> ioeventfd that signals an irqfd inside the host kernel without host userspace
>>> involvement. The Vring kick interrupt can be an irqfd that is signalled by the
>>> master VM's virtqueue ioeventfd.
>>>
>>
>> This looks the same as the implementation of inter-VM notification in v2:
>> https://www.mail-archive.com/qemu-devel@nongnu.org/msg450005.html
>> which is fig. 4 here: https://github.com/wei-w-wang/vhost-pci-discussion/blob/master/vhost-pci-rfc2.0.pdf
>>
>> When the vhost-pci driver kicks its tx, the host signals the irqfd of virtio-net's rx. I think this has already bypassed the host userspace (thanks to the fast mmio implementation)
> Yes, I think the irqfd <-> ioeventfd mapping is good. Perhaps it even
> makes sense to implement a special fused_irq_ioevent_fd in the host
> kernel to bypass the need for a kernel thread to read the eventfd so
> that an interrupt can be injected (i.e. to make the operation
> synchronous).
>
> Is the tx virtqueue in your inter-VM notification v2 series a real
> virtqueue that gets used? Or is it just a dummy virtqueue that you're
> using for the ioeventfd doorbell? It looks like vpnet_handle_vq() is
> empty so it's really just a dummy. The actual virtqueue is in the
> vhost-user master guest memory.
Yes, that tx is actually a dummy, created just to use its doorbell.
Currently, with virtio_device, I think ioeventfd comes with virtqueue only.
Actually, I think we could have these issues solved by vhost-pci. For example, reserve a piece of the BAR area for ioeventfd. The BAR layout can be:

BAR 2:
0~4k: vhost-pci device specific usages (ioeventfd etc)
4k~8k: metadata (memory info and vring info)
8k~64GB: remote guest memory
(we can make the BAR size configurable via the qemu cmdline; 64GB is the default value used)

Best,
Wei

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
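The BAR 2 layout proposed in the message above can be sketched in C as a set of offsets plus a helper that tells which region a given BAR offset falls into. This is only an illustration under stated assumptions: the 0~4k / 4k~8k / 8k~64GB boundaries come from the message, while every identifier (VPCI_BAR2_*, vpci_bar2_classify(), vpci_bar2_remote_mem_size()) is hypothetical and does not appear in the patch series.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical names. Only the region boundaries are taken from the
 * layout proposed in the thread; nothing here is from the actual patches.
 */
#define VPCI_BAR2_DEV_OFFSET    0x0000ULL       /* 0~4k: device specific (ioeventfd etc) */
#define VPCI_BAR2_META_OFFSET   0x1000ULL       /* 4k~8k: metadata (memory and vring info) */
#define VPCI_BAR2_MEM_OFFSET    0x2000ULL       /* 8k~end: remote guest memory */
#define VPCI_BAR2_DEFAULT_SIZE  (64ULL << 30)   /* 64GB default, assumed configurable */

enum vpci_bar2_region {
    VPCI_REGION_DEV,
    VPCI_REGION_META,
    VPCI_REGION_REMOTE_MEM,
    VPCI_REGION_INVALID,
};

/* Map an offset inside BAR 2 to the region it belongs to. */
static enum vpci_bar2_region vpci_bar2_classify(uint64_t offset, uint64_t bar_size)
{
    if (offset >= bar_size) {
        return VPCI_REGION_INVALID;
    }
    if (offset < VPCI_BAR2_META_OFFSET) {
        return VPCI_REGION_DEV;
    }
    if (offset < VPCI_BAR2_MEM_OFFSET) {
        return VPCI_REGION_META;
    }
    return VPCI_REGION_REMOTE_MEM;
}

/* Bytes of remote guest memory that fit behind the two 4k header pages. */
static uint64_t vpci_bar2_remote_mem_size(uint64_t bar_size)
{
    assert(bar_size >= VPCI_BAR2_MEM_OFFSET);
    return bar_size - VPCI_BAR2_MEM_OFFSET;
}
```

With the default 64GB BAR, the remote guest memory window is 64GB minus the two 4k pages, so a remote VM with exactly 64GB of RAM would not fit without a larger BAR size.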
* [virtio-dev] Re: [Qemu-devel] [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication 2017-12-07 3:57 ` [virtio-dev] Re: [Qemu-devel] " Wei Wang @ 2017-12-07 5:11 ` Michael S. Tsirkin 2017-12-07 5:34 ` Wei Wang [not found] ` <CAJSP0QUxRv9LNb1+McYxV0KY4Ss3NkaSjwO6fXiJd+oU2+zJSQ@mail.gmail.com> 1 sibling, 1 reply; 77+ messages in thread From: Michael S. Tsirkin @ 2017-12-07 5:11 UTC (permalink / raw) To: Wei Wang Cc: Stefan Hajnoczi, Stefan Hajnoczi, virtio-dev@lists.oasis-open.org, Yang, Zhiyong, jan.kiszka@siemens.com, jasowang@redhat.com, avi.cohen@huawei.com, qemu-devel@nongnu.org, marcandre.lureau@redhat.com, pbonzini@redhat.com On Thu, Dec 07, 2017 at 11:57:33AM +0800, Wei Wang wrote: > On 12/07/2017 12:27 AM, Stefan Hajnoczi wrote: > > On Wed, Dec 6, 2017 at 4:09 PM, Wang, Wei W <wei.w.wang@intel.com> wrote: > > > On Wednesday, December 6, 2017 9:50 PM, Stefan Hajnoczi wrote: > > > > On Tue, Dec 05, 2017 at 11:33:09AM +0800, Wei Wang wrote: > > > > > Vhost-pci is a point-to-point based inter-VM communication solution. > > > > > This patch series implements the vhost-pci-net device setup and > > > > > emulation. The device is implemented as a virtio device, and it is set > > > > > up via the vhost-user protocol to get the neessary info (e.g the > > > > > memory info of the remote VM, vring info). > > > > > > > > > > Currently, only the fundamental functions are implemented. More > > > > > features, such as MQ and live migration, will be updated in the future. > > > > > > > > > > The DPDK PMD of vhost-pci has been posted to the dpdk mailinglist here: > > > > > http://dpdk.org/ml/archives/dev/2017-November/082615.html > > > > I have asked questions about the scope of this feature. In particular, I think > > > > it's best to support all device types rather than just virtio-net. Here is a > > > > design document that shows how this can be achieved. > > > > > > > > What I'm proposing is different from the current approach: > > > > 1. 
It's a PCI adapter (see below for justification) 2. The vhost-user protocol is > > > > exposed by the device (not handled 100% in > > > > QEMU). Ultimately I think your approach would also need to do this. > > > > > > > > I'm not implementing this and not asking you to implement it. Let's just use > > > > this for discussion so we can figure out what the final vhost-pci will look like. > > > > > > > > Please let me know what you think, Wei, Michael, and others. > > > > > > > Thanks for sharing the thoughts. If I understand it correctly, the key difference is that this approach tries to relay every vhost-user msg to the guest. I'm not sure about the benefits of doing this. > > > To make data plane (i.e. driver to send/receive packets) work, I think, mostly, the memory info and vring info are enough. Other things like callfd, kickfd don't need to be sent to the guest, they are needed by QEMU only for the eventfd and irqfd setup. > > Handling the vhost-user protocol inside QEMU and exposing a different > > interface to the guest makes the interface device-specific. This will > > cause extra work to support new devices (vhost-user-scsi, > > vhost-user-blk). It also makes development harder because you might > > have to learn 3 separate specifications to debug the system (virtio, > > vhost-user, vhost-pci-net). > > > > If vhost-user is mapped to a PCI device then these issues are solved. > > I intend to have a different opinion about this: > > 1) Even relaying the msgs to the guest, QEMU still need to handle the msg > first, for example, it needs to decode the msg to see if it is the ones > (e.g. SET_MEM_TABLE, SET_VRING_KICK, SET_VRING_CALL) that should be used for > the device setup (e.g. mmap the memory given via SET_MEM_TABLE). In this > case, we will be likely to have 2 slave handlers - one in the guest, another > in QEMU device. 
> > 2) If people already understand the vhost-user protocol, it would be natural > for them to understand the vhost-pci metadata - just the obtained memory and > vring info are put to the metadata area (no new things). I see a bigger problem with passthrough. If qemu can't fully decode all messages, it can not operate in a disconected mode - guest will have to stop on disconnect until we re-connect a backend. > > Inspired from your sharing, how about the following: > we can actually factor out a common vhost-pci layer, which handles all the > features that are common to all the vhost-pci series of devices > (vhost-pci-net, vhost-pci-blk,...) > Coming to the implementation, we can have a VhostpciDeviceClass (similar to > VirtioDeviceClass), the device realize sequence will be virtio_device_realize()-->vhost_pci_device_realize()-->vhost_pci_net_device_realize() > > > > > > > > > vhost-pci is a PCI adapter instead of a virtio device to allow doorbells and > > > > interrupts to be connected to the virtio device in the master VM in the most > > > > efficient way possible. This means the Vring call doorbell can be an > > > > ioeventfd that signals an irqfd inside the host kernel without host userspace > > > > involvement. The Vring kick interrupt can be an irqfd that is signalled by the > > > > master VM's virtqueue ioeventfd. > > > > > > > > > > This looks the same as the implementation of inter-VM notification in v2: > > > https://www.mail-archive.com/qemu-devel@nongnu.org/msg450005.html > > > which is fig. 4 here: https://github.com/wei-w-wang/vhost-pci-discussion/blob/master/vhost-pci-rfc2.0.pdf > > > > > > When the vhost-pci driver kicks its tx, the host signals the irqfd of virtio-net's rx. I think this has already bypassed the host userspace (thanks to the fast mmio implementation) > > Yes, I think the irqfd <-> ioeventfd mapping is good. 
Perhaps it even > > makes sense to implement a special fused_irq_ioevent_fd in the host > > kernel to bypass the need for a kernel thread to read the eventfd so > > that an interrupt can be injected (i.e. to make the operation > > synchronous). > > > > Is the tx virtqueue in your inter-VM notification v2 series a real > > virtqueue that gets used? Or is it just a dummy virtqueue that you're > > using for the ioeventfd doorbell? It looks like vpnet_handle_vq() is > > empty so it's really just a dummy. The actual virtqueue is in the > > vhost-user master guest memory. > > > Yes, that tx is a dummy actually, just created to use its doorbell. > Currently, with virtio_device, I think ioeventfd comes with virtqueue only. > Actually, I think we could have the issues solved by vhost-pci. For example, > reserve a piece of the BAR area for ioeventfd. The bar layout can be: > BAR 2: > 0~4k: vhost-pci device specific usages (ioeventfd etc) > 4k~8k: metadata (memory info and vring info) > 8k~64GB: remote guest memory > (we can make the bar size (64GB is the default value used) configurable via > qemu cmdline) > > > Best, > Wei > > > > > --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 77+ messages in thread
* [virtio-dev] Re: [Qemu-devel] [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication 2017-12-07 5:11 ` Michael S. Tsirkin @ 2017-12-07 5:34 ` Wei Wang 0 siblings, 0 replies; 77+ messages in thread From: Wei Wang @ 2017-12-07 5:34 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Stefan Hajnoczi, Stefan Hajnoczi, virtio-dev@lists.oasis-open.org, Yang, Zhiyong, jan.kiszka@siemens.com, jasowang@redhat.com, avi.cohen@huawei.com, qemu-devel@nongnu.org, marcandre.lureau@redhat.com, pbonzini@redhat.com On 12/07/2017 01:11 PM, Michael S. Tsirkin wrote: > On Thu, Dec 07, 2017 at 11:57:33AM +0800, Wei Wang wrote: >> On 12/07/2017 12:27 AM, Stefan Hajnoczi wrote: >>> On Wed, Dec 6, 2017 at 4:09 PM, Wang, Wei W <wei.w.wang@intel.com> wrote: >>>> On Wednesday, December 6, 2017 9:50 PM, Stefan Hajnoczi wrote: >>>>> On Tue, Dec 05, 2017 at 11:33:09AM +0800, Wei Wang wrote: >>>>>> Vhost-pci is a point-to-point based inter-VM communication solution. >>>>>> This patch series implements the vhost-pci-net device setup and >>>>>> emulation. The device is implemented as a virtio device, and it is set >>>>>> up via the vhost-user protocol to get the neessary info (e.g the >>>>>> memory info of the remote VM, vring info). >>>>>> >>>>>> Currently, only the fundamental functions are implemented. More >>>>>> features, such as MQ and live migration, will be updated in the future. >>>>>> >>>>>> The DPDK PMD of vhost-pci has been posted to the dpdk mailinglist here: >>>>>> http://dpdk.org/ml/archives/dev/2017-November/082615.html >>>>> I have asked questions about the scope of this feature. In particular, I think >>>>> it's best to support all device types rather than just virtio-net. Here is a >>>>> design document that shows how this can be achieved. >>>>> >>>>> What I'm proposing is different from the current approach: >>>>> 1. It's a PCI adapter (see below for justification) 2. The vhost-user protocol is >>>>> exposed by the device (not handled 100% in >>>>> QEMU). 
Ultimately I think your approach would also need to do this. >>>>> >>>>> I'm not implementing this and not asking you to implement it. Let's just use >>>>> this for discussion so we can figure out what the final vhost-pci will look like. >>>>> >>>>> Please let me know what you think, Wei, Michael, and others. >>>>> >>>> Thanks for sharing the thoughts. If I understand it correctly, the key difference is that this approach tries to relay every vhost-user msg to the guest. I'm not sure about the benefits of doing this. >>>> To make data plane (i.e. driver to send/receive packets) work, I think, mostly, the memory info and vring info are enough. Other things like callfd, kickfd don't need to be sent to the guest, they are needed by QEMU only for the eventfd and irqfd setup. >>> Handling the vhost-user protocol inside QEMU and exposing a different >>> interface to the guest makes the interface device-specific. This will >>> cause extra work to support new devices (vhost-user-scsi, >>> vhost-user-blk). It also makes development harder because you might >>> have to learn 3 separate specifications to debug the system (virtio, >>> vhost-user, vhost-pci-net). >>> >>> If vhost-user is mapped to a PCI device then these issues are solved. >> I intend to have a different opinion about this: >> >> 1) Even relaying the msgs to the guest, QEMU still need to handle the msg >> first, for example, it needs to decode the msg to see if it is the ones >> (e.g. SET_MEM_TABLE, SET_VRING_KICK, SET_VRING_CALL) that should be used for >> the device setup (e.g. mmap the memory given via SET_MEM_TABLE). In this >> case, we will be likely to have 2 slave handlers - one in the guest, another >> in QEMU device. >> >> 2) If people already understand the vhost-user protocol, it would be natural >> for them to understand the vhost-pci metadata - just the obtained memory and >> vring info are put to the metadata area (no new things). > I see a bigger problem with passthrough. 
> If qemu can't fully decode all messages, it cannot operate in a
> disconnected mode - guest will have to stop on disconnect until we
> re-connect a backend.
>

What do you mean by "passthrough" in this case? Why can't QEMU fully decode all the messages? (Probably I haven't got the point.)

Best,
Wei
* [virtio-dev] RE: [Qemu-devel] [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication [not found] ` <CAJSP0QUxRv9LNb1+McYxV0KY4Ss3NkaSjwO6fXiJd+oU2+zJSQ@mail.gmail.com> @ 2017-12-07 7:54 ` Avi Cohen (A) [not found] ` <CAJSP0QVSOsPXYTyjCsBbUmzivjtYbC7xKpU2m7dQbAPMhrcLnA@mail.gmail.com> 2017-12-07 13:33 ` Michael S. Tsirkin 2017-12-07 9:02 ` Wei Wang 1 sibling, 2 replies; 77+ messages in thread From: Avi Cohen (A) @ 2017-12-07 7:54 UTC (permalink / raw) To: Stefan Hajnoczi, Wei Wang Cc: Stefan Hajnoczi, virtio-dev@lists.oasis-open.org, mst@redhat.com, Yang, Zhiyong, jan.kiszka@siemens.com, jasowang@redhat.com, qemu-devel@nongnu.org, marcandre.lureau@redhat.com, pbonzini@redhat.com There is already a virtio mechanism in which 2 VMs assigned a virtio device , are communicating via a veth pair in the host . KVM just passes a pointer of the page of the writer VM to the reader VM - resulting in excellent performance (no vSwitch in the middle) **Question**: What is the advantage of vhost-pci compared to this ? Best Regards Avi > -----Original Message----- > From: Stefan Hajnoczi [mailto:stefanha@gmail.com] > Sent: Thursday, 07 December, 2017 8:31 AM > To: Wei Wang > Cc: Stefan Hajnoczi; virtio-dev@lists.oasis-open.org; mst@redhat.com; Yang, > Zhiyong; jan.kiszka@siemens.com; jasowang@redhat.com; Avi Cohen (A); > qemu-devel@nongnu.org; marcandre.lureau@redhat.com; > pbonzini@redhat.com > Subject: Re: [Qemu-devel] [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM > communication > > On Thu, Dec 7, 2017 at 3:57 AM, Wei Wang <wei.w.wang@intel.com> wrote: > > On 12/07/2017 12:27 AM, Stefan Hajnoczi wrote: > >> > >> On Wed, Dec 6, 2017 at 4:09 PM, Wang, Wei W <wei.w.wang@intel.com> > wrote: > >>> > >>> On Wednesday, December 6, 2017 9:50 PM, Stefan Hajnoczi wrote: > >>>> > >>>> On Tue, Dec 05, 2017 at 11:33:09AM +0800, Wei Wang wrote: > >>>>> > >>>>> Vhost-pci is a point-to-point based inter-VM communication solution. 
> >>>>> This patch series implements the vhost-pci-net device setup and > >>>>> emulation. The device is implemented as a virtio device, and it is > >>>>> set up via the vhost-user protocol to get the neessary info (e.g > >>>>> the memory info of the remote VM, vring info). > >>>>> > >>>>> Currently, only the fundamental functions are implemented. More > >>>>> features, such as MQ and live migration, will be updated in the future. > >>>>> > >>>>> The DPDK PMD of vhost-pci has been posted to the dpdk mailinglist here: > >>>>> http://dpdk.org/ml/archives/dev/2017-November/082615.html > >>>> > >>>> I have asked questions about the scope of this feature. In > >>>> particular, I think it's best to support all device types rather > >>>> than just virtio-net. Here is a design document that shows how > >>>> this can be achieved. > >>>> > >>>> What I'm proposing is different from the current approach: > >>>> 1. It's a PCI adapter (see below for justification) 2. The > >>>> vhost-user protocol is exposed by the device (not handled 100% in > >>>> QEMU). Ultimately I think your approach would also need to do this. > >>>> > >>>> I'm not implementing this and not asking you to implement it. > >>>> Let's just use this for discussion so we can figure out what the > >>>> final vhost-pci will look like. > >>>> > >>>> Please let me know what you think, Wei, Michael, and others. > >>>> > >>> Thanks for sharing the thoughts. If I understand it correctly, the > >>> key difference is that this approach tries to relay every vhost-user > >>> msg to the guest. I'm not sure about the benefits of doing this. > >>> To make data plane (i.e. driver to send/receive packets) work, I > >>> think, mostly, the memory info and vring info are enough. Other > >>> things like callfd, kickfd don't need to be sent to the guest, they > >>> are needed by QEMU only for the eventfd and irqfd setup. 
> >> > >> Handling the vhost-user protocol inside QEMU and exposing a different > >> interface to the guest makes the interface device-specific. This > >> will cause extra work to support new devices (vhost-user-scsi, > >> vhost-user-blk). It also makes development harder because you might > >> have to learn 3 separate specifications to debug the system (virtio, > >> vhost-user, vhost-pci-net). > >> > >> If vhost-user is mapped to a PCI device then these issues are solved. > > > > > > I intend to have a different opinion about this: > > > > 1) Even relaying the msgs to the guest, QEMU still need to handle the > > msg first, for example, it needs to decode the msg to see if it is the > > ones (e.g. SET_MEM_TABLE, SET_VRING_KICK, SET_VRING_CALL) that should > > be used for the device setup (e.g. mmap the memory given via > > SET_MEM_TABLE). In this case, we will be likely to have 2 slave > > handlers - one in the guest, another in QEMU device. > > In theory the vhost-pci PCI adapter could decide not to relay certain messages. > As explained in the document, I think it's better to relay everything because > some messages that only carry an fd still have a meaning. They are a signal > that the master has entered a new state. > > The approach in this patch series doesn't really solve the 2 handler problem, it > still needs to notify the guest when certain vhost-user messages are received > from the master. The difference is just that it's non-trivial in this patch series > because each message is handled on a case-by-case basis and has a custom > interface (does not simply relay a vhost-user protocol message). > > A 1:1 model is simple and consistent. I think it will avoid bugs and design > mistakes. > > > 2) If people already understand the vhost-user protocol, it would be > > natural for them to understand the vhost-pci metadata - just the > > obtained memory and vring info are put to the metadata area (no new > things). > > This is debatable. 
It's like saying if you understand QEMU command-line > options you will understand libvirt domain XML. They map to each other but > how obvious that mapping is depends on the details. > I'm saying a 1:1 mapping (reusing the vhost-user protocol message > layout) is the cleanest option. > > > Inspired from your sharing, how about the following: > > we can actually factor out a common vhost-pci layer, which handles all > > the features that are common to all the vhost-pci series of devices > > (vhost-pci-net, vhost-pci-blk,...) Coming to the implementation, we > > can have a VhostpciDeviceClass (similar to VirtioDeviceClass), the > > device realize sequence will be > > virtio_device_realize()-->vhost_pci_device_realize()-->vhost_pci_net_d > > evice_realize() > > Why have individual device types (vhost-pci-net, vhost-pci-blk, etc) instead of > just a vhost-pci device? > > >>>> vhost-pci is a PCI adapter instead of a virtio device to allow > >>>> doorbells and interrupts to be connected to the virtio device in > >>>> the master VM in the most efficient way possible. This means the > >>>> Vring call doorbell can be an ioeventfd that signals an irqfd > >>>> inside the host kernel without host userspace involvement. The > >>>> Vring kick interrupt can be an irqfd that is signalled by the > >>>> master VM's virtqueue ioeventfd. > >>>> > >>> > >>> This looks the same as the implementation of inter-VM notification in v2: > >>> https://www.mail-archive.com/qemu-devel@nongnu.org/msg450005.html > >>> which is fig. 4 here: > >>> https://github.com/wei-w-wang/vhost-pci-discussion/blob/master/vhost > >>> -pci-rfc2.0.pdf > >>> > >>> When the vhost-pci driver kicks its tx, the host signals the irqfd > >>> of virtio-net's rx. I think this has already bypassed the host > >>> userspace (thanks to the fast mmio implementation) > >> > >> Yes, I think the irqfd <-> ioeventfd mapping is good. 
Perhaps it > >> even makes sense to implement a special fused_irq_ioevent_fd in the > >> host kernel to bypass the need for a kernel thread to read the > >> eventfd so that an interrupt can be injected (i.e. to make the > >> operation synchronous). > >> > >> Is the tx virtqueue in your inter-VM notification v2 series a real > >> virtqueue that gets used? Or is it just a dummy virtqueue that > >> you're using for the ioeventfd doorbell? It looks like > >> vpnet_handle_vq() is empty so it's really just a dummy. The actual > >> virtqueue is in the vhost-user master guest memory. > > > > > > > > Yes, that tx is a dummy actually, just created to use its doorbell. > > Currently, with virtio_device, I think ioeventfd comes with virtqueue only. > > Actually, I think we could have the issues solved by vhost-pci. For > > example, reserve a piece of the BAR area for ioeventfd. The bar layout can > be: > > BAR 2: > > 0~4k: vhost-pci device specific usages (ioeventfd etc) > > 4k~8k: metadata (memory info and vring info) > > 8k~64GB: remote guest memory > > (we can make the bar size (64GB is the default value used) > > configurable via qemu cmdline) > > Why use a virtio device? The doorbell and shared memory don't fit the virtio > architecture. There are no real virtqueues. This makes it a strange virtio > device. > > Stefan ^ permalink raw reply [flat|nested] 77+ messages in thread
* [virtio-dev] Re: [Qemu-devel] [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication
[not found] ` <CAJSP0QVSOsPXYTyjCsBbUmzivjtYbC7xKpU2m7dQbAPMhrcLnA@mail.gmail.com>
@ 2017-12-07 8:31 ` Jason Wang
2017-12-07 10:24 ` Stefan Hajnoczi
0 siblings, 1 reply; 77+ messages in thread
From: Jason Wang @ 2017-12-07 8:31 UTC (permalink / raw)
To: Stefan Hajnoczi, Avi Cohen (A)
Cc: Wei Wang, Stefan Hajnoczi, virtio-dev@lists.oasis-open.org, mst@redhat.com, Yang, Zhiyong, jan.kiszka@siemens.com, qemu-devel@nongnu.org, marcandre.lureau@redhat.com, pbonzini@redhat.com

On 2017-12-07 16:04, Stefan Hajnoczi wrote:
> On Thu, Dec 7, 2017 at 7:54 AM, Avi Cohen (A) <avi.cohen@huawei.com> wrote:
>> There is already a virtio mechanism in which 2 VMs assigned a virtio device, are communicating via a veth pair in the host.
>> KVM just passes a pointer of the page of the writer VM to the reader VM - resulting in excellent performance (no vSwitch in the middle)
>> **Question**: What is the advantage of vhost-pci compared to this?
> Which mechanism do you mean?
>
> vhost-pci will allow VM-to-VM communication without vmexits when
> polling mode is used. Does the mechanism you are thinking about
> require vmexits?
>
> Stefan

I guess what Avi means is probably veth tap support (RFC) here:

https://www.spinics.net/lists/netdev/msg454040.html

But in fact, we don't need veth at all: by using the rx handler trick, we can easily implement pair mode on TUN/TAP.

Thanks
* Re: [virtio-dev] Re: [Qemu-devel] [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication
2017-12-07 8:31 ` [virtio-dev] " Jason Wang
@ 2017-12-07 10:24 ` Stefan Hajnoczi
0 siblings, 0 replies; 77+ messages in thread
From: Stefan Hajnoczi @ 2017-12-07 10:24 UTC (permalink / raw)
To: Jason Wang
Cc: Stefan Hajnoczi, Avi Cohen (A), Wei Wang, virtio-dev@lists.oasis-open.org, mst@redhat.com, Yang, Zhiyong, jan.kiszka@siemens.com, qemu-devel@nongnu.org, marcandre.lureau@redhat.com, pbonzini@redhat.com

On Thu, Dec 07, 2017 at 04:31:27PM +0800, Jason Wang wrote:
>
>
> On 2017-12-07 16:04, Stefan Hajnoczi wrote:
> > On Thu, Dec 7, 2017 at 7:54 AM, Avi Cohen (A) <avi.cohen@huawei.com> wrote:
> > > There is already a virtio mechanism in which 2 VMs assigned a virtio device, are communicating via a veth pair in the host.
> > > KVM just passes a pointer of the page of the writer VM to the reader VM - resulting in excellent performance (no vSwitch in the middle)
> > > **Question**: What is the advantage of vhost-pci compared to this?
> > Which mechanism do you mean?
> >
> > vhost-pci will allow VM-to-VM communication without vmexits when
> > polling mode is used. Does the mechanism you are thinking about
> > require vmexits?
> >
> > Stefan
>
> I guess what Avi means is probably veth tap support (RFC) here:
>
> https://www.spinics.net/lists/netdev/msg454040.html
>
> But in fact, we don't need veth at all, by using rx handler trick, we can
> easily implement pair mode on TUN/TAP.

Okay, that is a different use case:

1. VM<->physical NIC: macvtap
2. VM<->host userspace: vhost-user
3. VM<->host/container: vethtap or pair mode
4. VM<->VM: vhost-pci

Stefan
* [virtio-dev] Re: [Qemu-devel] [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication
2017-12-07 7:54 ` [virtio-dev] " Avi Cohen (A)
[not found] ` <CAJSP0QVSOsPXYTyjCsBbUmzivjtYbC7xKpU2m7dQbAPMhrcLnA@mail.gmail.com>
@ 2017-12-07 13:33 ` Michael S. Tsirkin
1 sibling, 0 replies; 77+ messages in thread
From: Michael S. Tsirkin @ 2017-12-07 13:33 UTC (permalink / raw)
To: Avi Cohen (A)
Cc: Stefan Hajnoczi, Wei Wang, Stefan Hajnoczi, virtio-dev@lists.oasis-open.org, Yang, Zhiyong, jan.kiszka@siemens.com, jasowang@redhat.com, qemu-devel@nongnu.org, marcandre.lureau@redhat.com, pbonzini@redhat.com

On Thu, Dec 07, 2017 at 07:54:48AM +0000, Avi Cohen (A) wrote:
> There is already a virtio mechanism in which 2 VMs assigned a virtio device, are communicating via a veth pair in the host.
> KVM just passes a pointer of the page of the writer VM to the reader VM - resulting in excellent performance (no vSwitch in the middle)

What exactly do you refer to?

> **Question**: What is the advantage of vhost-pci compared to this?
> Best Regards
> Avi

I suspect what's missing in this thread is the original motivation rfc:
http://virtualization.linux-foundation.narkive.com/A7FkzAgp/rfc-vhost-user-enhancements-for-vm2vm-communication
* [virtio-dev] Re: [Qemu-devel] [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication [not found] ` <CAJSP0QUxRv9LNb1+McYxV0KY4Ss3NkaSjwO6fXiJd+oU2+zJSQ@mail.gmail.com> 2017-12-07 7:54 ` [virtio-dev] " Avi Cohen (A) @ 2017-12-07 9:02 ` Wei Wang [not found] ` <CAJSP0QURjdD8BnOmJo83fzJn_zCijSKQh==Pz+Xu4r6Q2i3SkQ@mail.gmail.com> 1 sibling, 1 reply; 77+ messages in thread From: Wei Wang @ 2017-12-07 9:02 UTC (permalink / raw) To: Stefan Hajnoczi Cc: virtio-dev@lists.oasis-open.org, mst@redhat.com, Yang, Zhiyong, jan.kiszka@siemens.com, jasowang@redhat.com, avi.cohen@huawei.com, qemu-devel@nongnu.org, Stefan Hajnoczi, pbonzini@redhat.com, marcandre.lureau@redhat.com On 12/07/2017 02:31 PM, Stefan Hajnoczi wrote: > On Thu, Dec 7, 2017 at 3:57 AM, Wei Wang <wei.w.wang@intel.com> wrote: >> On 12/07/2017 12:27 AM, Stefan Hajnoczi wrote: >>> On Wed, Dec 6, 2017 at 4:09 PM, Wang, Wei W <wei.w.wang@intel.com> wrote: >>>> On Wednesday, December 6, 2017 9:50 PM, Stefan Hajnoczi wrote: >>>>> On Tue, Dec 05, 2017 at 11:33:09AM +0800, Wei Wang wrote: >>>>>> Vhost-pci is a point-to-point based inter-VM communication solution. >>>>>> This patch series implements the vhost-pci-net device setup and >>>>>> emulation. The device is implemented as a virtio device, and it is set >>>>>> up via the vhost-user protocol to get the neessary info (e.g the >>>>>> memory info of the remote VM, vring info). >>>>>> >>>>>> Currently, only the fundamental functions are implemented. More >>>>>> features, such as MQ and live migration, will be updated in the future. >>>>>> >>>>>> The DPDK PMD of vhost-pci has been posted to the dpdk mailinglist here: >>>>>> http://dpdk.org/ml/archives/dev/2017-November/082615.html >>>>> I have asked questions about the scope of this feature. In particular, >>>>> I think >>>>> it's best to support all device types rather than just virtio-net. Here >>>>> is a >>>>> design document that shows how this can be achieved. 
>>>>> >>>>> What I'm proposing is different from the current approach: >>>>> 1. It's a PCI adapter (see below for justification) 2. The vhost-user >>>>> protocol is >>>>> exposed by the device (not handled 100% in >>>>> QEMU). Ultimately I think your approach would also need to do this. >>>>> >>>>> I'm not implementing this and not asking you to implement it. Let's >>>>> just use >>>>> this for discussion so we can figure out what the final vhost-pci will >>>>> look like. >>>>> >>>>> Please let me know what you think, Wei, Michael, and others. >>>>> >>>> Thanks for sharing the thoughts. If I understand it correctly, the key >>>> difference is that this approach tries to relay every vhost-user msg to the >>>> guest. I'm not sure about the benefits of doing this. >>>> To make data plane (i.e. driver to send/receive packets) work, I think, >>>> mostly, the memory info and vring info are enough. Other things like callfd, >>>> kickfd don't need to be sent to the guest, they are needed by QEMU only for >>>> the eventfd and irqfd setup. >>> Handling the vhost-user protocol inside QEMU and exposing a different >>> interface to the guest makes the interface device-specific. This will >>> cause extra work to support new devices (vhost-user-scsi, >>> vhost-user-blk). It also makes development harder because you might >>> have to learn 3 separate specifications to debug the system (virtio, >>> vhost-user, vhost-pci-net). >>> >>> If vhost-user is mapped to a PCI device then these issues are solved. >> >> I intend to have a different opinion about this: >> >> 1) Even relaying the msgs to the guest, QEMU still need to handle the msg >> first, for example, it needs to decode the msg to see if it is the ones >> (e.g. SET_MEM_TABLE, SET_VRING_KICK, SET_VRING_CALL) that should be used for >> the device setup (e.g. mmap the memory given via SET_MEM_TABLE). In this >> case, we will be likely to have 2 slave handlers - one in the guest, another >> in QEMU device. 
> In theory the vhost-pci PCI adapter could decide not to relay certain
> messages. As explained in the document, I think it's better to relay
> everything because some messages that only carry an fd still have a
> meaning.
They have a meaning that is useful for the device setup, but not for the guest. I think the point is that most of the msgs are not useful for the guest.
IMHO, the relay mechanism is useful when
1) the QEMU slave handler doesn't need to process the messages at all (it receives them and passes them directly on to the guest), and
2) most of the msgs are useful for the guest (say we have more than 20 msgs and only 2 or 3 of them are useful for the guest: why let the device pass all of them to the guest?)
Also, the relay mechanism complicates the vhost-user protocol interaction: normally it is only master<->QemuSlave. With the relay mechanism, it becomes master<->QemuSlave<->GuestSlave. For example, when the master sends VHOST_USER_GET_QUEUE_NUM, normally it can be answered by QemuSlave directly. Why complicate it by passing the msg to the GuestSlave and then getting the same answer from the GuestSlave?
> They are a signal that the master has entered a new state.
Actually vhost-user isn't state-machine based.
> Why have individual device types (vhost-pci-net, vhost-pci-blk, etc)
> instead of just a vhost-pci device?
This is the same as virtio: we don't have a single virtio device, we have virtio-net, virtio-blk etc. So, in the same way, we can have a common TYPE_VHOST_PCI_DEVICE parent device (like TYPE_VIRTIO_DEVICE), but net may have its own special features like MRG_RXBUF and its own config registers like mac[6] etc, so we can have TYPE_VHOST_PCI_NET under TYPE_VHOST_PCI_DEVICE.
>
>>>>> vhost-pci is a PCI adapter instead of a virtio device to allow doorbells and
>>>>> interrupts to be connected to the virtio device in the master VM in the most
>>>>> efficient way possible.
This means the Vring call doorbell can be an >>>>> ioeventfd that signals an irqfd inside the host kernel without host >>>>> userspace >>>>> involvement. The Vring kick interrupt can be an irqfd that is signalled >>>>> by the >>>>> master VM's virtqueue ioeventfd. >>>>> >>>> This looks the same as the implementation of inter-VM notification in v2: >>>> https://www.mail-archive.com/qemu-devel@nongnu.org/msg450005.html >>>> which is fig. 4 here: >>>> https://github.com/wei-w-wang/vhost-pci-discussion/blob/master/vhost-pci-rfc2.0.pdf >>>> >>>> When the vhost-pci driver kicks its tx, the host signals the irqfd of >>>> virtio-net's rx. I think this has already bypassed the host userspace >>>> (thanks to the fast mmio implementation) >>> Yes, I think the irqfd <-> ioeventfd mapping is good. Perhaps it even >>> makes sense to implement a special fused_irq_ioevent_fd in the host >>> kernel to bypass the need for a kernel thread to read the eventfd so >>> that an interrupt can be injected (i.e. to make the operation >>> synchronous). >>> >>> Is the tx virtqueue in your inter-VM notification v2 series a real >>> virtqueue that gets used? Or is it just a dummy virtqueue that you're >>> using for the ioeventfd doorbell? It looks like vpnet_handle_vq() is >>> empty so it's really just a dummy. The actual virtqueue is in the >>> vhost-user master guest memory. >> >> >> Yes, that tx is a dummy actually, just created to use its doorbell. >> Currently, with virtio_device, I think ioeventfd comes with virtqueue only. >> Actually, I think we could have the issues solved by vhost-pci. For example, >> reserve a piece of the BAR area for ioeventfd. The bar layout can be: >> BAR 2: >> 0~4k: vhost-pci device specific usages (ioeventfd etc) >> 4k~8k: metadata (memory info and vring info) >> 8k~64GB: remote guest memory >> (we can make the bar size (64GB is the default value used) configurable via >> qemu cmdline) > Why use a virtio device? 
> The doorbell and shared memory don't fit the
> virtio architecture. There are no real virtqueues. This makes it a
> strange virtio device.
The virtio spec doesn't seem to require the device to have at least one virtqueue. It doesn't make a huge difference to me whether it is a virtio device or a regular PCI device. We use it as a virtio device because it acts as a backend of virtio devices; I'm not sure whether it could be used by other devices (I guess virtio would be the main paravirtualized device here).
^ permalink raw reply [flat|nested] 77+ messages in thread
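For illustration only, the BAR 2 layout proposed in the quoted text above (0~4k device-specific, 4k~8k metadata, 8k~64GB remote guest memory) can be modelled as a small lookup. This is a sketch of the proposal as written in this thread, not code from the patch series: the names BAR2_SIZE, BAR2_REGIONS and bar2_region_for_offset are hypothetical, and the upper bound assumes the default 64GB BAR size mentioned above.

```python
# Model of the BAR 2 layout proposed in this thread (not code from the patches).
KIB = 1024
GIB = 1024 ** 3

# Default size from the thread; it was proposed to be configurable.
BAR2_SIZE = 64 * GIB

# (start, end, purpose) with an exclusive end. All names are hypothetical.
BAR2_REGIONS = [
    (0 * KIB, 4 * KIB, "device-specific (ioeventfd etc)"),
    (4 * KIB, 8 * KIB, "metadata (memory info and vring info)"),
    (8 * KIB, BAR2_SIZE, "remote guest memory"),
]


def bar2_region_for_offset(offset):
    """Return the purpose of the region that contains a BAR 2 offset."""
    for start, end, purpose in BAR2_REGIONS:
        if start <= offset < end:
            return purpose
    raise ValueError("offset %#x is outside BAR 2" % offset)
```

With this layout a driver would presumably find the doorbell area at offset 0, the memory and vring metadata at offset 4k, and the remote guest memory from offset 8k onwards.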
[parent not found: <CAJSP0QURjdD8BnOmJo83fzJn_zCijSKQh==Pz+Xu4r6Q2i3SkQ@mail.gmail.com>]
* [virtio-dev] Re: [Qemu-devel] [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication [not found] ` <CAJSP0QURjdD8BnOmJo83fzJn_zCijSKQh==Pz+Xu4r6Q2i3SkQ@mail.gmail.com> @ 2017-12-07 14:02 ` Michael S. Tsirkin [not found] ` <CAJSP0QVu4iwAu01Sth84VZshQde97x3FW1E1ua_YXVKs-65vhQ@mail.gmail.com> 0 siblings, 1 reply; 77+ messages in thread From: Michael S. Tsirkin @ 2017-12-07 14:02 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Wei Wang, virtio-dev@lists.oasis-open.org, Yang, Zhiyong, jan.kiszka@siemens.com, jasowang@redhat.com, avi.cohen@huawei.com, qemu-devel@nongnu.org, Stefan Hajnoczi, pbonzini@redhat.com, marcandre.lureau@redhat.com On Thu, Dec 07, 2017 at 01:08:04PM +0000, Stefan Hajnoczi wrote: > Instead of responding individually to these points, I hope this will > explain my perspective. Let me know if you do want individual > responses, I'm happy to talk more about the points above but I think > the biggest difference is our perspective on this: > > Existing vhost-user slave code should be able to run on top of > vhost-pci. For example, QEMU's > contrib/vhost-user-scsi/vhost-user-scsi.c should work inside the guest > with only minimal changes to the source file (i.e. today it explicitly > opens a UNIX domain socket and that should be done by libvhost-user > instead). It shouldn't be hard to add vhost-pci vfio support to > contrib/libvhost-user/ alongside the existing UNIX domain socket code. > > This seems pretty easy to achieve with the vhost-pci PCI adapter that > I've described but I'm not sure how to implement libvhost-user on top > of vhost-pci vfio if the device doesn't expose the vhost-user > protocol. > > I think this is a really important goal. Let's use a single > vhost-user software stack instead of creating a separate one for guest > code only. > > Do you agree that the vhost-user software stack should be shared > between host userspace and guest code as much as possible? 
The sharing you propose is not necessarily practical because the security goals
of the two are different.
It seems that the best presentation of the motivation is still the original rfc:
http://virtualization.linux-foundation.narkive.com/A7FkzAgp/rfc-vhost-user-enhancements-for-vm2vm-communication
So, compared with vhost-user, iotlb handling is different:
With vhost-user, the guest trusts the vhost-user backend on the host.
With vhost-pci, we can strive to limit the trust to qemu only.
The switch running within a VM does not have to be trusted.
--
MST
^ permalink raw reply [flat|nested] 77+ messages in thread
[parent not found: <CAJSP0QVu4iwAu01Sth84VZshQde97x3FW1E1ua_YXVKs-65vhQ@mail.gmail.com>]
* [virtio-dev] Re: [Qemu-devel] [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication [not found] ` <CAJSP0QVu4iwAu01Sth84VZshQde97x3FW1E1ua_YXVKs-65vhQ@mail.gmail.com> @ 2017-12-07 16:47 ` Michael S. Tsirkin [not found] ` <CAJSP0QVnukGD3Afu9myv=v5OjqrPDpXu6JL3Tpf+Cdk=em9V3w@mail.gmail.com> 0 siblings, 1 reply; 77+ messages in thread From: Michael S. Tsirkin @ 2017-12-07 16:47 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Wei Wang, virtio-dev@lists.oasis-open.org, Yang, Zhiyong, jan.kiszka@siemens.com, jasowang@redhat.com, avi.cohen@huawei.com, qemu-devel@nongnu.org, Stefan Hajnoczi, pbonzini@redhat.com, marcandre.lureau@redhat.com On Thu, Dec 07, 2017 at 04:29:45PM +0000, Stefan Hajnoczi wrote: > On Thu, Dec 7, 2017 at 2:02 PM, Michael S. Tsirkin <mst@redhat.com> wrote: > > On Thu, Dec 07, 2017 at 01:08:04PM +0000, Stefan Hajnoczi wrote: > >> Instead of responding individually to these points, I hope this will > >> explain my perspective. Let me know if you do want individual > >> responses, I'm happy to talk more about the points above but I think > >> the biggest difference is our perspective on this: > >> > >> Existing vhost-user slave code should be able to run on top of > >> vhost-pci. For example, QEMU's > >> contrib/vhost-user-scsi/vhost-user-scsi.c should work inside the guest > >> with only minimal changes to the source file (i.e. today it explicitly > >> opens a UNIX domain socket and that should be done by libvhost-user > >> instead). It shouldn't be hard to add vhost-pci vfio support to > >> contrib/libvhost-user/ alongside the existing UNIX domain socket code. > >> > >> This seems pretty easy to achieve with the vhost-pci PCI adapter that > >> I've described but I'm not sure how to implement libvhost-user on top > >> of vhost-pci vfio if the device doesn't expose the vhost-user > >> protocol. > >> > >> I think this is a really important goal. 
Let's use a single > >> vhost-user software stack instead of creating a separate one for guest > >> code only. > >> > >> Do you agree that the vhost-user software stack should be shared > >> between host userspace and guest code as much as possible? > > > > > > > > The sharing you propose is not necessarily practical because the security goals > > of the two are different. > > > > It seems that the best motivation presentation is still the original rfc > > > > http://virtualization.linux-foundation.narkive.com/A7FkzAgp/rfc-vhost-user-enhancements-for-vm2vm-communication > > > > So comparing with vhost-user iotlb handling is different: > > > > With vhost-user guest trusts the vhost-user backend on the host. > > > > With vhost-pci we can strive to limit the trust to qemu only. > > The switch running within a VM does not have to be trusted. > > Can you give a concrete example? > > I have an idea about what you're saying but it may be wrong: > > Today the iotlb mechanism in vhost-user does not actually enforce > memory permissions. The vhost-user slave has full access to mmapped > memory regions even when iotlb is enabled. Currently the iotlb just > adds an indirection layer but no real security. (Is this correct?) Not exactly. iotlb protects against malicious drivers within guest. But yes, not against a vhost-user driver on the host. > Are you saying the vhost-pci device code in QEMU should enforce iotlb > permissions so the vhost-user slave guest only has access to memory > regions that are allowed by the iotlb? Yes. > This is a weak way to enforce memory permissions. If the guest is > able to exploit a bug in QEMU then it has full memory access. That's par for the course though. We don't have many of these. If you assume qemu is insecure, using a theoretical kernel-based mechanism does not add much since kernel exploits are pretty common too :). > It's a > security problem waiting to happen It's better security than running the switch on host though. 
> and QEMU generally doesn't
> implement things this way.
Not sure what this means.
> A stronger solution is for the vhost-user master to control memory
> protection and to disallow the vhost-user slave from changing memory
> protection. I think the kernel mechanism to support this does not
> exist today. Such a mechanism would also make the vhost-user host
> userspace use case secure. The kernel mechanism to do this would
> definitely be useful outside of virtualization too.
>
> Stefan
In theory, maybe. But I'm not up to implementing this; it is very far from trivial. We can do a QEMU-based one and then add the kernel-based one on top when it surfaces.
Also, I forgot: this has some performance advantages too.
--
MST
^ permalink raw reply [flat|nested] 77+ messages in thread
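As a rough illustration of the point agreed above (the vhost-pci device code in QEMU enforcing iotlb permissions, so that the slave running in the guest only reaches memory the iotlb allows), here is a minimal Python model. It is a sketch under stated assumptions, not QEMU code: IotlbEntry, Iotlb and the PERM_* flags are hypothetical names, and in a real device the entries would presumably be populated from vhost-user iotlb messages.

```python
from dataclasses import dataclass

# Hypothetical permission flags for this sketch.
PERM_READ = 1
PERM_WRITE = 2


@dataclass
class IotlbEntry:
    iova: int   # start of the I/O virtual address range
    size: int   # length of the range in bytes
    perm: int   # bitwise OR of PERM_READ / PERM_WRITE


class Iotlb:
    """Toy iotlb: an access is allowed only if one entry covers it fully."""

    def __init__(self):
        self.entries = []

    def update(self, iova, size, perm):
        self.entries.append(IotlbEntry(iova, size, perm))

    def invalidate(self, iova, size):
        # Drop every entry that overlaps the invalidated range.
        self.entries = [
            e for e in self.entries
            if e.iova + e.size <= iova or iova + size <= e.iova
        ]

    def allowed(self, addr, length, perm):
        return any(
            e.iova <= addr
            and addr + length <= e.iova + e.size
            and (e.perm & perm) == perm
            for e in self.entries
        )
```

In this model the trust boundary is the code that calls allowed(): if that check lives in QEMU, a compromised switch in the guest cannot widen its own access, which is the property discussed above.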
[parent not found: <CAJSP0QVnukGD3Afu9myv=v5OjqrPDpXu6JL3Tpf+Cdk=em9V3w@mail.gmail.com>]
* [virtio-dev] Re: [Qemu-devel] [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication [not found] ` <CAJSP0QVnukGD3Afu9myv=v5OjqrPDpXu6JL3Tpf+Cdk=em9V3w@mail.gmail.com> @ 2017-12-07 17:38 ` Michael S. Tsirkin [not found] ` <CAJSP0QX4V64OoU4-Dhb93MUZ9Rz0FPR-La5Xq4_yqGH7SG6PjQ@mail.gmail.com> 0 siblings, 1 reply; 77+ messages in thread From: Michael S. Tsirkin @ 2017-12-07 17:38 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Wei Wang, virtio-dev@lists.oasis-open.org, Yang, Zhiyong, jan.kiszka@siemens.com, jasowang@redhat.com, avi.cohen@huawei.com, qemu-devel@nongnu.org, Stefan Hajnoczi, pbonzini@redhat.com, marcandre.lureau@redhat.com On Thu, Dec 07, 2017 at 05:29:14PM +0000, Stefan Hajnoczi wrote: > On Thu, Dec 7, 2017 at 4:47 PM, Michael S. Tsirkin <mst@redhat.com> wrote: > > On Thu, Dec 07, 2017 at 04:29:45PM +0000, Stefan Hajnoczi wrote: > >> On Thu, Dec 7, 2017 at 2:02 PM, Michael S. Tsirkin <mst@redhat.com> wrote: > >> > On Thu, Dec 07, 2017 at 01:08:04PM +0000, Stefan Hajnoczi wrote: > >> >> Instead of responding individually to these points, I hope this will > >> >> explain my perspective. Let me know if you do want individual > >> >> responses, I'm happy to talk more about the points above but I think > >> >> the biggest difference is our perspective on this: > >> >> > >> >> Existing vhost-user slave code should be able to run on top of > >> >> vhost-pci. For example, QEMU's > >> >> contrib/vhost-user-scsi/vhost-user-scsi.c should work inside the guest > >> >> with only minimal changes to the source file (i.e. today it explicitly > >> >> opens a UNIX domain socket and that should be done by libvhost-user > >> >> instead). It shouldn't be hard to add vhost-pci vfio support to > >> >> contrib/libvhost-user/ alongside the existing UNIX domain socket code. 
> >> >> > >> >> This seems pretty easy to achieve with the vhost-pci PCI adapter that > >> >> I've described but I'm not sure how to implement libvhost-user on top > >> >> of vhost-pci vfio if the device doesn't expose the vhost-user > >> >> protocol. > >> >> > >> >> I think this is a really important goal. Let's use a single > >> >> vhost-user software stack instead of creating a separate one for guest > >> >> code only. > >> >> > >> >> Do you agree that the vhost-user software stack should be shared > >> >> between host userspace and guest code as much as possible? > >> > > >> > > >> > > >> > The sharing you propose is not necessarily practical because the security goals > >> > of the two are different. > >> > > >> > It seems that the best motivation presentation is still the original rfc > >> > > >> > http://virtualization.linux-foundation.narkive.com/A7FkzAgp/rfc-vhost-user-enhancements-for-vm2vm-communication > >> > > >> > So comparing with vhost-user iotlb handling is different: > >> > > >> > With vhost-user guest trusts the vhost-user backend on the host. > >> > > >> > With vhost-pci we can strive to limit the trust to qemu only. > >> > The switch running within a VM does not have to be trusted. > >> > >> Can you give a concrete example? > >> > >> I have an idea about what you're saying but it may be wrong: > >> > >> Today the iotlb mechanism in vhost-user does not actually enforce > >> memory permissions. The vhost-user slave has full access to mmapped > >> memory regions even when iotlb is enabled. Currently the iotlb just > >> adds an indirection layer but no real security. (Is this correct?) > > > > Not exactly. iotlb protects against malicious drivers within guest. > > But yes, not against a vhost-user driver on the host. > > > >> Are you saying the vhost-pci device code in QEMU should enforce iotlb > >> permissions so the vhost-user slave guest only has access to memory > >> regions that are allowed by the iotlb? > > > > Yes. 
> > Okay, thanks for confirming. > > This can be supported by the approach I've described. The vhost-pci > QEMU code has control over the BAR memory so it can prevent the guest > from accessing regions that are not allowed by the iotlb. > > Inside the guest the vhost-user slave still has the memory region > descriptions and sends iotlb messages. This is completely compatible > with the libvirt-user APIs and existing vhost-user slave code can run > fine. The only unique thing is that guest accesses to memory regions > not allowed by the iotlb do not work because QEMU has prevented it. I don't think this can work since suddenly you need to map full IOMMU address space into BAR. Besides, this means implementing iotlb in both qemu and guest. > If better performance is needed then it might be possible to optimize > this interface by handling most or even all of the iotlb stuff in QEMU > vhost-pci code and not exposing it to the vhost-user slave in the > guest. But it doesn't change the fact that the vhost-user protocol > can be used and the same software stack works. For one, the iotlb part would be out of scope then. Instead you would have code to offset from BAR. > Do you have a concrete example of why sharing the same vhost-user > software stack inside the guest can't work? With enough dedication some code might be shared. OTOH reusing virtio gains you a ready feature negotiation and discovery protocol. I'm not convinced which has more value, and the second proposal has been implemented already. > >> and QEMU generally doesn't > >> implement things this way. > > > > Not sure what does this mean. > > It's the reason why virtio-9p has a separate virtfs-proxy-helper > program. Root is needed to set file uid/gids. Instead of running > QEMU as root, there is a separate helper process that handles the > privileged operations. It slows things down and makes the codebase > larger but it prevents the guest from getting root in case of QEMU > bugs. 
>
> The reason why VMs are considered more secure than containers is
> because of the extra level of isolation provided by running device
> emulation in an unprivileged userspace process. If you change this
> model then QEMU loses the "security in depth" advantage.
>
> Stefan
I don't see where vhost-pci needs QEMU to run as root though.
^ permalink raw reply [flat|nested] 77+ messages in thread
[parent not found: <CAJSP0QX4V64OoU4-Dhb93MUZ9Rz0FPR-La5Xq4_yqGH7SG6PjQ@mail.gmail.com>]
* [virtio-dev] Re: [Qemu-devel] [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication [not found] ` <CAJSP0QX4V64OoU4-Dhb93MUZ9Rz0FPR-La5Xq4_yqGH7SG6PjQ@mail.gmail.com> @ 2017-12-07 23:54 ` Michael S. Tsirkin 2017-12-08 6:43 ` Wei Wang [not found] ` <CAJSP0QXYMVBidUd5-NJb5FDYbc6wSkNYgdadjk8+NXvwosLMPw@mail.gmail.com> 0 siblings, 2 replies; 77+ messages in thread From: Michael S. Tsirkin @ 2017-12-07 23:54 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Wei Wang, virtio-dev@lists.oasis-open.org, Yang, Zhiyong, jan.kiszka@siemens.com, jasowang@redhat.com, avi.cohen@huawei.com, qemu-devel@nongnu.org, Stefan Hajnoczi, pbonzini@redhat.com, marcandre.lureau@redhat.com On Thu, Dec 07, 2017 at 06:28:19PM +0000, Stefan Hajnoczi wrote: > On Thu, Dec 7, 2017 at 5:38 PM, Michael S. Tsirkin <mst@redhat.com> wrote: > > On Thu, Dec 07, 2017 at 05:29:14PM +0000, Stefan Hajnoczi wrote: > >> On Thu, Dec 7, 2017 at 4:47 PM, Michael S. Tsirkin <mst@redhat.com> wrote: > >> > On Thu, Dec 07, 2017 at 04:29:45PM +0000, Stefan Hajnoczi wrote: > >> >> On Thu, Dec 7, 2017 at 2:02 PM, Michael S. Tsirkin <mst@redhat.com> wrote: > >> >> > On Thu, Dec 07, 2017 at 01:08:04PM +0000, Stefan Hajnoczi wrote: > >> >> >> Instead of responding individually to these points, I hope this will > >> >> >> explain my perspective. Let me know if you do want individual > >> >> >> responses, I'm happy to talk more about the points above but I think > >> >> >> the biggest difference is our perspective on this: > >> >> >> > >> >> >> Existing vhost-user slave code should be able to run on top of > >> >> >> vhost-pci. For example, QEMU's > >> >> >> contrib/vhost-user-scsi/vhost-user-scsi.c should work inside the guest > >> >> >> with only minimal changes to the source file (i.e. today it explicitly > >> >> >> opens a UNIX domain socket and that should be done by libvhost-user > >> >> >> instead). 
It shouldn't be hard to add vhost-pci vfio support to > >> >> >> contrib/libvhost-user/ alongside the existing UNIX domain socket code. > >> >> >> > >> >> >> This seems pretty easy to achieve with the vhost-pci PCI adapter that > >> >> >> I've described but I'm not sure how to implement libvhost-user on top > >> >> >> of vhost-pci vfio if the device doesn't expose the vhost-user > >> >> >> protocol. > >> >> >> > >> >> >> I think this is a really important goal. Let's use a single > >> >> >> vhost-user software stack instead of creating a separate one for guest > >> >> >> code only. > >> >> >> > >> >> >> Do you agree that the vhost-user software stack should be shared > >> >> >> between host userspace and guest code as much as possible? > >> >> > > >> >> > > >> >> > > >> >> > The sharing you propose is not necessarily practical because the security goals > >> >> > of the two are different. > >> >> > > >> >> > It seems that the best motivation presentation is still the original rfc > >> >> > > >> >> > http://virtualization.linux-foundation.narkive.com/A7FkzAgp/rfc-vhost-user-enhancements-for-vm2vm-communication > >> >> > > >> >> > So comparing with vhost-user iotlb handling is different: > >> >> > > >> >> > With vhost-user guest trusts the vhost-user backend on the host. > >> >> > > >> >> > With vhost-pci we can strive to limit the trust to qemu only. > >> >> > The switch running within a VM does not have to be trusted. > >> >> > >> >> Can you give a concrete example? > >> >> > >> >> I have an idea about what you're saying but it may be wrong: > >> >> > >> >> Today the iotlb mechanism in vhost-user does not actually enforce > >> >> memory permissions. The vhost-user slave has full access to mmapped > >> >> memory regions even when iotlb is enabled. Currently the iotlb just > >> >> adds an indirection layer but no real security. (Is this correct?) > >> > > >> > Not exactly. iotlb protects against malicious drivers within guest. 
> >> > But yes, not against a vhost-user driver on the host.
> >> >
> >> >> Are you saying the vhost-pci device code in QEMU should enforce iotlb
> >> >> permissions so the vhost-user slave guest only has access to memory
> >> >> regions that are allowed by the iotlb?
> >> >
> >> > Yes.
> >>
> >> Okay, thanks for confirming.
> >>
> >> This can be supported by the approach I've described. The vhost-pci
> >> QEMU code has control over the BAR memory so it can prevent the guest
> >> from accessing regions that are not allowed by the iotlb.
> >>
> >> Inside the guest the vhost-user slave still has the memory region
> >> descriptions and sends iotlb messages. This is completely compatible
> >> with the libvirt-user APIs and existing vhost-user slave code can run
> >> fine. The only unique thing is that guest accesses to memory regions
> >> not allowed by the iotlb do not work because QEMU has prevented it.
> >
> > I don't think this can work since suddenly you need
> > to map full IOMMU address space into BAR.
>
> The BAR covers all guest RAM
> but QEMU can set up MemoryRegions that
> hide parts from the guest (e.g. reads produce 0xff). I'm not sure how
> expensive that is but implementing a strict IOMMU is hard to do
> without performance overhead.
I'm worried about leaking PAs.
Fundamentally, if you want proper protection you need your device driver to use VA for addressing.
On the one hand, the BAR only needs to be as large as guest PA then.
On the other hand, it must cover all of guest PA, not just what is accessible to the device.
>
> > Besides, this means implementing iotlb in both qemu and guest.
>
> It's free in the guest, the libvhost-user stack already has it.
That library is designed to work with a unix domain socket though. We'll need extra kernel code to make a device pretend it's a socket.
> >> If better performance is needed then it might be possible to optimize > >> this interface by handling most or even all of the iotlb stuff in QEMU > >> vhost-pci code and not exposing it to the vhost-user slave in the > >> guest. But it doesn't change the fact that the vhost-user protocol > >> can be used and the same software stack works. > > > > For one, the iotlb part would be out of scope then. > > Instead you would have code to offset from BAR. > > > >> Do you have a concrete example of why sharing the same vhost-user > >> software stack inside the guest can't work? > > > > With enough dedication some code might be shared. OTOH reusing virtio > > gains you a ready feature negotiation and discovery protocol. > > > > I'm not convinced which has more value, and the second proposal > > has been implemented already. > > Thanks to you and Wei for the discussion. I've learnt a lot about > vhost-user. If you have questions about what I've posted, please let > me know and we can discuss further. > > The decision is not up to me so I'll just summarize what the vhost-pci > PCI adapter approach achieves: > 1. Just one device and driver > 2. Support for all device types (net, scsi, blk, etc) > 3. Reuse of software stack so vhost-user slaves can run in both host > userspace and the guest > 4. Simpler to debug because the vhost-user protocol used by QEMU is > also used by the guest > > Stefan --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 77+ messages in thread
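The idea quoted in this message, a BAR that covers all guest RAM while hiding the parts the iotlb does not allow (with reads of hidden parts producing 0xff), can be illustrated with a toy model. This is only a sketch of the behaviour described in the thread, not QEMU's MemoryRegion API; masked_bar_read and its parameters are hypothetical names.

```python
def masked_bar_read(memory, allowed_ranges, offset, length):
    """Read `length` bytes at `offset`; hidden bytes read back as 0xff.

    `memory` stands in for the remote guest RAM behind the BAR and
    `allowed_ranges` is a list of (start, end) pairs with an exclusive
    end, i.e. the parts the iotlb currently permits.
    """
    out = bytearray()
    for pos in range(offset, offset + length):
        visible = any(start <= pos < end for start, end in allowed_ranges)
        out.append(memory[pos] if visible else 0xFF)
    return bytes(out)
```

Note that the offsets in such a BAR still correspond to guest physical addresses, which is the concern about leaking PAs raised in the reply above.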
* [virtio-dev] Re: [Qemu-devel] [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication 2017-12-07 23:54 ` Michael S. Tsirkin @ 2017-12-08 6:43 ` Wei Wang [not found] ` <CAJSP0QUAqCzFgVtM1cg_KybdyrZa_FRUHhDN7oLfRjZ2ZVkp4g@mail.gmail.com> [not found] ` <CAJSP0QXYMVBidUd5-NJb5FDYbc6wSkNYgdadjk8+NXvwosLMPw@mail.gmail.com> 1 sibling, 1 reply; 77+ messages in thread From: Wei Wang @ 2017-12-08 6:43 UTC (permalink / raw) To: Michael S. Tsirkin, Stefan Hajnoczi Cc: virtio-dev@lists.oasis-open.org, Yang, Zhiyong, jan.kiszka@siemens.com, jasowang@redhat.com, avi.cohen@huawei.com, qemu-devel@nongnu.org, Stefan Hajnoczi, pbonzini@redhat.com, marcandre.lureau@redhat.com On 12/08/2017 07:54 AM, Michael S. Tsirkin wrote: > On Thu, Dec 07, 2017 at 06:28:19PM +0000, Stefan Hajnoczi wrote: >> On Thu, Dec 7, 2017 at 5:38 PM, Michael S. Tsirkin <mst@redhat.com> wrote: >>> On Thu, Dec 07, 2017 at 05:29:14PM +0000, Stefan Hajnoczi wrote: >>>> On Thu, Dec 7, 2017 at 4:47 PM, Michael S. Tsirkin <mst@redhat.com> wrote: >>>>> On Thu, Dec 07, 2017 at 04:29:45PM +0000, Stefan Hajnoczi wrote: >>>>>> On Thu, Dec 7, 2017 at 2:02 PM, Michael S. Tsirkin <mst@redhat.com> wrote: >>>>>>> On Thu, Dec 07, 2017 at 01:08:04PM +0000, Stefan Hajnoczi wrote: >>>>>>>> Instead of responding individually to these points, I hope this will >>>>>>>> explain my perspective. Let me know if you do want individual >>>>>>>> responses, I'm happy to talk more about the points above but I think >>>>>>>> the biggest difference is our perspective on this: >>>>>>>> >>>>>>>> Existing vhost-user slave code should be able to run on top of >>>>>>>> vhost-pci. For example, QEMU's >>>>>>>> contrib/vhost-user-scsi/vhost-user-scsi.c should work inside the guest >>>>>>>> with only minimal changes to the source file (i.e. today it explicitly >>>>>>>> opens a UNIX domain socket and that should be done by libvhost-user >>>>>>>> instead). 
It shouldn't be hard to add vhost-pci vfio support to >>>>>>>> contrib/libvhost-user/ alongside the existing UNIX domain socket code. >>>>>>>> >>>>>>>> This seems pretty easy to achieve with the vhost-pci PCI adapter that >>>>>>>> I've described but I'm not sure how to implement libvhost-user on top >>>>>>>> of vhost-pci vfio if the device doesn't expose the vhost-user >>>>>>>> protocol. >>>>>>>> >>>>>>>> I think this is a really important goal. Let's use a single >>>>>>>> vhost-user software stack instead of creating a separate one for guest >>>>>>>> code only. >>>>>>>> >>>>>>>> Do you agree that the vhost-user software stack should be shared >>>>>>>> between host userspace and guest code as much as possible? >>>>>>> >>>>>>> >>>>>>> The sharing you propose is not necessarily practical because the security goals >>>>>>> of the two are different. >>>>>>> >>>>>>> It seems that the best motivation presentation is still the original rfc >>>>>>> >>>>>>> http://virtualization.linux-foundation.narkive.com/A7FkzAgp/rfc-vhost-user-enhancements-for-vm2vm-communication >>>>>>> >>>>>>> So comparing with vhost-user iotlb handling is different: >>>>>>> >>>>>>> With vhost-user guest trusts the vhost-user backend on the host. >>>>>>> >>>>>>> With vhost-pci we can strive to limit the trust to qemu only. >>>>>>> The switch running within a VM does not have to be trusted. >>>>>> Can you give a concrete example? >>>>>> >>>>>> I have an idea about what you're saying but it may be wrong: >>>>>> >>>>>> Today the iotlb mechanism in vhost-user does not actually enforce >>>>>> memory permissions. The vhost-user slave has full access to mmapped >>>>>> memory regions even when iotlb is enabled. Currently the iotlb just >>>>>> adds an indirection layer but no real security. (Is this correct?) >>>>> Not exactly. iotlb protects against malicious drivers within guest. >>>>> But yes, not against a vhost-user driver on the host. 
>>>>> >>>>>> Are you saying the vhost-pci device code in QEMU should enforce iotlb >>>>>> permissions so the vhost-user slave guest only has access to memory >>>>>> regions that are allowed by the iotlb? >>>>> Yes. >>>> Okay, thanks for confirming. >>>> >>>> This can be supported by the approach I've described. The vhost-pci >>>> QEMU code has control over the BAR memory so it can prevent the guest >>>> from accessing regions that are not allowed by the iotlb. >>>> >>>> Inside the guest the vhost-user slave still has the memory region >>>> descriptions and sends iotlb messages. This is completely compatible >>>> with the libvirt-user APIs and existing vhost-user slave code can run >>>> fine. The only unique thing is that guest accesses to memory regions >>>> not allowed by the iotlb do not work because QEMU has prevented it. >>> I don't think this can work since suddenly you need >>> to map full IOMMU address space into BAR. >> The BAR covers all guest RAM >> but QEMU can set up MemoryRegions that >> hide parts from the guest (e.g. reads produce 0xff). I'm not sure how >> expensive that is but implementing a strict IOMMU is hard to do >> without performance overhead. > I'm worried about leaking PAs. > fundamentally if you want proper protection you > need your device driver to use VA for addressing, > > On the one hand BAR only needs to be as large as guest PA then. > On the other hand it must cover all of guest PA, > not just what is accessible to the device. > > >>> Besides, this means implementing iotlb in both qemu and guest. >> It's free in the guest, the libvhost-user stack already has it. > That library is designed to work with a unix domain socket > though. We'll need extra kernel code to make a device > pretend it's a socket. 
> >>>> If better performance is needed then it might be possible to optimize >>>> this interface by handling most or even all of the iotlb stuff in QEMU >>>> vhost-pci code and not exposing it to the vhost-user slave in the >>>> guest. But it doesn't change the fact that the vhost-user protocol >>>> can be used and the same software stack works. >>> For one, the iotlb part would be out of scope then. >>> Instead you would have code to offset from BAR. >>> >>>> Do you have a concrete example of why sharing the same vhost-user >>>> software stack inside the guest can't work? >>> With enough dedication some code might be shared. OTOH reusing virtio >>> gains you a ready feature negotiation and discovery protocol. >>> >>> I'm not convinced which has more value, and the second proposal >>> has been implemented already. >> Thanks to you and Wei for the discussion. I've learnt a lot about >> vhost-user. If you have questions about what I've posted, please let >> me know and we can discuss further. >> >> The decision is not up to me so I'll just summarize what the vhost-pci >> PCI adapter approach achieves: >> 1. Just one device and driver >> 2. Support for all device types (net, scsi, blk, etc) >> 3. Reuse of software stack so vhost-user slaves can run in both host >> userspace and the guest >> 4. Simpler to debug because the vhost-user protocol used by QEMU is >> also used by the guest >> >> Stefan Thanks Stefan and Michael for the sharing and discussion. I think above 3 and 4 are debatable (e.g. whether it is simpler really depends). 1 and 2 are implementations, I think both approaches could implement the device that way. 
We originally thought about one device and driver to support all types
(we sometimes called it "transformer" :-) ). That would look interesting
from a research point of view, but for real usage I think it would be
better to keep them separate, because:

- Different device types have different driver logic; mixing them
  together would make the driver messy. Imagine a networking driver
  developer having to go through the block-related code to debug, which
  also increases the difficulty.
- For the kernel driver (it looks like some people from Huawei are
  interested in that), I think users may want to see a standard network
  device and driver. If we mix all the types together, I'm not sure what
  type of device it would be (misc?).

Please let me know if you have a different viewpoint.

Btw, from your perspective, what would be the practical usage of
vhost-pci-blk?

Best,
Wei
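The iotlb enforcement discussed in the quoted exchange above (QEMU hiding parts of the BAR so that disallowed reads produce 0xff) can be pictured roughly as follows. This is only a sketch under stated assumptions: the permission bits, `struct iotlb_entry`, `iotlb_allows` and `bar_read8` are hypothetical names, not QEMU or vhost-user definitions, and a real implementation would work on MemoryRegions rather than a flat byte array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical permission bits; not taken from QEMU or the vhost-user spec. */
enum { PERM_NONE = 0, PERM_RO = 1, PERM_WO = 2, PERM_RW = 3 };

/* One hypothetical iotlb entry, expressed as a window inside the BAR. */
struct iotlb_entry {
    uint64_t bar_off;   /* start of the window within the BAR */
    uint64_t size;      /* length of the window in bytes */
    unsigned perm;      /* PERM_* bits granted for this window */
};

/* True if [off, off + len) lies entirely inside one entry granting `need`. */
static bool iotlb_allows(const struct iotlb_entry *tlb, size_t n,
                         uint64_t off, uint64_t len, unsigned need)
{
    assert(tlb != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const struct iotlb_entry *e = &tlb[i];
        /* Written to avoid overflow in off + len. */
        if (off >= e->bar_off && len <= e->size &&
            off - e->bar_off <= e->size - len &&
            (e->perm & need) == need) {
            return true;
        }
    }
    return false;
}

/* Guest byte read from the BAR: bytes not readable per the iotlb return 0xff. */
static uint8_t bar_read8(const uint8_t *backing,
                         const struct iotlb_entry *tlb, size_t n,
                         uint64_t off)
{
    return iotlb_allows(tlb, n, off, 1, PERM_RO) ? backing[off] : 0xff;
}
```

The sketch checks every access, which is exactly the kind of per-access cost the thread worries about; it says nothing about how expensive the equivalent MemoryRegion setup would be.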
* [virtio-dev] RE: [Qemu-devel] [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication [not found] ` <CAJSP0QUAqCzFgVtM1cg_KybdyrZa_FRUHhDN7oLfRjZ2ZVkp4g@mail.gmail.com> @ 2017-12-09 16:23 ` Wang, Wei W 2017-12-11 11:11 ` [virtio-dev] " Stefan Hajnoczi 0 siblings, 1 reply; 77+ messages in thread From: Wang, Wei W @ 2017-12-09 16:23 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Michael S. Tsirkin, virtio-dev@lists.oasis-open.org, Yang, Zhiyong, jan.kiszka@siemens.com, jasowang@redhat.com, avi.cohen@huawei.com, qemu-devel@nongnu.org, Stefan Hajnoczi, pbonzini@redhat.com, marcandre.lureau@redhat.com On Friday, December 8, 2017 4:34 PM, Stefan Hajnoczi wrote: > On Fri, Dec 8, 2017 at 6:43 AM, Wei Wang <wei.w.wang@intel.com> wrote: > > On 12/08/2017 07:54 AM, Michael S. Tsirkin wrote: > >> > >> On Thu, Dec 07, 2017 at 06:28:19PM +0000, Stefan Hajnoczi wrote: > >>> > >>> On Thu, Dec 7, 2017 at 5:38 PM, Michael S. Tsirkin <mst@redhat.com> > > Thanks Stefan and Michael for the sharing and discussion. I think > > above 3 and 4 are debatable (e.g. whether it is simpler really > > depends). 1 and 2 are implementations, I think both approaches could > > implement the device that way. We originally thought about one device > > and driver to support all types (called it transformer sometimes :-) > > ), that would look interesting from research point of view, but from > > real usage point of view, I think it would be better to have them separated, > because: > > - different device types have different driver logic, mixing them > > together would cause the driver to look messy. Imagine that a > > networking driver developer has to go over the block related code to > > debug, that also increases the difficulty. > > I'm not sure I understand where things get messy because: > 1. The vhost-pci device implementation in QEMU relays messages but has no > device logic, so device-specific messages like VHOST_USER_NET_SET_MTU are > trivial at this layer. > 2. 
vhost-user slaves only handle certain vhost-user protocol messages.
> They handle device-specific messages for their device type only. This is like
> vhost drivers today where the ioctl() function returns an error if the ioctl is
> not supported by the device. It's not messy.
>
> Where are you worried about messy driver logic?

I probably didn’t explain it well, so let me summarize my thoughts from the perspective of the control path and the data path.

Control path (the vhost-user messages): I would prefer to keep the interaction between the QEMUs only, instead of relaying to the GuestSlave, because:
1) I don’t find the claimed advantage (easier to debug and develop) very convincing.
2) Some messages can be answered directly by the QemuSlave, and some are not useful to pass to the GuestSlave (inside the VM), e.g. fds and the VhostUserMemoryRegion from the SET_MEM_TABLE msg. The device first maps the master memory and then gives the guest the offset of the mapped gpa within the bar (i.e., where it sits in the bar). If we gave the raw VhostUserMemoryRegion to the guest, it wouldn’t be usable.

Data path: this is the discussion we had about one driver versus separate drivers for different device types, and it is not related to the control path. I meant that if we have one driver for all the types, that driver would look messy, because each type has its own data sending/receiving logic. For example, the net type deals with a pair of tx and rx queues, and transmission is skb based (e.g. xmit_skb), while the block type deals with a request queue. If we have one driver, it will have to include all of these things together.

The last part is whether to make it a virtio device or a regular PCI device. I don’t have a strong preference. I think a virtio device works fine (e.g. using some bar area to create ioeventfds to solve the "no virtqueue, no fds" issue, if you and Michael think that's acceptable), and we can reuse some other things from virtio, like feature negotiation. But if you and Michael decide to make it a regular PCI device, I think that would also work.

Best,
Wei
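The SET_MEM_TABLE point above (the device maps the master memory and hands the guest an offset within the bar instead of the raw region description) can be sketched as a small translation table. The names `struct mapped_region` and `gpa_to_bar_offset` are hypothetical and are not taken from the patch series or from QEMU; the layout of the bar is likewise an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record kept by the device side for each mapped region
 * of the remote (master) VM's memory. */
struct mapped_region {
    uint64_t remote_gpa;  /* start of the region in the master's gpa space */
    uint64_t size;        /* length of the region in bytes */
    uint64_t bar_off;     /* where the mapping was placed inside the bar */
};

/* Translate a master gpa into an offset within the bar.
 * Returns false if the gpa is not covered by any mapped region. */
static bool gpa_to_bar_offset(const struct mapped_region *regions, size_t n,
                              uint64_t gpa, uint64_t *bar_off)
{
    assert(bar_off != NULL);
    for (size_t i = 0; i < n; i++) {
        const struct mapped_region *r = &regions[i];
        if (gpa >= r->remote_gpa && gpa - r->remote_gpa < r->size) {
            *bar_off = r->bar_off + (gpa - r->remote_gpa);
            return true;
        }
    }
    return false;
}
```

With a region of master memory starting at gpa 0x100000 placed at bar offset 0x2000, gpa 0x100800 translates to bar offset 0x2800. A raw master-side address, by contrast, would mean nothing to code running inside the slave VM.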
* [virtio-dev] Re: [Qemu-devel] [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication 2017-12-09 16:23 ` [virtio-dev] " Wang, Wei W @ 2017-12-11 11:11 ` Stefan Hajnoczi 2017-12-11 13:53 ` [virtio-dev] " Wang, Wei W 0 siblings, 1 reply; 77+ messages in thread From: Stefan Hajnoczi @ 2017-12-11 11:11 UTC (permalink / raw) To: Wang, Wei W Cc: Stefan Hajnoczi, Michael S. Tsirkin, virtio-dev@lists.oasis-open.org, Yang, Zhiyong, jan.kiszka@siemens.com, jasowang@redhat.com, avi.cohen@huawei.com, qemu-devel@nongnu.org, pbonzini@redhat.com, marcandre.lureau@redhat.com [-- Attachment #1: Type: text/plain, Size: 4968 bytes --] On Sat, Dec 09, 2017 at 04:23:17PM +0000, Wang, Wei W wrote: > On Friday, December 8, 2017 4:34 PM, Stefan Hajnoczi wrote: > > On Fri, Dec 8, 2017 at 6:43 AM, Wei Wang <wei.w.wang@intel.com> wrote: > > > On 12/08/2017 07:54 AM, Michael S. Tsirkin wrote: > > >> > > >> On Thu, Dec 07, 2017 at 06:28:19PM +0000, Stefan Hajnoczi wrote: > > >>> > > >>> On Thu, Dec 7, 2017 at 5:38 PM, Michael S. Tsirkin <mst@redhat.com> > > > Thanks Stefan and Michael for the sharing and discussion. I think > > > above 3 and 4 are debatable (e.g. whether it is simpler really > > > depends). 1 and 2 are implementations, I think both approaches could > > > implement the device that way. We originally thought about one device > > > and driver to support all types (called it transformer sometimes :-) > > > ), that would look interesting from research point of view, but from > > > real usage point of view, I think it would be better to have them separated, > > because: > > > - different device types have different driver logic, mixing them > > > together would cause the driver to look messy. Imagine that a > > > networking driver developer has to go over the block related code to > > > debug, that also increases the difficulty. > > > > I'm not sure I understand where things get messy because: > > 1. 
The vhost-pci device implementation in QEMU relays messages but has no
> > device logic, so device-specific messages like VHOST_USER_NET_SET_MTU are
> > trivial at this layer.
> > 2. vhost-user slaves only handle certain vhost-user protocol messages.
> > They handle device-specific messages for their device type only. This is like
> > vhost drivers today where the ioctl() function returns an error if the ioctl is
> > not supported by the device. It's not messy.
> >
> > Where are you worried about messy driver logic?
>
> Probably I didn’t explain well, please let me summarize my thought a little bit, from the perspective of the control path and data path.
>
> Control path: the vhost-user messages - I would prefer just have the interaction between QEMUs, instead of relaying to the GuestSlave, because
> 1) I think the claimed advantage (easier to debug and develop) doesn’t seem very convincing

You are defining a mapping from the vhost-user protocol to a custom
virtio device interface. Every time the vhost-user protocol (feature
bits, messages, etc) is extended it will be necessary to map this new
extension to the virtio device interface.

That's non-trivial. Mistakes are possible when designing the mapping.
Using the vhost-user protocol as the device interface minimizes the
effort and risk of mistakes because most messages are relayed 1:1.

> 2) some messages can be directly answered by QemuSlave , and some messages are not useful to give to the GuestSlave (inside the VM), e.g. fds, VhostUserMemoryRegion from SET_MEM_TABLE msg (the device first maps the master memory and gives the offset (in terms of the bar, i.e., where does it sit in the bar) of the mapped gpa to the guest. if we give the raw VhostUserMemoryRegion to the guest, that wouldn’t be usable).

I agree that QEMU has to handle some of the messages, but it should still
relay all (possibly modified) messages to the guest.

The point of using the vhost-user protocol is not just to use a familiar
binary encoding, it's to match the semantics of vhost-user 100%. That
way the vhost-user software stack can work either in host userspace or
with vhost-pci without significant changes.

Using the vhost-user protocol as the device interface doesn't seem any
harder than defining a completely new virtio device interface. It has
the advantages that I've pointed out:

1. Simple 1:1 mapping for most messages that is easy to maintain as the
   vhost-user protocol grows.

2. Compatible with vhost-user so slaves can run in host userspace
   or the guest.

I don't see why it makes sense to define new device interfaces for each
device type and create a software stack that is incompatible with
vhost-user.

>
>
> Data path: that's the discussion we had about one driver or separate driver for different device types, and this is not related to the control path.
> I meant if we have one driver for all the types, that driver would look messy, because each type has its own data sending/receiving logic. For example, net type deals with a pair of tx and rx, and transmission is skb based (e.g. xmit_skb), while block type deals with a request queue. If we have one driver, then the driver will include all the things together.

I don't understand this. Why would we have to put all devices (net,
scsi, etc) into just one driver? The device drivers sit on top of the
vhost-pci driver.

For example, imagine a libvhost-user application that handles the net
device. The vhost-pci vfio driver would be part of libvhost-user and
the application would only emulate the net device (RX and TX queues).

Stefan
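The relaying model argued for above (QEMU forwards every message, rewriting only the few it has to, while each slave rejects messages that do not belong to its device type) might look roughly like this. It is a sketch under stated assumptions: the message ids, `struct msg`, `relay_to_guest` and `net_slave_handle` are hypothetical, and real vhost-user messages carry richer payloads and different request numbers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical message ids for illustration only; the real vhost-user
 * request numbers and payload layouts differ. */
enum msg_type {
    MSG_SET_FEATURES = 1,
    MSG_SET_MEM_TABLE,
    MSG_NET_SET_MTU,
    MSG_BLK_EXAMPLE,      /* stands in for some block-only message */
};

struct msg {
    enum msg_type type;
    uint64_t payload;     /* for MSG_SET_MEM_TABLE: an address in master memory */
};

/* QEMU-side relay step: every message is forwarded to the guest.
 * Only SET_MEM_TABLE is modified, so that the guest sees a bar offset
 * instead of a master address. */
static struct msg relay_to_guest(struct msg in, uint64_t region_gpa,
                                 uint64_t region_bar_off)
{
    struct msg out = in;
    if (in.type == MSG_SET_MEM_TABLE) {
        assert(in.payload >= region_gpa);
        out.payload = region_bar_off + (in.payload - region_gpa);
    }
    return out;
}

/* Guest-side net slave: handles generic and net messages, and returns
 * an error for anything else, much like an unsupported ioctl. */
static int net_slave_handle(struct msg m)
{
    switch (m.type) {
    case MSG_SET_FEATURES:
    case MSG_SET_MEM_TABLE:
    case MSG_NET_SET_MTU:
        return 0;
    default:
        return -1;
    }
}
```

In this picture the relay has no device logic at all: a net-only message passes through unchanged, and it is the net slave, not QEMU, that decides a block-only message is unsupported.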
* [virtio-dev] RE: [Qemu-devel] [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication 2017-12-11 11:11 ` [virtio-dev] " Stefan Hajnoczi @ 2017-12-11 13:53 ` Wang, Wei W 2017-12-12 10:14 ` [virtio-dev] " Stefan Hajnoczi 0 siblings, 1 reply; 77+ messages in thread From: Wang, Wei W @ 2017-12-11 13:53 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Stefan Hajnoczi, Michael S. Tsirkin, virtio-dev@lists.oasis-open.org, Yang, Zhiyong, jan.kiszka@siemens.com, jasowang@redhat.com, avi.cohen@huawei.com, qemu-devel@nongnu.org, pbonzini@redhat.com, marcandre.lureau@redhat.com On Monday, December 11, 2017 7:12 PM, Stefan Hajnoczi wrote: > On Sat, Dec 09, 2017 at 04:23:17PM +0000, Wang, Wei W wrote: > > On Friday, December 8, 2017 4:34 PM, Stefan Hajnoczi wrote: > > > On Fri, Dec 8, 2017 at 6:43 AM, Wei Wang <wei.w.wang@intel.com> > wrote: > > > > On 12/08/2017 07:54 AM, Michael S. Tsirkin wrote: > > > >> > > > >> On Thu, Dec 07, 2017 at 06:28:19PM +0000, Stefan Hajnoczi wrote: > > > >>> > > > >>> On Thu, Dec 7, 2017 at 5:38 PM, Michael S. Tsirkin > > > >>> <mst@redhat.com> > > > > Thanks Stefan and Michael for the sharing and discussion. I > > > > think above 3 and 4 are debatable (e.g. whether it is simpler > > > > really depends). 1 and 2 are implementations, I think both > > > > approaches could implement the device that way. We originally > > > > thought about one device and driver to support all types (called > > > > it transformer sometimes :-) ), that would look interesting from > > > > research point of view, but from real usage point of view, I > > > > think it would be better to have them separated, > > > because: > > > > - different device types have different driver logic, mixing > > > > them together would cause the driver to look messy. Imagine that > > > > a networking driver developer has to go over the block related > > > > code to debug, that also increases the difficulty. > > > > > > I'm not sure I understand where things get messy because: > > > 1. 
The vhost-pci device implementation in QEMU relays messages but > > > has no device logic, so device-specific messages like > > > VHOST_USER_NET_SET_MTU are trivial at this layer. > > > 2. vhost-user slaves only handle certain vhost-user protocol messages. > > > They handle device-specific messages for their device type only. > > > This is like vhost drivers today where the ioctl() function > > > returns an error if the ioctl is not supported by the device. It's not messy. > > > > > > Where are you worried about messy driver logic? > > > > Probably I didn’t explain well, please let me summarize my thought a > > little > bit, from the perspective of the control path and data path. > > > > Control path: the vhost-user messages - I would prefer just have the > > interaction between QEMUs, instead of relaying to the GuestSlave, > > because > > 1) I think the claimed advantage (easier to debug and develop) > > doesn’t seem very convincing > > You are defining a mapping from the vhost-user protocol to a custom > virtio device interface. Every time the vhost-user protocol (feature > bits, messages, > etc) is extended it will be necessary to map this new extension to the > virtio device interface. > > That's non-trivial. Mistakes are possible when designing the mapping. > Using the vhost-user protocol as the device interface minimizes the > effort and risk of mistakes because most messages are relayed 1:1. > > > 2) some messages can be directly answered by QemuSlave , and some > messages are not useful to give to the GuestSlave (inside the VM), > e.g. fds, VhostUserMemoryRegion from SET_MEM_TABLE msg (the device > first maps the master memory and gives the offset (in terms of the > bar, i.e., where does it sit in the bar) of the mapped gpa to the > guest. if we give the raw VhostUserMemoryRegion to the guest, that wouldn’t be usable). > > I agree that QEMU has to handle some of messages, but it should still > relay all (possibly modified) messages to the guest. 
> > The point of using the vhost-user protocol is not just to use a > familiar binary encoding, it's to match the semantics of vhost-user > 100%. That way the vhost-user software stack can work either in host > userspace or with vhost-pci without significant changes. > > Using the vhost-user protocol as the device interface doesn't seem any > harder than defining a completely new virtio device interface. It has > the advantages that I've pointed out: > > 1. Simple 1:1 mapping for most that is easy to maintain as the > vhost-user protocol grows. > > 2. Compatible with vhost-user so slaves can run in host userspace > or the guest. > > I don't see why it makes sense to define new device interfaces for > each device type and create a software stack that is incompatible with vhost-user. I think this 1:1 mapping wouldn't be easy: 1) We will have 2 Qemu side slaves to achieve this bidirectional relaying, that is, the working model will be - master to slave: Master->QemuSlave1->GuestSlave; and - slave to master: GuestSlave->QemuSlave2->Master QemuSlave1 and QemuSlave2 can't be the same piece of code, because QemuSlave1 needs to do some setup with some messages, and QemuSlave2 is more likely to be a true "relayer" (receive and directly pass on) 2) poor re-usability of the QemuSlave and GuestSlave We couldn’t reuse much of the QemuSlave handling code for GuestSlave. For example, for the VHOST_USER_SET_MEM_TABLE msg, all the QemuSlave handling code (please see the vp_slave_set_mem_table function), won't be used by GuestSlave. On the other hand, GuestSlave needs an implementation to reply back to the QEMU device, and this implementation isn't needed by QemuSlave. If we want to run the same piece of the slave code in both QEMU and guest, then we may need "if (QemuSlave) else" in each msg handling entry to choose the code path for QemuSlave and GuestSlave separately. So, ideally we wish to run (reuse) one slave implementation in both QEMU and guest. 
In practice, we would still need to handle each of them case by case, which is no different from maintaining two separate slaves for QEMU and the guest, and I'm afraid this would be much more complex.

Best,
Wei
* [virtio-dev] Re: [Qemu-devel] [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication 2017-12-11 13:53 ` [virtio-dev] " Wang, Wei W @ 2017-12-12 10:14 ` Stefan Hajnoczi 2017-12-13 8:11 ` Wei Wang 0 siblings, 1 reply; 77+ messages in thread From: Stefan Hajnoczi @ 2017-12-12 10:14 UTC (permalink / raw) To: Wang, Wei W Cc: Stefan Hajnoczi, Michael S. Tsirkin, virtio-dev@lists.oasis-open.org, Yang, Zhiyong, jan.kiszka@siemens.com, jasowang@redhat.com, avi.cohen@huawei.com, qemu-devel@nongnu.org, pbonzini@redhat.com, marcandre.lureau@redhat.com [-- Attachment #1: Type: text/plain, Size: 7058 bytes --] On Mon, Dec 11, 2017 at 01:53:40PM +0000, Wang, Wei W wrote: > On Monday, December 11, 2017 7:12 PM, Stefan Hajnoczi wrote: > > On Sat, Dec 09, 2017 at 04:23:17PM +0000, Wang, Wei W wrote: > > > On Friday, December 8, 2017 4:34 PM, Stefan Hajnoczi wrote: > > > > On Fri, Dec 8, 2017 at 6:43 AM, Wei Wang <wei.w.wang@intel.com> > > wrote: > > > > > On 12/08/2017 07:54 AM, Michael S. Tsirkin wrote: > > > > >> > > > > >> On Thu, Dec 07, 2017 at 06:28:19PM +0000, Stefan Hajnoczi wrote: > > > > >>> > > > > >>> On Thu, Dec 7, 2017 at 5:38 PM, Michael S. Tsirkin > > > > >>> <mst@redhat.com> > > > > > Thanks Stefan and Michael for the sharing and discussion. I > > > > > think above 3 and 4 are debatable (e.g. whether it is simpler > > > > > really depends). 1 and 2 are implementations, I think both > > > > > approaches could implement the device that way. We originally > > > > > thought about one device and driver to support all types (called > > > > > it transformer sometimes :-) ), that would look interesting from > > > > > research point of view, but from real usage point of view, I > > > > > think it would be better to have them separated, > > > > because: > > > > > - different device types have different driver logic, mixing > > > > > them together would cause the driver to look messy. 
Imagine that > > > > > a networking driver developer has to go over the block related > > > > > code to debug, that also increases the difficulty. > > > > > > > > I'm not sure I understand where things get messy because: > > > > 1. The vhost-pci device implementation in QEMU relays messages but > > > > has no device logic, so device-specific messages like > > > > VHOST_USER_NET_SET_MTU are trivial at this layer. > > > > 2. vhost-user slaves only handle certain vhost-user protocol messages. > > > > They handle device-specific messages for their device type only. > > > > This is like vhost drivers today where the ioctl() function > > > > returns an error if the ioctl is not supported by the device. It's not messy. > > > > > > > > Where are you worried about messy driver logic? > > > > > > Probably I didn’t explain well, please let me summarize my thought a > > > little > > bit, from the perspective of the control path and data path. > > > > > > Control path: the vhost-user messages - I would prefer just have the > > > interaction between QEMUs, instead of relaying to the GuestSlave, > > > because > > > 1) I think the claimed advantage (easier to debug and develop) > > > doesn’t seem very convincing > > > > You are defining a mapping from the vhost-user protocol to a custom > > virtio device interface. Every time the vhost-user protocol (feature > > bits, messages, > > etc) is extended it will be necessary to map this new extension to the > > virtio device interface. > > > > That's non-trivial. Mistakes are possible when designing the mapping. > > Using the vhost-user protocol as the device interface minimizes the > > effort and risk of mistakes because most messages are relayed 1:1. > > > > > 2) some messages can be directly answered by QemuSlave , and some > > messages are not useful to give to the GuestSlave (inside the VM), > > e.g. 
fds, VhostUserMemoryRegion from SET_MEM_TABLE msg (the device
> > first maps the master memory and gives the offset (in terms of the
> > bar, i.e., where does it sit in the bar) of the mapped gpa to the
> > guest. if we give the raw VhostUserMemoryRegion to the guest, that wouldn’t be usable).
> >
> > I agree that QEMU has to handle some of messages, but it should still
> > relay all (possibly modified) messages to the guest.
> >
> > The point of using the vhost-user protocol is not just to use a
> > familiar binary encoding, it's to match the semantics of vhost-user
> > 100%. That way the vhost-user software stack can work either in host
> > userspace or with vhost-pci without significant changes.
> >
> > Using the vhost-user protocol as the device interface doesn't seem any
> > harder than defining a completely new virtio device interface. It has
> > the advantages that I've pointed out:
> >
> > 1. Simple 1:1 mapping for most that is easy to maintain as the
> > vhost-user protocol grows.
> >
> > 2. Compatible with vhost-user so slaves can run in host userspace
> > or the guest.
> >
> > I don't see why it makes sense to define new device interfaces for
> > each device type and create a software stack that is incompatible with vhost-user.
>
>
> I think this 1:1 mapping wouldn't be easy:
>
> 1) We will have 2 Qemu side slaves to achieve this bidirectional relaying, that is, the working model will be
> - master to slave: Master->QemuSlave1->GuestSlave; and
> - slave to master: GuestSlave->QemuSlave2->Master
> QemuSlave1 and QemuSlave2 can't be the same piece of code, because QemuSlave1 needs to do some setup with some messages, and QemuSlave2 is more likely to be a true "relayer" (receive and directly pass on)

I mostly agree with this. Some messages cannot be passed through. QEMU
needs to process some messages so that makes it both a slave (on the
host) and a master (to the guest).

> 2) poor re-usability of the QemuSlave and GuestSlave
> We couldn’t reuse much of the QemuSlave handling code for GuestSlave.
> For example, for the VHOST_USER_SET_MEM_TABLE msg, all the QemuSlave handling code (please see the vp_slave_set_mem_table function), won't be used by GuestSlave. On the other hand, GuestSlave needs an implementation to reply back to the QEMU device, and this implementation isn't needed by QemuSlave.
> If we want to run the same piece of the slave code in both QEMU and guest, then we may need "if (QemuSlave) else" in each msg handling entry to choose the code path for QemuSlave and GuestSlave separately.
> So, ideally we wish to run (reuse) one slave implementation in both QEMU and guest. In practice, we will still need to handle them each case by case, which is no different than maintaining two separate slaves for QEMU and guest, and I'm afraid this would be much more complex.

Are you saying QEMU's vhost-pci code cannot be reused by guest slaves?
If so, I agree and it was not my intention to run the same slave code in
QEMU and the guest.

When I referred to reusing the vhost-user software stack I meant
something else:

1. contrib/libvhost-user/ is a vhost-user slave library. QEMU itself
does not use it but external programs may use it to avoid reimplementing
vhost-user and vrings. Currently this code handles the vhost-user
protocol over UNIX domain sockets, but it's possible to add vfio
vhost-pci support. Programs using libvhost-user would be able to take
advantage of vhost-pci easily (no big changes required).

2. DPDK and other codebases that implement custom vhost-user slaves are
also easy to update for vhost-pci since the same protocol is used. Only
the lowest layer of vhost-user slave code needs to be touched.

Stefan
* [virtio-dev] Re: [Qemu-devel] [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication 2017-12-12 10:14 ` [virtio-dev] " Stefan Hajnoczi @ 2017-12-13 8:11 ` Wei Wang [not found] ` <20171213123521.GL16782@stefanha-x1.localdomain> 0 siblings, 1 reply; 77+ messages in thread From: Wei Wang @ 2017-12-13 8:11 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Stefan Hajnoczi, Michael S. Tsirkin, virtio-dev@lists.oasis-open.org, Yang, Zhiyong, jan.kiszka@siemens.com, jasowang@redhat.com, avi.cohen@huawei.com, qemu-devel@nongnu.org, pbonzini@redhat.com, marcandre.lureau@redhat.com On 12/12/2017 06:14 PM, Stefan Hajnoczi wrote: > On Mon, Dec 11, 2017 at 01:53:40PM +0000, Wang, Wei W wrote: >> On Monday, December 11, 2017 7:12 PM, Stefan Hajnoczi wrote: >>> On Sat, Dec 09, 2017 at 04:23:17PM +0000, Wang, Wei W wrote: >>>> On Friday, December 8, 2017 4:34 PM, Stefan Hajnoczi wrote: >>>>> On Fri, Dec 8, 2017 at 6:43 AM, Wei Wang <wei.w.wang@intel.com> >>> wrote: >>>>>> On 12/08/2017 07:54 AM, Michael S. Tsirkin wrote: >>>>>>> On Thu, Dec 07, 2017 at 06:28:19PM +0000, Stefan Hajnoczi wrote: >>>>>>>> On Thu, Dec 7, 2017 at 5:38 PM, Michael S. Tsirkin >>>>>>>> <mst@redhat.com> >>>>>> Thanks Stefan and Michael for the sharing and discussion. I >>>>>> think above 3 and 4 are debatable (e.g. whether it is simpler >>>>>> really depends). 1 and 2 are implementations, I think both >>>>>> approaches could implement the device that way. We originally >>>>>> thought about one device and driver to support all types (called >>>>>> it transformer sometimes :-) ), that would look interesting from >>>>>> research point of view, but from real usage point of view, I >>>>>> think it would be better to have them separated, >>>>> because: >>>>>> - different device types have different driver logic, mixing >>>>>> them together would cause the driver to look messy. 
Imagine that
>>>>>> a networking driver developer has to go over the block related
>>>>>> code to debug, that also increases the difficulty.
>>>>> I'm not sure I understand where things get messy because:
>>>>> 1. The vhost-pci device implementation in QEMU relays messages but
>>>>> has no device logic, so device-specific messages like
>>>>> VHOST_USER_NET_SET_MTU are trivial at this layer.
>>>>> 2. vhost-user slaves only handle certain vhost-user protocol messages.
>>>>> They handle device-specific messages for their device type only.
>>>>> This is like vhost drivers today where the ioctl() function
>>>>> returns an error if the ioctl is not supported by the device. It's not messy.
>>>>>
>>>>> Where are you worried about messy driver logic?
>>>> Probably I didn't explain well, please let me summarize my thought a
>>>> little
>>> bit, from the perspective of the control path and data path.
>>>> Control path: the vhost-user messages - I would prefer just have the
>>>> interaction between QEMUs, instead of relaying to the GuestSlave,
>>>> because
>>>> 1) I think the claimed advantage (easier to debug and develop)
>>>> doesn't seem very convincing
>>> You are defining a mapping from the vhost-user protocol to a custom
>>> virtio device interface. Every time the vhost-user protocol (feature
>>> bits, messages,
>>> etc) is extended it will be necessary to map this new extension to the
>>> virtio device interface.
>>>
>>> That's non-trivial. Mistakes are possible when designing the mapping.
>>> Using the vhost-user protocol as the device interface minimizes the
>>> effort and risk of mistakes because most messages are relayed 1:1.
>>>
>>>> 2) some messages can be directly answered by QemuSlave , and some
>>> messages are not useful to give to the GuestSlave (inside the VM),
>>> e.g. fds, VhostUserMemoryRegion from SET_MEM_TABLE msg (the device
>>> first maps the master memory and gives the offset (in terms of the
>>> bar, i.e., where does it sit in the bar) of the mapped gpa to the
>>> guest. if we give the raw VhostUserMemoryRegion to the guest, that wouldn't be usable).
>>>
>>> I agree that QEMU has to handle some of messages, but it should still
>>> relay all (possibly modified) messages to the guest.
>>>
>>> The point of using the vhost-user protocol is not just to use a
>>> familiar binary encoding, it's to match the semantics of vhost-user
>>> 100%. That way the vhost-user software stack can work either in host
>>> userspace or with vhost-pci without significant changes.
>>>
>>> Using the vhost-user protocol as the device interface doesn't seem any
>>> harder than defining a completely new virtio device interface. It has
>>> the advantages that I've pointed out:
>>>
>>> 1. Simple 1:1 mapping for most that is easy to maintain as the
>>> vhost-user protocol grows.
>>>
>>> 2. Compatible with vhost-user so slaves can run in host userspace
>>> or the guest.
>>>
>>> I don't see why it makes sense to define new device interfaces for
>>> each device type and create a software stack that is incompatible with vhost-user.
>>
>> I think this 1:1 mapping wouldn't be easy:
>>
>> 1) We will have 2 Qemu side slaves to achieve this bidirectional relaying, that is, the working model will be
>> - master to slave: Master->QemuSlave1->GuestSlave; and
>> - slave to master: GuestSlave->QemuSlave2->Master
>> QemuSlave1 and QemuSlave2 can't be the same piece of code, because QemuSlave1 needs to do some setup with some messages, and QemuSlave2 is more likely to be a true "relayer" (receive and directly pass on)
> I mostly agree with this. Some messages cannot be passed through. QEMU
> needs to process some messages so that makes it both a slave (on the
> host) and a master (to the guest).
>
>> 2) poor re-usability of the QemuSlave and GuestSlave
>> We couldn't reuse much of the QemuSlave handling code for GuestSlave.
>> For example, for the VHOST_USER_SET_MEM_TABLE msg, all the QemuSlave handling code (please see the vp_slave_set_mem_table function), won't be used by GuestSlave. On the other hand, GuestSlave needs an implementation to reply back to the QEMU device, and this implementation isn't needed by QemuSlave.
>> If we want to run the same piece of the slave code in both QEMU and guest, then we may need "if (QemuSlave) else" in each msg handling entry to choose the code path for QemuSlave and GuestSlave separately.
>> So, ideally we wish to run (reuse) one slave implementation in both QEMU and guest. In practice, we will still need to handle them each case by case, which is no different than maintaining two separate slaves for QEMU and guest, and I'm afraid this would be much more complex.
> Are you saying QEMU's vhost-pci code cannot be reused by guest slaves?
> If so, I agree and it was not my intention to run the same slave code in
> QEMU and the guest.

Yes, it is too difficult to reuse in practice.

>
> When I referred to reusing the vhost-user software stack I meant
> something else:
>
> 1. contrib/libvhost-user/ is a vhost-user slave library. QEMU itself
> does not use it but external programs may use it to avoid reimplementing
> vhost-user and vrings. Currently this code handles the vhost-user
> protocol over UNIX domain sockets, but it's possible to add vfio
> vhost-pci support. Programs using libvhost-user would be able to take
> advantage of vhost-pci easily (no big changes required).
>
> 2. DPDK and other codebases that implement custom vhost-user slaves are
> also easy to update for vhost-pci since the same protocol is used. Only
> the lowest layer of vhost-user slave code needs to be touched.

I'm not sure if libvhost-user would be limited to be used by QEMU only in practice.
For example, DPDK currently implements its own vhost-user slave, and changing to use libvhost-user may require dpdk to be bound with QEMU, that is, applications like OVS-DPDK will have a dependency on QEMU. Probably people wouldn't want it this way. On the other side, vhost-pci is more coupled with the QEMU implementation, because some of the msg handling will need to do some device setup (e.g. mmap memory and add sub MemoryRegion to the bar). This device emulation related code is specific to QEMU, so I think vhost-pci slave may not be reused by applications other than QEMU. Would it be acceptable to use the vhost-pci slave from this patch series as the initial solution? It is already implemented, and we can investigate the possibility of integrating it into the libvhost-user as the next step. Best, Wei --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 77+ messages in thread
[parent not found: <20171213123521.GL16782@stefanha-x1.localdomain>]
* [virtio-dev] Re: [Qemu-devel] [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication [not found] ` <20171213123521.GL16782@stefanha-x1.localdomain> @ 2017-12-13 15:01 ` Michael S. Tsirkin [not found] ` <CAJSP0QWHJBL4APkeMt8-P8PFaPF=Vbi0NSnJtU7YX67fJrW=hw@mail.gmail.com> 2017-12-14 5:53 ` Wei Wang 1 sibling, 1 reply; 77+ messages in thread From: Michael S. Tsirkin @ 2017-12-13 15:01 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Wei Wang, Stefan Hajnoczi, virtio-dev@lists.oasis-open.org, Yang, Zhiyong, jan.kiszka@siemens.com, jasowang@redhat.com, avi.cohen@huawei.com, qemu-devel@nongnu.org, pbonzini@redhat.com, marcandre.lureau@redhat.com, Maxime Coquelin On Wed, Dec 13, 2017 at 12:35:21PM +0000, Stefan Hajnoczi wrote: > I'm not saying that DPDK should use libvhost-user. I'm saying that it's > easy to add vfio vhost-pci support (for the PCI adapter I described) to > DPDK. This patch series would require writing a completely new slave > for vhost-pci because the device interface is so different from > vhost-user. The main question is how appropriate is the vhost user protocol for passing to guests. And I am not sure at this point. Someone should go over vhost user messages and see whether they are safe to pass to guest. If most are then we can try the transparent approach. If most aren't then we can't and might as well use the proposed protocol which at least has code behind it. I took a quick look and I doubt we can do something that is both compatible with the existing vhost-user and will make it possible to extend the protocol without qemu changes. Let's assume I pass a new message over the vhost-user channel. How do we know it's safe to pass it to the guest? That's why we gate any protocol change on a feature bit and must parse all messages. Cc Maxime as well. 
^ permalink raw reply [flat|nested] 77+ messages in thread
[parent not found: <CAJSP0QWHJBL4APkeMt8-P8PFaPF=Vbi0NSnJtU7YX67fJrW=hw@mail.gmail.com>]
* [virtio-dev] Re: [Qemu-devel] [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication [not found] ` <CAJSP0QWHJBL4APkeMt8-P8PFaPF=Vbi0NSnJtU7YX67fJrW=hw@mail.gmail.com> @ 2017-12-13 20:59 ` Michael S. Tsirkin 2017-12-14 15:06 ` Stefan Hajnoczi 2017-12-13 21:50 ` Maxime Coquelin 1 sibling, 1 reply; 77+ messages in thread From: Michael S. Tsirkin @ 2017-12-13 20:59 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Wei Wang, Stefan Hajnoczi, virtio-dev@lists.oasis-open.org, Yang, Zhiyong, jan.kiszka@siemens.com, jasowang@redhat.com, avi.cohen@huawei.com, qemu-devel@nongnu.org, pbonzini@redhat.com, marcandre.lureau@redhat.com, Maxime Coquelin > * VHOST_USER_SET_VRING_KICK > > Set up vring kick doorbell (unless bit 8 is set) before sending > VHOST_USER_SET_VRING_KICK to the guest. But guest can't use it, now can it? What guest needs is a mapping to interrupts. > * VHOST_USER_SET_VRING_CALL > > Set up the vring call doorbell (unless bit 8 is set) before sending > VHOST_USER_SET_VRING_CALL to the guest. Same here. what guest needs is mapping from io to notifications, right? --- > > I took a quick look and I doubt we can do something that is both > > compatible with the existing vhost-user and will make it possible to > > extend the protocol without qemu changes. Let's assume I pass a new > > message over the vhost-user channel. How do we know it's safe to pass > > it to the guest? > > > > That's why we gate any protocol change on a feature bit and must parse > > all messages. > > QEMU must parse all messages and cannot pass through unknown messages. > Think of QEMU vhost-pci as both a vhost-user slave to the other VM and > a vhost-user master to the guest. > > QEMU changes are necessary when the vhost protocol is extended. > Device interface changes are only necessary if doorbells or shared > memory regions are added, any other protocol changes do not change the > device interface. 
>
> Stefan

I guess you have a different definition of a device interface than
myself - I consider it an interface change if a feature bit changes :)

--
MST
^ permalink raw reply [flat|nested] 77+ messages in thread
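For readers following the "unless bit 8 is set" remark quoted above: in the vhost-user specification, the u64 payload of VHOST_USER_SET_VRING_KICK and VHOST_USER_SET_VRING_CALL carries the vring index in its low 8 bits, and bit 8 flags that no file descriptor accompanies the message (polling is expected instead). A minimal C sketch of that encoding follows. The struct and function names are invented for illustration and are not taken from QEMU or from this patch series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Payload layout as described in the vhost-user spec:
 * bits 0-7 hold the vring index, bit 8 means "no fd attached". */
#define VRING_IDX_MASK  0xffULL
#define VRING_NOFD_MASK (1ULL << 8)

/* Hypothetical helper type, not part of QEMU. */
struct vring_fd_msg {
    unsigned int index; /* which vring the message refers to */
    bool has_fd;        /* false when bit 8 is set */
};

/* Split the u64 payload into vring index and fd-present flag. */
static struct vring_fd_msg decode_vring_fd_payload(uint64_t u64)
{
    struct vring_fd_msg m;

    m.index = (unsigned int)(u64 & VRING_IDX_MASK);
    m.has_fd = (u64 & VRING_NOFD_MASK) == 0;
    return m;
}

/* Build the u64 payload; the index must fit in the low 8 bits. */
static uint64_t encode_vring_fd_payload(unsigned int index, bool has_fd)
{
    assert(index <= VRING_IDX_MASK);
    return (uint64_t)index | (has_fd ? 0 : VRING_NOFD_MASK);
}
```

In the relay model discussed in this thread, a device would presumably only set up a doorbell for a vring when has_fd is true.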
* [virtio-dev] Re: [Qemu-devel] [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication 2017-12-13 20:59 ` Michael S. Tsirkin @ 2017-12-14 15:06 ` Stefan Hajnoczi 2017-12-15 10:33 ` Wei Wang 0 siblings, 1 reply; 77+ messages in thread From: Stefan Hajnoczi @ 2017-12-14 15:06 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Stefan Hajnoczi, Wei Wang, virtio-dev@lists.oasis-open.org, Yang, Zhiyong, jan.kiszka@siemens.com, jasowang@redhat.com, avi.cohen@huawei.com, qemu-devel@nongnu.org, pbonzini@redhat.com, marcandre.lureau@redhat.com, Maxime Coquelin [-- Attachment #1: Type: text/plain, Size: 2633 bytes --] On Wed, Dec 13, 2017 at 10:59:10PM +0200, Michael S. Tsirkin wrote: > > * VHOST_USER_SET_VRING_KICK > > > > Set up vring kick doorbell (unless bit 8 is set) before sending > > VHOST_USER_SET_VRING_KICK to the guest. > > But guest can't use it, now can it? > > What guest needs is a mapping to interrupts. ... > > * VHOST_USER_SET_VRING_CALL > > > > Set up the vring call doorbell (unless bit 8 is set) before sending > > VHOST_USER_SET_VRING_CALL to the guest. > > Same here. what guest needs is mapping from io to notifications, > right? The PCI device should contain a BAR with doorbell registers. I don't think a fancy mapping is necessary, instead the device spec should define the BAR layout. When the guest vhost-user slave receives this message it knows it can now begin using the doorbell register. > --- > > > > I took a quick look and I doubt we can do something that is both > > > compatible with the existing vhost-user and will make it possible to > > > extend the protocol without qemu changes. Let's assume I pass a new > > > message over the vhost-user channel. How do we know it's safe to pass > > > it to the guest? > > > > > > That's why we gate any protocol change on a feature bit and must parse > > > all messages. > > > > QEMU must parse all messages and cannot pass through unknown messages. 
> > Think of QEMU vhost-pci as both a vhost-user slave to the other VM and > > a vhost-user master to the guest. > > > > QEMU changes are necessary when the vhost protocol is extended. > > Device interface changes are only necessary if doorbells or shared > > memory regions are added, any other protocol changes do not change the > > device interface. > > > > Stefan > > I guess you have a different definition of a device interface than > myself - I consider it an interface change if a feature bit changes :) The feature bits are defined in the vhost-user protocol specification, not in the vhost-pci PCI device specification. For example, imagine we are adding the virtio-net MTU feature to the protocol: 1. The VHOST_USER_PROTOCOL_F_MTU feature bit is added to the vhost-user protocol specification. 2. The VHOST_USER_NET_SET_MTU message is added to the vhost-user protocol specification. As a result of this: 1. No PCI adapter resources (BARs, register layout, etc) change. This is why I say the device interface is unchanged. The vhost-pci specification does not change. 2. QEMU vhost-pci code needs to unmask VHOST_USER_PROTOCOL_F_MTU and pass through VHOST_USER_NET_SET_MTU. Stefan [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 455 bytes --] ^ permalink raw reply [flat|nested] 77+ messages in thread
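The MTU example above can be sketched as a small filter, assuming a relay that parses every message and forwards only what it knows. This is only an illustration of the idea: the request IDs, the feature bit value and all function names below are placeholders, not the values from the vhost-user specification and not code from QEMU.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder request IDs; the real numbers are defined by the
 * vhost-user specification and are deliberately not reproduced here. */
enum relay_request {
    RELAY_REQ_SET_MEM_TABLE = 1,
    RELAY_REQ_SET_VRING_KICK,
    RELAY_REQ_SET_VRING_CALL,
    RELAY_REQ_NET_SET_MTU,
};

/* Placeholder bit standing in for the MTU protocol feature. */
#define RELAY_PROTOCOL_F_MTU (1ULL << 0)

/* Feature bits this hypothetical relay understands. Supporting a new
 * protocol extension means adding ("unmasking") its bit here. */
static const uint64_t relay_known_features = RELAY_PROTOCOL_F_MTU;

/* Hide feature bits the relay does not understand from the guest slave. */
static uint64_t relay_mask_features(uint64_t offered)
{
    return offered & relay_known_features;
}

/* Decide whether a parsed message may be forwarded to the guest slave.
 * Unknown requests are rejected instead of being passed through. */
static bool relay_may_forward(uint32_t request, uint64_t negotiated)
{
    switch (request) {
    case RELAY_REQ_SET_MEM_TABLE:
    case RELAY_REQ_SET_VRING_KICK:
    case RELAY_REQ_SET_VRING_CALL:
        return true;
    case RELAY_REQ_NET_SET_MTU:
        return (negotiated & RELAY_PROTOCOL_F_MTU) != 0;
    default:
        return false;
    }
}
```

Under these assumptions, adding a protocol feature touches only the feature mask and the switch statement, which matches the claim that the PCI-level device interface (BARs, register layout) stays the same.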
* [virtio-dev] Re: [Qemu-devel] [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication 2017-12-14 15:06 ` Stefan Hajnoczi @ 2017-12-15 10:33 ` Wei Wang 2017-12-15 12:37 ` Stefan Hajnoczi 0 siblings, 1 reply; 77+ messages in thread From: Wei Wang @ 2017-12-15 10:33 UTC (permalink / raw) To: Stefan Hajnoczi, Michael S. Tsirkin Cc: Stefan Hajnoczi, virtio-dev@lists.oasis-open.org, Yang, Zhiyong, jan.kiszka@siemens.com, jasowang@redhat.com, avi.cohen@huawei.com, qemu-devel@nongnu.org, pbonzini@redhat.com, marcandre.lureau@redhat.com, Maxime Coquelin On 12/14/2017 11:06 PM, Stefan Hajnoczi wrote: > On Wed, Dec 13, 2017 at 10:59:10PM +0200, Michael S. Tsirkin wrote: >>> * VHOST_USER_SET_VRING_KICK >>> >>> Set up vring kick doorbell (unless bit 8 is set) before sending >>> VHOST_USER_SET_VRING_KICK to the guest. >> But guest can't use it, now can it? >> >> What guest needs is a mapping to interrupts. > ... >>> * VHOST_USER_SET_VRING_CALL >>> >>> Set up the vring call doorbell (unless bit 8 is set) before sending >>> VHOST_USER_SET_VRING_CALL to the guest. >> Same here. what guest needs is mapping from io to notifications, >> right? > The PCI device should contain a BAR with doorbell registers. I don't > think a fancy mapping is necessary, instead the device spec should > define the BAR layout. > > When the guest vhost-user slave receives this message it knows it can > now begin using the doorbell register. Not really. A doorbell will cause an interrupt to be injected to the master device driver, which is not ready to work at that time. The slave driver isn't expected to use the doorbell until the master is ready by sending the last message VHOST_USER_SET_VHOST_PCI to link UP the slave device. So I think passing the fd msg to guest doesn't have a value in terms of functionality. 
Best,
Wei
^ permalink raw reply [flat|nested] 77+ messages in thread
* [virtio-dev] Re: [Qemu-devel] [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication 2017-12-15 10:33 ` Wei Wang @ 2017-12-15 12:37 ` Stefan Hajnoczi 0 siblings, 0 replies; 77+ messages in thread From: Stefan Hajnoczi @ 2017-12-15 12:37 UTC (permalink / raw) To: Wei Wang Cc: Michael S. Tsirkin, Stefan Hajnoczi, virtio-dev@lists.oasis-open.org, Yang, Zhiyong, jan.kiszka@siemens.com, jasowang@redhat.com, avi.cohen@huawei.com, qemu-devel@nongnu.org, pbonzini@redhat.com, marcandre.lureau@redhat.com, Maxime Coquelin [-- Attachment #1: Type: text/plain, Size: 1840 bytes --] On Fri, Dec 15, 2017 at 06:33:14PM +0800, Wei Wang wrote: > On 12/14/2017 11:06 PM, Stefan Hajnoczi wrote: > > On Wed, Dec 13, 2017 at 10:59:10PM +0200, Michael S. Tsirkin wrote: > > > > * VHOST_USER_SET_VRING_KICK > > > > > > > > Set up vring kick doorbell (unless bit 8 is set) before sending > > > > VHOST_USER_SET_VRING_KICK to the guest. > > > But guest can't use it, now can it? > > > > > > What guest needs is a mapping to interrupts. > > ... > > > > * VHOST_USER_SET_VRING_CALL > > > > > > > > Set up the vring call doorbell (unless bit 8 is set) before sending > > > > VHOST_USER_SET_VRING_CALL to the guest. > > > Same here. what guest needs is mapping from io to notifications, > > > right? > > The PCI device should contain a BAR with doorbell registers. I don't > > think a fancy mapping is necessary, instead the device spec should > > define the BAR layout. > > > > When the guest vhost-user slave receives this message it knows it can > > now begin using the doorbell register. > > Not really. A doorbell will cause an interrupt to be injected to the master > device driver, which is not ready to work at that time. The slave driver > isn't expected to use the doorbell until the master is ready by sending the > last message VHOST_USER_SET_VHOST_PCI to link UP the slave device. > > So I think passing the fd msg to guest doesn't have a value in terms of > functionality. 
The new VHOST_USER_SET_VHOST_PCI device message is something you defined
in this patch series. This email sub-thread is about the vhost-user
protocol specification as it exists today.

VHOST_USER_SET_VRING_CALL is the message that allows the guest to begin
using the doorbell when VHOST_USER_F_PROTOCOL_FEATURES wasn't negotiated
(i.e. the vring is enabled right away).

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]
^ permalink raw reply [flat|nested] 77+ messages in thread
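One possible reading of the rule in the message above, written as a C sketch. It combines that message with the vhost-user specification's statement that rings start disabled when VHOST_USER_F_PROTOCOL_FEATURES has been negotiated and are then enabled by VHOST_USER_SET_VRING_ENABLE. The struct and function are hypothetical, and the alternative of waiting for VHOST_USER_SET_VHOST_PCI from this patch series is intentionally not modelled.

```c
#include <stdbool.h>

/* Hypothetical per-vring bookkeeping kept by a guest vhost-user slave. */
struct vring_doorbell_state {
    bool protocol_features_negotiated; /* VHOST_USER_F_PROTOCOL_FEATURES */
    bool call_received;                /* saw VHOST_USER_SET_VRING_CALL */
    bool enable_received;              /* saw VHOST_USER_SET_VRING_ENABLE(1) */
};

/* May the slave start ringing the call doorbell for this vring? */
static bool doorbell_usable(const struct vring_doorbell_state *s)
{
    if (!s->call_received) {
        return false;   /* no doorbell has been announced yet */
    }
    if (!s->protocol_features_negotiated) {
        return true;    /* vring is enabled right away */
    }
    return s->enable_received;
}
```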
* [virtio-dev] Re: [Qemu-devel] [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication [not found] ` <CAJSP0QWHJBL4APkeMt8-P8PFaPF=Vbi0NSnJtU7YX67fJrW=hw@mail.gmail.com> 2017-12-13 20:59 ` Michael S. Tsirkin @ 2017-12-13 21:50 ` Maxime Coquelin 2017-12-14 15:46 ` Stefan Hajnoczi 1 sibling, 1 reply; 77+ messages in thread From: Maxime Coquelin @ 2017-12-13 21:50 UTC (permalink / raw) To: Stefan Hajnoczi, Michael S. Tsirkin Cc: Wei Wang, Stefan Hajnoczi, virtio-dev@lists.oasis-open.org, Yang, Zhiyong, jan.kiszka@siemens.com, jasowang@redhat.com, avi.cohen@huawei.com, qemu-devel@nongnu.org, pbonzini@redhat.com, marcandre.lureau@redhat.com Hi Stefan, On 12/13/2017 09:08 PM, Stefan Hajnoczi wrote: > On Wed, Dec 13, 2017 at 3:01 PM, Michael S. Tsirkin <mst@redhat.com> wrote: >> On Wed, Dec 13, 2017 at 12:35:21PM +0000, Stefan Hajnoczi wrote: >>> I'm not saying that DPDK should use libvhost-user. I'm saying that it's >>> easy to add vfio vhost-pci support (for the PCI adapter I described) to >>> DPDK. This patch series would require writing a completely new slave >>> for vhost-pci because the device interface is so different from >>> vhost-user. >> >> The main question is how appropriate is the vhost user protocol >> for passing to guests. And I am not sure at this point. >> >> Someone should go over vhost user messages and see whether they are safe >> to pass to guest. If most are then we can try the transparent approach. >> If most aren't then we can't and might as well use the proposed protocol >> which at least has code behind it. > > I have done that: > ... > * VHOST_USER_SET_MEM_TABLE > > Set up BARs before sending a VHOST_USER_SET_MEM_TABLE to the guest. It would require to filter out userspace_addr from the payload not to leak other QEMU process VAs to the guest. 
Maxime
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [virtio-dev] Re: [Qemu-devel] [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication 2017-12-13 21:50 ` Maxime Coquelin @ 2017-12-14 15:46 ` Stefan Hajnoczi 2017-12-14 16:27 ` Michael S. Tsirkin 0 siblings, 1 reply; 77+ messages in thread From: Stefan Hajnoczi @ 2017-12-14 15:46 UTC (permalink / raw) To: Maxime Coquelin Cc: Stefan Hajnoczi, Michael S. Tsirkin, Wei Wang, virtio-dev@lists.oasis-open.org, Yang, Zhiyong, jan.kiszka@siemens.com, jasowang@redhat.com, avi.cohen@huawei.com, qemu-devel@nongnu.org, pbonzini@redhat.com, marcandre.lureau@redhat.com [-- Attachment #1: Type: text/plain, Size: 2003 bytes --] On Wed, Dec 13, 2017 at 10:50:11PM +0100, Maxime Coquelin wrote: > On 12/13/2017 09:08 PM, Stefan Hajnoczi wrote: > > On Wed, Dec 13, 2017 at 3:01 PM, Michael S. Tsirkin <mst@redhat.com> wrote: > > > On Wed, Dec 13, 2017 at 12:35:21PM +0000, Stefan Hajnoczi wrote: > > > > I'm not saying that DPDK should use libvhost-user. I'm saying that it's > > > > easy to add vfio vhost-pci support (for the PCI adapter I described) to > > > > DPDK. This patch series would require writing a completely new slave > > > > for vhost-pci because the device interface is so different from > > > > vhost-user. > > > > > > The main question is how appropriate is the vhost user protocol > > > for passing to guests. And I am not sure at this point. > > > > > > Someone should go over vhost user messages and see whether they are safe > > > to pass to guest. If most are then we can try the transparent approach. > > > If most aren't then we can't and might as well use the proposed protocol > > > which at least has code behind it. > > > > I have done that: > > > ... > > * VHOST_USER_SET_MEM_TABLE > > > > Set up BARs before sending a VHOST_USER_SET_MEM_TABLE to the guest. > > It would require to filter out userspace_addr from the payload not to > leak other QEMU process VAs to the guest. QEMU's vhost-user master implementation is insecure because it leaks QEMU process VAs. 
This also affects vhost-user host processes, not just vhost-pci.

The QEMU vhost-user master could send post-IOMMU guest physical
addresses wherever the vhost-user protocol specification says "user
address". That way no address space information is leaked, although it
does leak IOMMU mappings.

If we want to hide the IOMMU mappings too then we need another logical
address space (a kind of randomized ramaddr_t).

Anyway, my point is that the current vhost-user master implementation is
insecure and should be fixed. vhost-pci doesn't need to worry about
this issue.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]
^ permalink raw reply [flat|nested] 77+ messages in thread
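A rough sketch of the suggestion above, assuming the master simply writes the guest physical address into the "user address" field of each memory region before sending VHOST_USER_SET_MEM_TABLE. The struct below only mirrors the fields mentioned in this thread (guest physical address, size, userspace_addr, mmap offset); its name and the helper are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region descriptor with the fields discussed in the thread. */
struct mem_region_desc {
    uint64_t guest_phys_addr;
    uint64_t memory_size;
    uint64_t userspace_addr; /* normally a QEMU process virtual address */
    uint64_t mmap_offset;
};

/* Overwrite each userspace_addr with the region's guest physical address
 * so that no master process VA leaves the process. */
static void regions_hide_user_addr(struct mem_region_desc *regions, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        regions[i].userspace_addr = regions[i].guest_phys_addr;
    }
}
```

A master doing this would also have to express every other "user address" it sends (for example vring addresses) in the same address space, otherwise the slave could no longer match them against the regions.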
* Re: [virtio-dev] Re: [Qemu-devel] [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication 2017-12-14 15:46 ` Stefan Hajnoczi @ 2017-12-14 16:27 ` Michael S. Tsirkin 2017-12-14 16:39 ` Maxime Coquelin 0 siblings, 1 reply; 77+ messages in thread From: Michael S. Tsirkin @ 2017-12-14 16:27 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Maxime Coquelin, Stefan Hajnoczi, Wei Wang, virtio-dev@lists.oasis-open.org, Yang, Zhiyong, jan.kiszka@siemens.com, jasowang@redhat.com, avi.cohen@huawei.com, qemu-devel@nongnu.org, pbonzini@redhat.com, marcandre.lureau@redhat.com, snabb-devel, dgilbert On Thu, Dec 14, 2017 at 03:46:56PM +0000, Stefan Hajnoczi wrote: > On Wed, Dec 13, 2017 at 10:50:11PM +0100, Maxime Coquelin wrote: > > On 12/13/2017 09:08 PM, Stefan Hajnoczi wrote: > > > On Wed, Dec 13, 2017 at 3:01 PM, Michael S. Tsirkin <mst@redhat.com> wrote: > > > > On Wed, Dec 13, 2017 at 12:35:21PM +0000, Stefan Hajnoczi wrote: > > > > > I'm not saying that DPDK should use libvhost-user. I'm saying that it's > > > > > easy to add vfio vhost-pci support (for the PCI adapter I described) to > > > > > DPDK. This patch series would require writing a completely new slave > > > > > for vhost-pci because the device interface is so different from > > > > > vhost-user. > > > > > > > > The main question is how appropriate is the vhost user protocol > > > > for passing to guests. And I am not sure at this point. > > > > > > > > Someone should go over vhost user messages and see whether they are safe > > > > to pass to guest. If most are then we can try the transparent approach. > > > > If most aren't then we can't and might as well use the proposed protocol > > > > which at least has code behind it. > > > > > > I have done that: > > > > > ... > > > * VHOST_USER_SET_MEM_TABLE > > > > > > Set up BARs before sending a VHOST_USER_SET_MEM_TABLE to the guest. > > > > It would require to filter out userspace_addr from the payload not to > > leak other QEMU process VAs to the guest. 
> > QEMU's vhost-user master implementation is insecure because it leaks > QEMU process VAs. This also affects vhost-user host processes, not just > vhost-pci. > > The QEMU vhost-user master could send an post-IOMMU guest physical > addresses whereever the vhost-user protocol specification says "user > address". That way no address space information is leaked although it > does leak IOMMU mappings. > > If we want to hide the IOMMU mappings too then we need another logical > address space (kind a randomized ramaddr_t). > > Anyway, my point is that the current vhost-user master implementation is > insecure and should be fixed. vhost-pci doesn't need to worry about > this issue. > > Stefan I was going to make this point too. It does not look like anyone uses userspace_addr. It might have been a mistake to put it there - maybe we should have reused it for map offset. It does not look like anyone uses this for anything. How about we put zero, or a copy of the GPA there? -- MST --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [virtio-dev] Re: [Qemu-devel] [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication 2017-12-14 16:27 ` Michael S. Tsirkin @ 2017-12-14 16:39 ` Maxime Coquelin 2017-12-14 16:40 ` Michael S. Tsirkin 0 siblings, 1 reply; 77+ messages in thread From: Maxime Coquelin @ 2017-12-14 16:39 UTC (permalink / raw) To: Michael S. Tsirkin, Stefan Hajnoczi Cc: Stefan Hajnoczi, Wei Wang, virtio-dev@lists.oasis-open.org, Yang, Zhiyong, jan.kiszka@siemens.com, jasowang@redhat.com, avi.cohen@huawei.com, qemu-devel@nongnu.org, pbonzini@redhat.com, marcandre.lureau@redhat.com, snabb-devel, dgilbert On 12/14/2017 05:27 PM, Michael S. Tsirkin wrote: > On Thu, Dec 14, 2017 at 03:46:56PM +0000, Stefan Hajnoczi wrote: >> On Wed, Dec 13, 2017 at 10:50:11PM +0100, Maxime Coquelin wrote: >>> On 12/13/2017 09:08 PM, Stefan Hajnoczi wrote: >>>> On Wed, Dec 13, 2017 at 3:01 PM, Michael S. Tsirkin <mst@redhat.com> wrote: >>>>> On Wed, Dec 13, 2017 at 12:35:21PM +0000, Stefan Hajnoczi wrote: >>>>>> I'm not saying that DPDK should use libvhost-user. I'm saying that it's >>>>>> easy to add vfio vhost-pci support (for the PCI adapter I described) to >>>>>> DPDK. This patch series would require writing a completely new slave >>>>>> for vhost-pci because the device interface is so different from >>>>>> vhost-user. >>>>> >>>>> The main question is how appropriate is the vhost user protocol >>>>> for passing to guests. And I am not sure at this point. >>>>> >>>>> Someone should go over vhost user messages and see whether they are safe >>>>> to pass to guest. If most are then we can try the transparent approach. >>>>> If most aren't then we can't and might as well use the proposed protocol >>>>> which at least has code behind it. >>>> >>>> I have done that: >>>> >>> ... >>>> * VHOST_USER_SET_MEM_TABLE >>>> >>>> Set up BARs before sending a VHOST_USER_SET_MEM_TABLE to the guest. 
>>> >>> It would require to filter out userspace_addr from the payload not to >>> leak other QEMU process VAs to the guest. >> >> QEMU's vhost-user master implementation is insecure because it leaks >> QEMU process VAs. This also affects vhost-user host processes, not just >> vhost-pci. >> >> The QEMU vhost-user master could send an post-IOMMU guest physical >> addresses whereever the vhost-user protocol specification says "user >> address". That way no address space information is leaked although it >> does leak IOMMU mappings. >> >> If we want to hide the IOMMU mappings too then we need another logical >> address space (kind a randomized ramaddr_t). >> >> Anyway, my point is that the current vhost-user master implementation is >> insecure and should be fixed. vhost-pci doesn't need to worry about >> this issue. >> >> Stefan > > I was going to make this point too. It does not look like anyone uses > userspace_addr. It might have been a mistake to put it there - > maybe we should have reused it for map offset. > > It does not look like anyone uses this for anything. > > How about we put zero, or a copy of the GPA there? > > It is used when no iommu for the ring addresses, and when iommu is used for the IOTLB update messages. Maxime --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [virtio-dev] Re: [Qemu-devel] [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication 2017-12-14 16:39 ` Maxime Coquelin @ 2017-12-14 16:40 ` Michael S. Tsirkin 2017-12-14 16:50 ` Maxime Coquelin 0 siblings, 1 reply; 77+ messages in thread From: Michael S. Tsirkin @ 2017-12-14 16:40 UTC (permalink / raw) To: Maxime Coquelin Cc: Stefan Hajnoczi, Stefan Hajnoczi, Wei Wang, virtio-dev@lists.oasis-open.org, Yang, Zhiyong, jan.kiszka@siemens.com, jasowang@redhat.com, avi.cohen@huawei.com, qemu-devel@nongnu.org, pbonzini@redhat.com, marcandre.lureau@redhat.com, snabb-devel, dgilbert On Thu, Dec 14, 2017 at 05:39:19PM +0100, Maxime Coquelin wrote: > > > On 12/14/2017 05:27 PM, Michael S. Tsirkin wrote: > > On Thu, Dec 14, 2017 at 03:46:56PM +0000, Stefan Hajnoczi wrote: > > > On Wed, Dec 13, 2017 at 10:50:11PM +0100, Maxime Coquelin wrote: > > > > On 12/13/2017 09:08 PM, Stefan Hajnoczi wrote: > > > > > On Wed, Dec 13, 2017 at 3:01 PM, Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > On Wed, Dec 13, 2017 at 12:35:21PM +0000, Stefan Hajnoczi wrote: > > > > > > > I'm not saying that DPDK should use libvhost-user. I'm saying that it's > > > > > > > easy to add vfio vhost-pci support (for the PCI adapter I described) to > > > > > > > DPDK. This patch series would require writing a completely new slave > > > > > > > for vhost-pci because the device interface is so different from > > > > > > > vhost-user. > > > > > > > > > > > > The main question is how appropriate is the vhost user protocol > > > > > > for passing to guests. And I am not sure at this point. > > > > > > > > > > > > Someone should go over vhost user messages and see whether they are safe > > > > > > to pass to guest. If most are then we can try the transparent approach. > > > > > > If most aren't then we can't and might as well use the proposed protocol > > > > > > which at least has code behind it. > > > > > > > > > > I have done that: > > > > > > > > > ... 
> > > > > * VHOST_USER_SET_MEM_TABLE > > > > > > > > > > Set up BARs before sending a VHOST_USER_SET_MEM_TABLE to the guest. > > > > > > > > It would require to filter out userspace_addr from the payload not to > > > > leak other QEMU process VAs to the guest. > > > > > > QEMU's vhost-user master implementation is insecure because it leaks > > > QEMU process VAs. This also affects vhost-user host processes, not just > > > vhost-pci. > > > > > > The QEMU vhost-user master could send an post-IOMMU guest physical > > > addresses whereever the vhost-user protocol specification says "user > > > address". That way no address space information is leaked although it > > > does leak IOMMU mappings. > > > > > > If we want to hide the IOMMU mappings too then we need another logical > > > address space (kind a randomized ramaddr_t). > > > > > > Anyway, my point is that the current vhost-user master implementation is > > > insecure and should be fixed. vhost-pci doesn't need to worry about > > > this issue. > > > > > > Stefan > > > > I was going to make this point too. It does not look like anyone uses > > userspace_addr. It might have been a mistake to put it there - > > maybe we should have reused it for map offset. > > > > It does not look like anyone uses this for anything. > > > > How about we put zero, or a copy of the GPA there? > > > > > > It is used when no iommu for the ring addresses, and when iommu is used > for the IOTLB update messages. > > Maxime How do clients use it? Why won't GPA do just as well? -- MST --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [virtio-dev] Re: [Qemu-devel] [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication 2017-12-14 16:40 ` Michael S. Tsirkin @ 2017-12-14 16:50 ` Maxime Coquelin 2017-12-14 18:11 ` Stefan Hajnoczi 0 siblings, 1 reply; 77+ messages in thread From: Maxime Coquelin @ 2017-12-14 16:50 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Stefan Hajnoczi, Stefan Hajnoczi, Wei Wang, virtio-dev@lists.oasis-open.org, Yang, Zhiyong, jan.kiszka@siemens.com, jasowang@redhat.com, avi.cohen@huawei.com, qemu-devel@nongnu.org, pbonzini@redhat.com, marcandre.lureau@redhat.com, snabb-devel, dgilbert On 12/14/2017 05:40 PM, Michael S. Tsirkin wrote: > On Thu, Dec 14, 2017 at 05:39:19PM +0100, Maxime Coquelin wrote: >> >> >> On 12/14/2017 05:27 PM, Michael S. Tsirkin wrote: >>> On Thu, Dec 14, 2017 at 03:46:56PM +0000, Stefan Hajnoczi wrote: >>>> On Wed, Dec 13, 2017 at 10:50:11PM +0100, Maxime Coquelin wrote: >>>>> On 12/13/2017 09:08 PM, Stefan Hajnoczi wrote: >>>>>> On Wed, Dec 13, 2017 at 3:01 PM, Michael S. Tsirkin <mst@redhat.com> wrote: >>>>>>> On Wed, Dec 13, 2017 at 12:35:21PM +0000, Stefan Hajnoczi wrote: >>>>>>>> I'm not saying that DPDK should use libvhost-user. I'm saying that it's >>>>>>>> easy to add vfio vhost-pci support (for the PCI adapter I described) to >>>>>>>> DPDK. This patch series would require writing a completely new slave >>>>>>>> for vhost-pci because the device interface is so different from >>>>>>>> vhost-user. >>>>>>> >>>>>>> The main question is how appropriate is the vhost user protocol >>>>>>> for passing to guests. And I am not sure at this point. >>>>>>> >>>>>>> Someone should go over vhost user messages and see whether they are safe >>>>>>> to pass to guest. If most are then we can try the transparent approach. >>>>>>> If most aren't then we can't and might as well use the proposed protocol >>>>>>> which at least has code behind it. >>>>>> >>>>>> I have done that: >>>>>> >>>>> ... 
>>>>>> * VHOST_USER_SET_MEM_TABLE >>>>>> >>>>>> Set up BARs before sending a VHOST_USER_SET_MEM_TABLE to the guest. >>>>> >>>>> It would require to filter out userspace_addr from the payload not to >>>>> leak other QEMU process VAs to the guest. >>>> >>>> QEMU's vhost-user master implementation is insecure because it leaks >>>> QEMU process VAs. This also affects vhost-user host processes, not just >>>> vhost-pci. >>>> >>>> The QEMU vhost-user master could send an post-IOMMU guest physical >>>> addresses whereever the vhost-user protocol specification says "user >>>> address". That way no address space information is leaked although it >>>> does leak IOMMU mappings. >>>> >>>> If we want to hide the IOMMU mappings too then we need another logical >>>> address space (kind a randomized ramaddr_t). >>>> >>>> Anyway, my point is that the current vhost-user master implementation is >>>> insecure and should be fixed. vhost-pci doesn't need to worry about >>>> this issue. >>>> >>>> Stefan >>> >>> I was going to make this point too. It does not look like anyone uses >>> userspace_addr. It might have been a mistake to put it there - >>> maybe we should have reused it for map offset. >>> >>> It does not look like anyone uses this for anything. >>> >>> How about we put zero, or a copy of the GPA there? >>> >>> >> >> It is used when no iommu for the ring addresses, and when iommu is used >> for the IOTLB update messages. >> >> Maxime > > How do clients use it? Why won't GPA do just as well? It is used to calculate the offset in the regions, so if we change all to use GPA, it may work without backend change. Maxime --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [virtio-dev] Re: [Qemu-devel] [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication 2017-12-14 16:50 ` Maxime Coquelin @ 2017-12-14 18:11 ` Stefan Hajnoczi 0 siblings, 0 replies; 77+ messages in thread From: Stefan Hajnoczi @ 2017-12-14 18:11 UTC (permalink / raw) To: Maxime Coquelin Cc: Michael S. Tsirkin, Stefan Hajnoczi, Wei Wang, virtio-dev@lists.oasis-open.org, Yang, Zhiyong, jan.kiszka@siemens.com, jasowang@redhat.com, avi.cohen@huawei.com, qemu-devel@nongnu.org, pbonzini@redhat.com, marcandre.lureau@redhat.com, snabb-devel, dgilbert [-- Attachment #1: Type: text/plain, Size: 3542 bytes --] On Thu, Dec 14, 2017 at 05:50:22PM +0100, Maxime Coquelin wrote: > > > On 12/14/2017 05:40 PM, Michael S. Tsirkin wrote: > > On Thu, Dec 14, 2017 at 05:39:19PM +0100, Maxime Coquelin wrote: > > > > > > > > > On 12/14/2017 05:27 PM, Michael S. Tsirkin wrote: > > > > On Thu, Dec 14, 2017 at 03:46:56PM +0000, Stefan Hajnoczi wrote: > > > > > On Wed, Dec 13, 2017 at 10:50:11PM +0100, Maxime Coquelin wrote: > > > > > > On 12/13/2017 09:08 PM, Stefan Hajnoczi wrote: > > > > > > > On Wed, Dec 13, 2017 at 3:01 PM, Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > > > On Wed, Dec 13, 2017 at 12:35:21PM +0000, Stefan Hajnoczi wrote: > > > > > > > > > I'm not saying that DPDK should use libvhost-user. I'm saying that it's > > > > > > > > > easy to add vfio vhost-pci support (for the PCI adapter I described) to > > > > > > > > > DPDK. This patch series would require writing a completely new slave > > > > > > > > > for vhost-pci because the device interface is so different from > > > > > > > > > vhost-user. > > > > > > > > > > > > > > > > The main question is how appropriate is the vhost user protocol > > > > > > > > for passing to guests. And I am not sure at this point. > > > > > > > > > > > > > > > > Someone should go over vhost user messages and see whether they are safe > > > > > > > > to pass to guest. If most are then we can try the transparent approach. 
> > > > > > > > If most aren't then we can't and might as well use the proposed protocol > > > > > > > > which at least has code behind it. > > > > > > > > > > > > > > I have done that: > > > > > > > > > > > > > ... > > > > > > > * VHOST_USER_SET_MEM_TABLE > > > > > > > > > > > > > > Set up BARs before sending a VHOST_USER_SET_MEM_TABLE to the guest. > > > > > > > > > > > > It would require to filter out userspace_addr from the payload not to > > > > > > leak other QEMU process VAs to the guest. > > > > > > > > > > QEMU's vhost-user master implementation is insecure because it leaks > > > > > QEMU process VAs. This also affects vhost-user host processes, not just > > > > > vhost-pci. > > > > > > > > > > The QEMU vhost-user master could send an post-IOMMU guest physical > > > > > addresses whereever the vhost-user protocol specification says "user > > > > > address". That way no address space information is leaked although it > > > > > does leak IOMMU mappings. > > > > > > > > > > If we want to hide the IOMMU mappings too then we need another logical > > > > > address space (kind a randomized ramaddr_t). > > > > > > > > > > Anyway, my point is that the current vhost-user master implementation is > > > > > insecure and should be fixed. vhost-pci doesn't need to worry about > > > > > this issue. > > > > > > > > > > Stefan > > > > > > > > I was going to make this point too. It does not look like anyone uses > > > > userspace_addr. It might have been a mistake to put it there - > > > > maybe we should have reused it for map offset. > > > > > > > > It does not look like anyone uses this for anything. > > > > > > > > How about we put zero, or a copy of the GPA there? > > > > > > > > > > > > > > It is used when no iommu for the ring addresses, and when iommu is used > > > for the IOTLB update messages. > > > > > > Maxime > > > > How do clients use it? Why won't GPA do just as well? 
> >
> It is used to calculate the offset in the regions, so if we change all
> to use GPA, it may work without backend change.

Great.

Stefan

^ permalink raw reply [flat|nested] 77+ messages in thread
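The offset calculation Maxime mentions can be sketched as follows. This is a minimal illustration under stated assumptions, not code from QEMU or DPDK: `struct mem_region` and `addr_to_region_offset()` are hypothetical names, and the fields only mirror what the thread says a memory-table entry carries. The point it shows is that a backend only ever subtracts `userspace_addr` from an incoming address, so a master that fills the field with a GPA copy (and sends ring addresses as GPAs) would yield the same offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified copy of one memory-table entry; the field names
 * follow the discussion in this thread, not any particular header. */
struct mem_region {
    uint64_t guest_phys_addr; /* GPA of the region start */
    uint64_t memory_size;     /* region length in bytes */
    uint64_t userspace_addr;  /* master VA today; a GPA copy in the proposal */
    uint64_t mmap_offset;     /* offset of the region inside the shared fd */
};

/*
 * Translate an address sent by the master into an offset within the mapped
 * file of the region that contains it. Only differences against
 * userspace_addr are used, so its absolute value never matters.
 * Returns 0 and fills *idx and *off on success, -1 if no region matches.
 */
static int addr_to_region_offset(const struct mem_region *regions, size_t n,
                                 uint64_t addr, size_t *idx, uint64_t *off)
{
    assert(regions != NULL && idx != NULL && off != NULL);

    for (size_t i = 0; i < n; i++) {
        const struct mem_region *r = &regions[i];

        if (addr >= r->userspace_addr &&
            addr - r->userspace_addr < r->memory_size) {
            *idx = i;
            *off = r->mmap_offset + (addr - r->userspace_addr);
            return 0;
        }
    }
    return -1;
}
```

Calling the function once with a table that holds master VAs and once with a table that holds GPA copies gives the same region index and offset, provided the ring address is expressed in the same address space as `userspace_addr`.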
* [virtio-dev] Re: [Qemu-devel] [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication [not found] ` <20171213123521.GL16782@stefanha-x1.localdomain> 2017-12-13 15:01 ` Michael S. Tsirkin @ 2017-12-14 5:53 ` Wei Wang 2017-12-14 17:32 ` Stefan Hajnoczi 2017-12-14 18:04 ` Stefan Hajnoczi 1 sibling, 2 replies; 77+ messages in thread From: Wei Wang @ 2017-12-14 5:53 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Stefan Hajnoczi, Michael S. Tsirkin, virtio-dev@lists.oasis-open.org, Yang, Zhiyong, jan.kiszka@siemens.com, jasowang@redhat.com, avi.cohen@huawei.com, qemu-devel@nongnu.org, pbonzini@redhat.com, marcandre.lureau@redhat.com On 12/13/2017 08:35 PM, Stefan Hajnoczi wrote: > On Wed, Dec 13, 2017 at 04:11:45PM +0800, Wei Wang wrote: > > I think the current approach is fine for a prototype but is not suitable > for wider use by the community because it: > 1. Does not scale to multiple device types (net, scsi, blk, etc) > 2. Does not scale as the vhost-user protocol changes > 3. It is hard to make slaves run in both host userspace and the guest > > It would be good to solve these problems so that vhost-pci can become > successful. It's very hard to fix these things after the code is merged > because guests will depend on the device interface. > > Here are the points in detail (in order of importance): > > 1. Does not scale to multiple device types (net, scsi, blk, etc) > > vhost-user is being applied to new device types beyond virtio-net. > There will be demand for supporting other device types besides > virtio-net with vhost-pci. > > This patch series requires defining a new virtio device type for each > vhost-user device type. It is a lot of work to design a new virtio > device. Additionally, the new virtio device type should become part of > the VIRTIO standard, which can also take some time and requires writing > a standards document. > > 2. 
Does not scale as the vhost-user protocol changes > > When the vhost-user protocol changes it will be necessary to update the > vhost-pci device interface to reflect those changes. Each protocol > change requires thinking how the virtio devices need to look in order to > support the new behavior. Changes to the vhost-user protocol will > result in changes to the VIRTIO specification for the vhost-pci virtio > devices. > > 3. It is hard to make slaves run in both host userspace and the guest > > If a vhost-user slave wishes to support running in host userspace and > the guest then not much code can be shared between these two modes since > the interfaces are so different. > > How would you solve these issues? 1st one: I think we can factor out a common vhost-pci device layer in QEMU. Specific devices (net, scsi etc) emulation comes on top of it. The vhost-user protocol sets up VhostPCIDev only. So we will have something like this: struct VhostPCINet { struct VhostPCIDev vp_dev; u8 mac[8]; .... } 2nd one: I think we need to view it the other way around: If there is a demand to change the protocol, then where is the demand from? I think mostly it is because there is some new features from the device/driver. That is, we first have already thought about how the virtio device looks like with the new feature, then we add the support to the protocol. I'm not sure how would it cause not scaling well, and how using another GuestSlave-to-QemuMaster changes the story (we will also need to patch the GuestSlave inside the VM to support the vhost-user negotiation of the new feature), in comparison to the standard virtio feature negotiation. 3rd one: I'm not able to solve this one, as discussed, there are too many differences and it's too complex. I prefer the direction of simply gating the vhost-user protocol and deliver to the guest what it should see (just what this patch series shows). 
You would need to solve this issue to show this direction is simpler :)

Best,
Wei

^ permalink raw reply [flat|nested] 77+ messages in thread
* [virtio-dev] Re: [Qemu-devel] [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication 2017-12-14 5:53 ` Wei Wang @ 2017-12-14 17:32 ` Stefan Hajnoczi 2017-12-15 9:10 ` Wei Wang 2017-12-14 18:04 ` Stefan Hajnoczi 1 sibling, 1 reply; 77+ messages in thread From: Stefan Hajnoczi @ 2017-12-14 17:32 UTC (permalink / raw) To: Wei Wang Cc: Stefan Hajnoczi, Michael S. Tsirkin, virtio-dev@lists.oasis-open.org, Yang, Zhiyong, jan.kiszka@siemens.com, jasowang@redhat.com, avi.cohen@huawei.com, qemu-devel@nongnu.org, pbonzini@redhat.com, marcandre.lureau@redhat.com [-- Attachment #1: Type: text/plain, Size: 4636 bytes --] On Thu, Dec 14, 2017 at 01:53:16PM +0800, Wei Wang wrote: > On 12/13/2017 08:35 PM, Stefan Hajnoczi wrote: > > On Wed, Dec 13, 2017 at 04:11:45PM +0800, Wei Wang wrote: > > > > I think the current approach is fine for a prototype but is not suitable > > for wider use by the community because it: > > 1. Does not scale to multiple device types (net, scsi, blk, etc) > > 2. Does not scale as the vhost-user protocol changes > > 3. It is hard to make slaves run in both host userspace and the guest > > > > It would be good to solve these problems so that vhost-pci can become > > successful. It's very hard to fix these things after the code is merged > > because guests will depend on the device interface. > > > > Here are the points in detail (in order of importance): > > > > 1. Does not scale to multiple device types (net, scsi, blk, etc) > > > > vhost-user is being applied to new device types beyond virtio-net. > > There will be demand for supporting other device types besides > > virtio-net with vhost-pci. > > > > This patch series requires defining a new virtio device type for each > > vhost-user device type. It is a lot of work to design a new virtio > > device. Additionally, the new virtio device type should become part of > > the VIRTIO standard, which can also take some time and requires writing > > a standards document. > > > > 2. 
Does not scale as the vhost-user protocol changes > > > > When the vhost-user protocol changes it will be necessary to update the > > vhost-pci device interface to reflect those changes. Each protocol > > change requires thinking how the virtio devices need to look in order to > > support the new behavior. Changes to the vhost-user protocol will > > result in changes to the VIRTIO specification for the vhost-pci virtio > > devices. > > > > 3. It is hard to make slaves run in both host userspace and the guest > > > > If a vhost-user slave wishes to support running in host userspace and > > the guest then not much code can be shared between these two modes since > > the interfaces are so different. > > > > How would you solve these issues? > > 1st one: I think we can factor out a common vhost-pci device layer in QEMU. > Specific devices (net, scsi etc) emulation comes on top of it. The > vhost-user protocol sets up VhostPCIDev only. So we will have something like > this: > > struct VhostPCINet { > struct VhostPCIDev vp_dev; > u8 mac[8]; > .... > } Defining VhostPCIDev is an important step to making it easy to implement other device types. I'm interested is seeing how this would look either in code or in a more detailed outline. I wonder what the device-specific parts will be. This patch series does not implement a fully functional vhost-user-net device so I'm not sure. > 2nd one: I think we need to view it the other way around: If there is a > demand to change the protocol, then where is the demand from? I think mostly > it is because there is some new features from the device/driver. That is, we > first have already thought about how the virtio device looks like with the > new feature, then we add the support to the protocol. The vhost-user protocol will change when people using host userspace slaves decide to change it. They may not know or care about vhost-pci, so the virtio changes will be an afterthought that falls on whoever wants to support vhost-pci. 
This is why I think it makes a lot more sense to stick to the vhost-user protocol as the vhost-pci slave interface instead of inventing a new interface on top of it. > I'm not sure how would > it cause not scaling well, and how using another GuestSlave-to-QemuMaster > changes the story (we will also need to patch the GuestSlave inside the VM > to support the vhost-user negotiation of the new feature), in comparison to > the standard virtio feature negotiation. Plus the VIRTIO specification needs to be updated. And if the vhost-user protocol change affects all device types then it may be necessary to change multiple virtio devices! This is O(1) vs O(N). > 3rd one: I'm not able to solve this one, as discussed, there are too many > differences and it's too complex. I prefer the direction of simply gating > the vhost-user protocol and deliver to the guest what it should see (just > what this patch series shows). You would need to solve this issue to show > this direction is simpler :) #3 is nice to have but not critical. In the approach I suggested it would be done by implementing vfio vhost-pci for libvhost-user or DPDK. Stefan [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 455 bytes --] ^ permalink raw reply [flat|nested] 77+ messages in thread
* [virtio-dev] Re: [Qemu-devel] [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication 2017-12-14 17:32 ` Stefan Hajnoczi @ 2017-12-15 9:10 ` Wei Wang 2017-12-15 12:26 ` Stefan Hajnoczi 0 siblings, 1 reply; 77+ messages in thread From: Wei Wang @ 2017-12-15 9:10 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Stefan Hajnoczi, Michael S. Tsirkin, virtio-dev@lists.oasis-open.org, Yang, Zhiyong, jan.kiszka@siemens.com, jasowang@redhat.com, avi.cohen@huawei.com, qemu-devel@nongnu.org, pbonzini@redhat.com, marcandre.lureau@redhat.com On 12/15/2017 01:32 AM, Stefan Hajnoczi wrote: > On Thu, Dec 14, 2017 at 01:53:16PM +0800, Wei Wang wrote: >> On 12/13/2017 08:35 PM, Stefan Hajnoczi wrote: >>> On Wed, Dec 13, 2017 at 04:11:45PM +0800, Wei Wang wrote: >>> >>> I think the current approach is fine for a prototype but is not suitable >>> for wider use by the community because it: >>> 1. Does not scale to multiple device types (net, scsi, blk, etc) >>> 2. Does not scale as the vhost-user protocol changes >>> 3. It is hard to make slaves run in both host userspace and the guest >>> >>> It would be good to solve these problems so that vhost-pci can become >>> successful. It's very hard to fix these things after the code is merged >>> because guests will depend on the device interface. >>> >>> Here are the points in detail (in order of importance): >>> >>> 1. Does not scale to multiple device types (net, scsi, blk, etc) >>> >>> vhost-user is being applied to new device types beyond virtio-net. >>> There will be demand for supporting other device types besides >>> virtio-net with vhost-pci. >>> >>> This patch series requires defining a new virtio device type for each >>> vhost-user device type. It is a lot of work to design a new virtio >>> device. Additionally, the new virtio device type should become part of >>> the VIRTIO standard, which can also take some time and requires writing >>> a standards document. >>> >>> 2. 
Does not scale as the vhost-user protocol changes >>> >>> When the vhost-user protocol changes it will be necessary to update the >>> vhost-pci device interface to reflect those changes. Each protocol >>> change requires thinking how the virtio devices need to look in order to >>> support the new behavior. Changes to the vhost-user protocol will >>> result in changes to the VIRTIO specification for the vhost-pci virtio >>> devices. >>> >>> 3. It is hard to make slaves run in both host userspace and the guest >>> >>> If a vhost-user slave wishes to support running in host userspace and >>> the guest then not much code can be shared between these two modes since >>> the interfaces are so different. >>> >>> How would you solve these issues? >> 1st one: I think we can factor out a common vhost-pci device layer in QEMU. >> Specific devices (net, scsi etc) emulation comes on top of it. The >> vhost-user protocol sets up VhostPCIDev only. So we will have something like >> this: >> >> struct VhostPCINet { >> struct VhostPCIDev vp_dev; >> u8 mac[8]; >> .... >> } > Defining VhostPCIDev is an important step to making it easy to implement > other device types. I'm interested is seeing how this would look either > in code or in a more detailed outline. > > I wonder what the device-specific parts will be. This patch series does > not implement a fully functional vhost-user-net device so I'm not sure. 
I think we can move most of the fields from this series' VhostPCINet to
VhostPCIDev:

struct VhostPCIDev {
    VirtIODevice parent_obj;
    MemoryRegion bar_region;
    MemoryRegion metadata_region;
    struct vhost_pci_metadata *metadata;
    void *remote_mem_base[MAX_REMOTE_REGION];
    uint64_t remote_mem_map_size[MAX_REMOTE_REGION];
    CharBackend chr_be;
};

struct VhostPCINet {
    struct VhostPCIDev vp_dev;
    uint32_t host_features;
    struct vpnet_config config;
    size_t config_size;
    uint16_t status;
};

>
>> 2nd one: I think we need to view it the other way around: If there is a
>> demand to change the protocol, then where is the demand from? I think mostly
>> it is because there is some new features from the device/driver. That is, we
>> first have already thought about how the virtio device looks like with the
>> new feature, then we add the support to the protocol.
> The vhost-user protocol will change when people using host userspace
> slaves decide to change it. They may not know or care about vhost-pci,
> so the virtio changes will be an afterthought that falls on whoever
> wants to support vhost-pci.
>
> This is why I think it makes a lot more sense to stick to the vhost-user
> protocol as the vhost-pci slave interface instead of inventing a new
> interface on top of it.

I don't think it is different in practice. If the added protocol msg
needs some setup on the vhost-pci device side, then we will also need to
think about how to use it for the device setup explicitly, with the
possibility of delivering the modified msg to the guest (like
SET_MEM_TABLE) if using the relaying method.

Vhost-pci takes "advantage" of the vhost-user protocol for the inter-VM
data path setup. If the added vhost-user message isn't useful for
vhost-pci setup, I think the slave doesn't even need to handle it.
> >> I'm not sure how would >> it cause not scaling well, and how using another GuestSlave-to-QemuMaster >> changes the story (we will also need to patch the GuestSlave inside the VM >> to support the vhost-user negotiation of the new feature), in comparison to >> the standard virtio feature negotiation. > Plus the VIRTIO specification needs to be updated. > > And if the vhost-user protocol change affects all device types then it > may be necessary to change multiple virtio devices! This is O(1) vs > O(N). > If this change is common to all the vhost-pci series devices, the change will be made to the vhost-pci layer (i.e. VhostPCIDev) only. Best, Wei --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 77+ messages in thread
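The layering Wei proposes, a common VhostPCIDev embedded in each device-specific structure, can be sketched in plain C. This is only an illustration under stated assumptions: the QEMU types (VirtIODevice, MemoryRegion, CharBackend) are left out, the value of `MAX_REMOTE_REGION` and the `nregions` counter are made up, and `vp_dev_add_region()` and `vp_dev_to_net()` are hypothetical helpers that do not appear in the patch series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REMOTE_REGION 8 /* assumed value, for illustration only */

/* Hypothetical stand-in for the common layer: only what the vhost-user
 * setup path would fill in, without the QEMU object types. */
struct VhostPCIDev {
    void *remote_mem_base[MAX_REMOTE_REGION];
    uint64_t remote_mem_map_size[MAX_REMOTE_REGION];
    unsigned int nregions; /* assumed bookkeeping field */
};

/* Device-specific layer: embeds the common part as its first member. */
struct VhostPCINet {
    struct VhostPCIDev vp_dev;
    uint32_t host_features;
    uint16_t status;
};

/* Common-layer code: records a mapped remote region and never needs to
 * know which device type sits on top. Returns 0, or -1 when full. */
static int vp_dev_add_region(struct VhostPCIDev *d, void *base, uint64_t size)
{
    assert(d != NULL);

    if (d->nregions >= MAX_REMOTE_REGION) {
        return -1;
    }
    d->remote_mem_base[d->nregions] = base;
    d->remote_mem_map_size[d->nregions] = size;
    d->nregions++;
    return 0;
}

/* Net-specific code recovers its own structure from the common part. */
static struct VhostPCINet *vp_dev_to_net(struct VhostPCIDev *d)
{
    assert(d != NULL);
    return (struct VhostPCINet *)((char *)d -
                                  offsetof(struct VhostPCINet, vp_dev));
}
```

With this shape, a change that is common to all vhost-pci devices touches only the functions that take a `struct VhostPCIDev *`, which is the argument made in the reply above.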
* [virtio-dev] Re: [Qemu-devel] [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication 2017-12-15 9:10 ` Wei Wang @ 2017-12-15 12:26 ` Stefan Hajnoczi 0 siblings, 0 replies; 77+ messages in thread From: Stefan Hajnoczi @ 2017-12-15 12:26 UTC (permalink / raw) To: Wei Wang Cc: Stefan Hajnoczi, Michael S. Tsirkin, virtio-dev@lists.oasis-open.org, Yang, Zhiyong, jan.kiszka@siemens.com, jasowang@redhat.com, avi.cohen@huawei.com, qemu-devel@nongnu.org, pbonzini@redhat.com, marcandre.lureau@redhat.com [-- Attachment #1: Type: text/plain, Size: 5811 bytes --] On Fri, Dec 15, 2017 at 05:10:37PM +0800, Wei Wang wrote: > On 12/15/2017 01:32 AM, Stefan Hajnoczi wrote: > > On Thu, Dec 14, 2017 at 01:53:16PM +0800, Wei Wang wrote: > > > On 12/13/2017 08:35 PM, Stefan Hajnoczi wrote: > > > > On Wed, Dec 13, 2017 at 04:11:45PM +0800, Wei Wang wrote: > > > > > > > > I think the current approach is fine for a prototype but is not suitable > > > > for wider use by the community because it: > > > > 1. Does not scale to multiple device types (net, scsi, blk, etc) > > > > 2. Does not scale as the vhost-user protocol changes > > > > 3. It is hard to make slaves run in both host userspace and the guest > > > > > > > > It would be good to solve these problems so that vhost-pci can become > > > > successful. It's very hard to fix these things after the code is merged > > > > because guests will depend on the device interface. > > > > > > > > Here are the points in detail (in order of importance): > > > > > > > > 1. Does not scale to multiple device types (net, scsi, blk, etc) > > > > > > > > vhost-user is being applied to new device types beyond virtio-net. > > > > There will be demand for supporting other device types besides > > > > virtio-net with vhost-pci. > > > > > > > > This patch series requires defining a new virtio device type for each > > > > vhost-user device type. It is a lot of work to design a new virtio > > > > device. 
Additionally, the new virtio device type should become part of > > > > the VIRTIO standard, which can also take some time and requires writing > > > > a standards document. > > > > > > > > 2. Does not scale as the vhost-user protocol changes > > > > > > > > When the vhost-user protocol changes it will be necessary to update the > > > > vhost-pci device interface to reflect those changes. Each protocol > > > > change requires thinking how the virtio devices need to look in order to > > > > support the new behavior. Changes to the vhost-user protocol will > > > > result in changes to the VIRTIO specification for the vhost-pci virtio > > > > devices. > > > > > > > > 3. It is hard to make slaves run in both host userspace and the guest > > > > > > > > If a vhost-user slave wishes to support running in host userspace and > > > > the guest then not much code can be shared between these two modes since > > > > the interfaces are so different. > > > > > > > > How would you solve these issues? > > > 1st one: I think we can factor out a common vhost-pci device layer in QEMU. > > > Specific devices (net, scsi etc) emulation comes on top of it. The > > > vhost-user protocol sets up VhostPCIDev only. So we will have something like > > > this: > > > > > > struct VhostPCINet { > > > struct VhostPCIDev vp_dev; > > > u8 mac[8]; > > > .... > > > } > > Defining VhostPCIDev is an important step to making it easy to implement > > other device types. I'm interested is seeing how this would look either > > in code or in a more detailed outline. > > > > I wonder what the device-specific parts will be. This patch series does > > not implement a fully functional vhost-user-net device so I'm not sure. 
> > I think we can move most of the fields from this series' VhostPCINet to > VhostPCIDev: > > struct VhostPCIDev { > VirtIODevice parent_obj; > MemoryRegion bar_region; > MemoryRegion metadata_region; > struct vhost_pci_metadata *metadata; > void *remote_mem_base[MAX_REMOTE_REGION]; > uint64_t remote_mem_map_size[MAX_REMOTE_REGION]; > CharBackend chr_be; > }; > > struct VhostPCINet { > struct VhostPCIDev vp_dev; > uint32_t host_features; > struct vpnet_config config; > size_t config_size; > uint16_t status; > } > > > > > > > > > 2nd one: I think we need to view it the other way around: If there is a > > > demand to change the protocol, then where is the demand from? I think mostly > > > it is because there is some new features from the device/driver. That is, we > > > first have already thought about how the virtio device looks like with the > > > new feature, then we add the support to the protocol. > > The vhost-user protocol will change when people using host userspace > > slaves decide to change it. They may not know or care about vhost-pci, > > so the virtio changes will be an afterthought that falls on whoever > > wants to support vhost-pci. > > > > This is why I think it makes a lot more sense to stick to the vhost-user > > protocol as the vhost-pci slave interface instead of inventing a new > > interface on top of it. > > I don't think it is different in practice. If the added protocol msg needs > some setup on the vhost-pci device side, then we will also need to think > about how to use it for the device setup explicitly, with the possibility of > delivering the modified msg to the guest (like SET_MEM_TABLE) if using the > relaying method. > > Vhost-pci takes "advantage" of the vhost-user protocol for the inter-VM data > path setup. If the added vhost-user message isn't useful for vhost-pci > setup, I think the slave don't need to handle it even. vhost-pci setup is incomplete in this patch series. 
Currently the guest slave is unable to participate in feature negotiation
or limit the number of queues. Reconnection is also not supported yet.

Guest slaves must be able to use a vhost-pci device without requiring
QEMU -device options (like max queues, feature bits, etc) for that
specific slave implementation. Why? In a cloud environment it is
inconvenient or impossible for users to add QEMU -device options for the
slave implementation they are using.

Addressing these issues will require more vhost-pci<->guest slave
communication. The number of messages that can be hidden will shrink.

Stefan

^ permalink raw reply [flat|nested] 77+ messages in thread
* [virtio-dev] Re: [Qemu-devel] [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication 2017-12-14 5:53 ` Wei Wang 2017-12-14 17:32 ` Stefan Hajnoczi @ 2017-12-14 18:04 ` Stefan Hajnoczi 2017-12-15 10:33 ` Wei Wang 1 sibling, 1 reply; 77+ messages in thread From: Stefan Hajnoczi @ 2017-12-14 18:04 UTC (permalink / raw) To: Wei Wang Cc: Stefan Hajnoczi, Michael S. Tsirkin, virtio-dev@lists.oasis-open.org, Yang, Zhiyong, jan.kiszka@siemens.com, jasowang@redhat.com, avi.cohen@huawei.com, qemu-devel@nongnu.org, pbonzini@redhat.com, marcandre.lureau@redhat.com [-- Attachment #1: Type: text/plain, Size: 674 bytes --] On Thu, Dec 14, 2017 at 01:53:16PM +0800, Wei Wang wrote: > 3rd one: I'm not able to solve this one, as discussed, there are too many > differences and it's too complex. I prefer the direction of simply gating > the vhost-user protocol and deliver to the guest what it should see (just > what this patch series shows). You would need to solve this issue to show > this direction is simpler :) At the moment the main issue deciding what to do seems to be that we have no patches for the approach I've described. I'll begin working on a patch series. If there is something else you think I could help with, please let me know. I'd like to contribute to vhost-pci. Stefan [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 455 bytes --] ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [virtio-dev] Re: [Qemu-devel] [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication
2017-12-14 18:04 ` Stefan Hajnoczi
@ 2017-12-15 10:33 ` Wei Wang
2017-12-15 12:00 ` Stefan Hajnoczi
0 siblings, 1 reply; 77+ messages in thread
From: Wei Wang @ 2017-12-15 10:33 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Stefan Hajnoczi, Michael S. Tsirkin, virtio-dev@lists.oasis-open.org,
Yang, Zhiyong, jan.kiszka@siemens.com, jasowang@redhat.com,
avi.cohen@huawei.com, qemu-devel@nongnu.org, pbonzini@redhat.com,
marcandre.lureau@redhat.com

On 12/15/2017 02:04 AM, Stefan Hajnoczi wrote:
> On Thu, Dec 14, 2017 at 01:53:16PM +0800, Wei Wang wrote:
>> 3rd one: I'm not able to solve this one, as discussed, there are too many
>> differences and it's too complex. I prefer the direction of simply gating
>> the vhost-user protocol and deliver to the guest what it should see (just
>> what this patch series shows). You would need to solve this issue to show
>> this direction is simpler :)
> At the moment the main issue deciding what to do seems to be that we
> have no patches for the approach I've described. I'll begin working on
> a patch series.

Are you planning to implement the multiple masters and slaves?

Master-->QemuSlave1-->GuestSlave
Master<--QemuSlave2<--GuestSlave

> If there is something else you think I could help with, please let me
> know. I'd like to contribute to vhost-pci.
>

Thanks.

Best,
Wei

^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [virtio-dev] Re: [Qemu-devel] [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication
2017-12-15 10:33 ` Wei Wang
@ 2017-12-15 12:00 ` Stefan Hajnoczi
0 siblings, 0 replies; 77+ messages in thread
From: Stefan Hajnoczi @ 2017-12-15 12:00 UTC (permalink / raw)
To: Wei Wang
Cc: Stefan Hajnoczi, Michael S. Tsirkin, virtio-dev@lists.oasis-open.org, Yang, Zhiyong, jan.kiszka@siemens.com, jasowang@redhat.com, avi.cohen@huawei.com, qemu-devel@nongnu.org, pbonzini@redhat.com, marcandre.lureau@redhat.com

On Fri, Dec 15, 2017 at 06:33:45PM +0800, Wei Wang wrote:
> On 12/15/2017 02:04 AM, Stefan Hajnoczi wrote:
> > On Thu, Dec 14, 2017 at 01:53:16PM +0800, Wei Wang wrote:
> > > 3rd one: I'm not able to solve this one, as discussed, there are too many
> > > differences and it's too complex. I prefer the direction of simply gating
> > > the vhost-user protocol and deliver to the guest what it should see (just
> > > what this patch series shows). You would need to solve this issue to show
> > > this direction is simpler :)
> > At the moment the main issue deciding what to do seems to be that we
> > have no patches for the approach I've described. I'll begin working on
> > a patch series.
>
>
> Are you planning to implement the multiple masters and slaves?
> Master-->QemuSlave1-->GuestSlave
> Master<--QemuSlave2<--GuestSlave

Not sure if we are thinking of the same model. Here's what I have in
mind:

Master <-> QEMU virtio-vhost-user-slave <-> GuestSlave

The virtio-vhost-user-slave device implements a vhost-user slave on the
host side and a virtio device that acts as a vhost-user master to the
guest. This is a single virtio device that speaks the vhost-user
protocol, not a family of different devices.

("virtio-vhost-user-slave" is the name to prevent confusion with your
vhost-pci work.)

I think your idea of using virtio is good since it avoids duplicating
low-level PCI device initialization and queues, so I've decided to try
that.

I will send a draft device specification later today so you can review
it. I think the virtio PCI transport mapping that I'm documenting will
be useful whether we choose vhost-pci or virtio-vhost-user-slave.

Stefan
[parent not found: <CAJSP0QXYMVBidUd5-NJb5FDYbc6wSkNYgdadjk8+NXvwosLMPw@mail.gmail.com>]
* [virtio-dev] Re: [Qemu-devel] [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication
[not found] ` <CAJSP0QXYMVBidUd5-NJb5FDYbc6wSkNYgdadjk8+NXvwosLMPw@mail.gmail.com>
@ 2017-12-08 14:27 ` Michael S. Tsirkin
2017-12-09 16:08 ` [virtio-dev] " Wang, Wei W
0 siblings, 1 reply; 77+ messages in thread
From: Michael S. Tsirkin @ 2017-12-08 14:27 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Wei Wang, virtio-dev@lists.oasis-open.org, Yang, Zhiyong, jan.kiszka@siemens.com, jasowang@redhat.com, avi.cohen@huawei.com, qemu-devel@nongnu.org, Stefan Hajnoczi, pbonzini@redhat.com, marcandre.lureau@redhat.com

On Fri, Dec 08, 2017 at 06:08:05AM +0000, Stefan Hajnoczi wrote:
> On Thu, Dec 7, 2017 at 11:54 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Thu, Dec 07, 2017 at 06:28:19PM +0000, Stefan Hajnoczi wrote:
> >> On Thu, Dec 7, 2017 at 5:38 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >> > On Thu, Dec 07, 2017 at 05:29:14PM +0000, Stefan Hajnoczi wrote:
> >> >> On Thu, Dec 7, 2017 at 4:47 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >> >> > On Thu, Dec 07, 2017 at 04:29:45PM +0000, Stefan Hajnoczi wrote:
> >> >> >> On Thu, Dec 7, 2017 at 2:02 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >> >> >> > On Thu, Dec 07, 2017 at 01:08:04PM +0000, Stefan Hajnoczi wrote:
> >> >> >> >> Instead of responding individually to these points, I hope this will
> >> >> >> >> explain my perspective. Let me know if you do want individual
> >> >> >> >> responses, I'm happy to talk more about the points above but I think
> >> >> >> >> the biggest difference is our perspective on this:
> >> >> >> >>
> >> >> >> >> Existing vhost-user slave code should be able to run on top of
> >> >> >> >> vhost-pci. For example, QEMU's
> >> >> >> >> contrib/vhost-user-scsi/vhost-user-scsi.c should work inside the guest
> >> >> >> >> with only minimal changes to the source file (i.e. today it explicitly
> >> >> >> >> opens a UNIX domain socket and that should be done by libvhost-user
> >> >> >> >> instead). It shouldn't be hard to add vhost-pci vfio support to
> >> >> >> >> contrib/libvhost-user/ alongside the existing UNIX domain socket code.
> >> >> >> >>
> >> >> >> >> This seems pretty easy to achieve with the vhost-pci PCI adapter that
> >> >> >> >> I've described but I'm not sure how to implement libvhost-user on top
> >> >> >> >> of vhost-pci vfio if the device doesn't expose the vhost-user
> >> >> >> >> protocol.
> >> >> >> >>
> >> >> >> >> I think this is a really important goal. Let's use a single
> >> >> >> >> vhost-user software stack instead of creating a separate one for guest
> >> >> >> >> code only.
> >> >> >> >>
> >> >> >> >> Do you agree that the vhost-user software stack should be shared
> >> >> >> >> between host userspace and guest code as much as possible?
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> > The sharing you propose is not necessarily practical because the security goals
> >> >> >> > of the two are different.
> >> >> >> >
> >> >> >> > It seems that the best motivation presentation is still the original rfc
> >> >> >> >
> >> >> >> > http://virtualization.linux-foundation.narkive.com/A7FkzAgp/rfc-vhost-user-enhancements-for-vm2vm-communication
> >> >> >> >
> >> >> >> > So comparing with vhost-user iotlb handling is different:
> >> >> >> >
> >> >> >> > With vhost-user guest trusts the vhost-user backend on the host.
> >> >> >> >
> >> >> >> > With vhost-pci we can strive to limit the trust to qemu only.
> >> >> >> > The switch running within a VM does not have to be trusted.
> >> >> >>
> >> >> >> Can you give a concrete example?
> >> >> >>
> >> >> >> I have an idea about what you're saying but it may be wrong:
> >> >> >>
> >> >> >> Today the iotlb mechanism in vhost-user does not actually enforce
> >> >> >> memory permissions. The vhost-user slave has full access to mmapped
> >> >> >> memory regions even when iotlb is enabled. Currently the iotlb just
> >> >> >> adds an indirection layer but no real security. (Is this correct?)
> >> >> >
> >> >> > Not exactly. iotlb protects against malicious drivers within guest.
> >> >> > But yes, not against a vhost-user driver on the host.
> >> >> >
> >> >> >> Are you saying the vhost-pci device code in QEMU should enforce iotlb
> >> >> >> permissions so the vhost-user slave guest only has access to memory
> >> >> >> regions that are allowed by the iotlb?
> >> >> >
> >> >> > Yes.
> >> >>
> >> >> Okay, thanks for confirming.
> >> >>
> >> >> This can be supported by the approach I've described. The vhost-pci
> >> >> QEMU code has control over the BAR memory so it can prevent the guest
> >> >> from accessing regions that are not allowed by the iotlb.
> >> >>
> >> >> Inside the guest the vhost-user slave still has the memory region
> >> >> descriptions and sends iotlb messages. This is completely compatible
> >> >> with the libvirt-user APIs and existing vhost-user slave code can run
> >> >> fine. The only unique thing is that guest accesses to memory regions
> >> >> not allowed by the iotlb do not work because QEMU has prevented it.
> >> >
> >> > I don't think this can work since suddenly you need
> >> > to map full IOMMU address space into BAR.
> >>
> >> The BAR covers all guest RAM
> >> but QEMU can set up MemoryRegions that
> >> hide parts from the guest (e.g. reads produce 0xff). I'm not sure how
> >> expensive that is but implementing a strict IOMMU is hard to do
> >> without performance overhead.
> >
> > I'm worried about leaking PAs.
> > fundamentally if you want proper protection you
> > need your device driver to use VA for addressing,
> >
> > On the one hand BAR only needs to be as large as guest PA then.
> > On the other hand it must cover all of guest PA,
> > not just what is accessible to the device.
>
> A more heavyweight iotlb implementation in QEMU's vhost-pci device
> could present only VAs to the vhost-pci driver. It would use
> MemoryRegions to map pieces of shared guest memory dynamically. The
> only information leak would be the overall guest RAM size because we
> still need to set the correct BAR size.

I'm not sure this will work. KVM simply
isn't designed with a huge number of fragmented regions in mind.

Wei, just what is the plan for the IOMMU? How will all virtual addresses fit
in a BAR?

Maybe we really do want a non-translating IOMMU (leaking PA to userspace
but oh well)?

> >>
> >> > Besides, this means implementing iotlb in both qemu and guest.
> >>
> >> It's free in the guest, the libvhost-user stack already has it.
> >
> > That library is designed to work with a unix domain socket
> > though. We'll need extra kernel code to make a device
> > pretend it's a socket.
>
> A kernel vhost-pci driver isn't necessary because I don't think there
> are in-kernel users.
>
> A vfio vhost-pci backend can go alongside the UNIX domain socket
> backend that exists today in libvhost-user.
>
> If we want to expose kernel vhost devices via vhost-pci then a
> libvhost-user program can translate the vhost-user protocol into
> kernel ioctls. For example:
>
>   $ vhost-pci-proxy --vhost-pci-addr 00:04.0 --vhost-fd 3 3<>/dev/vhost-net
>
> The vhost-pci-proxy implements the vhost-user protocol callbacks and
> submits ioctls on the vhost kernel device fd. I haven't compared the
> kernel ioctl interface vs the vhost-user protocol to see if everything
> maps cleanly though.
>
> Stefan

I don't really like this, it's yet another package to install, yet
another process to complicate debugging and yet another service that can
go down.

Maybe vsock can do the trick though?

--
MST
* [virtio-dev] RE: [Qemu-devel] [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication
2017-12-08 14:27 ` Michael S. Tsirkin
@ 2017-12-09 16:08 ` Wang, Wei W
0 siblings, 0 replies; 77+ messages in thread
From: Wang, Wei W @ 2017-12-09 16:08 UTC (permalink / raw)
To: 'Michael S. Tsirkin', Stefan Hajnoczi
Cc: virtio-dev@lists.oasis-open.org, Yang, Zhiyong, jan.kiszka@siemens.com, jasowang@redhat.com, avi.cohen@huawei.com, qemu-devel@nongnu.org, Stefan Hajnoczi, pbonzini@redhat.com, marcandre.lureau@redhat.com

On Friday, December 8, 2017 10:28 PM, Michael S. Tsirkin wrote:
> On Fri, Dec 08, 2017 at 06:08:05AM +0000, Stefan Hajnoczi wrote:
> > On Thu, Dec 7, 2017 at 11:54 PM, Michael S. Tsirkin <mst@redhat.com>
> wrote:
> > > On Thu, Dec 07, 2017 at 06:28:19PM +0000, Stefan Hajnoczi wrote:
> > >> On Thu, Dec 7, 2017 at 5:38 PM, Michael S. Tsirkin <mst@redhat.com>
> wrote:
> > >> > On Thu, Dec 07, 2017 at 05:29:14PM +0000, Stefan Hajnoczi wrote:
> > >> >> On Thu, Dec 7, 2017 at 4:47 PM, Michael S. Tsirkin <mst@redhat.com>
> wrote:
> > >> >> > On Thu, Dec 07, 2017 at 04:29:45PM +0000, Stefan Hajnoczi wrote:
> > >> >> >> On Thu, Dec 7, 2017 at 2:02 PM, Michael S. Tsirkin
> <mst@redhat.com> wrote:
> > >> >> >> > On Thu, Dec 07, 2017 at 01:08:04PM +0000, Stefan Hajnoczi
> wrote:
> > >> >> >> >> Instead of responding individually to these points, I hope
> > >> >> >> >> this will explain my perspective. Let me know if you do
> > >> >> >> >> want individual responses, I'm happy to talk more about
> > >> >> >> >> the points above but I think the biggest difference is our
> perspective on this:
> > >> >> >> >>
> > >> >> >> >> Existing vhost-user slave code should be able to run on
> > >> >> >> >> top of vhost-pci. For example, QEMU's
> > >> >> >> >> contrib/vhost-user-scsi/vhost-user-scsi.c should work
> > >> >> >> >> inside the guest with only minimal changes to the source
> > >> >> >> >> file (i.e. today it explicitly opens a UNIX domain socket
> > >> >> >> >> and that should be done by libvhost-user instead). It
> > >> >> >> >> shouldn't be hard to add vhost-pci vfio support to
> contrib/libvhost-user/ alongside the existing UNIX domain socket code.
> > >> >> >> >>
> > >> >> >> >> This seems pretty easy to achieve with the vhost-pci PCI
> > >> >> >> >> adapter that I've described but I'm not sure how to
> > >> >> >> >> implement libvhost-user on top of vhost-pci vfio if the
> > >> >> >> >> device doesn't expose the vhost-user protocol.
> > >> >> >> >>
> > >> >> >> >> I think this is a really important goal. Let's use a
> > >> >> >> >> single vhost-user software stack instead of creating a
> > >> >> >> >> separate one for guest code only.
> > >> >> >> >>
> > >> >> >> >> Do you agree that the vhost-user software stack should be
> > >> >> >> >> shared between host userspace and guest code as much as
> possible?
> > >> >> >> >
> > >> >> >> >
> > >> >> >> >
> > >> >> >> > The sharing you propose is not necessarily practical
> > >> >> >> > because the security goals of the two are different.
> > >> >> >> >
> > >> >> >> > It seems that the best motivation presentation is still the
> > >> >> >> > original rfc
> > >> >> >> >
> > >> >> >> > http://virtualization.linux-foundation.narkive.com/A7FkzAgp
> > >> >> >> > /rfc-vhost-user-enhancements-for-vm2vm-communication
> > >> >> >> >
> > >> >> >> > So comparing with vhost-user iotlb handling is different:
> > >> >> >> >
> > >> >> >> > With vhost-user guest trusts the vhost-user backend on the host.
> > >> >> >> >
> > >> >> >> > With vhost-pci we can strive to limit the trust to qemu only.
> > >> >> >> > The switch running within a VM does not have to be trusted.
> > >> >> >>
> > >> >> >> Can you give a concrete example?
> > >> >> >>
> > >> >> >> I have an idea about what you're saying but it may be wrong:
> > >> >> >>
> > >> >> >> Today the iotlb mechanism in vhost-user does not actually
> > >> >> >> enforce memory permissions. The vhost-user slave has full
> > >> >> >> access to mmapped memory regions even when iotlb is enabled.
> > >> >> >> Currently the iotlb just adds an indirection layer but no
> > >> >> >> real security. (Is this correct?)
> > >> >> >
> > >> >> > Not exactly. iotlb protects against malicious drivers within guest.
> > >> >> > But yes, not against a vhost-user driver on the host.
> > >> >> >
> > >> >> >> Are you saying the vhost-pci device code in QEMU should
> > >> >> >> enforce iotlb permissions so the vhost-user slave guest only
> > >> >> >> has access to memory regions that are allowed by the iotlb?
> > >> >> >
> > >> >> > Yes.
> > >> >>
> > >> >> Okay, thanks for confirming.
> > >> >>
> > >> >> This can be supported by the approach I've described. The
> > >> >> vhost-pci QEMU code has control over the BAR memory so it can
> > >> >> prevent the guest from accessing regions that are not allowed by the
> iotlb.
> > >> >>
> > >> >> Inside the guest the vhost-user slave still has the memory
> > >> >> region descriptions and sends iotlb messages. This is
> > >> >> completely compatible with the libvirt-user APIs and existing
> > >> >> vhost-user slave code can run fine. The only unique thing is
> > >> >> that guest accesses to memory regions not allowed by the iotlb do
> not work because QEMU has prevented it.
> > >> >
> > >> > I don't think this can work since suddenly you need to map full
> > >> > IOMMU address space into BAR.
> > >>
> > >> The BAR covers all guest RAM
> > >> but QEMU can set up MemoryRegions that hide parts from the guest
> > >> (e.g. reads produce 0xff). I'm not sure how expensive that is but
> > >> implementing a strict IOMMU is hard to do without performance
> > >> overhead.
> > >
> > > I'm worried about leaking PAs.
> > > fundamentally if you want proper protection you need your device
> > > driver to use VA for addressing,
> > >
> > > On the one hand BAR only needs to be as large as guest PA then.
> > > On the other hand it must cover all of guest PA, not just what is
> > > accessible to the device.
> >
> > A more heavyweight iotlb implementation in QEMU's vhost-pci device
> > could present only VAs to the vhost-pci driver. It would use
> > MemoryRegions to map pieces of shared guest memory dynamically. The
> > only information leak would be the overall guest RAM size because we
> > still need to set the correct BAR size.
>
> I'm not sure this will work. KVM simply
> isn't designed with a huge number of fragmented regions in mind.
>
> Wei, just what is the plan for the IOMMU? How will all virtual addresses fit
> in a BAR?
>
> Maybe we really do want a non-translating IOMMU (leaking PA to userspace
> but oh well)?

Yes, I have 2 ways of implementation in mind. Basically, I think we will
leverage the slave side EPT to provide the accessible master memory to
the slave guest. Before getting into that, please let me introduce some
background first: here, VM1 is the slave VM, and VM2 is the master VM.

The VM2's memory, which is exposed to the vhost-pci driver, does not
have to be mapped and added to the bar MemoryRegion when the device is
plugged to VM1. It can be added later, even after the driver does
ioremap of the bar. Here is what this patch series has done:

i. when VM1 boots, the bar MemoryRegion is initialized, and registered
via pci_register_bar (in the vpnet_pci_realize function). The bar
MemoryRegion is like a "skeleton" without any real memory added to it,
except the 4KB metadata memory which is added to the top of the bar
MemoryRegion (in vpnet_device_realize).

ii. when the vhost-pci driver probes, it does ioremap(bar, 64GB), which
sets up the guest kernel mapping of the 64GB bar, but till now only the
top 4KB memory has a qva->gpa mapping, which will be referenced to
create the EPT mapping when accessed.

iii. when VM2 boots, the vhost-user master sends the VM2's memory info
to VM1, then that memory is mapped by QEMU1 and added to the vhost-pci
bar MemoryRegion (in the vp_slave_set_mem_table function). After this,
VM2's memory can appear in VM1's EPT when accessed.

With the above understanding, here are the 2 possible options (which are
similar actually) of IOMMU support:

Option 1:
1) In the above phase iii., when QEMU1 receives the memory info, it does
not map and expose the entire memory to the guest, instead, it just
stores the memory info there, e.g., info1 (fd1, base1, len1), info2
(fd2, base2, len2), info3 (fd3, base3, len3) etc.

2) When VM2's virtio-net driver wants to set up dma remapping table
entries, the dma_map_page function will trap to QEMU2 with iova and gpa
(is this correct? I'll double check this part)

3) then QEMU2 sends an IOTLB_MSG(iova1, gpa1, size1) to QEMU1
- iova and uaddr seem not useful for vhost-pci, so maybe we will need to
use gpa instead of uaddr

4) When QEMU1 receives the msg, it compares gpa1 with the memory info,
for example, it may find base1 < gpa1 < base1+len1, and
offset1=gpa1-base1, then do mmap(.., size1, fd1, offset1), and add the
sub MemoryRegion (gpa1, size1) to the bar memory region, which will
finally get the memory added to ept.

Option 2:
Similar to Option 1, but in 1), QEMU1 can map and expose the entire
memory to the guest as non-accessible (instead of not mapping), and
change those pieces of memory (from IOTLB_MSG) to accessible in 4)

Best,
Wei
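Editorial sketch (not part of the original message): step 4) of Option 1
above describes matching the gpa carried by an IOTLB message against the
stored (fd, base, len) records and deriving the offset to pass to mmap.
A minimal illustration of that lookup is below. The names RemoteMemInfo
and remote_mem_lookup are hypothetical and do not come from the patch
series. The sketch also assumes that a gpa equal to the region base
should match, which is slightly wider than the strict inequality written
in the message, and it leaves out the mmap call and the MemoryRegion
update.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one remote memory region, mirroring the
 * "info1 (fd1, base1, len1)" tuples described in the message. */
typedef struct {
    int      fd;    /* file descriptor backing the region */
    uint64_t base;  /* guest physical address where the region starts */
    uint64_t len;   /* region length in bytes */
} RemoteMemInfo;

/*
 * Find the stored region that fully contains [gpa, gpa + size) and
 * report the offset of gpa inside it (offset = gpa - base).
 * Returns the region index, or -1 if no region covers the whole range.
 */
static int remote_mem_lookup(const RemoteMemInfo *infos, size_t n,
                             uint64_t gpa, uint64_t size,
                             uint64_t *offset)
{
    for (size_t i = 0; i < n; i++) {
        const RemoteMemInfo *r = &infos[i];

        /* gpa must fall inside the region. */
        if (gpa < r->base || gpa - r->base >= r->len) {
            continue;
        }
        /* The whole range must fit; written to avoid integer overflow. */
        if (size > r->len - (gpa - r->base)) {
            continue;
        }
        *offset = gpa - r->base;
        return (int)i;
    }
    return -1;
}
```

The returned index selects the fd and the offset would feed the mmap
call from step 4). Rejecting ranges that cross a region boundary is a
design choice of this sketch; a real implementation might instead split
such a request across regions.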
* Re: [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication
2017-12-05 3:33 [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication Wei Wang
` (11 preceding siblings ...)
2017-12-06 13:49 ` Stefan Hajnoczi
@ 2017-12-19 11:35 ` Stefan Hajnoczi
2017-12-19 14:56 ` Michael S. Tsirkin
12 siblings, 1 reply; 77+ messages in thread
From: Stefan Hajnoczi @ 2017-12-19 11:35 UTC (permalink / raw)
To: Wei Wang
Cc: virtio-dev, qemu-devel, mst, marcandre.lureau, jasowang, pbonzini, jan.kiszka, avi.cohen, zhiyong.yang

On Tue, Dec 05, 2017 at 11:33:09AM +0800, Wei Wang wrote:
> Vhost-pci is a point-to-point based inter-VM communication solution. This
> patch series implements the vhost-pci-net device setup and emulation. The
> device is implemented as a virtio device, and it is set up via the
> vhost-user protocol to get the necessary info (e.g the memory info of the
> remote VM, vring info).
>
> Currently, only the fundamental functions are implemented. More features,
> such as MQ and live migration, will be updated in the future.
>
> The DPDK PMD of vhost-pci has been posted to the dpdk mailinglist here:
> http://dpdk.org/ml/archives/dev/2017-November/082615.html

This thread has become long so I've summarized outstanding issues:

* Whether to define a virtio device for each device type or to have a single
  virtio device that speaks the vhost-user protocol. Discussion ongoing.

* Please structure the code so there is a common vhost-pci component that
  other device types (scsi, blk) could use. This will show that the code is
  designed with other device types (scsi, blk) in mind.

* Please validate all inputs from vhost-user protocol messages to protect
  against buffer overflows and other bugs.

* Please handle short reads/writes and EAGAIN with the UNIX domain socket. Do
  not use read/write_all() functions because they hang QEMU until I/O
  completes.

* What happens when the vhost-user master disconnects while we're running?

* How does the guest know when the QEMU vhost-user slave has finished
  initializing everything? It seems like a guest driver could access the
  device before things are initialized.

* How can the vhost-user slave inside the guest participate in feature
  negotiation? It must be able to participate, otherwise slaves cannot
  disable features that QEMU supports but they don't want to support.

  It's not feasible to pass in host_features as a QEMU parameter because
  that would require libvirt, OpenStack, cloud providers, etc to add
  support so users can manually set the bits for their slave
  implementation.

* How can the guest limit the number of virtqueues?

* Please include tests. See tests/virtio-net-test.c and
  tests/vhost-user-test.c for examples.
* Re: [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication
2017-12-19 11:35 ` Stefan Hajnoczi
@ 2017-12-19 14:56 ` Michael S. Tsirkin
[not found] ` <CAJSP0QV80YLwDfKJaXkjepwttsrX1wuH0c_69KTq_mGimYMf1g@mail.gmail.com>
0 siblings, 1 reply; 77+ messages in thread
From: Michael S. Tsirkin @ 2017-12-19 14:56 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Wei Wang, virtio-dev, qemu-devel, marcandre.lureau, jasowang, pbonzini, jan.kiszka, avi.cohen, zhiyong.yang

> * Please handle short reads/writes and EAGAIN with the UNIX domain socket. Do
>   not use read/write_all() functions because they hang QEMU until I/O
>   completes.

I'm not sure I agree with this one. vhost-user uses this extensively
right now. It might be a worth-while goal to drop this limitation
but I don't see why we should start with vhost-pci.

And in particular, VCPU can't make progress unless a slave is present.

> * What happens when the vhost-user master disconnects while we're running?
>
> * How does the guest know when the QEMU vhost-user slave has finished
>   initializing everything? It seems like a guest driver could access the
>   device before things are initialized.
>
> * How can the vhost-user slave inside the guest participate in feature
>   negotiation? It must be able to participate, otherwise slaves cannot
>   disable features that QEMU supports but they don't want to support.
>
>   It's not feasible to pass in host_features as a QEMU parameter because
>   that would require libvirt, OpenStack, cloud providers, etc to add
>   support so users can manually set the bits for their slave
>   implementation.
>
> * How can the guest limit the number of virtqueues?

I think it is feasible to pass in host features, # of vqs etc. Assuming
compatibility with existing guests, I don't think you can do anything
else really if you assume that vhost guest might boot after the
virtio guest.

So either you give up on compatibility, or you allow the vhost
guest to block the virtio guest.

I think compatibility is more important.

We can later think about ways to add non-blocking behaviour
as a feature.

>
> * Please include tests. See tests/virtio-net-test.c and
>   tests/vhost-user-test.c for examples.

Unit tests are nice but an actual way to test without
running a full blown dpdk stack would be nicer.
Something along the lines of a port of vhost user bridge
to the guest.

--
MST
[parent not found: <CAJSP0QV80YLwDfKJaXkjepwttsrX1wuH0c_69KTq_mGimYMf1g@mail.gmail.com>]
* [virtio-dev] Re: [Qemu-devel] [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication
[not found] ` <CAJSP0QV80YLwDfKJaXkjepwttsrX1wuH0c_69KTq_mGimYMf1g@mail.gmail.com>
@ 2017-12-20 4:06 ` Michael S. Tsirkin
0 siblings, 0 replies; 77+ messages in thread
From: Michael S. Tsirkin @ 2017-12-20 4:06 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Stefan Hajnoczi, virtio-dev, Avi Cohen (A), Yang, Zhiyong, Jan Kiszka, Jason Wang, qemu-devel, Wei Wang, Marc-André Lureau, Paolo Bonzini

On Tue, Dec 19, 2017 at 05:05:59PM +0000, Stefan Hajnoczi wrote:
> On Tue, Dec 19, 2017 at 2:56 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >> * Please handle short reads/writes and EAGAIN with the UNIX domain socket. Do
> >>   not use read/write_all() functions because they hang QEMU until I/O
> >>   completes.
> >
> > I'm not sure I agree with this one. vhost-user uses this extensively
> > right now. It might be a worth-while goal to drop this limitation
> > but I don't see why we should start with vhost-pci.
> >
> > And in particular, VCPU can't make progress unless a slave is present.
>
> Hang on, we're talking about different things:
>
> The QEMU vhost-user master blocks because vhost_*() functions are
> synchronous (they don't use callbacks or yield). Fixing that is
> beyond the scope of this patch series and I'm not asking for it.
>
> This patch series adds a from-scratch vhost-user slave implementation
> which has no good reason to be blocking. A single malicious or broken
> guest must not be able to hang a vhost-pci network switch.

Hmm that's not an easy change. But I agree, it's more important for the
switch.

> >> * How can the guest limit the number of virtqueues?
> >
> > I think it is feasible to pass in host features, # of vqs etc. Assuming
> > compatibility with existing guests, I don't think you can do anything
> > else really if you assume that vhost guest might boot after the
> > virtio guest.
> >
> > So either you give up on compatibility, or you allow the vhost
> > guest to block the virtio guest.
> >
> > I think compatibility is more important.
> >
> > We can later think about ways to add non-blocking behaviour
> > as a feature.
>
> I agree it's a separate feature because it will require non-vhost-pci
> related changes.
>
> I have posted a separate email thread to discuss a solution.
>
> >>
> >> * Please include tests. See tests/virtio-net-test.c and
> >>   tests/vhost-user-test.c for examples.
> >
> > Unit tests are nice but an actual way to test without
> > running a full blown dpdk stack would be nicer.
> > Something along the lines of a port of vhost user bridge
> > to the guest.
>
> Yes!