* [RFC PATCH v2 1/5] vhost-user: Add VIRTIO Shared Memory map request
2024-06-28 14:57 [RFC PATCH v2 0/5] vhost-user: Add SHMEM_MAP/UNMAP requests Albert Esteve
@ 2024-06-28 14:57 ` Albert Esteve
2024-07-11 7:45 ` Stefan Hajnoczi
2024-06-28 14:57 ` [RFC PATCH v2 2/5] vhost_user: Add frontend command for shmem config Albert Esteve
` (5 subsequent siblings)
6 siblings, 1 reply; 36+ messages in thread
From: Albert Esteve @ 2024-06-28 14:57 UTC (permalink / raw)
To: qemu-devel
Cc: jasowang, david, slp, Alex Bennée, stefanha,
Michael S. Tsirkin, Albert Esteve
Add SHMEM_MAP/UNMAP requests to vhost-user to
handle VIRTIO Shared Memory mappings.
This request allows backends to dynamically map
fds into a VIRTIO Shared Memory Region identified
by its `shmid`. Then, the fd memory is advertised
to the driver as a base address + offset, so it
can be read/written (depending on the mmap flags
requested) while it's valid.
The backend can munmap the memory range
in a given VIRTIO Shared Memory Region (again,
identified by its `shmid`), to free it. Upon
receiving this message, the front-end must
mmap the regions with PROT_NONE to reserve
the virtual memory space.
The device model needs to create MemoryRegion
instances for the VIRTIO Shared Memory Regions
and add them to the `VirtIODevice` instance.
Signed-off-by: Albert Esteve <aesteve@redhat.com>
---
docs/interop/vhost-user.rst | 27 +++++
hw/virtio/vhost-user.c | 122 ++++++++++++++++++++++
hw/virtio/virtio.c | 12 +++
include/hw/virtio/virtio.h | 5 +
subprojects/libvhost-user/libvhost-user.c | 65 ++++++++++++
subprojects/libvhost-user/libvhost-user.h | 53 ++++++++++
6 files changed, 284 insertions(+)
diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
index d8419fd2f1..d52ba719d5 100644
--- a/docs/interop/vhost-user.rst
+++ b/docs/interop/vhost-user.rst
@@ -1859,6 +1859,33 @@ is sent by the front-end.
when the operation is successful, or non-zero otherwise. Note that if the
operation fails, no fd is sent to the backend.
+``VHOST_USER_BACKEND_SHMEM_MAP``
+ :id: 9
+ :equivalent ioctl: N/A
+ :request payload: fd and ``struct VhostUserMMap``
+ :reply payload: N/A
+
+ This message can be submitted by the backends to advertise a new mapping
+ to be made in a given VIRTIO Shared Memory Region. Upon receiving the message,
+ the front-end will mmap the given fd into the VIRTIO Shared Memory Region
+ with the requested ``shmid``. A reply is generated indicating whether mapping
+ succeeded.
+
+ Mapping over an already existing map is not allowed and the request shall fail.
+ Therefore, the memory range in the request must correspond with a valid,
+ free region of the VIRTIO Shared Memory Region.
+
+``VHOST_USER_BACKEND_SHMEM_UNMAP``
+ :id: 10
+ :equivalent ioctl: N/A
+ :request payload: ``struct VhostUserMMap``
+ :reply payload: N/A
+
+ This message can be submitted by the backends so that the front-end un-mmaps
+ a given range (``offset``, ``len``) in the VIRTIO Shared Memory Region with
+ the requested ``shmid``.
+ A reply is generated indicating whether unmapping succeeded.
+
.. _reply_ack:
VHOST_USER_PROTOCOL_F_REPLY_ACK
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index cdf9af4a4b..7ee8a472c6 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -115,6 +115,8 @@ typedef enum VhostUserBackendRequest {
VHOST_USER_BACKEND_SHARED_OBJECT_ADD = 6,
VHOST_USER_BACKEND_SHARED_OBJECT_REMOVE = 7,
VHOST_USER_BACKEND_SHARED_OBJECT_LOOKUP = 8,
+ VHOST_USER_BACKEND_SHMEM_MAP = 9,
+ VHOST_USER_BACKEND_SHMEM_UNMAP = 10,
VHOST_USER_BACKEND_MAX
} VhostUserBackendRequest;
@@ -192,6 +194,24 @@ typedef struct VhostUserShared {
unsigned char uuid[16];
} VhostUserShared;
+/* For the flags field of VhostUserMMap */
+#define VHOST_USER_FLAG_MAP_R (1u << 0)
+#define VHOST_USER_FLAG_MAP_W (1u << 1)
+
+typedef struct {
+ /* VIRTIO Shared Memory Region ID */
+ uint8_t shmid;
+ uint8_t padding[7];
+ /* File offset */
+ uint64_t fd_offset;
+ /* Offset within the VIRTIO Shared Memory Region */
+ uint64_t shm_offset;
+ /* Size of the mapping */
+ uint64_t len;
+ /* Flags for the mmap operation, from VHOST_USER_FLAG_* */
+ uint64_t flags;
+} VhostUserMMap;
+
typedef struct {
VhostUserRequest request;
@@ -224,6 +244,7 @@ typedef union {
VhostUserInflight inflight;
VhostUserShared object;
VhostUserTransferDeviceState transfer_state;
+ VhostUserMMap mmap;
} VhostUserPayload;
typedef struct VhostUserMsg {
@@ -1748,6 +1769,100 @@ vhost_user_backend_handle_shared_object_lookup(struct vhost_user *u,
return 0;
}
+static int
+vhost_user_backend_handle_shmem_map(struct vhost_dev *dev,
+ VhostUserMMap *vu_mmap,
+ int fd)
+{
+ void *addr = 0;
+ MemoryRegion *mr = NULL;
+
+ if (fd < 0) {
+ error_report("Bad fd for map");
+ return -EBADF;
+ }
+
+ if (!dev->vdev->shmem_list ||
+ dev->vdev->n_shmem_regions <= vu_mmap->shmid) {
+ error_report("Device only has %d VIRTIO Shared Memory Regions. "
+ "Requested ID: %d",
+ dev->vdev->n_shmem_regions, vu_mmap->shmid);
+ return -EFAULT;
+ }
+
+ mr = &dev->vdev->shmem_list[vu_mmap->shmid];
+
+ if (!mr) {
+ error_report("VIRTIO Shared Memory Region at "
+ "ID %d unitialized", vu_mmap->shmid);
+ return -EFAULT;
+ }
+
+ if ((vu_mmap->shm_offset + vu_mmap->len) < vu_mmap->len ||
+ (vu_mmap->shm_offset + vu_mmap->len) > mr->size) {
+ error_report("Bad offset/len for mmap %" PRIx64 "+%" PRIx64,
+ vu_mmap->shm_offset, vu_mmap->len);
+ return -EFAULT;
+ }
+
+ void *shmem_ptr = memory_region_get_ram_ptr(mr);
+
+ addr = mmap(shmem_ptr + vu_mmap->shm_offset, vu_mmap->len,
+ ((vu_mmap->flags & VHOST_USER_FLAG_MAP_R) ? PROT_READ : 0) |
+ ((vu_mmap->flags & VHOST_USER_FLAG_MAP_W) ? PROT_WRITE : 0),
+ MAP_SHARED | MAP_FIXED, fd, vu_mmap->fd_offset);
+
+ if (addr == MAP_FAILED) {
+ error_report("Failed to mmap mem fd");
+ return -EFAULT;
+ }
+
+ return 0;
+}
+
+static int
+vhost_user_backend_handle_shmem_unmap(struct vhost_dev *dev,
+ VhostUserMMap *vu_mmap)
+{
+ void *addr = 0;
+ MemoryRegion *mr = NULL;
+
+ if (!dev->vdev->shmem_list ||
+ dev->vdev->n_shmem_regions <= vu_mmap->shmid) {
+ error_report("Device only has %d VIRTIO Shared Memory Regions. "
+ "Requested ID: %d",
+ dev->vdev->n_shmem_regions, vu_mmap->shmid);
+ return -EFAULT;
+ }
+
+ mr = &dev->vdev->shmem_list[vu_mmap->shmid];
+
+ if (!mr) {
+ error_report("VIRTIO Shared Memory Region at "
+ "ID %d unitialized", vu_mmap->shmid);
+ return -EFAULT;
+ }
+
+ if ((vu_mmap->shm_offset + vu_mmap->len) < vu_mmap->len ||
+ (vu_mmap->shm_offset + vu_mmap->len) > mr->size) {
+ error_report("Bad offset/len for mmap %" PRIx64 "+%" PRIx64,
+ vu_mmap->shm_offset, vu_mmap->len);
+ return -EFAULT;
+ }
+
+ void *shmem_ptr = memory_region_get_ram_ptr(mr);
+
+ addr = mmap(shmem_ptr + vu_mmap->shm_offset, vu_mmap->len,
+ PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
+
+ if (addr == MAP_FAILED) {
+ error_report("Failed to unmap memory");
+ return -EFAULT;
+ }
+
+ return 0;
+}
+
static void close_backend_channel(struct vhost_user *u)
{
g_source_destroy(u->backend_src);
@@ -1816,6 +1931,13 @@ static gboolean backend_read(QIOChannel *ioc, GIOCondition condition,
ret = vhost_user_backend_handle_shared_object_lookup(dev->opaque, ioc,
&hdr, &payload);
break;
+ case VHOST_USER_BACKEND_SHMEM_MAP:
+ ret = vhost_user_backend_handle_shmem_map(dev, &payload.mmap,
+ fd ? fd[0] : -1);
+ break;
+ case VHOST_USER_BACKEND_SHMEM_UNMAP:
+ ret = vhost_user_backend_handle_shmem_unmap(dev, &payload.mmap);
+ break;
default:
error_report("Received unexpected msg type: %d.", hdr.request);
ret = -EINVAL;
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 893a072c9d..9f2da5b11e 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2856,6 +2856,16 @@ int virtio_save(VirtIODevice *vdev, QEMUFile *f)
return vmstate_save_state(f, &vmstate_virtio, vdev, NULL);
}
+MemoryRegion *virtio_new_shmem_region(VirtIODevice *vdev)
+{
+ MemoryRegion *mr = g_new0(MemoryRegion, 1);
+ ++vdev->n_shmem_regions;
+ vdev->shmem_list = g_renew(MemoryRegion, vdev->shmem_list,
+ vdev->n_shmem_regions);
+ vdev->shmem_list[vdev->n_shmem_regions - 1] = *mr;
+ return mr;
+}
+
/* A wrapper for use as a VMState .put function */
static int virtio_device_put(QEMUFile *f, void *opaque, size_t size,
const VMStateField *field, JSONWriter *vmdesc)
@@ -3264,6 +3274,8 @@ void virtio_init(VirtIODevice *vdev, uint16_t device_id, size_t config_size)
virtio_vmstate_change, vdev);
vdev->device_endian = virtio_default_endian();
vdev->use_guest_notifier_mask = true;
+ vdev->shmem_list = NULL;
+ vdev->n_shmem_regions = 0;
}
/*
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 7d5ffdc145..16d598aadc 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -165,6 +165,9 @@ struct VirtIODevice
*/
EventNotifier config_notifier;
bool device_iotlb_enabled;
+ /* Shared memory regions for vhost-user mappings. */
+ MemoryRegion *shmem_list;
+ int n_shmem_regions;
};
struct VirtioDeviceClass {
@@ -280,6 +283,8 @@ void virtio_notify(VirtIODevice *vdev, VirtQueue *vq);
int virtio_save(VirtIODevice *vdev, QEMUFile *f);
+MemoryRegion *virtio_new_shmem_region(VirtIODevice *vdev);
+
extern const VMStateInfo virtio_vmstate_info;
#define VMSTATE_VIRTIO_DEVICE \
diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
index a879149fef..28556d183a 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -1586,6 +1586,71 @@ vu_rm_shared_object(VuDev *dev, unsigned char uuid[UUID_LEN])
return vu_send_message(dev, &msg);
}
+bool
+vu_shmem_map(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
+ uint64_t shm_offset, uint64_t len, uint64_t flags)
+{
+ bool result = false;
+ VhostUserMsg msg_reply;
+ VhostUserMsg vmsg = {
+ .request = VHOST_USER_BACKEND_SHMEM_MAP,
+ .size = sizeof(vmsg.payload.mmap),
+ .flags = VHOST_USER_VERSION,
+ .payload.mmap = {
+ .shmid = shmid,
+ .fd_offset = fd_offset,
+ .shm_offset = shm_offset,
+ .len = len,
+ .flags = flags,
+ },
+ };
+
+ if (vu_has_protocol_feature(dev, VHOST_USER_PROTOCOL_F_REPLY_ACK)) {
+ vmsg.flags |= VHOST_USER_NEED_REPLY_MASK;
+ }
+
+ pthread_mutex_lock(&dev->backend_mutex);
+ if (!vu_message_write(dev, dev->backend_fd, &vmsg)) {
+ pthread_mutex_unlock(&dev->backend_mutex);
+ return false;
+ }
+
+ /* Also unlocks the backend_mutex */
+ return vu_process_message_reply(dev, &vmsg);
+}
+
+bool
+vu_shmem_unmap(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
+ uint64_t shm_offset, uint64_t len)
+{
+ bool result = false;
+ VhostUserMsg msg_reply;
+ VhostUserMsg vmsg = {
+ .request = VHOST_USER_BACKEND_SHMEM_UNMAP,
+ .size = sizeof(vmsg.payload.mmap),
+ .flags = VHOST_USER_VERSION,
+ .payload.mmap = {
+ .shmid = shmid,
+ .fd_offset = fd_offset,
+ .shm_offset = shm_offset,
+ .len = len,
+ },
+ };
+
+ if (vu_has_protocol_feature(dev, VHOST_USER_PROTOCOL_F_REPLY_ACK)) {
+ vmsg.flags |= VHOST_USER_NEED_REPLY_MASK;
+ }
+
+ pthread_mutex_lock(&dev->backend_mutex);
+ if (!vu_message_write(dev, dev->backend_fd, &vmsg)) {
+ pthread_mutex_unlock(&dev->backend_mutex);
+ return false;
+ }
+
+ /* Also unlocks the backend_mutex */
+ return vu_process_message_reply(dev, &vmsg);
+}
+
static bool
vu_set_vring_call_exec(VuDev *dev, VhostUserMsg *vmsg)
{
diff --git a/subprojects/libvhost-user/libvhost-user.h b/subprojects/libvhost-user/libvhost-user.h
index deb40e77b3..7f6c22cc1a 100644
--- a/subprojects/libvhost-user/libvhost-user.h
+++ b/subprojects/libvhost-user/libvhost-user.h
@@ -127,6 +127,8 @@ typedef enum VhostUserBackendRequest {
VHOST_USER_BACKEND_SHARED_OBJECT_ADD = 6,
VHOST_USER_BACKEND_SHARED_OBJECT_REMOVE = 7,
VHOST_USER_BACKEND_SHARED_OBJECT_LOOKUP = 8,
+ VHOST_USER_BACKEND_SHMEM_MAP = 9,
+ VHOST_USER_BACKEND_SHMEM_UNMAP = 10,
VHOST_USER_BACKEND_MAX
} VhostUserBackendRequest;
@@ -186,6 +188,24 @@ typedef struct VhostUserShared {
unsigned char uuid[UUID_LEN];
} VhostUserShared;
+/* For the flags field of VhostUserMMap */
+#define VHOST_USER_FLAG_MAP_R (1u << 0)
+#define VHOST_USER_FLAG_MAP_W (1u << 1)
+
+typedef struct {
+ /* VIRTIO Shared Memory Region ID */
+ uint8_t shmid;
+ uint8_t padding[7];
+ /* File offset */
+ uint64_t fd_offset;
+ /* Offset within the VIRTIO Shared Memory Region */
+ uint64_t shm_offset;
+ /* Size of the mapping */
+ uint64_t len;
+ /* Flags for the mmap operation, from VHOST_USER_FLAG_* */
+ uint64_t flags;
+} VhostUserMMap;
+
#if defined(_WIN32) && (defined(__x86_64__) || defined(__i386__))
# define VU_PACKED __attribute__((gcc_struct, packed))
#else
@@ -214,6 +234,7 @@ typedef struct VhostUserMsg {
VhostUserVringArea area;
VhostUserInflight inflight;
VhostUserShared object;
+ VhostUserMMap mmap;
} payload;
int fds[VHOST_MEMORY_BASELINE_NREGIONS];
@@ -597,6 +618,38 @@ bool vu_add_shared_object(VuDev *dev, unsigned char uuid[UUID_LEN]);
*/
bool vu_rm_shared_object(VuDev *dev, unsigned char uuid[UUID_LEN]);
+/**
+ * vu_shmem_map:
+ * @dev: a VuDev context
+ * @shmid: VIRTIO Shared Memory Region ID
+ * @fd_offset: File offset
+ * @shm_offset: Offset within the VIRTIO Shared Memory Region
+ * @len: Size of the mapping
+ * @flags: Flags for the mmap operation
+ *
+ * Advertises a new mapping to be made in a given VIRTIO Shared Memory Region.
+ *
+ * Returns: TRUE on success, FALSE on failure.
+ */
+bool vu_shmem_map(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
+ uint64_t shm_offset, uint64_t len, uint64_t flags);
+
+/**
+ * vu_shmem_unmap:
+ * @dev: a VuDev context
+ * @shmid: VIRTIO Shared Memory Region ID
+ * @fd_offset: File offset
+ * @shm_offset: Offset within the VIRTIO Shared Memory Region
+ * @len: Size of the mapping
+ *
+ * The front-end un-mmaps a given range in the VIRTIO Shared Memory Region
+ * with the requested `shmid`.
+ *
+ * Returns: TRUE on success, FALSE on failure.
+ */
+bool vu_shmem_unmap(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
+ uint64_t shm_offset, uint64_t len);
+
/**
* vu_queue_set_notification:
* @dev: a VuDev context
--
2.45.2
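
As a rough illustration of the device-model side the commit message describes (creating MemoryRegion instances for the VIRTIO Shared Memory Regions), a device would obtain a region from virtio_new_shmem_region() and back it with RAM along these lines. This sketch is not part of the patch; the region name and the 64 MiB size are placeholders:

```
/* Hypothetical sketch: create one VIRTIO Shared Memory Region for a device.
 * The name and size below are illustrative, not defined by this series. */
static void example_create_shmem_region(VirtIODevice *vdev, Error **errp)
{
    MemoryRegion *mr = virtio_new_shmem_region(vdev);

    /* Reserve the guest-visible window; backends populate it later via
     * VHOST_USER_BACKEND_SHMEM_MAP requests. */
    memory_region_init_ram(mr, OBJECT(vdev), "virtio-shmem-0",
                           64 * MiB, errp);
}
```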
* Re: [RFC PATCH v2 1/5] vhost-user: Add VIRTIO Shared Memory map request
2024-06-28 14:57 ` [RFC PATCH v2 1/5] vhost-user: Add VIRTIO Shared Memory map request Albert Esteve
@ 2024-07-11 7:45 ` Stefan Hajnoczi
2024-09-03 9:54 ` Albert Esteve
2024-09-04 7:28 ` Albert Esteve
0 siblings, 2 replies; 36+ messages in thread
From: Stefan Hajnoczi @ 2024-07-11 7:45 UTC (permalink / raw)
To: Albert Esteve
Cc: qemu-devel, jasowang, david, slp, Alex Bennée,
Michael S. Tsirkin
On Fri, Jun 28, 2024 at 04:57:06PM +0200, Albert Esteve wrote:
> Add SHMEM_MAP/UNMAP requests to vhost-user to
> handle VIRTIO Shared Memory mappings.
>
> This request allows backends to dynamically map
> fds into a VIRTIO Shared Memory Region indentified
> by its `shmid`. Then, the fd memory is advertised
> to the driver as a base addres + offset, so it
> can be read/written (depending on the mmap flags
> requested) while its valid.
>
> The backend can munmap the memory range
> in a given VIRTIO Shared Memory Region (again,
> identified by its `shmid`), to free it. Upon
> receiving this message, the front-end must
> mmap the regions with PROT_NONE to reserve
> the virtual memory space.
>
> The device model needs to create MemoryRegion
> instances for the VIRTIO Shared Memory Regions
> and add them to the `VirtIODevice` instance.
>
> Signed-off-by: Albert Esteve <aesteve@redhat.com>
> ---
> docs/interop/vhost-user.rst | 27 +++++
> hw/virtio/vhost-user.c | 122 ++++++++++++++++++++++
> hw/virtio/virtio.c | 12 +++
> include/hw/virtio/virtio.h | 5 +
> subprojects/libvhost-user/libvhost-user.c | 65 ++++++++++++
> subprojects/libvhost-user/libvhost-user.h | 53 ++++++++++
> 6 files changed, 284 insertions(+)
>
> diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
> index d8419fd2f1..d52ba719d5 100644
> --- a/docs/interop/vhost-user.rst
> +++ b/docs/interop/vhost-user.rst
> @@ -1859,6 +1859,33 @@ is sent by the front-end.
> when the operation is successful, or non-zero otherwise. Note that if the
> operation fails, no fd is sent to the backend.
>
> +``VHOST_USER_BACKEND_SHMEM_MAP``
> + :id: 9
> + :equivalent ioctl: N/A
> + :request payload: fd and ``struct VhostUserMMap``
> + :reply payload: N/A
> +
> + This message can be submitted by the backends to advertise a new mapping
> + to be made in a given VIRTIO Shared Memory Region. Upon receiving the message,
> + The front-end will mmap the given fd into the VIRTIO Shared Memory Region
> + with the requested ``shmid``. A reply is generated indicating whether mapping
> + succeeded.
> +
> + Mapping over an already existing map is not allowed and request shall fail.
> + Therefore, the memory range in the request must correspond with a valid,
> + free region of the VIRTIO Shared Memory Region.
> +
> +``VHOST_USER_BACKEND_SHMEM_UNMAP``
> + :id: 10
> + :equivalent ioctl: N/A
> + :request payload: ``struct VhostUserMMap``
> + :reply payload: N/A
> +
> + This message can be submitted by the backends so that the front-end un-mmap
> + a given range (``offset``, ``len``) in the VIRTIO Shared Memory Region with
s/offset/shm_offset/
> + the requested ``shmid``.
Please clarify that <offset, len> must correspond to the entirety of a
valid mapped region.
By the way, VIRTIO 1.3 gives the following behavior for the virtiofs
DAX Window:
When a FUSE_SETUPMAPPING request perfectly overlaps a previous
mapping, the previous mapping is replaced. When a mapping partially
overlaps a previous mapping, the previous mapping is split into one or
two smaller mappings. When a mapping is partially unmapped it is also
split into one or two smaller mappings.
Establishing new mappings or splitting existing mappings consumes
resources. If the device runs out of resources the FUSE_SETUPMAPPING
request fails until resources are available again following
FUSE_REMOVEMAPPING.
I think SETUPMAPPING/REMOVEMAPPING can be implemented using
SHMEM_MAP/UNMAP. SHMEM_MAP/UNMAP do not allow atomically replacing
partial ranges, but as far as I know that's not necessary for virtiofs
in practice.
It's worth mentioning that mappings consume resources and that SHMEM_MAP
can fail when there are no resources available. The process-wide limit
is vm.max_map_count on Linux although a vhost-user frontend may reduce
it further to control vhost-user resource usage.
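
As a sketch of that correspondence (treating the virtiofs DAX window as shared memory region 0; the function names and the flag translation are illustrative assumptions, not something this patch defines), a backend built on libvhost-user could forward the FUSE request fields directly:

```
/* fd_offset = offset into the file, shm_offset = offset into the DAX
 * window; how the fd travels with the request is outside this sketch. */
static bool setupmapping_to_shmem_map(VuDev *dev, uint64_t foffset,
                                      uint64_t moffset, uint64_t len,
                                      bool writable)
{
    uint64_t flags = VHOST_USER_FLAG_MAP_R |
                     (writable ? VHOST_USER_FLAG_MAP_W : 0);

    return vu_shmem_map(dev, 0, foffset, moffset, len, flags);
}

static bool removemapping_to_shmem_unmap(VuDev *dev, uint64_t moffset,
                                         uint64_t len)
{
    return vu_shmem_unmap(dev, 0, 0, moffset, len);
}
```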
> + A reply is generated indicating whether unmapping succeeded.
> +
> .. _reply_ack:
>
> VHOST_USER_PROTOCOL_F_REPLY_ACK
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index cdf9af4a4b..7ee8a472c6 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -115,6 +115,8 @@ typedef enum VhostUserBackendRequest {
> VHOST_USER_BACKEND_SHARED_OBJECT_ADD = 6,
> VHOST_USER_BACKEND_SHARED_OBJECT_REMOVE = 7,
> VHOST_USER_BACKEND_SHARED_OBJECT_LOOKUP = 8,
> + VHOST_USER_BACKEND_SHMEM_MAP = 9,
> + VHOST_USER_BACKEND_SHMEM_UNMAP = 10,
> VHOST_USER_BACKEND_MAX
> } VhostUserBackendRequest;
>
> @@ -192,6 +194,24 @@ typedef struct VhostUserShared {
> unsigned char uuid[16];
> } VhostUserShared;
>
> +/* For the flags field of VhostUserMMap */
> +#define VHOST_USER_FLAG_MAP_R (1u << 0)
> +#define VHOST_USER_FLAG_MAP_W (1u << 1)
> +
> +typedef struct {
> + /* VIRTIO Shared Memory Region ID */
> + uint8_t shmid;
> + uint8_t padding[7];
> + /* File offset */
> + uint64_t fd_offset;
> + /* Offset within the VIRTIO Shared Memory Region */
> + uint64_t shm_offset;
> + /* Size of the mapping */
> + uint64_t len;
> + /* Flags for the mmap operation, from VHOST_USER_FLAG_* */
> + uint64_t flags;
> +} VhostUserMMap;
> +
> typedef struct {
> VhostUserRequest request;
>
> @@ -224,6 +244,7 @@ typedef union {
> VhostUserInflight inflight;
> VhostUserShared object;
> VhostUserTransferDeviceState transfer_state;
> + VhostUserMMap mmap;
> } VhostUserPayload;
>
> typedef struct VhostUserMsg {
> @@ -1748,6 +1769,100 @@ vhost_user_backend_handle_shared_object_lookup(struct vhost_user *u,
> return 0;
> }
>
> +static int
> +vhost_user_backend_handle_shmem_map(struct vhost_dev *dev,
> + VhostUserMMap *vu_mmap,
> + int fd)
> +{
> + void *addr = 0;
> + MemoryRegion *mr = NULL;
> +
> + if (fd < 0) {
> + error_report("Bad fd for map");
> + return -EBADF;
> + }
> +
> + if (!dev->vdev->shmem_list ||
> + dev->vdev->n_shmem_regions <= vu_mmap->shmid) {
> + error_report("Device only has %d VIRTIO Shared Memory Regions. "
> + "Requested ID: %d",
> + dev->vdev->n_shmem_regions, vu_mmap->shmid);
> + return -EFAULT;
> + }
> +
> + mr = &dev->vdev->shmem_list[vu_mmap->shmid];
> +
> + if (!mr) {
> + error_report("VIRTIO Shared Memory Region at "
> + "ID %d unitialized", vu_mmap->shmid);
> + return -EFAULT;
> + }
> +
> + if ((vu_mmap->shm_offset + vu_mmap->len) < vu_mmap->len ||
> + (vu_mmap->shm_offset + vu_mmap->len) > mr->size) {
> + error_report("Bad offset/len for mmap %" PRIx64 "+%" PRIx64,
> + vu_mmap->shm_offset, vu_mmap->len);
> + return -EFAULT;
> + }
> +
> + void *shmem_ptr = memory_region_get_ram_ptr(mr);
> +
> + addr = mmap(shmem_ptr + vu_mmap->shm_offset, vu_mmap->len,
Missing check for overlap between range [shm_offset, shm_offset + len)
and existing mappings.
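
One possible shape for such a check, assuming the front-end kept per-region bookkeeping of active mappings (the ShmemMapping type below is hypothetical; nothing like it exists in this patch):

```
typedef struct ShmemMapping {
    uint64_t shm_offset;
    uint64_t len;
    QLIST_ENTRY(ShmemMapping) link;
} ShmemMapping;

typedef QLIST_HEAD(, ShmemMapping) ShmemMappingList;

/* Return true if [offset, offset + len) intersects no existing mapping */
static bool shmem_range_is_free(ShmemMappingList *mappings,
                                uint64_t offset, uint64_t len)
{
    ShmemMapping *m;

    QLIST_FOREACH(m, mappings, link) {
        if (offset < m->shm_offset + m->len &&
            m->shm_offset < offset + len) {
            return false;
        }
    }
    return true;
}
```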
> + ((vu_mmap->flags & VHOST_USER_FLAG_MAP_R) ? PROT_READ : 0) |
> + ((vu_mmap->flags & VHOST_USER_FLAG_MAP_W) ? PROT_WRITE : 0),
> + MAP_SHARED | MAP_FIXED, fd, vu_mmap->fd_offset);
> +
> + if (addr == MAP_FAILED) {
> + error_report("Failed to mmap mem fd");
> + return -EFAULT;
> + }
> +
> + return 0;
> +}
> +
> +static int
> +vhost_user_backend_handle_shmem_unmap(struct vhost_dev *dev,
> + VhostUserMMap *vu_mmap)
> +{
> + void *addr = 0;
> + MemoryRegion *mr = NULL;
> +
> + if (!dev->vdev->shmem_list ||
> + dev->vdev->n_shmem_regions <= vu_mmap->shmid) {
> + error_report("Device only has %d VIRTIO Shared Memory Regions. "
> + "Requested ID: %d",
> + dev->vdev->n_shmem_regions, vu_mmap->shmid);
> + return -EFAULT;
> + }
> +
> + mr = &dev->vdev->shmem_list[vu_mmap->shmid];
> +
> + if (!mr) {
> + error_report("VIRTIO Shared Memory Region at "
> + "ID %d unitialized", vu_mmap->shmid);
> + return -EFAULT;
> + }
> +
> + if ((vu_mmap->shm_offset + vu_mmap->len) < vu_mmap->len ||
> + (vu_mmap->shm_offset + vu_mmap->len) > mr->size) {
> + error_report("Bad offset/len for mmap %" PRIx64 "+%" PRIx64,
> + vu_mmap->shm_offset, vu_mmap->len);
> + return -EFAULT;
> + }
> +
> + void *shmem_ptr = memory_region_get_ram_ptr(mr);
> +
> + addr = mmap(shmem_ptr + vu_mmap->shm_offset, vu_mmap->len,
Missing check for existing mapping with exact range [shm_offset, len)
match.
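
With the same hypothetical bookkeeping sketched above, the exact-match requirement could be checked by looking up the range before replacing it with the PROT_NONE reservation:

```
/* Return the mapping that exactly matches [offset, offset + len), if any */
static ShmemMapping *shmem_find_exact(ShmemMappingList *mappings,
                                      uint64_t offset, uint64_t len)
{
    ShmemMapping *m;

    QLIST_FOREACH(m, mappings, link) {
        if (m->shm_offset == offset && m->len == len) {
            return m;
        }
    }
    return NULL;
}
```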
> + PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
> +
> + if (addr == MAP_FAILED) {
> + error_report("Failed to unmap memory");
> + return -EFAULT;
> + }
> +
> + return 0;
> +}
> +
> static void close_backend_channel(struct vhost_user *u)
> {
> g_source_destroy(u->backend_src);
> @@ -1816,6 +1931,13 @@ static gboolean backend_read(QIOChannel *ioc, GIOCondition condition,
> ret = vhost_user_backend_handle_shared_object_lookup(dev->opaque, ioc,
> &hdr, &payload);
> break;
> + case VHOST_USER_BACKEND_SHMEM_MAP:
> + ret = vhost_user_backend_handle_shmem_map(dev, &payload.mmap,
> + fd ? fd[0] : -1);
> + break;
> + case VHOST_USER_BACKEND_SHMEM_UNMAP:
> + ret = vhost_user_backend_handle_shmem_unmap(dev, &payload.mmap);
> + break;
> default:
> error_report("Received unexpected msg type: %d.", hdr.request);
> ret = -EINVAL;
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 893a072c9d..9f2da5b11e 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -2856,6 +2856,16 @@ int virtio_save(VirtIODevice *vdev, QEMUFile *f)
> return vmstate_save_state(f, &vmstate_virtio, vdev, NULL);
> }
>
> +MemoryRegion *virtio_new_shmem_region(VirtIODevice *vdev)
> +{
> + MemoryRegion *mr = g_new0(MemoryRegion, 1);
> + ++vdev->n_shmem_regions;
> + vdev->shmem_list = g_renew(MemoryRegion, vdev->shmem_list,
> + vdev->n_shmem_regions);
Where is shmem_list freed?
The name "list" is misleading since this is an array, not a list.
> + vdev->shmem_list[vdev->n_shmem_regions - 1] = *mr;
> + return mr;
> +}
This looks weird. The contents of mr are copied into shmem_list[] and
then the pointer to mr is returned? Did you mean for the field's type to
be MemoryRegion **shmem_list and then vdev->shmem_list[...] = mr would
stash the pointer?
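
A sketch of that pointer-array variant, shown only to illustrate the suggestion rather than as a final design:

```
MemoryRegion *virtio_new_shmem_region(VirtIODevice *vdev)
{
    /* With MemoryRegion **shmem_list the pointer itself is stored, so the
     * caller and the array refer to the same object. */
    MemoryRegion *mr = g_new0(MemoryRegion, 1);

    vdev->shmem_list = g_renew(MemoryRegion *, vdev->shmem_list,
                               vdev->n_shmem_regions + 1);
    vdev->shmem_list[vdev->n_shmem_regions++] = mr;
    return mr;
}
```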
> +
> /* A wrapper for use as a VMState .put function */
> static int virtio_device_put(QEMUFile *f, void *opaque, size_t size,
> const VMStateField *field, JSONWriter *vmdesc)
> @@ -3264,6 +3274,8 @@ void virtio_init(VirtIODevice *vdev, uint16_t device_id, size_t config_size)
> virtio_vmstate_change, vdev);
> vdev->device_endian = virtio_default_endian();
> vdev->use_guest_notifier_mask = true;
> + vdev->shmem_list = NULL;
> + vdev->n_shmem_regions = 0;
> }
>
> /*
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index 7d5ffdc145..16d598aadc 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -165,6 +165,9 @@ struct VirtIODevice
> */
> EventNotifier config_notifier;
> bool device_iotlb_enabled;
> + /* Shared memory region for vhost-user mappings. */
> + MemoryRegion *shmem_list;
> + int n_shmem_regions;
> };
>
> struct VirtioDeviceClass {
> @@ -280,6 +283,8 @@ void virtio_notify(VirtIODevice *vdev, VirtQueue *vq);
>
> int virtio_save(VirtIODevice *vdev, QEMUFile *f);
>
> +MemoryRegion *virtio_new_shmem_region(VirtIODevice *vdev);
> +
> extern const VMStateInfo virtio_vmstate_info;
>
> #define VMSTATE_VIRTIO_DEVICE \
> diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
> index a879149fef..28556d183a 100644
> --- a/subprojects/libvhost-user/libvhost-user.c
> +++ b/subprojects/libvhost-user/libvhost-user.c
> @@ -1586,6 +1586,71 @@ vu_rm_shared_object(VuDev *dev, unsigned char uuid[UUID_LEN])
> return vu_send_message(dev, &msg);
> }
>
> +bool
> +vu_shmem_map(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
> + uint64_t shm_offset, uint64_t len, uint64_t flags)
> +{
> + bool result = false;
> + VhostUserMsg msg_reply;
> + VhostUserMsg vmsg = {
> + .request = VHOST_USER_BACKEND_SHMEM_MAP,
> + .size = sizeof(vmsg.payload.mmap),
> + .flags = VHOST_USER_VERSION,
> + .payload.mmap = {
> + .shmid = shmid,
> + .fd_offset = fd_offset,
> + .shm_offset = shm_offset,
> + .len = len,
> + .flags = flags,
> + },
> + };
> +
> + if (vu_has_protocol_feature(dev, VHOST_USER_PROTOCOL_F_REPLY_ACK)) {
> + vmsg.flags |= VHOST_USER_NEED_REPLY_MASK;
> + }
> +
> + pthread_mutex_lock(&dev->backend_mutex);
> + if (!vu_message_write(dev, dev->backend_fd, &vmsg)) {
> + pthread_mutex_unlock(&dev->backend_mutex);
> + return false;
> + }
> +
> + /* Also unlocks the backend_mutex */
> + return vu_process_message_reply(dev, &vmsg);
> +}
> +
> +bool
> +vu_shmem_unmap(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
> + uint64_t shm_offset, uint64_t len)
> +{
> + bool result = false;
> + VhostUserMsg msg_reply;
> + VhostUserMsg vmsg = {
> + .request = VHOST_USER_BACKEND_SHMEM_UNMAP,
> + .size = sizeof(vmsg.payload.mmap),
> + .flags = VHOST_USER_VERSION,
> + .payload.mmap = {
> + .shmid = shmid,
> + .fd_offset = fd_offset,
What is the meaning of this field? I expected it to be set to 0.
> + .shm_offset = shm_offset,
> + .len = len,
> + },
> + };
> +
> + if (vu_has_protocol_feature(dev, VHOST_USER_PROTOCOL_F_REPLY_ACK)) {
> + vmsg.flags |= VHOST_USER_NEED_REPLY_MASK;
> + }
> +
> + pthread_mutex_lock(&dev->backend_mutex);
> + if (!vu_message_write(dev, dev->backend_fd, &vmsg)) {
> + pthread_mutex_unlock(&dev->backend_mutex);
> + return false;
> + }
> +
> + /* Also unlocks the backend_mutex */
> + return vu_process_message_reply(dev, &vmsg);
> +}
> +
> static bool
> vu_set_vring_call_exec(VuDev *dev, VhostUserMsg *vmsg)
> {
> diff --git a/subprojects/libvhost-user/libvhost-user.h b/subprojects/libvhost-user/libvhost-user.h
> index deb40e77b3..7f6c22cc1a 100644
> --- a/subprojects/libvhost-user/libvhost-user.h
> +++ b/subprojects/libvhost-user/libvhost-user.h
> @@ -127,6 +127,8 @@ typedef enum VhostUserBackendRequest {
> VHOST_USER_BACKEND_SHARED_OBJECT_ADD = 6,
> VHOST_USER_BACKEND_SHARED_OBJECT_REMOVE = 7,
> VHOST_USER_BACKEND_SHARED_OBJECT_LOOKUP = 8,
> + VHOST_USER_BACKEND_SHMEM_MAP = 9,
> + VHOST_USER_BACKEND_SHMEM_UNMAP = 10,
> VHOST_USER_BACKEND_MAX
> } VhostUserBackendRequest;
>
> @@ -186,6 +188,24 @@ typedef struct VhostUserShared {
> unsigned char uuid[UUID_LEN];
> } VhostUserShared;
>
> +/* For the flags field of VhostUserMMap */
> +#define VHOST_USER_FLAG_MAP_R (1u << 0)
> +#define VHOST_USER_FLAG_MAP_W (1u << 1)
> +
> +typedef struct {
> + /* VIRTIO Shared Memory Region ID */
> + uint8_t shmid;
> + uint8_t padding[7];
> + /* File offset */
> + uint64_t fd_offset;
> + /* Offset within the VIRTIO Shared Memory Region */
> + uint64_t shm_offset;
> + /* Size of the mapping */
> + uint64_t len;
> + /* Flags for the mmap operation, from VHOST_USER_FLAG_* */
> + uint64_t flags;
> +} VhostUserMMap;
> +
> #if defined(_WIN32) && (defined(__x86_64__) || defined(__i386__))
> # define VU_PACKED __attribute__((gcc_struct, packed))
> #else
> @@ -214,6 +234,7 @@ typedef struct VhostUserMsg {
> VhostUserVringArea area;
> VhostUserInflight inflight;
> VhostUserShared object;
> + VhostUserMMap mmap;
> } payload;
>
> int fds[VHOST_MEMORY_BASELINE_NREGIONS];
> @@ -597,6 +618,38 @@ bool vu_add_shared_object(VuDev *dev, unsigned char uuid[UUID_LEN]);
> */
> bool vu_rm_shared_object(VuDev *dev, unsigned char uuid[UUID_LEN]);
>
> +/**
> + * vu_shmem_map:
> + * @dev: a VuDev context
> + * @shmid: VIRTIO Shared Memory Region ID
> + * @fd_offset: File offset
> + * @shm_offset: Offset within the VIRTIO Shared Memory Region
> + * @len: Size of the mapping
> + * @flags: Flags for the mmap operation
> + *
> + * Advertises a new mapping to be made in a given VIRTIO Shared Memory Region.
> + *
> + * Returns: TRUE on success, FALSE on failure.
> + */
> +bool vu_shmem_map(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
> + uint64_t shm_offset, uint64_t len, uint64_t flags);
> +
> +/**
> + * vu_shmem_map:
> + * @dev: a VuDev context
> + * @shmid: VIRTIO Shared Memory Region ID
> + * @fd_offset: File offset
> + * @shm_offset: Offset within the VIRTIO Shared Memory Region
> + * @len: Size of the mapping
> + *
> + * The front-end un-mmaps a given range in the VIRTIO Shared Memory Region
> + * with the requested `shmid`.
> + *
> + * Returns: TRUE on success, FALSE on failure.
> + */
> +bool vu_shmem_unmap(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
> + uint64_t shm_offset, uint64_t len);
> +
> /**
> * vu_queue_set_notification:
> * @dev: a VuDev context
> --
> 2.45.2
>
* Re: [RFC PATCH v2 1/5] vhost-user: Add VIRTIO Shared Memory map request
2024-07-11 7:45 ` Stefan Hajnoczi
@ 2024-09-03 9:54 ` Albert Esteve
2024-09-03 11:54 ` Albert Esteve
2024-09-04 7:28 ` Albert Esteve
1 sibling, 1 reply; 36+ messages in thread
From: Albert Esteve @ 2024-09-03 9:54 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, jasowang, david, slp, Alex Bennée,
Michael S. Tsirkin
On Thu, Jul 11, 2024 at 9:45 AM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> On Fri, Jun 28, 2024 at 04:57:06PM +0200, Albert Esteve wrote:
> > Add SHMEM_MAP/UNMAP requests to vhost-user to
> > handle VIRTIO Shared Memory mappings.
> >
> > This request allows backends to dynamically map
> > fds into a VIRTIO Shared Memory Region indentified
> > by its `shmid`. Then, the fd memory is advertised
> > to the driver as a base addres + offset, so it
> > can be read/written (depending on the mmap flags
> > requested) while its valid.
> >
> > The backend can munmap the memory range
> > in a given VIRTIO Shared Memory Region (again,
> > identified by its `shmid`), to free it. Upon
> > receiving this message, the front-end must
> > mmap the regions with PROT_NONE to reserve
> > the virtual memory space.
> >
> > The device model needs to create MemoryRegion
> > instances for the VIRTIO Shared Memory Regions
> > and add them to the `VirtIODevice` instance.
> >
> > Signed-off-by: Albert Esteve <aesteve@redhat.com>
> > ---
> > docs/interop/vhost-user.rst | 27 +++++
> > hw/virtio/vhost-user.c | 122 ++++++++++++++++++++++
> > hw/virtio/virtio.c | 12 +++
> > include/hw/virtio/virtio.h | 5 +
> > subprojects/libvhost-user/libvhost-user.c | 65 ++++++++++++
> > subprojects/libvhost-user/libvhost-user.h | 53 ++++++++++
> > 6 files changed, 284 insertions(+)
> >
> > diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
> > index d8419fd2f1..d52ba719d5 100644
> > --- a/docs/interop/vhost-user.rst
> > +++ b/docs/interop/vhost-user.rst
> > @@ -1859,6 +1859,33 @@ is sent by the front-end.
> > when the operation is successful, or non-zero otherwise. Note that if
> the
> > operation fails, no fd is sent to the backend.
> >
> > +``VHOST_USER_BACKEND_SHMEM_MAP``
> > + :id: 9
> > + :equivalent ioctl: N/A
> > + :request payload: fd and ``struct VhostUserMMap``
> > + :reply payload: N/A
> > +
> > + This message can be submitted by the backends to advertise a new
> mapping
> > + to be made in a given VIRTIO Shared Memory Region. Upon receiving the
> message,
> > + The front-end will mmap the given fd into the VIRTIO Shared Memory
> Region
> > + with the requested ``shmid``. A reply is generated indicating whether
> mapping
> > + succeeded.
> > +
> > + Mapping over an already existing map is not allowed and request shall
> fail.
> > + Therefore, the memory range in the request must correspond with a
> valid,
> > + free region of the VIRTIO Shared Memory Region.
> > +
> > +``VHOST_USER_BACKEND_SHMEM_UNMAP``
> > + :id: 10
> > + :equivalent ioctl: N/A
> > + :request payload: ``struct VhostUserMMap``
> > + :reply payload: N/A
> > +
> > + This message can be submitted by the backends so that the front-end
> un-mmap
> > + a given range (``offset``, ``len``) in the VIRTIO Shared Memory
> Region with
>
> s/offset/shm_offset/
>
> > + the requested ``shmid``.
>
> Please clarify that <offset, len> must correspond to the entirety of a
> valid mapped region.
>
> By the way, the VIRTIO 1.3 gives the following behavior for the virtiofs
> DAX Window:
>
> When a FUSE_SETUPMAPPING request perfectly overlaps a previous
> mapping, the previous mapping is replaced. When a mapping partially
> overlaps a previous mapping, the previous mapping is split into one or
> two smaller mappings. When a mapping is partially unmapped it is also
> split into one or two smaller mappings.
>
> Establishing new mappings or splitting existing mappings consumes
> resources. If the device runs out of resources the FUSE_SETUPMAPPING
> request fails until resources are available again following
> FUSE_REMOVEMAPPING.
>
> I think SETUPMAPPING/REMOVMAPPING can be implemented using
> SHMEM_MAP/UNMAP. SHMEM_MAP/UNMAP do not allow atomically replacing
> partial ranges, but as far as I know that's not necessary for virtiofs
> in practice.
>
> It's worth mentioning that mappings consume resources and that SHMEM_MAP
> can fail when there are no resources available. The process-wide limit
> is vm.max_map_count on Linux although a vhost-user frontend may reduce
> it further to control vhost-user resource usage.
>
> > + A reply is generated indicating whether unmapping succeeded.
> > +
> > .. _reply_ack:
> >
> > VHOST_USER_PROTOCOL_F_REPLY_ACK
> > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > index cdf9af4a4b..7ee8a472c6 100644
> > --- a/hw/virtio/vhost-user.c
> > +++ b/hw/virtio/vhost-user.c
> > @@ -115,6 +115,8 @@ typedef enum VhostUserBackendRequest {
> > VHOST_USER_BACKEND_SHARED_OBJECT_ADD = 6,
> > VHOST_USER_BACKEND_SHARED_OBJECT_REMOVE = 7,
> > VHOST_USER_BACKEND_SHARED_OBJECT_LOOKUP = 8,
> > + VHOST_USER_BACKEND_SHMEM_MAP = 9,
> > + VHOST_USER_BACKEND_SHMEM_UNMAP = 10,
> > VHOST_USER_BACKEND_MAX
> > } VhostUserBackendRequest;
> >
> > @@ -192,6 +194,24 @@ typedef struct VhostUserShared {
> > unsigned char uuid[16];
> > } VhostUserShared;
> >
> > +/* For the flags field of VhostUserMMap */
> > +#define VHOST_USER_FLAG_MAP_R (1u << 0)
> > +#define VHOST_USER_FLAG_MAP_W (1u << 1)
> > +
> > +typedef struct {
> > + /* VIRTIO Shared Memory Region ID */
> > + uint8_t shmid;
> > + uint8_t padding[7];
> > + /* File offset */
> > + uint64_t fd_offset;
> > + /* Offset within the VIRTIO Shared Memory Region */
> > + uint64_t shm_offset;
> > + /* Size of the mapping */
> > + uint64_t len;
> > + /* Flags for the mmap operation, from VHOST_USER_FLAG_* */
> > + uint64_t flags;
> > +} VhostUserMMap;
> > +
> > typedef struct {
> > VhostUserRequest request;
> >
> > @@ -224,6 +244,7 @@ typedef union {
> > VhostUserInflight inflight;
> > VhostUserShared object;
> > VhostUserTransferDeviceState transfer_state;
> > + VhostUserMMap mmap;
> > } VhostUserPayload;
> >
> > typedef struct VhostUserMsg {
> > @@ -1748,6 +1769,100 @@
> vhost_user_backend_handle_shared_object_lookup(struct vhost_user *u,
> > return 0;
> > }
> >
> > +static int
> > +vhost_user_backend_handle_shmem_map(struct vhost_dev *dev,
> > + VhostUserMMap *vu_mmap,
> > + int fd)
> > +{
> > + void *addr = 0;
> > + MemoryRegion *mr = NULL;
> > +
> > + if (fd < 0) {
> > + error_report("Bad fd for map");
> > + return -EBADF;
> > + }
> > +
> > + if (!dev->vdev->shmem_list ||
> > + dev->vdev->n_shmem_regions <= vu_mmap->shmid) {
> > + error_report("Device only has %d VIRTIO Shared Memory Regions. "
> > + "Requested ID: %d",
> > + dev->vdev->n_shmem_regions, vu_mmap->shmid);
> > + return -EFAULT;
> > + }
> > +
> > + mr = &dev->vdev->shmem_list[vu_mmap->shmid];
> > +
> > + if (!mr) {
> > + error_report("VIRTIO Shared Memory Region at "
> > + "ID %d unitialized", vu_mmap->shmid);
> > + return -EFAULT;
> > + }
> > +
> > + if ((vu_mmap->shm_offset + vu_mmap->len) < vu_mmap->len ||
> > + (vu_mmap->shm_offset + vu_mmap->len) > mr->size) {
> > + error_report("Bad offset/len for mmap %" PRIx64 "+%" PRIx64,
> > + vu_mmap->shm_offset, vu_mmap->len);
> > + return -EFAULT;
> > + }
> > +
> > + void *shmem_ptr = memory_region_get_ram_ptr(mr);
> > +
> > + addr = mmap(shmem_ptr + vu_mmap->shm_offset, vu_mmap->len,
>
> Missing check for overlap between range [shm_offset, shm_offset + len)
> and existing mappings.
>
Not sure how to do this check. Specifically, I am not sure how previous
ranges are stored within the MemoryRegion. Is looping through mr->subregions
a valid option?
>
> > + ((vu_mmap->flags & VHOST_USER_FLAG_MAP_R) ? PROT_READ : 0) |
> > + ((vu_mmap->flags & VHOST_USER_FLAG_MAP_W) ? PROT_WRITE : 0),
> > + MAP_SHARED | MAP_FIXED, fd, vu_mmap->fd_offset);
> > +
> > + if (addr == MAP_FAILED) {
> > + error_report("Failed to mmap mem fd");
> > + return -EFAULT;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static int
> > +vhost_user_backend_handle_shmem_unmap(struct vhost_dev *dev,
> > + VhostUserMMap *vu_mmap)
> > +{
> > + void *addr = 0;
> > + MemoryRegion *mr = NULL;
> > +
> > + if (!dev->vdev->shmem_list ||
> > + dev->vdev->n_shmem_regions <= vu_mmap->shmid) {
> > + error_report("Device only has %d VIRTIO Shared Memory Regions. "
> > + "Requested ID: %d",
> > + dev->vdev->n_shmem_regions, vu_mmap->shmid);
> > + return -EFAULT;
> > + }
> > +
> > + mr = &dev->vdev->shmem_list[vu_mmap->shmid];
> > +
> > + if (!mr) {
> > + error_report("VIRTIO Shared Memory Region at "
> > + "ID %d unitialized", vu_mmap->shmid);
> > + return -EFAULT;
> > + }
> > +
> > + if ((vu_mmap->shm_offset + vu_mmap->len) < vu_mmap->len ||
> > + (vu_mmap->shm_offset + vu_mmap->len) > mr->size) {
> > + error_report("Bad offset/len for mmap %" PRIx64 "+%" PRIx64,
> > + vu_mmap->shm_offset, vu_mmap->len);
> > + return -EFAULT;
> > + }
> > +
> > + void *shmem_ptr = memory_region_get_ram_ptr(mr);
> > +
> > + addr = mmap(shmem_ptr + vu_mmap->shm_offset, vu_mmap->len,
>
> Missing check for existing mapping with exact range [shm_offset, len)
> match.
>
> > + PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1,
> 0);
> > +
> > + if (addr == MAP_FAILED) {
> > + error_report("Failed to unmap memory");
> > + return -EFAULT;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > static void close_backend_channel(struct vhost_user *u)
> > {
> > g_source_destroy(u->backend_src);
> > @@ -1816,6 +1931,13 @@ static gboolean backend_read(QIOChannel *ioc,
> GIOCondition condition,
> > ret =
> vhost_user_backend_handle_shared_object_lookup(dev->opaque, ioc,
> > &hdr,
> &payload);
> > break;
> > + case VHOST_USER_BACKEND_SHMEM_MAP:
> > + ret = vhost_user_backend_handle_shmem_map(dev, &payload.mmap,
> > + fd ? fd[0] : -1);
> > + break;
> > + case VHOST_USER_BACKEND_SHMEM_UNMAP:
> > + ret = vhost_user_backend_handle_shmem_unmap(dev, &payload.mmap);
> > + break;
> > default:
> > error_report("Received unexpected msg type: %d.", hdr.request);
> > ret = -EINVAL;
> > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> > index 893a072c9d..9f2da5b11e 100644
> > --- a/hw/virtio/virtio.c
> > +++ b/hw/virtio/virtio.c
> > @@ -2856,6 +2856,16 @@ int virtio_save(VirtIODevice *vdev, QEMUFile *f)
> > return vmstate_save_state(f, &vmstate_virtio, vdev, NULL);
> > }
> >
> > +MemoryRegion *virtio_new_shmem_region(VirtIODevice *vdev)
> > +{
> > + MemoryRegion *mr = g_new0(MemoryRegion, 1);
> > + ++vdev->n_shmem_regions;
> > + vdev->shmem_list = g_renew(MemoryRegion, vdev->shmem_list,
> > + vdev->n_shmem_regions);
>
> Where is shmem_list freed?
>
> The name "list" is misleading since this is an array, not a list.
>
> > + vdev->shmem_list[vdev->n_shmem_regions - 1] = *mr;
> > + return mr;
> > +}
>
> This looks weird. The contents of mr are copied into shmem_list[] and
> then the pointer to mr is returned? Did you mean for the field's type to
> be MemoryRegion **shmem_list and then vdev->shmem_list[...] = mr would
> stash the pointer?
>
> > +
> > /* A wrapper for use as a VMState .put function */
> > static int virtio_device_put(QEMUFile *f, void *opaque, size_t size,
> > const VMStateField *field, JSONWriter
> *vmdesc)
> > @@ -3264,6 +3274,8 @@ void virtio_init(VirtIODevice *vdev, uint16_t
> device_id, size_t config_size)
> > virtio_vmstate_change, vdev);
> > vdev->device_endian = virtio_default_endian();
> > vdev->use_guest_notifier_mask = true;
> > + vdev->shmem_list = NULL;
> > + vdev->n_shmem_regions = 0;
> > }
> >
> > /*
> > diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> > index 7d5ffdc145..16d598aadc 100644
> > --- a/include/hw/virtio/virtio.h
> > +++ b/include/hw/virtio/virtio.h
> > @@ -165,6 +165,9 @@ struct VirtIODevice
> > */
> > EventNotifier config_notifier;
> > bool device_iotlb_enabled;
> > + /* Shared memory region for vhost-user mappings. */
> > + MemoryRegion *shmem_list;
> > + int n_shmem_regions;
> > };
> >
> > struct VirtioDeviceClass {
> > @@ -280,6 +283,8 @@ void virtio_notify(VirtIODevice *vdev, VirtQueue
> *vq);
> >
> > int virtio_save(VirtIODevice *vdev, QEMUFile *f);
> >
> > +MemoryRegion *virtio_new_shmem_region(VirtIODevice *vdev);
> > +
> > extern const VMStateInfo virtio_vmstate_info;
> >
> > #define VMSTATE_VIRTIO_DEVICE \
> > diff --git a/subprojects/libvhost-user/libvhost-user.c
> b/subprojects/libvhost-user/libvhost-user.c
> > index a879149fef..28556d183a 100644
> > --- a/subprojects/libvhost-user/libvhost-user.c
> > +++ b/subprojects/libvhost-user/libvhost-user.c
> > @@ -1586,6 +1586,71 @@ vu_rm_shared_object(VuDev *dev, unsigned char
> uuid[UUID_LEN])
> > return vu_send_message(dev, &msg);
> > }
> >
> > +bool
> > +vu_shmem_map(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
> > + uint64_t shm_offset, uint64_t len, uint64_t flags)
> > +{
> > + bool result = false;
> > + VhostUserMsg msg_reply;
> > + VhostUserMsg vmsg = {
> > + .request = VHOST_USER_BACKEND_SHMEM_MAP,
> > + .size = sizeof(vmsg.payload.mmap),
> > + .flags = VHOST_USER_VERSION,
> > + .payload.mmap = {
> > + .shmid = shmid,
> > + .fd_offset = fd_offset,
> > + .shm_offset = shm_offset,
> > + .len = len,
> > + .flags = flags,
> > + },
> > + };
> > +
> > + if (vu_has_protocol_feature(dev, VHOST_USER_PROTOCOL_F_REPLY_ACK)) {
> > + vmsg.flags |= VHOST_USER_NEED_REPLY_MASK;
> > + }
> > +
> > + pthread_mutex_lock(&dev->backend_mutex);
> > + if (!vu_message_write(dev, dev->backend_fd, &vmsg)) {
> > + pthread_mutex_unlock(&dev->backend_mutex);
> > + return false;
> > + }
> > +
> > + /* Also unlocks the backend_mutex */
> > + return vu_process_message_reply(dev, &vmsg);
> > +}
> > +
> > +bool
> > +vu_shmem_unmap(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
> > + uint64_t shm_offset, uint64_t len)
> > +{
> > + bool result = false;
> > + VhostUserMsg msg_reply;
> > + VhostUserMsg vmsg = {
> > + .request = VHOST_USER_BACKEND_SHMEM_UNMAP,
> > + .size = sizeof(vmsg.payload.mmap),
> > + .flags = VHOST_USER_VERSION,
> > + .payload.mmap = {
> > + .shmid = shmid,
> > + .fd_offset = fd_offset,
>
> What is the meaning of this field? I expected it to be set to 0.
>
> > + .shm_offset = shm_offset,
> > + .len = len,
> > + },
> > + };
> > +
> > + if (vu_has_protocol_feature(dev, VHOST_USER_PROTOCOL_F_REPLY_ACK)) {
> > + vmsg.flags |= VHOST_USER_NEED_REPLY_MASK;
> > + }
> > +
> > + pthread_mutex_lock(&dev->backend_mutex);
> > + if (!vu_message_write(dev, dev->backend_fd, &vmsg)) {
> > + pthread_mutex_unlock(&dev->backend_mutex);
> > + return false;
> > + }
> > +
> > + /* Also unlocks the backend_mutex */
> > + return vu_process_message_reply(dev, &vmsg);
> > +}
> > +
> > static bool
> > vu_set_vring_call_exec(VuDev *dev, VhostUserMsg *vmsg)
> > {
> > diff --git a/subprojects/libvhost-user/libvhost-user.h
> b/subprojects/libvhost-user/libvhost-user.h
> > index deb40e77b3..7f6c22cc1a 100644
> > --- a/subprojects/libvhost-user/libvhost-user.h
> > +++ b/subprojects/libvhost-user/libvhost-user.h
> > @@ -127,6 +127,8 @@ typedef enum VhostUserBackendRequest {
> > VHOST_USER_BACKEND_SHARED_OBJECT_ADD = 6,
> > VHOST_USER_BACKEND_SHARED_OBJECT_REMOVE = 7,
> > VHOST_USER_BACKEND_SHARED_OBJECT_LOOKUP = 8,
> > + VHOST_USER_BACKEND_SHMEM_MAP = 9,
> > + VHOST_USER_BACKEND_SHMEM_UNMAP = 10,
> > VHOST_USER_BACKEND_MAX
> > } VhostUserBackendRequest;
> >
> > @@ -186,6 +188,24 @@ typedef struct VhostUserShared {
> > unsigned char uuid[UUID_LEN];
> > } VhostUserShared;
> >
> > +/* For the flags field of VhostUserMMap */
> > +#define VHOST_USER_FLAG_MAP_R (1u << 0)
> > +#define VHOST_USER_FLAG_MAP_W (1u << 1)
> > +
> > +typedef struct {
> > + /* VIRTIO Shared Memory Region ID */
> > + uint8_t shmid;
> > + uint8_t padding[7];
> > + /* File offset */
> > + uint64_t fd_offset;
> > + /* Offset within the VIRTIO Shared Memory Region */
> > + uint64_t shm_offset;
> > + /* Size of the mapping */
> > + uint64_t len;
> > + /* Flags for the mmap operation, from VHOST_USER_FLAG_* */
> > + uint64_t flags;
> > +} VhostUserMMap;
> > +
> > #if defined(_WIN32) && (defined(__x86_64__) || defined(__i386__))
> > # define VU_PACKED __attribute__((gcc_struct, packed))
> > #else
> > @@ -214,6 +234,7 @@ typedef struct VhostUserMsg {
> > VhostUserVringArea area;
> > VhostUserInflight inflight;
> > VhostUserShared object;
> > + VhostUserMMap mmap;
> > } payload;
> >
> > int fds[VHOST_MEMORY_BASELINE_NREGIONS];
> > @@ -597,6 +618,38 @@ bool vu_add_shared_object(VuDev *dev, unsigned char
> uuid[UUID_LEN]);
> > */
> > bool vu_rm_shared_object(VuDev *dev, unsigned char uuid[UUID_LEN]);
> >
> > +/**
> > + * vu_shmem_map:
> > + * @dev: a VuDev context
> > + * @shmid: VIRTIO Shared Memory Region ID
> > + * @fd_offset: File offset
> > + * @shm_offset: Offset within the VIRTIO Shared Memory Region
> > + * @len: Size of the mapping
> > + * @flags: Flags for the mmap operation
> > + *
> > + * Advertises a new mapping to be made in a given VIRTIO Shared Memory
> Region.
> > + *
> > + * Returns: TRUE on success, FALSE on failure.
> > + */
> > +bool vu_shmem_map(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
> > + uint64_t shm_offset, uint64_t len, uint64_t flags);
> > +
> > +/**
> > + * vu_shmem_map:
> > + * @dev: a VuDev context
> > + * @shmid: VIRTIO Shared Memory Region ID
> > + * @fd_offset: File offset
> > + * @shm_offset: Offset within the VIRTIO Shared Memory Region
> > + * @len: Size of the mapping
> > + *
> > + * The front-end un-mmaps a given range in the VIRTIO Shared Memory
> Region
> > + * with the requested `shmid`.
> > + *
> > + * Returns: TRUE on success, FALSE on failure.
> > + */
> > +bool vu_shmem_unmap(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
> > + uint64_t shm_offset, uint64_t len);
> > +
> > /**
> > * vu_queue_set_notification:
> > * @dev: a VuDev context
> > --
> > 2.45.2
> >
>
* Re: [RFC PATCH v2 1/5] vhost-user: Add VIRTIO Shared Memory map request
2024-09-03 9:54 ` Albert Esteve
@ 2024-09-03 11:54 ` Albert Esteve
2024-09-05 16:45 ` Stefan Hajnoczi
0 siblings, 1 reply; 36+ messages in thread
From: Albert Esteve @ 2024-09-03 11:54 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, jasowang, david, slp, Alex Bennée,
Michael S. Tsirkin
On Tue, Sep 3, 2024 at 11:54 AM Albert Esteve <aesteve@redhat.com> wrote:
>
>
> On Thu, Jul 11, 2024 at 9:45 AM Stefan Hajnoczi <stefanha@redhat.com>
> wrote:
>
>> On Fri, Jun 28, 2024 at 04:57:06PM +0200, Albert Esteve wrote:
>> > Add SHMEM_MAP/UNMAP requests to vhost-user to
>> > handle VIRTIO Shared Memory mappings.
>> >
>> > This request allows backends to dynamically map
>> > fds into a VIRTIO Shared Memory Region indentified
>> > by its `shmid`. Then, the fd memory is advertised
>> > to the driver as a base addres + offset, so it
>> > can be read/written (depending on the mmap flags
>> > requested) while its valid.
>> >
>> > The backend can munmap the memory range
>> > in a given VIRTIO Shared Memory Region (again,
>> > identified by its `shmid`), to free it. Upon
>> > receiving this message, the front-end must
>> > mmap the regions with PROT_NONE to reserve
>> > the virtual memory space.
>> >
>> > The device model needs to create MemoryRegion
>> > instances for the VIRTIO Shared Memory Regions
>> > and add them to the `VirtIODevice` instance.
>> >
>> > Signed-off-by: Albert Esteve <aesteve@redhat.com>
>> > ---
>> > docs/interop/vhost-user.rst | 27 +++++
>> > hw/virtio/vhost-user.c | 122 ++++++++++++++++++++++
>> > hw/virtio/virtio.c | 12 +++
>> > include/hw/virtio/virtio.h | 5 +
>> > subprojects/libvhost-user/libvhost-user.c | 65 ++++++++++++
>> > subprojects/libvhost-user/libvhost-user.h | 53 ++++++++++
>> > 6 files changed, 284 insertions(+)
>> >
>> > diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
>> > index d8419fd2f1..d52ba719d5 100644
>> > --- a/docs/interop/vhost-user.rst
>> > +++ b/docs/interop/vhost-user.rst
>> > @@ -1859,6 +1859,33 @@ is sent by the front-end.
>> > when the operation is successful, or non-zero otherwise. Note that
>> if the
>> > operation fails, no fd is sent to the backend.
>> >
>> > +``VHOST_USER_BACKEND_SHMEM_MAP``
>> > + :id: 9
>> > + :equivalent ioctl: N/A
>> > + :request payload: fd and ``struct VhostUserMMap``
>> > + :reply payload: N/A
>> > +
>> > + This message can be submitted by the backends to advertise a new
>> mapping
>> > + to be made in a given VIRTIO Shared Memory Region. Upon receiving
>> the message,
>> > + The front-end will mmap the given fd into the VIRTIO Shared Memory
>> Region
>> > + with the requested ``shmid``. A reply is generated indicating
>> whether mapping
>> > + succeeded.
>> > +
>> > + Mapping over an already existing map is not allowed and request
>> shall fail.
>> > + Therefore, the memory range in the request must correspond with a
>> valid,
>> > + free region of the VIRTIO Shared Memory Region.
>> > +
>> > +``VHOST_USER_BACKEND_SHMEM_UNMAP``
>> > + :id: 10
>> > + :equivalent ioctl: N/A
>> > + :request payload: ``struct VhostUserMMap``
>> > + :reply payload: N/A
>> > +
>> > + This message can be submitted by the backends so that the front-end
>> un-mmap
>> > + a given range (``offset``, ``len``) in the VIRTIO Shared Memory
>> Region with
>>
>> s/offset/shm_offset/
>>
>> > + the requested ``shmid``.
>>
>> Please clarify that <offset, len> must correspond to the entirety of a
>> valid mapped region.
>>
>> By the way, the VIRTIO 1.3 gives the following behavior for the virtiofs
>> DAX Window:
>>
>> When a FUSE_SETUPMAPPING request perfectly overlaps a previous
>> mapping, the previous mapping is replaced. When a mapping partially
>> overlaps a previous mapping, the previous mapping is split into one or
>> two smaller mappings. When a mapping is partially unmapped it is also
>> split into one or two smaller mappings.
>>
>> Establishing new mappings or splitting existing mappings consumes
>> resources. If the device runs out of resources the FUSE_SETUPMAPPING
>> request fails until resources are available again following
>> FUSE_REMOVEMAPPING.
>>
>> I think SETUPMAPPING/REMOVMAPPING can be implemented using
>> SHMEM_MAP/UNMAP. SHMEM_MAP/UNMAP do not allow atomically replacing
>> partial ranges, but as far as I know that's not necessary for virtiofs
>> in practice.
>>
>> It's worth mentioning that mappings consume resources and that SHMEM_MAP
>> can fail when there are no resources available. The process-wide limit
>> is vm.max_map_count on Linux although a vhost-user frontend may reduce
>> it further to control vhost-user resource usage.
>>
>> > + A reply is generated indicating whether unmapping succeeded.
>> > +
>> > .. _reply_ack:
>> >
>> > VHOST_USER_PROTOCOL_F_REPLY_ACK
>> > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
>> > index cdf9af4a4b..7ee8a472c6 100644
>> > --- a/hw/virtio/vhost-user.c
>> > +++ b/hw/virtio/vhost-user.c
>> > @@ -115,6 +115,8 @@ typedef enum VhostUserBackendRequest {
>> > VHOST_USER_BACKEND_SHARED_OBJECT_ADD = 6,
>> > VHOST_USER_BACKEND_SHARED_OBJECT_REMOVE = 7,
>> > VHOST_USER_BACKEND_SHARED_OBJECT_LOOKUP = 8,
>> > + VHOST_USER_BACKEND_SHMEM_MAP = 9,
>> > + VHOST_USER_BACKEND_SHMEM_UNMAP = 10,
>> > VHOST_USER_BACKEND_MAX
>> > } VhostUserBackendRequest;
>> >
>> > @@ -192,6 +194,24 @@ typedef struct VhostUserShared {
>> > unsigned char uuid[16];
>> > } VhostUserShared;
>> >
>> > +/* For the flags field of VhostUserMMap */
>> > +#define VHOST_USER_FLAG_MAP_R (1u << 0)
>> > +#define VHOST_USER_FLAG_MAP_W (1u << 1)
>> > +
>> > +typedef struct {
>> > + /* VIRTIO Shared Memory Region ID */
>> > + uint8_t shmid;
>> > + uint8_t padding[7];
>> > + /* File offset */
>> > + uint64_t fd_offset;
>> > + /* Offset within the VIRTIO Shared Memory Region */
>> > + uint64_t shm_offset;
>> > + /* Size of the mapping */
>> > + uint64_t len;
>> > + /* Flags for the mmap operation, from VHOST_USER_FLAG_* */
>> > + uint64_t flags;
>> > +} VhostUserMMap;
>> > +
>> > typedef struct {
>> > VhostUserRequest request;
>> >
>> > @@ -224,6 +244,7 @@ typedef union {
>> > VhostUserInflight inflight;
>> > VhostUserShared object;
>> > VhostUserTransferDeviceState transfer_state;
>> > + VhostUserMMap mmap;
>> > } VhostUserPayload;
>> >
>> > typedef struct VhostUserMsg {
>> > @@ -1748,6 +1769,100 @@
>> vhost_user_backend_handle_shared_object_lookup(struct vhost_user *u,
>> > return 0;
>> > }
>> >
>> > +static int
>> > +vhost_user_backend_handle_shmem_map(struct vhost_dev *dev,
>> > + VhostUserMMap *vu_mmap,
>> > + int fd)
>> > +{
>> > + void *addr = 0;
>> > + MemoryRegion *mr = NULL;
>> > +
>> > + if (fd < 0) {
>> > + error_report("Bad fd for map");
>> > + return -EBADF;
>> > + }
>> > +
>> > + if (!dev->vdev->shmem_list ||
>> > + dev->vdev->n_shmem_regions <= vu_mmap->shmid) {
>> > + error_report("Device only has %d VIRTIO Shared Memory Regions.
>> "
>> > + "Requested ID: %d",
>> > + dev->vdev->n_shmem_regions, vu_mmap->shmid);
>> > + return -EFAULT;
>> > + }
>> > +
>> > + mr = &dev->vdev->shmem_list[vu_mmap->shmid];
>> > +
>> > + if (!mr) {
>> > + error_report("VIRTIO Shared Memory Region at "
>> > + "ID %d unitialized", vu_mmap->shmid);
>> > + return -EFAULT;
>> > + }
>> > +
>> > + if ((vu_mmap->shm_offset + vu_mmap->len) < vu_mmap->len ||
>> > + (vu_mmap->shm_offset + vu_mmap->len) > mr->size) {
>> > + error_report("Bad offset/len for mmap %" PRIx64 "+%" PRIx64,
>> > + vu_mmap->shm_offset, vu_mmap->len);
>> > + return -EFAULT;
>> > + }
>> > +
>> > + void *shmem_ptr = memory_region_get_ram_ptr(mr);
>> > +
>> > + addr = mmap(shmem_ptr + vu_mmap->shm_offset, vu_mmap->len,
>>
>> Missing check for overlap between range [shm_offset, shm_offset + len)
>> and existing mappings.
>>
>
> Not sure how to do this check. Specifically, I am not sure how previous
> ranges are stored within the MemoryRegion. Is looping through
> mr->subregions
> a valid option?
>
Maybe something like this would do?
```
if (memory_region_find(mr, vu_mmap->shm_offset, vu_mmap->len).mr) {
    error_report("Requested memory (%" PRIx64 "+%" PRIx64 ") overlaps "
                 "with previously mapped memory",
                 vu_mmap->shm_offset, vu_mmap->len);
    return -EFAULT;
}
```
>
>
>>
>> > + ((vu_mmap->flags & VHOST_USER_FLAG_MAP_R) ? PROT_READ : 0) |
>> > + ((vu_mmap->flags & VHOST_USER_FLAG_MAP_W) ? PROT_WRITE : 0),
>> > + MAP_SHARED | MAP_FIXED, fd, vu_mmap->fd_offset);
>> > +
>> > + if (addr == MAP_FAILED) {
>> > + error_report("Failed to mmap mem fd");
>> > + return -EFAULT;
>> > + }
>> > +
>> > + return 0;
>> > +}
>> > +
>> > +static int
>> > +vhost_user_backend_handle_shmem_unmap(struct vhost_dev *dev,
>> > + VhostUserMMap *vu_mmap)
>> > +{
>> > + void *addr = 0;
>> > + MemoryRegion *mr = NULL;
>> > +
>> > + if (!dev->vdev->shmem_list ||
>> > + dev->vdev->n_shmem_regions <= vu_mmap->shmid) {
>> > + error_report("Device only has %d VIRTIO Shared Memory Regions.
>> "
>> > + "Requested ID: %d",
>> > + dev->vdev->n_shmem_regions, vu_mmap->shmid);
>> > + return -EFAULT;
>> > + }
>> > +
>> > + mr = &dev->vdev->shmem_list[vu_mmap->shmid];
>> > +
>> > + if (!mr) {
>> > + error_report("VIRTIO Shared Memory Region at "
>> > + "ID %d unitialized", vu_mmap->shmid);
>> > + return -EFAULT;
>> > + }
>> > +
>> > + if ((vu_mmap->shm_offset + vu_mmap->len) < vu_mmap->len ||
>> > + (vu_mmap->shm_offset + vu_mmap->len) > mr->size) {
>> > + error_report("Bad offset/len for mmap %" PRIx64 "+%" PRIx64,
>> > + vu_mmap->shm_offset, vu_mmap->len);
>> > + return -EFAULT;
>> > + }
>> > +
>> > + void *shmem_ptr = memory_region_get_ram_ptr(mr);
>> > +
>> > + addr = mmap(shmem_ptr + vu_mmap->shm_offset, vu_mmap->len,
>>
>> Missing check for existing mapping with exact range [shm_offset, len)
>> match.
>>
>> > + PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
>> -1, 0);
>> > +
>> > + if (addr == MAP_FAILED) {
>> > + error_report("Failed to unmap memory");
>> > + return -EFAULT;
>> > + }
>> > +
>> > + return 0;
>> > +}
>> > +
>> > static void close_backend_channel(struct vhost_user *u)
>> > {
>> > g_source_destroy(u->backend_src);
>> > @@ -1816,6 +1931,13 @@ static gboolean backend_read(QIOChannel *ioc,
>> GIOCondition condition,
>> > ret =
>> vhost_user_backend_handle_shared_object_lookup(dev->opaque, ioc,
>> > &hdr,
>> &payload);
>> > break;
>> > + case VHOST_USER_BACKEND_SHMEM_MAP:
>> > + ret = vhost_user_backend_handle_shmem_map(dev, &payload.mmap,
>> > + fd ? fd[0] : -1);
>> > + break;
>> > + case VHOST_USER_BACKEND_SHMEM_UNMAP:
>> > + ret = vhost_user_backend_handle_shmem_unmap(dev,
>> &payload.mmap);
>> > + break;
>> > default:
>> > error_report("Received unexpected msg type: %d.", hdr.request);
>> > ret = -EINVAL;
>> > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
>> > index 893a072c9d..9f2da5b11e 100644
>> > --- a/hw/virtio/virtio.c
>> > +++ b/hw/virtio/virtio.c
>> > @@ -2856,6 +2856,16 @@ int virtio_save(VirtIODevice *vdev, QEMUFile *f)
>> > return vmstate_save_state(f, &vmstate_virtio, vdev, NULL);
>> > }
>> >
>> > +MemoryRegion *virtio_new_shmem_region(VirtIODevice *vdev)
>> > +{
>> > + MemoryRegion *mr = g_new0(MemoryRegion, 1);
>> > + ++vdev->n_shmem_regions;
>> > + vdev->shmem_list = g_renew(MemoryRegion, vdev->shmem_list,
>> > + vdev->n_shmem_regions);
>>
>> Where is shmem_list freed?
>>
>> The name "list" is misleading since this is an array, not a list.
>>
>> > + vdev->shmem_list[vdev->n_shmem_regions - 1] = *mr;
>> > + return mr;
>> > +}
>>
>> This looks weird. The contents of mr are copied into shmem_list[] and
>> then the pointer to mr is returned? Did you mean for the field's type to
>> be MemoryRegion **shmem_list and then vdev->shmem_list[...] = mr would
>> stash the pointer?
>>
>> > +
>> > /* A wrapper for use as a VMState .put function */
>> > static int virtio_device_put(QEMUFile *f, void *opaque, size_t size,
>> > const VMStateField *field, JSONWriter
>> *vmdesc)
>> > @@ -3264,6 +3274,8 @@ void virtio_init(VirtIODevice *vdev, uint16_t
>> device_id, size_t config_size)
>> > virtio_vmstate_change, vdev);
>> > vdev->device_endian = virtio_default_endian();
>> > vdev->use_guest_notifier_mask = true;
>> > + vdev->shmem_list = NULL;
>> > + vdev->n_shmem_regions = 0;
>> > }
>> >
>> > /*
>> > diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
>> > index 7d5ffdc145..16d598aadc 100644
>> > --- a/include/hw/virtio/virtio.h
>> > +++ b/include/hw/virtio/virtio.h
>> > @@ -165,6 +165,9 @@ struct VirtIODevice
>> > */
>> > EventNotifier config_notifier;
>> > bool device_iotlb_enabled;
>> > + /* Shared memory region for vhost-user mappings. */
>> > + MemoryRegion *shmem_list;
>> > + int n_shmem_regions;
>> > };
>> >
>> > struct VirtioDeviceClass {
>> > @@ -280,6 +283,8 @@ void virtio_notify(VirtIODevice *vdev, VirtQueue
>> *vq);
>> >
>> > int virtio_save(VirtIODevice *vdev, QEMUFile *f);
>> >
>> > +MemoryRegion *virtio_new_shmem_region(VirtIODevice *vdev);
>> > +
>> > extern const VMStateInfo virtio_vmstate_info;
>> >
>> > #define VMSTATE_VIRTIO_DEVICE \
>> > diff --git a/subprojects/libvhost-user/libvhost-user.c
>> b/subprojects/libvhost-user/libvhost-user.c
>> > index a879149fef..28556d183a 100644
>> > --- a/subprojects/libvhost-user/libvhost-user.c
>> > +++ b/subprojects/libvhost-user/libvhost-user.c
>> > @@ -1586,6 +1586,71 @@ vu_rm_shared_object(VuDev *dev, unsigned char
>> uuid[UUID_LEN])
>> > return vu_send_message(dev, &msg);
>> > }
>> >
>> > +bool
>> > +vu_shmem_map(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
>> > + uint64_t shm_offset, uint64_t len, uint64_t flags)
>> > +{
>> > + bool result = false;
>> > + VhostUserMsg msg_reply;
>> > + VhostUserMsg vmsg = {
>> > + .request = VHOST_USER_BACKEND_SHMEM_MAP,
>> > + .size = sizeof(vmsg.payload.mmap),
>> > + .flags = VHOST_USER_VERSION,
>> > + .payload.mmap = {
>> > + .shmid = shmid,
>> > + .fd_offset = fd_offset,
>> > + .shm_offset = shm_offset,
>> > + .len = len,
>> > + .flags = flags,
>> > + },
>> > + };
>> > +
>> > + if (vu_has_protocol_feature(dev, VHOST_USER_PROTOCOL_F_REPLY_ACK))
>> {
>> > + vmsg.flags |= VHOST_USER_NEED_REPLY_MASK;
>> > + }
>> > +
>> > + pthread_mutex_lock(&dev->backend_mutex);
>> > + if (!vu_message_write(dev, dev->backend_fd, &vmsg)) {
>> > + pthread_mutex_unlock(&dev->backend_mutex);
>> > + return false;
>> > + }
>> > +
>> > + /* Also unlocks the backend_mutex */
>> > + return vu_process_message_reply(dev, &vmsg);
>> > +}
>> > +
>> > +bool
>> > +vu_shmem_unmap(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
>> > + uint64_t shm_offset, uint64_t len)
>> > +{
>> > + bool result = false;
>> > + VhostUserMsg msg_reply;
>> > + VhostUserMsg vmsg = {
>> > + .request = VHOST_USER_BACKEND_SHMEM_UNMAP,
>> > + .size = sizeof(vmsg.payload.mmap),
>> > + .flags = VHOST_USER_VERSION,
>> > + .payload.mmap = {
>> > + .shmid = shmid,
>> > + .fd_offset = fd_offset,
>>
>> What is the meaning of this field? I expected it to be set to 0.
>>
>> > + .shm_offset = shm_offset,
>> > + .len = len,
>> > + },
>> > + };
>> > +
>> > + if (vu_has_protocol_feature(dev, VHOST_USER_PROTOCOL_F_REPLY_ACK))
>> {
>> > + vmsg.flags |= VHOST_USER_NEED_REPLY_MASK;
>> > + }
>> > +
>> > + pthread_mutex_lock(&dev->backend_mutex);
>> > + if (!vu_message_write(dev, dev->backend_fd, &vmsg)) {
>> > + pthread_mutex_unlock(&dev->backend_mutex);
>> > + return false;
>> > + }
>> > +
>> > + /* Also unlocks the backend_mutex */
>> > + return vu_process_message_reply(dev, &vmsg);
>> > +}
>> > +
>> > static bool
>> > vu_set_vring_call_exec(VuDev *dev, VhostUserMsg *vmsg)
>> > {
>> > diff --git a/subprojects/libvhost-user/libvhost-user.h
>> b/subprojects/libvhost-user/libvhost-user.h
>> > index deb40e77b3..7f6c22cc1a 100644
>> > --- a/subprojects/libvhost-user/libvhost-user.h
>> > +++ b/subprojects/libvhost-user/libvhost-user.h
>> > @@ -127,6 +127,8 @@ typedef enum VhostUserBackendRequest {
>> > VHOST_USER_BACKEND_SHARED_OBJECT_ADD = 6,
>> > VHOST_USER_BACKEND_SHARED_OBJECT_REMOVE = 7,
>> > VHOST_USER_BACKEND_SHARED_OBJECT_LOOKUP = 8,
>> > + VHOST_USER_BACKEND_SHMEM_MAP = 9,
>> > + VHOST_USER_BACKEND_SHMEM_UNMAP = 10,
>> > VHOST_USER_BACKEND_MAX
>> > } VhostUserBackendRequest;
>> >
>> > @@ -186,6 +188,24 @@ typedef struct VhostUserShared {
>> > unsigned char uuid[UUID_LEN];
>> > } VhostUserShared;
>> >
>> > +/* For the flags field of VhostUserMMap */
>> > +#define VHOST_USER_FLAG_MAP_R (1u << 0)
>> > +#define VHOST_USER_FLAG_MAP_W (1u << 1)
>> > +
>> > +typedef struct {
>> > + /* VIRTIO Shared Memory Region ID */
>> > + uint8_t shmid;
>> > + uint8_t padding[7];
>> > + /* File offset */
>> > + uint64_t fd_offset;
>> > + /* Offset within the VIRTIO Shared Memory Region */
>> > + uint64_t shm_offset;
>> > + /* Size of the mapping */
>> > + uint64_t len;
>> > + /* Flags for the mmap operation, from VHOST_USER_FLAG_* */
>> > + uint64_t flags;
>> > +} VhostUserMMap;
>> > +
>> > #if defined(_WIN32) && (defined(__x86_64__) || defined(__i386__))
>> > # define VU_PACKED __attribute__((gcc_struct, packed))
>> > #else
>> > @@ -214,6 +234,7 @@ typedef struct VhostUserMsg {
>> > VhostUserVringArea area;
>> > VhostUserInflight inflight;
>> > VhostUserShared object;
>> > + VhostUserMMap mmap;
>> > } payload;
>> >
>> > int fds[VHOST_MEMORY_BASELINE_NREGIONS];
>> > @@ -597,6 +618,38 @@ bool vu_add_shared_object(VuDev *dev, unsigned
>> char uuid[UUID_LEN]);
>> > */
>> > bool vu_rm_shared_object(VuDev *dev, unsigned char uuid[UUID_LEN]);
>> >
>> > +/**
>> > + * vu_shmem_map:
>> > + * @dev: a VuDev context
>> > + * @shmid: VIRTIO Shared Memory Region ID
>> > + * @fd_offset: File offset
>> > + * @shm_offset: Offset within the VIRTIO Shared Memory Region
>> > + * @len: Size of the mapping
>> > + * @flags: Flags for the mmap operation
>> > + *
>> > + * Advertises a new mapping to be made in a given VIRTIO Shared Memory
>> Region.
>> > + *
>> > + * Returns: TRUE on success, FALSE on failure.
>> > + */
>> > +bool vu_shmem_map(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
>> > + uint64_t shm_offset, uint64_t len, uint64_t flags);
>> > +
>> > +/**
>> > + * vu_shmem_map:
>> > + * @dev: a VuDev context
>> > + * @shmid: VIRTIO Shared Memory Region ID
>> > + * @fd_offset: File offset
>> > + * @shm_offset: Offset within the VIRTIO Shared Memory Region
>> > + * @len: Size of the mapping
>> > + *
>> > + * The front-end un-mmaps a given range in the VIRTIO Shared Memory
>> Region
>> > + * with the requested `shmid`.
>> > + *
>> > + * Returns: TRUE on success, FALSE on failure.
>> > + */
>> > +bool vu_shmem_unmap(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
>> > + uint64_t shm_offset, uint64_t len);
>> > +
>> > /**
>> > * vu_queue_set_notification:
>> > * @dev: a VuDev context
>> > --
>> > 2.45.2
>> >
>>
>
* Re: [RFC PATCH v2 1/5] vhost-user: Add VIRTIO Shared Memory map request
2024-09-03 11:54 ` Albert Esteve
@ 2024-09-05 16:45 ` Stefan Hajnoczi
2024-09-11 11:57 ` Albert Esteve
0 siblings, 1 reply; 36+ messages in thread
From: Stefan Hajnoczi @ 2024-09-05 16:45 UTC (permalink / raw)
To: Albert Esteve
Cc: qemu-devel, jasowang, david, slp, Alex Bennée,
Michael S. Tsirkin
On Tue, Sep 03, 2024 at 01:54:12PM +0200, Albert Esteve wrote:
> On Tue, Sep 3, 2024 at 11:54 AM Albert Esteve <aesteve@redhat.com> wrote:
>
> >
> >
> > On Thu, Jul 11, 2024 at 9:45 AM Stefan Hajnoczi <stefanha@redhat.com>
> > wrote:
> >
> >> On Fri, Jun 28, 2024 at 04:57:06PM +0200, Albert Esteve wrote:
> >> > Add SHMEM_MAP/UNMAP requests to vhost-user to
> >> > handle VIRTIO Shared Memory mappings.
> >> >
> >> > This request allows backends to dynamically map
> >> > fds into a VIRTIO Shared Memory Region indentified
> >> > by its `shmid`. Then, the fd memory is advertised
> >> > to the driver as a base addres + offset, so it
> >> > can be read/written (depending on the mmap flags
> >> > requested) while its valid.
> >> >
> >> > The backend can munmap the memory range
> >> > in a given VIRTIO Shared Memory Region (again,
> >> > identified by its `shmid`), to free it. Upon
> >> > receiving this message, the front-end must
> >> > mmap the regions with PROT_NONE to reserve
> >> > the virtual memory space.
> >> >
> >> > The device model needs to create MemoryRegion
> >> > instances for the VIRTIO Shared Memory Regions
> >> > and add them to the `VirtIODevice` instance.
> >> >
> >> > Signed-off-by: Albert Esteve <aesteve@redhat.com>
> >> > ---
> >> > docs/interop/vhost-user.rst | 27 +++++
> >> > hw/virtio/vhost-user.c | 122 ++++++++++++++++++++++
> >> > hw/virtio/virtio.c | 12 +++
> >> > include/hw/virtio/virtio.h | 5 +
> >> > subprojects/libvhost-user/libvhost-user.c | 65 ++++++++++++
> >> > subprojects/libvhost-user/libvhost-user.h | 53 ++++++++++
> >> > 6 files changed, 284 insertions(+)
> >> >
> >> > diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
> >> > index d8419fd2f1..d52ba719d5 100644
> >> > --- a/docs/interop/vhost-user.rst
> >> > +++ b/docs/interop/vhost-user.rst
> >> > @@ -1859,6 +1859,33 @@ is sent by the front-end.
> >> > when the operation is successful, or non-zero otherwise. Note that
> >> if the
> >> > operation fails, no fd is sent to the backend.
> >> >
> >> > +``VHOST_USER_BACKEND_SHMEM_MAP``
> >> > + :id: 9
> >> > + :equivalent ioctl: N/A
> >> > + :request payload: fd and ``struct VhostUserMMap``
> >> > + :reply payload: N/A
> >> > +
> >> > + This message can be submitted by the backends to advertise a new
> >> mapping
> >> > + to be made in a given VIRTIO Shared Memory Region. Upon receiving
> >> the message,
> >> > + The front-end will mmap the given fd into the VIRTIO Shared Memory
> >> Region
> >> > + with the requested ``shmid``. A reply is generated indicating
> >> whether mapping
> >> > + succeeded.
> >> > +
> >> > + Mapping over an already existing map is not allowed and request
> >> shall fail.
> >> > + Therefore, the memory range in the request must correspond with a
> >> valid,
> >> > + free region of the VIRTIO Shared Memory Region.
> >> > +
> >> > +``VHOST_USER_BACKEND_SHMEM_UNMAP``
> >> > + :id: 10
> >> > + :equivalent ioctl: N/A
> >> > + :request payload: ``struct VhostUserMMap``
> >> > + :reply payload: N/A
> >> > +
> >> > + This message can be submitted by the backends so that the front-end
> >> un-mmap
> >> > + a given range (``offset``, ``len``) in the VIRTIO Shared Memory
> >> Region with
> >>
> >> s/offset/shm_offset/
> >>
> >> > + the requested ``shmid``.
> >>
> >> Please clarify that <offset, len> must correspond to the entirety of a
> >> valid mapped region.
> >>
> >> By the way, the VIRTIO 1.3 gives the following behavior for the virtiofs
> >> DAX Window:
> >>
> >> When a FUSE_SETUPMAPPING request perfectly overlaps a previous
> >> mapping, the previous mapping is replaced. When a mapping partially
> >> overlaps a previous mapping, the previous mapping is split into one or
> >> two smaller mappings. When a mapping is partially unmapped it is also
> >> split into one or two smaller mappings.
> >>
> >> Establishing new mappings or splitting existing mappings consumes
> >> resources. If the device runs out of resources the FUSE_SETUPMAPPING
> >> request fails until resources are available again following
> >> FUSE_REMOVEMAPPING.
> >>
> >> I think SETUPMAPPING/REMOVMAPPING can be implemented using
> >> SHMEM_MAP/UNMAP. SHMEM_MAP/UNMAP do not allow atomically replacing
> >> partial ranges, but as far as I know that's not necessary for virtiofs
> >> in practice.
> >>
> >> It's worth mentioning that mappings consume resources and that SHMEM_MAP
> >> can fail when there are no resources available. The process-wide limit
> >> is vm.max_map_count on Linux although a vhost-user frontend may reduce
> >> it further to control vhost-user resource usage.
> >>
> >> > + A reply is generated indicating whether unmapping succeeded.
> >> > +
> >> > .. _reply_ack:
> >> >
> >> > VHOST_USER_PROTOCOL_F_REPLY_ACK
> >> > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> >> > index cdf9af4a4b..7ee8a472c6 100644
> >> > --- a/hw/virtio/vhost-user.c
> >> > +++ b/hw/virtio/vhost-user.c
> >> > @@ -115,6 +115,8 @@ typedef enum VhostUserBackendRequest {
> >> > VHOST_USER_BACKEND_SHARED_OBJECT_ADD = 6,
> >> > VHOST_USER_BACKEND_SHARED_OBJECT_REMOVE = 7,
> >> > VHOST_USER_BACKEND_SHARED_OBJECT_LOOKUP = 8,
> >> > + VHOST_USER_BACKEND_SHMEM_MAP = 9,
> >> > + VHOST_USER_BACKEND_SHMEM_UNMAP = 10,
> >> > VHOST_USER_BACKEND_MAX
> >> > } VhostUserBackendRequest;
> >> >
> >> > @@ -192,6 +194,24 @@ typedef struct VhostUserShared {
> >> > unsigned char uuid[16];
> >> > } VhostUserShared;
> >> >
> >> > +/* For the flags field of VhostUserMMap */
> >> > +#define VHOST_USER_FLAG_MAP_R (1u << 0)
> >> > +#define VHOST_USER_FLAG_MAP_W (1u << 1)
> >> > +
> >> > +typedef struct {
> >> > + /* VIRTIO Shared Memory Region ID */
> >> > + uint8_t shmid;
> >> > + uint8_t padding[7];
> >> > + /* File offset */
> >> > + uint64_t fd_offset;
> >> > + /* Offset within the VIRTIO Shared Memory Region */
> >> > + uint64_t shm_offset;
> >> > + /* Size of the mapping */
> >> > + uint64_t len;
> >> > + /* Flags for the mmap operation, from VHOST_USER_FLAG_* */
> >> > + uint64_t flags;
> >> > +} VhostUserMMap;
> >> > +
> >> > typedef struct {
> >> > VhostUserRequest request;
> >> >
> >> > @@ -224,6 +244,7 @@ typedef union {
> >> > VhostUserInflight inflight;
> >> > VhostUserShared object;
> >> > VhostUserTransferDeviceState transfer_state;
> >> > + VhostUserMMap mmap;
> >> > } VhostUserPayload;
> >> >
> >> > typedef struct VhostUserMsg {
> >> > @@ -1748,6 +1769,100 @@
> >> vhost_user_backend_handle_shared_object_lookup(struct vhost_user *u,
> >> > return 0;
> >> > }
> >> >
> >> > +static int
> >> > +vhost_user_backend_handle_shmem_map(struct vhost_dev *dev,
> >> > + VhostUserMMap *vu_mmap,
> >> > + int fd)
> >> > +{
> >> > + void *addr = 0;
> >> > + MemoryRegion *mr = NULL;
> >> > +
> >> > + if (fd < 0) {
> >> > + error_report("Bad fd for map");
> >> > + return -EBADF;
> >> > + }
> >> > +
> >> > + if (!dev->vdev->shmem_list ||
> >> > + dev->vdev->n_shmem_regions <= vu_mmap->shmid) {
> >> > + error_report("Device only has %d VIRTIO Shared Memory Regions.
> >> "
> >> > + "Requested ID: %d",
> >> > + dev->vdev->n_shmem_regions, vu_mmap->shmid);
> >> > + return -EFAULT;
> >> > + }
> >> > +
> >> > + mr = &dev->vdev->shmem_list[vu_mmap->shmid];
> >> > +
> >> > + if (!mr) {
> >> > + error_report("VIRTIO Shared Memory Region at "
> >> > + "ID %d unitialized", vu_mmap->shmid);
> >> > + return -EFAULT;
> >> > + }
> >> > +
> >> > + if ((vu_mmap->shm_offset + vu_mmap->len) < vu_mmap->len ||
> >> > + (vu_mmap->shm_offset + vu_mmap->len) > mr->size) {
> >> > + error_report("Bad offset/len for mmap %" PRIx64 "+%" PRIx64,
> >> > + vu_mmap->shm_offset, vu_mmap->len);
> >> > + return -EFAULT;
> >> > + }
> >> > +
> >> > + void *shmem_ptr = memory_region_get_ram_ptr(mr);
> >> > +
> >> > + addr = mmap(shmem_ptr + vu_mmap->shm_offset, vu_mmap->len,
> >>
> >> Missing check for overlap between range [shm_offset, shm_offset + len)
> >> and existing mappings.
> >>
> >
> > Not sure how to do this check. Specifically, I am not sure how previous
> > ranges are stored within the MemoryRegion. Is looping through
> > mr->subregions
> > a valid option?
> >
>
> Maybe something like this would do?
> ```
> if (memory_region_find(mr, vu_mmap->shm_offset, vu_mmap->len).mr) {
> error_report("Requested memory (%" PRIx64 "+%" PRIx64 " overalps "
> "with previously mapped memory",
> vu_mmap->shm_offset, vu_mmap->len);
> return -EFAULT;
> }
> ```
I don't think that works because the QEMU MemoryRegion covers the entire
range, some of which contains mappings and some of which is empty. It
would be necessary to track mappings that have been made.
I'm not aware of a security implication if the overlap check is missing,
so I guess it may be okay to skip it and rely on the vhost-user back-end
author to honor the spec. I'm not totally against that because it's
faster and less code, but it feels a bit iffy to not enforce the input
validation that the spec requires.
Maintain a list of mappings so this check can be performed?
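Something along these lines might be enough -- a minimal, self-contained
sketch of per-region bookkeeping (all names here, ShmemMapping,
shmem_track_map and so on, are made up for illustration and are not
existing QEMU APIs; a real patch would hang this state off the device
model):
```
/*
 * Hypothetical per-region bookkeeping for SHMEM_MAP requests.
 * Nothing here is QEMU API; it only illustrates the overlap check.
 */
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct ShmemMapping {
    uint64_t shm_offset;
    uint64_t len;
    struct ShmemMapping *next;
} ShmemMapping;

/* One instance per VIRTIO Shared Memory Region (i.e. per shmid). */
typedef struct {
    uint64_t size;          /* total size of the shared memory region */
    ShmemMapping *mappings; /* currently mapped ranges */
} ShmemRegionState;

/* True if [offset, offset + len) intersects any tracked mapping. */
static bool shmem_range_overlaps(const ShmemRegionState *r,
                                 uint64_t offset, uint64_t len)
{
    const ShmemMapping *m;

    for (m = r->mappings; m; m = m->next) {
        if (offset < m->shm_offset + m->len &&
            m->shm_offset < offset + len) {
            return true;
        }
    }
    return false;
}

/* Called from the SHMEM_MAP handler before mmap(): validate, then record. */
static int shmem_track_map(ShmemRegionState *r, uint64_t offset, uint64_t len)
{
    ShmemMapping *m;

    if (len == 0 || offset + len < offset || offset + len > r->size) {
        return -EINVAL;   /* zero-length, wrapping, or out-of-bounds range */
    }
    if (shmem_range_overlaps(r, offset, len)) {
        return -EEXIST;   /* mapping over an existing mapping is forbidden */
    }
    m = calloc(1, sizeof(*m));
    if (!m) {
        return -ENOMEM;
    }
    m->shm_offset = offset;
    m->len = len;
    m->next = r->mappings;
    r->mappings = m;
    return 0;
}
```
The UNMAP handler would then do the inverse: look up an exact
<shm_offset, len> match, drop the entry, and only afterwards mmap
PROT_NONE over the range.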
>
> >
> >
> >>
> >> > + ((vu_mmap->flags & VHOST_USER_FLAG_MAP_R) ? PROT_READ : 0) |
> >> > + ((vu_mmap->flags & VHOST_USER_FLAG_MAP_W) ? PROT_WRITE : 0),
> >> > + MAP_SHARED | MAP_FIXED, fd, vu_mmap->fd_offset);
> >> > +
> >> > + if (addr == MAP_FAILED) {
> >> > + error_report("Failed to mmap mem fd");
> >> > + return -EFAULT;
> >> > + }
> >> > +
> >> > + return 0;
> >> > +}
> >> > +
> >> > +static int
> >> > +vhost_user_backend_handle_shmem_unmap(struct vhost_dev *dev,
> >> > + VhostUserMMap *vu_mmap)
> >> > +{
> >> > + void *addr = 0;
> >> > + MemoryRegion *mr = NULL;
> >> > +
> >> > + if (!dev->vdev->shmem_list ||
> >> > + dev->vdev->n_shmem_regions <= vu_mmap->shmid) {
> >> > + error_report("Device only has %d VIRTIO Shared Memory Regions.
> >> "
> >> > + "Requested ID: %d",
> >> > + dev->vdev->n_shmem_regions, vu_mmap->shmid);
> >> > + return -EFAULT;
> >> > + }
> >> > +
> >> > + mr = &dev->vdev->shmem_list[vu_mmap->shmid];
> >> > +
> >> > + if (!mr) {
> >> > + error_report("VIRTIO Shared Memory Region at "
> >> > + "ID %d unitialized", vu_mmap->shmid);
> >> > + return -EFAULT;
> >> > + }
> >> > +
> >> > + if ((vu_mmap->shm_offset + vu_mmap->len) < vu_mmap->len ||
> >> > + (vu_mmap->shm_offset + vu_mmap->len) > mr->size) {
> >> > + error_report("Bad offset/len for mmap %" PRIx64 "+%" PRIx64,
> >> > + vu_mmap->shm_offset, vu_mmap->len);
> >> > + return -EFAULT;
> >> > + }
> >> > +
> >> > + void *shmem_ptr = memory_region_get_ram_ptr(mr);
> >> > +
> >> > + addr = mmap(shmem_ptr + vu_mmap->shm_offset, vu_mmap->len,
> >>
> >> Missing check for existing mapping with exact range [shm_offset, len)
> >> match.
> >>
> >> > + PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
> >> -1, 0);
> >> > +
> >> > + if (addr == MAP_FAILED) {
> >> > + error_report("Failed to unmap memory");
> >> > + return -EFAULT;
> >> > + }
> >> > +
> >> > + return 0;
> >> > +}
> >> > +
> >> > static void close_backend_channel(struct vhost_user *u)
> >> > {
> >> > g_source_destroy(u->backend_src);
> >> > @@ -1816,6 +1931,13 @@ static gboolean backend_read(QIOChannel *ioc,
> >> GIOCondition condition,
> >> > ret =
> >> vhost_user_backend_handle_shared_object_lookup(dev->opaque, ioc,
> >> > &hdr,
> >> &payload);
> >> > break;
> >> > + case VHOST_USER_BACKEND_SHMEM_MAP:
> >> > + ret = vhost_user_backend_handle_shmem_map(dev, &payload.mmap,
> >> > + fd ? fd[0] : -1);
> >> > + break;
> >> > + case VHOST_USER_BACKEND_SHMEM_UNMAP:
> >> > + ret = vhost_user_backend_handle_shmem_unmap(dev,
> >> &payload.mmap);
> >> > + break;
> >> > default:
> >> > error_report("Received unexpected msg type: %d.", hdr.request);
> >> > ret = -EINVAL;
> >> > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> >> > index 893a072c9d..9f2da5b11e 100644
> >> > --- a/hw/virtio/virtio.c
> >> > +++ b/hw/virtio/virtio.c
> >> > @@ -2856,6 +2856,16 @@ int virtio_save(VirtIODevice *vdev, QEMUFile *f)
> >> > return vmstate_save_state(f, &vmstate_virtio, vdev, NULL);
> >> > }
> >> >
> >> > +MemoryRegion *virtio_new_shmem_region(VirtIODevice *vdev)
> >> > +{
> >> > + MemoryRegion *mr = g_new0(MemoryRegion, 1);
> >> > + ++vdev->n_shmem_regions;
> >> > + vdev->shmem_list = g_renew(MemoryRegion, vdev->shmem_list,
> >> > + vdev->n_shmem_regions);
> >>
> >> Where is shmem_list freed?
> >>
> >> The name "list" is misleading since this is an array, not a list.
> >>
> >> > + vdev->shmem_list[vdev->n_shmem_regions - 1] = *mr;
> >> > + return mr;
> >> > +}
> >>
> >> This looks weird. The contents of mr are copied into shmem_list[] and
> >> then the pointer to mr is returned? Did you mean for the field's type to
> >> be MemoryRegion **shmem_list and then vdev->shmem_list[...] = mr would
> >> stash the pointer?
> >>
> >> > +
> >> > /* A wrapper for use as a VMState .put function */
> >> > static int virtio_device_put(QEMUFile *f, void *opaque, size_t size,
> >> > const VMStateField *field, JSONWriter
> >> *vmdesc)
> >> > @@ -3264,6 +3274,8 @@ void virtio_init(VirtIODevice *vdev, uint16_t
> >> device_id, size_t config_size)
> >> > virtio_vmstate_change, vdev);
> >> > vdev->device_endian = virtio_default_endian();
> >> > vdev->use_guest_notifier_mask = true;
> >> > + vdev->shmem_list = NULL;
> >> > + vdev->n_shmem_regions = 0;
> >> > }
> >> >
> >> > /*
> >> > diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> >> > index 7d5ffdc145..16d598aadc 100644
> >> > --- a/include/hw/virtio/virtio.h
> >> > +++ b/include/hw/virtio/virtio.h
> >> > @@ -165,6 +165,9 @@ struct VirtIODevice
> >> > */
> >> > EventNotifier config_notifier;
> >> > bool device_iotlb_enabled;
> >> > + /* Shared memory region for vhost-user mappings. */
> >> > + MemoryRegion *shmem_list;
> >> > + int n_shmem_regions;
> >> > };
> >> >
> >> > struct VirtioDeviceClass {
> >> > @@ -280,6 +283,8 @@ void virtio_notify(VirtIODevice *vdev, VirtQueue
> >> *vq);
> >> >
> >> > int virtio_save(VirtIODevice *vdev, QEMUFile *f);
> >> >
> >> > +MemoryRegion *virtio_new_shmem_region(VirtIODevice *vdev);
> >> > +
> >> > extern const VMStateInfo virtio_vmstate_info;
> >> >
> >> > #define VMSTATE_VIRTIO_DEVICE \
> >> > diff --git a/subprojects/libvhost-user/libvhost-user.c
> >> b/subprojects/libvhost-user/libvhost-user.c
> >> > index a879149fef..28556d183a 100644
> >> > --- a/subprojects/libvhost-user/libvhost-user.c
> >> > +++ b/subprojects/libvhost-user/libvhost-user.c
> >> > @@ -1586,6 +1586,71 @@ vu_rm_shared_object(VuDev *dev, unsigned char
> >> uuid[UUID_LEN])
> >> > return vu_send_message(dev, &msg);
> >> > }
> >> >
> >> > +bool
> >> > +vu_shmem_map(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
> >> > + uint64_t shm_offset, uint64_t len, uint64_t flags)
> >> > +{
> >> > + bool result = false;
> >> > + VhostUserMsg msg_reply;
> >> > + VhostUserMsg vmsg = {
> >> > + .request = VHOST_USER_BACKEND_SHMEM_MAP,
> >> > + .size = sizeof(vmsg.payload.mmap),
> >> > + .flags = VHOST_USER_VERSION,
> >> > + .payload.mmap = {
> >> > + .shmid = shmid,
> >> > + .fd_offset = fd_offset,
> >> > + .shm_offset = shm_offset,
> >> > + .len = len,
> >> > + .flags = flags,
> >> > + },
> >> > + };
> >> > +
> >> > + if (vu_has_protocol_feature(dev, VHOST_USER_PROTOCOL_F_REPLY_ACK))
> >> {
> >> > + vmsg.flags |= VHOST_USER_NEED_REPLY_MASK;
> >> > + }
> >> > +
> >> > + pthread_mutex_lock(&dev->backend_mutex);
> >> > + if (!vu_message_write(dev, dev->backend_fd, &vmsg)) {
> >> > + pthread_mutex_unlock(&dev->backend_mutex);
> >> > + return false;
> >> > + }
> >> > +
> >> > + /* Also unlocks the backend_mutex */
> >> > + return vu_process_message_reply(dev, &vmsg);
> >> > +}
> >> > +
> >> > +bool
> >> > +vu_shmem_unmap(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
> >> > + uint64_t shm_offset, uint64_t len)
> >> > +{
> >> > + bool result = false;
> >> > + VhostUserMsg msg_reply;
> >> > + VhostUserMsg vmsg = {
> >> > + .request = VHOST_USER_BACKEND_SHMEM_UNMAP,
> >> > + .size = sizeof(vmsg.payload.mmap),
> >> > + .flags = VHOST_USER_VERSION,
> >> > + .payload.mmap = {
> >> > + .shmid = shmid,
> >> > + .fd_offset = fd_offset,
> >>
> >> What is the meaning of this field? I expected it to be set to 0.
> >>
> >> > + .shm_offset = shm_offset,
> >> > + .len = len,
> >> > + },
> >> > + };
> >> > +
> >> > + if (vu_has_protocol_feature(dev, VHOST_USER_PROTOCOL_F_REPLY_ACK))
> >> {
> >> > + vmsg.flags |= VHOST_USER_NEED_REPLY_MASK;
> >> > + }
> >> > +
> >> > + pthread_mutex_lock(&dev->backend_mutex);
> >> > + if (!vu_message_write(dev, dev->backend_fd, &vmsg)) {
> >> > + pthread_mutex_unlock(&dev->backend_mutex);
> >> > + return false;
> >> > + }
> >> > +
> >> > + /* Also unlocks the backend_mutex */
> >> > + return vu_process_message_reply(dev, &vmsg);
> >> > +}
> >> > +
> >> > static bool
> >> > vu_set_vring_call_exec(VuDev *dev, VhostUserMsg *vmsg)
> >> > {
> >> > diff --git a/subprojects/libvhost-user/libvhost-user.h
> >> b/subprojects/libvhost-user/libvhost-user.h
> >> > index deb40e77b3..7f6c22cc1a 100644
> >> > --- a/subprojects/libvhost-user/libvhost-user.h
> >> > +++ b/subprojects/libvhost-user/libvhost-user.h
> >> > @@ -127,6 +127,8 @@ typedef enum VhostUserBackendRequest {
> >> > VHOST_USER_BACKEND_SHARED_OBJECT_ADD = 6,
> >> > VHOST_USER_BACKEND_SHARED_OBJECT_REMOVE = 7,
> >> > VHOST_USER_BACKEND_SHARED_OBJECT_LOOKUP = 8,
> >> > + VHOST_USER_BACKEND_SHMEM_MAP = 9,
> >> > + VHOST_USER_BACKEND_SHMEM_UNMAP = 10,
> >> > VHOST_USER_BACKEND_MAX
> >> > } VhostUserBackendRequest;
> >> >
> >> > @@ -186,6 +188,24 @@ typedef struct VhostUserShared {
> >> > unsigned char uuid[UUID_LEN];
> >> > } VhostUserShared;
> >> >
> >> > +/* For the flags field of VhostUserMMap */
> >> > +#define VHOST_USER_FLAG_MAP_R (1u << 0)
> >> > +#define VHOST_USER_FLAG_MAP_W (1u << 1)
> >> > +
> >> > +typedef struct {
> >> > + /* VIRTIO Shared Memory Region ID */
> >> > + uint8_t shmid;
> >> > + uint8_t padding[7];
> >> > + /* File offset */
> >> > + uint64_t fd_offset;
> >> > + /* Offset within the VIRTIO Shared Memory Region */
> >> > + uint64_t shm_offset;
> >> > + /* Size of the mapping */
> >> > + uint64_t len;
> >> > + /* Flags for the mmap operation, from VHOST_USER_FLAG_* */
> >> > + uint64_t flags;
> >> > +} VhostUserMMap;
> >> > +
> >> > #if defined(_WIN32) && (defined(__x86_64__) || defined(__i386__))
> >> > # define VU_PACKED __attribute__((gcc_struct, packed))
> >> > #else
> >> > @@ -214,6 +234,7 @@ typedef struct VhostUserMsg {
> >> > VhostUserVringArea area;
> >> > VhostUserInflight inflight;
> >> > VhostUserShared object;
> >> > + VhostUserMMap mmap;
> >> > } payload;
> >> >
> >> > int fds[VHOST_MEMORY_BASELINE_NREGIONS];
> >> > @@ -597,6 +618,38 @@ bool vu_add_shared_object(VuDev *dev, unsigned
> >> char uuid[UUID_LEN]);
> >> > */
> >> > bool vu_rm_shared_object(VuDev *dev, unsigned char uuid[UUID_LEN]);
> >> >
> >> > +/**
> >> > + * vu_shmem_map:
> >> > + * @dev: a VuDev context
> >> > + * @shmid: VIRTIO Shared Memory Region ID
> >> > + * @fd_offset: File offset
> >> > + * @shm_offset: Offset within the VIRTIO Shared Memory Region
> >> > + * @len: Size of the mapping
> >> > + * @flags: Flags for the mmap operation
> >> > + *
> >> > + * Advertises a new mapping to be made in a given VIRTIO Shared Memory
> >> Region.
> >> > + *
> >> > + * Returns: TRUE on success, FALSE on failure.
> >> > + */
> >> > +bool vu_shmem_map(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
> >> > + uint64_t shm_offset, uint64_t len, uint64_t flags);
> >> > +
> >> > +/**
> >> > + * vu_shmem_map:
> >> > + * @dev: a VuDev context
> >> > + * @shmid: VIRTIO Shared Memory Region ID
> >> > + * @fd_offset: File offset
> >> > + * @shm_offset: Offset within the VIRTIO Shared Memory Region
> >> > + * @len: Size of the mapping
> >> > + *
> >> > + * The front-end un-mmaps a given range in the VIRTIO Shared Memory
> >> Region
> >> > + * with the requested `shmid`.
> >> > + *
> >> > + * Returns: TRUE on success, FALSE on failure.
> >> > + */
> >> > +bool vu_shmem_unmap(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
> >> > + uint64_t shm_offset, uint64_t len);
> >> > +
> >> > /**
> >> > * vu_queue_set_notification:
> >> > * @dev: a VuDev context
> >> > --
> >> > 2.45.2
> >> >
> >>
> >
* Re: [RFC PATCH v2 1/5] vhost-user: Add VIRTIO Shared Memory map request
2024-09-05 16:45 ` Stefan Hajnoczi
@ 2024-09-11 11:57 ` Albert Esteve
2024-09-11 14:54 ` Stefan Hajnoczi
0 siblings, 1 reply; 36+ messages in thread
From: Albert Esteve @ 2024-09-11 11:57 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, jasowang, david, slp, Alex Bennée,
Michael S. Tsirkin
On Thu, Sep 5, 2024 at 6:45 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> On Tue, Sep 03, 2024 at 01:54:12PM +0200, Albert Esteve wrote:
> > On Tue, Sep 3, 2024 at 11:54 AM Albert Esteve <aesteve@redhat.com>
> wrote:
> >
> > >
> > >
> > > On Thu, Jul 11, 2024 at 9:45 AM Stefan Hajnoczi <stefanha@redhat.com>
> > > wrote:
> > >
> > >> On Fri, Jun 28, 2024 at 04:57:06PM +0200, Albert Esteve wrote:
> > >> > Add SHMEM_MAP/UNMAP requests to vhost-user to
> > >> > handle VIRTIO Shared Memory mappings.
> > >> >
> > >> > This request allows backends to dynamically map
> > >> > fds into a VIRTIO Shared Memory Region indentified
> > >> > by its `shmid`. Then, the fd memory is advertised
> > >> > to the driver as a base addres + offset, so it
> > >> > can be read/written (depending on the mmap flags
> > >> > requested) while its valid.
> > >> >
> > >> > The backend can munmap the memory range
> > >> > in a given VIRTIO Shared Memory Region (again,
> > >> > identified by its `shmid`), to free it. Upon
> > >> > receiving this message, the front-end must
> > >> > mmap the regions with PROT_NONE to reserve
> > >> > the virtual memory space.
> > >> >
> > >> > The device model needs to create MemoryRegion
> > >> > instances for the VIRTIO Shared Memory Regions
> > >> > and add them to the `VirtIODevice` instance.
> > >> >
> > >> > Signed-off-by: Albert Esteve <aesteve@redhat.com>
> > >> > ---
> > >> > docs/interop/vhost-user.rst | 27 +++++
> > >> > hw/virtio/vhost-user.c | 122
> ++++++++++++++++++++++
> > >> > hw/virtio/virtio.c | 12 +++
> > >> > include/hw/virtio/virtio.h | 5 +
> > >> > subprojects/libvhost-user/libvhost-user.c | 65 ++++++++++++
> > >> > subprojects/libvhost-user/libvhost-user.h | 53 ++++++++++
> > >> > 6 files changed, 284 insertions(+)
> > >> >
> > >> > diff --git a/docs/interop/vhost-user.rst
> b/docs/interop/vhost-user.rst
> > >> > index d8419fd2f1..d52ba719d5 100644
> > >> > --- a/docs/interop/vhost-user.rst
> > >> > +++ b/docs/interop/vhost-user.rst
> > >> > @@ -1859,6 +1859,33 @@ is sent by the front-end.
> > >> > when the operation is successful, or non-zero otherwise. Note
> that
> > >> if the
> > >> > operation fails, no fd is sent to the backend.
> > >> >
> > >> > +``VHOST_USER_BACKEND_SHMEM_MAP``
> > >> > + :id: 9
> > >> > + :equivalent ioctl: N/A
> > >> > + :request payload: fd and ``struct VhostUserMMap``
> > >> > + :reply payload: N/A
> > >> > +
> > >> > + This message can be submitted by the backends to advertise a new
> > >> mapping
> > >> > + to be made in a given VIRTIO Shared Memory Region. Upon receiving
> > >> the message,
> > >> > + The front-end will mmap the given fd into the VIRTIO Shared
> Memory
> > >> Region
> > >> > + with the requested ``shmid``. A reply is generated indicating
> > >> whether mapping
> > >> > + succeeded.
> > >> > +
> > >> > + Mapping over an already existing map is not allowed and request
> > >> shall fail.
> > >> > + Therefore, the memory range in the request must correspond with a
> > >> valid,
> > >> > + free region of the VIRTIO Shared Memory Region.
> > >> > +
> > >> > +``VHOST_USER_BACKEND_SHMEM_UNMAP``
> > >> > + :id: 10
> > >> > + :equivalent ioctl: N/A
> > >> > + :request payload: ``struct VhostUserMMap``
> > >> > + :reply payload: N/A
> > >> > +
> > >> > + This message can be submitted by the backends so that the
> front-end
> > >> un-mmap
> > >> > + a given range (``offset``, ``len``) in the VIRTIO Shared Memory
> > >> Region with
> > >>
> > >> s/offset/shm_offset/
> > >>
> > >> > + the requested ``shmid``.
> > >>
> > >> Please clarify that <offset, len> must correspond to the entirety of a
> > >> valid mapped region.
> > >>
> > >> By the way, the VIRTIO 1.3 gives the following behavior for the
> virtiofs
> > >> DAX Window:
> > >>
> > >> When a FUSE_SETUPMAPPING request perfectly overlaps a previous
> > >> mapping, the previous mapping is replaced. When a mapping partially
> > >> overlaps a previous mapping, the previous mapping is split into one
> or
> > >> two smaller mappings. When a mapping is partially unmapped it is
> also
> > >> split into one or two smaller mappings.
> > >>
> > >> Establishing new mappings or splitting existing mappings consumes
> > >> resources. If the device runs out of resources the FUSE_SETUPMAPPING
> > >> request fails until resources are available again following
> > >> FUSE_REMOVEMAPPING.
> > >>
> > >> I think SETUPMAPPING/REMOVMAPPING can be implemented using
> > >> SHMEM_MAP/UNMAP. SHMEM_MAP/UNMAP do not allow atomically replacing
> > >> partial ranges, but as far as I know that's not necessary for virtiofs
> > >> in practice.
> > >>
> > >> It's worth mentioning that mappings consume resources and that
> SHMEM_MAP
> > >> can fail when there are no resources available. The process-wide limit
> > >> is vm.max_map_count on Linux although a vhost-user frontend may reduce
> > >> it further to control vhost-user resource usage.
> > >>
> > >> > + A reply is generated indicating whether unmapping succeeded.
> > >> > +
> > >> > .. _reply_ack:
> > >> >
> > >> > VHOST_USER_PROTOCOL_F_REPLY_ACK
> > >> > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > >> > index cdf9af4a4b..7ee8a472c6 100644
> > >> > --- a/hw/virtio/vhost-user.c
> > >> > +++ b/hw/virtio/vhost-user.c
> > >> > @@ -115,6 +115,8 @@ typedef enum VhostUserBackendRequest {
> > >> > VHOST_USER_BACKEND_SHARED_OBJECT_ADD = 6,
> > >> > VHOST_USER_BACKEND_SHARED_OBJECT_REMOVE = 7,
> > >> > VHOST_USER_BACKEND_SHARED_OBJECT_LOOKUP = 8,
> > >> > + VHOST_USER_BACKEND_SHMEM_MAP = 9,
> > >> > + VHOST_USER_BACKEND_SHMEM_UNMAP = 10,
> > >> > VHOST_USER_BACKEND_MAX
> > >> > } VhostUserBackendRequest;
> > >> >
> > >> > @@ -192,6 +194,24 @@ typedef struct VhostUserShared {
> > >> > unsigned char uuid[16];
> > >> > } VhostUserShared;
> > >> >
> > >> > +/* For the flags field of VhostUserMMap */
> > >> > +#define VHOST_USER_FLAG_MAP_R (1u << 0)
> > >> > +#define VHOST_USER_FLAG_MAP_W (1u << 1)
> > >> > +
> > >> > +typedef struct {
> > >> > + /* VIRTIO Shared Memory Region ID */
> > >> > + uint8_t shmid;
> > >> > + uint8_t padding[7];
> > >> > + /* File offset */
> > >> > + uint64_t fd_offset;
> > >> > + /* Offset within the VIRTIO Shared Memory Region */
> > >> > + uint64_t shm_offset;
> > >> > + /* Size of the mapping */
> > >> > + uint64_t len;
> > >> > + /* Flags for the mmap operation, from VHOST_USER_FLAG_* */
> > >> > + uint64_t flags;
> > >> > +} VhostUserMMap;
> > >> > +
> > >> > typedef struct {
> > >> > VhostUserRequest request;
> > >> >
> > >> > @@ -224,6 +244,7 @@ typedef union {
> > >> > VhostUserInflight inflight;
> > >> > VhostUserShared object;
> > >> > VhostUserTransferDeviceState transfer_state;
> > >> > + VhostUserMMap mmap;
> > >> > } VhostUserPayload;
> > >> >
> > >> > typedef struct VhostUserMsg {
> > >> > @@ -1748,6 +1769,100 @@
> > >> vhost_user_backend_handle_shared_object_lookup(struct vhost_user *u,
> > >> > return 0;
> > >> > }
> > >> >
> > >> > +static int
> > >> > +vhost_user_backend_handle_shmem_map(struct vhost_dev *dev,
> > >> > + VhostUserMMap *vu_mmap,
> > >> > + int fd)
> > >> > +{
> > >> > + void *addr = 0;
> > >> > + MemoryRegion *mr = NULL;
> > >> > +
> > >> > + if (fd < 0) {
> > >> > + error_report("Bad fd for map");
> > >> > + return -EBADF;
> > >> > + }
> > >> > +
> > >> > + if (!dev->vdev->shmem_list ||
> > >> > + dev->vdev->n_shmem_regions <= vu_mmap->shmid) {
> > >> > + error_report("Device only has %d VIRTIO Shared Memory
> Regions.
> > >> "
> > >> > + "Requested ID: %d",
> > >> > + dev->vdev->n_shmem_regions, vu_mmap->shmid);
> > >> > + return -EFAULT;
> > >> > + }
> > >> > +
> > >> > + mr = &dev->vdev->shmem_list[vu_mmap->shmid];
> > >> > +
> > >> > + if (!mr) {
> > >> > + error_report("VIRTIO Shared Memory Region at "
> > >> > + "ID %d unitialized", vu_mmap->shmid);
> > >> > + return -EFAULT;
> > >> > + }
> > >> > +
> > >> > + if ((vu_mmap->shm_offset + vu_mmap->len) < vu_mmap->len ||
> > >> > + (vu_mmap->shm_offset + vu_mmap->len) > mr->size) {
> > >> > + error_report("Bad offset/len for mmap %" PRIx64 "+%"
> PRIx64,
> > >> > + vu_mmap->shm_offset, vu_mmap->len);
> > >> > + return -EFAULT;
> > >> > + }
> > >> > +
> > >> > + void *shmem_ptr = memory_region_get_ram_ptr(mr);
> > >> > +
> > >> > + addr = mmap(shmem_ptr + vu_mmap->shm_offset, vu_mmap->len,
> > >>
> > >> Missing check for overlap between range [shm_offset, shm_offset + len)
> > >> and existing mappings.
> > >>
> > >
> > > Not sure how to do this check. Specifically, I am not sure how previous
> > > ranges are stored within the MemoryRegion. Is looping through
> > > mr->subregions
> > > a valid option?
> > >
> >
> > Maybe something like this would do?
> > ```
> > if (memory_region_find(mr, vu_mmap->shm_offset, vu_mmap->len).mr) {
> > error_report("Requested memory (%" PRIx64 "+%" PRIx64 " overalps
> "
> > "with previously mapped memory",
> > vu_mmap->shm_offset, vu_mmap->len);
> > return -EFAULT;
> > }
> > ```
>
> I don't think that works because the QEMU MemoryRegion covers the entire
> range, some of which contains mappings and some of which is empty. It
> would be necessary to track mappings that have been made.
>
> I'm not aware of a security implication if the overlap check is missing,
> so I guess it may be okay to skip it and rely on the vhost-user back-end
> author to honor the spec. I'm not totally against that because it's
> faster and less code, but it feels a bit iffy to not enforce the input
> validation that the spec requires.
>
> Maintain a list of mappings so this check can be performed?
>
>
Ok, I prefer to aim for the better solution and see where that takes us.
So I will add a mapped_regions list or something like that to the
MemoryRegion struct in a new commit, so that it can be reviewed
independently. Once the infrastructure code is in the patch, we can decide
whether it is worth keeping.
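Roughly along these lines (again, made-up names and a self-contained
sketch rather than QEMU code) -- the same kind of tracking list would also
make the exact-range check for SHMEM_UNMAP a simple lookup-and-remove:
```
/*
 * Hypothetical sketch of the UNMAP side: the requested <shm_offset, len>
 * must match one tracked mapping exactly, which is then dropped.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct Mapping {
    uint64_t shm_offset;
    uint64_t len;
    struct Mapping *next;
} Mapping;

/* Remove the mapping that exactly matches [offset, offset + len), if any. */
static bool untrack_exact_range(Mapping **head, uint64_t offset, uint64_t len)
{
    Mapping **pp;

    for (pp = head; *pp; pp = &(*pp)->next) {
        if ((*pp)->shm_offset == offset && (*pp)->len == len) {
            Mapping *victim = *pp;

            *pp = victim->next;   /* unlink from the per-region list */
            free(victim);
            return true;          /* caller can now mmap PROT_NONE over it */
        }
    }
    return false;                 /* no exact match: reject the UNMAP */
}
```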
Thank you!
> >
> > >
> > >
> > >>
> > >> > + ((vu_mmap->flags & VHOST_USER_FLAG_MAP_R) ? PROT_READ : 0)
> |
> > >> > + ((vu_mmap->flags & VHOST_USER_FLAG_MAP_W) ? PROT_WRITE :
> 0),
> > >> > + MAP_SHARED | MAP_FIXED, fd, vu_mmap->fd_offset);
> > >> > +
> > >> > + if (addr == MAP_FAILED) {
> > >> > + error_report("Failed to mmap mem fd");
> > >> > + return -EFAULT;
> > >> > + }
> > >> > +
> > >> > + return 0;
> > >> > +}
> > >> > +
> > >> > +static int
> > >> > +vhost_user_backend_handle_shmem_unmap(struct vhost_dev *dev,
> > >> > + VhostUserMMap *vu_mmap)
> > >> > +{
> > >> > + void *addr = 0;
> > >> > + MemoryRegion *mr = NULL;
> > >> > +
> > >> > + if (!dev->vdev->shmem_list ||
> > >> > + dev->vdev->n_shmem_regions <= vu_mmap->shmid) {
> > >> > + error_report("Device only has %d VIRTIO Shared Memory
> Regions.
> > >> "
> > >> > + "Requested ID: %d",
> > >> > + dev->vdev->n_shmem_regions, vu_mmap->shmid);
> > >> > + return -EFAULT;
> > >> > + }
> > >> > +
> > >> > + mr = &dev->vdev->shmem_list[vu_mmap->shmid];
> > >> > +
> > >> > + if (!mr) {
> > >> > + error_report("VIRTIO Shared Memory Region at "
> > >> > + "ID %d unitialized", vu_mmap->shmid);
> > >> > + return -EFAULT;
> > >> > + }
> > >> > +
> > >> > + if ((vu_mmap->shm_offset + vu_mmap->len) < vu_mmap->len ||
> > >> > + (vu_mmap->shm_offset + vu_mmap->len) > mr->size) {
> > >> > + error_report("Bad offset/len for mmap %" PRIx64 "+%"
> PRIx64,
> > >> > + vu_mmap->shm_offset, vu_mmap->len);
> > >> > + return -EFAULT;
> > >> > + }
> > >> > +
> > >> > + void *shmem_ptr = memory_region_get_ram_ptr(mr);
> > >> > +
> > >> > + addr = mmap(shmem_ptr + vu_mmap->shm_offset, vu_mmap->len,
> > >>
> > >> Missing check for existing mapping with exact range [shm_offset, len)
> > >> match.
> > >>
> > >> > + PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
> > >> -1, 0);
> > >> > +
> > >> > + if (addr == MAP_FAILED) {
> > >> > + error_report("Failed to unmap memory");
> > >> > + return -EFAULT;
> > >> > + }
> > >> > +
> > >> > + return 0;
> > >> > +}
> > >> > +
> > >> > static void close_backend_channel(struct vhost_user *u)
> > >> > {
> > >> > g_source_destroy(u->backend_src);
> > >> > @@ -1816,6 +1931,13 @@ static gboolean backend_read(QIOChannel *ioc,
> > >> GIOCondition condition,
> > >> > ret =
> > >> vhost_user_backend_handle_shared_object_lookup(dev->opaque, ioc,
> > >> > &hdr,
> > >> &payload);
> > >> > break;
> > >> > + case VHOST_USER_BACKEND_SHMEM_MAP:
> > >> > + ret = vhost_user_backend_handle_shmem_map(dev,
> &payload.mmap,
> > >> > + fd ? fd[0] : -1);
> > >> > + break;
> > >> > + case VHOST_USER_BACKEND_SHMEM_UNMAP:
> > >> > + ret = vhost_user_backend_handle_shmem_unmap(dev,
> > >> &payload.mmap);
> > >> > + break;
> > >> > default:
> > >> > error_report("Received unexpected msg type: %d.",
> hdr.request);
> > >> > ret = -EINVAL;
> > >> > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> > >> > index 893a072c9d..9f2da5b11e 100644
> > >> > --- a/hw/virtio/virtio.c
> > >> > +++ b/hw/virtio/virtio.c
> > >> > @@ -2856,6 +2856,16 @@ int virtio_save(VirtIODevice *vdev, QEMUFile
> *f)
> > >> > return vmstate_save_state(f, &vmstate_virtio, vdev, NULL);
> > >> > }
> > >> >
> > >> > +MemoryRegion *virtio_new_shmem_region(VirtIODevice *vdev)
> > >> > +{
> > >> > + MemoryRegion *mr = g_new0(MemoryRegion, 1);
> > >> > + ++vdev->n_shmem_regions;
> > >> > + vdev->shmem_list = g_renew(MemoryRegion, vdev->shmem_list,
> > >> > + vdev->n_shmem_regions);
> > >>
> > >> Where is shmem_list freed?
> > >>
> > >> The name "list" is misleading since this is an array, not a list.
> > >>
> > >> > + vdev->shmem_list[vdev->n_shmem_regions - 1] = *mr;
> > >> > + return mr;
> > >> > +}
> > >>
> > >> This looks weird. The contents of mr are copied into shmem_list[] and
> > >> then the pointer to mr is returned? Did you mean for the field's type
> to
> > >> be MemoryRegion **shmem_list and then vdev->shmem_list[...] = mr would
> > >> stash the pointer?
> > >>
> > >> > +
> > >> > /* A wrapper for use as a VMState .put function */
> > >> > static int virtio_device_put(QEMUFile *f, void *opaque, size_t
> size,
> > >> > const VMStateField *field, JSONWriter
> > >> *vmdesc)
> > >> > @@ -3264,6 +3274,8 @@ void virtio_init(VirtIODevice *vdev, uint16_t
> > >> device_id, size_t config_size)
> > >> > virtio_vmstate_change, vdev);
> > >> > vdev->device_endian = virtio_default_endian();
> > >> > vdev->use_guest_notifier_mask = true;
> > >> > + vdev->shmem_list = NULL;
> > >> > + vdev->n_shmem_regions = 0;
> > >> > }
> > >> >
> > >> > /*
> > >> > diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> > >> > index 7d5ffdc145..16d598aadc 100644
> > >> > --- a/include/hw/virtio/virtio.h
> > >> > +++ b/include/hw/virtio/virtio.h
> > >> > @@ -165,6 +165,9 @@ struct VirtIODevice
> > >> > */
> > >> > EventNotifier config_notifier;
> > >> > bool device_iotlb_enabled;
> > >> > + /* Shared memory region for vhost-user mappings. */
> > >> > + MemoryRegion *shmem_list;
> > >> > + int n_shmem_regions;
> > >> > };
> > >> >
> > >> > struct VirtioDeviceClass {
> > >> > @@ -280,6 +283,8 @@ void virtio_notify(VirtIODevice *vdev, VirtQueue
> > >> *vq);
> > >> >
> > >> > int virtio_save(VirtIODevice *vdev, QEMUFile *f);
> > >> >
> > >> > +MemoryRegion *virtio_new_shmem_region(VirtIODevice *vdev);
> > >> > +
> > >> > extern const VMStateInfo virtio_vmstate_info;
> > >> >
> > >> > #define VMSTATE_VIRTIO_DEVICE \
> > >> > diff --git a/subprojects/libvhost-user/libvhost-user.c
> > >> b/subprojects/libvhost-user/libvhost-user.c
> > >> > index a879149fef..28556d183a 100644
> > >> > --- a/subprojects/libvhost-user/libvhost-user.c
> > >> > +++ b/subprojects/libvhost-user/libvhost-user.c
> > >> > @@ -1586,6 +1586,71 @@ vu_rm_shared_object(VuDev *dev, unsigned char
> > >> uuid[UUID_LEN])
> > >> > return vu_send_message(dev, &msg);
> > >> > }
> > >> >
> > >> > +bool
> > >> > +vu_shmem_map(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
> > >> > + uint64_t shm_offset, uint64_t len, uint64_t flags)
> > >> > +{
> > >> > + bool result = false;
> > >> > + VhostUserMsg msg_reply;
> > >> > + VhostUserMsg vmsg = {
> > >> > + .request = VHOST_USER_BACKEND_SHMEM_MAP,
> > >> > + .size = sizeof(vmsg.payload.mmap),
> > >> > + .flags = VHOST_USER_VERSION,
> > >> > + .payload.mmap = {
> > >> > + .shmid = shmid,
> > >> > + .fd_offset = fd_offset,
> > >> > + .shm_offset = shm_offset,
> > >> > + .len = len,
> > >> > + .flags = flags,
> > >> > + },
> > >> > + };
> > >> > +
> > >> > + if (vu_has_protocol_feature(dev,
> VHOST_USER_PROTOCOL_F_REPLY_ACK))
> > >> {
> > >> > + vmsg.flags |= VHOST_USER_NEED_REPLY_MASK;
> > >> > + }
> > >> > +
> > >> > + pthread_mutex_lock(&dev->backend_mutex);
> > >> > + if (!vu_message_write(dev, dev->backend_fd, &vmsg)) {
> > >> > + pthread_mutex_unlock(&dev->backend_mutex);
> > >> > + return false;
> > >> > + }
> > >> > +
> > >> > + /* Also unlocks the backend_mutex */
> > >> > + return vu_process_message_reply(dev, &vmsg);
> > >> > +}
> > >> > +
> > >> > +bool
> > >> > +vu_shmem_unmap(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
> > >> > + uint64_t shm_offset, uint64_t len)
> > >> > +{
> > >> > + bool result = false;
> > >> > + VhostUserMsg msg_reply;
> > >> > + VhostUserMsg vmsg = {
> > >> > + .request = VHOST_USER_BACKEND_SHMEM_UNMAP,
> > >> > + .size = sizeof(vmsg.payload.mmap),
> > >> > + .flags = VHOST_USER_VERSION,
> > >> > + .payload.mmap = {
> > >> > + .shmid = shmid,
> > >> > + .fd_offset = fd_offset,
> > >>
> > >> What is the meaning of this field? I expected it to be set to 0.
> > >>
> > >> > + .shm_offset = shm_offset,
> > >> > + .len = len,
> > >> > + },
> > >> > + };
> > >> > +
> > >> > + if (vu_has_protocol_feature(dev,
> VHOST_USER_PROTOCOL_F_REPLY_ACK))
> > >> {
> > >> > + vmsg.flags |= VHOST_USER_NEED_REPLY_MASK;
> > >> > + }
> > >> > +
> > >> > + pthread_mutex_lock(&dev->backend_mutex);
> > >> > + if (!vu_message_write(dev, dev->backend_fd, &vmsg)) {
> > >> > + pthread_mutex_unlock(&dev->backend_mutex);
> > >> > + return false;
> > >> > + }
> > >> > +
> > >> > + /* Also unlocks the backend_mutex */
> > >> > + return vu_process_message_reply(dev, &vmsg);
> > >> > +}
> > >> > +
> > >> > static bool
> > >> > vu_set_vring_call_exec(VuDev *dev, VhostUserMsg *vmsg)
> > >> > {
> > >> > diff --git a/subprojects/libvhost-user/libvhost-user.h
> > >> b/subprojects/libvhost-user/libvhost-user.h
> > >> > index deb40e77b3..7f6c22cc1a 100644
> > >> > --- a/subprojects/libvhost-user/libvhost-user.h
> > >> > +++ b/subprojects/libvhost-user/libvhost-user.h
> > >> > @@ -127,6 +127,8 @@ typedef enum VhostUserBackendRequest {
> > >> > VHOST_USER_BACKEND_SHARED_OBJECT_ADD = 6,
> > >> > VHOST_USER_BACKEND_SHARED_OBJECT_REMOVE = 7,
> > >> > VHOST_USER_BACKEND_SHARED_OBJECT_LOOKUP = 8,
> > >> > + VHOST_USER_BACKEND_SHMEM_MAP = 9,
> > >> > + VHOST_USER_BACKEND_SHMEM_UNMAP = 10,
> > >> > VHOST_USER_BACKEND_MAX
> > >> > } VhostUserBackendRequest;
> > >> >
> > >> > @@ -186,6 +188,24 @@ typedef struct VhostUserShared {
> > >> > unsigned char uuid[UUID_LEN];
> > >> > } VhostUserShared;
> > >> >
> > >> > +/* For the flags field of VhostUserMMap */
> > >> > +#define VHOST_USER_FLAG_MAP_R (1u << 0)
> > >> > +#define VHOST_USER_FLAG_MAP_W (1u << 1)
> > >> > +
> > >> > +typedef struct {
> > >> > + /* VIRTIO Shared Memory Region ID */
> > >> > + uint8_t shmid;
> > >> > + uint8_t padding[7];
> > >> > + /* File offset */
> > >> > + uint64_t fd_offset;
> > >> > + /* Offset within the VIRTIO Shared Memory Region */
> > >> > + uint64_t shm_offset;
> > >> > + /* Size of the mapping */
> > >> > + uint64_t len;
> > >> > + /* Flags for the mmap operation, from VHOST_USER_FLAG_* */
> > >> > + uint64_t flags;
> > >> > +} VhostUserMMap;
> > >> > +
> > >> > #if defined(_WIN32) && (defined(__x86_64__) || defined(__i386__))
> > >> > # define VU_PACKED __attribute__((gcc_struct, packed))
> > >> > #else
> > >> > @@ -214,6 +234,7 @@ typedef struct VhostUserMsg {
> > >> > VhostUserVringArea area;
> > >> > VhostUserInflight inflight;
> > >> > VhostUserShared object;
> > >> > + VhostUserMMap mmap;
> > >> > } payload;
> > >> >
> > >> > int fds[VHOST_MEMORY_BASELINE_NREGIONS];
> > >> > @@ -597,6 +618,38 @@ bool vu_add_shared_object(VuDev *dev, unsigned
> > >> char uuid[UUID_LEN]);
> > >> > */
> > >> > bool vu_rm_shared_object(VuDev *dev, unsigned char uuid[UUID_LEN]);
> > >> >
> > >> > +/**
> > >> > + * vu_shmem_map:
> > >> > + * @dev: a VuDev context
> > >> > + * @shmid: VIRTIO Shared Memory Region ID
> > >> > + * @fd_offset: File offset
> > >> > + * @shm_offset: Offset within the VIRTIO Shared Memory Region
> > >> > + * @len: Size of the mapping
> > >> > + * @flags: Flags for the mmap operation
> > >> > + *
> > >> > + * Advertises a new mapping to be made in a given VIRTIO Shared
> Memory
> > >> Region.
> > >> > + *
> > >> > + * Returns: TRUE on success, FALSE on failure.
> > >> > + */
> > >> > +bool vu_shmem_map(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
> > >> > + uint64_t shm_offset, uint64_t len, uint64_t
> flags);
> > >> > +
> > >> > +/**
> > >> > + * vu_shmem_map:
> > >> > + * @dev: a VuDev context
> > >> > + * @shmid: VIRTIO Shared Memory Region ID
> > >> > + * @fd_offset: File offset
> > >> > + * @shm_offset: Offset within the VIRTIO Shared Memory Region
> > >> > + * @len: Size of the mapping
> > >> > + *
> > >> > + * The front-end un-mmaps a given range in the VIRTIO Shared Memory
> > >> Region
> > >> > + * with the requested `shmid`.
> > >> > + *
> > >> > + * Returns: TRUE on success, FALSE on failure.
> > >> > + */
> > >> > +bool vu_shmem_unmap(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
> > >> > + uint64_t shm_offset, uint64_t len);
> > >> > +
> > >> > /**
> > >> > * vu_queue_set_notification:
> > >> > * @dev: a VuDev context
> > >> > --
> > >> > 2.45.2
> > >> >
> > >>
> > >
>
* Re: [RFC PATCH v2 1/5] vhost-user: Add VIRTIO Shared Memory map request
2024-09-11 11:57 ` Albert Esteve
@ 2024-09-11 14:54 ` Stefan Hajnoczi
0 siblings, 0 replies; 36+ messages in thread
From: Stefan Hajnoczi @ 2024-09-11 14:54 UTC (permalink / raw)
To: Albert Esteve
Cc: Stefan Hajnoczi, qemu-devel, jasowang, david, slp,
Alex Bennée, Michael S. Tsirkin
On Wed, 11 Sept 2024 at 07:58, Albert Esteve <aesteve@redhat.com> wrote:
> On Thu, Sep 5, 2024 at 6:45 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
>>
>> On Tue, Sep 03, 2024 at 01:54:12PM +0200, Albert Esteve wrote:
>> > On Tue, Sep 3, 2024 at 11:54 AM Albert Esteve <aesteve@redhat.com> wrote:
>> >
>> > >
>> > >
>> > > On Thu, Jul 11, 2024 at 9:45 AM Stefan Hajnoczi <stefanha@redhat.com>
>> > > wrote:
>> > >
>> > >> On Fri, Jun 28, 2024 at 04:57:06PM +0200, Albert Esteve wrote:
>> > >> > Add SHMEM_MAP/UNMAP requests to vhost-user to
>> > >> > handle VIRTIO Shared Memory mappings.
>> > >> >
>> > >> > This request allows backends to dynamically map
>> > >> > fds into a VIRTIO Shared Memory Region indentified
>> > >> > by its `shmid`. Then, the fd memory is advertised
>> > >> > to the driver as a base addres + offset, so it
>> > >> > can be read/written (depending on the mmap flags
>> > >> > requested) while its valid.
>> > >> >
>> > >> > The backend can munmap the memory range
>> > >> > in a given VIRTIO Shared Memory Region (again,
>> > >> > identified by its `shmid`), to free it. Upon
>> > >> > receiving this message, the front-end must
>> > >> > mmap the regions with PROT_NONE to reserve
>> > >> > the virtual memory space.
>> > >> >
>> > >> > The device model needs to create MemoryRegion
>> > >> > instances for the VIRTIO Shared Memory Regions
>> > >> > and add them to the `VirtIODevice` instance.
>> > >> >
>> > >> > Signed-off-by: Albert Esteve <aesteve@redhat.com>
>> > >> > ---
>> > >> > docs/interop/vhost-user.rst | 27 +++++
>> > >> > hw/virtio/vhost-user.c | 122 ++++++++++++++++++++++
>> > >> > hw/virtio/virtio.c | 12 +++
>> > >> > include/hw/virtio/virtio.h | 5 +
>> > >> > subprojects/libvhost-user/libvhost-user.c | 65 ++++++++++++
>> > >> > subprojects/libvhost-user/libvhost-user.h | 53 ++++++++++
>> > >> > 6 files changed, 284 insertions(+)
>> > >> >
>> > >> > diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
>> > >> > index d8419fd2f1..d52ba719d5 100644
>> > >> > --- a/docs/interop/vhost-user.rst
>> > >> > +++ b/docs/interop/vhost-user.rst
>> > >> > @@ -1859,6 +1859,33 @@ is sent by the front-end.
>> > >> > when the operation is successful, or non-zero otherwise. Note that
>> > >> if the
>> > >> > operation fails, no fd is sent to the backend.
>> > >> >
>> > >> > +``VHOST_USER_BACKEND_SHMEM_MAP``
>> > >> > + :id: 9
>> > >> > + :equivalent ioctl: N/A
>> > >> > + :request payload: fd and ``struct VhostUserMMap``
>> > >> > + :reply payload: N/A
>> > >> > +
>> > >> > + This message can be submitted by the backends to advertise a new
>> > >> mapping
>> > >> > + to be made in a given VIRTIO Shared Memory Region. Upon receiving
>> > >> the message,
>> > >> > + The front-end will mmap the given fd into the VIRTIO Shared Memory
>> > >> Region
>> > >> > + with the requested ``shmid``. A reply is generated indicating
>> > >> whether mapping
>> > >> > + succeeded.
>> > >> > +
>> > >> > + Mapping over an already existing map is not allowed and request
>> > >> shall fail.
>> > >> > + Therefore, the memory range in the request must correspond with a
>> > >> valid,
>> > >> > + free region of the VIRTIO Shared Memory Region.
>> > >> > +
>> > >> > +``VHOST_USER_BACKEND_SHMEM_UNMAP``
>> > >> > + :id: 10
>> > >> > + :equivalent ioctl: N/A
>> > >> > + :request payload: ``struct VhostUserMMap``
>> > >> > + :reply payload: N/A
>> > >> > +
>> > >> > + This message can be submitted by the backends so that the front-end
>> > >> un-mmap
>> > >> > + a given range (``offset``, ``len``) in the VIRTIO Shared Memory
>> > >> Region with
>> > >>
>> > >> s/offset/shm_offset/
>> > >>
>> > >> > + the requested ``shmid``.
>> > >>
>> > >> Please clarify that <offset, len> must correspond to the entirety of a
>> > >> valid mapped region.
>> > >>
>> > >> By the way, the VIRTIO 1.3 gives the following behavior for the virtiofs
>> > >> DAX Window:
>> > >>
>> > >> When a FUSE_SETUPMAPPING request perfectly overlaps a previous
>> > >> mapping, the previous mapping is replaced. When a mapping partially
>> > >> overlaps a previous mapping, the previous mapping is split into one or
>> > >> two smaller mappings. When a mapping is partially unmapped it is also
>> > >> split into one or two smaller mappings.
>> > >>
>> > >> Establishing new mappings or splitting existing mappings consumes
>> > >> resources. If the device runs out of resources the FUSE_SETUPMAPPING
>> > >> request fails until resources are available again following
>> > >> FUSE_REMOVEMAPPING.
>> > >>
>> > >> I think SETUPMAPPING/REMOVEMAPPING can be implemented using
>> > >> SHMEM_MAP/UNMAP. SHMEM_MAP/UNMAP do not allow atomically replacing
>> > >> partial ranges, but as far as I know that's not necessary for virtiofs
>> > >> in practice.
>> > >>
>> > >> It's worth mentioning that mappings consume resources and that SHMEM_MAP
>> > >> can fail when there are no resources available. The process-wide limit
>> > >> is vm.max_map_count on Linux although a vhost-user frontend may reduce
>> > >> it further to control vhost-user resource usage.
>> > >>
>> > >> > + A reply is generated indicating whether unmapping succeeded.
>> > >> > +
>> > >> > .. _reply_ack:
>> > >> >
>> > >> > VHOST_USER_PROTOCOL_F_REPLY_ACK
>> > >> > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
>> > >> > index cdf9af4a4b..7ee8a472c6 100644
>> > >> > --- a/hw/virtio/vhost-user.c
>> > >> > +++ b/hw/virtio/vhost-user.c
>> > >> > @@ -115,6 +115,8 @@ typedef enum VhostUserBackendRequest {
>> > >> > VHOST_USER_BACKEND_SHARED_OBJECT_ADD = 6,
>> > >> > VHOST_USER_BACKEND_SHARED_OBJECT_REMOVE = 7,
>> > >> > VHOST_USER_BACKEND_SHARED_OBJECT_LOOKUP = 8,
>> > >> > + VHOST_USER_BACKEND_SHMEM_MAP = 9,
>> > >> > + VHOST_USER_BACKEND_SHMEM_UNMAP = 10,
>> > >> > VHOST_USER_BACKEND_MAX
>> > >> > } VhostUserBackendRequest;
>> > >> >
>> > >> > @@ -192,6 +194,24 @@ typedef struct VhostUserShared {
>> > >> > unsigned char uuid[16];
>> > >> > } VhostUserShared;
>> > >> >
>> > >> > +/* For the flags field of VhostUserMMap */
>> > >> > +#define VHOST_USER_FLAG_MAP_R (1u << 0)
>> > >> > +#define VHOST_USER_FLAG_MAP_W (1u << 1)
>> > >> > +
>> > >> > +typedef struct {
>> > >> > + /* VIRTIO Shared Memory Region ID */
>> > >> > + uint8_t shmid;
>> > >> > + uint8_t padding[7];
>> > >> > + /* File offset */
>> > >> > + uint64_t fd_offset;
>> > >> > + /* Offset within the VIRTIO Shared Memory Region */
>> > >> > + uint64_t shm_offset;
>> > >> > + /* Size of the mapping */
>> > >> > + uint64_t len;
>> > >> > + /* Flags for the mmap operation, from VHOST_USER_FLAG_* */
>> > >> > + uint64_t flags;
>> > >> > +} VhostUserMMap;
>> > >> > +
>> > >> > typedef struct {
>> > >> > VhostUserRequest request;
>> > >> >
>> > >> > @@ -224,6 +244,7 @@ typedef union {
>> > >> > VhostUserInflight inflight;
>> > >> > VhostUserShared object;
>> > >> > VhostUserTransferDeviceState transfer_state;
>> > >> > + VhostUserMMap mmap;
>> > >> > } VhostUserPayload;
>> > >> >
>> > >> > typedef struct VhostUserMsg {
>> > >> > @@ -1748,6 +1769,100 @@
>> > >> vhost_user_backend_handle_shared_object_lookup(struct vhost_user *u,
>> > >> > return 0;
>> > >> > }
>> > >> >
>> > >> > +static int
>> > >> > +vhost_user_backend_handle_shmem_map(struct vhost_dev *dev,
>> > >> > + VhostUserMMap *vu_mmap,
>> > >> > + int fd)
>> > >> > +{
>> > >> > + void *addr = 0;
>> > >> > + MemoryRegion *mr = NULL;
>> > >> > +
>> > >> > + if (fd < 0) {
>> > >> > + error_report("Bad fd for map");
>> > >> > + return -EBADF;
>> > >> > + }
>> > >> > +
>> > >> > + if (!dev->vdev->shmem_list ||
>> > >> > + dev->vdev->n_shmem_regions <= vu_mmap->shmid) {
>> > >> > + error_report("Device only has %d VIRTIO Shared Memory Regions.
>> > >> "
>> > >> > + "Requested ID: %d",
>> > >> > + dev->vdev->n_shmem_regions, vu_mmap->shmid);
>> > >> > + return -EFAULT;
>> > >> > + }
>> > >> > +
>> > >> > + mr = &dev->vdev->shmem_list[vu_mmap->shmid];
>> > >> > +
>> > >> > + if (!mr) {
>> > >> > + error_report("VIRTIO Shared Memory Region at "
>> > >> > +                     "ID %d uninitialized", vu_mmap->shmid);
>> > >> > + return -EFAULT;
>> > >> > + }
>> > >> > +
>> > >> > + if ((vu_mmap->shm_offset + vu_mmap->len) < vu_mmap->len ||
>> > >> > + (vu_mmap->shm_offset + vu_mmap->len) > mr->size) {
>> > >> > + error_report("Bad offset/len for mmap %" PRIx64 "+%" PRIx64,
>> > >> > + vu_mmap->shm_offset, vu_mmap->len);
>> > >> > + return -EFAULT;
>> > >> > + }
>> > >> > +
>> > >> > + void *shmem_ptr = memory_region_get_ram_ptr(mr);
>> > >> > +
>> > >> > + addr = mmap(shmem_ptr + vu_mmap->shm_offset, vu_mmap->len,
>> > >>
>> > >> Missing check for overlap between range [shm_offset, shm_offset + len)
>> > >> and existing mappings.
>> > >>
>> > >
>> > > Not sure how to do this check. Specifically, I am not sure how previous
>> > > ranges are stored within the MemoryRegion. Is looping through
>> > > mr->subregions
>> > > a valid option?
>> > >
>> >
>> > Maybe something like this would do?
>> > ```
>> > if (memory_region_find(mr, vu_mmap->shm_offset, vu_mmap->len).mr) {
>> > error_report("Requested memory (%" PRIx64 "+%" PRIx64 " overlaps "
>> > "with previously mapped memory",
>> > vu_mmap->shm_offset, vu_mmap->len);
>> > return -EFAULT;
>> > }
>> > ```
>>
>> I don't think that works because the QEMU MemoryRegion covers the entire
>> range, some of which contains mappings and some of which is empty. It
>> would be necessary to track mappings that have been made.
>>
>> I'm not aware of a security implication if the overlap check is missing,
>> so I guess it may be okay to skip it and rely on the vhost-user back-end
>> author to honor the spec. I'm not totally against that because it's
>> faster and less code, but it feels a bit iffy to not enforce the input
>> validation that the spec requires.
>>
>> Maintain a list of mappings so this check can be performed?
>>
>
> Ok, I prefer to aim for the better solution and see where that takes us.
> So I will add a mapped_regions list or something like that to the
> MemoryRegion struct in a new commit, so that it can be reviewed
> independently. With the infrastructure code in the patch, we can decide
> whether it is worth having.
Great. MemoryRegion is a core struct that's not related to vhost-user.
I don't think anything else needs a mappings list. Maybe add the
mappings list to shmem_list[] elements instead so that each VIRTIO
Shared Memory Region has a mappings list? Each element could be a
struct with a MemoryRegion field and a mappings list.
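For illustration, a minimal sketch of what such an element (and the overlap
check it enables) could look like -- all names here are hypothetical, not
part of the patch:
```
/* Sketch only -- hypothetical names, not part of the current patch. */
#include "qemu/queue.h"   /* QTAILQ_* list helpers */
#include "exec/memory.h"  /* MemoryRegion */

typedef struct VhostUserShmemMapping {
    uint64_t shm_offset;   /* offset of the mapping within the region */
    uint64_t len;          /* length of the mapping */
    QTAILQ_ENTRY(VhostUserShmemMapping) link;
} VhostUserShmemMapping;

/* One element per VIRTIO Shared Memory Region, replacing the bare
 * MemoryRegion entries in shmem_list[]. */
typedef struct VirtioSharedMemory {
    MemoryRegion mr;
    QTAILQ_HEAD(, VhostUserShmemMapping) mappings;
} VirtioSharedMemory;

/* SHMEM_MAP could then reject requests that overlap an existing mapping. */
static bool shmem_range_is_free(VirtioSharedMemory *shm,
                                uint64_t offset, uint64_t len)
{
    VhostUserShmemMapping *m;

    QTAILQ_FOREACH(m, &shm->mappings, link) {
        if (offset < m->shm_offset + m->len && m->shm_offset < offset + len) {
            return false;
        }
    }
    return true;
}
```
SHMEM_UNMAP could walk the same list to require an exact <shm_offset, len>
match before replacing the range with a PROT_NONE mapping.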
Stefan
>
> Thank you!
>
>>
>> >
>> > >
>> > >
>> > >>
>> > >> > + ((vu_mmap->flags & VHOST_USER_FLAG_MAP_R) ? PROT_READ : 0) |
>> > >> > + ((vu_mmap->flags & VHOST_USER_FLAG_MAP_W) ? PROT_WRITE : 0),
>> > >> > + MAP_SHARED | MAP_FIXED, fd, vu_mmap->fd_offset);
>> > >> > +
>> > >> > + if (addr == MAP_FAILED) {
>> > >> > + error_report("Failed to mmap mem fd");
>> > >> > + return -EFAULT;
>> > >> > + }
>> > >> > +
>> > >> > + return 0;
>> > >> > +}
>> > >> > +
>> > >> > +static int
>> > >> > +vhost_user_backend_handle_shmem_unmap(struct vhost_dev *dev,
>> > >> > + VhostUserMMap *vu_mmap)
>> > >> > +{
>> > >> > + void *addr = 0;
>> > >> > + MemoryRegion *mr = NULL;
>> > >> > +
>> > >> > + if (!dev->vdev->shmem_list ||
>> > >> > + dev->vdev->n_shmem_regions <= vu_mmap->shmid) {
>> > >> > + error_report("Device only has %d VIRTIO Shared Memory Regions.
>> > >> "
>> > >> > + "Requested ID: %d",
>> > >> > + dev->vdev->n_shmem_regions, vu_mmap->shmid);
>> > >> > + return -EFAULT;
>> > >> > + }
>> > >> > +
>> > >> > + mr = &dev->vdev->shmem_list[vu_mmap->shmid];
>> > >> > +
>> > >> > + if (!mr) {
>> > >> > + error_report("VIRTIO Shared Memory Region at "
>> > >> > +                     "ID %d uninitialized", vu_mmap->shmid);
>> > >> > + return -EFAULT;
>> > >> > + }
>> > >> > +
>> > >> > + if ((vu_mmap->shm_offset + vu_mmap->len) < vu_mmap->len ||
>> > >> > + (vu_mmap->shm_offset + vu_mmap->len) > mr->size) {
>> > >> > + error_report("Bad offset/len for mmap %" PRIx64 "+%" PRIx64,
>> > >> > + vu_mmap->shm_offset, vu_mmap->len);
>> > >> > + return -EFAULT;
>> > >> > + }
>> > >> > +
>> > >> > + void *shmem_ptr = memory_region_get_ram_ptr(mr);
>> > >> > +
>> > >> > + addr = mmap(shmem_ptr + vu_mmap->shm_offset, vu_mmap->len,
>> > >>
>> > >> Missing check for existing mapping with exact range [shm_offset, len)
>> > >> match.
>> > >>
>> > >> > + PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
>> > >> -1, 0);
>> > >> > +
>> > >> > + if (addr == MAP_FAILED) {
>> > >> > + error_report("Failed to unmap memory");
>> > >> > + return -EFAULT;
>> > >> > + }
>> > >> > +
>> > >> > + return 0;
>> > >> > +}
>> > >> > +
>> > >> > static void close_backend_channel(struct vhost_user *u)
>> > >> > {
>> > >> > g_source_destroy(u->backend_src);
>> > >> > @@ -1816,6 +1931,13 @@ static gboolean backend_read(QIOChannel *ioc,
>> > >> GIOCondition condition,
>> > >> > ret =
>> > >> vhost_user_backend_handle_shared_object_lookup(dev->opaque, ioc,
>> > >> > &hdr,
>> > >> &payload);
>> > >> > break;
>> > >> > + case VHOST_USER_BACKEND_SHMEM_MAP:
>> > >> > + ret = vhost_user_backend_handle_shmem_map(dev, &payload.mmap,
>> > >> > + fd ? fd[0] : -1);
>> > >> > + break;
>> > >> > + case VHOST_USER_BACKEND_SHMEM_UNMAP:
>> > >> > + ret = vhost_user_backend_handle_shmem_unmap(dev,
>> > >> &payload.mmap);
>> > >> > + break;
>> > >> > default:
>> > >> > error_report("Received unexpected msg type: %d.", hdr.request);
>> > >> > ret = -EINVAL;
>> > >> > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
>> > >> > index 893a072c9d..9f2da5b11e 100644
>> > >> > --- a/hw/virtio/virtio.c
>> > >> > +++ b/hw/virtio/virtio.c
>> > >> > @@ -2856,6 +2856,16 @@ int virtio_save(VirtIODevice *vdev, QEMUFile *f)
>> > >> > return vmstate_save_state(f, &vmstate_virtio, vdev, NULL);
>> > >> > }
>> > >> >
>> > >> > +MemoryRegion *virtio_new_shmem_region(VirtIODevice *vdev)
>> > >> > +{
>> > >> > + MemoryRegion *mr = g_new0(MemoryRegion, 1);
>> > >> > + ++vdev->n_shmem_regions;
>> > >> > + vdev->shmem_list = g_renew(MemoryRegion, vdev->shmem_list,
>> > >> > + vdev->n_shmem_regions);
>> > >>
>> > >> Where is shmem_list freed?
>> > >>
>> > >> The name "list" is misleading since this is an array, not a list.
>> > >>
>> > >> > + vdev->shmem_list[vdev->n_shmem_regions - 1] = *mr;
>> > >> > + return mr;
>> > >> > +}
>> > >>
>> > >> This looks weird. The contents of mr are copied into shmem_list[] and
>> > >> then the pointer to mr is returned? Did you mean for the field's type to
>> > >> be MemoryRegion **shmem_list and then vdev->shmem_list[...] = mr would
>> > >> stash the pointer?
>> > >>
>> > >> > +
>> > >> > /* A wrapper for use as a VMState .put function */
>> > >> > static int virtio_device_put(QEMUFile *f, void *opaque, size_t size,
>> > >> > const VMStateField *field, JSONWriter
>> > >> *vmdesc)
>> > >> > @@ -3264,6 +3274,8 @@ void virtio_init(VirtIODevice *vdev, uint16_t
>> > >> device_id, size_t config_size)
>> > >> > virtio_vmstate_change, vdev);
>> > >> > vdev->device_endian = virtio_default_endian();
>> > >> > vdev->use_guest_notifier_mask = true;
>> > >> > + vdev->shmem_list = NULL;
>> > >> > + vdev->n_shmem_regions = 0;
>> > >> > }
>> > >> >
>> > >> > /*
>> > >> > diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
>> > >> > index 7d5ffdc145..16d598aadc 100644
>> > >> > --- a/include/hw/virtio/virtio.h
>> > >> > +++ b/include/hw/virtio/virtio.h
>> > >> > @@ -165,6 +165,9 @@ struct VirtIODevice
>> > >> > */
>> > >> > EventNotifier config_notifier;
>> > >> > bool device_iotlb_enabled;
>> > >> > + /* Shared memory region for vhost-user mappings. */
>> > >> > + MemoryRegion *shmem_list;
>> > >> > + int n_shmem_regions;
>> > >> > };
>> > >> >
>> > >> > struct VirtioDeviceClass {
>> > >> > @@ -280,6 +283,8 @@ void virtio_notify(VirtIODevice *vdev, VirtQueue
>> > >> *vq);
>> > >> >
>> > >> > int virtio_save(VirtIODevice *vdev, QEMUFile *f);
>> > >> >
>> > >> > +MemoryRegion *virtio_new_shmem_region(VirtIODevice *vdev);
>> > >> > +
>> > >> > extern const VMStateInfo virtio_vmstate_info;
>> > >> >
>> > >> > #define VMSTATE_VIRTIO_DEVICE \
>> > >> > diff --git a/subprojects/libvhost-user/libvhost-user.c
>> > >> b/subprojects/libvhost-user/libvhost-user.c
>> > >> > index a879149fef..28556d183a 100644
>> > >> > --- a/subprojects/libvhost-user/libvhost-user.c
>> > >> > +++ b/subprojects/libvhost-user/libvhost-user.c
>> > >> > @@ -1586,6 +1586,71 @@ vu_rm_shared_object(VuDev *dev, unsigned char
>> > >> uuid[UUID_LEN])
>> > >> > return vu_send_message(dev, &msg);
>> > >> > }
>> > >> >
>> > >> > +bool
>> > >> > +vu_shmem_map(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
>> > >> > + uint64_t shm_offset, uint64_t len, uint64_t flags)
>> > >> > +{
>> > >> > + bool result = false;
>> > >> > + VhostUserMsg msg_reply;
>> > >> > + VhostUserMsg vmsg = {
>> > >> > + .request = VHOST_USER_BACKEND_SHMEM_MAP,
>> > >> > + .size = sizeof(vmsg.payload.mmap),
>> > >> > + .flags = VHOST_USER_VERSION,
>> > >> > + .payload.mmap = {
>> > >> > + .shmid = shmid,
>> > >> > + .fd_offset = fd_offset,
>> > >> > + .shm_offset = shm_offset,
>> > >> > + .len = len,
>> > >> > + .flags = flags,
>> > >> > + },
>> > >> > + };
>> > >> > +
>> > >> > + if (vu_has_protocol_feature(dev, VHOST_USER_PROTOCOL_F_REPLY_ACK))
>> > >> {
>> > >> > + vmsg.flags |= VHOST_USER_NEED_REPLY_MASK;
>> > >> > + }
>> > >> > +
>> > >> > + pthread_mutex_lock(&dev->backend_mutex);
>> > >> > + if (!vu_message_write(dev, dev->backend_fd, &vmsg)) {
>> > >> > + pthread_mutex_unlock(&dev->backend_mutex);
>> > >> > + return false;
>> > >> > + }
>> > >> > +
>> > >> > + /* Also unlocks the backend_mutex */
>> > >> > + return vu_process_message_reply(dev, &vmsg);
>> > >> > +}
>> > >> > +
>> > >> > +bool
>> > >> > +vu_shmem_unmap(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
>> > >> > + uint64_t shm_offset, uint64_t len)
>> > >> > +{
>> > >> > + bool result = false;
>> > >> > + VhostUserMsg msg_reply;
>> > >> > + VhostUserMsg vmsg = {
>> > >> > + .request = VHOST_USER_BACKEND_SHMEM_UNMAP,
>> > >> > + .size = sizeof(vmsg.payload.mmap),
>> > >> > + .flags = VHOST_USER_VERSION,
>> > >> > + .payload.mmap = {
>> > >> > + .shmid = shmid,
>> > >> > + .fd_offset = fd_offset,
>> > >>
>> > >> What is the meaning of this field? I expected it to be set to 0.
>> > >>
>> > >> > + .shm_offset = shm_offset,
>> > >> > + .len = len,
>> > >> > + },
>> > >> > + };
>> > >> > +
>> > >> > + if (vu_has_protocol_feature(dev, VHOST_USER_PROTOCOL_F_REPLY_ACK))
>> > >> {
>> > >> > + vmsg.flags |= VHOST_USER_NEED_REPLY_MASK;
>> > >> > + }
>> > >> > +
>> > >> > + pthread_mutex_lock(&dev->backend_mutex);
>> > >> > + if (!vu_message_write(dev, dev->backend_fd, &vmsg)) {
>> > >> > + pthread_mutex_unlock(&dev->backend_mutex);
>> > >> > + return false;
>> > >> > + }
>> > >> > +
>> > >> > + /* Also unlocks the backend_mutex */
>> > >> > + return vu_process_message_reply(dev, &vmsg);
>> > >> > +}
>> > >> > +
>> > >> > static bool
>> > >> > vu_set_vring_call_exec(VuDev *dev, VhostUserMsg *vmsg)
>> > >> > {
>> > >> > diff --git a/subprojects/libvhost-user/libvhost-user.h
>> > >> b/subprojects/libvhost-user/libvhost-user.h
>> > >> > index deb40e77b3..7f6c22cc1a 100644
>> > >> > --- a/subprojects/libvhost-user/libvhost-user.h
>> > >> > +++ b/subprojects/libvhost-user/libvhost-user.h
>> > >> > @@ -127,6 +127,8 @@ typedef enum VhostUserBackendRequest {
>> > >> > VHOST_USER_BACKEND_SHARED_OBJECT_ADD = 6,
>> > >> > VHOST_USER_BACKEND_SHARED_OBJECT_REMOVE = 7,
>> > >> > VHOST_USER_BACKEND_SHARED_OBJECT_LOOKUP = 8,
>> > >> > + VHOST_USER_BACKEND_SHMEM_MAP = 9,
>> > >> > + VHOST_USER_BACKEND_SHMEM_UNMAP = 10,
>> > >> > VHOST_USER_BACKEND_MAX
>> > >> > } VhostUserBackendRequest;
>> > >> >
>> > >> > @@ -186,6 +188,24 @@ typedef struct VhostUserShared {
>> > >> > unsigned char uuid[UUID_LEN];
>> > >> > } VhostUserShared;
>> > >> >
>> > >> > +/* For the flags field of VhostUserMMap */
>> > >> > +#define VHOST_USER_FLAG_MAP_R (1u << 0)
>> > >> > +#define VHOST_USER_FLAG_MAP_W (1u << 1)
>> > >> > +
>> > >> > +typedef struct {
>> > >> > + /* VIRTIO Shared Memory Region ID */
>> > >> > + uint8_t shmid;
>> > >> > + uint8_t padding[7];
>> > >> > + /* File offset */
>> > >> > + uint64_t fd_offset;
>> > >> > + /* Offset within the VIRTIO Shared Memory Region */
>> > >> > + uint64_t shm_offset;
>> > >> > + /* Size of the mapping */
>> > >> > + uint64_t len;
>> > >> > + /* Flags for the mmap operation, from VHOST_USER_FLAG_* */
>> > >> > + uint64_t flags;
>> > >> > +} VhostUserMMap;
>> > >> > +
>> > >> > #if defined(_WIN32) && (defined(__x86_64__) || defined(__i386__))
>> > >> > # define VU_PACKED __attribute__((gcc_struct, packed))
>> > >> > #else
>> > >> > @@ -214,6 +234,7 @@ typedef struct VhostUserMsg {
>> > >> > VhostUserVringArea area;
>> > >> > VhostUserInflight inflight;
>> > >> > VhostUserShared object;
>> > >> > + VhostUserMMap mmap;
>> > >> > } payload;
>> > >> >
>> > >> > int fds[VHOST_MEMORY_BASELINE_NREGIONS];
>> > >> > @@ -597,6 +618,38 @@ bool vu_add_shared_object(VuDev *dev, unsigned
>> > >> char uuid[UUID_LEN]);
>> > >> > */
>> > >> > bool vu_rm_shared_object(VuDev *dev, unsigned char uuid[UUID_LEN]);
>> > >> >
>> > >> > +/**
>> > >> > + * vu_shmem_map:
>> > >> > + * @dev: a VuDev context
>> > >> > + * @shmid: VIRTIO Shared Memory Region ID
>> > >> > + * @fd_offset: File offset
>> > >> > + * @shm_offset: Offset within the VIRTIO Shared Memory Region
>> > >> > + * @len: Size of the mapping
>> > >> > + * @flags: Flags for the mmap operation
>> > >> > + *
>> > >> > + * Advertises a new mapping to be made in a given VIRTIO Shared Memory
>> > >> Region.
>> > >> > + *
>> > >> > + * Returns: TRUE on success, FALSE on failure.
>> > >> > + */
>> > >> > +bool vu_shmem_map(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
>> > >> > + uint64_t shm_offset, uint64_t len, uint64_t flags);
>> > >> > +
>> > >> > +/**
>> > >> > + * vu_shmem_map:
>> > >> > + * @dev: a VuDev context
>> > >> > + * @shmid: VIRTIO Shared Memory Region ID
>> > >> > + * @fd_offset: File offset
>> > >> > + * @shm_offset: Offset within the VIRTIO Shared Memory Region
>> > >> > + * @len: Size of the mapping
>> > >> > + *
>> > >> > + * The front-end un-mmaps a given range in the VIRTIO Shared Memory
>> > >> Region
>> > >> > + * with the requested `shmid`.
>> > >> > + *
>> > >> > + * Returns: TRUE on success, FALSE on failure.
>> > >> > + */
>> > >> > +bool vu_shmem_unmap(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
>> > >> > + uint64_t shm_offset, uint64_t len);
>> > >> > +
>> > >> > /**
>> > >> > * vu_queue_set_notification:
>> > >> > * @dev: a VuDev context
>> > >> > --
>> > >> > 2.45.2
>> > >> >
>> > >>
>> > >
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC PATCH v2 1/5] vhost-user: Add VIRTIO Shared Memory map request
2024-07-11 7:45 ` Stefan Hajnoczi
2024-09-03 9:54 ` Albert Esteve
@ 2024-09-04 7:28 ` Albert Esteve
1 sibling, 0 replies; 36+ messages in thread
From: Albert Esteve @ 2024-09-04 7:28 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, jasowang, david, slp, Alex Bennée,
Michael S. Tsirkin
On Thu, Jul 11, 2024 at 9:45 AM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> On Fri, Jun 28, 2024 at 04:57:06PM +0200, Albert Esteve wrote:
> > Add SHMEM_MAP/UNMAP requests to vhost-user to
> > handle VIRTIO Shared Memory mappings.
> >
> > This request allows backends to dynamically map
> > fds into a VIRTIO Shared Memory Region identified
> > by its `shmid`. Then, the fd memory is advertised
> > to the driver as a base address + offset, so it
> > can be read/written (depending on the mmap flags
> > requested) while it's valid.
> >
> > The backend can munmap the memory range
> > in a given VIRTIO Shared Memory Region (again,
> > identified by its `shmid`), to free it. Upon
> > receiving this message, the front-end must
> > mmap the regions with PROT_NONE to reserve
> > the virtual memory space.
> >
> > The device model needs to create MemoryRegion
> > instances for the VIRTIO Shared Memory Regions
> > and add them to the `VirtIODevice` instance.
> >
> > Signed-off-by: Albert Esteve <aesteve@redhat.com>
> > ---
> > docs/interop/vhost-user.rst | 27 +++++
> > hw/virtio/vhost-user.c | 122 ++++++++++++++++++++++
> > hw/virtio/virtio.c | 12 +++
> > include/hw/virtio/virtio.h | 5 +
> > subprojects/libvhost-user/libvhost-user.c | 65 ++++++++++++
> > subprojects/libvhost-user/libvhost-user.h | 53 ++++++++++
> > 6 files changed, 284 insertions(+)
> >
> > diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
> > index d8419fd2f1..d52ba719d5 100644
> > --- a/docs/interop/vhost-user.rst
> > +++ b/docs/interop/vhost-user.rst
> > @@ -1859,6 +1859,33 @@ is sent by the front-end.
> > when the operation is successful, or non-zero otherwise. Note that if
> the
> > operation fails, no fd is sent to the backend.
> >
> > +``VHOST_USER_BACKEND_SHMEM_MAP``
> > + :id: 9
> > + :equivalent ioctl: N/A
> > + :request payload: fd and ``struct VhostUserMMap``
> > + :reply payload: N/A
> > +
> > + This message can be submitted by the backends to advertise a new
> mapping
> > + to be made in a given VIRTIO Shared Memory Region. Upon receiving the
> message,
> > + The front-end will mmap the given fd into the VIRTIO Shared Memory
> Region
> > + with the requested ``shmid``. A reply is generated indicating whether
> mapping
> > + succeeded.
> > +
> > + Mapping over an already existing map is not allowed and request shall
> fail.
> > + Therefore, the memory range in the request must correspond with a
> valid,
> > + free region of the VIRTIO Shared Memory Region.
> > +
> > +``VHOST_USER_BACKEND_SHMEM_UNMAP``
> > + :id: 10
> > + :equivalent ioctl: N/A
> > + :request payload: ``struct VhostUserMMap``
> > + :reply payload: N/A
> > +
> > + This message can be submitted by the backends so that the front-end
> un-mmap
> > + a given range (``offset``, ``len``) in the VIRTIO Shared Memory
> Region with
>
> s/offset/shm_offset/
>
> > + the requested ``shmid``.
>
> Please clarify that <offset, len> must correspond to the entirety of a
> valid mapped region.
>
> By the way, the VIRTIO 1.3 gives the following behavior for the virtiofs
> DAX Window:
>
> When a FUSE_SETUPMAPPING request perfectly overlaps a previous
> mapping, the previous mapping is replaced. When a mapping partially
> overlaps a previous mapping, the previous mapping is split into one or
> two smaller mappings. When a mapping is partially unmapped it is also
> split into one or two smaller mappings.
>
> Establishing new mappings or splitting existing mappings consumes
> resources. If the device runs out of resources the FUSE_SETUPMAPPING
> request fails until resources are available again following
> FUSE_REMOVEMAPPING.
>
> I think SETUPMAPPING/REMOVEMAPPING can be implemented using
> SHMEM_MAP/UNMAP. SHMEM_MAP/UNMAP do not allow atomically replacing
> partial ranges, but as far as I know that's not necessary for virtiofs
> in practice.
>
> It's worth mentioning that mappings consume resources and that SHMEM_MAP
> can fail when there are no resources available. The process-wide limit
> is vm.max_map_count on Linux although a vhost-user frontend may reduce
> it further to control vhost-user resource usage.
>
> > + A reply is generated indicating whether unmapping succeeded.
> > +
> > .. _reply_ack:
> >
> > VHOST_USER_PROTOCOL_F_REPLY_ACK
> > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > index cdf9af4a4b..7ee8a472c6 100644
> > --- a/hw/virtio/vhost-user.c
> > +++ b/hw/virtio/vhost-user.c
> > @@ -115,6 +115,8 @@ typedef enum VhostUserBackendRequest {
> > VHOST_USER_BACKEND_SHARED_OBJECT_ADD = 6,
> > VHOST_USER_BACKEND_SHARED_OBJECT_REMOVE = 7,
> > VHOST_USER_BACKEND_SHARED_OBJECT_LOOKUP = 8,
> > + VHOST_USER_BACKEND_SHMEM_MAP = 9,
> > + VHOST_USER_BACKEND_SHMEM_UNMAP = 10,
> > VHOST_USER_BACKEND_MAX
> > } VhostUserBackendRequest;
> >
> > @@ -192,6 +194,24 @@ typedef struct VhostUserShared {
> > unsigned char uuid[16];
> > } VhostUserShared;
> >
> > +/* For the flags field of VhostUserMMap */
> > +#define VHOST_USER_FLAG_MAP_R (1u << 0)
> > +#define VHOST_USER_FLAG_MAP_W (1u << 1)
> > +
> > +typedef struct {
> > + /* VIRTIO Shared Memory Region ID */
> > + uint8_t shmid;
> > + uint8_t padding[7];
> > + /* File offset */
> > + uint64_t fd_offset;
> > + /* Offset within the VIRTIO Shared Memory Region */
> > + uint64_t shm_offset;
> > + /* Size of the mapping */
> > + uint64_t len;
> > + /* Flags for the mmap operation, from VHOST_USER_FLAG_* */
> > + uint64_t flags;
> > +} VhostUserMMap;
> > +
> > typedef struct {
> > VhostUserRequest request;
> >
> > @@ -224,6 +244,7 @@ typedef union {
> > VhostUserInflight inflight;
> > VhostUserShared object;
> > VhostUserTransferDeviceState transfer_state;
> > + VhostUserMMap mmap;
> > } VhostUserPayload;
> >
> > typedef struct VhostUserMsg {
> > @@ -1748,6 +1769,100 @@
> vhost_user_backend_handle_shared_object_lookup(struct vhost_user *u,
> > return 0;
> > }
> >
> > +static int
> > +vhost_user_backend_handle_shmem_map(struct vhost_dev *dev,
> > + VhostUserMMap *vu_mmap,
> > + int fd)
> > +{
> > + void *addr = 0;
> > + MemoryRegion *mr = NULL;
> > +
> > + if (fd < 0) {
> > + error_report("Bad fd for map");
> > + return -EBADF;
> > + }
> > +
> > + if (!dev->vdev->shmem_list ||
> > + dev->vdev->n_shmem_regions <= vu_mmap->shmid) {
> > + error_report("Device only has %d VIRTIO Shared Memory Regions. "
> > + "Requested ID: %d",
> > + dev->vdev->n_shmem_regions, vu_mmap->shmid);
> > + return -EFAULT;
> > + }
> > +
> > + mr = &dev->vdev->shmem_list[vu_mmap->shmid];
> > +
> > + if (!mr) {
> > + error_report("VIRTIO Shared Memory Region at "
> > +                     "ID %d uninitialized", vu_mmap->shmid);
> > + return -EFAULT;
> > + }
> > +
> > + if ((vu_mmap->shm_offset + vu_mmap->len) < vu_mmap->len ||
> > + (vu_mmap->shm_offset + vu_mmap->len) > mr->size) {
> > + error_report("Bad offset/len for mmap %" PRIx64 "+%" PRIx64,
> > + vu_mmap->shm_offset, vu_mmap->len);
> > + return -EFAULT;
> > + }
> > +
> > + void *shmem_ptr = memory_region_get_ram_ptr(mr);
> > +
> > + addr = mmap(shmem_ptr + vu_mmap->shm_offset, vu_mmap->len,
>
> Missing check for overlap between range [shm_offset, shm_offset + len)
> and existing mappings.
>
> > + ((vu_mmap->flags & VHOST_USER_FLAG_MAP_R) ? PROT_READ : 0) |
> > + ((vu_mmap->flags & VHOST_USER_FLAG_MAP_W) ? PROT_WRITE : 0),
> > + MAP_SHARED | MAP_FIXED, fd, vu_mmap->fd_offset);
> > +
> > + if (addr == MAP_FAILED) {
> > + error_report("Failed to mmap mem fd");
> > + return -EFAULT;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static int
> > +vhost_user_backend_handle_shmem_unmap(struct vhost_dev *dev,
> > + VhostUserMMap *vu_mmap)
> > +{
> > + void *addr = 0;
> > + MemoryRegion *mr = NULL;
> > +
> > + if (!dev->vdev->shmem_list ||
> > + dev->vdev->n_shmem_regions <= vu_mmap->shmid) {
> > + error_report("Device only has %d VIRTIO Shared Memory Regions. "
> > + "Requested ID: %d",
> > + dev->vdev->n_shmem_regions, vu_mmap->shmid);
> > + return -EFAULT;
> > + }
> > +
> > + mr = &dev->vdev->shmem_list[vu_mmap->shmid];
> > +
> > + if (!mr) {
> > + error_report("VIRTIO Shared Memory Region at "
> > +                     "ID %d uninitialized", vu_mmap->shmid);
> > + return -EFAULT;
> > + }
> > +
> > + if ((vu_mmap->shm_offset + vu_mmap->len) < vu_mmap->len ||
> > + (vu_mmap->shm_offset + vu_mmap->len) > mr->size) {
> > + error_report("Bad offset/len for mmap %" PRIx64 "+%" PRIx64,
> > + vu_mmap->shm_offset, vu_mmap->len);
> > + return -EFAULT;
> > + }
> > +
> > + void *shmem_ptr = memory_region_get_ram_ptr(mr);
> > +
> > + addr = mmap(shmem_ptr + vu_mmap->shm_offset, vu_mmap->len,
>
> Missing check for existing mapping with exact range [shm_offset, len)
> match.
>
> > + PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1,
> 0);
> > +
> > + if (addr == MAP_FAILED) {
> > + error_report("Failed to unmap memory");
> > + return -EFAULT;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > static void close_backend_channel(struct vhost_user *u)
> > {
> > g_source_destroy(u->backend_src);
> > @@ -1816,6 +1931,13 @@ static gboolean backend_read(QIOChannel *ioc,
> GIOCondition condition,
> > ret =
> vhost_user_backend_handle_shared_object_lookup(dev->opaque, ioc,
> > &hdr,
> &payload);
> > break;
> > + case VHOST_USER_BACKEND_SHMEM_MAP:
> > + ret = vhost_user_backend_handle_shmem_map(dev, &payload.mmap,
> > + fd ? fd[0] : -1);
> > + break;
> > + case VHOST_USER_BACKEND_SHMEM_UNMAP:
> > + ret = vhost_user_backend_handle_shmem_unmap(dev, &payload.mmap);
> > + break;
> > default:
> > error_report("Received unexpected msg type: %d.", hdr.request);
> > ret = -EINVAL;
> > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> > index 893a072c9d..9f2da5b11e 100644
> > --- a/hw/virtio/virtio.c
> > +++ b/hw/virtio/virtio.c
> > @@ -2856,6 +2856,16 @@ int virtio_save(VirtIODevice *vdev, QEMUFile *f)
> > return vmstate_save_state(f, &vmstate_virtio, vdev, NULL);
> > }
> >
> > +MemoryRegion *virtio_new_shmem_region(VirtIODevice *vdev)
> > +{
> > + MemoryRegion *mr = g_new0(MemoryRegion, 1);
> > + ++vdev->n_shmem_regions;
> > + vdev->shmem_list = g_renew(MemoryRegion, vdev->shmem_list,
> > + vdev->n_shmem_regions);
>
> Where is shmem_list freed?
>
> The name "list" is misleading since this is an array, not a list.
>
> > + vdev->shmem_list[vdev->n_shmem_regions - 1] = *mr;
> > + return mr;
> > +}
>
> This looks weird. The contents of mr are copied into shmem_list[] and
> then the pointer to mr is returned? Did you mean for the field's type to
> be MemoryRegion **shmem_list and then vdev->shmem_list[...] = mr would
> stash the pointer?
>
> > +
> > /* A wrapper for use as a VMState .put function */
> > static int virtio_device_put(QEMUFile *f, void *opaque, size_t size,
> > const VMStateField *field, JSONWriter
> *vmdesc)
> > @@ -3264,6 +3274,8 @@ void virtio_init(VirtIODevice *vdev, uint16_t
> device_id, size_t config_size)
> > virtio_vmstate_change, vdev);
> > vdev->device_endian = virtio_default_endian();
> > vdev->use_guest_notifier_mask = true;
> > + vdev->shmem_list = NULL;
> > + vdev->n_shmem_regions = 0;
> > }
> >
> > /*
> > diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> > index 7d5ffdc145..16d598aadc 100644
> > --- a/include/hw/virtio/virtio.h
> > +++ b/include/hw/virtio/virtio.h
> > @@ -165,6 +165,9 @@ struct VirtIODevice
> > */
> > EventNotifier config_notifier;
> > bool device_iotlb_enabled;
> > + /* Shared memory region for vhost-user mappings. */
> > + MemoryRegion *shmem_list;
> > + int n_shmem_regions;
> > };
> >
> > struct VirtioDeviceClass {
> > @@ -280,6 +283,8 @@ void virtio_notify(VirtIODevice *vdev, VirtQueue
> *vq);
> >
> > int virtio_save(VirtIODevice *vdev, QEMUFile *f);
> >
> > +MemoryRegion *virtio_new_shmem_region(VirtIODevice *vdev);
> > +
> > extern const VMStateInfo virtio_vmstate_info;
> >
> > #define VMSTATE_VIRTIO_DEVICE \
> > diff --git a/subprojects/libvhost-user/libvhost-user.c
> b/subprojects/libvhost-user/libvhost-user.c
> > index a879149fef..28556d183a 100644
> > --- a/subprojects/libvhost-user/libvhost-user.c
> > +++ b/subprojects/libvhost-user/libvhost-user.c
> > @@ -1586,6 +1586,71 @@ vu_rm_shared_object(VuDev *dev, unsigned char
> uuid[UUID_LEN])
> > return vu_send_message(dev, &msg);
> > }
> >
> > +bool
> > +vu_shmem_map(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
> > + uint64_t shm_offset, uint64_t len, uint64_t flags)
> > +{
> > + bool result = false;
> > + VhostUserMsg msg_reply;
> > + VhostUserMsg vmsg = {
> > + .request = VHOST_USER_BACKEND_SHMEM_MAP,
> > + .size = sizeof(vmsg.payload.mmap),
> > + .flags = VHOST_USER_VERSION,
> > + .payload.mmap = {
> > + .shmid = shmid,
> > + .fd_offset = fd_offset,
> > + .shm_offset = shm_offset,
> > + .len = len,
> > + .flags = flags,
> > + },
> > + };
> > +
> > + if (vu_has_protocol_feature(dev, VHOST_USER_PROTOCOL_F_REPLY_ACK)) {
> > + vmsg.flags |= VHOST_USER_NEED_REPLY_MASK;
> > + }
> > +
> > + pthread_mutex_lock(&dev->backend_mutex);
> > + if (!vu_message_write(dev, dev->backend_fd, &vmsg)) {
> > + pthread_mutex_unlock(&dev->backend_mutex);
> > + return false;
> > + }
> > +
> > + /* Also unlocks the backend_mutex */
> > + return vu_process_message_reply(dev, &vmsg);
> > +}
> > +
> > +bool
> > +vu_shmem_unmap(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
> > + uint64_t shm_offset, uint64_t len)
> > +{
> > + bool result = false;
> > + VhostUserMsg msg_reply;
> > + VhostUserMsg vmsg = {
> > + .request = VHOST_USER_BACKEND_SHMEM_UNMAP,
> > + .size = sizeof(vmsg.payload.mmap),
> > + .flags = VHOST_USER_VERSION,
> > + .payload.mmap = {
> > + .shmid = shmid,
> > + .fd_offset = fd_offset,
>
> What is the meaning of this field? I expected it to be set to 0.
>
Probably true. I just kept it generic so that backends could decide what to
set it to, without considering the actual use case.
I will remove the parameter and set it to 0 in the request.
>
> > + .shm_offset = shm_offset,
> > + .len = len,
> > + },
> > + };
> > +
> > + if (vu_has_protocol_feature(dev, VHOST_USER_PROTOCOL_F_REPLY_ACK)) {
> > + vmsg.flags |= VHOST_USER_NEED_REPLY_MASK;
> > + }
> > +
> > + pthread_mutex_lock(&dev->backend_mutex);
> > + if (!vu_message_write(dev, dev->backend_fd, &vmsg)) {
> > + pthread_mutex_unlock(&dev->backend_mutex);
> > + return false;
> > + }
> > +
> > + /* Also unlocks the backend_mutex */
> > + return vu_process_message_reply(dev, &vmsg);
> > +}
> > +
> > static bool
> > vu_set_vring_call_exec(VuDev *dev, VhostUserMsg *vmsg)
> > {
> > diff --git a/subprojects/libvhost-user/libvhost-user.h
> b/subprojects/libvhost-user/libvhost-user.h
> > index deb40e77b3..7f6c22cc1a 100644
> > --- a/subprojects/libvhost-user/libvhost-user.h
> > +++ b/subprojects/libvhost-user/libvhost-user.h
> > @@ -127,6 +127,8 @@ typedef enum VhostUserBackendRequest {
> > VHOST_USER_BACKEND_SHARED_OBJECT_ADD = 6,
> > VHOST_USER_BACKEND_SHARED_OBJECT_REMOVE = 7,
> > VHOST_USER_BACKEND_SHARED_OBJECT_LOOKUP = 8,
> > + VHOST_USER_BACKEND_SHMEM_MAP = 9,
> > + VHOST_USER_BACKEND_SHMEM_UNMAP = 10,
> > VHOST_USER_BACKEND_MAX
> > } VhostUserBackendRequest;
> >
> > @@ -186,6 +188,24 @@ typedef struct VhostUserShared {
> > unsigned char uuid[UUID_LEN];
> > } VhostUserShared;
> >
> > +/* For the flags field of VhostUserMMap */
> > +#define VHOST_USER_FLAG_MAP_R (1u << 0)
> > +#define VHOST_USER_FLAG_MAP_W (1u << 1)
> > +
> > +typedef struct {
> > + /* VIRTIO Shared Memory Region ID */
> > + uint8_t shmid;
> > + uint8_t padding[7];
> > + /* File offset */
> > + uint64_t fd_offset;
> > + /* Offset within the VIRTIO Shared Memory Region */
> > + uint64_t shm_offset;
> > + /* Size of the mapping */
> > + uint64_t len;
> > + /* Flags for the mmap operation, from VHOST_USER_FLAG_* */
> > + uint64_t flags;
> > +} VhostUserMMap;
> > +
> > #if defined(_WIN32) && (defined(__x86_64__) || defined(__i386__))
> > # define VU_PACKED __attribute__((gcc_struct, packed))
> > #else
> > @@ -214,6 +234,7 @@ typedef struct VhostUserMsg {
> > VhostUserVringArea area;
> > VhostUserInflight inflight;
> > VhostUserShared object;
> > + VhostUserMMap mmap;
> > } payload;
> >
> > int fds[VHOST_MEMORY_BASELINE_NREGIONS];
> > @@ -597,6 +618,38 @@ bool vu_add_shared_object(VuDev *dev, unsigned char
> uuid[UUID_LEN]);
> > */
> > bool vu_rm_shared_object(VuDev *dev, unsigned char uuid[UUID_LEN]);
> >
> > +/**
> > + * vu_shmem_map:
> > + * @dev: a VuDev context
> > + * @shmid: VIRTIO Shared Memory Region ID
> > + * @fd_offset: File offset
> > + * @shm_offset: Offset within the VIRTIO Shared Memory Region
> > + * @len: Size of the mapping
> > + * @flags: Flags for the mmap operation
> > + *
> > + * Advertises a new mapping to be made in a given VIRTIO Shared Memory
> Region.
> > + *
> > + * Returns: TRUE on success, FALSE on failure.
> > + */
> > +bool vu_shmem_map(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
> > + uint64_t shm_offset, uint64_t len, uint64_t flags);
> > +
> > +/**
> > + * vu_shmem_map:
> > + * @dev: a VuDev context
> > + * @shmid: VIRTIO Shared Memory Region ID
> > + * @fd_offset: File offset
> > + * @shm_offset: Offset within the VIRTIO Shared Memory Region
> > + * @len: Size of the mapping
> > + *
> > + * The front-end un-mmaps a given range in the VIRTIO Shared Memory
> Region
> > + * with the requested `shmid`.
> > + *
> > + * Returns: TRUE on success, FALSE on failure.
> > + */
> > +bool vu_shmem_unmap(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
> > + uint64_t shm_offset, uint64_t len);
> > +
> > /**
> > * vu_queue_set_notification:
> > * @dev: a VuDev context
> > --
> > 2.45.2
> >
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* [RFC PATCH v2 2/5] vhost_user: Add frontend command for shmem config
2024-06-28 14:57 [RFC PATCH v2 0/5] vhost-user: Add SHMEM_MAP/UNMAP requests Albert Esteve
2024-06-28 14:57 ` [RFC PATCH v2 1/5] vhost-user: Add VIRTIO Shared Memory map request Albert Esteve
@ 2024-06-28 14:57 ` Albert Esteve
2024-07-11 8:10 ` Stefan Hajnoczi
2024-07-11 8:15 ` Stefan Hajnoczi
2024-06-28 14:57 ` [RFC PATCH v2 3/5] vhost-user-dev: Add cache BAR Albert Esteve
` (4 subsequent siblings)
6 siblings, 2 replies; 36+ messages in thread
From: Albert Esteve @ 2024-06-28 14:57 UTC (permalink / raw)
To: qemu-devel
Cc: jasowang, david, slp, Alex Bennée, stefanha,
Michael S. Tsirkin, Albert Esteve
The frontend can use this command to retrieve
VIRTIO Shared Memory Regions configuration from
the backend. The response contains the number of
shared memory regions, their size, and shmid.
This is useful when the frontend is unaware of
specific backend type and configuration,
for example, in the `vhost-user-device` case.
Signed-off-by: Albert Esteve <aesteve@redhat.com>
---
docs/interop/vhost-user.rst | 31 +++++++++++++++++++++++
hw/virtio/vhost-user.c | 42 +++++++++++++++++++++++++++++++
include/hw/virtio/vhost-backend.h | 6 +++++
include/hw/virtio/vhost-user.h | 1 +
4 files changed, 80 insertions(+)
diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
index d52ba719d5..51f01d1d84 100644
--- a/docs/interop/vhost-user.rst
+++ b/docs/interop/vhost-user.rst
@@ -348,6 +348,19 @@ Device state transfer parameters
In the future, additional phases might be added e.g. to allow
iterative migration while the device is running.
+VIRTIO Shared Memory Region configuration
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
++-------------+---------+------------+----+------------+
+| num regions | padding | mem size 0 | .. | mem size 7 |
++-------------+---------+------------+----+------------+
+
+:num regions: a 32-bit number of regions
+
+:padding: 32-bit
+
+:mem size: 64-bit size of VIRTIO Shared Memory Region
+
C structure
-----------
@@ -369,6 +382,10 @@ In QEMU the vhost-user message is implemented with the following struct:
VhostUserConfig config;
VhostUserVringArea area;
VhostUserInflight inflight;
+ VhostUserShared object;
+ VhostUserTransferDeviceState transfer_state;
+ VhostUserMMap mmap;
+ VhostUserShMemConfig shmem;
};
} QEMU_PACKED VhostUserMsg;
@@ -1051,6 +1068,7 @@ Protocol features
#define VHOST_USER_PROTOCOL_F_XEN_MMAP 17
#define VHOST_USER_PROTOCOL_F_SHARED_OBJECT 18
#define VHOST_USER_PROTOCOL_F_DEVICE_STATE 19
+ #define VHOST_USER_PROTOCOL_F_SHMEM 20
Front-end message types
-----------------------
@@ -1725,6 +1743,19 @@ Front-end message types
Using this function requires prior negotiation of the
``VHOST_USER_PROTOCOL_F_DEVICE_STATE`` feature.
+``VHOST_USER_GET_SHMEM_CONFIG``
+ :id: 44
+ :equivalent ioctl: N/A
+ :request payload: N/A
+ :reply payload: ``struct VhostUserShMemConfig``
+
+ When the ``VHOST_USER_PROTOCOL_F_SHMEM`` protocol feature has been
+ successfully negotiated, this message can be submitted by the front-end
+ to gather the VIRTIO Shared Memory Region configuration. Back-end will respond
+ with the number of VIRTIO Shared Memory Regions it requires, and each shared memory
+ region size in an array. The shared memory IDs are represented by the index
+ of the array.
+
Back-end message types
----------------------
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 7ee8a472c6..57406dc8b4 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -104,6 +104,7 @@ typedef enum VhostUserRequest {
VHOST_USER_GET_SHARED_OBJECT = 41,
VHOST_USER_SET_DEVICE_STATE_FD = 42,
VHOST_USER_CHECK_DEVICE_STATE = 43,
+ VHOST_USER_GET_SHMEM_CONFIG = 44,
VHOST_USER_MAX
} VhostUserRequest;
@@ -138,6 +139,12 @@ typedef struct VhostUserMemRegMsg {
VhostUserMemoryRegion region;
} VhostUserMemRegMsg;
+typedef struct VhostUserShMemConfig {
+ uint32_t nregions;
+ uint32_t padding;
+ uint64_t memory_sizes[VHOST_MEMORY_BASELINE_NREGIONS];
+} VhostUserShMemConfig;
+
typedef struct VhostUserLog {
uint64_t mmap_size;
uint64_t mmap_offset;
@@ -245,6 +252,7 @@ typedef union {
VhostUserShared object;
VhostUserTransferDeviceState transfer_state;
VhostUserMMap mmap;
+ VhostUserShMemConfig shmem;
} VhostUserPayload;
typedef struct VhostUserMsg {
@@ -3136,6 +3144,39 @@ static int vhost_user_check_device_state(struct vhost_dev *dev, Error **errp)
return 0;
}
+static int vhost_user_get_shmem_config(struct vhost_dev *dev,
+ int *nregions,
+ uint64_t *memory_sizes,
+ Error **errp)
+{
+ int ret;
+ VhostUserMsg msg = {
+ .hdr.request = VHOST_USER_GET_SHMEM_CONFIG,
+ .hdr.flags = VHOST_USER_VERSION,
+ };
+
+ if (!virtio_has_feature(dev->protocol_features,
+ VHOST_USER_PROTOCOL_F_SHMEM)) {
+ return 0;
+ }
+
+ ret = vhost_user_write(dev, &msg, NULL, 0);
+ if (ret < 0) {
+ return ret;
+ }
+
+ ret = vhost_user_read(dev, &msg);
+ if (ret < 0) {
+ return ret;
+ }
+
+ *nregions = msg.payload.shmem.nregions;
+ memcpy(memory_sizes,
+ &msg.payload.shmem.memory_sizes,
+ sizeof(uint64_t) * VHOST_MEMORY_BASELINE_NREGIONS);
+ return 0;
+}
+
const VhostOps user_ops = {
.backend_type = VHOST_BACKEND_TYPE_USER,
.vhost_backend_init = vhost_user_backend_init,
@@ -3174,4 +3215,5 @@ const VhostOps user_ops = {
.vhost_supports_device_state = vhost_user_supports_device_state,
.vhost_set_device_state_fd = vhost_user_set_device_state_fd,
.vhost_check_device_state = vhost_user_check_device_state,
+ .vhost_get_shmem_config = vhost_user_get_shmem_config,
};
diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
index 70c2e8ffee..f9c2955420 100644
--- a/include/hw/virtio/vhost-backend.h
+++ b/include/hw/virtio/vhost-backend.h
@@ -159,6 +159,11 @@ typedef int (*vhost_set_device_state_fd_op)(struct vhost_dev *dev,
int *reply_fd,
Error **errp);
typedef int (*vhost_check_device_state_op)(struct vhost_dev *dev, Error **errp);
+typedef int (*vhost_get_shmem_config_op)(struct vhost_dev *dev,
+ int *nregions,
+ uint64_t *memory_sizes,
+ Error **errp);
+
typedef struct VhostOps {
VhostBackendType backend_type;
@@ -214,6 +219,7 @@ typedef struct VhostOps {
vhost_supports_device_state_op vhost_supports_device_state;
vhost_set_device_state_fd_op vhost_set_device_state_fd;
vhost_check_device_state_op vhost_check_device_state;
+ vhost_get_shmem_config_op vhost_get_shmem_config;
} VhostOps;
int vhost_backend_update_device_iotlb(struct vhost_dev *dev,
diff --git a/include/hw/virtio/vhost-user.h b/include/hw/virtio/vhost-user.h
index d7c09ffd34..e1b587a908 100644
--- a/include/hw/virtio/vhost-user.h
+++ b/include/hw/virtio/vhost-user.h
@@ -32,6 +32,7 @@ enum VhostUserProtocolFeature {
/* Feature 17 reserved for VHOST_USER_PROTOCOL_F_XEN_MMAP. */
VHOST_USER_PROTOCOL_F_SHARED_OBJECT = 18,
VHOST_USER_PROTOCOL_F_DEVICE_STATE = 19,
+ VHOST_USER_PROTOCOL_F_SHMEM = 20,
VHOST_USER_PROTOCOL_F_MAX
};
--
2.45.2
^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [RFC PATCH v2 2/5] vhost_user: Add frontend command for shmem config
2024-06-28 14:57 ` [RFC PATCH v2 2/5] vhost_user: Add frontend command for shmem config Albert Esteve
@ 2024-07-11 8:10 ` Stefan Hajnoczi
2024-09-04 9:05 ` Albert Esteve
2024-07-11 8:15 ` Stefan Hajnoczi
1 sibling, 1 reply; 36+ messages in thread
From: Stefan Hajnoczi @ 2024-07-11 8:10 UTC (permalink / raw)
To: Albert Esteve
Cc: qemu-devel, jasowang, david, slp, Alex Bennée,
Michael S. Tsirkin
On Fri, Jun 28, 2024 at 04:57:07PM +0200, Albert Esteve wrote:
> The frontend can use this command to retrieve
> VIRTIO Shared Memory Regions configuration from
> the backend. The response contains the number of
> shared memory regions, their size, and shmid.
>
> This is useful when the frontend is unaware of
> specific backend type and configuration,
> for example, in the `vhost-user-device` case.
>
> Signed-off-by: Albert Esteve <aesteve@redhat.com>
> ---
> docs/interop/vhost-user.rst | 31 +++++++++++++++++++++++
> hw/virtio/vhost-user.c | 42 +++++++++++++++++++++++++++++++
> include/hw/virtio/vhost-backend.h | 6 +++++
> include/hw/virtio/vhost-user.h | 1 +
> 4 files changed, 80 insertions(+)
>
> diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
> index d52ba719d5..51f01d1d84 100644
> --- a/docs/interop/vhost-user.rst
> +++ b/docs/interop/vhost-user.rst
> @@ -348,6 +348,19 @@ Device state transfer parameters
> In the future, additional phases might be added e.g. to allow
> iterative migration while the device is running.
>
> +VIRTIO Shared Memory Region configuration
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> ++-------------+---------+------------+----+------------+
> +| num regions | padding | mem size 0 | .. | mem size 7 |
> ++-------------+---------+------------+----+------------+
8 regions may not be enough. The max according to the VIRTIO spec is
256 because virtio-pci uses an 8-bit cap.id field for the shmid. I think
the maximum number should be 256 here.
(I haven't checked the QEMU vhost-user code to see whether it's
reasonable to hardcode to 256 or whether some logic is needed to dynamically
size the buffer depending on the "num regions" field.)
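For illustration, a hard-coded variant could look like this (macro name is
hypothetical); note that 256 64-bit sizes already make the reply payload
2 KiB, which may be an argument for sizing it dynamically based on
"num regions":
```
/* Hypothetical payload if the limit were raised to the virtio-pci maximum
 * of 256 VIRTIO Shared Memory Regions (shmid is an 8-bit field). */
#define VHOST_USER_MAX_SHMEM_REGIONS 256

typedef struct VhostUserShMemConfig {
    uint32_t nregions;
    uint32_t padding;
    uint64_t memory_sizes[VHOST_USER_MAX_SHMEM_REGIONS];
} VhostUserShMemConfig;
```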
> +
> +:num regions: a 32-bit number of regions
> +
> +:padding: 32-bit
> +
> +:mem size: 64-bit size of VIRTIO Shared Memory Region
> +
> C structure
> -----------
>
> @@ -369,6 +382,10 @@ In QEMU the vhost-user message is implemented with the following struct:
> VhostUserConfig config;
> VhostUserVringArea area;
> VhostUserInflight inflight;
> + VhostUserShared object;
> + VhostUserTransferDeviceState transfer_state;
> + VhostUserMMap mmap;
Why are these added by this patch? Please add them in the same patch
where they are introduced.
> + VhostUserShMemConfig shmem;
> };
> } QEMU_PACKED VhostUserMsg;
>
> @@ -1051,6 +1068,7 @@ Protocol features
> #define VHOST_USER_PROTOCOL_F_XEN_MMAP 17
> #define VHOST_USER_PROTOCOL_F_SHARED_OBJECT 18
> #define VHOST_USER_PROTOCOL_F_DEVICE_STATE 19
> + #define VHOST_USER_PROTOCOL_F_SHMEM 20
>
> Front-end message types
> -----------------------
> @@ -1725,6 +1743,19 @@ Front-end message types
> Using this function requires prior negotiation of the
> ``VHOST_USER_PROTOCOL_F_DEVICE_STATE`` feature.
>
> +``VHOST_USER_GET_SHMEM_CONFIG``
> + :id: 44
> + :equivalent ioctl: N/A
> + :request payload: N/A
> + :reply payload: ``struct VhostUserShMemConfig``
> +
> + When the ``VHOST_USER_PROTOCOL_F_SHMEM`` protocol feature has been
> + successfully negotiated, this message can be submitted by the front-end
> + to gather the VIRTIO Shared Memory Region configuration. Back-end will respond
> + with the number of VIRTIO Shared Memory Regions it requires, and each shared memory
> + region size in an array. The shared memory IDs are represented by the index
> + of the array.
Is the information returned by SHMEM_CONFIG valid and unchanged for the
entire lifetime of the vhost-user connection?
I think the answer is yes because the enumeration that virtio-pci and
virtio-mmio transports support is basically a one-time operation at
driver startup and it is static (Shared Memory Regions do not appear or
go away at runtime). Please be explicit how VHOST_USER_GET_SHMEM_CONFIG
is intended to be used.
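(As an illustration of the intended one-shot usage -- a hypothetical sketch,
where hdev, vdev and errp stand in for the front-end's realize context:)
```
/* Hypothetical sketch, not part of this patch: a generic front-end such as
 * vhost-user-device queries the layout once at realize time and creates one
 * VIRTIO Shared Memory Region per reported size. */
int nregions = 0;
uint64_t sizes[VHOST_MEMORY_BASELINE_NREGIONS] = { 0 };

if (hdev->vhost_ops->vhost_get_shmem_config &&
    hdev->vhost_ops->vhost_get_shmem_config(hdev, &nregions, sizes, errp) < 0) {
    return;
}
for (int i = 0; i < nregions; i++) {
    MemoryRegion *mr = virtio_new_shmem_region(vdev);
    /* Back the region with RAM of the reported size for shmid i. */
    memory_region_init_ram(mr, OBJECT(vdev), "vhost-user-shmem", sizes[i],
                           errp);
}
```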
> +
> Back-end message types
> ----------------------
>
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index 7ee8a472c6..57406dc8b4 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -104,6 +104,7 @@ typedef enum VhostUserRequest {
> VHOST_USER_GET_SHARED_OBJECT = 41,
> VHOST_USER_SET_DEVICE_STATE_FD = 42,
> VHOST_USER_CHECK_DEVICE_STATE = 43,
> + VHOST_USER_GET_SHMEM_CONFIG = 44,
> VHOST_USER_MAX
> } VhostUserRequest;
>
> @@ -138,6 +139,12 @@ typedef struct VhostUserMemRegMsg {
> VhostUserMemoryRegion region;
> } VhostUserMemRegMsg;
>
> +typedef struct VhostUserShMemConfig {
> + uint32_t nregions;
> + uint32_t padding;
> + uint64_t memory_sizes[VHOST_MEMORY_BASELINE_NREGIONS];
> +} VhostUserShMemConfig;
> +
> typedef struct VhostUserLog {
> uint64_t mmap_size;
> uint64_t mmap_offset;
> @@ -245,6 +252,7 @@ typedef union {
> VhostUserShared object;
> VhostUserTransferDeviceState transfer_state;
> VhostUserMMap mmap;
> + VhostUserShMemConfig shmem;
> } VhostUserPayload;
>
> typedef struct VhostUserMsg {
> @@ -3136,6 +3144,39 @@ static int vhost_user_check_device_state(struct vhost_dev *dev, Error **errp)
> return 0;
> }
>
> +static int vhost_user_get_shmem_config(struct vhost_dev *dev,
> + int *nregions,
> + uint64_t *memory_sizes,
> + Error **errp)
> +{
> + int ret;
> + VhostUserMsg msg = {
> + .hdr.request = VHOST_USER_GET_SHMEM_CONFIG,
> + .hdr.flags = VHOST_USER_VERSION,
> + };
> +
> + if (!virtio_has_feature(dev->protocol_features,
> + VHOST_USER_PROTOCOL_F_SHMEM)) {
> + return 0;
> + }
> +
> + ret = vhost_user_write(dev, &msg, NULL, 0);
> + if (ret < 0) {
> + return ret;
> + }
> +
> + ret = vhost_user_read(dev, &msg);
> + if (ret < 0) {
> + return ret;
> + }
> +
> + *nregions = msg.payload.shmem.nregions;
Missing input validation from the untrusted vhost-user backend. nregions
may be out of range.
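Something along these lines would do (a sketch only; the exact error handling
is up to the implementation):
```
/* Sketch: reject out-of-range values from the untrusted back-end before
 * copying the sizes array. */
if (msg.payload.shmem.nregions > VHOST_MEMORY_BASELINE_NREGIONS) {
    error_setg(errp, "Invalid number of VIRTIO Shared Memory Regions: %u",
               msg.payload.shmem.nregions);
    return -EINVAL;
}
```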
> + memcpy(memory_sizes,
> + &msg.payload.shmem.memory_sizes,
> + sizeof(uint64_t) * VHOST_MEMORY_BASELINE_NREGIONS);
> + return 0;
> +}
> +
> const VhostOps user_ops = {
> .backend_type = VHOST_BACKEND_TYPE_USER,
> .vhost_backend_init = vhost_user_backend_init,
> @@ -3174,4 +3215,5 @@ const VhostOps user_ops = {
> .vhost_supports_device_state = vhost_user_supports_device_state,
> .vhost_set_device_state_fd = vhost_user_set_device_state_fd,
> .vhost_check_device_state = vhost_user_check_device_state,
> + .vhost_get_shmem_config = vhost_user_get_shmem_config,
> };
> diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
> index 70c2e8ffee..f9c2955420 100644
> --- a/include/hw/virtio/vhost-backend.h
> +++ b/include/hw/virtio/vhost-backend.h
> @@ -159,6 +159,11 @@ typedef int (*vhost_set_device_state_fd_op)(struct vhost_dev *dev,
> int *reply_fd,
> Error **errp);
> typedef int (*vhost_check_device_state_op)(struct vhost_dev *dev, Error **errp);
> +typedef int (*vhost_get_shmem_config_op)(struct vhost_dev *dev,
> + int *nregions,
> + uint64_t *memory_sizes,
> + Error **errp);
> +
>
> typedef struct VhostOps {
> VhostBackendType backend_type;
> @@ -214,6 +219,7 @@ typedef struct VhostOps {
> vhost_supports_device_state_op vhost_supports_device_state;
> vhost_set_device_state_fd_op vhost_set_device_state_fd;
> vhost_check_device_state_op vhost_check_device_state;
> + vhost_get_shmem_config_op vhost_get_shmem_config;
> } VhostOps;
>
> int vhost_backend_update_device_iotlb(struct vhost_dev *dev,
> diff --git a/include/hw/virtio/vhost-user.h b/include/hw/virtio/vhost-user.h
> index d7c09ffd34..e1b587a908 100644
> --- a/include/hw/virtio/vhost-user.h
> +++ b/include/hw/virtio/vhost-user.h
> @@ -32,6 +32,7 @@ enum VhostUserProtocolFeature {
> /* Feature 17 reserved for VHOST_USER_PROTOCOL_F_XEN_MMAP. */
> VHOST_USER_PROTOCOL_F_SHARED_OBJECT = 18,
> VHOST_USER_PROTOCOL_F_DEVICE_STATE = 19,
> + VHOST_USER_PROTOCOL_F_SHMEM = 20,
> VHOST_USER_PROTOCOL_F_MAX
> };
>
> --
> 2.45.2
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC PATCH v2 2/5] vhost_user: Add frontend command for shmem config
2024-07-11 8:10 ` Stefan Hajnoczi
@ 2024-09-04 9:05 ` Albert Esteve
0 siblings, 0 replies; 36+ messages in thread
From: Albert Esteve @ 2024-09-04 9:05 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, jasowang, david, slp, Alex Bennée,
Michael S. Tsirkin
On Thu, Jul 11, 2024 at 10:10 AM Stefan Hajnoczi <stefanha@redhat.com>
wrote:
> On Fri, Jun 28, 2024 at 04:57:07PM +0200, Albert Esteve wrote:
> > The frontend can use this command to retrieve
> > VIRTIO Shared Memory Regions configuration from
> > the backend. The response contains the number of
> > shared memory regions, their size, and shmid.
> >
> > This is useful when the frontend is unaware of
> > specific backend type and configuration,
> > for example, in the `vhost-user-device` case.
> >
> > Signed-off-by: Albert Esteve <aesteve@redhat.com>
> > ---
> > docs/interop/vhost-user.rst | 31 +++++++++++++++++++++++
> > hw/virtio/vhost-user.c | 42 +++++++++++++++++++++++++++++++
> > include/hw/virtio/vhost-backend.h | 6 +++++
> > include/hw/virtio/vhost-user.h | 1 +
> > 4 files changed, 80 insertions(+)
> >
> > diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
> > index d52ba719d5..51f01d1d84 100644
> > --- a/docs/interop/vhost-user.rst
> > +++ b/docs/interop/vhost-user.rst
> > @@ -348,6 +348,19 @@ Device state transfer parameters
> > In the future, additional phases might be added e.g. to allow
> > iterative migration while the device is running.
> >
> > +VIRTIO Shared Memory Region configuration
> > +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > +
> > ++-------------+---------+------------+----+------------+
> > +| num regions | padding | mem size 0 | .. | mem size 7 |
> > ++-------------+---------+------------+----+------------+
>
> 8 regions may not be enough. The max according to the VIRTIO spec is
> 256 because virtio-pci uses an 8-bit cap.id field for the shmid. I think
> the maximum number should be 256 here.
>
Ok, I'll set it to 255, as it starts at 0.
>
> (I haven't checked the QEMU vhost-user code to see whether it's
> reasonable to hardcode to 256 or whether some logic is needed to dynamically
> size the buffer depending on the "num regions" field.)
>
> > +
> > +:num regions: a 32-bit number of regions
> > +
> > +:padding: 32-bit
> > +
> > +:mem size: 64-bit size of VIRTIO Shared Memory Region
> > +
> > C structure
> > -----------
> >
> > @@ -369,6 +382,10 @@ In QEMU the vhost-user message is implemented with
> the following struct:
> > VhostUserConfig config;
> > VhostUserVringArea area;
> > VhostUserInflight inflight;
> > + VhostUserShared object;
> > + VhostUserTransferDeviceState transfer_state;
> > + VhostUserMMap mmap;
>
> Why are these added by this patch? Please add them in the same patch
> where they are introduced.
For object and transfer_state, that ship has already sailed.
In order to keep the documentation excerpt aligned with the actual code, I will
split them into their own commit to avoid confusion.
> > + VhostUserShMemConfig shmem;
> > };
> > } QEMU_PACKED VhostUserMsg;
> >
> > @@ -1051,6 +1068,7 @@ Protocol features
> > #define VHOST_USER_PROTOCOL_F_XEN_MMAP 17
> > #define VHOST_USER_PROTOCOL_F_SHARED_OBJECT 18
> > #define VHOST_USER_PROTOCOL_F_DEVICE_STATE 19
> > + #define VHOST_USER_PROTOCOL_F_SHMEM 20
> >
> > Front-end message types
> > -----------------------
> > @@ -1725,6 +1743,19 @@ Front-end message types
> > Using this function requires prior negotiation of the
> > ``VHOST_USER_PROTOCOL_F_DEVICE_STATE`` feature.
> >
> > +``VHOST_USER_GET_SHMEM_CONFIG``
> > + :id: 44
> > + :equivalent ioctl: N/A
> > + :request payload: N/A
> > + :reply payload: ``struct VhostUserShMemConfig``
> > +
> > + When the ``VHOST_USER_PROTOCOL_F_SHMEM`` protocol feature has been
> > + successfully negotiated, this message can be submitted by the
> front-end
> > + to gather the VIRTIO Shared Memory Region configuration. Back-end
> will respond
> > + with the number of VIRTIO Shared Memory Regions it requires, and each
> shared memory
> > + region size in an array. The shared memory IDs are represented by the
> index
> > + of the array.
>
> Is the information returned by SHMEM_CONFIG valid and unchanged for the
> entire lifetime of the vhost-user connection?
>
> I think the answer is yes because the enumeration that virtio-pci and
> virtio-mmio transports support is basically a one-time operation at
> driver startup and it is static (Shared Memory Regions do not appear or
> go away at runtime). Please be explicit how VHOST_USER_GET_SHMEM_CONFIG
> is intended to be used.
>
Yes, I will be explicit.
>
> > +
> > Back-end message types
> > ----------------------
> >
> > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > index 7ee8a472c6..57406dc8b4 100644
> > --- a/hw/virtio/vhost-user.c
> > +++ b/hw/virtio/vhost-user.c
> > @@ -104,6 +104,7 @@ typedef enum VhostUserRequest {
> > VHOST_USER_GET_SHARED_OBJECT = 41,
> > VHOST_USER_SET_DEVICE_STATE_FD = 42,
> > VHOST_USER_CHECK_DEVICE_STATE = 43,
> > + VHOST_USER_GET_SHMEM_CONFIG = 44,
> > VHOST_USER_MAX
> > } VhostUserRequest;
> >
> > @@ -138,6 +139,12 @@ typedef struct VhostUserMemRegMsg {
> > VhostUserMemoryRegion region;
> > } VhostUserMemRegMsg;
> >
> > +typedef struct VhostUserShMemConfig {
> > + uint32_t nregions;
> > + uint32_t padding;
> > + uint64_t memory_sizes[VHOST_MEMORY_BASELINE_NREGIONS];
> > +} VhostUserShMemConfig;
> > +
> > typedef struct VhostUserLog {
> > uint64_t mmap_size;
> > uint64_t mmap_offset;
> > @@ -245,6 +252,7 @@ typedef union {
> > VhostUserShared object;
> > VhostUserTransferDeviceState transfer_state;
> > VhostUserMMap mmap;
> > + VhostUserShMemConfig shmem;
> > } VhostUserPayload;
> >
> > typedef struct VhostUserMsg {
> > @@ -3136,6 +3144,39 @@ static int vhost_user_check_device_state(struct
> vhost_dev *dev, Error **errp)
> > return 0;
> > }
> >
> > +static int vhost_user_get_shmem_config(struct vhost_dev *dev,
> > + int *nregions,
> > + uint64_t *memory_sizes,
> > + Error **errp)
> > +{
> > + int ret;
> > + VhostUserMsg msg = {
> > + .hdr.request = VHOST_USER_GET_SHMEM_CONFIG,
> > + .hdr.flags = VHOST_USER_VERSION,
> > + };
> > +
> > + if (!virtio_has_feature(dev->protocol_features,
> > + VHOST_USER_PROTOCOL_F_SHMEM)) {
> > + return 0;
> > + }
> > +
> > + ret = vhost_user_write(dev, &msg, NULL, 0);
> > + if (ret < 0) {
> > + return ret;
> > + }
> > +
> > + ret = vhost_user_read(dev, &msg);
> > + if (ret < 0) {
> > + return ret;
> > + }
> > +
> > + *nregions = msg.payload.shmem.nregions;
>
> Missing input validation from the untrusted vhost-user backend. nregions
> may be out of range.
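Good catch. A bounds check before using the reply should cover it;
a sketch, following the error handling style of the surrounding code:

```
    if (msg.hdr.request != VHOST_USER_GET_SHMEM_CONFIG) {
        error_setg(errp, "Received unexpected msg type. Expected %d, received %d",
                   VHOST_USER_GET_SHMEM_CONFIG, msg.hdr.request);
        return -EPROTO;
    }

    if (msg.payload.shmem.nregions > VHOST_MEMORY_BASELINE_NREGIONS) {
        error_setg(errp, "Invalid number of VIRTIO Shared Memory Regions (%u)",
                   msg.payload.shmem.nregions);
        return -EINVAL;
    }

    *nregions = msg.payload.shmem.nregions;
```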
>
> > + memcpy(memory_sizes,
> > + &msg.payload.shmem.memory_sizes,
> > + sizeof(uint64_t) * VHOST_MEMORY_BASELINE_NREGIONS);
> > + return 0;
> > +}
> > +
> > const VhostOps user_ops = {
> > .backend_type = VHOST_BACKEND_TYPE_USER,
> > .vhost_backend_init = vhost_user_backend_init,
> > @@ -3174,4 +3215,5 @@ const VhostOps user_ops = {
> > .vhost_supports_device_state = vhost_user_supports_device_state,
> > .vhost_set_device_state_fd = vhost_user_set_device_state_fd,
> > .vhost_check_device_state = vhost_user_check_device_state,
> > + .vhost_get_shmem_config = vhost_user_get_shmem_config,
> > };
> > diff --git a/include/hw/virtio/vhost-backend.h
> b/include/hw/virtio/vhost-backend.h
> > index 70c2e8ffee..f9c2955420 100644
> > --- a/include/hw/virtio/vhost-backend.h
> > +++ b/include/hw/virtio/vhost-backend.h
> > @@ -159,6 +159,11 @@ typedef int (*vhost_set_device_state_fd_op)(struct
> vhost_dev *dev,
> > int *reply_fd,
> > Error **errp);
> > typedef int (*vhost_check_device_state_op)(struct vhost_dev *dev, Error
> **errp);
> > +typedef int (*vhost_get_shmem_config_op)(struct vhost_dev *dev,
> > + int *nregions,
> > + uint64_t *memory_sizes,
> > + Error **errp);
> > +
> >
> > typedef struct VhostOps {
> > VhostBackendType backend_type;
> > @@ -214,6 +219,7 @@ typedef struct VhostOps {
> > vhost_supports_device_state_op vhost_supports_device_state;
> > vhost_set_device_state_fd_op vhost_set_device_state_fd;
> > vhost_check_device_state_op vhost_check_device_state;
> > + vhost_get_shmem_config_op vhost_get_shmem_config;
> > } VhostOps;
> >
> > int vhost_backend_update_device_iotlb(struct vhost_dev *dev,
> > diff --git a/include/hw/virtio/vhost-user.h
> b/include/hw/virtio/vhost-user.h
> > index d7c09ffd34..e1b587a908 100644
> > --- a/include/hw/virtio/vhost-user.h
> > +++ b/include/hw/virtio/vhost-user.h
> > @@ -32,6 +32,7 @@ enum VhostUserProtocolFeature {
> > /* Feature 17 reserved for VHOST_USER_PROTOCOL_F_XEN_MMAP. */
> > VHOST_USER_PROTOCOL_F_SHARED_OBJECT = 18,
> > VHOST_USER_PROTOCOL_F_DEVICE_STATE = 19,
> > + VHOST_USER_PROTOCOL_F_SHMEM = 20,
> > VHOST_USER_PROTOCOL_F_MAX
> > };
> >
> > --
> > 2.45.2
> >
>
* Re: [RFC PATCH v2 2/5] vhost_user: Add frontend command for shmem config
2024-06-28 14:57 ` [RFC PATCH v2 2/5] vhost_user: Add frontend command for shmem config Albert Esteve
2024-07-11 8:10 ` Stefan Hajnoczi
@ 2024-07-11 8:15 ` Stefan Hajnoczi
1 sibling, 0 replies; 36+ messages in thread
From: Stefan Hajnoczi @ 2024-07-11 8:15 UTC (permalink / raw)
To: Albert Esteve
Cc: qemu-devel, jasowang, david, slp, Alex Bennée,
Michael S. Tsirkin
On Fri, Jun 28, 2024 at 04:57:07PM +0200, Albert Esteve wrote:
> The frontend can use this command to retrieve
> VIRTIO Shared Memory Regions configuration from
> the backend. The response contains the number of
> shared memory regions, their size, and shmid.
>
> This is useful when the frontend is unaware of
> specific backend type and configuration,
> for example, in the `vhost-user-device` case.
>
> Signed-off-by: Albert Esteve <aesteve@redhat.com>
> ---
> docs/interop/vhost-user.rst | 31 +++++++++++++++++++++++
> hw/virtio/vhost-user.c | 42 +++++++++++++++++++++++++++++++
> include/hw/virtio/vhost-backend.h | 6 +++++
> include/hw/virtio/vhost-user.h | 1 +
> 4 files changed, 80 insertions(+)
>
> diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
> index d52ba719d5..51f01d1d84 100644
> --- a/docs/interop/vhost-user.rst
> +++ b/docs/interop/vhost-user.rst
> @@ -348,6 +348,19 @@ Device state transfer parameters
> In the future, additional phases might be added e.g. to allow
> iterative migration while the device is running.
>
> +VIRTIO Shared Memory Region configuration
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> ++-------------+---------+------------+----+------------+
> +| num regions | padding | mem size 0 | .. | mem size 7 |
> ++-------------+---------+------------+----+------------+
> +
> +:num regions: a 32-bit number of regions
> +
> +:padding: 32-bit
> +
> +:mem size: 64-bit size of VIRTIO Shared Memory Region
> +
> C structure
> -----------
>
> @@ -369,6 +382,10 @@ In QEMU the vhost-user message is implemented with the following struct:
> VhostUserConfig config;
> VhostUserVringArea area;
> VhostUserInflight inflight;
> + VhostUserShared object;
> + VhostUserTransferDeviceState transfer_state;
> + VhostUserMMap mmap;
> + VhostUserShMemConfig shmem;
> };
> } QEMU_PACKED VhostUserMsg;
>
> @@ -1051,6 +1068,7 @@ Protocol features
> #define VHOST_USER_PROTOCOL_F_XEN_MMAP 17
> #define VHOST_USER_PROTOCOL_F_SHARED_OBJECT 18
> #define VHOST_USER_PROTOCOL_F_DEVICE_STATE 19
> + #define VHOST_USER_PROTOCOL_F_SHMEM 20
>
> Front-end message types
> -----------------------
> @@ -1725,6 +1743,19 @@ Front-end message types
> Using this function requires prior negotiation of the
> ``VHOST_USER_PROTOCOL_F_DEVICE_STATE`` feature.
>
> +``VHOST_USER_GET_SHMEM_CONFIG``
> + :id: 44
> + :equivalent ioctl: N/A
> + :request payload: N/A
> + :reply payload: ``struct VhostUserShMemConfig``
> +
> + When the ``VHOST_USER_PROTOCOL_F_SHMEM`` protocol feature has been
> + successfully negotiated, this message can be submitted by the front-end
> + to gather the VIRTIO Shared Memory Region configuration. Back-end will respond
> + with the number of VIRTIO Shared Memory Regions it requires, and each shared memory
> + region size in an array. The shared memory IDs are represented by the index
> + of the array.
Please add:
- The Shared Memory Region size must be a multiple of the page size supported by mmap(2).
- The size may be 0 if the region is unused. This can happen when the
device does not support an optional feature but does support a feature
that uses a higher shmid.
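For example, a backend that only implements an optional feature behind
shmid 1 would still report two regions, leaving shmid 0 at size 0
(sketch of how the reply could be filled):

```
    VhostUserShMemConfig shmem = {
        .nregions = 2,
        .memory_sizes = {
            [0] = 0,                /* optional feature not supported */
            [1] = 64 * 1024 * 1024, /* in use; multiple of the page size */
        },
    };
```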
* [RFC PATCH v2 3/5] vhost-user-dev: Add cache BAR
2024-06-28 14:57 [RFC PATCH v2 0/5] vhost-user: Add SHMEM_MAP/UNMAP requests Albert Esteve
2024-06-28 14:57 ` [RFC PATCH v2 1/5] vhost-user: Add VIRTIO Shared Memory map request Albert Esteve
2024-06-28 14:57 ` [RFC PATCH v2 2/5] vhost_user: Add frontend command for shmem config Albert Esteve
@ 2024-06-28 14:57 ` Albert Esteve
2024-07-11 8:25 ` Stefan Hajnoczi
2024-06-28 14:57 ` [RFC PATCH v2 4/5] vhost_user: Add MEM_READ/WRITE backend requests Albert Esteve
` (3 subsequent siblings)
6 siblings, 1 reply; 36+ messages in thread
From: Albert Esteve @ 2024-06-28 14:57 UTC (permalink / raw)
To: qemu-devel
Cc: jasowang, david, slp, Alex Bennée, stefanha,
Michael S. Tsirkin, Albert Esteve
Add a cache BAR in the vhost-user-device
into which files can be directly mapped.
The number, shmid, and size of the VIRTIO Shared
Memory subregions are retrieved through a get_shmem_config
message sent by the vhost-user-base module
during the realize step, after virtio_init().
By default, if the VHOST_USER_PROTOCOL_F_SHMEM
feature is not supported by the backend,
there is no cache.
Signed-off-by: Albert Esteve <aesteve@redhat.com>
---
hw/virtio/vhost-user-base.c | 39 +++++++++++++++++++++++++++++--
hw/virtio/vhost-user-device-pci.c | 37 ++++++++++++++++++++++++++---
2 files changed, 71 insertions(+), 5 deletions(-)
diff --git a/hw/virtio/vhost-user-base.c b/hw/virtio/vhost-user-base.c
index a83167191e..e47c568a55 100644
--- a/hw/virtio/vhost-user-base.c
+++ b/hw/virtio/vhost-user-base.c
@@ -268,7 +268,9 @@ static void vub_device_realize(DeviceState *dev, Error **errp)
{
VirtIODevice *vdev = VIRTIO_DEVICE(dev);
VHostUserBase *vub = VHOST_USER_BASE(dev);
- int ret;
+ uint64_t memory_sizes[8];
+ void *cache_ptr;
+ int i, ret, nregions;
if (!vub->chardev.chr) {
error_setg(errp, "vhost-user-base: missing chardev");
@@ -311,7 +313,7 @@ static void vub_device_realize(DeviceState *dev, Error **errp)
/* Allocate queues */
vub->vqs = g_ptr_array_sized_new(vub->num_vqs);
- for (int i = 0; i < vub->num_vqs; i++) {
+ for (i = 0; i < vub->num_vqs; i++) {
g_ptr_array_add(vub->vqs,
virtio_add_queue(vdev, vub->vq_size,
vub_handle_output));
@@ -328,6 +330,39 @@ static void vub_device_realize(DeviceState *dev, Error **errp)
do_vhost_user_cleanup(vdev, vub);
}
+ ret = vub->vhost_dev.vhost_ops->vhost_get_shmem_config(&vub->vhost_dev,
+ &nregions,
+ memory_sizes,
+ errp);
+
+ if (ret < 0) {
+ do_vhost_user_cleanup(vdev, vub);
+ }
+
+ for (i = 0; i < nregions; i++) {
+ if (memory_sizes[i]) {
+ if (!is_power_of_2(memory_sizes[i]) ||
+ memory_sizes[i] < qemu_real_host_page_size()) {
+ error_setg(errp, "Shared memory %d size must be a power of 2 "
+ "no smaller than the page size", i);
+ return;
+ }
+
+ cache_ptr = mmap(NULL, memory_sizes[i], PROT_READ,
+ MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
+ if (cache_ptr == MAP_FAILED) {
+ error_setg(errp, "Unable to mmap blank cache: %s",
+ strerror(errno));
+ return;
+ }
+
+ virtio_new_shmem_region(vdev);
+ memory_region_init_ram_ptr(&vdev->shmem_list[i],
+ OBJECT(vdev), "vub-shm-" + i,
+ memory_sizes[i], cache_ptr);
+ }
+ }
+
qemu_chr_fe_set_handlers(&vub->chardev, NULL, NULL, vub_event, NULL,
dev, NULL, true);
}
diff --git a/hw/virtio/vhost-user-device-pci.c b/hw/virtio/vhost-user-device-pci.c
index efaf55d3dd..314bacfb7a 100644
--- a/hw/virtio/vhost-user-device-pci.c
+++ b/hw/virtio/vhost-user-device-pci.c
@@ -8,14 +8,18 @@
*/
#include "qemu/osdep.h"
+#include "qapi/error.h"
#include "hw/qdev-properties.h"
#include "hw/virtio/vhost-user-base.h"
#include "hw/virtio/virtio-pci.h"
+#define VIRTIO_DEVICE_PCI_CACHE_BAR 2
+
struct VHostUserDevicePCI {
VirtIOPCIProxy parent_obj;
VHostUserBase vub;
+ MemoryRegion cachebar;
};
#define TYPE_VHOST_USER_DEVICE_PCI "vhost-user-device-pci-base"
@@ -25,10 +29,37 @@ OBJECT_DECLARE_SIMPLE_TYPE(VHostUserDevicePCI, VHOST_USER_DEVICE_PCI)
static void vhost_user_device_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
{
VHostUserDevicePCI *dev = VHOST_USER_DEVICE_PCI(vpci_dev);
- DeviceState *vdev = DEVICE(&dev->vub);
-
+ DeviceState *dev_state = DEVICE(&dev->vub);
+ VirtIODevice *vdev = VIRTIO_DEVICE(dev_state);
+ uint64_t offset = 0, cache_size = 0;
+ int i;
+
vpci_dev->nvectors = 1;
- qdev_realize(vdev, BUS(&vpci_dev->bus), errp);
+ qdev_realize(dev_state, BUS(&vpci_dev->bus), errp);
+
+ for (i = 0; i < vdev->n_shmem_regions; i++) {
+ if (vdev->shmem_list[i].size > UINT64_MAX - cache_size) {
+ error_setg(errp, "Total shared memory required overflow");
+ return;
+ }
+ cache_size = cache_size + vdev->shmem_list[i].size;
+ }
+ if (cache_size) {
+ memory_region_init(&dev->cachebar, OBJECT(vpci_dev),
+ "vhost-device-pci-cachebar", cache_size);
+ for (i = 0; i < vdev->n_shmem_regions; i++) {
+ memory_region_add_subregion(&dev->cachebar, offset,
+ &vdev->shmem_list[i]);
+ virtio_pci_add_shm_cap(vpci_dev, VIRTIO_DEVICE_PCI_CACHE_BAR,
+ offset, vdev->shmem_list[i].size, i);
+ offset = offset + vdev->shmem_list[i].size;
+ }
+ pci_register_bar(&vpci_dev->pci_dev, VIRTIO_DEVICE_PCI_CACHE_BAR,
+ PCI_BASE_ADDRESS_SPACE_MEMORY |
+ PCI_BASE_ADDRESS_MEM_PREFETCH |
+ PCI_BASE_ADDRESS_MEM_TYPE_64,
+ &dev->cachebar);
+ }
}
static void vhost_user_device_pci_class_init(ObjectClass *klass, void *data)
--
2.45.2
* Re: [RFC PATCH v2 3/5] vhost-user-dev: Add cache BAR
2024-06-28 14:57 ` [RFC PATCH v2 3/5] vhost-user-dev: Add cache BAR Albert Esteve
@ 2024-07-11 8:25 ` Stefan Hajnoczi
2024-09-04 11:20 ` Albert Esteve
0 siblings, 1 reply; 36+ messages in thread
From: Stefan Hajnoczi @ 2024-07-11 8:25 UTC (permalink / raw)
To: Albert Esteve, Michael S. Tsirkin
Cc: qemu-devel, jasowang, david, slp, Alex Bennée
On Fri, Jun 28, 2024 at 04:57:08PM +0200, Albert Esteve wrote:
> Add a cache BAR in the vhost-user-device
> into which files can be directly mapped.
>
> The number, shmid, and size of the VIRTIO Shared
> Memory subregions is retrieved through a get_shmem_config
> message sent by the vhost-user-base module
> on the realize step, after virtio_init().
>
> By default, if VHOST_USER_PROTOCOL_F_SHMEM
> feature is not supported by the backend,
> there is no cache.
>
> Signed-off-by: Albert Esteve <aesteve@redhat.com>
Michael: Please review vhost_user_device_pci_realize() below regarding
virtio-pci BAR layout. Thanks!
> ---
> hw/virtio/vhost-user-base.c | 39 +++++++++++++++++++++++++++++--
> hw/virtio/vhost-user-device-pci.c | 37 ++++++++++++++++++++++++++---
> 2 files changed, 71 insertions(+), 5 deletions(-)
>
> diff --git a/hw/virtio/vhost-user-base.c b/hw/virtio/vhost-user-base.c
> index a83167191e..e47c568a55 100644
> --- a/hw/virtio/vhost-user-base.c
> +++ b/hw/virtio/vhost-user-base.c
> @@ -268,7 +268,9 @@ static void vub_device_realize(DeviceState *dev, Error **errp)
> {
> VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> VHostUserBase *vub = VHOST_USER_BASE(dev);
> - int ret;
> + uint64_t memory_sizes[8];
> + void *cache_ptr;
> + int i, ret, nregions;
>
> if (!vub->chardev.chr) {
> error_setg(errp, "vhost-user-base: missing chardev");
> @@ -311,7 +313,7 @@ static void vub_device_realize(DeviceState *dev, Error **errp)
>
> /* Allocate queues */
> vub->vqs = g_ptr_array_sized_new(vub->num_vqs);
> - for (int i = 0; i < vub->num_vqs; i++) {
> + for (i = 0; i < vub->num_vqs; i++) {
> g_ptr_array_add(vub->vqs,
> virtio_add_queue(vdev, vub->vq_size,
> vub_handle_output));
> @@ -328,6 +330,39 @@ static void vub_device_realize(DeviceState *dev, Error **errp)
> do_vhost_user_cleanup(vdev, vub);
> }
>
> + ret = vub->vhost_dev.vhost_ops->vhost_get_shmem_config(&vub->vhost_dev,
> + &nregions,
> + memory_sizes,
> + errp);
> +
> + if (ret < 0) {
> + do_vhost_user_cleanup(vdev, vub);
> + }
> +
> + for (i = 0; i < nregions; i++) {
> + if (memory_sizes[i]) {
> + if (!is_power_of_2(memory_sizes[i]) ||
> + memory_sizes[i] < qemu_real_host_page_size()) {
Or just if (memory_sizes[i] % qemu_real_host_page_size() != 0)?
> + error_setg(errp, "Shared memory %d size must be a power of 2 "
> + "no smaller than the page size", i);
> + return;
> + }
> +
> + cache_ptr = mmap(NULL, memory_sizes[i], PROT_READ,
Should this be PROT_NONE like in
vhost_user_backend_handle_shmem_unmap()?
> + MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
> + if (cache_ptr == MAP_FAILED) {
> + error_setg(errp, "Unable to mmap blank cache: %s",
> + strerror(errno));
error_setg_errno() can be used here.
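For example:

```
            if (cache_ptr == MAP_FAILED) {
                error_setg_errno(errp, errno, "Unable to mmap blank cache");
                return;
            }
```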
> + return;
> + }
> +
> + virtio_new_shmem_region(vdev);
> + memory_region_init_ram_ptr(&vdev->shmem_list[i],
> + OBJECT(vdev), "vub-shm-" + i,
> + memory_sizes[i], cache_ptr);
> + }
> + }
> +
> qemu_chr_fe_set_handlers(&vub->chardev, NULL, NULL, vub_event, NULL,
> dev, NULL, true);
> }
> diff --git a/hw/virtio/vhost-user-device-pci.c b/hw/virtio/vhost-user-device-pci.c
> index efaf55d3dd..314bacfb7a 100644
> --- a/hw/virtio/vhost-user-device-pci.c
> +++ b/hw/virtio/vhost-user-device-pci.c
> @@ -8,14 +8,18 @@
> */
>
> #include "qemu/osdep.h"
> +#include "qapi/error.h"
> #include "hw/qdev-properties.h"
> #include "hw/virtio/vhost-user-base.h"
> #include "hw/virtio/virtio-pci.h"
>
> +#define VIRTIO_DEVICE_PCI_CACHE_BAR 2
> +
> struct VHostUserDevicePCI {
> VirtIOPCIProxy parent_obj;
>
> VHostUserBase vub;
> + MemoryRegion cachebar;
> };
>
> #define TYPE_VHOST_USER_DEVICE_PCI "vhost-user-device-pci-base"
> @@ -25,10 +29,37 @@ OBJECT_DECLARE_SIMPLE_TYPE(VHostUserDevicePCI, VHOST_USER_DEVICE_PCI)
> static void vhost_user_device_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
> {
> VHostUserDevicePCI *dev = VHOST_USER_DEVICE_PCI(vpci_dev);
> - DeviceState *vdev = DEVICE(&dev->vub);
> -
> + DeviceState *dev_state = DEVICE(&dev->vub);
> + VirtIODevice *vdev = VIRTIO_DEVICE(dev_state);
> + uint64_t offset = 0, cache_size = 0;
> + int i;
> +
> vpci_dev->nvectors = 1;
> - qdev_realize(vdev, BUS(&vpci_dev->bus), errp);
> + qdev_realize(dev_state, BUS(&vpci_dev->bus), errp);
> +
> + for (i = 0; i < vdev->n_shmem_regions; i++) {
> + if (vdev->shmem_list[i].size > UINT64_MAX - cache_size) {
> + error_setg(errp, "Total shared memory required overflow");
> + return;
> + }
> + cache_size = cache_size + vdev->shmem_list[i].size;
> + }
> + if (cache_size) {
> + memory_region_init(&dev->cachebar, OBJECT(vpci_dev),
> + "vhost-device-pci-cachebar", cache_size);
> + for (i = 0; i < vdev->n_shmem_regions; i++) {
> + memory_region_add_subregion(&dev->cachebar, offset,
> + &vdev->shmem_list[i]);
> + virtio_pci_add_shm_cap(vpci_dev, VIRTIO_DEVICE_PCI_CACHE_BAR,
> + offset, vdev->shmem_list[i].size, i);
> + offset = offset + vdev->shmem_list[i].size;
> + }
> + pci_register_bar(&vpci_dev->pci_dev, VIRTIO_DEVICE_PCI_CACHE_BAR,
> + PCI_BASE_ADDRESS_SPACE_MEMORY |
> + PCI_BASE_ADDRESS_MEM_PREFETCH |
> + PCI_BASE_ADDRESS_MEM_TYPE_64,
> + &dev->cachebar);
> + }
> }
>
> static void vhost_user_device_pci_class_init(ObjectClass *klass, void *data)
> --
> 2.45.2
>
* Re: [RFC PATCH v2 3/5] vhost-user-dev: Add cache BAR
2024-07-11 8:25 ` Stefan Hajnoczi
@ 2024-09-04 11:20 ` Albert Esteve
0 siblings, 0 replies; 36+ messages in thread
From: Albert Esteve @ 2024-09-04 11:20 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Michael S. Tsirkin, qemu-devel, jasowang, david, slp,
Alex Bennée
On Thu, Jul 11, 2024 at 10:25 AM Stefan Hajnoczi <stefanha@redhat.com>
wrote:
> On Fri, Jun 28, 2024 at 04:57:08PM +0200, Albert Esteve wrote:
> > Add a cache BAR in the vhost-user-device
> > into which files can be directly mapped.
> >
> > The number, shmid, and size of the VIRTIO Shared
> > Memory subregions is retrieved through a get_shmem_config
> > message sent by the vhost-user-base module
> > on the realize step, after virtio_init().
> >
> > By default, if VHOST_USER_PROTOCOL_F_SHMEM
> > feature is not supported by the backend,
> > there is no cache.
> >
> > Signed-off-by: Albert Esteve <aesteve@redhat.com>
>
> Michael: Please review vhost_user_device_pci_realize() below regarding
> virtio-pci BAR layout. Thanks!
>
> > ---
> > hw/virtio/vhost-user-base.c | 39 +++++++++++++++++++++++++++++--
> > hw/virtio/vhost-user-device-pci.c | 37 ++++++++++++++++++++++++++---
> > 2 files changed, 71 insertions(+), 5 deletions(-)
> >
> > diff --git a/hw/virtio/vhost-user-base.c b/hw/virtio/vhost-user-base.c
> > index a83167191e..e47c568a55 100644
> > --- a/hw/virtio/vhost-user-base.c
> > +++ b/hw/virtio/vhost-user-base.c
> > @@ -268,7 +268,9 @@ static void vub_device_realize(DeviceState *dev,
> Error **errp)
> > {
> > VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> > VHostUserBase *vub = VHOST_USER_BASE(dev);
> > - int ret;
> > + uint64_t memory_sizes[8];
> > + void *cache_ptr;
> > + int i, ret, nregions;
> >
> > if (!vub->chardev.chr) {
> > error_setg(errp, "vhost-user-base: missing chardev");
> > @@ -311,7 +313,7 @@ static void vub_device_realize(DeviceState *dev,
> Error **errp)
> >
> > /* Allocate queues */
> > vub->vqs = g_ptr_array_sized_new(vub->num_vqs);
> > - for (int i = 0; i < vub->num_vqs; i++) {
> > + for (i = 0; i < vub->num_vqs; i++) {
> > g_ptr_array_add(vub->vqs,
> > virtio_add_queue(vdev, vub->vq_size,
> > vub_handle_output));
> > @@ -328,6 +330,39 @@ static void vub_device_realize(DeviceState *dev,
> Error **errp)
> > do_vhost_user_cleanup(vdev, vub);
> > }
> >
> > + ret =
> vub->vhost_dev.vhost_ops->vhost_get_shmem_config(&vub->vhost_dev,
> > + &nregions,
> > + memory_sizes,
> > + errp);
> > +
> > + if (ret < 0) {
> > + do_vhost_user_cleanup(vdev, vub);
> > + }
> > +
> > + for (i = 0; i < nregions; i++) {
> > + if (memory_sizes[i]) {
> > + if (!is_power_of_2(memory_sizes[i]) ||
> > + memory_sizes[i] < qemu_real_host_page_size()) {
>
> Or just if (memory_sizes[i] % qemu_real_host_page_size() != 0)?
I like both options. The original is more explicit; your proposal is more
concise.
I will change it.
>
> > + error_setg(errp, "Shared memory %d size must be a power
> of 2 "
> > + "no smaller than the page size", i);
> > + return;
> > + }
> > +
> > + cache_ptr = mmap(NULL, memory_sizes[i], PROT_READ,
>
> Should this be PROT_NONE like in
> vhost_user_backend_handle_shmem_unmap()?
>
Since this is supposed to be blank memory, I think you may be
right. But I am not completely certain. I'll change it and check if
everything works as expected on my side.
>
> > + MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
> > + if (cache_ptr == MAP_FAILED) {
> > + error_setg(errp, "Unable to mmap blank cache: %s",
> > + strerror(errno));
>
> error_setg_errno() can be used here.
>
> > + return;
> > + }
> > +
> > + virtio_new_shmem_region(vdev);
> > + memory_region_init_ram_ptr(&vdev->shmem_list[i],
> > + OBJECT(vdev), "vub-shm-" + i,
> > + memory_sizes[i], cache_ptr);
> > + }
> > + }
> > +
> > qemu_chr_fe_set_handlers(&vub->chardev, NULL, NULL, vub_event, NULL,
> > dev, NULL, true);
> > }
> > diff --git a/hw/virtio/vhost-user-device-pci.c
> b/hw/virtio/vhost-user-device-pci.c
> > index efaf55d3dd..314bacfb7a 100644
> > --- a/hw/virtio/vhost-user-device-pci.c
> > +++ b/hw/virtio/vhost-user-device-pci.c
> > @@ -8,14 +8,18 @@
> > */
> >
> > #include "qemu/osdep.h"
> > +#include "qapi/error.h"
> > #include "hw/qdev-properties.h"
> > #include "hw/virtio/vhost-user-base.h"
> > #include "hw/virtio/virtio-pci.h"
> >
> > +#define VIRTIO_DEVICE_PCI_CACHE_BAR 2
> > +
> > struct VHostUserDevicePCI {
> > VirtIOPCIProxy parent_obj;
> >
> > VHostUserBase vub;
> > + MemoryRegion cachebar;
> > };
> >
> > #define TYPE_VHOST_USER_DEVICE_PCI "vhost-user-device-pci-base"
> > @@ -25,10 +29,37 @@ OBJECT_DECLARE_SIMPLE_TYPE(VHostUserDevicePCI,
> VHOST_USER_DEVICE_PCI)
> > static void vhost_user_device_pci_realize(VirtIOPCIProxy *vpci_dev,
> Error **errp)
> > {
> > VHostUserDevicePCI *dev = VHOST_USER_DEVICE_PCI(vpci_dev);
> > - DeviceState *vdev = DEVICE(&dev->vub);
> > -
> > + DeviceState *dev_state = DEVICE(&dev->vub);
> > + VirtIODevice *vdev = VIRTIO_DEVICE(dev_state);
> > + uint64_t offset = 0, cache_size = 0;
> > + int i;
> > +
> > vpci_dev->nvectors = 1;
> > - qdev_realize(vdev, BUS(&vpci_dev->bus), errp);
> > + qdev_realize(dev_state, BUS(&vpci_dev->bus), errp);
> > +
> > + for (i = 0; i < vdev->n_shmem_regions; i++) {
> > + if (vdev->shmem_list[i].size > UINT64_MAX - cache_size) {
> > + error_setg(errp, "Total shared memory required overflow");
> > + return;
> > + }
> > + cache_size = cache_size + vdev->shmem_list[i].size;
> > + }
> > + if (cache_size) {
> > + memory_region_init(&dev->cachebar, OBJECT(vpci_dev),
> > + "vhost-device-pci-cachebar", cache_size);
> > + for (i = 0; i < vdev->n_shmem_regions; i++) {
> > + memory_region_add_subregion(&dev->cachebar, offset,
> > + &vdev->shmem_list[i]);
> > + virtio_pci_add_shm_cap(vpci_dev,
> VIRTIO_DEVICE_PCI_CACHE_BAR,
> > + offset, vdev->shmem_list[i].size, i);
> > + offset = offset + vdev->shmem_list[i].size;
> > + }
> > + pci_register_bar(&vpci_dev->pci_dev,
> VIRTIO_DEVICE_PCI_CACHE_BAR,
> > + PCI_BASE_ADDRESS_SPACE_MEMORY |
> > + PCI_BASE_ADDRESS_MEM_PREFETCH |
> > + PCI_BASE_ADDRESS_MEM_TYPE_64,
> > + &dev->cachebar);
> > + }
> > }
> >
> > static void vhost_user_device_pci_class_init(ObjectClass *klass, void
> *data)
> > --
> > 2.45.2
> >
>
* [RFC PATCH v2 4/5] vhost_user: Add MEM_READ/WRITE backend requests
2024-06-28 14:57 [RFC PATCH v2 0/5] vhost-user: Add SHMEM_MAP/UNMAP requests Albert Esteve
` (2 preceding siblings ...)
2024-06-28 14:57 ` [RFC PATCH v2 3/5] vhost-user-dev: Add cache BAR Albert Esteve
@ 2024-06-28 14:57 ` Albert Esteve
2024-07-11 8:53 ` Stefan Hajnoczi
2024-06-28 14:57 ` [RFC PATCH v2 5/5] vhost_user: Implement mem_read/mem_write handlers Albert Esteve
` (2 subsequent siblings)
6 siblings, 1 reply; 36+ messages in thread
From: Albert Esteve @ 2024-06-28 14:57 UTC (permalink / raw)
To: qemu-devel
Cc: jasowang, david, slp, Alex Bennée, stefanha,
Michael S. Tsirkin, Albert Esteve
With SHMEM_MAP messages, sharing descriptors between
devices will cause these devices to not see the
mappings, and to fail to access these memory regions.
To solve this, introduce MEM_READ/WRITE requests
that will get triggered as a fallback when
vhost-user memory translation fails.
Signed-off-by: Albert Esteve <aesteve@redhat.com>
---
hw/virtio/vhost-user.c | 31 +++++++++
subprojects/libvhost-user/libvhost-user.c | 84 +++++++++++++++++++++++
subprojects/libvhost-user/libvhost-user.h | 38 ++++++++++
3 files changed, 153 insertions(+)
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 57406dc8b4..18cacb2d68 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -118,6 +118,8 @@ typedef enum VhostUserBackendRequest {
VHOST_USER_BACKEND_SHARED_OBJECT_LOOKUP = 8,
VHOST_USER_BACKEND_SHMEM_MAP = 9,
VHOST_USER_BACKEND_SHMEM_UNMAP = 10,
+ VHOST_USER_BACKEND_MEM_READ = 11,
+ VHOST_USER_BACKEND_MEM_WRITE = 12,
VHOST_USER_BACKEND_MAX
} VhostUserBackendRequest;
@@ -145,6 +147,12 @@ typedef struct VhostUserShMemConfig {
uint64_t memory_sizes[VHOST_MEMORY_BASELINE_NREGIONS];
} VhostUserShMemConfig;
+typedef struct VhostUserMemRWMsg {
+ uint64_t guest_address;
+ uint32_t size;
+ uint8_t data[];
+} VhostUserMemRWMsg;
+
typedef struct VhostUserLog {
uint64_t mmap_size;
uint64_t mmap_offset;
@@ -253,6 +261,7 @@ typedef union {
VhostUserTransferDeviceState transfer_state;
VhostUserMMap mmap;
VhostUserShMemConfig shmem;
+ VhostUserMemRWMsg mem_rw;
} VhostUserPayload;
typedef struct VhostUserMsg {
@@ -1871,6 +1880,22 @@ vhost_user_backend_handle_shmem_unmap(struct vhost_dev *dev,
return 0;
}
+static int
+vhost_user_backend_handle_mem_read(struct vhost_dev *dev,
+ VhostUserMemRWMsg *mem_rw)
+{
+ /* TODO */
+ return -EPERM;
+}
+
+static int
+vhost_user_backend_handle_mem_write(struct vhost_dev *dev,
+ VhostUserMemRWMsg *mem_rw)
+{
+ /* TODO */
+ return -EPERM;
+}
+
static void close_backend_channel(struct vhost_user *u)
{
g_source_destroy(u->backend_src);
@@ -1946,6 +1971,12 @@ static gboolean backend_read(QIOChannel *ioc, GIOCondition condition,
case VHOST_USER_BACKEND_SHMEM_UNMAP:
ret = vhost_user_backend_handle_shmem_unmap(dev, &payload.mmap);
break;
+ case VHOST_USER_BACKEND_MEM_READ:
+ ret = vhost_user_backend_handle_mem_read(dev, &payload.mem_rw);
+ break;
+ case VHOST_USER_BACKEND_MEM_WRITE:
+ ret = vhost_user_backend_handle_mem_write(dev, &payload.mem_rw);
+ break;
default:
error_report("Received unexpected msg type: %d.", hdr.request);
ret = -EINVAL;
diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
index 28556d183a..b5184064b5 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -1651,6 +1651,90 @@ vu_shmem_unmap(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
return vu_process_message_reply(dev, &vmsg);
}
+bool
+vu_send_mem_read(VuDev *dev, uint64_t guest_addr, uint32_t size,
+ uint8_t *data)
+{
+ VhostUserMsg msg_reply;
+ VhostUserMsg msg = {
+ .request = VHOST_USER_BACKEND_MEM_READ,
+ .size = sizeof(msg.payload.mem_rw),
+ .flags = VHOST_USER_VERSION | VHOST_USER_NEED_REPLY_MASK,
+ .payload = {
+ .mem_rw = {
+ .guest_address = guest_addr,
+ .size = size,
+ }
+ }
+ };
+
+ pthread_mutex_lock(&dev->backend_mutex);
+ if (!vu_message_write(dev, dev->backend_fd, &msg)) {
+ goto out_err;
+ }
+
+ if (!vu_message_read_default(dev, dev->backend_fd, &msg_reply)) {
+ goto out_err;
+ }
+
+ if (msg_reply.request != msg.request) {
+ DPRINT("Received unexpected msg type. Expected %d, received %d",
+ msg.request, msg_reply.request);
+ goto out_err;
+ }
+
+ if (msg_reply.payload.mem_rw.size != size) {
+ DPRINT("Received unexpected number of bytes in the response. "
+ "Expected %d, received %d",
+ size, msg_reply.payload.mem_rw.size);
+ goto out_err;
+ }
+
+ data = malloc(msg_reply.payload.mem_rw.size);
+ if (!data) {
+ DPRINT("Failed to malloc read memory data");
+ goto out_err;
+ }
+
+ memcpy(data, msg_reply.payload.mem_rw.data, size);
+ pthread_mutex_unlock(&dev->backend_mutex);
+ return true;
+
+out_err:
+ pthread_mutex_unlock(&dev->backend_mutex);
+ return false;
+}
+
+bool
+vu_send_mem_write(VuDev *dev, uint64_t guest_addr, uint32_t size,
+ uint8_t *data)
+{
+ VhostUserMsg msg = {
+ .request = VHOST_USER_BACKEND_MEM_WRITE,
+ .size = sizeof(msg.payload.mem_rw),
+ .flags = VHOST_USER_VERSION,
+ .payload = {
+ .mem_rw = {
+ .guest_address = guest_addr,
+ .size = size,
+ }
+ }
+ };
+ memcpy(msg.payload.mem_rw.data, data, size);
+
+ if (vu_has_protocol_feature(dev, VHOST_USER_PROTOCOL_F_REPLY_ACK)) {
+ msg.flags |= VHOST_USER_NEED_REPLY_MASK;
+ }
+
+ if (!vu_message_write(dev, dev->backend_fd, &msg)) {
+ pthread_mutex_unlock(&dev->backend_mutex);
+ return false;
+ }
+
+ /* Also unlocks the backend_mutex */
+ return vu_process_message_reply(dev, &msg);
+}
+
static bool
vu_set_vring_call_exec(VuDev *dev, VhostUserMsg *vmsg)
{
diff --git a/subprojects/libvhost-user/libvhost-user.h b/subprojects/libvhost-user/libvhost-user.h
index 7f6c22cc1a..8ef794870d 100644
--- a/subprojects/libvhost-user/libvhost-user.h
+++ b/subprojects/libvhost-user/libvhost-user.h
@@ -129,6 +129,8 @@ typedef enum VhostUserBackendRequest {
VHOST_USER_BACKEND_SHARED_OBJECT_LOOKUP = 8,
VHOST_USER_BACKEND_SHMEM_MAP = 9,
VHOST_USER_BACKEND_SHMEM_UNMAP = 10,
+ VHOST_USER_BACKEND_MEM_READ = 11,
+ VHOST_USER_BACKEND_MEM_WRITE = 12,
VHOST_USER_BACKEND_MAX
} VhostUserBackendRequest;
@@ -152,6 +154,12 @@ typedef struct VhostUserMemRegMsg {
VhostUserMemoryRegion region;
} VhostUserMemRegMsg;
+typedef struct VhostUserMemRWMsg {
+ uint64_t guest_address;
+ uint32_t size;
+ uint8_t data[];
+} VhostUserMemRWMsg;
+
typedef struct VhostUserLog {
uint64_t mmap_size;
uint64_t mmap_offset;
@@ -235,6 +243,7 @@ typedef struct VhostUserMsg {
VhostUserInflight inflight;
VhostUserShared object;
VhostUserMMap mmap;
+ VhostUserMemRWMsg mem_rw;
} payload;
int fds[VHOST_MEMORY_BASELINE_NREGIONS];
@@ -650,6 +659,35 @@ bool vu_shmem_map(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
bool vu_shmem_unmap(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
uint64_t shm_offset, uint64_t len);
+/**
+ * vu_send_mem_read:
+ * @dev: a VuDev context
+ * @guest_addr: guest physical address to read
+ * @size: number of bytes to read
+ * @data: head of an unitialized bytes array
+ *
+ * Reads `size` bytes of `guest_addr` in the frontend and stores
+ * them in `data`.
+ *
+ * Returns: TRUE on success, FALSE on failure.
+ */
+bool vu_send_mem_read(VuDev *dev, uint64_t guest_addr, uint32_t size,
+ uint8_t *data);
+
+/**
+ * vu_send_mem_write:
+ * @dev: a VuDev context
+ * @guest_addr: guest physical address to write
+ * @size: number of bytes to write
+ * @data: head of an array with `size` bytes to write
+ *
+ * Writes `size` bytes from `data` into `guest_addr` in the frontend.
+ *
+ * Returns: TRUE on success, FALSE on failure.
+ */
+bool vu_send_mem_write(VuDev *dev, uint64_t guest_addr, uint32_t size,
+ uint8_t *data);
+
/**
* vu_queue_set_notification:
* @dev: a VuDev context
--
2.45.2
* Re: [RFC PATCH v2 4/5] vhost_user: Add MEM_READ/WRITE backend requests
2024-06-28 14:57 ` [RFC PATCH v2 4/5] vhost_user: Add MEM_READ/WRITE backend requests Albert Esteve
@ 2024-07-11 8:53 ` Stefan Hajnoczi
0 siblings, 0 replies; 36+ messages in thread
From: Stefan Hajnoczi @ 2024-07-11 8:53 UTC (permalink / raw)
To: Albert Esteve
Cc: qemu-devel, jasowang, david, slp, Alex Bennée,
Michael S. Tsirkin
On Fri, Jun 28, 2024 at 04:57:09PM +0200, Albert Esteve wrote:
> With SHMEM_MAP messages, sharing descriptors between
> devices will cause that these devices do not see the
> mappings, and fail to access these memory regions.
>
> To solve this, introduce MEM_READ/WRITE requests
> that will get triggered as a fallback when
> vhost-user memory translation fails.
>
> Signed-off-by: Albert Esteve <aesteve@redhat.com>
> ---
> hw/virtio/vhost-user.c | 31 +++++++++
> subprojects/libvhost-user/libvhost-user.c | 84 +++++++++++++++++++++++
> subprojects/libvhost-user/libvhost-user.h | 38 ++++++++++
> 3 files changed, 153 insertions(+)
>
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index 57406dc8b4..18cacb2d68 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -118,6 +118,8 @@ typedef enum VhostUserBackendRequest {
> VHOST_USER_BACKEND_SHARED_OBJECT_LOOKUP = 8,
> VHOST_USER_BACKEND_SHMEM_MAP = 9,
> VHOST_USER_BACKEND_SHMEM_UNMAP = 10,
> + VHOST_USER_BACKEND_MEM_READ = 11,
> + VHOST_USER_BACKEND_MEM_WRITE = 12,
> VHOST_USER_BACKEND_MAX
> } VhostUserBackendRequest;
>
> @@ -145,6 +147,12 @@ typedef struct VhostUserShMemConfig {
> uint64_t memory_sizes[VHOST_MEMORY_BASELINE_NREGIONS];
> } VhostUserShMemConfig;
>
> +typedef struct VhostUserMemRWMsg {
> + uint64_t guest_address;
> + uint32_t size;
> + uint8_t data[];
I don't think flexible array members work in VhostUserMsg payload
structs in its current form. It would be necessary to move the
VhostUserMsg.payload field to the end of the VhostUserMsg and then
heap-allocate VhostUserMsg with the additional size required for
VhostUserMemRWMsg.data[].
Right now this patch is calling memcpy() on memory beyond
VhostUserMsg.payload because the VhostUserMsg struct does not have size
bytes of extra space and the payload field is in the middle of the
struct where flexible array members cannot be used.
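A sketch of the allocation that would then be needed when building a
message that carries data[] (guest_addr/data/size stand in for whatever
the caller provides, and this assumes the payload field has been moved
to the end of the struct):

```
    VhostUserMsg *msg = g_malloc0(sizeof(*msg) + size);

    msg->hdr.request = VHOST_USER_BACKEND_MEM_WRITE;
    msg->hdr.flags = VHOST_USER_VERSION;
    msg->hdr.size = sizeof(msg->payload.mem_rw) + size;
    msg->payload.mem_rw.guest_address = guest_addr;
    msg->payload.mem_rw.size = size;
    memcpy(msg->payload.mem_rw.data, data, size);
    /* ...send msg... */
    g_free(msg);
```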
> +} VhostUserMemRWMsg;
> +
> typedef struct VhostUserLog {
> uint64_t mmap_size;
> uint64_t mmap_offset;
> @@ -253,6 +261,7 @@ typedef union {
> VhostUserTransferDeviceState transfer_state;
> VhostUserMMap mmap;
> VhostUserShMemConfig shmem;
> + VhostUserMemRWMsg mem_rw;
> } VhostUserPayload;
>
> typedef struct VhostUserMsg {
> @@ -1871,6 +1880,22 @@ vhost_user_backend_handle_shmem_unmap(struct vhost_dev *dev,
> return 0;
> }
>
> +static int
> +vhost_user_backend_handle_mem_read(struct vhost_dev *dev,
> + VhostUserMemRWMsg *mem_rw)
> +{
> + /* TODO */
> + return -EPERM;
> +}
> +
> +static int
> +vhost_user_backend_handle_mem_write(struct vhost_dev *dev,
> + VhostUserMemRWMsg *mem_rw)
> +{
> + /* TODO */
> + return -EPERM;
> +}
Reading/writing guest memory can be done via
address_space_read/write(vdev->dma_as, ...).
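Roughly (sketch):

```
static int
vhost_user_backend_handle_mem_read(struct vhost_dev *dev,
                                   VhostUserMemRWMsg *mem_rw)
{
    MemTxResult result;

    result = address_space_read(dev->vdev->dma_as, mem_rw->guest_address,
                                MEMTXATTRS_UNSPECIFIED, mem_rw->data,
                                mem_rw->size);
    return result == MEMTX_OK ? 0 : -EFAULT;
}
```

and the write handler is the same with address_space_write().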
> +
> static void close_backend_channel(struct vhost_user *u)
> {
> g_source_destroy(u->backend_src);
> @@ -1946,6 +1971,12 @@ static gboolean backend_read(QIOChannel *ioc, GIOCondition condition,
> case VHOST_USER_BACKEND_SHMEM_UNMAP:
> ret = vhost_user_backend_handle_shmem_unmap(dev, &payload.mmap);
> break;
> + case VHOST_USER_BACKEND_MEM_READ:
> + ret = vhost_user_backend_handle_mem_read(dev, &payload.mem_rw);
> + break;
> + case VHOST_USER_BACKEND_MEM_WRITE:
> + ret = vhost_user_backend_handle_mem_write(dev, &payload.mem_rw);
> + break;
> default:
> error_report("Received unexpected msg type: %d.", hdr.request);
> ret = -EINVAL;
> diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
> index 28556d183a..b5184064b5 100644
> --- a/subprojects/libvhost-user/libvhost-user.c
> +++ b/subprojects/libvhost-user/libvhost-user.c
> @@ -1651,6 +1651,90 @@ vu_shmem_unmap(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
> return vu_process_message_reply(dev, &vmsg);
> }
>
> +bool
> +vu_send_mem_read(VuDev *dev, uint64_t guest_addr, uint32_t size,
> + uint8_t *data)
> +{
> + VhostUserMsg msg_reply;
> + VhostUserMsg msg = {
> + .request = VHOST_USER_BACKEND_MEM_READ,
> + .size = sizeof(msg.payload.mem_rw),
> + .flags = VHOST_USER_VERSION | VHOST_USER_NEED_REPLY_MASK,
> + .payload = {
> + .mem_rw = {
> + .guest_address = guest_addr,
> + .size = size,
> + }
> + }
> + };
> +
> + pthread_mutex_lock(&dev->backend_mutex);
> + if (!vu_message_write(dev, dev->backend_fd, &msg)) {
> + goto out_err;
> + }
> +
> + if (!vu_message_read_default(dev, dev->backend_fd, &msg_reply)) {
> + goto out_err;
> + }
> +
> + if (msg_reply.request != msg.request) {
> + DPRINT("Received unexpected msg type. Expected %d, received %d",
> + msg.request, msg_reply.request);
> + goto out_err;
> + }
> +
> + if (msg_reply.payload.mem_rw.size != size) {
> + DPRINT("Received unexpected number of bytes in the response. "
> + "Expected %d, received %d",
> + size, msg_reply.payload.mem_rw.size);
> + goto out_err;
> + }
> +
> + data = malloc(msg_reply.payload.mem_rw.size);
The caller passed in size and data so the caller has provided the
buffer. malloc() is not necessary here.
> + if (!data) {
> + DPRINT("Failed to malloc read memory data");
> + goto out_err;
> + }
> +
> + memcpy(data, msg_reply.payload.mem_rw.data, size);
It should be possible to avoid memcpy() here by receiving directly into
the caller's buffer. If you don't want to look into this, please leave a
TODO comment.
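i.e. the tail of the function could just be (sketch):

```
    /* TODO: receive the reply payload directly into the caller's buffer
     * instead of going through msg_reply and this memcpy(). */
    memcpy(data, msg_reply.payload.mem_rw.data, size);

    pthread_mutex_unlock(&dev->backend_mutex);
    return true;
```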
> + pthread_mutex_unlock(&dev->backend_mutex);
> + return true;
> +
> +out_err:
> + pthread_mutex_unlock(&dev->backend_mutex);
> + return false;
> +}
> +
> +bool
> +vu_send_mem_write(VuDev *dev, uint64_t guest_addr, uint32_t size,
> + uint8_t *data)
> +{
> + VhostUserMsg msg = {
> + .request = VHOST_USER_BACKEND_MEM_WRITE,
> + .size = sizeof(msg.payload.mem_rw),
> + .flags = VHOST_USER_VERSION,
> + .payload = {
> + .mem_rw = {
> + .guest_address = guest_addr,
> + .size = size,
> + }
> + }
> + };
> + memcpy(msg.payload.mem_rw.data, data, size);
This memcpy() can be eliminated too. It's worth a code comment in case
someone looks at optimizing this in the future.
> +
> + if (vu_has_protocol_feature(dev, VHOST_USER_PROTOCOL_F_REPLY_ACK)) {
> + msg.flags |= VHOST_USER_NEED_REPLY_MASK;
> + }
> +
> + if (!vu_message_write(dev, dev->backend_fd, &msg)) {
> + pthread_mutex_unlock(&dev->backend_mutex);
> + return false;
> + }
> +
> + /* Also unlocks the backend_mutex */
> + return vu_process_message_reply(dev, &msg);
> +}
> +
> static bool
> vu_set_vring_call_exec(VuDev *dev, VhostUserMsg *vmsg)
> {
> diff --git a/subprojects/libvhost-user/libvhost-user.h b/subprojects/libvhost-user/libvhost-user.h
> index 7f6c22cc1a..8ef794870d 100644
> --- a/subprojects/libvhost-user/libvhost-user.h
> +++ b/subprojects/libvhost-user/libvhost-user.h
> @@ -129,6 +129,8 @@ typedef enum VhostUserBackendRequest {
> VHOST_USER_BACKEND_SHARED_OBJECT_LOOKUP = 8,
> VHOST_USER_BACKEND_SHMEM_MAP = 9,
> VHOST_USER_BACKEND_SHMEM_UNMAP = 10,
> + VHOST_USER_BACKEND_MEM_READ = 11,
> + VHOST_USER_BACKEND_MEM_WRITE = 12,
> VHOST_USER_BACKEND_MAX
> } VhostUserBackendRequest;
>
> @@ -152,6 +154,12 @@ typedef struct VhostUserMemRegMsg {
> VhostUserMemoryRegion region;
> } VhostUserMemRegMsg;
>
> +typedef struct VhostUserMemRWMsg {
> + uint64_t guest_address;
> + uint32_t size;
> + uint8_t data[];
> +} VhostUserMemRWMsg;
> +
> typedef struct VhostUserLog {
> uint64_t mmap_size;
> uint64_t mmap_offset;
> @@ -235,6 +243,7 @@ typedef struct VhostUserMsg {
> VhostUserInflight inflight;
> VhostUserShared object;
> VhostUserMMap mmap;
> + VhostUserMemRWMsg mem_rw;
> } payload;
>
> int fds[VHOST_MEMORY_BASELINE_NREGIONS];
> @@ -650,6 +659,35 @@ bool vu_shmem_map(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
> bool vu_shmem_unmap(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
> uint64_t shm_offset, uint64_t len);
>
> +/**
> + * vu_send_mem_read:
> + * @dev: a VuDev context
> + * @guest_addr: guest physical address to read
> + * @size: number of bytes to read
> + * @data: head of an unitialized bytes array
> + *
> + * Reads `size` bytes of `guest_addr` in the frontend and stores
> + * them in `data`.
> + *
> + * Returns: TRUE on success, FALSE on failure.
> + */
> +bool vu_send_mem_read(VuDev *dev, uint64_t guest_addr, uint32_t size,
> + uint8_t *data);
> +
> +/**
> + * vu_send_mem_write:
> + * @dev: a VuDev context
> + * @guest_addr: guest physical address to write
> + * @size: number of bytes to write
> + * @data: head of an array with `size` bytes to write
> + *
> + * Writes `size` bytes from `data` into `guest_addr` in the frontend.
> + *
> + * Returns: TRUE on success, FALSE on failure.
> + */
> +bool vu_send_mem_write(VuDev *dev, uint64_t guest_addr, uint32_t size,
> + uint8_t *data);
> +
> /**
> * vu_queue_set_notification:
> * @dev: a VuDev context
> --
> 2.45.2
>
* [RFC PATCH v2 5/5] vhost_user: Implement mem_read/mem_write handlers
2024-06-28 14:57 [RFC PATCH v2 0/5] vhost-user: Add SHMEM_MAP/UNMAP requests Albert Esteve
` (3 preceding siblings ...)
2024-06-28 14:57 ` [RFC PATCH v2 4/5] vhost_user: Add MEM_READ/WRITE backend requests Albert Esteve
@ 2024-06-28 14:57 ` Albert Esteve
2024-07-11 8:55 ` Stefan Hajnoczi
2024-07-11 9:01 ` [RFC PATCH v2 0/5] vhost-user: Add SHMEM_MAP/UNMAP requests Stefan Hajnoczi
2024-07-11 10:56 ` Alyssa Ross
6 siblings, 1 reply; 36+ messages in thread
From: Albert Esteve @ 2024-06-28 14:57 UTC (permalink / raw)
To: qemu-devel
Cc: jasowang, david, slp, Alex Bennée, stefanha,
Michael S. Tsirkin, Albert Esteve
Implement function handlers for memory read and write
operations.
Signed-off-by: Albert Esteve <aesteve@redhat.com>
---
hw/virtio/vhost-user.c | 34 ++++++++++++++++++++++++++++++----
1 file changed, 30 insertions(+), 4 deletions(-)
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 18cacb2d68..79becbc87b 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -1884,16 +1884,42 @@ static int
vhost_user_backend_handle_mem_read(struct vhost_dev *dev,
VhostUserMemRWMsg *mem_rw)
{
- /* TODO */
- return -EPERM;
+ ram_addr_t offset;
+ int fd;
+ MemoryRegion *mr;
+
+ mr = vhost_user_get_mr_data(mem_rw->guest_address, &offset, &fd);
+
+ if (!mr) {
+ error_report("Failed to get memory region with address %" PRIx64,
+ mem_rw->guest_address);
+ return -EFAULT;
+ }
+
+ memcpy(mem_rw->data, memory_region_get_ram_ptr(mr) + offset, mem_rw->size);
+
+ return 0;
}
static int
vhost_user_backend_handle_mem_write(struct vhost_dev *dev,
VhostUserMemRWMsg *mem_rw)
{
- /* TODO */
- return -EPERM;
+ ram_addr_t offset;
+ int fd;
+ MemoryRegion *mr;
+
+ mr = vhost_user_get_mr_data(mem_rw->guest_address, &offset, &fd);
+
+ if (!mr) {
+ error_report("Failed to get memory region with address %" PRIx64,
+ mem_rw->guest_address);
+ return -EFAULT;
+ }
+
+ memcpy(memory_region_get_ram_ptr(mr) + offset, mem_rw->data, mem_rw->size);
+
+ return 0;
}
static void close_backend_channel(struct vhost_user *u)
--
2.45.2
* Re: [RFC PATCH v2 5/5] vhost_user: Implement mem_read/mem_write handlers
2024-06-28 14:57 ` [RFC PATCH v2 5/5] vhost_user: Implement mem_read/mem_write handlers Albert Esteve
@ 2024-07-11 8:55 ` Stefan Hajnoczi
2024-09-04 13:01 ` Albert Esteve
0 siblings, 1 reply; 36+ messages in thread
From: Stefan Hajnoczi @ 2024-07-11 8:55 UTC (permalink / raw)
To: Albert Esteve
Cc: qemu-devel, jasowang, david, slp, Alex Bennée,
Michael S. Tsirkin
On Fri, Jun 28, 2024 at 04:57:10PM +0200, Albert Esteve wrote:
> Implement function handlers for memory read and write
> operations.
>
> Signed-off-by: Albert Esteve <aesteve@redhat.com>
> ---
> hw/virtio/vhost-user.c | 34 ++++++++++++++++++++++++++++++----
> 1 file changed, 30 insertions(+), 4 deletions(-)
>
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index 18cacb2d68..79becbc87b 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -1884,16 +1884,42 @@ static int
> vhost_user_backend_handle_mem_read(struct vhost_dev *dev,
> VhostUserMemRWMsg *mem_rw)
> {
> - /* TODO */
> - return -EPERM;
> + ram_addr_t offset;
> + int fd;
> + MemoryRegion *mr;
> +
> + mr = vhost_user_get_mr_data(mem_rw->guest_address, &offset, &fd);
> +
> + if (!mr) {
> + error_report("Failed to get memory region with address %" PRIx64,
> + mem_rw->guest_address);
> + return -EFAULT;
> + }
> +
> + memcpy(mem_rw->data, memory_region_get_ram_ptr(mr) + offset, mem_rw->size);
Don't try to write this from scratch. Use address_space_read/write(). It
supports corner cases like crossing MemoryRegions.
> +
> + return 0;
> }
>
> static int
> vhost_user_backend_handle_mem_write(struct vhost_dev *dev,
> VhostUserMemRWMsg *mem_rw)
> {
> - /* TODO */
> - return -EPERM;
> + ram_addr_t offset;
> + int fd;
> + MemoryRegion *mr;
> +
> + mr = vhost_user_get_mr_data(mem_rw->guest_address, &offset, &fd);
> +
> + if (!mr) {
> + error_report("Failed to get memory region with address %" PRIx64,
> + mem_rw->guest_address);
> + return -EFAULT;
> + }
> +
> + memcpy(memory_region_get_ram_ptr(mr) + offset, mem_rw->data, mem_rw->size);
> +
> + return 0;
> }
>
> static void close_backend_channel(struct vhost_user *u)
> --
> 2.45.2
>
* Re: [RFC PATCH v2 5/5] vhost_user: Implement mem_read/mem_write handlers
2024-07-11 8:55 ` Stefan Hajnoczi
@ 2024-09-04 13:01 ` Albert Esteve
2024-09-05 19:18 ` Stefan Hajnoczi
0 siblings, 1 reply; 36+ messages in thread
From: Albert Esteve @ 2024-09-04 13:01 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, jasowang, david, slp, Alex Bennée,
Michael S. Tsirkin
On Thu, Jul 11, 2024 at 10:55 AM Stefan Hajnoczi <stefanha@redhat.com>
wrote:
> On Fri, Jun 28, 2024 at 04:57:10PM +0200, Albert Esteve wrote:
> > Implement function handlers for memory read and write
> > operations.
> >
> > Signed-off-by: Albert Esteve <aesteve@redhat.com>
> > ---
> > hw/virtio/vhost-user.c | 34 ++++++++++++++++++++++++++++++----
> > 1 file changed, 30 insertions(+), 4 deletions(-)
> >
> > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > index 18cacb2d68..79becbc87b 100644
> > --- a/hw/virtio/vhost-user.c
> > +++ b/hw/virtio/vhost-user.c
> > @@ -1884,16 +1884,42 @@ static int
> > vhost_user_backend_handle_mem_read(struct vhost_dev *dev,
> > VhostUserMemRWMsg *mem_rw)
> > {
> > - /* TODO */
> > - return -EPERM;
> > + ram_addr_t offset;
> > + int fd;
> > + MemoryRegion *mr;
> > +
> > + mr = vhost_user_get_mr_data(mem_rw->guest_address, &offset, &fd);
> > +
> > + if (!mr) {
> > + error_report("Failed to get memory region with address %"
> PRIx64,
> > + mem_rw->guest_address);
> > + return -EFAULT;
> > + }
> > +
> > + memcpy(mem_rw->data, memory_region_get_ram_ptr(mr) + offset,
> mem_rw->size);
>
> Don't try to write this from scratch. Use address_space_read/write(). It
> supports corner cases like crossing MemoryRegions.
>
I am having issues getting the address space from the vhost_dev struct
to pass as the first parameter to address_space_read/write(). But I found
mr->ops.
Would something like this perhaps be enough?
```
mr->ops->read_with_attrs(mr->opaque, mem_rw->guest_address,
&mem_rw->data, mem_rw->size,
MEMTXATTRS_UNSPECIFIED);
```
>
> > +
> > + return 0;
> > }
> >
> > static int
> > vhost_user_backend_handle_mem_write(struct vhost_dev *dev,
> > VhostUserMemRWMsg *mem_rw)
> > {
> > - /* TODO */
> > - return -EPERM;
> > + ram_addr_t offset;
> > + int fd;
> > + MemoryRegion *mr;
> > +
> > + mr = vhost_user_get_mr_data(mem_rw->guest_address, &offset, &fd);
> > +
> > + if (!mr) {
> > + error_report("Failed to get memory region with address %"
> PRIx64,
> > + mem_rw->guest_address);
> > + return -EFAULT;
> > + }
> > +
> > + memcpy(memory_region_get_ram_ptr(mr) + offset, mem_rw->data,
> mem_rw->size);
> > +
> > + return 0;
> > }
> >
> > static void close_backend_channel(struct vhost_user *u)
> > --
> > 2.45.2
> >
>
* Re: [RFC PATCH v2 5/5] vhost_user: Implement mem_read/mem_write handlers
2024-09-04 13:01 ` Albert Esteve
@ 2024-09-05 19:18 ` Stefan Hajnoczi
2024-09-10 7:14 ` Albert Esteve
0 siblings, 1 reply; 36+ messages in thread
From: Stefan Hajnoczi @ 2024-09-05 19:18 UTC (permalink / raw)
To: Albert Esteve
Cc: qemu-devel, jasowang, david, slp, Alex Bennée,
Michael S. Tsirkin
On Wed, Sep 04, 2024 at 03:01:06PM +0200, Albert Esteve wrote:
> On Thu, Jul 11, 2024 at 10:55 AM Stefan Hajnoczi <stefanha@redhat.com>
> wrote:
>
> > On Fri, Jun 28, 2024 at 04:57:10PM +0200, Albert Esteve wrote:
> > > Implement function handlers for memory read and write
> > > operations.
> > >
> > > Signed-off-by: Albert Esteve <aesteve@redhat.com>
> > > ---
> > > hw/virtio/vhost-user.c | 34 ++++++++++++++++++++++++++++++----
> > > 1 file changed, 30 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > > index 18cacb2d68..79becbc87b 100644
> > > --- a/hw/virtio/vhost-user.c
> > > +++ b/hw/virtio/vhost-user.c
> > > @@ -1884,16 +1884,42 @@ static int
> > > vhost_user_backend_handle_mem_read(struct vhost_dev *dev,
> > > VhostUserMemRWMsg *mem_rw)
> > > {
> > > - /* TODO */
> > > - return -EPERM;
> > > + ram_addr_t offset;
> > > + int fd;
> > > + MemoryRegion *mr;
> > > +
> > > + mr = vhost_user_get_mr_data(mem_rw->guest_address, &offset, &fd);
> > > +
> > > + if (!mr) {
> > > + error_report("Failed to get memory region with address %"
> > PRIx64,
> > > + mem_rw->guest_address);
> > > + return -EFAULT;
> > > + }
> > > +
> > > + memcpy(mem_rw->data, memory_region_get_ram_ptr(mr) + offset,
> > mem_rw->size);
> >
> > Don't try to write this from scratch. Use address_space_read/write(). It
> > supports corner cases like crossing MemoryRegions.
> >
>
> I am having issues getting the address space from the vhost_dev struct to
> feed
> address_spave_read/write() function with the first parameter. But I found
> mr->ops.
> Would something like this perhaps be enough?
>
> ```
> mr->ops->read_with_attrs(mr->opaque, mem_rw->guest_address,
> &mem_rw->data, mem_rw->size,
> MEMTXATTRS_UNSPECIFIED);
> ```
You can use dev->vdev->dma_as to get the AddressSpace for
address_space_read/write():
struct vhost_dev {
VirtIODevice *vdev;
struct VirtIODevice
{
...
AddressSpace *dma_as;
>
>
> >
> > > +
> > > + return 0;
> > > }
> > >
> > > static int
> > > vhost_user_backend_handle_mem_write(struct vhost_dev *dev,
> > > VhostUserMemRWMsg *mem_rw)
> > > {
> > > - /* TODO */
> > > - return -EPERM;
> > > + ram_addr_t offset;
> > > + int fd;
> > > + MemoryRegion *mr;
> > > +
> > > + mr = vhost_user_get_mr_data(mem_rw->guest_address, &offset, &fd);
> > > +
> > > + if (!mr) {
> > > + error_report("Failed to get memory region with address %"
> > PRIx64,
> > > + mem_rw->guest_address);
> > > + return -EFAULT;
> > > + }
> > > +
> > > + memcpy(memory_region_get_ram_ptr(mr) + offset, mem_rw->data,
> > mem_rw->size);
> > > +
> > > + return 0;
> > > }
> > >
> > > static void close_backend_channel(struct vhost_user *u)
> > > --
> > > 2.45.2
> > >
> >
* Re: [RFC PATCH v2 5/5] vhost_user: Implement mem_read/mem_write handlers
2024-09-05 19:18 ` Stefan Hajnoczi
@ 2024-09-10 7:14 ` Albert Esteve
0 siblings, 0 replies; 36+ messages in thread
From: Albert Esteve @ 2024-09-10 7:14 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, jasowang, david, slp, Alex Bennée,
Michael S. Tsirkin
On Thu, Sep 5, 2024 at 9:18 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> On Wed, Sep 04, 2024 at 03:01:06PM +0200, Albert Esteve wrote:
> > On Thu, Jul 11, 2024 at 10:55 AM Stefan Hajnoczi <stefanha@redhat.com>
> > wrote:
> >
> > > On Fri, Jun 28, 2024 at 04:57:10PM +0200, Albert Esteve wrote:
> > > > Implement function handlers for memory read and write
> > > > operations.
> > > >
> > > > Signed-off-by: Albert Esteve <aesteve@redhat.com>
> > > > ---
> > > > hw/virtio/vhost-user.c | 34 ++++++++++++++++++++++++++++++----
> > > > 1 file changed, 30 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > > > index 18cacb2d68..79becbc87b 100644
> > > > --- a/hw/virtio/vhost-user.c
> > > > +++ b/hw/virtio/vhost-user.c
> > > > @@ -1884,16 +1884,42 @@ static int
> > > > vhost_user_backend_handle_mem_read(struct vhost_dev *dev,
> > > > VhostUserMemRWMsg *mem_rw)
> > > > {
> > > > - /* TODO */
> > > > - return -EPERM;
> > > > + ram_addr_t offset;
> > > > + int fd;
> > > > + MemoryRegion *mr;
> > > > +
> > > > + mr = vhost_user_get_mr_data(mem_rw->guest_address, &offset,
> &fd);
> > > > +
> > > > + if (!mr) {
> > > > + error_report("Failed to get memory region with address %"
> > > PRIx64,
> > > > + mem_rw->guest_address);
> > > > + return -EFAULT;
> > > > + }
> > > > +
> > > > + memcpy(mem_rw->data, memory_region_get_ram_ptr(mr) + offset,
> > > mem_rw->size);
> > >
> > > Don't try to write this from scratch. Use address_space_read/write().
> It
> > > supports corner cases like crossing MemoryRegions.
> > >
> >
> > I am having issues getting the address space from the vhost_dev struct to
> > feed
> > address_spave_read/write() function with the first parameter. But I found
> > mr->ops.
> > Would something like this perhaps be enough?
> >
> > ```
> > mr->ops->read_with_attrs(mr->opaque, mem_rw->guest_address,
> > &mem_rw->data, mem_rw->size,
> > MEMTXATTRS_UNSPECIFIED);
> > ```
>
> You can use dev->vdev->dma_as to get the AddressSpace for
> address_space_read/write():
>
Oof, I see, thanks!
I still struggle a bit with the struct relationships...
>
> struct vhost_dev {
> VirtIODevice *vdev;
>
> struct VirtIODevice
> {
> ...
> AddressSpace *dma_as;
>
> >
> >
> > >
> > > > +
> > > > + return 0;
> > > > }
> > > >
> > > > static int
> > > > vhost_user_backend_handle_mem_write(struct vhost_dev *dev,
> > > > VhostUserMemRWMsg *mem_rw)
> > > > {
> > > > - /* TODO */
> > > > - return -EPERM;
> > > > + ram_addr_t offset;
> > > > + int fd;
> > > > + MemoryRegion *mr;
> > > > +
> > > > + mr = vhost_user_get_mr_data(mem_rw->guest_address, &offset,
> &fd);
> > > > +
> > > > + if (!mr) {
> > > > + error_report("Failed to get memory region with address %"
> > > PRIx64,
> > > > + mem_rw->guest_address);
> > > > + return -EFAULT;
> > > > + }
> > > > +
> > > > + memcpy(memory_region_get_ram_ptr(mr) + offset, mem_rw->data,
> > > mem_rw->size);
> > > > +
> > > > + return 0;
> > > > }
> > > >
> > > > static void close_backend_channel(struct vhost_user *u)
> > > > --
> > > > 2.45.2
> > > >
> > >
>
* Re: [RFC PATCH v2 0/5] vhost-user: Add SHMEM_MAP/UNMAP requests
2024-06-28 14:57 [RFC PATCH v2 0/5] vhost-user: Add SHMEM_MAP/UNMAP requests Albert Esteve
` (4 preceding siblings ...)
2024-06-28 14:57 ` [RFC PATCH v2 5/5] vhost_user: Implement mem_read/mem_write handlers Albert Esteve
@ 2024-07-11 9:01 ` Stefan Hajnoczi
2024-07-11 10:56 ` Alyssa Ross
6 siblings, 0 replies; 36+ messages in thread
From: Stefan Hajnoczi @ 2024-07-11 9:01 UTC (permalink / raw)
To: Albert Esteve
Cc: qemu-devel, jasowang, david, slp, Alex Bennée,
Michael S. Tsirkin
On Fri, Jun 28, 2024 at 04:57:05PM +0200, Albert Esteve wrote:
> Hi all,
>
> v1->v2:
> - Corrected typos and clarifications from
> first review
> - Added SHMEM_CONFIG frontend request to
> query VIRTIO shared memory regions from
> backends
> - vhost-user-device to use SHMEM_CONFIG
> to request and initialise regions
> - Added MEM_READ/WRITE backend requests
> in case address translation fails
> accessing VIRTIO Shared Memory Regions
> with MMAPs
Hi Albert,
I will be offline next week. I've posted comments.
I think the hard part will be adjusting vhost-user backend code to make
use of MEM_READ/MEM_WRITE when address translation fails. Ideally every
guest memory access (including vring accesses) should fall back to
MEM_READ/MEM_WRITE.
A good test for MEM_READ/MEM_WRITE is to completely skip setting the
memory table from the frontend and fall back for every guest memory
access. If the vhost-user backend still works without a memory table
then you know MEM_READ/MEM_WRITE is working too.
The vhost-user spec should probably contain a comment explaining that
MEM_READ/MEM_WRITE may be necessary when other device backends use
SHMEM_MAP due to the incomplete memory table that prevents translating
those memory addresses. In other words, if the guest has a device that
uses SHMEM_MAP, then all other vhost-user devices should support
MEM_READ/MEM_WRITE in order to ensure that DMA works with Shared Memory
Regions.
Stefan
>
> This is an update of my attempt to have
> backends support dynamic fd mapping into VIRTIO
> Shared Memory Regions. After the first review
> I have added more commits and new messages
> to the vhost-user protocol.
> However, I still have some doubts as to
> how this will work, especially regarding
> the MEM_READ and MEM_WRITE commands.
> Thus, I am still looking for feedback,
> to ensure that I am going in the right
> direction with the implementation.
>
> The usecase for this patch is, e.g., to support
> vhost-user-gpu RESOURCE_BLOB operations,
> or DAX Window request for virtio-fs. In
> general, any operation where a backend
> need to request the frontend to mmap an
> fd into a VIRTIO Shared Memory Region,
> so that the guest can then access it.
>
> After receiving the SHMEM_MAP/UNMAP request,
> the frontend will perform the mmap with the
> instructed parameters (i.e., shmid, shm_offset,
> fd_offset, fd, length).
>
> As there are already a couple devices
> that could benefit of such a feature,
> and more could require it in the future,
> the goal is to make the implementation
> generic.
>
> To that end, the VIRTIO Shared Memory
> Region list is declared in the `VirtIODevice`
> struct.
>
> This patch also includes:
> SHMEM_CONFIG frontend request that is
> specifically meant to allow generic
> vhost-user-device frontend to be able to
> query VIRTIO Shared Memory settings from the
> backend (as this device is generic and agnostic
> of the actual backend configuration).
>
> Finally, MEM_READ/WRITE backend requests are
> added to deal with a potential issue when having
> any backend sharing a descriptor that references
> a mapping to another backend. The first
> backend will not be able to see these
> mappings. So these requests are a fallback
> for vhost-user memory translation fails.
>
> Albert Esteve (5):
> vhost-user: Add VIRTIO Shared Memory map request
> vhost_user: Add frontend command for shmem config
> vhost-user-dev: Add cache BAR
> vhost_user: Add MEM_READ/WRITE backend requests
> vhost_user: Implement mem_read/mem_write handlers
>
> docs/interop/vhost-user.rst | 58 ++++++
> hw/virtio/vhost-user-base.c | 39 +++-
> hw/virtio/vhost-user-device-pci.c | 37 +++-
> hw/virtio/vhost-user.c | 221 ++++++++++++++++++++++
> hw/virtio/virtio.c | 12 ++
> include/hw/virtio/vhost-backend.h | 6 +
> include/hw/virtio/vhost-user.h | 1 +
> include/hw/virtio/virtio.h | 5 +
> subprojects/libvhost-user/libvhost-user.c | 149 +++++++++++++++
> subprojects/libvhost-user/libvhost-user.h | 91 +++++++++
> 10 files changed, 614 insertions(+), 5 deletions(-)
>
> --
> 2.45.2
>
* Re: [RFC PATCH v2 0/5] vhost-user: Add SHMEM_MAP/UNMAP requests
2024-06-28 14:57 [RFC PATCH v2 0/5] vhost-user: Add SHMEM_MAP/UNMAP requests Albert Esteve
` (5 preceding siblings ...)
2024-07-11 9:01 ` [RFC PATCH v2 0/5] vhost-user: Add SHMEM_MAP/UNMAP requests Stefan Hajnoczi
@ 2024-07-11 10:56 ` Alyssa Ross
2024-07-12 2:06 ` David Stevens
6 siblings, 1 reply; 36+ messages in thread
From: Alyssa Ross @ 2024-07-11 10:56 UTC (permalink / raw)
To: Albert Esteve, qemu-devel, David Stevens
Cc: jasowang, david, slp, Alex Bennée, stefanha,
Michael S. Tsirkin, Albert Esteve
Adding David Stevens, who implemented SHMEM_MAP and SHMEM_UNMAP in
crosvm a couple of years ago.
David, I'd be particularly interested for your thoughts on the MEM_READ
and MEM_WRITE commands, since as far as I know crosvm doesn't implement
anything like that. The discussion leading to those being added starts
here:
https://lore.kernel.org/qemu-devel/20240604185416.GB90471@fedora.redhat.com/
It would be great if this could be standardised between QEMU and crosvm
(and therefore have a clearer path toward being implemented in other VMMs)!
Albert Esteve <aesteve@redhat.com> writes:
> Hi all,
>
> v1->v2:
> - Corrected typos and clarifications from
> first review
> - Added SHMEM_CONFIG frontend request to
> query VIRTIO shared memory regions from
> backends
> - vhost-user-device to use SHMEM_CONFIG
> to request and initialise regions
> - Added MEM_READ/WRITE backend requests
> in case address translation fails
> accessing VIRTIO Shared Memory Regions
> with MMAPs
>
> This is an update of my attempt to have
> backends support dynamic fd mapping into VIRTIO
> Shared Memory Regions. After the first review
> I have added more commits and new messages
> to the vhost-user protocol.
> However, I still have some doubts as to
> how this will work, especially regarding
> the MEM_READ and MEM_WRITE commands.
> Thus, I am still looking for feedback,
> to ensure that I am going in the right
> direction with the implementation.
>
> The usecase for this patch is, e.g., to support
> vhost-user-gpu RESOURCE_BLOB operations,
> or DAX Window request for virtio-fs. In
> general, any operation where a backend
> need to request the frontend to mmap an
> fd into a VIRTIO Shared Memory Region,
> so that the guest can then access it.
>
> After receiving the SHMEM_MAP/UNMAP request,
> the frontend will perform the mmap with the
> instructed parameters (i.e., shmid, shm_offset,
> fd_offset, fd, length).
>
> As there are already a couple devices
> that could benefit of such a feature,
> and more could require it in the future,
> the goal is to make the implementation
> generic.
>
> To that end, the VIRTIO Shared Memory
> Region list is declared in the `VirtIODevice`
> struct.
>
> This patch also includes:
> SHMEM_CONFIG frontend request that is
> specifically meant to allow generic
> vhost-user-device frontend to be able to
> query VIRTIO Shared Memory settings from the
> backend (as this device is generic and agnostic
> of the actual backend configuration).
>
> Finally, MEM_READ/WRITE backend requests are
> added to deal with a potential issue when having
> any backend sharing a descriptor that references
> a mapping to another backend. The first
> backend will not be able to see these
> mappings. So these requests are a fallback
> for vhost-user memory translation fails.
>
> Albert Esteve (5):
> vhost-user: Add VIRTIO Shared Memory map request
> vhost_user: Add frontend command for shmem config
> vhost-user-dev: Add cache BAR
> vhost_user: Add MEM_READ/WRITE backend requests
> vhost_user: Implement mem_read/mem_write handlers
>
> docs/interop/vhost-user.rst | 58 ++++++
> hw/virtio/vhost-user-base.c | 39 +++-
> hw/virtio/vhost-user-device-pci.c | 37 +++-
> hw/virtio/vhost-user.c | 221 ++++++++++++++++++++++
> hw/virtio/virtio.c | 12 ++
> include/hw/virtio/vhost-backend.h | 6 +
> include/hw/virtio/vhost-user.h | 1 +
> include/hw/virtio/virtio.h | 5 +
> subprojects/libvhost-user/libvhost-user.c | 149 +++++++++++++++
> subprojects/libvhost-user/libvhost-user.h | 91 +++++++++
> 10 files changed, 614 insertions(+), 5 deletions(-)
>
> --
> 2.45.2
* Re: [RFC PATCH v2 0/5] vhost-user: Add SHMEM_MAP/UNMAP requests
2024-07-11 10:56 ` Alyssa Ross
@ 2024-07-12 2:06 ` David Stevens
2024-07-12 5:47 ` Michael S. Tsirkin
0 siblings, 1 reply; 36+ messages in thread
From: David Stevens @ 2024-07-12 2:06 UTC (permalink / raw)
To: Alyssa Ross
Cc: Albert Esteve, qemu-devel, jasowang, david, slp, Alex Bennée,
stefanha, Michael S. Tsirkin
On Thu, Jul 11, 2024 at 7:56 PM Alyssa Ross <hi@alyssa.is> wrote:
>
> Adding David Stevens, who implemented SHMEM_MAP and SHMEM_UNMAP in
> crosvm a couple of years ago.
>
> David, I'd be particularly interested for your thoughts on the MEM_READ
> and MEM_WRITE commands, since as far as I know crosvm doesn't implement
> anything like that. The discussion leading to those being added starts
> here:
>
> https://lore.kernel.org/qemu-devel/20240604185416.GB90471@fedora.redhat.com/
>
> It would be great if this could be standardised between QEMU and crosvm
> (and therefore have a clearer path toward being implemented in other VMMs)!
Setting aside vhost-user for a moment, the DAX example given by Stefan
won't work in crosvm today.
Is universal access to virtio shared memory regions actually mandated
by the virtio spec? Copying from virtiofs DAX to virtiofs sharing
seems reasonable enough, but what about virtio-pmem to virtio-blk?
What about screenshotting a framebuffer in virtio-gpu shared memory to
virtio-scsi? I guess with some plumbing in the VMM, it's solvable in a
virtualized environment. But what about when you have real hardware
that speaks virtio involved? That's outside my wheelhouse, but it
doesn't seem like that would be easy to solve.
For what it's worth, my interpretation of the target scenario:
> Other backends don't see these mappings. If the guest submits a vring
> descriptor referencing a mapping to another backend, then that backend
> won't be able to access this memory
is that it's omitting how the implementation is reconciled with
section 2.10.1 of v1.3 of the virtio spec, which states that:
> References into shared memory regions are represented as offsets from
> the beginning of the region instead of absolute memory addresses. Offsets
> are used both for references between structures stored within shared
> memory and for requests placed in virtqueues that refer to shared memory.
My interpretation of that statement is that putting raw guest physical
addresses corresponding to virtio shared memory regions into a vring
is a driver spec violation.
-David
* Re: [RFC PATCH v2 0/5] vhost-user: Add SHMEM_MAP/UNMAP requests
2024-07-12 2:06 ` David Stevens
@ 2024-07-12 5:47 ` Michael S. Tsirkin
2024-07-15 2:30 ` Jason Wang
2024-07-16 1:21 ` David Stevens
0 siblings, 2 replies; 36+ messages in thread
From: Michael S. Tsirkin @ 2024-07-12 5:47 UTC (permalink / raw)
To: David Stevens
Cc: Alyssa Ross, Albert Esteve, qemu-devel, jasowang, david, slp,
Alex Bennée, stefanha
On Fri, Jul 12, 2024 at 11:06:49AM +0900, David Stevens wrote:
> On Thu, Jul 11, 2024 at 7:56 PM Alyssa Ross <hi@alyssa.is> wrote:
> >
> > Adding David Stevens, who implemented SHMEM_MAP and SHMEM_UNMAP in
> > crosvm a couple of years ago.
> >
> > David, I'd be particularly interested for your thoughts on the MEM_READ
> > and MEM_WRITE commands, since as far as I know crosvm doesn't implement
> > anything like that. The discussion leading to those being added starts
> > here:
> >
> > https://lore.kernel.org/qemu-devel/20240604185416.GB90471@fedora.redhat.com/
> >
> > It would be great if this could be standardised between QEMU and crosvm
> > (and therefore have a clearer path toward being implemented in other VMMs)!
>
> Setting aside vhost-user for a moment, the DAX example given by Stefan
> won't work in crosvm today.
>
> Is universal access to virtio shared memory regions actually mandated
> by the virtio spec? Copying from virtiofs DAX to virtiofs sharing
> seems reasonable enough, but what about virtio-pmem to virtio-blk?
> What about screenshotting a framebuffer in virtio-gpu shared memory to
> virtio-scsi? I guess with some plumbing in the VMM, it's solvable in a
> virtualized environment. But what about when you have real hardware
> that speaks virtio involved? That's outside my wheelhouse, but it
> doesn't seem like that would be easy to solve.
Yes, it can work for physical devices if allowed by host configuration.
E.g. VFIO supports that I think. Don't think VDPA does.
> For what it's worth, my interpretation of the target scenario:
>
> > Other backends don't see these mappings. If the guest submits a vring
> > descriptor referencing a mapping to another backend, then that backend
> > won't be able to access this memory
>
> is that it's omitting how the implementation is reconciled with
> section 2.10.1 of v1.3 of the virtio spec, which states that:
>
> > References into shared memory regions are represented as offsets from
> > the beginning of the region instead of absolute memory addresses. Offsets
> > are used both for references between structures stored within shared
> > memory and for requests placed in virtqueues that refer to shared memory.
>
> My interpretation of that statement is that putting raw guest physical
> addresses corresponding to virtio shared memory regions into a vring
> is a driver spec violation.
>
> -David
This really applies within device I think. Should be clarified ...
--
MST
* Re: [RFC PATCH v2 0/5] vhost-user: Add SHMEM_MAP/UNMAP requests
2024-07-12 5:47 ` Michael S. Tsirkin
@ 2024-07-15 2:30 ` Jason Wang
2024-07-16 1:21 ` David Stevens
1 sibling, 0 replies; 36+ messages in thread
From: Jason Wang @ 2024-07-15 2:30 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: David Stevens, Alyssa Ross, Albert Esteve, qemu-devel, david, slp,
Alex Bennée, stefanha
On Fri, Jul 12, 2024 at 1:48 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Fri, Jul 12, 2024 at 11:06:49AM +0900, David Stevens wrote:
> > On Thu, Jul 11, 2024 at 7:56 PM Alyssa Ross <hi@alyssa.is> wrote:
> > >
> > > Adding David Stevens, who implemented SHMEM_MAP and SHMEM_UNMAP in
> > > crosvm a couple of years ago.
> > >
> > > David, I'd be particularly interested for your thoughts on the MEM_READ
> > > and MEM_WRITE commands, since as far as I know crosvm doesn't implement
> > > anything like that. The discussion leading to those being added starts
> > > here:
> > >
> > > https://lore.kernel.org/qemu-devel/20240604185416.GB90471@fedora.redhat.com/
> > >
> > > It would be great if this could be standardised between QEMU and crosvm
> > > (and therefore have a clearer path toward being implemented in other VMMs)!
> >
> > Setting aside vhost-user for a moment, the DAX example given by Stefan
> > won't work in crosvm today.
> >
> > Is universal access to virtio shared memory regions actually mandated
> > by the virtio spec? Copying from virtiofs DAX to virtiofs sharing
> > seems reasonable enough, but what about virtio-pmem to virtio-blk?
> > What about screenshotting a framebuffer in virtio-gpu shared memory to
> > virtio-scsi? I guess with some plumbing in the VMM, it's solvable in a
> > virtualized environment. But what about when you have real hardware
> > that speaks virtio involved? That's outside my wheelhouse, but it
> > doesn't seem like that would be easy to solve.
>
> Yes, it can work for physical devices if allowed by host configuration.
> E.g. VFIO supports that I think. Don't think VDPA does.
>
I guess you meant iommufd support here?
Thanks
* Re: [RFC PATCH v2 0/5] vhost-user: Add SHMEM_MAP/UNMAP requests
2024-07-12 5:47 ` Michael S. Tsirkin
2024-07-15 2:30 ` Jason Wang
@ 2024-07-16 1:21 ` David Stevens
2024-09-03 8:42 ` Albert Esteve
2024-09-05 15:56 ` Stefan Hajnoczi
1 sibling, 2 replies; 36+ messages in thread
From: David Stevens @ 2024-07-16 1:21 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Alyssa Ross, Albert Esteve, qemu-devel, jasowang, david, slp,
Alex Bennée, stefanha
On Fri, Jul 12, 2024 at 2:47 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Fri, Jul 12, 2024 at 11:06:49AM +0900, David Stevens wrote:
> > On Thu, Jul 11, 2024 at 7:56 PM Alyssa Ross <hi@alyssa.is> wrote:
> > >
> > > Adding David Stevens, who implemented SHMEM_MAP and SHMEM_UNMAP in
> > > crosvm a couple of years ago.
> > >
> > > David, I'd be particularly interested for your thoughts on the MEM_READ
> > > and MEM_WRITE commands, since as far as I know crosvm doesn't implement
> > > anything like that. The discussion leading to those being added starts
> > > here:
> > >
> > > https://lore.kernel.org/qemu-devel/20240604185416.GB90471@fedora.redhat.com/
> > >
> > > It would be great if this could be standardised between QEMU and crosvm
> > > (and therefore have a clearer path toward being implemented in other VMMs)!
> >
> > Setting aside vhost-user for a moment, the DAX example given by Stefan
> > won't work in crosvm today.
> >
> > Is universal access to virtio shared memory regions actually mandated
> > by the virtio spec? Copying from virtiofs DAX to virtiofs sharing
> > seems reasonable enough, but what about virtio-pmem to virtio-blk?
> > What about screenshotting a framebuffer in virtio-gpu shared memory to
> > virtio-scsi? I guess with some plumbing in the VMM, it's solvable in a
> > virtualized environment. But what about when you have real hardware
> > that speaks virtio involved? That's outside my wheelhouse, but it
> > doesn't seem like that would be easy to solve.
>
> Yes, it can work for physical devices if allowed by host configuration.
> E.g. VFIO supports that I think. Don't think VDPA does.
I'm sure it can work, but that sounds more like a SHOULD (MAY?),
rather than a MUST.
> > For what it's worth, my interpretation of the target scenario:
> >
> > > Other backends don't see these mappings. If the guest submits a vring
> > > descriptor referencing a mapping to another backend, then that backend
> > > won't be able to access this memory
> >
> > is that it's omitting how the implementation is reconciled with
> > section 2.10.1 of v1.3 of the virtio spec, which states that:
> >
> > > References into shared memory regions are represented as offsets from
> > > the beginning of the region instead of absolute memory addresses. Offsets
> > > are used both for references between structures stored within shared
> > > memory and for requests placed in virtqueues that refer to shared memory.
> >
> > My interpretation of that statement is that putting raw guest physical
> > addresses corresponding to virtio shared memory regions into a vring
> > is a driver spec violation.
> >
> > -David
>
> This really applies within device I think. Should be clarified ...
You mean that a virtio device can use absolute memory addresses for
other devices' shared memory regions, but it can't use absolute memory
addresses for its own shared memory regions? That's a rather strange
requirement. Or is the statement simply giving an addressing strategy
that device type specifications are free to ignore?
-David
* Re: [RFC PATCH v2 0/5] vhost-user: Add SHMEM_MAP/UNMAP requests
2024-07-16 1:21 ` David Stevens
@ 2024-09-03 8:42 ` Albert Esteve
2024-09-05 16:39 ` Stefan Hajnoczi
2024-09-05 15:56 ` Stefan Hajnoczi
1 sibling, 1 reply; 36+ messages in thread
From: Albert Esteve @ 2024-09-03 8:42 UTC (permalink / raw)
To: David Stevens
Cc: Michael S. Tsirkin, Alyssa Ross, qemu-devel, jasowang, david, slp,
Alex Bennée, stefanha
Hello all,
Sorry, I have been a bit disconnected from this thread as I was on
vacation and then had to switch tasks for a while.
I will try to go through all comments and address them for the first
non-RFC drop of this patch series.
But I was discussing this with some colleagues. It turns out rust-vmm's
vhost-user-gpu will potentially use this soon, and a rust-vmm/vhost patch
has already been posted:
https://github.com/rust-vmm/vhost/pull/251.
So I think it may make sense to:
1. Split out the vhost-user documentation patch once it is settled. Since it
is taken as the official spec, upstreaming it independently of the
implementation will help other projects integrate their own code.
2. Split the READ_/WRITE_MEM messages from the SHMEM_MAP/_UNMAP patches.
If I remember correctly, they address a virtio-fs-specific issue that will
not impact virtio-gpu, virtio-media, or any other device. So it may make
sense to separate them so that one does not stall the other. I will try to
have both integrated in the mid term.
WDYT?
BR,
Albert.
On Tue, Jul 16, 2024 at 3:21 AM David Stevens <stevensd@chromium.org> wrote:
> On Fri, Jul 12, 2024 at 2:47 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Fri, Jul 12, 2024 at 11:06:49AM +0900, David Stevens wrote:
> > > On Thu, Jul 11, 2024 at 7:56 PM Alyssa Ross <hi@alyssa.is> wrote:
> > > >
> > > > Adding David Stevens, who implemented SHMEM_MAP and SHMEM_UNMAP in
> > > > crosvm a couple of years ago.
> > > >
> > > > David, I'd be particularly interested for your thoughts on the
> MEM_READ
> > > > and MEM_WRITE commands, since as far as I know crosvm doesn't
> implement
> > > > anything like that. The discussion leading to those being added
> starts
> > > > here:
> > > >
> > > >
> https://lore.kernel.org/qemu-devel/20240604185416.GB90471@fedora.redhat.com/
> > > >
> > > > It would be great if this could be standardised between QEMU and
> crosvm
> > > > (and therefore have a clearer path toward being implemented in other
> VMMs)!
> > >
> > > Setting aside vhost-user for a moment, the DAX example given by Stefan
> > > won't work in crosvm today.
> > >
> > > Is universal access to virtio shared memory regions actually mandated
> > > by the virtio spec? Copying from virtiofs DAX to virtiofs sharing
> > > seems reasonable enough, but what about virtio-pmem to virtio-blk?
> > > What about screenshotting a framebuffer in virtio-gpu shared memory to
> > > virtio-scsi? I guess with some plumbing in the VMM, it's solvable in a
> > > virtualized environment. But what about when you have real hardware
> > > that speaks virtio involved? That's outside my wheelhouse, but it
> > > doesn't seem like that would be easy to solve.
> >
> > Yes, it can work for physical devices if allowed by host configuration.
> > E.g. VFIO supports that I think. Don't think VDPA does.
>
> I'm sure it can work, but that sounds more like a SHOULD (MAY?),
> rather than a MUST.
>
> > > For what it's worth, my interpretation of the target scenario:
> > >
> > > > Other backends don't see these mappings. If the guest submits a vring
> > > > descriptor referencing a mapping to another backend, then that
> backend
> > > > won't be able to access this memory
> > >
> > > is that it's omitting how the implementation is reconciled with
> > > section 2.10.1 of v1.3 of the virtio spec, which states that:
> > >
> > > > References into shared memory regions are represented as offsets from
> > > > the beginning of the region instead of absolute memory addresses.
> Offsets
> > > > are used both for references between structures stored within shared
> > > > memory and for requests placed in virtqueues that refer to shared
> memory.
> > >
> > > My interpretation of that statement is that putting raw guest physical
> > > addresses corresponding to virtio shared memory regions into a vring
> > > is a driver spec violation.
> > >
> > > -David
> >
> > This really applies within device I think. Should be clarified ...
>
> You mean that a virtio device can use absolute memory addresses for
> other devices' shared memory regions, but it can't use absolute memory
> addresses for its own shared memory regions? That's a rather strange
> requirement. Or is the statement simply giving an addressing strategy
> that device type specifications are free to ignore?
>
> -David
>
>
* Re: [RFC PATCH v2 0/5] vhost-user: Add SHMEM_MAP/UNMAP requests
2024-09-03 8:42 ` Albert Esteve
@ 2024-09-05 16:39 ` Stefan Hajnoczi
2024-09-06 7:03 ` Albert Esteve
0 siblings, 1 reply; 36+ messages in thread
From: Stefan Hajnoczi @ 2024-09-05 16:39 UTC (permalink / raw)
To: Albert Esteve
Cc: David Stevens, Michael S. Tsirkin, Alyssa Ross, qemu-devel,
jasowang, david, slp, Alex Bennée
On Tue, Sep 03, 2024 at 10:42:34AM +0200, Albert Esteve wrote:
> Hello all,
>
> Sorry, I have been a bit disconnected from this thread as I was on
> vacations and then had to switch tasks for a while.
>
> I will try to go through all comments and address them for the first
> non-RFC drop of this patch series.
>
> But I was discussing with some colleagues on this. So turns out rust-vmm's
> vhost-user-gpu will potentially use
> this soon, and a rust-vmm/vhost patch have been already posted:
> https://github.com/rust-vmm/vhost/pull/251.
> So I think it may make sense to:
> 1. Split the vhost-user documentation patch once settled. Since it is taken
> as the official spec,
> having it upstreamed independently of the implementation will benefit
> other projects to
> work/integrate their own code.
> 2. Split READ_/WRITE_MEM messages from SHMEM_MAP/_UNMAP patches.
> If I remember correctly, this addresses a virtio-fs specific issue,
> that will not
> impact either virtio-gpu nor virtio-media, or any other.
This is an architectural issue that arises from exposing VIRTIO Shared
Memory Regions in vhost-user. It was first seen with Linux virtiofs but
it could happen with other devices and/or guest operating systems.
Any VIRTIO Shared Memory Region that can be mmapped into Linux userspace
may trigger this issue. Userspace may write(2) to an O_DIRECT file with
the mmap as the source. The vhost-user-blk device will not be able to
access the source device's VIRTIO Shared Memory Region and will fail.
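To make the scenario concrete, here is an illustrative guest userspace
snippet (not part of this series; error handling and O_DIRECT alignment
requirements omitted). The source buffer lives in a virtiofs DAX window,
i.e. a VIRTIO Shared Memory Region, and the destination file sits on a
vhost-user-blk disk, so the blk back-end has to read pages it cannot
translate with its memory table alone:

```
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Copy 'len' bytes (a multiple of the block size) from a DAX-mapped
 * virtiofs file to an O_DIRECT file on a vhost-user-blk disk. */
static int copy_dax_to_blk(const char *dax_path, const char *blk_path,
                           size_t len)
{
    int src = open(dax_path, O_RDONLY);
    int dst = open(blk_path, O_WRONLY | O_CREAT | O_DIRECT, 0600);
    /* buf is backed by the virtiofs device's VIRTIO Shared Memory Region */
    void *buf = mmap(NULL, len, PROT_READ, MAP_SHARED, src, 0);

    /* The blk back-end must read from 'buf', but those guest physical
     * pages are not in its vhost-user memory table, so the request fails
     * today. */
    ssize_t n = write(dst, buf, len);

    munmap(buf, len);
    close(src);
    close(dst);
    return n == (ssize_t)len ? 0 : -1;
}
```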
> So it may make
> sense
> to separate them so that one does not stall the other. I will try to
> have both
> integrated in the mid term.
If READ_/WRITE_MEM is a pain to implement (I think it is in the
vhost-user back-end, even though I've been a proponent of it), then
another way to deal with this issue is to specify that upon receiving
MAP/UNMAP messages, the vhost-user front-end must update the vhost-user
memory tables of all other vhost-user devices. That way vhost-user
devices will be able to access VIRTIO Shared Memory Regions mapped by
other devices.
Implementing this in QEMU should be much easier than implementing
READ_/WRITE_MEM support in device back-ends.
This will be slow and scale poorly but performance is only a problem for
devices that frequently MAP/UNMAP like virtiofs. Will virtio-gpu and
virtio-media use MAP/UNMAP often at runtime? They might be able to get
away with this simple solution.
I'd be happy with that. If someone wants to make virtiofs DAX faster,
they can implement READ/WRITE_MEM or another solution later, but let's
at least make things correct from the start.
Stefan
>
> WDYT?
>
> BR,
> Albert.
>
> On Tue, Jul 16, 2024 at 3:21 AM David Stevens <stevensd@chromium.org> wrote:
>
> > On Fri, Jul 12, 2024 at 2:47 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Fri, Jul 12, 2024 at 11:06:49AM +0900, David Stevens wrote:
> > > > On Thu, Jul 11, 2024 at 7:56 PM Alyssa Ross <hi@alyssa.is> wrote:
> > > > >
> > > > > Adding David Stevens, who implemented SHMEM_MAP and SHMEM_UNMAP in
> > > > > crosvm a couple of years ago.
> > > > >
> > > > > David, I'd be particularly interested for your thoughts on the
> > MEM_READ
> > > > > and MEM_WRITE commands, since as far as I know crosvm doesn't
> > implement
> > > > > anything like that. The discussion leading to those being added
> > starts
> > > > > here:
> > > > >
> > > > >
> > https://lore.kernel.org/qemu-devel/20240604185416.GB90471@fedora.redhat.com/
> > > > >
> > > > > It would be great if this could be standardised between QEMU and
> > crosvm
> > > > > (and therefore have a clearer path toward being implemented in other
> > VMMs)!
> > > >
> > > > Setting aside vhost-user for a moment, the DAX example given by Stefan
> > > > won't work in crosvm today.
> > > >
> > > > Is universal access to virtio shared memory regions actually mandated
> > > > by the virtio spec? Copying from virtiofs DAX to virtiofs sharing
> > > > seems reasonable enough, but what about virtio-pmem to virtio-blk?
> > > > What about screenshotting a framebuffer in virtio-gpu shared memory to
> > > > virtio-scsi? I guess with some plumbing in the VMM, it's solvable in a
> > > > virtualized environment. But what about when you have real hardware
> > > > that speaks virtio involved? That's outside my wheelhouse, but it
> > > > doesn't seem like that would be easy to solve.
> > >
> > > Yes, it can work for physical devices if allowed by host configuration.
> > > E.g. VFIO supports that I think. Don't think VDPA does.
> >
> > I'm sure it can work, but that sounds more like a SHOULD (MAY?),
> > rather than a MUST.
> >
> > > > For what it's worth, my interpretation of the target scenario:
> > > >
> > > > > Other backends don't see these mappings. If the guest submits a vring
> > > > > descriptor referencing a mapping to another backend, then that
> > backend
> > > > > won't be able to access this memory
> > > >
> > > > is that it's omitting how the implementation is reconciled with
> > > > section 2.10.1 of v1.3 of the virtio spec, which states that:
> > > >
> > > > > References into shared memory regions are represented as offsets from
> > > > > the beginning of the region instead of absolute memory addresses.
> > Offsets
> > > > > are used both for references between structures stored within shared
> > > > > memory and for requests placed in virtqueues that refer to shared
> > memory.
> > > >
> > > > My interpretation of that statement is that putting raw guest physical
> > > > addresses corresponding to virtio shared memory regions into a vring
> > > > is a driver spec violation.
> > > >
> > > > -David
> > >
> > > This really applies within device I think. Should be clarified ...
> >
> > You mean that a virtio device can use absolute memory addresses for
> > other devices' shared memory regions, but it can't use absolute memory
> > addresses for its own shared memory regions? That's a rather strange
> > requirement. Or is the statement simply giving an addressing strategy
> > that device type specifications are free to ignore?
> >
> > -David
> >
> >
* Re: [RFC PATCH v2 0/5] vhost-user: Add SHMEM_MAP/UNMAP requests
2024-09-05 16:39 ` Stefan Hajnoczi
@ 2024-09-06 7:03 ` Albert Esteve
2024-09-06 13:15 ` Stefan Hajnoczi
0 siblings, 1 reply; 36+ messages in thread
From: Albert Esteve @ 2024-09-06 7:03 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: David Stevens, Michael S. Tsirkin, Alyssa Ross, qemu-devel,
jasowang, david, slp, Alex Bennée
On Thu, Sep 5, 2024 at 6:39 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> On Tue, Sep 03, 2024 at 10:42:34AM +0200, Albert Esteve wrote:
> > Hello all,
> >
> > Sorry, I have been a bit disconnected from this thread as I was on
> > vacations and then had to switch tasks for a while.
> >
> > I will try to go through all comments and address them for the first
> > non-RFC drop of this patch series.
> >
> > But I was discussing with some colleagues on this. So turns out
> rust-vmm's
> > vhost-user-gpu will potentially use
> > this soon, and a rust-vmm/vhost patch have been already posted:
> > https://github.com/rust-vmm/vhost/pull/251.
> > So I think it may make sense to:
> > 1. Split the vhost-user documentation patch once settled. Since it is
> taken
> > as the official spec,
> > having it upstreamed independently of the implementation will benefit
> > other projects to
> > work/integrate their own code.
> > 2. Split READ_/WRITE_MEM messages from SHMEM_MAP/_UNMAP patches.
> > If I remember correctly, this addresses a virtio-fs specific issue,
> > that will not
> > impact either virtio-gpu nor virtio-media, or any other.
>
> This is an architectural issue that arises from exposing VIRTIO Shared
> Memory Regions in vhost-user. It was first seen with Linux virtiofs but
> it could happen with other devices and/or guest operating systems.
>
> Any VIRTIO Shared Memory Region that can be mmapped into Linux userspace
> may trigger this issue. Userspace may write(2) to an O_DIRECT file with
> the mmap as the source. The vhost-user-blk device will not be able to
> access the source device's VIRTIO Shared Memory Region and will fail.
>
> > So it may make
> > sense
> > to separate them so that one does not stall the other. I will try to
> > have both
> > integrated in the mid term.
>
> If READ_/WRITE_MEM is a pain to implement (I think it is in the
> vhost-user back-end, even though I've been a proponent of it), then
> another way to deal with this issue is to specify that upon receiving
> MAP/UNMAP messages, the vhost-user front-end must update the vhost-user
> memory tables of all other vhost-user devices. That way vhost-user
> devices will be able to access VIRTIO Shared Memory Regions mapped by
> other devices.
>
> Implementing this in QEMU should be much easier than implementing
> READ_/WRITE_MEM support in device back-ends.
>
> This will be slow and scale poorly but performance is only a problem for
> devices that frequently MAP/UNMAP like virtiofs. Will virtio-gpu and
> virtio-media use MAP/UNMAP often at runtime? They might be able to get
> away with this simple solution.
>
> I'd be happy with that. If someone wants to make virtiofs DAX faster,
> they can implement READ/WRITE_MEM or another solution later, but let's
> at least make things correct from the start.
>
I agree. I want it to be correct first. If you agree on splitting the spec
bits from this patch, I'm already happy. I suggested splitting the
READ_/WRITE_MEM messages because I thought it was a virtiofs-specific issue.
The alternative that you proposed is interesting; I'll take it into account.
But I would prefer to go for the better solution first, and if I get too
entangled, then switch to the easier implementation.
I think we could do this in 2 patches:
1. Split out the documentation bits for SHMEM_MAP/_UNMAP. The
implementation for these messages will go into the second patch.
2. The implementation patch: keep going for the time being with
READ_/WRITE_MEM support, and keep its documentation within this patch.
This way, if we later switch to the front-end updating the vhost-user
memory tables, we are not locked into a specific solution even if patch 1
has already been merged.
BR,
Albert.
>
> Stefan
>
> >
> > WDYT?
> >
> > BR,
> > Albert.
> >
> > On Tue, Jul 16, 2024 at 3:21 AM David Stevens <stevensd@chromium.org>
> wrote:
> >
> > > On Fri, Jul 12, 2024 at 2:47 PM Michael S. Tsirkin <mst@redhat.com>
> wrote:
> > > >
> > > > On Fri, Jul 12, 2024 at 11:06:49AM +0900, David Stevens wrote:
> > > > > On Thu, Jul 11, 2024 at 7:56 PM Alyssa Ross <hi@alyssa.is> wrote:
> > > > > >
> > > > > > Adding David Stevens, who implemented SHMEM_MAP and SHMEM_UNMAP
> in
> > > > > > crosvm a couple of years ago.
> > > > > >
> > > > > > David, I'd be particularly interested for your thoughts on the
> > > MEM_READ
> > > > > > and MEM_WRITE commands, since as far as I know crosvm doesn't
> > > implement
> > > > > > anything like that. The discussion leading to those being added
> > > starts
> > > > > > here:
> > > > > >
> > > > > >
> > >
> https://lore.kernel.org/qemu-devel/20240604185416.GB90471@fedora.redhat.com/
> > > > > >
> > > > > > It would be great if this could be standardised between QEMU and
> > > crosvm
> > > > > > (and therefore have a clearer path toward being implemented in
> other
> > > VMMs)!
> > > > >
> > > > > Setting aside vhost-user for a moment, the DAX example given by
> Stefan
> > > > > won't work in crosvm today.
> > > > >
> > > > > Is universal access to virtio shared memory regions actually
> mandated
> > > > > by the virtio spec? Copying from virtiofs DAX to virtiofs sharing
> > > > > seems reasonable enough, but what about virtio-pmem to virtio-blk?
> > > > > What about screenshotting a framebuffer in virtio-gpu shared
> memory to
> > > > > virtio-scsi? I guess with some plumbing in the VMM, it's solvable
> in a
> > > > > virtualized environment. But what about when you have real hardware
> > > > > that speaks virtio involved? That's outside my wheelhouse, but it
> > > > > doesn't seem like that would be easy to solve.
> > > >
> > > > Yes, it can work for physical devices if allowed by host
> configuration.
> > > > E.g. VFIO supports that I think. Don't think VDPA does.
> > >
> > > I'm sure it can work, but that sounds more like a SHOULD (MAY?),
> > > rather than a MUST.
> > >
> > > > > For what it's worth, my interpretation of the target scenario:
> > > > >
> > > > > > Other backends don't see these mappings. If the guest submits a
> vring
> > > > > > descriptor referencing a mapping to another backend, then that
> > > backend
> > > > > > won't be able to access this memory
> > > > >
> > > > > is that it's omitting how the implementation is reconciled with
> > > > > section 2.10.1 of v1.3 of the virtio spec, which states that:
> > > > >
> > > > > > References into shared memory regions are represented as offsets
> from
> > > > > > the beginning of the region instead of absolute memory addresses.
> > > Offsets
> > > > > > are used both for references between structures stored within
> shared
> > > > > > memory and for requests placed in virtqueues that refer to shared
> > > memory.
> > > > >
> > > > > My interpretation of that statement is that putting raw guest
> physical
> > > > > addresses corresponding to virtio shared memory regions into a
> vring
> > > > > is a driver spec violation.
> > > > >
> > > > > -David
> > > >
> > > > This really applies within device I think. Should be clarified ...
> > >
> > > You mean that a virtio device can use absolute memory addresses for
> > > other devices' shared memory regions, but it can't use absolute memory
> > > addresses for its own shared memory regions? That's a rather strange
> > > requirement. Or is the statement simply giving an addressing strategy
> > > that device type specifications are free to ignore?
> > >
> > > -David
> > >
> > >
>
* Re: [RFC PATCH v2 0/5] vhost-user: Add SHMEM_MAP/UNMAP requests
2024-09-06 7:03 ` Albert Esteve
@ 2024-09-06 13:15 ` Stefan Hajnoczi
0 siblings, 0 replies; 36+ messages in thread
From: Stefan Hajnoczi @ 2024-09-06 13:15 UTC (permalink / raw)
To: Albert Esteve
Cc: Stefan Hajnoczi, David Stevens, Michael S. Tsirkin, Alyssa Ross,
qemu-devel, jasowang, david, slp, Alex Bennée
On Fri, 6 Sept 2024 at 03:06, Albert Esteve <aesteve@redhat.com> wrote:
> On Thu, Sep 5, 2024 at 6:39 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
>>
>> On Tue, Sep 03, 2024 at 10:42:34AM +0200, Albert Esteve wrote:
>> > Hello all,
>> >
>> > Sorry, I have been a bit disconnected from this thread as I was on
>> > vacations and then had to switch tasks for a while.
>> >
>> > I will try to go through all comments and address them for the first
>> > non-RFC drop of this patch series.
>> >
>> > But I was discussing with some colleagues on this. So turns out rust-vmm's
>> > vhost-user-gpu will potentially use
>> > this soon, and a rust-vmm/vhost patch have been already posted:
>> > https://github.com/rust-vmm/vhost/pull/251.
>> > So I think it may make sense to:
>> > 1. Split the vhost-user documentation patch once settled. Since it is taken
>> > as the official spec,
>> > having it upstreamed independently of the implementation will benefit
>> > other projects to
>> > work/integrate their own code.
>> > 2. Split READ_/WRITE_MEM messages from SHMEM_MAP/_UNMAP patches.
>> > If I remember correctly, this addresses a virtio-fs specific issue,
>> > that will not
>> > impact either virtio-gpu nor virtio-media, or any other.
>>
>> This is an architectural issue that arises from exposing VIRTIO Shared
>> Memory Regions in vhost-user. It was first seen with Linux virtiofs but
>> it could happen with other devices and/or guest operating systems.
>>
>> Any VIRTIO Shared Memory Region that can be mmapped into Linux userspace
>> may trigger this issue. Userspace may write(2) to an O_DIRECT file with
>> the mmap as the source. The vhost-user-blk device will not be able to
>> access the source device's VIRTIO Shared Memory Region and will fail.
>>
>> > So it may make
>> > sense
>> > to separate them so that one does not stall the other. I will try to
>> > have both
>> > integrated in the mid term.
>>
>> If READ_/WRITE_MEM is a pain to implement (I think it is in the
>> vhost-user back-end, even though I've been a proponent of it), then
>> another way to deal with this issue is to specify that upon receiving
>> MAP/UNMAP messages, the vhost-user front-end must update the vhost-user
>> memory tables of all other vhost-user devices. That way vhost-user
>> devices will be able to access VIRTIO Shared Memory Regions mapped by
>> other devices.
>>
>> Implementing this in QEMU should be much easier than implementing
>> READ_/WRITE_MEM support in device back-ends.
>>
>> This will be slow and scale poorly but performance is only a problem for
>> devices that frequently MAP/UNMAP like virtiofs. Will virtio-gpu and
>> virtio-media use MAP/UNMAP often at runtime? They might be able to get
>> away with this simple solution.
>>
>> I'd be happy with that. If someone wants to make virtiofs DAX faster,
>> they can implement READ/WRITE_MEM or another solution later, but let's
>> at least make things correct from the start.
>
>
> I agree. I want it to be correct first. If you agree on splitting the spec bits from this
> patch I'm already happy. I suggested splitting READ_/WRITE_MEM messages
> because I thought that it was a virtiofs-specific issue.
>
> The alternative that you proposed is interesting. I'll take it into account. But I
> feel I prefer to go for the better solution, and if I get too entangled, then switch
> to the easier implementation.
Great. The difficult part of implementing READ_/WRITE_MEM messages is
modifying libvhost-user and rust-vmm's vhost crate to send the new
messages when address translation fails. This needs to cover all
memory accesses (including vring struct accesses). That code may be a
few levels down in the call stack and assume it can always load/store
directly from mmapped memory.
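As a rough illustration of the shape of that fallback (vu_send_mem_read()
here is a made-up helper that would marshal the new MEM_READ request on the
backend channel; only vu_gpa_to_va() exists in libvhost-user today):

```
/* Sketch only: every direct guest-memory dereference in libvhost-user
 * would need to go through something like this. */
static bool vu_read_guest(VuDev *dev, void *dst, uint64_t guest_addr,
                          size_t len)
{
    uint64_t plen = len;
    void *src = vu_gpa_to_va(dev, &plen, guest_addr);

    if (src && plen == len) {
        /* Fast path: the address is covered by the memory table. */
        memcpy(dst, src, len);
        return true;
    }

    /* Slow path: translation failed, ask the front-end to perform the
     * access via the (hypothetical) MEM_READ backend request. */
    return vu_send_mem_read(dev, guest_addr, dst, len);
}
```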
>
> I think we could do this in 2 patches:
> 1. Split the documentation bits for SHMEM_MAP/_UNMAP. The
> implementation for these messages will go into the second patch.
> 2. The implementation patch: keep going for the time being with
> READ_/WRITE_MEM support. And the documentation for that
> is kept it within this patch. This way if we switch to the frontend
> updating vhost-user memory table, we weren't set in any specific
> solution if patch 1 has been already merged.
I'm happy as long as the vhost-user spec patch that introduces
MAP/UNMAP also covers a solution for the memory access problem (either
READ_/WRITE_MEM or propagating mappings to all vhost-user back-ends).
Stefan
>
> BR,
> Albert.
>
>>
>>
>> Stefan
>>
>> >
>> > WDYT?
>> >
>> > BR,
>> > Albert.
>> >
>> > On Tue, Jul 16, 2024 at 3:21 AM David Stevens <stevensd@chromium.org> wrote:
>> >
>> > > On Fri, Jul 12, 2024 at 2:47 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>> > > >
>> > > > On Fri, Jul 12, 2024 at 11:06:49AM +0900, David Stevens wrote:
>> > > > > On Thu, Jul 11, 2024 at 7:56 PM Alyssa Ross <hi@alyssa.is> wrote:
>> > > > > >
>> > > > > > Adding David Stevens, who implemented SHMEM_MAP and SHMEM_UNMAP in
>> > > > > > crosvm a couple of years ago.
>> > > > > >
>> > > > > > David, I'd be particularly interested for your thoughts on the
>> > > MEM_READ
>> > > > > > and MEM_WRITE commands, since as far as I know crosvm doesn't
>> > > implement
>> > > > > > anything like that. The discussion leading to those being added
>> > > starts
>> > > > > > here:
>> > > > > >
>> > > > > >
>> > > https://lore.kernel.org/qemu-devel/20240604185416.GB90471@fedora.redhat.com/
>> > > > > >
>> > > > > > It would be great if this could be standardised between QEMU and
>> > > crosvm
>> > > > > > (and therefore have a clearer path toward being implemented in other
>> > > VMMs)!
>> > > > >
>> > > > > Setting aside vhost-user for a moment, the DAX example given by Stefan
>> > > > > won't work in crosvm today.
>> > > > >
>> > > > > Is universal access to virtio shared memory regions actually mandated
>> > > > > by the virtio spec? Copying from virtiofs DAX to virtiofs sharing
>> > > > > seems reasonable enough, but what about virtio-pmem to virtio-blk?
>> > > > > What about screenshotting a framebuffer in virtio-gpu shared memory to
>> > > > > virtio-scsi? I guess with some plumbing in the VMM, it's solvable in a
>> > > > > virtualized environment. But what about when you have real hardware
>> > > > > that speaks virtio involved? That's outside my wheelhouse, but it
>> > > > > doesn't seem like that would be easy to solve.
>> > > >
>> > > > Yes, it can work for physical devices if allowed by host configuration.
>> > > > E.g. VFIO supports that I think. Don't think VDPA does.
>> > >
>> > > I'm sure it can work, but that sounds more like a SHOULD (MAY?),
>> > > rather than a MUST.
>> > >
>> > > > > For what it's worth, my interpretation of the target scenario:
>> > > > >
>> > > > > > Other backends don't see these mappings. If the guest submits a vring
>> > > > > > descriptor referencing a mapping to another backend, then that
>> > > backend
>> > > > > > won't be able to access this memory
>> > > > >
>> > > > > is that it's omitting how the implementation is reconciled with
>> > > > > section 2.10.1 of v1.3 of the virtio spec, which states that:
>> > > > >
>> > > > > > References into shared memory regions are represented as offsets from
>> > > > > > the beginning of the region instead of absolute memory addresses.
>> > > Offsets
>> > > > > > are used both for references between structures stored within shared
>> > > > > > memory and for requests placed in virtqueues that refer to shared
>> > > memory.
>> > > > >
>> > > > > My interpretation of that statement is that putting raw guest physical
>> > > > > addresses corresponding to virtio shared memory regions into a vring
>> > > > > is a driver spec violation.
>> > > > >
>> > > > > -David
>> > > >
>> > > > This really applies within device I think. Should be clarified ...
>> > >
>> > > You mean that a virtio device can use absolute memory addresses for
>> > > other devices' shared memory regions, but it can't use absolute memory
>> > > addresses for its own shared memory regions? That's a rather strange
>> > > requirement. Or is the statement simply giving an addressing strategy
>> > > that device type specifications are free to ignore?
>> > >
>> > > -David
>> > >
>> > >
* Re: [RFC PATCH v2 0/5] vhost-user: Add SHMEM_MAP/UNMAP requests
2024-07-16 1:21 ` David Stevens
2024-09-03 8:42 ` Albert Esteve
@ 2024-09-05 15:56 ` Stefan Hajnoczi
2024-09-06 4:18 ` David Stevens
1 sibling, 1 reply; 36+ messages in thread
From: Stefan Hajnoczi @ 2024-09-05 15:56 UTC (permalink / raw)
To: David Stevens
Cc: Michael S. Tsirkin, Alyssa Ross, Albert Esteve, qemu-devel,
jasowang, david, slp, Alex Bennée
On Tue, Jul 16, 2024 at 10:21:35AM +0900, David Stevens wrote:
> On Fri, Jul 12, 2024 at 2:47 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Fri, Jul 12, 2024 at 11:06:49AM +0900, David Stevens wrote:
> > > On Thu, Jul 11, 2024 at 7:56 PM Alyssa Ross <hi@alyssa.is> wrote:
> > > >
> > > > Adding David Stevens, who implemented SHMEM_MAP and SHMEM_UNMAP in
> > > > crosvm a couple of years ago.
> > > >
> > > > David, I'd be particularly interested for your thoughts on the MEM_READ
> > > > and MEM_WRITE commands, since as far as I know crosvm doesn't implement
> > > > anything like that. The discussion leading to those being added starts
> > > > here:
> > > >
> > > > https://lore.kernel.org/qemu-devel/20240604185416.GB90471@fedora.redhat.com/
> > > >
> > > > It would be great if this could be standardised between QEMU and crosvm
> > > > (and therefore have a clearer path toward being implemented in other VMMs)!
> > >
> > > Setting aside vhost-user for a moment, the DAX example given by Stefan
> > > won't work in crosvm today.
> > >
> > > Is universal access to virtio shared memory regions actually mandated
> > > by the virtio spec? Copying from virtiofs DAX to virtiofs sharing
> > > seems reasonable enough, but what about virtio-pmem to virtio-blk?
> > > What about screenshotting a framebuffer in virtio-gpu shared memory to
> > > virtio-scsi? I guess with some plumbing in the VMM, it's solvable in a
> > > virtualized environment. But what about when you have real hardware
> > > that speaks virtio involved? That's outside my wheelhouse, but it
> > > doesn't seem like that would be easy to solve.
> >
> > Yes, it can work for physical devices if allowed by host configuration.
> > E.g. VFIO supports that I think. Don't think VDPA does.
>
> I'm sure it can work, but that sounds more like a SHOULD (MAY?),
> rather than a MUST.
>
> > > For what it's worth, my interpretation of the target scenario:
> > >
> > > > Other backends don't see these mappings. If the guest submits a vring
> > > > descriptor referencing a mapping to another backend, then that backend
> > > > won't be able to access this memory
> > >
> > > is that it's omitting how the implementation is reconciled with
> > > section 2.10.1 of v1.3 of the virtio spec, which states that:
> > >
> > > > References into shared memory regions are represented as offsets from
> > > > the beginning of the region instead of absolute memory addresses. Offsets
> > > > are used both for references between structures stored within shared
> > > > memory and for requests placed in virtqueues that refer to shared memory.
> > >
> > > My interpretation of that statement is that putting raw guest physical
> > > addresses corresponding to virtio shared memory regions into a vring
> > > is a driver spec violation.
> > >
> > > -David
> >
> > This really applies within device I think. Should be clarified ...
>
> You mean that a virtio device can use absolute memory addresses for
> other devices' shared memory regions, but it can't use absolute memory
> addresses for its own shared memory regions? That's a rather strange
> requirement. Or is the statement simply giving an addressing strategy
> that device type specifications are free to ignore?
My recollection of the intent behind the quoted section is:
1. Structures in shared memory that point to shared memory must use
relative offsets instead of absolute physical addresses.
2. Virtqueue requests that refer to shared memory (e.g. map this page
from virtiofs file to this location in shared memory) must use
relative offsets instead of absolute physical addresses.
In other words, shared memory must be relocatable. Don't assume Shared
Memory Regions have an absolute guest physical address. This makes
device implementations independent of the guest physical memory layout
and might also help when Shared Memory Regions are exposed to guest
user-space where the guest physical memory layout isn't known.
Stefan
* Re: [RFC PATCH v2 0/5] vhost-user: Add SHMEM_MAP/UNMAP requests
2024-09-05 15:56 ` Stefan Hajnoczi
@ 2024-09-06 4:18 ` David Stevens
2024-09-06 13:00 ` Stefan Hajnoczi
0 siblings, 1 reply; 36+ messages in thread
From: David Stevens @ 2024-09-06 4:18 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Michael S. Tsirkin, Alyssa Ross, Albert Esteve, qemu-devel,
jasowang, david, slp, Alex Bennée
On Fri, Sep 6, 2024 at 12:56 AM Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> On Tue, Jul 16, 2024 at 10:21:35AM +0900, David Stevens wrote:
> > On Fri, Jul 12, 2024 at 2:47 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Fri, Jul 12, 2024 at 11:06:49AM +0900, David Stevens wrote:
> > > > On Thu, Jul 11, 2024 at 7:56 PM Alyssa Ross <hi@alyssa.is> wrote:
> > > > >
> > > > > Adding David Stevens, who implemented SHMEM_MAP and SHMEM_UNMAP in
> > > > > crosvm a couple of years ago.
> > > > >
> > > > > David, I'd be particularly interested for your thoughts on the MEM_READ
> > > > > and MEM_WRITE commands, since as far as I know crosvm doesn't implement
> > > > > anything like that. The discussion leading to those being added starts
> > > > > here:
> > > > >
> > > > > https://lore.kernel.org/qemu-devel/20240604185416.GB90471@fedora.redhat.com/
> > > > >
> > > > > It would be great if this could be standardised between QEMU and crosvm
> > > > > (and therefore have a clearer path toward being implemented in other VMMs)!
> > > >
> > > > Setting aside vhost-user for a moment, the DAX example given by Stefan
> > > > won't work in crosvm today.
> > > >
> > > > Is universal access to virtio shared memory regions actually mandated
> > > > by the virtio spec? Copying from virtiofs DAX to virtiofs sharing
> > > > seems reasonable enough, but what about virtio-pmem to virtio-blk?
> > > > What about screenshotting a framebuffer in virtio-gpu shared memory to
> > > > virtio-scsi? I guess with some plumbing in the VMM, it's solvable in a
> > > > virtualized environment. But what about when you have real hardware
> > > > that speaks virtio involved? That's outside my wheelhouse, but it
> > > > doesn't seem like that would be easy to solve.
> > >
> > > Yes, it can work for physical devices if allowed by host configuration.
> > > E.g. VFIO supports that I think. Don't think VDPA does.
> >
> > I'm sure it can work, but that sounds more like a SHOULD (MAY?),
> > rather than a MUST.
> >
> > > > For what it's worth, my interpretation of the target scenario:
> > > >
> > > > > Other backends don't see these mappings. If the guest submits a vring
> > > > > descriptor referencing a mapping to another backend, then that backend
> > > > > won't be able to access this memory
> > > >
> > > > is that it's omitting how the implementation is reconciled with
> > > > section 2.10.1 of v1.3 of the virtio spec, which states that:
> > > >
> > > > > References into shared memory regions are represented as offsets from
> > > > > the beginning of the region instead of absolute memory addresses. Offsets
> > > > > are used both for references between structures stored within shared
> > > > > memory and for requests placed in virtqueues that refer to shared memory.
> > > >
> > > > My interpretation of that statement is that putting raw guest physical
> > > > addresses corresponding to virtio shared memory regions into a vring
> > > > is a driver spec violation.
> > > >
> > > > -David
> > >
> > > This really applies within device I think. Should be clarified ...
> >
> > You mean that a virtio device can use absolute memory addresses for
> > other devices' shared memory regions, but it can't use absolute memory
> > addresses for its own shared memory regions? That's a rather strange
> > requirement. Or is the statement simply giving an addressing strategy
> > that device type specifications are free to ignore?
>
> My recollection of the intent behind the quoted section is:
>
> 1. Structures in shared memory that point to shared memory must use
> relative offsets instead of absolute physical addresses.
> 2. Virtqueue requests that refer to shared memory (e.g. map this page
> from virtiofs file to this location in shared memory) must use
> relative offsets instead of absolute physical addresses.
>
> In other words, shared memory must be relocatable. Don't assume Shared
> Memory Regions have an absolute guest physical address. This makes
> device implementations independent of the guest physical memory layout
> and might also help when Shared Memory Regions are exposed to guest
> user-space where the guest physical memory layout isn't known.
Doesn't this discussion contradict the necessity of point 1? If I'm
understanding things correctly, it is valid for virtio device A to
refer to a structure in virtio device B's shared memory region by
absolute guest physical address. At least there is nothing in the spec
about resolving shmid values among different virtio devices, so
absolute guest physical addresses are the only way this sharing can be
done. And if it's valid for a pointer to a structure in a shared
memory region to exist, it's not clear to me why you can't have
pointers within a shared memory region.
It definitely makes sense that setting up a mapping should be done
with offsets. But unless a shared memory region can be dynamically
reallocated at runtime, it doesn't seem necessary to ban pointers
within a shared memory region.
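To make the two styles concrete, here is a rough sketch of a structure
stored inside a VIRTIO Shared Memory Region; the type and field names
are invented for illustration and do not come from the spec or from
this series:

    #include <stdint.h>

    /* Offset-based reference, as described in section 2.10.1: valid
     * wherever the region happens to be mapped. */
    struct entry_by_offset {
        uint64_t next_offset;   /* offset of the next entry from the
                                   start of the same region */
    };

    /* Address-based reference, the style being questioned here: it
     * bakes a guest physical address into the region's contents. */
    struct entry_by_gpa {
        uint64_t next_gpa;      /* absolute guest physical address of
                                   the next entry */
    };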
-David
* Re: [RFC PATCH v2 0/5] vhost-user: Add SHMEM_MAP/UNMAP requests
2024-09-06 4:18 ` David Stevens
@ 2024-09-06 13:00 ` Stefan Hajnoczi
0 siblings, 0 replies; 36+ messages in thread
From: Stefan Hajnoczi @ 2024-09-06 13:00 UTC (permalink / raw)
To: David Stevens
Cc: Stefan Hajnoczi, Michael S. Tsirkin, Alyssa Ross, Albert Esteve,
qemu-devel, jasowang, david, slp, Alex Bennée
On Fri, 6 Sept 2024 at 00:19, David Stevens <stevensd@chromium.org> wrote:
>
> [...]
>
> Doesn't this discussion contradict the necessity of point 1? If I'm
> understanding things correctly, it is valid for virtio device A to
> refer to a structure in virtio device B's shared memory region by
> absolute guest physical address. At least there is nothing in the spec
> about resolving shmid values among different virtio devices, so
> absolute guest physical addresses are the only way this sharing can be
> done. And if it's valid for a pointer to a structure in a shared
> memory region to exist, it's not clear to me why you can't have
> pointers within a shared memory region.
The reason is that VIRTIO has a layered design where the transport and
vring layout deal with bus addresses but device types generally do not
(with specific exceptions such as memory ballooning).
A device's virtqueue requests do not contain addresses (e.g. struct
virtio_net_hdr). The virtqueue interface hides the details of memory
organization and access. In theory a transport could be implemented
over a medium that doesn't even offer shared memory (I think people
have played with remote VIRTIO over TCP) and this is possible because
device types don't know how virtqueue elements are represented.
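For reference, a simplified sketch of that header as laid out by the
VIRTIO spec (and Linux's include/uapi/linux/virtio_net.h): every field
is a flag, length or offset relative to the packet, and none of them is
a bus or guest physical address; the packet buffers themselves are
described by the vring descriptors, which the transport owns:

    #include <stdint.h>

    struct virtio_net_hdr {
        uint8_t  flags;
        uint8_t  gso_type;
        uint16_t hdr_len;      /* length of the packet headers */
        uint16_t gso_size;     /* GSO segment size */
        uint16_t csum_start;   /* offset into the packet where
                                  checksumming starts */
        uint16_t csum_offset;  /* offset after csum_start where the
                                  checksum is stored */
        uint16_t num_buffers;  /* only with VIRTIO_NET_F_MRG_RXBUF /
                                  VIRTIO 1.0 */
    };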
This same design constraint extends to VIRTIO Shared Memory Regions
because a VIRTIO Shared Memory Region's contents are defined by the
device type, just like virtqueue requests. I mentioned that using
offsets avoids address translation in device type implementations and
also makes it easy to expose VIRTIO Shared Memory Regions to guest
userspace.
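The virtio-fs DAX window is a concrete example: the FUSE_SETUPMAPPING
request that asks the device to map part of a file into the Shared
Memory Region identifies the destination purely by an offset into that
region. Roughly, per include/uapi/linux/fuse.h:

    #include <stdint.h>

    struct fuse_setupmapping_in {
        uint64_t fh;       /* open file handle */
        uint64_t foffset;  /* offset into the file */
        uint64_t len;      /* length of the mapping */
        uint64_t flags;    /* FUSE_SETUPMAPPING_FLAG_* */
        uint64_t moffset;  /* offset into the DAX window, not a guest
                              physical address */
    };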
(Similarly, putting addresses into the VIRTIO Configuration Space is
also problematic because it exposes details of memory to the device
type. They should be hidden by the VIRTIO transport.)
That explains the intention within the VIRTIO world. The question you
raised was why you're allowed to then pass the address of a VIRTIO
Shared Memory Region to another device instead of passing a <shmid,
offset> pair. The answer is that DMA is beyond the scope of the
VIRTIO spec. If the architecture allows you to expose a buffer that
happens to be located in a VIRTIO Shared Memory Region to another
device, then it's possible to pass that address. The other device may
not even be a VIRTIO device. It just performs a DMA transaction to
read/write that memory. This is happening at another layer and it's a
valid thing to do.
So the answer is that in terms of designing VIRTIO device types,
VIRTIO Shared Memory Region structure layouts or virtqueue request
structs referring to VIRTIO Shared Memory Regions must not use
addresses. But you may be able to pass the address of a VIRTIO Shared
Memory Region to another device for DMA. They don't conflict because
they are at different levels.
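A minimal sketch of those two levels, with made-up names (shm_ref and
region_gpa_base are not from the spec or from this series):

    #include <stdint.h>

    /* Device-type level: requests and in-region structures name shared
     * memory as (shmid, offset), never as an address. */
    struct shm_ref {
        uint32_t shmid;
        uint64_t offset;
    };

    /* Bus/DMA level: a guest driver that knows where each region is
     * currently mapped may compute an address and hand it to some
     * other device for DMA. That happens outside the VIRTIO device
     * model. */
    static uint64_t shm_ref_to_gpa(const uint64_t *region_gpa_base,
                                   struct shm_ref ref)
    {
        return region_gpa_base[ref.shmid] + ref.offset;
    }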
> It definitely makes sense that setting up a mapping should be done
> with offsets. But unless a shared memory region can be dynamically
> reallocated at runtime, then it doesn't seem necessary to ban pointers
> within a shared memory region.
On a PCI device the BAR containing the VIRTIO Shared Memory Region can
be remapped at runtime, so the shared memory region can move.
The reason is the same as why device types don't use addresses
inside virtqueue request structures: the details of memory are hidden
by the VIRTIO transport and the device type doesn't deal in addresses.
Also, using addresses makes mmapping a VIRTIO Shared Memory Region
difficult in guest userspace, because some kind of address translation
mechanism (e.g. an IOMMU) becomes necessary so that the userspace
application (which doesn't know about physical memory addresses) and
the device implementation can refer to the same memory. Just using
offsets avoids this problem.
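A short sketch of the userspace case, assuming the region is exposed
to the application as an mmappable file descriptor (how that fd is
obtained, e.g. a file on a DAX-enabled virtiofs mount, is not
specified here):

    #include <stdint.h>
    #include <stddef.h>
    #include <sys/mman.h>

    /* Map (part of) a VIRTIO Shared Memory Region into the process. */
    static void *map_region(int fd, size_t len)
    {
        return mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
                    fd, 0);
    }

    /* Follow an offset-based reference without ever knowing the
     * region's guest physical address. */
    static void *at_offset(void *base, uint64_t offset)
    {
        return (char *)base + offset;
    }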
Stefan