qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Albert Esteve <aesteve@redhat.com>
To: Stefano Garzarella <sgarzare@redhat.com>
Cc: qemu-devel@nongnu.org, "Alex Bennée" <alex.bennee@linaro.org>,
	hi@alyssa.is, stefanha@redhat.com, david@redhat.com,
	jasowang@redhat.com, dbassey@redhat.com, stevensd@chromium.org,
	"Laurent Vivier" <lvivier@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Fabiano Rosas" <farosas@suse.de>,
	slp@redhat.com, manos.pitsidianakis@linaro.org
Subject: Re: [PATCH v10 1/7] vhost-user: Add VirtIO Shared Memory map request
Date: Fri, 17 Oct 2025 14:35:44 +0200	[thread overview]
Message-ID: <CADSE00LLz+hc1DN_x4zHXUSJ30Rf1Gy_WWQmMgMRKE6yOkY8xw@mail.gmail.com> (raw)
In-Reply-To: <5eyt46m7aausn2b26cgtjxhqwt7f6iia3wj7c2kkjaxjjic64p@dyhbrjq32djr>

On Fri, Oct 17, 2025 at 2:13 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> On Fri, Oct 17, 2025 at 01:24:52PM +0200, Albert Esteve wrote:
> >On Fri, Oct 17, 2025 at 11:23 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
> >>
> >> On Thu, Oct 16, 2025 at 04:38:21PM +0200, Albert Esteve wrote:
> >> >Add SHMEM_MAP/UNMAP requests to vhost-user for dynamic management of
> >> >VIRTIO Shared Memory mappings.
> >> >
> >> >This implementation introduces VirtioSharedMemoryMapping as a unified
> >> >QOM object that manages both the mapping metadata and MemoryRegion
> >> >lifecycle. This object provides reference-counted lifecycle management
> >> >with automatic cleanup of file descriptors and memory regions
> >> >through QOM finalization.
> >> >
> >> >This request allows backends to dynamically map file descriptors into a
> >> >VIRTIO Shared Memory Region identified by their shmid. Maps are created
> >> >using memory_region_init_ram_from_fd() with configurable read/write
> >> >permissions, and the resulting MemoryRegions are added as subregions to
> >> >the shmem container region. The mapped memory is then advertised to the
> >> >guest VIRTIO drivers as a base address plus offset for reading and
> >> >writting according to the requested mmap flags.
> >> >
> >> >The backend can unmap memory ranges within a given VIRTIO Shared Memory
> >> >Region to free resources. Upon receiving this message, the frontend
> >> >removes the MemoryRegion as a subregion and automatically unreferences
> >> >the VirtioSharedMemoryMapping object, triggering cleanup if no other
> >> >references exist.
> >> >
> >> >Error handling has been improved to ensure consistent behavior across
> >> >handlers that manage their own vhost_user_send_resp() calls. Since
> >> >these handlers clear the VHOST_USER_NEED_REPLY_MASK flag, explicit
> >> >error checking ensures proper connection closure on failures,
> >> >maintaining the expected error flow.
> >> >
> >> >Note the memory region commit for these operations needs to be delayed
> >> >until after we reply to the backend to avoid deadlocks. Otherwise,
> >> >the MemoryListener would send a VHOST_USER_SET_MEM_TABLE message
> >> >before the reply.
> >> >
> >> >Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> >> >Signed-off-by: Albert Esteve <aesteve@redhat.com>
> >> >---
> >> > hw/virtio/vhost-user.c                    | 267 ++++++++++++++++++++++
> >> > hw/virtio/virtio.c                        | 199 ++++++++++++++++
> >> > include/hw/virtio/virtio.h                | 135 +++++++++++
> >> > subprojects/libvhost-user/libvhost-user.c |  70 ++++++
> >> > subprojects/libvhost-user/libvhost-user.h |  54 +++++
> >> > 5 files changed, 725 insertions(+)
> >> >
> >> >diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> >> >index 36c9c2e04d..890be55937 100644
> >> >--- a/hw/virtio/vhost-user.c
> >> >+++ b/hw/virtio/vhost-user.c
> >> >@@ -104,6 +104,7 @@ typedef enum VhostUserRequest {
> >> >     VHOST_USER_GET_SHARED_OBJECT = 41,
> >> >     VHOST_USER_SET_DEVICE_STATE_FD = 42,
> >> >     VHOST_USER_CHECK_DEVICE_STATE = 43,
> >> >+    VHOST_USER_GET_SHMEM_CONFIG = 44,
> >> >     VHOST_USER_MAX
> >> > } VhostUserRequest;
> >> >
> >> >@@ -115,6 +116,8 @@ typedef enum VhostUserBackendRequest {
> >> >     VHOST_USER_BACKEND_SHARED_OBJECT_ADD = 6,
> >> >     VHOST_USER_BACKEND_SHARED_OBJECT_REMOVE = 7,
> >> >     VHOST_USER_BACKEND_SHARED_OBJECT_LOOKUP = 8,
> >> >+    VHOST_USER_BACKEND_SHMEM_MAP = 9,
> >> >+    VHOST_USER_BACKEND_SHMEM_UNMAP = 10,
> >> >     VHOST_USER_BACKEND_MAX
> >> > }  VhostUserBackendRequest;
> >> >
> >> >@@ -136,6 +139,12 @@ typedef struct VhostUserMemRegMsg {
> >> >     VhostUserMemoryRegion region;
> >> > } VhostUserMemRegMsg;
> >> >
> >> >+typedef struct VhostUserShMemConfig {
> >> >+    uint32_t nregions;
> >> >+    uint32_t padding;
> >> >+    uint64_t memory_sizes[VIRTIO_MAX_SHMEM_REGIONS];
> >> >+} VhostUserShMemConfig;
> >> >+
> >> > typedef struct VhostUserLog {
> >> >     uint64_t mmap_size;
> >> >     uint64_t mmap_offset;
> >> >@@ -192,6 +201,23 @@ typedef struct VhostUserShared {
> >> >     unsigned char uuid[16];
> >> > } VhostUserShared;
> >> >
> >> >+/* For the flags field of VhostUserMMap */
> >> >+#define VHOST_USER_FLAG_MAP_RW (1u << 0)
> >> >+
> >> >+typedef struct {
> >> >+    /* VIRTIO Shared Memory Region ID */
> >> >+    uint8_t shmid;
> >> >+    uint8_t padding[7];
> >> >+    /* File offset */
> >> >+    uint64_t fd_offset;
> >> >+    /* Offset within the VIRTIO Shared Memory Region */
> >> >+    uint64_t shm_offset;
> >> >+    /* Size of the mapping */
> >> >+    uint64_t len;
> >> >+    /* Flags for the mmap operation, from VHOST_USER_FLAG_MAP_* */
> >> >+    uint64_t flags;
> >> >+} VhostUserMMap;
> >> >+
> >> > typedef struct {
> >> >     VhostUserRequest request;
> >> >
> >> >@@ -224,6 +250,8 @@ typedef union {
> >> >         VhostUserInflight inflight;
> >> >         VhostUserShared object;
> >> >         VhostUserTransferDeviceState transfer_state;
> >> >+        VhostUserMMap mmap;
> >> >+        VhostUserShMemConfig shmem;
> >> > } VhostUserPayload;
> >> >
> >> > typedef struct VhostUserMsg {
> >> >@@ -1768,6 +1796,196 @@ vhost_user_backend_handle_shared_object_lookup(struct vhost_user *u,
> >> >     return 0;
> >> > }
> >> >
> >> >+/**
> >> >+ * vhost_user_backend_handle_shmem_map() - Handle SHMEM_MAP backend request
> >> >+ * @dev: vhost device
> >> >+ * @ioc: QIOChannel for communication
> >> >+ * @hdr: vhost-user message header
> >> >+ * @payload: message payload containing mapping details
> >> >+ * @fd: file descriptor for the shared memory region
> >> >+ *
> >> >+ * Handles VHOST_USER_BACKEND_SHMEM_MAP requests from the backend. Creates
> >> >+ * a VhostUserShmemObject to manage the shared memory mapping and adds it
> >> >+ * to the appropriate VirtIO shared memory region. The VhostUserShmemObject
> >> >+ * serves as an intermediate parent for the MemoryRegion, ensuring proper
> >> >+ * lifecycle management with reference counting.
> >> >+ *
> >> >+ * Returns: 0 on success, negative errno on failure
> >> >+ */
> >> >+static int
> >> >+vhost_user_backend_handle_shmem_map(struct vhost_dev *dev,
> >> >+                                    QIOChannel *ioc,
> >> >+                                    VhostUserHeader *hdr,
> >> >+                                    VhostUserPayload *payload,
> >> >+                                    int fd)
> >> >+{
> >> >+    VirtioSharedMemory *shmem;
> >> >+    VhostUserMMap *vu_mmap = &payload->mmap;
> >> >+    VirtioSharedMemoryMapping *existing;
> >> >+    Error *local_err = NULL;
> >> >+    int ret = 0;
> >> >+
> >> >+    if (fd < 0) {
> >> >+        error_report("Bad fd for map");
> >> >+        ret = -EBADF;
> >> >+        goto send_reply;
> >> >+    }
> >> >+
> >> >+    if (QSIMPLEQ_EMPTY(&dev->vdev->shmem_list)) {
> >> >+        error_report("Device has no VIRTIO Shared Memory Regions. "
> >> >+                     "Requested ID: %d", vu_mmap->shmid);
> >> >+        ret = -EFAULT;
> >> >+        goto send_reply;
> >> >+    }
> >> >+
> >> >+    shmem = virtio_find_shmem_region(dev->vdev, vu_mmap->shmid);
> >> >+    if (!shmem) {
> >> >+        error_report("VIRTIO Shared Memory Region at "
> >> >+                     "ID %d not found or uninitialized", vu_mmap->shmid);
> >> >+        ret = -EFAULT;
> >> >+        goto send_reply;
> >> >+    }
> >> >+
> >> >+    if ((vu_mmap->shm_offset + vu_mmap->len) < vu_mmap->len ||
> >> >+        (vu_mmap->shm_offset + vu_mmap->len) > shmem->mr.size) {
> >> >+        error_report("Bad offset/len for mmap %" PRIx64 "+%" PRIx64,
> >> >+                     vu_mmap->shm_offset, vu_mmap->len);
> >> >+        ret = -EFAULT;
> >> >+        goto send_reply;
> >> >+    }
> >> >+
> >> >+    QTAILQ_FOREACH(existing, &shmem->mmaps, link) {
> >> >+        if (ranges_overlap(existing->offset, existing->len,
> >> >+                           vu_mmap->shm_offset, vu_mmap->len)) {
> >> >+            error_report("VIRTIO Shared Memory mapping overlap");
> >> >+            ret = -EFAULT;
> >> >+            goto send_reply;
> >> >+        }
> >> >+    }
> >> >+
> >> >+    memory_region_transaction_begin();
> >> >+
> >> >+    /* Create VirtioSharedMemoryMapping object */
> >> >+    VirtioSharedMemoryMapping *mapping = virtio_shared_memory_mapping_new(
> >> >+        vu_mmap->shmid, fd, vu_mmap->fd_offset, vu_mmap->shm_offset,
> >> >+        vu_mmap->len, vu_mmap->flags & VHOST_USER_FLAG_MAP_RW);
> >> >+
> >> >+    if (!mapping) {
> >> >+        ret = -EFAULT;
> >> >+        goto send_reply_commit;
> >> >+    }
> >> >+
> >> >+    /* Add the mapping to the shared memory region */
> >> >+    if (virtio_add_shmem_map(shmem, mapping) != 0) {
> >> >+        error_report("Failed to add shared memory mapping");
> >> >+        object_unref(OBJECT(mapping));
> >> >+        ret = -EFAULT;
> >> >+        goto send_reply_commit;
> >> >+    }
> >> >+
> >> >+send_reply_commit:
> >> >+    /* Send reply and commit after transaction started */
> >> >+    if (hdr->flags & VHOST_USER_NEED_REPLY_MASK) {
> >> >+        payload->u64 = !!ret;
> >> >+        hdr->size = sizeof(payload->u64);
> >> >+        if (!vhost_user_send_resp(ioc, hdr, payload, &local_err)) {
> >> >+            error_report_err(local_err);
> >> >+            memory_region_transaction_commit();
> >> >+            return -EFAULT;
> >> >+        }
> >> >+    }
> >> >+    memory_region_transaction_commit();
> >>
> >> Sorry to be late, I did a quick review, my only doubts is here, maybe it
> >> was already discussed, but why do we commit after responding to the
> >> backend?
> >>
> >> Should we do it first to prevent the backend from “seeing” something
> >> that hasn't been committed yet?
> >
> >There is a race that leads to a deadlock. hw/virtio/vhost.c has a
> >MemoryListener that sends VHOST_USER_SET_MEM_TABLE messages in its
> >.commit() callback. If this happens before the reply, the backend will
> >not process it as it is stuck waiting for the SHMEM reply, and the
> >handler in qemu will not send it as it is waiting for the reply to the
> >SET_MEM_TABLE. So we have to delay the transaction commit to
> >immediately after the reply.
>
> Okay, I see now that you mentioned that in the commit description,
> great, I should have read it more carefully!
> IMO it would be worth adding a comment here, but I definitely won't ask
> you to send a v11 for this! (maybe a followup patch later).
>
> >
> >>
> >> Also, if vhost_user_send_resp() fails, should we call
> >> virtio_del_shmem_map()?
> >
> >If vhost_user_send_resp() fails, the connection with the backend is
> >closed, so the mappings will indeed never be removed unless we reset.
> >
> >Maybe better than removing the single mapping, would be to loop
> >through mappings in the shared memory and clean them all (same we do :
> >
> >```
> >        while (!QTAILQ_EMPTY(&shmem->mmaps)) {
> >            VirtioSharedMemoryMapping *mapping = QTAILQ_FIRST(&shmem->mmaps);
> >            virtio_del_shmem_map(shmem, mapping->offset, mapping->mr->size);
> >        }
> >```
> >
> >But since a backend may utilize more than one shared memory region,
> >and we do not know the mapping between a given backend and its shared
> >memories, whatever we do will be incomplete (?).
>
> I don't know if this is the right place to do this kind of cleanup,
> maybe further up in the stack.
>
>
> >I think the only
> >solution after this happens is to reset (virtio_reset) to remove all
> >mappings from the all shared regions, and re-establish the backend
> >channel (is it possible?). Even if the channel cannot be restablished,
> >I wouldn't bother just removing one mapping, I would assume it needs a
> >reset.
>
> So, in conclusion, we are saying that if we can no longer communicate
> with the backend, there is no point in maintaining a consistent state,
> because we have to reset the device anyway.

I guess what I'm saying after looking at the issue you raised (which
is reasonable and founded) is that the is no good option to ensure we
go back to a consistent state unless we reset.

> Are we already doing this, or should we be doing it?

I don't think we are? What I do not know is if we should. Probably yes.

I feel we should at least set the broken flag to true in case of an error:
dev->vdev->broken = true;

Looking at virtio/virtio.h: `bool broken; /* device in invalid state,
needs reset */`

I can send a separate patch.

>
> BTW, I don't want to stop this series, I just found this error path
> strange.

On the contrary, thanks for taking a look! It is better to clear any
potential issues before merging. Even if it needs a new version.

>
> Thanks,
> Stefano
>



  reply	other threads:[~2025-10-17 12:37 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-16 14:38 [PATCH v10 0/7] vhost-user: Add SHMEM_MAP/UNMAP requests Albert Esteve
2025-10-16 14:38 ` [PATCH v10 1/7] vhost-user: Add VirtIO Shared Memory map request Albert Esteve
2025-10-16 15:18   ` Albert Esteve
2025-10-16 18:31     ` Stefan Hajnoczi
2025-10-17  6:58       ` Albert Esteve
2025-10-17  9:22   ` Stefano Garzarella
2025-10-17  9:29     ` Manos Pitsidianakis
2025-10-17 11:24     ` Albert Esteve
2025-10-17 12:12       ` Stefano Garzarella
2025-10-17 12:35         ` Albert Esteve [this message]
2025-10-20 13:50   ` David Hildenbrand
2025-10-20 14:06     ` Albert Esteve
2025-10-16 14:38 ` [PATCH v10 2/7] vhost_user.rst: Align VhostUserMsg excerpt members Albert Esteve
2025-10-16 14:38 ` [PATCH v10 3/7] vhost_user.rst: Add SHMEM_MAP/_UNMAP to spec Albert Esteve
2025-10-20 13:27   ` Stefan Hajnoczi
2025-10-20 13:49     ` Albert Esteve
2025-10-16 14:38 ` [PATCH v10 4/7] vhost_user: Add frontend get_shmem_config command Albert Esteve
2025-10-16 14:38 ` [PATCH v10 5/7] vhost_user.rst: Add GET_SHMEM_CONFIG message Albert Esteve
2025-10-16 14:38 ` [PATCH v10 6/7] qmp: add shmem feature map Albert Esteve
2025-10-16 14:38 ` [PATCH v10 7/7] vhost-user-device: Add shared memory BAR Albert Esteve
2025-10-16 15:05 ` [PATCH v10 0/7] vhost-user: Add SHMEM_MAP/UNMAP requests Stefan Hajnoczi
2025-10-17 12:42   ` Stefano Garzarella
2025-10-17 13:52   ` Michael S. Tsirkin
2025-10-20 13:51   ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CADSE00LLz+hc1DN_x4zHXUSJ30Rf1Gy_WWQmMgMRKE6yOkY8xw@mail.gmail.com \
    --to=aesteve@redhat.com \
    --cc=alex.bennee@linaro.org \
    --cc=david@redhat.com \
    --cc=dbassey@redhat.com \
    --cc=farosas@suse.de \
    --cc=hi@alyssa.is \
    --cc=jasowang@redhat.com \
    --cc=lvivier@redhat.com \
    --cc=manos.pitsidianakis@linaro.org \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=sgarzare@redhat.com \
    --cc=slp@redhat.com \
    --cc=stefanha@redhat.com \
    --cc=stevensd@chromium.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).