* [PATCH v2 0/2] Postcopy migration and vhost-user errors
@ 2024-08-28 10:09 Prasad Pandit
  2024-08-28 10:09 ` [PATCH v2 1/2] vhost: fail device start if iotlb update fails Prasad Pandit
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Prasad Pandit @ 2024-08-28 10:09 UTC (permalink / raw)
  To: qemu-devel; +Cc: mst, farosas, jasowang, mcoqueli, peterx, Prasad Pandit

From: Prasad Pandit <pjp@fedoraproject.org>

Hello,

* virsh(1) offers multiple options to initiate Postcopy migration:

    1) virsh migrate --postcopy --postcopy-after-precopy
    2) virsh migrate --postcopy + virsh migrate-postcopy
    3) virsh migrate --postcopy --timeout <N> --timeout-postcopy

When Postcopy migration is invoked via method (2) or (3) above,
the migrated guest on the destination host sometimes hangs.

* During Postcopy migration, multiple threads are spawned on the destination
host to start the guest and set up devices. One such thread starts the vhost
device via the vhost_dev_start() function, while another, called fault_thread,
handles page faults in user space using the kernel's userfaultfd(2) facility.

* When fault_thread exits upon completion of Postcopy migration, it sends a
'postcopy_end' message to the vhost-user device. But sometimes the
'postcopy_end' message is sent while the vhost device is still being set up
via vhost_dev_start().

     Thread-1                                  Thread-2

vhost_dev_start                        postcopy_ram_incoming_cleanup
 vhost_device_iotlb_miss                postcopy_notify
  vhost_backend_update_device_iotlb      vhost_user_postcopy_notifier
   vhost_user_send_device_iotlb_msg       vhost_user_postcopy_end
    process_message_reply                  process_message_reply
     vhost_user_read                        vhost_user_read
      vhost_user_read_header                 vhost_user_read_header
       "Fail to update device iotlb"          "Failed to receive reply to postcopy_end"

This creates confusion when the vhost device receives the 'postcopy_end'
message while it is still trying to update its IOTLB entries.
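
Each vhost-user request-reply exchange in hw/virtio/vhost-user.c essentially
follows this pattern (a simplified sketch, not a literal excerpt from the
source):

    ret = vhost_user_write(dev, &msg, NULL, 0);   /* send the request   */
    if (ret < 0) {
        return ret;
    }
    return process_message_reply(dev, &msg);      /* wait for the reply */

Nothing serializes this cycle across threads, so a second thread may issue
its own request between the write and the read, and each thread can then
read the reply meant for the other.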

This seems to leave the guest in a stranded/hung state, because fault_thread
has exited saying Postcopy migration has ended, while the vhost device is
likely still expecting IOTLB updates. QEMU logs the following errors on the
destination host:
===
...
qemu-kvm: vhost_user_read_header: 700871,700871: Failed to read msg header. Flags 0x0 instead of 0x5.
qemu-kvm: vhost_device_iotlb_miss: 700871,700871: Fail to update device iotlb
qemu-kvm: vhost_user_postcopy_end: 700871,700900: Failed to receive reply to postcopy_end
qemu-kvm: vhost_user_read_header: 700871,700871: Failed to read msg header. Flags 0x0 instead of 0x5.
qemu-kvm: vhost_device_iotlb_miss: 700871,700871: Fail to update device iotlb
qemu-kvm: vhost_user_read_header: 700871,700871: Failed to read msg header. Flags 0x8 instead of 0x5.
qemu-kvm: vhost_device_iotlb_miss: 700871,700871: Fail to update device iotlb
qemu-kvm: vhost_user_read_header: 700871,700871: Failed to read msg header. Flags 0x16 instead of 0x5.
qemu-kvm: vhost_device_iotlb_miss: 700871,700871: Fail to update device iotlb
qemu-kvm: vhost_user_read_header: 700871,700871: Failed to read msg header. Flags 0x0 instead of 0x5.
qemu-kvm: vhost_device_iotlb_miss: 700871,700871: Fail to update device iotlb
===

* The two patches here help to fix/handle these errors.

Thank you.
---
Prasad Pandit (2):
  vhost: fail device start if iotlb update fails
  vhost-user: add a request-reply lock

 hw/virtio/vhost-user.c         | 74 ++++++++++++++++++++++++++++++++++
 hw/virtio/vhost.c              |  6 ++-
 include/hw/virtio/vhost-user.h |  3 ++
 3 files changed, 82 insertions(+), 1 deletion(-)

--
2.46.0




* [PATCH v2 1/2] vhost: fail device start if iotlb update fails
  2024-08-28 10:09 [PATCH v2 0/2] Postcopy migration and vhost-user errors Prasad Pandit
@ 2024-08-28 10:09 ` Prasad Pandit
  2024-08-28 10:09 ` [PATCH v2 2/2] vhost-user: add a request-reply lock Prasad Pandit
  2024-09-10 17:10 ` [PATCH v2 0/2] Postcopy migration and vhost-user errors Michael S. Tsirkin
  2 siblings, 0 replies; 15+ messages in thread
From: Prasad Pandit @ 2024-08-28 10:09 UTC (permalink / raw)
  To: qemu-devel; +Cc: mst, farosas, jasowang, mcoqueli, peterx, Prasad Pandit

From: Prasad Pandit <pjp@fedoraproject.org>

While starting a vhost device, updating IOTLB entries
via 'vhost_device_iotlb_miss' may return an error.

  qemu-kvm: vhost_device_iotlb_miss:
    700871,700871: Fail to update device iotlb

Fail device start when such an error occurs.

Signed-off-by: Prasad Pandit <pjp@fedoraproject.org>
---
 hw/virtio/vhost.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

v2:
 - Needs review/ack

v1:
 - https://lore.kernel.org/qemu-devel/20240808095147.291626-2-ppandit@redhat.com/

v0:
 - https://lore.kernel.org/all/20240711131424.181615-3-ppandit@redhat.com/

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 06fc71746e..a70b7422b5 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -2151,7 +2151,11 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev, bool vrings)
          * vhost-kernel code requires for this.*/
         for (i = 0; i < hdev->nvqs; ++i) {
             struct vhost_virtqueue *vq = hdev->vqs + i;
-            vhost_device_iotlb_miss(hdev, vq->used_phys, true);
+            r = vhost_device_iotlb_miss(hdev, vq->used_phys, true);
+            if (r) {
+                VHOST_OPS_DEBUG(r, "vhost_device_iotlb_miss failed");
+                goto fail_start;
+            }
         }
     }
     vhost_start_config_intr(hdev);
--
2.46.0




* [PATCH v2 2/2] vhost-user: add a request-reply lock
  2024-08-28 10:09 [PATCH v2 0/2] Postcopy migration and vhost-user errors Prasad Pandit
  2024-08-28 10:09 ` [PATCH v2 1/2] vhost: fail device start if iotlb update fails Prasad Pandit
@ 2024-08-28 10:09 ` Prasad Pandit
  2024-08-28 11:15   ` Michael S. Tsirkin
  2024-08-29  7:42   ` Michael S. Tsirkin
  2024-09-10 17:10 ` [PATCH v2 0/2] Postcopy migration and vhost-user errors Michael S. Tsirkin
  2 siblings, 2 replies; 15+ messages in thread
From: Prasad Pandit @ 2024-08-28 10:09 UTC (permalink / raw)
  To: qemu-devel; +Cc: mst, farosas, jasowang, mcoqueli, peterx, Prasad Pandit

From: Prasad Pandit <pjp@fedoraproject.org>

QEMU threads use vhost_user_write/read calls to send
requests to and receive replies from a vhost-user
device. When multiple threads communicate with the
same vhost-user device, they can receive each other's
messages, resulting in an erroneous state.

When fault_thread exits upon completion of Postcopy
migration, it sends a 'postcopy_end' message to the
vhost-user device. But sometimes the 'postcopy_end'
message is sent while the vhost device is still being
set up via vhost_dev_start().

     Thread-1                           Thread-2

 vhost_dev_start                    postcopy_ram_incoming_cleanup
 vhost_device_iotlb_miss            postcopy_notify
 vhost_backend_update_device_iotlb  vhost_user_postcopy_notifier
 vhost_user_send_device_iotlb_msg   vhost_user_postcopy_end
 process_message_reply              process_message_reply
 vhost_user_read                    vhost_user_read
 vhost_user_read_header             vhost_user_read_header
 "Fail to update device iotlb"      "Failed to receive reply to postcopy_end"

This creates confusion when the vhost-user device receives the
'postcopy_end' message while it is still trying to update IOTLB entries.

 vhost_user_read_header:
  700871,700871: Failed to read msg header. Flags 0x0 instead of 0x5.
 vhost_device_iotlb_miss:
  700871,700871: Fail to update device iotlb
 vhost_user_postcopy_end:
  700871,700900: Failed to receive reply to postcopy_end
 vhost_user_read_header:
  700871,700871: Failed to read msg header. Flags 0x0 instead of 0x5.

Here the fault thread ends the Postcopy migration while
another thread is still starting the vhost-user device.

Add a mutex lock that is held for one request-reply cycle,
to avoid such a race condition.
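
With the lock in place, each request-reply cycle
becomes (abridged from the diff below):

    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);

    ret = vhost_user_write(dev, &msg, NULL, 0);
    ...
    ret = process_message_reply(dev, &msg);

so one thread's reply can no longer be consumed by
another thread's request-reply cycle on the same
VhostUserState.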

Fixes: 46343570c06e ("vhost+postcopy: Wire up POSTCOPY_END notify")
Suggested-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Prasad Pandit <pjp@fedoraproject.org>
---
 hw/virtio/vhost-user.c         | 74 ++++++++++++++++++++++++++++++++++
 include/hw/virtio/vhost-user.h |  3 ++
 2 files changed, 77 insertions(+)

v2:
 - Place QEMU_LOCK_GUARD near the vhost_user_write() calls; holding
   the lock for longer fails some tests during rpmbuild(8).
 - rpmbuild(8) fails for some SRPMs, not all. The RHEL-9 SRPM builds
   with this patch, whereas the Fedora SRPM does not.
 - The host OS also seems to affect rpmbuild(8). Some SRPMs build well
   on RHEL-9, but not on a Fedora-40 machine.
 - koji builds are successful with this patch
   https://koji.fedoraproject.org/koji/taskinfo?taskID=122254011
   https://koji.fedoraproject.org/koji/taskinfo?taskID=122252369

v1: Use QEMU_LOCK_GUARD(), rename lock variable
 - https://lore.kernel.org/qemu-devel/20240808095147.291626-3-ppandit@redhat.com/

v0:
 - https://lore.kernel.org/all/Zo_9OlX0pV0paFj7@x1n/
 - https://lore.kernel.org/all/20240720153808-mutt-send-email-mst@kernel.org/

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 00561daa06..7b030ae2cd 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -24,6 +24,7 @@
 #include "qemu/main-loop.h"
 #include "qemu/uuid.h"
 #include "qemu/sockets.h"
+#include "qemu/lockable.h"
 #include "sysemu/runstate.h"
 #include "sysemu/cryptodev.h"
 #include "migration/postcopy-ram.h"
@@ -446,6 +447,10 @@ static int vhost_user_set_log_base(struct vhost_dev *dev, uint64_t base,
         .hdr.size = sizeof(msg.payload.log),
     };

+    struct vhost_user *u = dev->opaque;
+    struct VhostUserState *us = u->user;
+    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
     /* Send only once with first queue pair */
     if (dev->vq_index != 0) {
         return 0;
@@ -664,6 +669,7 @@ static int send_remove_regions(struct vhost_dev *dev,
                                bool reply_supported)
 {
     struct vhost_user *u = dev->opaque;
+    struct VhostUserState *us = u->user;
     struct vhost_memory_region *shadow_reg;
     int i, fd, shadow_reg_idx, ret;
     ram_addr_t offset;
@@ -685,6 +691,8 @@ static int send_remove_regions(struct vhost_dev *dev,
             vhost_user_fill_msg_region(&region_buffer, shadow_reg, 0);
             msg->payload.mem_reg.region = region_buffer;

+            QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
             ret = vhost_user_write(dev, msg, NULL, 0);
             if (ret < 0) {
                 return ret;
@@ -718,6 +726,7 @@ static int send_add_regions(struct vhost_dev *dev,
                             bool reply_supported, bool track_ramblocks)
 {
     struct vhost_user *u = dev->opaque;
+    struct VhostUserState *us = u->user;
     int i, fd, ret, reg_idx, reg_fd_idx;
     struct vhost_memory_region *reg;
     MemoryRegion *mr;
@@ -746,6 +755,8 @@ static int send_add_regions(struct vhost_dev *dev,
             vhost_user_fill_msg_region(&region_buffer, reg, offset);
             msg->payload.mem_reg.region = region_buffer;

+            QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
             ret = vhost_user_write(dev, msg, &fd, 1);
             if (ret < 0) {
                 return ret;
@@ -893,6 +904,7 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev,
                                              bool config_mem_slots)
 {
     struct vhost_user *u = dev->opaque;
+    struct VhostUserState *us = u->user;
     int fds[VHOST_MEMORY_BASELINE_NREGIONS];
     size_t fd_num = 0;
     VhostUserMsg msg_reply;
@@ -926,6 +938,8 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev,
             return ret;
         }

+        QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
         ret = vhost_user_write(dev, &msg, fds, fd_num);
         if (ret < 0) {
             return ret;
@@ -1005,6 +1019,7 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev,
                                     struct vhost_memory *mem)
 {
     struct vhost_user *u = dev->opaque;
+    struct VhostUserState *us = u->user;
     int fds[VHOST_MEMORY_BASELINE_NREGIONS];
     size_t fd_num = 0;
     bool do_postcopy = u->postcopy_listen && u->postcopy_fd.handler;
@@ -1044,6 +1059,8 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev,
             return ret;
         }

+        QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
         ret = vhost_user_write(dev, &msg, fds, fd_num);
         if (ret < 0) {
             return ret;
@@ -1089,6 +1106,10 @@ static int vhost_user_get_u64(struct vhost_dev *dev, int request, uint64_t *u64)
         return 0;
     }

+    struct vhost_user *u = dev->opaque;
+    struct VhostUserState *us = u->user;
+    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
     ret = vhost_user_write(dev, &msg, NULL, 0);
     if (ret < 0) {
         return ret;
@@ -1138,6 +1159,10 @@ static int vhost_user_write_sync(struct vhost_dev *dev, VhostUserMsg *msg,
         }
     }

+/*  struct vhost_user *u = dev->opaque;
+ *  struct VhostUserState *us = u->user;
+ *  QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+ */
     ret = vhost_user_write(dev, msg, NULL, 0);
     if (ret < 0) {
         return ret;
@@ -1277,6 +1302,8 @@ static int vhost_user_get_vring_base(struct vhost_dev *dev,
         .hdr.size = sizeof(msg.payload.state),
     };
     struct vhost_user *u = dev->opaque;
+    struct VhostUserState *us = u->user;
+    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);

     VhostUserHostNotifier *n = fetch_notifier(u->user, ring->index);
     if (n) {
@@ -1669,6 +1696,9 @@ int vhost_user_get_shared_object(struct vhost_dev *dev, unsigned char *uuid,
     };
     memcpy(msg.payload.object.uuid, uuid, sizeof(msg.payload.object.uuid));

+    struct VhostUserState *us = u->user;
+    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
     ret = vhost_user_write(dev, &msg, NULL, 0);
     if (ret < 0) {
         return ret;
@@ -1889,6 +1919,9 @@ static int vhost_setup_backend_channel(struct vhost_dev *dev)
         msg.hdr.flags |= VHOST_USER_NEED_REPLY_MASK;
     }

+    struct VhostUserState *us = u->user;
+    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
     ret = vhost_user_write(dev, &msg, &sv[1], 1);
     if (ret) {
         goto out;
@@ -1993,6 +2026,9 @@ static int vhost_user_postcopy_advise(struct vhost_dev *dev, Error **errp)
         .hdr.flags = VHOST_USER_VERSION,
     };

+    struct VhostUserState *us = u->user;
+    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
     ret = vhost_user_write(dev, &msg, NULL, 0);
     if (ret < 0) {
         error_setg(errp, "Failed to send postcopy_advise to vhost");
@@ -2051,6 +2087,9 @@ static int vhost_user_postcopy_listen(struct vhost_dev *dev, Error **errp)

     trace_vhost_user_postcopy_listen();

+    struct VhostUserState *us = u->user;
+    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
     ret = vhost_user_write(dev, &msg, NULL, 0);
     if (ret < 0) {
         error_setg(errp, "Failed to send postcopy_listen to vhost");
@@ -2080,6 +2119,9 @@ static int vhost_user_postcopy_end(struct vhost_dev *dev, Error **errp)

     trace_vhost_user_postcopy_end_entry();

+    struct VhostUserState *us = u->user;
+    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
     ret = vhost_user_write(dev, &msg, NULL, 0);
     if (ret < 0) {
         error_setg(errp, "Failed to send postcopy_end to vhost");
@@ -2372,6 +2414,10 @@ static int vhost_user_net_set_mtu(struct vhost_dev *dev, uint16_t mtu)
         msg.hdr.flags |= VHOST_USER_NEED_REPLY_MASK;
     }

+    struct vhost_user *u = dev->opaque;
+    struct VhostUserState *us = u->user;
+    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
     ret = vhost_user_write(dev, &msg, NULL, 0);
     if (ret < 0) {
         return ret;
@@ -2396,6 +2442,10 @@ static int vhost_user_send_device_iotlb_msg(struct vhost_dev *dev,
         .payload.iotlb = *imsg,
     };

+    struct vhost_user *u = dev->opaque;
+    struct VhostUserState *us = u->user;
+    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
     ret = vhost_user_write(dev, &msg, NULL, 0);
     if (ret < 0) {
         return ret;
@@ -2428,6 +2478,10 @@ static int vhost_user_get_config(struct vhost_dev *dev, uint8_t *config,

     assert(config_len <= VHOST_USER_MAX_CONFIG_SIZE);

+    struct vhost_user *u = dev->opaque;
+    struct VhostUserState *us = u->user;
+    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
     msg.payload.config.offset = 0;
     msg.payload.config.size = config_len;
     ret = vhost_user_write(dev, &msg, NULL, 0);
@@ -2492,6 +2546,10 @@ static int vhost_user_set_config(struct vhost_dev *dev, const uint8_t *data,
     p = msg.payload.config.region;
     memcpy(p, data, size);

+    struct vhost_user *u = dev->opaque;
+    struct VhostUserState *us = u->user;
+    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
     ret = vhost_user_write(dev, &msg, NULL, 0);
     if (ret < 0) {
         return ret;
@@ -2570,6 +2628,10 @@ static int vhost_user_crypto_create_session(struct vhost_dev *dev,
         }
     }

+    struct vhost_user *u = dev->opaque;
+    struct VhostUserState *us = u->user;
+    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
     msg.payload.session.op_code = backend_info->op_code;
     msg.payload.session.session_id = backend_info->session_id;
     ret = vhost_user_write(dev, &msg, NULL, 0);
@@ -2662,6 +2724,9 @@ static int vhost_user_get_inflight_fd(struct vhost_dev *dev,
         return 0;
     }

+    struct VhostUserState *us = u->user;
+    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
     ret = vhost_user_write(dev, &msg, NULL, 0);
     if (ret < 0) {
         return ret;
@@ -2757,6 +2822,7 @@ bool vhost_user_init(VhostUserState *user, CharBackend *chr, Error **errp)
     user->memory_slots = 0;
     user->notifiers = g_ptr_array_new_full(VIRTIO_QUEUE_MAX / 4,
                                            &vhost_user_state_destroy);
+    qemu_mutex_init(&user->vhost_user_request_reply_lock);
     return true;
 }

@@ -2769,6 +2835,7 @@ void vhost_user_cleanup(VhostUserState *user)
     user->notifiers = (GPtrArray *) g_ptr_array_free(user->notifiers, true);
     memory_region_transaction_commit();
     user->chr = NULL;
+    qemu_mutex_destroy(&user->vhost_user_request_reply_lock);
 }


@@ -2902,6 +2969,9 @@ static int vhost_user_set_device_state_fd(struct vhost_dev *dev,
         return -ENOTSUP;
     }

+    struct VhostUserState *us = vu->user;
+    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
     ret = vhost_user_write(dev, &msg, &fd, 1);
     close(fd);
     if (ret < 0) {
@@ -2965,6 +3035,10 @@ static int vhost_user_check_device_state(struct vhost_dev *dev, Error **errp)
         return -ENOTSUP;
     }

+    struct vhost_user *u = dev->opaque;
+    struct VhostUserState *us = u->user;
+    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
     ret = vhost_user_write(dev, &msg, NULL, 0);
     if (ret < 0) {
         error_setg_errno(errp, -ret,
diff --git a/include/hw/virtio/vhost-user.h b/include/hw/virtio/vhost-user.h
index 324cd8663a..e96f12d449 100644
--- a/include/hw/virtio/vhost-user.h
+++ b/include/hw/virtio/vhost-user.h
@@ -67,6 +67,9 @@ typedef struct VhostUserState {
     GPtrArray *notifiers;
     int memory_slots;
     bool supports_config;
+
+    /* Hold lock for a request-reply cycle */
+    QemuMutex vhost_user_request_reply_lock;
 } VhostUserState;

 /**
--
2.46.0




* Re: [PATCH v2 2/2] vhost-user: add a request-reply lock
  2024-08-28 10:09 ` [PATCH v2 2/2] vhost-user: add a request-reply lock Prasad Pandit
@ 2024-08-28 11:15   ` Michael S. Tsirkin
  2024-08-29  5:39     ` Prasad Pandit
  2024-08-29  7:42   ` Michael S. Tsirkin
  1 sibling, 1 reply; 15+ messages in thread
From: Michael S. Tsirkin @ 2024-08-28 11:15 UTC (permalink / raw)
  To: Prasad Pandit
  Cc: qemu-devel, farosas, jasowang, mcoqueli, peterx, Prasad Pandit

On Wed, Aug 28, 2024 at 03:39:14PM +0530, Prasad Pandit wrote:
[...]
> v2:
>  - Place QEMU_LOCK_GUARD near the vhost_user_write() calls; holding
>    the lock for longer fails some tests during rpmbuild(8).

what do you mean fails rpmbuild? that qemu with this
patch cannot be compiled?

[...]

> +/*  struct vhost_user *u = dev->opaque;
> + *  struct VhostUserState *us = u->user;
> + *  QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
> + */
>      ret = vhost_user_write(dev, msg, NULL, 0);


What is this comment saying?





* Re: [PATCH v2 2/2] vhost-user: add a request-reply lock
  2024-08-28 11:15   ` Michael S. Tsirkin
@ 2024-08-29  5:39     ` Prasad Pandit
  2024-08-29  6:23       ` Michael S. Tsirkin
  0 siblings, 1 reply; 15+ messages in thread
From: Prasad Pandit @ 2024-08-29  5:39 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, farosas, jasowang, mcoqueli, peterx, Prasad Pandit

On Wed, 28 Aug 2024 at 16:45, Michael S. Tsirkin <mst@redhat.com> wrote:
> >  - Place QEMU_LOCK_GUARD near the vhost_user_write() calls; holding
> >    the lock for longer fails some tests during rpmbuild(8).
>
> what do you mean fails rpmbuild? that qemu with this patch cannot be compiled?

* In V1 of this patch, QEMU_LOCK_GUARD was placed near the beginning
of the function. But that caused some unit tests to fail, reporting
TIMEOUT errors. In this V2, QEMU_LOCK_GUARD is placed near the
vhost_user_write() calls, to reduce the time the lock is held.

* Both (V1 & V2) compile well, but fail at the '%check' stage while
running unit tests (on some machines), i.e. the rpm package is not
built. rpmbuild(8) on an F40 machine failed, but a koji scratch build
with the same SRPM worked fine. Those scratch builds are shared above.
The RHEL-9 SRPM built well on a RHEL-9 host, but failed to build on an
F40 machine, reporting a failure at the '%check' stage of rpmbuild(8).
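
* Roughly, the difference in placement between the two versions is (an
illustrative sketch, not the exact diffs):

    /* V1: lock taken at function entry, held across early checks too */
    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
    ...
    ret = vhost_user_write(dev, &msg, NULL, 0);

    /* V2 (this series): lock taken just before the request-reply cycle */
    ...
    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
    ret = vhost_user_write(dev, &msg, NULL, 0);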

Thank you.
---
  - Prasad




* Re: [PATCH v2 2/2] vhost-user: add a request-reply lock
  2024-08-29  5:39     ` Prasad Pandit
@ 2024-08-29  6:23       ` Michael S. Tsirkin
  0 siblings, 0 replies; 15+ messages in thread
From: Michael S. Tsirkin @ 2024-08-29  6:23 UTC (permalink / raw)
  To: Prasad Pandit
  Cc: qemu-devel, farosas, jasowang, mcoqueli, peterx, Prasad Pandit

On Thu, Aug 29, 2024 at 11:09:44AM +0530, Prasad Pandit wrote:
> On Wed, 28 Aug 2024 at 16:45, Michael S. Tsirkin <mst@redhat.com> wrote:
> > >  - Place QEMU_LOCK_GUARD near the vhost_user_write() calls; holding
> > >    the lock for longer fails some tests during rpmbuild(8).
> >
> > what do you mean fails rpmbuild? that qemu with this patch cannot be compiled?
> 
> * In V1 of this patch, QEMU_LOCK_GUARD was placed near the beginning
> of the function. But that caused some unit tests to fail, reporting
> TIMEOUT errors. In this V2, QEMU_LOCK_GUARD is placed near the
> vhost_user_write() calls, to reduce the time the lock is held.
> 
> * Both (V1 & V2) compile well, but fail at the '%check' stage while
> running unit tests (on some machines), i.e. the rpm package is not
> built. rpmbuild(8) on an F40 machine failed, but a koji scratch build
> with the same SRPM worked fine. Those scratch builds are shared above.
> The RHEL-9 SRPM built well on a RHEL-9 host, but failed to build on an
> F40 machine, reporting a failure at the '%check' stage of rpmbuild(8).
> 
> Thank you.
> ---
>   - Prasad

Weird.  Seems to indicate some kind of deadlock?

-- 
MST




* Re: [PATCH v2 2/2] vhost-user: add a request-reply lock
  2024-08-28 10:09 ` [PATCH v2 2/2] vhost-user: add a request-reply lock Prasad Pandit
  2024-08-28 11:15   ` Michael S. Tsirkin
@ 2024-08-29  7:42   ` Michael S. Tsirkin
  2024-08-29  9:15     ` Prasad Pandit
  1 sibling, 1 reply; 15+ messages in thread
From: Michael S. Tsirkin @ 2024-08-29  7:42 UTC (permalink / raw)
  To: Prasad Pandit
  Cc: qemu-devel, farosas, jasowang, mcoqueli, peterx, Prasad Pandit,
	Dr. David Alan Gilbert, Marc-André Lureau

On Wed, Aug 28, 2024 at 03:39:14PM +0530, Prasad Pandit wrote:
> From: Prasad Pandit <pjp@fedoraproject.org>
> 
> QEMU threads use vhost_user_write/read calls to send
> requests to and receive replies from a vhost-user
> device. When multiple threads communicate with the
> same vhost-user device, they can receive each other's
> messages, resulting in an erroneous state.
> 
> When fault_thread exits upon completion of Postcopy
> migration, it sends a 'postcopy_end' message to the
> vhost-user device. But sometimes the 'postcopy_end'
> message is sent while the vhost device is still being
> set up via vhost_dev_start().

So maybe vhost_user_postcopy_end should take the BQL?

[...]
> Fixes: 46343570c06e ("vhost+postcopy: Wire up POSTCOPY_END notify")
> Suggested-by: Peter Xu <peterx@redhat.com>
> Signed-off-by: Prasad Pandit <pjp@fedoraproject.org>


CC Author and reviewer of the offending commit.






* Re: [PATCH v2 2/2] vhost-user: add a request-reply lock
  2024-08-29  7:42   ` Michael S. Tsirkin
@ 2024-08-29  9:15     ` Prasad Pandit
  2024-08-29  9:21       ` Michael S. Tsirkin
  2024-08-29 14:29       ` Peter Xu
  0 siblings, 2 replies; 15+ messages in thread
From: Prasad Pandit @ 2024-08-29  9:15 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, farosas, jasowang, mcoqueli, peterx, Prasad Pandit,
	Dr. David Alan Gilbert, Marc-André Lureau

Hello Michael,

On Thu, 29 Aug 2024 at 13:12, Michael S. Tsirkin <mst@redhat.com> wrote:
> Weird.  Seems to indicate some kind of deadlock?

* Such a deadlock should occur across all environments, I guess; not
sure why it happens selectively. It is strange.

> So maybe vhost_user_postcopy_end should take the BQL?
===
diff --git a/migration/savevm.c b/migration/savevm.c
index e7c1215671..31acda3818 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2050,7 +2050,9 @@ static void *postcopy_ram_listen_thread(void *opaque)
          */
         qemu_event_wait(&mis->main_thread_load_event);
     }
+    bql_lock();
     postcopy_ram_incoming_cleanup(mis);
+    bql_unlock();

     if (load_res < 0) {
         /*
===

* Actually, the BQL patch above was tested and it worked fine. But not
sure if it is an acceptable solution. Another contention was that
taking the BQL could make things more complicated, so a local
vhost-user-specific lock should be better.

...wdyt?
---
  - Prasad




* Re: [PATCH v2 2/2] vhost-user: add a request-reply lock
  2024-08-29  9:15     ` Prasad Pandit
@ 2024-08-29  9:21       ` Michael S. Tsirkin
  2024-08-29 14:29       ` Peter Xu
  1 sibling, 0 replies; 15+ messages in thread
From: Michael S. Tsirkin @ 2024-08-29  9:21 UTC (permalink / raw)
  To: Prasad Pandit
  Cc: qemu-devel, farosas, jasowang, mcoqueli, peterx, Prasad Pandit,
	Dr. David Alan Gilbert, Marc-André Lureau

On Thu, Aug 29, 2024 at 02:45:45PM +0530, Prasad Pandit wrote:
> Hello Michael,
> 
> On Thu, 29 Aug 2024 at 13:12, Michael S. Tsirkin <mst@redhat.com> wrote:
> > Weird.  Seems to indicate some kind of deadlock?
> 
> * Such a deadlock should occur across all environments, I guess; not
> sure why it happens selectively. It is strange.

Some kind of race?

> > So maybe vhost_user_postcopy_end should take the BQL?
> ===
> diff --git a/migration/savevm.c b/migration/savevm.c
> index e7c1215671..31acda3818 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -2050,7 +2050,9 @@ static void *postcopy_ram_listen_thread(void *opaque)
>           */
>          qemu_event_wait(&mis->main_thread_load_event);
>      }
> +    bql_lock();
>      postcopy_ram_incoming_cleanup(mis);
> +    bql_unlock();
> 
>      if (load_res < 0) {
>          /*
> ===
> 
> * Actually, the BQL patch above was tested and it worked fine. But not
> sure if it is an acceptable solution. Another contention was that
> taking the BQL could make things more complicated, so a local
> vhost-user-specific lock should be better.
> 
> ...wdyt?
> ---
>   - Prasad

Keep it simple, is my advice. Not causing regressions is good.

-- 
MST




* Re: [PATCH v2 2/2] vhost-user: add a request-reply lock
  2024-08-29  9:15     ` Prasad Pandit
  2024-08-29  9:21       ` Michael S. Tsirkin
@ 2024-08-29 14:29       ` Peter Xu
  2024-08-29 15:05         ` Michael S. Tsirkin
  1 sibling, 1 reply; 15+ messages in thread
From: Peter Xu @ 2024-08-29 14:29 UTC (permalink / raw)
  To: Prasad Pandit
  Cc: Michael S. Tsirkin, qemu-devel, farosas, jasowang, mcoqueli,
	Prasad Pandit, Dr. David Alan Gilbert, Marc-André Lureau

On Thu, Aug 29, 2024 at 02:45:45PM +0530, Prasad Pandit wrote:
> Hello Michael,
> 
> On Thu, 29 Aug 2024 at 13:12, Michael S. Tsirkin <mst@redhat.com> wrote:
> > Weird.  Seems to indicate some kind of deadlock?
> 
> * Such a deadlock should occur across all environments, I guess; not
> sure why it happens selectively. It is strange.
> 
> > So maybe vhost_user_postcopy_end should take the BQL?
> ===
> diff --git a/migration/savevm.c b/migration/savevm.c
> index e7c1215671..31acda3818 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -2050,7 +2050,9 @@ static void *postcopy_ram_listen_thread(void *opaque)
>           */
>          qemu_event_wait(&mis->main_thread_load_event);
>      }
> +    bql_lock();
>      postcopy_ram_incoming_cleanup(mis);
> +    bql_unlock();
> 
>      if (load_res < 0) {
>          /*
> ===
> 
> * The BQL patch above was actually tested and it worked fine. But I'm
> not sure it is an acceptable solution. Another concern was that taking
> the BQL could make things more complicated, so a local vhost-user
> specific lock might be better.
> 
> ...wdyt?

I think Michael was suggesting taking the BQL in vhost_user_postcopy_end(),
not in postcopy code directly.  I've recently been looking at how to make
precopy load take even less BQL, and even make it a separate thread. The
above is definitely going backwards, as we already discussed internally.

I value that postcopy doesn't need to take the BQL on its own in most
paths, and we shouldn't add an unnecessary BQL requirement even when
vhost-user isn't in use.

Personally I'd still prefer that we look into why a separate mutex won't
work and why it timed out; that could be part of the work for whoever
investigates the whole issue (including the hang later on). Otherwise, from
the migration pov, I'm ok with taking the BQL in the vhost-user hook, but
not in savevm.c.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/2] vhost-user: add a request-reply lock
  2024-08-29 14:29       ` Peter Xu
@ 2024-08-29 15:05         ` Michael S. Tsirkin
  2024-08-29 20:29           ` Peter Xu
  0 siblings, 1 reply; 15+ messages in thread
From: Michael S. Tsirkin @ 2024-08-29 15:05 UTC (permalink / raw)
  To: Peter Xu
  Cc: Prasad Pandit, qemu-devel, farosas, jasowang, mcoqueli,
	Prasad Pandit, Dr. David Alan Gilbert, Marc-André Lureau

On Thu, Aug 29, 2024 at 10:29:24AM -0400, Peter Xu wrote:
> On Thu, Aug 29, 2024 at 02:45:45PM +0530, Prasad Pandit wrote:
> > Hello Michael,
> > 
> > On Thu, 29 Aug 2024 at 13:12, Michael S. Tsirkin <mst@redhat.com> wrote:
> > > Weird.  Seems to indicate some kind of deadlock?
> > 
> > * Such a deadlock should occur across all environments I guess, not
> > sure why it happens selectively. It is strange.
> > 
> > > So maybe vhost_user_postcopy_end should take the BQL?
> > ===
> > diff --git a/migration/savevm.c b/migration/savevm.c
> > index e7c1215671..31acda3818 100644
> > --- a/migration/savevm.c
> > +++ b/migration/savevm.c
> > @@ -2050,7 +2050,9 @@ static void *postcopy_ram_listen_thread(void *opaque)
> >           */
> >          qemu_event_wait(&mis->main_thread_load_event);
> >      }
> > +    bql_lock();
> >      postcopy_ram_incoming_cleanup(mis);
> > +    bql_unlock();
> > 
> >      if (load_res < 0) {
> >          /*
> > ===
> > 
> > * The BQL patch above was actually tested and it worked fine. But I'm
> > not sure it is an acceptable solution. Another concern was that taking
> > the BQL could make things more complicated, so a local vhost-user
> > specific lock might be better.
> > 
> > ...wdyt?
> 
> I think Michael was suggesting taking the BQL in vhost_user_postcopy_end(),
> not in postcopy code directly.

maybe that's better, ok.

>  I've recently been looking at how to make
> precopy load take even less BQL, and even make it a separate thread. The
> above is definitely going backwards, as we already discussed internally.


At the same time, a small bugfix is better: it can be backported.


> I value that postcopy doesn't need to take the BQL on its own in most
> paths, and we shouldn't add an unnecessary BQL requirement even when
> vhost-user isn't in use.
> 
> Personally I'd still prefer that we look into why a separate mutex won't
> work and why it timed out; that could be part of the work for whoever
> investigates the whole issue (including the hang later on). Otherwise, from
> the migration pov, I'm ok with taking the BQL in the vhost-user hook, but
> not in savevm.c.
> 
> Thanks,

ok

> -- 
> Peter Xu



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/2] vhost-user: add a request-reply lock
  2024-08-29 15:05         ` Michael S. Tsirkin
@ 2024-08-29 20:29           ` Peter Xu
  0 siblings, 0 replies; 15+ messages in thread
From: Peter Xu @ 2024-08-29 20:29 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Prasad Pandit, qemu-devel, farosas, jasowang, mcoqueli,
	Prasad Pandit, Dr. David Alan Gilbert, Marc-André Lureau

On Thu, Aug 29, 2024 at 11:05:15AM -0400, Michael S. Tsirkin wrote:
> > Personally I'd still prefer that we look into why a separate mutex won't
> > work and why it timed out; that could be part of the work for whoever
> > investigates the whole issue (including the hang later on). Otherwise, from
> > the migration pov, I'm ok with taking the BQL in the vhost-user hook, but
> > not in savevm.c.
> 
> ok

Just a heads-up comment in case someone keeps looking into the hang
problem: I'm not sure whether the brew build failure on the test case is
related to the hang issue we observed, or whether it is the hang issue
itself - that is, if the failure is a timeout where one QEMU hung.

IOW, whoever cannot reproduce the hang might leverage the mutex patch to
reproduce it, if we want to figure out the last missing piece of the puzzle.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 0/2] Postcopy migration and vhost-user errors
  2024-08-28 10:09 [PATCH v2 0/2] Postcopy migration and vhost-user errors Prasad Pandit
  2024-08-28 10:09 ` [PATCH v2 1/2] vhost: fail device start if iotlb update fails Prasad Pandit
  2024-08-28 10:09 ` [PATCH v2 2/2] vhost-user: add a request-reply lock Prasad Pandit
@ 2024-09-10 17:10 ` Michael S. Tsirkin
  2024-09-11  7:14   ` Prasad Pandit
  2 siblings, 1 reply; 15+ messages in thread
From: Michael S. Tsirkin @ 2024-09-10 17:10 UTC (permalink / raw)
  To: Prasad Pandit
  Cc: qemu-devel, farosas, jasowang, mcoqueli, peterx, Prasad Pandit

On Wed, Aug 28, 2024 at 03:39:12PM +0530, Prasad Pandit wrote:
> From: Prasad Pandit <pjp@fedoraproject.org>
> 
> Hello,
> 
> * virsh(1) offers multiple options to initiate Postcopy migration:
> 
>     1) virsh migrate --postcopy --postcopy-after-precopy
>     2) virsh migrate --postcopy + virsh migrate-postcopy
>     3) virsh migrate --postcopy --timeout <N> --timeout-postcopy
> 
> When Postcopy migration is invoked via method (2) or (3) above,
> the migrated guest on the destination host hangs sometimes.
> 
> * During Postcopy migration, multiple threads are spawned on the destination
> host to start the guest and setup devices. One such thread starts vhost
> device via vhost_dev_start() function and another called fault_thread handles
> page faults in user space using kernel's userfaultfd(2) system.
> 
> * When fault_thread exits upon completion of Postcopy migration, it sends a
> 'postcopy_end' message to the vhost-user device. But sometimes 'postcopy_end'
> message is sent while vhost device is being setup via vhost_dev_start().
> 
>      Thread-1                                  Thread-2
> 
> vhost_dev_start                        postcopy_ram_incoming_cleanup
>  vhost_device_iotlb_miss                postcopy_notify
>   vhost_backend_update_device_iotlb      vhost_user_postcopy_notifier
>    vhost_user_send_device_iotlb_msg       vhost_user_postcopy_end
>     process_message_reply                  process_message_reply
>      vhost_user_read                        vhost_user_read
>       vhost_user_read_header                 vhost_user_read_header
>        "Fail to update device iotlb"          "Failed to receive reply to postcopy_end"
> 
> This creates confusion when vhost device receives 'postcopy_end' message while
> it is still trying to update IOTLB entries.
> 
> This seems to leave the guest in a stranded/hung state because fault_thread
> has exited saying Postcopy migration has ended, but vhost-device is probably
> still expecting updates. QEMU logs following errors on the destination host
> ===
> ...
> qemu-kvm: vhost_user_read_header: 700871,700871: Failed to read msg header. Flags 0x0 instead of 0x5.
> qemu-kvm: vhost_device_iotlb_miss: 700871,700871: Fail to update device iotlb
> qemu-kvm: vhost_user_postcopy_end: 700871,700900: Failed to receive reply to postcopy_end
> qemu-kvm: vhost_user_read_header: 700871,700871: Failed to read msg header. Flags 0x0 instead of 0x5.
> qemu-kvm: vhost_device_iotlb_miss: 700871,700871: Fail to update device iotlb
> qemu-kvm: vhost_user_read_header: 700871,700871: Failed to read msg header. Flags 0x8 instead of 0x5.
> qemu-kvm: vhost_device_iotlb_miss: 700871,700871: Fail to update device iotlb
> qemu-kvm: vhost_user_read_header: 700871,700871: Failed to read msg header. Flags 0x16 instead of 0x5.
> qemu-kvm: vhost_device_iotlb_miss: 700871,700871: Fail to update device iotlb
> qemu-kvm: vhost_user_read_header: 700871,700871: Failed to read msg header. Flags 0x0 instead of 0x5.
> qemu-kvm: vhost_device_iotlb_miss: 700871,700871: Fail to update device iotlb
> ===


So are we going to see a version with BQL?

> * Couple of patches here help to fix/handle these errors.
> 
> Thank you.
> ---
> Prasad Pandit (2):
>   vhost: fail device start if iotlb update fails
>   vhost-user: add a request-reply lock
> 
>  hw/virtio/vhost-user.c         | 74 ++++++++++++++++++++++++++++++++++
>  hw/virtio/vhost.c              |  6 ++-
>  include/hw/virtio/vhost-user.h |  3 ++
>  3 files changed, 82 insertions(+), 1 deletion(-)
> 
> --
> 2.46.0



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 0/2] Postcopy migration and vhost-user errors
  2024-09-10 17:10 ` [PATCH v2 0/2] Postcopy migration and vhost-user errors Michael S. Tsirkin
@ 2024-09-11  7:14   ` Prasad Pandit
  2024-09-11  9:46     ` Michael S. Tsirkin
  0 siblings, 1 reply; 15+ messages in thread
From: Prasad Pandit @ 2024-09-11  7:14 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, farosas, jasowang, mcoqueli, peterx, Prasad Pandit

Hello Michael,

On Tue, 10 Sept 2024 at 22:40, Michael S. Tsirkin <mst@redhat.com> wrote:
> So are we going to see a version with BQL?

===
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index cdf9af4a4b..96ac0ed93b 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -2079,6 +2079,7 @@ static int vhost_user_postcopy_end(struct vhost_dev *dev, Error **errp)

     trace_vhost_user_postcopy_end_entry();

+    BQL_LOCK_GUARD();
     ret = vhost_user_write(dev, &msg, NULL, 0);
     if (ret < 0) {
         error_setg(errp, "Failed to send postcopy_end to vhost");
-- 
2.43.5
===

* We ran the test with the above BQL patch, but it did not help to fix
the race errors. I'm continuing to debug it and will update here soon.
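
* (A quick way to see where each side is stuck - assuming gdb can attach
to the destination QEMU process; the PID here is just the one from the
logs above:)
===
# hypothetical debugging session on the destination host
$ gdb -p 700871
(gdb) info threads              # identify main, vcpu and fault_thread
(gdb) thread apply all bt       # look for threads blocked in recvmsg()
===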

Thank you.
---
  - Prasad



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 0/2] Postcopy migration and vhost-user errors
  2024-09-11  7:14   ` Prasad Pandit
@ 2024-09-11  9:46     ` Michael S. Tsirkin
  0 siblings, 0 replies; 15+ messages in thread
From: Michael S. Tsirkin @ 2024-09-11  9:46 UTC (permalink / raw)
  To: Prasad Pandit
  Cc: qemu-devel, farosas, jasowang, mcoqueli, peterx, Prasad Pandit

On Wed, Sep 11, 2024 at 12:44:59PM +0530, Prasad Pandit wrote:
> Hello Michael,
> 
> On Tue, 10 Sept 2024 at 22:40, Michael S. Tsirkin <mst@redhat.com> wrote:
> > So are we going to see a version with BQL?
> 
> ===
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index cdf9af4a4b..96ac0ed93b 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -2079,6 +2079,7 @@ static int vhost_user_postcopy_end(struct vhost_dev *dev, Error **errp)
> 
>      trace_vhost_user_postcopy_end_entry();
> 
> +    BQL_LOCK_GUARD();
>      ret = vhost_user_write(dev, &msg, NULL, 0);
>      if (ret < 0) {
>          error_setg(errp, "Failed to send postcopy_end to vhost");
> -- 
> 2.43.5
> ===
> 
> * We ran the test with the above BQL patch, but it did not help to fix
> the race errors. I'm continuing to debug it and will update here soon.
> 
> Thank you.
> ---
>   - Prasad


Thanks! I have a suspicion there's a path where we wait for the
migration thread while holding the BQL.
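
(To make that suspicion concrete, a standalone toy in plain pthreads -
not QEMU code, and the thread roles are purely illustrative: thread A
holds the "BQL" while waiting for a reply that only thread B can
produce, and B first needs the "BQL".)
===
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t bql = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t reply_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t reply_cv = PTHREAD_COND_INITIALIZER;
static int reply_ready;

/* Stands in for the migration thread: needs the "BQL" before it can
 * make progress and produce the reply. */
static void *thread_b(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&bql);           /* blocks forever: A holds it */
    pthread_mutex_lock(&reply_mtx);
    reply_ready = 1;                    /* never reached */
    pthread_cond_signal(&reply_cv);
    pthread_mutex_unlock(&reply_mtx);
    pthread_mutex_unlock(&bql);
    return NULL;
}

int main(void)
{
    pthread_t b;

    pthread_mutex_lock(&bql);           /* thread A takes the "BQL" */
    pthread_create(&b, NULL, thread_b, NULL);

    /* Thread A waits for the reply while still holding the "BQL". */
    pthread_mutex_lock(&reply_mtx);
    while (!reply_ready) {
        pthread_cond_wait(&reply_cv, &reply_mtx);
    }
    pthread_mutex_unlock(&reply_mtx);
    pthread_mutex_unlock(&bql);

    pthread_join(b, NULL);
    printf("never reached\n");          /* the two threads deadlock */
    return 0;
}
===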

-- 
MST



^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2024-09-11  9:46 UTC | newest]

Thread overview: 15+ messages
2024-08-28 10:09 [PATCH v2 0/2] Postcopy migration and vhost-user errors Prasad Pandit
2024-08-28 10:09 ` [PATCH v2 1/2] vhost: fail device start if iotlb update fails Prasad Pandit
2024-08-28 10:09 ` [PATCH v2 2/2] vhost-user: add a request-reply lock Prasad Pandit
2024-08-28 11:15   ` Michael S. Tsirkin
2024-08-29  5:39     ` Prasad Pandit
2024-08-29  6:23       ` Michael S. Tsirkin
2024-08-29  7:42   ` Michael S. Tsirkin
2024-08-29  9:15     ` Prasad Pandit
2024-08-29  9:21       ` Michael S. Tsirkin
2024-08-29 14:29       ` Peter Xu
2024-08-29 15:05         ` Michael S. Tsirkin
2024-08-29 20:29           ` Peter Xu
2024-09-10 17:10 ` [PATCH v2 0/2] Postcopy migration and vhost-user errors Michael S. Tsirkin
2024-09-11  7:14   ` Prasad Pandit
2024-09-11  9:46     ` Michael S. Tsirkin
