* [PATCH v2 0/2] Postcopy migration and vhost-user errors
  From: Prasad Pandit <pjp@fedoraproject.org>
  Date: 2024-08-28 10:09 UTC
  To: qemu-devel
  Cc: mst, farosas, jasowang, mcoqueli, peterx, Prasad Pandit

Hello,

* virsh(1) offers multiple options to initiate Postcopy migration:

    1) virsh migrate --postcopy --postcopy-after-precopy
    2) virsh migrate --postcopy + virsh migrate-postcopy
    3) virsh migrate --postcopy --timeout <N> --timeout-postcopy

  When Postcopy migration is invoked via method (2) or (3) above, the
  migrated guest on the destination host sometimes hangs.

* During Postcopy migration, multiple threads are spawned on the
  destination host to start the guest and set up devices. One such
  thread starts the vhost device via the vhost_dev_start() function,
  and another, called fault_thread, handles page faults in user space
  using the kernel's userfaultfd(2) facility.

* When fault_thread exits upon completion of Postcopy migration, it
  sends a 'postcopy_end' message to the vhost-user device. But
  sometimes the 'postcopy_end' message is sent while the vhost device
  is still being set up via vhost_dev_start().

    Thread-1                                Thread-2

    vhost_dev_start                         postcopy_ram_incoming_cleanup
     vhost_device_iotlb_miss                 postcopy_notify
      vhost_backend_update_device_iotlb       vhost_user_postcopy_notifier
       vhost_user_send_device_iotlb_msg        vhost_user_postcopy_end
        process_message_reply                   process_message_reply
         vhost_user_read                          vhost_user_read
          vhost_user_read_header                   vhost_user_read_header
    "Fail to update device iotlb"           "Failed to receive reply to postcopy_end"

  This creates confusion when the vhost device receives the
  'postcopy_end' message while it is still trying to update IOTLB
  entries.

  This seems to leave the guest in a stranded/hung state, because
  fault_thread has exited saying Postcopy migration has ended, but the
  vhost device is probably still expecting updates. QEMU logs the
  following errors on the destination host:
  ===
  ...
  qemu-kvm: vhost_user_read_header: 700871,700871: Failed to read msg header. Flags 0x0 instead of 0x5.
  qemu-kvm: vhost_device_iotlb_miss: 700871,700871: Fail to update device iotlb
  qemu-kvm: vhost_user_postcopy_end: 700871,700900: Failed to receive reply to postcopy_end
  qemu-kvm: vhost_user_read_header: 700871,700871: Failed to read msg header. Flags 0x0 instead of 0x5.
  qemu-kvm: vhost_device_iotlb_miss: 700871,700871: Fail to update device iotlb
  qemu-kvm: vhost_user_read_header: 700871,700871: Failed to read msg header. Flags 0x8 instead of 0x5.
  qemu-kvm: vhost_device_iotlb_miss: 700871,700871: Fail to update device iotlb
  qemu-kvm: vhost_user_read_header: 700871,700871: Failed to read msg header. Flags 0x16 instead of 0x5.
  qemu-kvm: vhost_device_iotlb_miss: 700871,700871: Fail to update device iotlb
  qemu-kvm: vhost_user_read_header: 700871,700871: Failed to read msg header. Flags 0x0 instead of 0x5.
  qemu-kvm: vhost_device_iotlb_miss: 700871,700871: Fail to update device iotlb
  ===

* The two patches here help to fix/handle these errors.

Thank you.
---
Prasad Pandit (2):
  vhost: fail device start if iotlb update fails
  vhost-user: add a request-reply lock

 hw/virtio/vhost-user.c         | 74 ++++++++++++++++++++++++++++++++++
 hw/virtio/vhost.c              |  6 ++-
 include/hw/virtio/vhost-user.h |  3 ++
 3 files changed, 82 insertions(+), 1 deletion(-)

--
2.46.0
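The failure mode the cover letter describes can be reproduced outside QEMU: two threads doing unlocked request-reply cycles on one shared channel can each consume the other's reply. Below is a minimal, standalone C sketch of that race; it is not QEMU code, all names are invented, and SOCK_SEQPACKET is used only to keep message boundaries simple for the demo (vhost-user itself runs over a stream socket). Run it a few times and a client will occasionally report reading the other client's reply.
===
/*
 * Standalone sketch (not QEMU code; names invented) of two threads
 * sharing one request-reply channel without a lock.
 * Build: gcc -pthread demo.c
 */
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static int chan[2];                      /* [0]: clients, [1]: backend */

static void *backend(void *arg)
{
    char req[32];
    for (int i = 0; i < 2; i++) {        /* echo each request back as its reply */
        ssize_t n = read(chan[1], req, sizeof(req));
        if (n > 0) {
            write(chan[1], req, n);
        }
    }
    return NULL;
}

static void *client(void *arg)           /* one unlocked request-reply cycle */
{
    const char *tag = arg;
    char reply[32] = "";
    write(chan[0], tag, strlen(tag) + 1);
    read(chan[0], reply, sizeof(reply)); /* may be the *other* thread's reply */
    if (strcmp(tag, reply) != 0) {
        printf("'%s' consumed the reply meant for '%s'\n", tag, reply);
    }
    return NULL;
}

int main(void)
{
    pthread_t b, c1, c2;
    socketpair(AF_UNIX, SOCK_SEQPACKET, 0, chan);
    pthread_create(&b, NULL, backend, NULL);
    pthread_create(&c1, NULL, client, "iotlb_update");
    pthread_create(&c2, NULL, client, "postcopy_end");
    pthread_join(c1, NULL);
    pthread_join(c2, NULL);
    pthread_join(b, NULL);
    return 0;
}
===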
* [PATCH v2 1/2] vhost: fail device start if iotlb update fails
  From: Prasad Pandit <pjp@fedoraproject.org>
  Date: 2024-08-28 10:09 UTC
  To: qemu-devel
  Cc: mst, farosas, jasowang, mcoqueli, peterx, Prasad Pandit

While starting a vhost device, updating iotlb entries via
'vhost_device_iotlb_miss' may return an error.

  qemu-kvm: vhost_device_iotlb_miss:
    700871,700871: Fail to update device iotlb

Fail device start when such an error occurs.

Signed-off-by: Prasad Pandit <pjp@fedoraproject.org>
---
 hw/virtio/vhost.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

v2:
 - Needs review/ack

v1:
 - https://lore.kernel.org/qemu-devel/20240808095147.291626-2-ppandit@redhat.com/

v0:
 - https://lore.kernel.org/all/20240711131424.181615-3-ppandit@redhat.com/

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 06fc71746e..a70b7422b5 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -2151,7 +2151,11 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev, bool vrings)
          * vhost-kernel code requires for this.*/
         for (i = 0; i < hdev->nvqs; ++i) {
             struct vhost_virtqueue *vq = hdev->vqs + i;
-            vhost_device_iotlb_miss(hdev, vq->used_phys, true);
+            r = vhost_device_iotlb_miss(hdev, vq->used_phys, true);
+            if (r) {
+                VHOST_OPS_DEBUG(r, "vhost_device_iotlb_miss failed");
+                goto fail_start;
+            }
         }
     }
     vhost_start_config_intr(hdev);
--
2.46.0
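The fix follows the usual QEMU/kernel error-unwind idiom: check each step, and on the first failure jump to a label that tears down whatever was already set up instead of leaving the device half-configured. A minimal, standalone sketch of the idiom (all names invented, not the actual vhost code):
===
/* Standalone sketch of the check-and-unwind idiom (invented names). */
#include <stdio.h>

static int setup_queue(int i) { return i == 2 ? -1 : 0; } /* fail on vq 2 */
static void teardown(int done) { printf("tearing down %d queues\n", done); }

static int device_start(int nvqs)
{
    int i, r = 0;
    for (i = 0; i < nvqs; i++) {
        r = setup_queue(i);
        if (r < 0) {
            fprintf(stderr, "queue %d setup failed\n", i);
            goto fail_start;    /* unwind instead of starting half-configured */
        }
    }
    return 0;
fail_start:
    teardown(i);
    return r;
}

int main(void) { return device_start(4) ? 1 : 0; }
===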
* [PATCH v2 2/2] vhost-user: add a request-reply lock
  From: Prasad Pandit <pjp@fedoraproject.org>
  Date: 2024-08-28 10:09 UTC
  To: qemu-devel
  Cc: mst, farosas, jasowang, mcoqueli, peterx, Prasad Pandit

QEMU threads use vhost_user_write/read calls to send and receive
request/reply messages from a vhost-user device. When multiple threads
communicate with the same vhost-user device, they can receive each
other's messages, resulting in an erroneous state.

When fault_thread exits upon completion of Postcopy migration, it sends
a 'postcopy_end' message to the vhost-user device. But sometimes the
'postcopy_end' message is sent while the vhost device is still being
set up via vhost_dev_start().

  Thread-1                                Thread-2

  vhost_dev_start                         postcopy_ram_incoming_cleanup
   vhost_device_iotlb_miss                 postcopy_notify
    vhost_backend_update_device_iotlb       vhost_user_postcopy_notifier
     vhost_user_send_device_iotlb_msg        vhost_user_postcopy_end
      process_message_reply                   process_message_reply
       vhost_user_read                          vhost_user_read
        vhost_user_read_header                   vhost_user_read_header
  "Fail to update device iotlb"           "Failed to receive reply to postcopy_end"

This creates confusion when the vhost-user device receives the
'postcopy_end' message while it is trying to update IOTLB entries.

  vhost_user_read_header:
    700871,700871: Failed to read msg header. Flags 0x0 instead of 0x5.
  vhost_device_iotlb_miss:
    700871,700871: Fail to update device iotlb
  vhost_user_postcopy_end:
    700871,700900: Failed to receive reply to postcopy_end
  vhost_user_read_header:
    700871,700871: Failed to read msg header. Flags 0x0 instead of 0x5.

Here the fault thread seems to end the postcopy migration while another
thread is starting the vhost-user device.

Add a mutex lock that is held for one request-reply cycle, to avoid
such a race condition.

Fixes: 46343570c06e ("vhost+postcopy: Wire up POSTCOPY_END notify")
Suggested-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Prasad Pandit <pjp@fedoraproject.org>
---
 hw/virtio/vhost-user.c         | 74 ++++++++++++++++++++++++++++++++++
 include/hw/virtio/vhost-user.h |  3 ++
 2 files changed, 77 insertions(+)

v2:
 - Place QEMU_LOCK_GUARD near the vhost_user_write() calls; holding
   the lock for longer fails some tests during rpmbuild(8).
 - rpmbuild(8) fails for some SRPMs, not all. The RHEL-9 SRPM builds
   with this patch, whereas the Fedora SRPM does not build.
 - The host OS also seems to affect rpmbuild(8). Some SRPMs build well
   on RHEL-9, but not on a Fedora-40 machine.
 - koji builds are successful with this patch:
     https://koji.fedoraproject.org/koji/taskinfo?taskID=122254011
     https://koji.fedoraproject.org/koji/taskinfo?taskID=122252369

v1: Use QEMU_LOCK_GUARD(), rename lock variable
 - https://lore.kernel.org/qemu-devel/20240808095147.291626-3-ppandit@redhat.com/

v0:
 - https://lore.kernel.org/all/Zo_9OlX0pV0paFj7@x1n/
 - https://lore.kernel.org/all/20240720153808-mutt-send-email-mst@kernel.org/

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 00561daa06..7b030ae2cd 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -24,6 +24,7 @@
 #include "qemu/main-loop.h"
 #include "qemu/uuid.h"
 #include "qemu/sockets.h"
+#include "qemu/lockable.h"
 #include "sysemu/runstate.h"
 #include "sysemu/cryptodev.h"
 #include "migration/postcopy-ram.h"
@@ -446,6 +447,10 @@ static int vhost_user_set_log_base(struct vhost_dev *dev, uint64_t base,
         .hdr.size = sizeof(msg.payload.log),
     };
 
+    struct vhost_user *u = dev->opaque;
+    struct VhostUserState *us = u->user;
+    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
     /* Send only once with first queue pair */
     if (dev->vq_index != 0) {
         return 0;
@@ -664,6 +669,7 @@ static int send_remove_regions(struct vhost_dev *dev,
                                bool reply_supported)
 {
     struct vhost_user *u = dev->opaque;
+    struct VhostUserState *us = u->user;
     struct vhost_memory_region *shadow_reg;
     int i, fd, shadow_reg_idx, ret;
     ram_addr_t offset;
@@ -685,6 +691,8 @@ static int send_remove_regions(struct vhost_dev *dev,
         vhost_user_fill_msg_region(&region_buffer, shadow_reg, 0);
         msg->payload.mem_reg.region = region_buffer;
 
+        QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
         ret = vhost_user_write(dev, msg, NULL, 0);
         if (ret < 0) {
             return ret;
@@ -718,6 +726,7 @@ static int send_add_regions(struct vhost_dev *dev,
                             bool reply_supported, bool track_ramblocks)
 {
     struct vhost_user *u = dev->opaque;
+    struct VhostUserState *us = u->user;
     int i, fd, ret, reg_idx, reg_fd_idx;
     struct vhost_memory_region *reg;
     MemoryRegion *mr;
@@ -746,6 +755,8 @@ static int send_add_regions(struct vhost_dev *dev,
         vhost_user_fill_msg_region(&region_buffer, reg, offset);
         msg->payload.mem_reg.region = region_buffer;
 
+        QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
         ret = vhost_user_write(dev, msg, &fd, 1);
         if (ret < 0) {
             return ret;
@@ -893,6 +904,7 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev,
                                              bool config_mem_slots)
 {
     struct vhost_user *u = dev->opaque;
+    struct VhostUserState *us = u->user;
     int fds[VHOST_MEMORY_BASELINE_NREGIONS];
     size_t fd_num = 0;
     VhostUserMsg msg_reply;
@@ -926,6 +938,8 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev,
         return ret;
     }
 
+    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
     ret = vhost_user_write(dev, &msg, fds, fd_num);
     if (ret < 0) {
         return ret;
@@ -1005,6 +1019,7 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev,
                                     struct vhost_memory *mem)
 {
     struct vhost_user *u = dev->opaque;
+    struct VhostUserState *us = u->user;
     int fds[VHOST_MEMORY_BASELINE_NREGIONS];
     size_t fd_num = 0;
     bool do_postcopy = u->postcopy_listen && u->postcopy_fd.handler;
@@ -1044,6 +1059,8 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev,
         return ret;
     }
 
+    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
     ret = vhost_user_write(dev, &msg, fds, fd_num);
     if (ret < 0) {
         return ret;
@@ -1089,6 +1106,10 @@ static int vhost_user_get_u64(struct vhost_dev *dev, int request, uint64_t *u64)
         return 0;
     }
 
+    struct vhost_user *u = dev->opaque;
+    struct VhostUserState *us = u->user;
+    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
     ret = vhost_user_write(dev, &msg, NULL, 0);
     if (ret < 0) {
         return ret;
@@ -1138,6 +1159,10 @@ static int vhost_user_write_sync(struct vhost_dev *dev, VhostUserMsg *msg,
         }
     }
 
+/* struct vhost_user *u = dev->opaque;
+ * struct VhostUserState *us = u->user;
+ * QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+ */
     ret = vhost_user_write(dev, msg, NULL, 0);
     if (ret < 0) {
         return ret;
@@ -1277,6 +1302,8 @@ static int vhost_user_get_vring_base(struct vhost_dev *dev,
         .hdr.size = sizeof(msg.payload.state),
     };
     struct vhost_user *u = dev->opaque;
+    struct VhostUserState *us = u->user;
+    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
 
     VhostUserHostNotifier *n = fetch_notifier(u->user, ring->index);
     if (n) {
@@ -1669,6 +1696,9 @@ int vhost_user_get_shared_object(struct vhost_dev *dev, unsigned char *uuid,
     };
     memcpy(msg.payload.object.uuid, uuid, sizeof(msg.payload.object.uuid));
 
+    struct VhostUserState *us = u->user;
+    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
     ret = vhost_user_write(dev, &msg, NULL, 0);
     if (ret < 0) {
         return ret;
@@ -1889,6 +1919,9 @@ static int vhost_setup_backend_channel(struct vhost_dev *dev)
         msg.hdr.flags |= VHOST_USER_NEED_REPLY_MASK;
     }
 
+    struct VhostUserState *us = u->user;
+    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
     ret = vhost_user_write(dev, &msg, &sv[1], 1);
     if (ret) {
         goto out;
@@ -1993,6 +2026,9 @@ static int vhost_user_postcopy_advise(struct vhost_dev *dev, Error **errp)
         .hdr.flags = VHOST_USER_VERSION,
     };
 
+    struct VhostUserState *us = u->user;
+    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
     ret = vhost_user_write(dev, &msg, NULL, 0);
     if (ret < 0) {
         error_setg(errp, "Failed to send postcopy_advise to vhost");
@@ -2051,6 +2087,9 @@ static int vhost_user_postcopy_listen(struct vhost_dev *dev, Error **errp)
 
     trace_vhost_user_postcopy_listen();
 
+    struct VhostUserState *us = u->user;
+    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
     ret = vhost_user_write(dev, &msg, NULL, 0);
     if (ret < 0) {
         error_setg(errp, "Failed to send postcopy_listen to vhost");
@@ -2080,6 +2119,9 @@ static int vhost_user_postcopy_end(struct vhost_dev *dev, Error **errp)
 
     trace_vhost_user_postcopy_end_entry();
 
+    struct VhostUserState *us = u->user;
+    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
     ret = vhost_user_write(dev, &msg, NULL, 0);
     if (ret < 0) {
         error_setg(errp, "Failed to send postcopy_end to vhost");
@@ -2372,6 +2414,10 @@ static int vhost_user_net_set_mtu(struct vhost_dev *dev, uint16_t mtu)
         msg.hdr.flags |= VHOST_USER_NEED_REPLY_MASK;
     }
 
+    struct vhost_user *u = dev->opaque;
+    struct VhostUserState *us = u->user;
+    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
     ret = vhost_user_write(dev, &msg, NULL, 0);
     if (ret < 0) {
         return ret;
@@ -2396,6 +2442,10 @@ static int vhost_user_send_device_iotlb_msg(struct vhost_dev *dev,
         .payload.iotlb = *imsg,
     };
 
+    struct vhost_user *u = dev->opaque;
+    struct VhostUserState *us = u->user;
+    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
     ret = vhost_user_write(dev, &msg, NULL, 0);
     if (ret < 0) {
         return ret;
@@ -2428,6 +2478,10 @@ static int vhost_user_get_config(struct vhost_dev *dev, uint8_t *config,
 
     assert(config_len <= VHOST_USER_MAX_CONFIG_SIZE);
 
+    struct vhost_user *u = dev->opaque;
+    struct VhostUserState *us = u->user;
+    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
     msg.payload.config.offset = 0;
     msg.payload.config.size = config_len;
     ret = vhost_user_write(dev, &msg, NULL, 0);
@@ -2492,6 +2546,10 @@ static int vhost_user_set_config(struct vhost_dev *dev, const uint8_t *data,
     p = msg.payload.config.region;
     memcpy(p, data, size);
 
+    struct vhost_user *u = dev->opaque;
+    struct VhostUserState *us = u->user;
+    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
     ret = vhost_user_write(dev, &msg, NULL, 0);
     if (ret < 0) {
         return ret;
@@ -2570,6 +2628,10 @@ static int vhost_user_crypto_create_session(struct vhost_dev *dev,
         }
     }
 
+    struct vhost_user *u = dev->opaque;
+    struct VhostUserState *us = u->user;
+    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
     msg.payload.session.op_code = backend_info->op_code;
     msg.payload.session.session_id = backend_info->session_id;
     ret = vhost_user_write(dev, &msg, NULL, 0);
@@ -2662,6 +2724,9 @@ static int vhost_user_get_inflight_fd(struct vhost_dev *dev,
         return 0;
     }
 
+    struct VhostUserState *us = u->user;
+    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
     ret = vhost_user_write(dev, &msg, NULL, 0);
     if (ret < 0) {
         return ret;
@@ -2757,6 +2822,7 @@ bool vhost_user_init(VhostUserState *user, CharBackend *chr, Error **errp)
     user->memory_slots = 0;
     user->notifiers = g_ptr_array_new_full(VIRTIO_QUEUE_MAX / 4,
                                            &vhost_user_state_destroy);
+    qemu_mutex_init(&user->vhost_user_request_reply_lock);
     return true;
 }
 
@@ -2769,6 +2835,7 @@ void vhost_user_cleanup(VhostUserState *user)
     user->notifiers = (GPtrArray *) g_ptr_array_free(user->notifiers, true);
     memory_region_transaction_commit();
     user->chr = NULL;
+    qemu_mutex_destroy(&user->vhost_user_request_reply_lock);
 }
 
 
@@ -2902,6 +2969,9 @@ static int vhost_user_set_device_state_fd(struct vhost_dev *dev,
         return -ENOTSUP;
     }
 
+    struct VhostUserState *us = vu->user;
+    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
     ret = vhost_user_write(dev, &msg, &fd, 1);
     close(fd);
     if (ret < 0) {
@@ -2965,6 +3035,10 @@ static int vhost_user_check_device_state(struct vhost_dev *dev, Error **errp)
         return -ENOTSUP;
     }
 
+    struct vhost_user *u = dev->opaque;
+    struct VhostUserState *us = u->user;
+    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
     ret = vhost_user_write(dev, &msg, NULL, 0);
     if (ret < 0) {
         error_setg_errno(errp, -ret,
diff --git a/include/hw/virtio/vhost-user.h b/include/hw/virtio/vhost-user.h
index 324cd8663a..e96f12d449 100644
--- a/include/hw/virtio/vhost-user.h
+++ b/include/hw/virtio/vhost-user.h
@@ -67,6 +67,9 @@ typedef struct VhostUserState {
     GPtrArray *notifiers;
     int memory_slots;
     bool supports_config;
+
+    /* Hold lock for a request-reply cycle */
+    QemuMutex vhost_user_request_reply_lock;
 } VhostUserState;
 
 /**
--
2.46.0
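A note on the QEMU_LOCK_GUARD idiom used throughout this patch: it takes the mutex and releases it automatically when the enclosing scope exits, which matters in functions with many early 'return ret;' paths. A rough standalone equivalent built on the GCC/Clang cleanup attribute might look like the following (illustrative only; QEMU's actual macro is implemented differently):
===
/* Rough standalone equivalent of a scope-exit lock guard (illustrative;
 * not QEMU's implementation). The mutex is released on every exit path. */
#include <pthread.h>
#include <stdio.h>

static void unlock_cleanup(pthread_mutex_t **m)
{
    pthread_mutex_unlock(*m);
}

#define CAT_(a, b) a##b
#define CAT(a, b)  CAT_(a, b)
#define LOCK_GUARD(m) \
    pthread_mutex_t *CAT(guard_, __LINE__) \
        __attribute__((cleanup(unlock_cleanup), unused)) = \
        (pthread_mutex_lock(m), (m))

static pthread_mutex_t request_reply_lock = PTHREAD_MUTEX_INITIALIZER;

static int request_reply_cycle(int fail_early)
{
    LOCK_GUARD(&request_reply_lock);  /* held for the whole cycle */
    if (fail_early) {
        return -1;                    /* lock is still released here */
    }
    /* ... write request, read reply ... */
    return 0;
}

int main(void)
{
    printf("%d %d\n", request_reply_cycle(1), request_reply_cycle(0));
    return 0;
}
===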
* Re: [PATCH v2 2/2] vhost-user: add a request-reply lock
  From: Michael S. Tsirkin <mst@redhat.com>
  Date: 2024-08-28 11:15 UTC
  To: Prasad Pandit
  Cc: qemu-devel, farosas, jasowang, mcoqueli, peterx, Prasad Pandit

On Wed, Aug 28, 2024 at 03:39:14PM +0530, Prasad Pandit wrote:
> From: Prasad Pandit <pjp@fedoraproject.org>
>
> QEMU threads use vhost_user_write/read calls to send and receive
> request/reply messages from a vhost-user device. When multiple threads
> communicate with the same vhost-user device, they can receive each
> other's messages, resulting in an erroneous state.
[...]
> v2:
>  - Place QEMU_LOCK_GUARD near the vhost_user_write() calls; holding
>    the lock for longer fails some tests during rpmbuild(8).

what do you mean fails rpmbuild? that qemu with this patch cannot
be compiled?

>  - rpmbuild(8) fails for some SRPMs, not all. The RHEL-9 SRPM builds
>    with this patch, whereas the Fedora SRPM does not build.
>  - The host OS also seems to affect rpmbuild(8). Some SRPMs build well
>    on RHEL-9, but not on a Fedora-40 machine.
[... most of the quoted patch trimmed ...]
> @@ -1138,6 +1159,10 @@ static int vhost_user_write_sync(struct vhost_dev *dev, VhostUserMsg *msg,
>          }
>      }
>
> +/* struct vhost_user *u = dev->opaque;
> + * struct VhostUserState *us = u->user;
> + * QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
> + */
>      ret = vhost_user_write(dev, msg, NULL, 0);
>      if (ret < 0) {
>          return ret;

What is this comment saying?

[... rest of the quoted patch trimmed ...]
* Re: [PATCH v2 2/2] vhost-user: add a request-reply lock
  From: Prasad Pandit
  Date: 2024-08-29 5:39 UTC
  To: Michael S. Tsirkin
  Cc: qemu-devel, farosas, jasowang, mcoqueli, peterx, Prasad Pandit

Hello Michael,

On Wed, 28 Aug 2024 at 16:45, Michael S. Tsirkin <mst@redhat.com> wrote:
> > - Place QEMU_LOCK_GUARD near the vhost_user_write() calls; holding
> >   the lock for longer fails some tests during rpmbuild(8).
>
> what do you mean fails rpmbuild? that qemu with this patch cannot be compiled?

* In V1 of this patch, QEMU_LOCK_GUARD was placed near the beginning of
the function. But that caused some unit tests to fail, reporting TIMEOUT
errors. In this V2, QEMU_LOCK_GUARD is placed near the vhost_user_write()
calls, to reduce the time that the lock is held.

* Both V1 and V2 compile well, but fail at the '%check' stage while
running unit tests (on some machines), i.e. the rpm package is not built.
rpmbuild(8) on an F40 machine failed, but a koji scratch build with the
same SRPM worked fine. Those scratch builds are shared above. The RHEL-9
SRPM built well on a RHEL-9 host, but failed to build on an F40 machine,
reporting failure at the '%check' stage of rpmbuild(8).

Thank you.
---
- Prasad
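To make the V1-versus-V2 difference above concrete, here is a tiny standalone sketch (invented names, not the QEMU code) of the two guard placements: the V1 style takes the lock at function entry, so unrelated setup also runs under it; the V2 style narrows the critical section to the request-reply cycle itself. That the TIMEOUT failures persist even with the narrower window is what remains to be debugged.
===
/* Sketch of the two lock placements discussed above (invented names). */
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void do_unrelated_setup(void) { /* ... */ }
static void write_request(void)      { /* ... */ }
static void read_reply(void)         { /* ... */ }

static void v1_style(void)
{
    pthread_mutex_lock(&lock);   /* taken at function entry ...        */
    do_unrelated_setup();        /* ... so this also runs locked       */
    write_request();
    read_reply();
    pthread_mutex_unlock(&lock);
}

static void v2_style(void)
{
    do_unrelated_setup();        /* runs without the lock              */
    pthread_mutex_lock(&lock);   /* held only for one req-reply cycle  */
    write_request();
    read_reply();
    pthread_mutex_unlock(&lock);
}

int main(void)
{
    v1_style();
    v2_style();
    return 0;
}
===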
* Re: [PATCH v2 2/2] vhost-user: add a request-reply lock
  From: Michael S. Tsirkin <mst@redhat.com>
  Date: 2024-08-29 6:23 UTC
  To: Prasad Pandit
  Cc: qemu-devel, farosas, jasowang, mcoqueli, peterx, Prasad Pandit

On Thu, Aug 29, 2024 at 11:09:44AM +0530, Prasad Pandit wrote:
> On Wed, 28 Aug 2024 at 16:45, Michael S. Tsirkin <mst@redhat.com> wrote:
> > what do you mean fails rpmbuild? that qemu with this patch cannot be compiled?
>
> * In V1 of this patch, QEMU_LOCK_GUARD was placed near the beginning of
> the function. But that caused some unit tests to fail, reporting TIMEOUT
> errors. In this V2, QEMU_LOCK_GUARD is placed near the vhost_user_write()
> calls, to reduce the time that the lock is held.
>
> * Both V1 and V2 compile well, but fail at the '%check' stage while
> running unit tests (on some machines), i.e. the rpm package is not built.
> rpmbuild(8) on an F40 machine failed, but a koji scratch build with the
> same SRPM worked fine. Those scratch builds are shared above. The RHEL-9
> SRPM built well on a RHEL-9 host, but failed to build on an F40 machine,
> reporting failure at the '%check' stage of rpmbuild(8).

Weird. Seems to indicate some kind of deadlock?

--
MST
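One speculative way a per-device request-reply lock could produce exactly such test timeouts is a self-deadlock: a thread that already holds the lock for one cycle ends up issuing a nested request on the same device. Whether that is what actually happens here is precisely what is being debugged; the sketch below (invented names, not QEMU code) only illustrates the mechanism, using trylock so the demo terminates instead of hanging:
===
/* Speculative illustration of a non-recursive per-device lock
 * self-deadlocking on a nested request (invented names). */
#include <errno.h>
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t req_reply_lock = PTHREAD_MUTEX_INITIALIZER;

static void nested_request(void)
{
    /* A blocking lock here would hang forever; trylock just reports
     * the situation so the demo terminates. */
    if (pthread_mutex_trylock(&req_reply_lock) == EBUSY) {
        printf("nested request: lock already held by us -> deadlock\n");
        return;
    }
    pthread_mutex_unlock(&req_reply_lock);
}

static void outer_request(void)
{
    pthread_mutex_lock(&req_reply_lock);  /* guard for the whole cycle */
    /* ... request written; handling the reply triggers another message: */
    nested_request();
    pthread_mutex_unlock(&req_reply_lock);
}

int main(void)
{
    outer_request();
    return 0;
}
===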
* Re: [PATCH v2 2/2] vhost-user: add a request-reply lock
  From: Michael S. Tsirkin <mst@redhat.com>
  Date: 2024-08-29 7:42 UTC
  To: Prasad Pandit
  Cc: qemu-devel, farosas, jasowang, mcoqueli, peterx, Prasad Pandit,
      Dr. David Alan Gilbert, Marc-André Lureau

On Wed, Aug 28, 2024 at 03:39:14PM +0530, Prasad Pandit wrote:
> From: Prasad Pandit <pjp@fedoraproject.org>
>
> QEMU threads use vhost_user_write/read calls to send and receive
> request/reply messages from a vhost-user device. When multiple threads
> communicate with the same vhost-user device, they can receive each
> other's messages, resulting in an erroneous state.
>
> When fault_thread exits upon completion of Postcopy migration, it sends
> a 'postcopy_end' message to the vhost-user device. But sometimes the
> 'postcopy_end' message is sent while the vhost device is still being
> set up via vhost_dev_start().

So maybe vhost_user_postcopy_end should take the BQL?

[... call trace and error log quoted; trimmed ...]

> Fixes: 46343570c06e ("vhost+postcopy: Wire up POSTCOPY_END notify")
> Suggested-by: Peter Xu <peterx@redhat.com>
> Signed-off-by: Prasad Pandit <pjp@fedoraproject.org>

CC Author and reviewer of the offending commit.

[... full patch quoted; trimmed ...]
* Re: [PATCH v2 2/2] vhost-user: add a request-reply lock
  From: Prasad Pandit
  Date: 2024-08-29 9:15 UTC
  To: Michael S. Tsirkin
  Cc: qemu-devel, farosas, jasowang, mcoqueli, peterx, Prasad Pandit,
      Dr. David Alan Gilbert, Marc-André Lureau

Hello Michael,

On Thu, 29 Aug 2024 at 13:12, Michael S. Tsirkin <mst@redhat.com> wrote:
> Weird. Seems to indicate some kind of deadlock?

* Such a deadlock should occur across all environments, I guess; not
sure why it happens selectively. It is strange.

> So maybe vhost_user_postcopy_end should take the BQL?

===
diff --git a/migration/savevm.c b/migration/savevm.c
index e7c1215671..31acda3818 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2050,7 +2050,9 @@ static void *postcopy_ram_listen_thread(void *opaque)
          */
         qemu_event_wait(&mis->main_thread_load_event);
     }
+    bql_lock();
     postcopy_ram_incoming_cleanup(mis);
+    bql_unlock();
 
     if (load_res < 0) {
         /*
===

* Actually, the BQL patch above was tested and it worked fine. But we
are not sure if it is an acceptable solution. Another concern was that
taking the BQL could make things more complicated, so a local
vhost-user-specific lock should be better.

...wdyt?
---
- Prasad
* Re: [PATCH v2 2/2] vhost-user: add a request-reply lock
  From: Michael S. Tsirkin <mst@redhat.com>
  Date: 2024-08-29 9:21 UTC
  To: Prasad Pandit
  Cc: qemu-devel, farosas, jasowang, mcoqueli, peterx, Prasad Pandit,
      Dr. David Alan Gilbert, Marc-André Lureau

On Thu, Aug 29, 2024 at 02:45:45PM +0530, Prasad Pandit wrote:
> Hello Michael,
>
> On Thu, 29 Aug 2024 at 13:12, Michael S. Tsirkin <mst@redhat.com> wrote:
> > Weird. Seems to indicate some kind of deadlock?
>
> * Such a deadlock should occur across all environments, I guess; not
> sure why it happens selectively. It is strange.

Some kind of race?

> > So maybe vhost_user_postcopy_end should take the BQL?
>
> ===
[... savevm.c bql_lock()/bql_unlock() diff quoted; trimmed ...]
> ===
>
> * Actually, the BQL patch above was tested and it worked fine. But we
> are not sure if it is an acceptable solution. Another concern was that
> taking the BQL could make things more complicated, so a local
> vhost-user-specific lock should be better.
>
> ...wdyt?

Keep it simple, is my advice. Not causing regressions is good.

--
MST
* Re: [PATCH v2 2/2] vhost-user: add a request-reply lock
  From: Peter Xu <peterx@redhat.com>
  Date: 2024-08-29 14:29 UTC
  To: Prasad Pandit
  Cc: Michael S. Tsirkin, qemu-devel, farosas, jasowang, mcoqueli,
      Prasad Pandit, Dr. David Alan Gilbert, Marc-André Lureau

On Thu, Aug 29, 2024 at 02:45:45PM +0530, Prasad Pandit wrote:
[... quote trimmed: Prasad's savevm.c bql_lock() patch around
 postcopy_ram_incoming_cleanup() and the question "...wdyt?" ...]

I think Michael was suggesting taking the BQL in
vhost_user_postcopy_end(), not in the postcopy code directly.

I'm currently looking at how to make precopy load take even less BQL,
and even make it a separate thread. The above is definitely going
backwards, as we already discussed internally. I value that postcopy
doesn't need to take the BQL on its own in most paths, and we shouldn't
add an unnecessary BQL requirement even when vhost-user isn't used.

Personally, I'd still prefer that we look into why a separate mutex
won't work and why it timed out; that could be part of whoever is going
to investigate the whole issue (including the hang later on). Otherwise
I'm OK, from the migration POV, with taking the BQL in the vhost-user
hook, but not in savevm.c.

Thanks,

--
Peter Xu
* Re: [PATCH v2 2/2] vhost-user: add a request-reply lock
  From: Michael S. Tsirkin <mst@redhat.com>
  Date: 2024-08-29 15:05 UTC
  To: Peter Xu
  Cc: Prasad Pandit, qemu-devel, farosas, jasowang, mcoqueli,
      Prasad Pandit, Dr. David Alan Gilbert, Marc-André Lureau

On Thu, Aug 29, 2024 at 10:29:24AM -0400, Peter Xu wrote:
> On Thu, Aug 29, 2024 at 02:45:45PM +0530, Prasad Pandit wrote:
[... nested quote trimmed ...]
> I think Michael was suggesting taking the BQL in
> vhost_user_postcopy_end(), not in the postcopy code directly.

maybe that's better, ok.

> I'm currently looking at how to make precopy load take even less BQL,
> and even make it a separate thread. The above is definitely going
> backwards, as we already discussed internally.

At the same time a small bugfix is better, can be backported.

> I value that postcopy doesn't need to take the BQL on its own in most
> paths, and we shouldn't add an unnecessary BQL requirement even when
> vhost-user isn't used.
>
> Personally, I'd still prefer that we look into why a separate mutex
> won't work and why it timed out; that could be part of whoever is going
> to investigate the whole issue (including the hang later on). Otherwise
> I'm OK, from the migration POV, with taking the BQL in the vhost-user
> hook, but not in savevm.c.

ok

> --
> Peter Xu
* Re: [PATCH v2 2/2] vhost-user: add a request-reply lock
  From: Peter Xu <peterx@redhat.com>
  Date: 2024-08-29 20:29 UTC
  To: Michael S. Tsirkin
  Cc: Prasad Pandit, qemu-devel, farosas, jasowang, mcoqueli,
      Prasad Pandit, Dr. David Alan Gilbert, Marc-André Lureau

On Thu, Aug 29, 2024 at 11:05:15AM -0400, Michael S. Tsirkin wrote:
> > Personally, I'd still prefer that we look into why a separate mutex
> > won't work and why it timed out; that could be part of whoever is going
> > to investigate the whole issue (including the hang later on). Otherwise
> > I'm OK, from the migration POV, with taking the BQL in the vhost-user
> > hook, but not in savevm.c.
>
> ok

Just a heads-up comment in case someone keeps looking into the hang
problem: I'm not sure whether the brew build failure on the test case is
relevant to the hang issue we observed, or even whether it is the hang
issue itself, if the failure is about a timeout where one QEMU hanged.
IOW, whoever cannot reproduce the hang might leverage the mutex patch to
reproduce it, if we want to figure out the last missing piece of the
puzzle.

Thanks,

--
Peter Xu
* Re: [PATCH v2 0/2] Postcopy migration and vhost-user errors
  From: Michael S. Tsirkin <mst@redhat.com>
  Date: 2024-09-10 17:10 UTC
  To: Prasad Pandit
  Cc: qemu-devel, farosas, jasowang, mcoqueli, peterx, Prasad Pandit

On Wed, Aug 28, 2024 at 03:39:12PM +0530, Prasad Pandit wrote:
> Hello,
>
> * virsh(1) offers multiple options to initiate Postcopy migration:
[... cover letter quoted; trimmed ...]
> qemu-kvm: vhost_user_read_header: 700871,700871: Failed to read msg header. Flags 0x0 instead of 0x5.
> qemu-kvm: vhost_device_iotlb_miss: 700871,700871: Fail to update device iotlb
> ===

So are we going to see a version with BQL?

> * The two patches here help to fix/handle these errors.
[... diffstat quoted; trimmed ...]
* Re: [PATCH v2 0/2] Postcopy migration and vhost-user errors
  From: Prasad Pandit
  Date: 2024-09-11 7:14 UTC
  To: Michael S. Tsirkin
  Cc: qemu-devel, farosas, jasowang, mcoqueli, peterx, Prasad Pandit

Hello Michael,

On Tue, 10 Sept 2024 at 22:40, Michael S. Tsirkin <mst@redhat.com> wrote:
> So are we going to see a version with BQL?

===
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index cdf9af4a4b..96ac0ed93b 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -2079,6 +2079,7 @@ static int vhost_user_postcopy_end(struct vhost_dev *dev, Error **errp)
 
     trace_vhost_user_postcopy_end_entry();
 
+    BQL_LOCK_GUARD();
     ret = vhost_user_write(dev, &msg, NULL, 0);
     if (ret < 0) {
         error_setg(errp, "Failed to send postcopy_end to vhost");
--
2.43.5
===

* We ran the test with the above BQL patch, but it did not help to fix
the race errors. I'm continuing to debug it and will update here soon.

Thank you.
---
- Prasad
* Re: [PATCH v2 0/2] Postcopy migration and vhost-user errors
  From: Michael S. Tsirkin <mst@redhat.com>
  Date: 2024-09-11 9:46 UTC
  To: Prasad Pandit
  Cc: qemu-devel, farosas, jasowang, mcoqueli, peterx, Prasad Pandit

On Wed, Sep 11, 2024 at 12:44:59PM +0530, Prasad Pandit wrote:
> Hello Michael,
>
> On Tue, 10 Sept 2024 at 22:40, Michael S. Tsirkin <mst@redhat.com> wrote:
> > So are we going to see a version with BQL?
>
[... BQL_LOCK_GUARD() patch to vhost_user_postcopy_end() quoted; trimmed ...]
>
> * We ran the test with the above BQL patch, but it did not help to fix
> the race errors. I'm continuing to debug it and will update here soon.

Thanks!
I have a suspicion there's a path where we wait for the migration
thread while holding the BQL.

--
MST
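The kind of path suspected above would be a circular wait: one thread holds the BQL and waits for the migration thread to make progress, while the migration thread needs the BQL to make that progress. The standalone sketch below (invented names; timed waits so the demo exits instead of hanging) shows the shape of such a cycle. Whether this is the actual path in QEMU is unconfirmed.
===
/* Speculative sketch of a BQL-vs-migration-thread circular wait
 * (invented names; not QEMU code). Build: gcc -pthread demo.c */
#define _POSIX_C_SOURCE 200112L
#include <pthread.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static pthread_mutex_t bql = PTHREAD_MUTEX_INITIALIZER;
static volatile int step_done;           /* set by the migration thread */

static void *migration_thread(void *arg)
{
    /* Needs the "BQL" before it can complete the step ... */
    struct timespec t;
    clock_gettime(CLOCK_REALTIME, &t);
    t.tv_sec += 2;
    if (pthread_mutex_timedlock(&bql, &t) != 0) {
        printf("migration thread: stuck waiting for BQL\n");
        return NULL;
    }
    step_done = 1;
    pthread_mutex_unlock(&bql);
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_mutex_lock(&bql);            /* main thread holds the BQL ... */
    pthread_create(&t, NULL, migration_thread, NULL);
    for (int i = 0; i < 3 && !step_done; i++) {
        sleep(1);                        /* ... while waiting for the step */
    }
    if (!step_done) {
        printf("main thread: step never completed -> circular wait\n");
    }
    pthread_mutex_unlock(&bql);
    pthread_join(t, NULL);
    return 0;
}
===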