qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] Postcopy migration and vhost-user errors
@ 2024-07-11 13:14 Prasad Pandit
  2024-07-11 13:14 ` [PATCH 1/2] vhost-user: add a write-read lock Prasad Pandit
                   ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Prasad Pandit @ 2024-07-11 13:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Jason Wang, Michael S . Tsirkin,
	mcoqueli, Prasad Pandit

From: Prasad Pandit <pjp@fedoraproject.org>

Hello,

* virsh(1) offers multiple options to initiate Postcopy migration:

    1) virsh migrate --postcopy --postcopy-after-precopy
    2) virsh migrate --postcopy + virsh migrate-postcopy
    3) virsh migrate --postcopy --timeout <N> --timeout-postcopy

When Postcopy migration is invoked via method (2) or (3) above,
the guest on the destination host seems to hang or get stuck sometimes.

* During Postcopy migration, multiple threads are spawned on the destination
host to start the guest and setup devices. One such thread starts vhost
device via vhost_dev_start() function and another called fault_thread handles
page faults in user space using kernel's userfaultfd(2) system.

When fault_thread exits upon completion of Postcopy migration, it sends a
'postcopy_end' message to the vhost-user device. But sometimes 'postcopy_end'
message is sent while vhost device is being setup via vhost_dev_start().

     Thread-1                                  Thread-2

vhost_dev_start                        postcopy_ram_incoming_cleanup
 vhost_device_iotlb_miss                postcopy_notify
  vhost_backend_update_device_iotlb      vhost_user_postcopy_notifier
   vhost_user_send_device_iotlb_msg       vhost_user_postcopy_end
    process_message_reply                  process_message_reply
     vhost_user_read                        vhost_user_read
      vhost_user_read_header                 vhost_user_read_header
       "Fail to update device iotlb"          "Failed to receive reply to postcopy_end"

This creates confusion when vhost device receives 'postcopy_end' message while
it is still trying to update IOTLB entries.

This seems to leave the guest in a stranded/hung state because fault_thread
has exited saying Postcopy migration has ended, but vhost-device is probably
still expecting updates. QEMU logs following errors on the destination host
===
...
qemu-kvm: vhost_user_read_header: 700871,700871: Failed to read msg header. Flags 0x0 instead of 0x5.
qemu-kvm: vhost_device_iotlb_miss: 700871,700871: Fail to update device iotlb
qemu-kvm: vhost_user_postcopy_end: 700871,700900: Failed to receive reply to postcopy_end
qemu-kvm: vhost_user_read_header: 700871,700871: Failed to read msg header. Flags 0x0 instead of 0x5.
qemu-kvm: vhost_device_iotlb_miss: 700871,700871: Fail to update device iotlb
qemu-kvm: vhost_user_read_header: 700871,700871: Failed to read msg header. Flags 0x8 instead of 0x5.
qemu-kvm: vhost_device_iotlb_miss: 700871,700871: Fail to update device iotlb
qemu-kvm: vhost_user_read_header: 700871,700871: Failed to read msg header. Flags 0x16 instead of 0x5.
qemu-kvm: vhost_device_iotlb_miss: 700871,700871: Fail to update device iotlb
qemu-kvm: vhost_user_read_header: 700871,700871: Failed to read msg header. Flags 0x0 instead of 0x5.
qemu-kvm: vhost_device_iotlb_miss: 700871,700871: Fail to update device iotlb
===

* Couple of patches here help to fix/handle these errors.

Thank you.
---
Prasad Pandit (2):
  vhost-user: add a write-read lock
  vhost: fail device start if iotlb update fails

 hw/virtio/vhost-user.c         | 423 +++++++++++++++++++--------------
 hw/virtio/vhost.c              |   6 +-
 include/hw/virtio/vhost-user.h |   3 +
 3 files changed, 259 insertions(+), 173 deletions(-)

--
2.45.2



^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2024-07-23 17:58 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-11 13:14 [PATCH 0/2] Postcopy migration and vhost-user errors Prasad Pandit
2024-07-11 13:14 ` [PATCH 1/2] vhost-user: add a write-read lock Prasad Pandit
2024-07-11 14:39   ` Michael S. Tsirkin
2024-07-15 10:57     ` Prasad Pandit
2024-07-11 15:41   ` Peter Xu
2024-07-15  8:14     ` Prasad Pandit
2024-07-15 13:27       ` Peter Xu
2024-07-16 10:19         ` Prasad Pandit
2024-07-20 19:41   ` Michael S. Tsirkin
2024-07-23  4:58     ` Prasad Pandit
2024-07-11 13:14 ` [PATCH 2/2] vhost: fail device start if iotlb update fails Prasad Pandit
2024-07-11 15:38 ` [PATCH 0/2] Postcopy migration and vhost-user errors Peter Xu
2024-07-15 10:14   ` Prasad Pandit
2024-07-15 13:39     ` Peter Xu
2024-07-16 10:14       ` Prasad Pandit
2024-07-16 22:02         ` Peter Xu
2024-07-17  8:55           ` Michael S. Tsirkin
2024-07-17 13:33             ` Peter Xu
2024-07-17 13:40               ` Michael S. Tsirkin
2024-07-17 13:47                 ` Peter Xu
2024-07-20 19:41                   ` Michael S. Tsirkin
2024-07-23  5:03                     ` Prasad Pandit
2024-07-23 17:52                       ` Peter Xu
2024-07-23 17:57                       ` Prasad Pandit

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).