qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Prasad Pandit <ppandit@redhat.com>
To: qemu-devel@nongnu.org
Cc: farosas@suse.de, jasowang@redhat.com, mcoqueli@redhat.com,
	mst@redhat.com, peterx@redhat.com, pjp@fedoraproject.org
Subject: [RFC-PATCH v1 0/2] Postcopy migration and vhost-user errors
Date: Thu,  8 Aug 2024 15:21:45 +0530	[thread overview]
Message-ID: <20240808095147.291626-1-ppandit@redhat.com> (raw)

From: Prasad Pandit <pjp@fedoraproject.org>

Hello,

* virsh(1) offers multiple options to initiate Postcopy migration:

    1) virsh migrate --postcopy --postcopy-after-precopy
    2) virsh migrate --postcopy + virsh migrate-postcopy
    3) virsh migrate --postcopy --timeout <N> --timeout-postcopy

When Postcopy migration is invoked via method (2) or (3) above,
the guest on the destination host seems to hang or get stuck sometimes.

* During Postcopy migration, multiple threads are spawned on the destination
host to start the guest and setup devices. One such thread starts vhost
device via vhost_dev_start() function and another called fault_thread handles
page faults in user space using kernel's userfaultfd(2) system.

* When fault_thread exits upon completion of Postcopy migration, it sends a
'postcopy_end' message to the vhost-user device. But sometimes 'postcopy_end'
message is sent while vhost device is being setup via vhost_dev_start().

     Thread-1                                  Thread-2

vhost_dev_start                        postcopy_ram_incoming_cleanup
 vhost_device_iotlb_miss                postcopy_notify
  vhost_backend_update_device_iotlb      vhost_user_postcopy_notifier
   vhost_user_send_device_iotlb_msg       vhost_user_postcopy_end
    process_message_reply                  process_message_reply
     vhost_user_read                        vhost_user_read
      vhost_user_read_header                 vhost_user_read_header
       "Fail to update device iotlb"          "Failed to receive reply to postcopy_end"

This creates confusion when vhost device receives 'postcopy_end' message while
it is still trying to update IOTLB entries.

This seems to leave the guest in a stranded/hung state because fault_thread
has exited saying Postcopy migration has ended, but vhost-device is probably
still expecting updates. QEMU logs following errors on the destination host
===
...
qemu-kvm: vhost_user_read_header: 700871,700871: Failed to read msg header. Flags 0x0 instead of 0x5.
qemu-kvm: vhost_device_iotlb_miss: 700871,700871: Fail to update device iotlb
qemu-kvm: vhost_user_postcopy_end: 700871,700900: Failed to receive reply to postcopy_end
qemu-kvm: vhost_user_read_header: 700871,700871: Failed to read msg header. Flags 0x0 instead of 0x5.
qemu-kvm: vhost_device_iotlb_miss: 700871,700871: Fail to update device iotlb
qemu-kvm: vhost_user_read_header: 700871,700871: Failed to read msg header. Flags 0x8 instead of 0x5.
qemu-kvm: vhost_device_iotlb_miss: 700871,700871: Fail to update device iotlb
qemu-kvm: vhost_user_read_header: 700871,700871: Failed to read msg header. Flags 0x16 instead of 0x5.
qemu-kvm: vhost_device_iotlb_miss: 700871,700871: Fail to update device iotlb
qemu-kvm: vhost_user_read_header: 700871,700871: Failed to read msg header. Flags 0x0 instead of 0x5.
qemu-kvm: vhost_device_iotlb_miss: 700871,700871: Fail to update device iotlb
===

* Couple of patches here help to fix/handle these errors.

Thank you.
---
Prasad Pandit (2):
  vhost: fail device start if iotlb update fails
  vhost-user: add a request-reply lock

 hw/virtio/vhost-user.c         | 74 ++++++++++++++++++++++++++++++++++
 hw/virtio/vhost.c              |  6 ++-
 include/hw/virtio/vhost-user.h |  3 ++
 3 files changed, 82 insertions(+), 1 deletion(-)

--
2.46.0



             reply	other threads:[~2024-08-08  9:52 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-08  9:51 Prasad Pandit [this message]
2024-08-08  9:51 ` [RFC-PATCH v1 1/2] vhost: fail device start if iotlb update fails Prasad Pandit
2024-08-08  9:51 ` [RFC-PATCH v1 2/2] vhost-user: add a request-reply lock Prasad Pandit
2024-08-19 12:02   ` [RFC-PATCH v2] " Prasad Pandit
2024-08-19 15:41     ` Michael S. Tsirkin
2024-08-19 15:50       ` Michael S. Tsirkin
2024-08-27 10:59         ` Prasad Pandit

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240808095147.291626-1-ppandit@redhat.com \
    --to=ppandit@redhat.com \
    --cc=farosas@suse.de \
    --cc=jasowang@redhat.com \
    --cc=mcoqueli@redhat.com \
    --cc=mst@redhat.com \
    --cc=peterx@redhat.com \
    --cc=pjp@fedoraproject.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).