From: Prasad Pandit <ppandit@redhat.com>
To: qemu-devel@nongnu.org
Cc: Peter Xu <peterx@redhat.com>, Fabiano Rosas <farosas@suse.de>,
Jason Wang <jasowang@redhat.com>,
"Michael S . Tsirkin" <mst@redhat.com>,
mcoqueli@redhat.com, Prasad Pandit <pjp@fedoraproject.org>
Subject: [PATCH 0/2] Postcopy migration and vhost-user errors
Date: Thu, 11 Jul 2024 18:44:22 +0530
Message-ID: <20240711131424.181615-1-ppandit@redhat.com>
From: Prasad Pandit <pjp@fedoraproject.org>
Hello,
* virsh(1) offers multiple options to initiate Postcopy migration:
1) virsh migrate --postcopy --postcopy-after-precopy
2) virsh migrate --postcopy + virsh migrate-postcopy
3) virsh migrate --postcopy --timeout <N> --timeout-postcopy
When Postcopy migration is invoked via method (2) or (3) above,
the guest on the destination host sometimes hangs.
* During Postcopy migration, multiple threads are spawned on the destination
host to start the guest and set up devices. One such thread starts the vhost
device via the vhost_dev_start() function, and another, called fault_thread,
handles page faults in user space using the kernel's userfaultfd(2) mechanism.
When fault_thread exits upon completion of Postcopy migration, it sends a
'postcopy_end' message to the vhost-user device. But sometimes the
'postcopy_end' message is sent while the vhost device is still being set up
via vhost_dev_start().
    Thread-1                             Thread-2

    vhost_dev_start                      postcopy_ram_incoming_cleanup
    vhost_device_iotlb_miss              postcopy_notify
    vhost_backend_update_device_iotlb    vhost_user_postcopy_notifier
    vhost_user_send_device_iotlb_msg     vhost_user_postcopy_end
    process_message_reply                process_message_reply
    vhost_user_read                      vhost_user_read
    vhost_user_read_header               vhost_user_read_header
    "Fail to update device iotlb"        "Failed to receive reply to postcopy_end"
This creates confusion when the vhost device receives a 'postcopy_end' message
while it is still trying to update IOTLB entries.
This seems to leave the guest in a stranded/hung state, because fault_thread
has exited saying Postcopy migration has ended, but the vhost device is probably
still expecting updates. QEMU logs the following errors on the destination host:
===
...
qemu-kvm: vhost_user_read_header: 700871,700871: Failed to read msg header. Flags 0x0 instead of 0x5.
qemu-kvm: vhost_device_iotlb_miss: 700871,700871: Fail to update device iotlb
qemu-kvm: vhost_user_postcopy_end: 700871,700900: Failed to receive reply to postcopy_end
qemu-kvm: vhost_user_read_header: 700871,700871: Failed to read msg header. Flags 0x0 instead of 0x5.
qemu-kvm: vhost_device_iotlb_miss: 700871,700871: Fail to update device iotlb
qemu-kvm: vhost_user_read_header: 700871,700871: Failed to read msg header. Flags 0x8 instead of 0x5.
qemu-kvm: vhost_device_iotlb_miss: 700871,700871: Fail to update device iotlb
qemu-kvm: vhost_user_read_header: 700871,700871: Failed to read msg header. Flags 0x16 instead of 0x5.
qemu-kvm: vhost_device_iotlb_miss: 700871,700871: Fail to update device iotlb
qemu-kvm: vhost_user_read_header: 700871,700871: Failed to read msg header. Flags 0x0 instead of 0x5.
qemu-kvm: vhost_device_iotlb_miss: 700871,700871: Fail to update device iotlb
===
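The interleaving in the two call chains above can be pictured with a small
standalone sketch. It is not taken from the patches or from QEMU: the names
(Msg, chan, chan_lock, send_and_wait, run_demo) are hypothetical, and a
socketpair with an echoing thread stands in for the vhost-user channel. It
only illustrates the general idea that holding a lock across each
request/reply pair keeps one thread from reading a reply meant for another.

```c
#include <pthread.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical message: a request id that the peer echoes back as its reply. */
typedef struct {
    uint32_t request;
} Msg;

static int chan[2]; /* chan[0]: front-end side, chan[1]: back-end side */
static pthread_mutex_t chan_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in back-end: echo each request until the front-end closes its side. */
static void *backend(void *arg)
{
    Msg m;

    (void)arg;
    while (read(chan[1], &m, sizeof(m)) == (ssize_t)sizeof(m)) {
        if (write(chan[1], &m, sizeof(m)) != (ssize_t)sizeof(m)) {
            break;
        }
    }
    return NULL;
}

/* Send one request and read its reply with the lock held across both steps. */
static int send_and_wait(uint32_t request)
{
    Msg out = { request };
    Msg in = { 0 };
    int ok = 0;

    pthread_mutex_lock(&chan_lock);
    if (write(chan[0], &out, sizeof(out)) == (ssize_t)sizeof(out) &&
        read(chan[0], &in, sizeof(in)) == (ssize_t)sizeof(in)) {
        ok = (in.request == request); /* did we get our own reply? */
    }
    pthread_mutex_unlock(&chan_lock);
    return ok;
}

/* Each requester issues many requests and counts failed or foreign replies. */
static void *requester(void *arg)
{
    uint32_t id = (uint32_t)(uintptr_t)arg;
    uintptr_t failures = 0;

    for (int i = 0; i < 1000; i++) {
        if (!send_and_wait(id)) {
            failures++;
        }
    }
    return (void *)failures;
}

/* Returns the total number of mismatched replies, or -1 on setup failure. */
int run_demo(void)
{
    pthread_t be, t1, t2;
    void *f1 = NULL, *f2 = NULL;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, chan) != 0) {
        return -1;
    }
    if (pthread_create(&be, NULL, backend, NULL) != 0) {
        return -1;
    }
    if (pthread_create(&t1, NULL, requester, (void *)(uintptr_t)1) != 0 ||
        pthread_create(&t2, NULL, requester, (void *)(uintptr_t)2) != 0) {
        return -1;
    }
    pthread_join(t1, &f1);
    pthread_join(t2, &f2);
    close(chan[0]); /* back-end sees EOF and exits */
    pthread_join(be, NULL);
    close(chan[1]);
    return (int)((uintptr_t)f1 + (uintptr_t)f2);
}
```

Calling run_demo() from a main() and building with -pthread should report no
mismatches. Without the lock in send_and_wait(), the two requesters can read
each other's replies, which is loosely analogous to the header errors logged
above.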
* The two patches in this series help to fix/handle these errors.
Thank you.
---
Prasad Pandit (2):
  vhost-user: add a write-read lock
  vhost: fail device start if iotlb update fails

 hw/virtio/vhost-user.c         | 423 +++++++++++++++++++--------------
 hw/virtio/vhost.c              |   6 +-
 include/hw/virtio/vhost-user.h |   3 +
 3 files changed, 259 insertions(+), 173 deletions(-)
--
2.45.2