qemu-devel.nongnu.org archive mirror
From: Peter Xu <peterx@redhat.com>
To: Prasad Pandit <ppandit@redhat.com>
Cc: qemu-devel@nongnu.org, Fabiano Rosas <farosas@suse.de>,
	Jason Wang <jasowang@redhat.com>,
	"Michael S . Tsirkin" <mst@redhat.com>,
	mcoqueli@redhat.com, Prasad Pandit <pjp@fedoraproject.org>
Subject: Re: [PATCH 0/2] Postcopy migration and vhost-user errors
Date: Thu, 11 Jul 2024 11:38:38 -0400
Message-ID: <Zo_8fpKH8oBA8WV1@x1n>
In-Reply-To: <20240711131424.181615-1-ppandit@redhat.com>

On Thu, Jul 11, 2024 at 06:44:22PM +0530, Prasad Pandit wrote:
> From: Prasad Pandit <pjp@fedoraproject.org>
> 
> Hello,
> 
> * virsh(1) offers multiple options to initiate Postcopy migration:
> 
>     1) virsh migrate --postcopy --postcopy-after-precopy
>     2) virsh migrate --postcopy + virsh migrate-postcopy
>     3) virsh migrate --postcopy --timeout <N> --timeout-postcopy
> 
> When Postcopy migration is invoked via method (2) or (3) above,
> the guest on the destination host seems to hang or get stuck sometimes.
> 
> * During Postcopy migration, multiple threads are spawned on the destination
> host to start the guest and set up devices. One such thread starts the vhost
> device via the vhost_dev_start() function, and another, called fault_thread, handles

Hmm, I thought it was one of the vcpu threads that invoked
vhost_dev_start(), rather than any migration thread?

> page faults in user space using the kernel's userfaultfd(2) system.
> 
> When fault_thread exits upon completion of Postcopy migration, it sends a
> 'postcopy_end' message to the vhost-user device. But sometimes the 'postcopy_end'
> message is sent while the vhost device is still being set up via vhost_dev_start().
> 
>      Thread-1                                  Thread-2
> 
> vhost_dev_start                        postcopy_ram_incoming_cleanup
>  vhost_device_iotlb_miss                postcopy_notify
>   vhost_backend_update_device_iotlb      vhost_user_postcopy_notifier
>    vhost_user_send_device_iotlb_msg       vhost_user_postcopy_end
>     process_message_reply                  process_message_reply
>      vhost_user_read                        vhost_user_read
>       vhost_user_read_header                 vhost_user_read_header
>        "Fail to update device iotlb"          "Failed to receive reply to postcopy_end"
> 
> This creates confusion when the vhost device receives the 'postcopy_end' message
> while it is still trying to update IOTLB entries.
> 
> This seems to leave the guest in a stranded/hung state because fault_thread
> has exited saying Postcopy migration has ended, but the vhost device is probably
> still expecting updates. QEMU logs the following errors on the destination host:
> ===
> ...
> qemu-kvm: vhost_user_read_header: 700871,700871: Failed to read msg header. Flags 0x0 instead of 0x5.
> qemu-kvm: vhost_device_iotlb_miss: 700871,700871: Fail to update device iotlb
> qemu-kvm: vhost_user_postcopy_end: 700871,700900: Failed to receive reply to postcopy_end
> qemu-kvm: vhost_user_read_header: 700871,700871: Failed to read msg header. Flags 0x0 instead of 0x5.
> qemu-kvm: vhost_device_iotlb_miss: 700871,700871: Fail to update device iotlb
> qemu-kvm: vhost_user_read_header: 700871,700871: Failed to read msg header. Flags 0x8 instead of 0x5.
> qemu-kvm: vhost_device_iotlb_miss: 700871,700871: Fail to update device iotlb
> qemu-kvm: vhost_user_read_header: 700871,700871: Failed to read msg header. Flags 0x16 instead of 0x5.
> qemu-kvm: vhost_device_iotlb_miss: 700871,700871: Fail to update device iotlb
> qemu-kvm: vhost_user_read_header: 700871,700871: Failed to read msg header. Flags 0x0 instead of 0x5.
> qemu-kvm: vhost_device_iotlb_miss: 700871,700871: Fail to update device iotlb
> ===
> 
> * A couple of patches here help to fix/handle these errors.

I remember that after you added the rwlock, there was still a hang issue.

Did you investigate that?  Or do you mean this series will fix all the
problems?

Thanks,

> 
> Thank you.
> ---
> Prasad Pandit (2):
>   vhost-user: add a write-read lock
>   vhost: fail device start if iotlb update fails
> 
>  hw/virtio/vhost-user.c         | 423 +++++++++++++++++++--------------
>  hw/virtio/vhost.c              |   6 +-
>  include/hw/virtio/vhost-user.h |   3 +
>  3 files changed, 259 insertions(+), 173 deletions(-)
> 
> --
> 2.45.2
> 

-- 
Peter Xu
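
The race in the quoted cover letter comes down to two threads each doing a
"send request, then read reply" exchange on the same channel. Below is a
minimal sketch of why holding one lock across the whole exchange keeps each
thread from reading the other's reply. It is not the code from the patch
series: the Channel type, the chan_* helpers, the request codes and
run_two_threads() are all hypothetical names made up for illustration, and a
plain pthread mutex stands in for whatever lock the series actually uses.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical request codes; NOT the real vhost-user message numbers. */
enum { REQ_IOTLB_UPDATE = 1, REQ_POSTCOPY_END = 2 };

#define QLEN 8

/* Toy stand-in for the vhost-user socket: the fake backend answers
 * requests in FIFO order by echoing the request code back. */
typedef struct {
    pthread_mutex_t lock;    /* guards a whole request/reply exchange */
    uint32_t fifo[QLEN];
    unsigned head, tail;
} Channel;

static void chan_send(Channel *c, uint32_t req)
{
    c->fifo[c->tail++ % QLEN] = req;
}

static uint32_t chan_read(Channel *c)
{
    return c->fifo[c->head++ % QLEN];
}

/* Send a request and read its reply as one critical section.
 * Returns 0 if the reply matches the request, -1 otherwise. */
static int chan_request(Channel *c, uint32_t req)
{
    pthread_mutex_lock(&c->lock);
    chan_send(c, req);
    uint32_t reply = chan_read(c);
    pthread_mutex_unlock(&c->lock);
    return reply == req ? 0 : -1;
}

typedef struct {
    Channel *chan;
    uint32_t req;
    int rounds;
    int errors;
} Worker;

static void *worker_run(void *opaque)
{
    Worker *w = opaque;
    for (int i = 0; i < w->rounds; i++) {
        if (chan_request(w->chan, w->req) < 0) {
            w->errors++;
        }
    }
    return NULL;
}

/* Run two threads against one channel; returns the number of
 * replies that did not match the request that was sent. */
static int run_two_threads(int rounds)
{
    Channel c = { .head = 0, .tail = 0 };
    Worker a = { &c, REQ_IOTLB_UPDATE, rounds, 0 };
    Worker b = { &c, REQ_POSTCOPY_END, rounds, 0 };
    pthread_t ta, tb;
    int rc;

    rc = pthread_mutex_init(&c.lock, NULL);
    assert(rc == 0);
    rc = pthread_create(&ta, NULL, worker_run, &a);
    assert(rc == 0);
    rc = pthread_create(&tb, NULL, worker_run, &b);
    assert(rc == 0);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    pthread_mutex_destroy(&c.lock);
    return a.errors + b.errors;
}
```

Calling run_two_threads(1000) from a main() should return 0, since every
reply is read under the same lock that covered its request. If the lock were
taken only around the individual send and read calls, one thread could consume
the reply meant for the other, which is the kind of mix-up the cover letter
describes. The sketch says nothing about the remaining hang asked about above.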


