qemu-devel.nongnu.org archive mirror
From: Lukas Straub <lukasstraub2@web.de>
To: Yong Huang <yong.huang@smartx.com>
Cc: qemu-devel@nongnu.org, Peter Xu <peterx@redhat.com>,
	Fabiano Rosas <farosas@suse.de>
Subject: Re: [PATCH] multifd: Make the main thread yield periodically to the main loop
Date: Fri, 8 Aug 2025 09:01:28 +0200
Message-ID: <20250808090054.13cb8342@penguin>
In-Reply-To: <CAK9dgmZb=5uEwVq65Ygcza0+qtng+-5zmtQRdviX2npg_qhJRQ@mail.gmail.com>

On Fri, 8 Aug 2025 10:36:24 +0800
Yong Huang <yong.huang@smartx.com> wrote:

> On Thu, Aug 7, 2025 at 5:36 PM Lukas Straub <lukasstraub2@web.de> wrote:
> 
> > On Thu,  7 Aug 2025 10:41:17 +0800
> > yong.huang@smartx.com wrote:
> >  
> > > From: Hyman Huang <yong.huang@smartx.com>
> > >
> > > When network issues such as missing TCP ACKs occur on the send
> > > side during a multifd live migration, the send side reports the
> > > error "Connection timed out" and the source QEMU process stops
> > > sending data. On the receive side, the IO channels may then block
> > > in recvmsg(), so the main loop gets stuck and consequently fails to
> > > respond to QMP commands.
> > > ...  
> >
> > Hi Hyman Huang,
> >
> > Have you tried the 'yank' command to shut down the sockets? It is
> > meant exactly to recover from hangs and should solve your issue.
> >
> > https://www.qemu.org/docs/master/interop/qemu-qmp-ref.html#yank-feature  
> 
> 
> Thanks for the comment and advice.
> 
> Let me give more details about the migration state when the issue happens:
> 
> On the source side, libvirt has already aborted the migration job:
> 
> $ virsh domjobinfo fdecd242-f278-4308-8c3b-46e144e55f63
> Job type:         Failed
> Operation:        Outgoing migration
> 
> QMP query-yank shows that there is no migration yank instance:
> 
> $ virsh qemu-monitor-command fdecd242-f278-4308-8c3b-46e144e55f63
> '{"execute":"query-yank"}' --pretty
> {
>   "return": [
>     {
>       "type": "chardev",
>       "id": "charmonitor"
>     },
>     {
>       "type": "chardev",
>       "id": "charchannel0"
>     },
>     {
>       "type": "chardev",
>       "id": "libvirt-2-virtio-format"
>     }
>   ],
>   "id": "libvirt-5217"
> }

You are supposed to run it on the destination side; there, the migration
yank instance should be present if QEMU hangs in the migration code.
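To check for that instance from a script, a small helper can scan a query-yank reply for an entry of type "migration". This is a minimal Python sketch; the helper name is hypothetical, and the sample input is the source-side reply quoted above, which lists only chardev instances.

```python
import json


def has_migration_yank_instance(reply):
    """Return True if a query-yank reply lists a yank instance of type "migration"."""
    return any(inst.get("type") == "migration"
               for inst in reply.get("return", []))


# Source-side reply from this thread: only chardev instances, no migration.
source_reply = json.loads('''{"return": [
    {"type": "chardev", "id": "charmonitor"},
    {"type": "chardev", "id": "charchannel0"},
    {"type": "chardev", "id": "libvirt-2-virtio-format"}],
  "id": "libvirt-5217"}''')

print(has_migration_yank_instance(source_reply))  # False
```

Run against the destination's query-yank reply instead, a True result would indicate that a migration yank instance is registered there.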

Also, you need to execute it as an out-of-band command to bypass the
main loop. Like this:

'{"exec-oob": "yank", "id": "yank0", "arguments": {"instances": [ {"type": "migration"} ] } }'

I'm not sure if libvirt can do that; maybe you need to add an
additional QMP socket and do it outside of libvirt. Note that you need
to enable the oob feature during QMP negotiation, like this:

'{ "execute": "qmp_capabilities", "arguments": { "enable": [ "oob" ] } }'
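Both steps can be combined in a short script. Below is a minimal Python sketch, assuming an extra QMP monitor is listening on a UNIX socket outside libvirt (the socket path and all helper names are hypothetical). It reads the greeting and checks that the monitor advertises "oob", enables it through qmp_capabilities, and only then sends the exec-oob yank. Asynchronous events are skipped while it waits for replies.

```python
import json
import socket


def _send(sock, msg):
    # One JSON object per message, newline-terminated.
    sock.sendall(json.dumps(msg).encode() + b"\n")


def _recv_reply(rfile, want_id=None):
    """Read messages until a command reply arrives, skipping async events."""
    while True:
        line = rfile.readline()
        if not line:
            raise ConnectionError("QMP socket closed")
        msg = json.loads(line)
        if "event" in msg:
            continue  # asynchronous event, not a reply
        if want_id is not None and msg.get("id") != want_id:
            continue  # reply to a different command
        if "error" in msg:
            raise RuntimeError(msg["error"].get("desc", msg["error"]))
        return msg


def yank_migration_oob(sock, cmd_id="yank0"):
    """Negotiate OOB on a fresh QMP connection, then yank migration out-of-band."""
    with sock.makefile("rb") as rfile:
        greeting = json.loads(rfile.readline())
        if "oob" not in greeting["QMP"].get("capabilities", []):
            raise RuntimeError("this QMP monitor does not offer the oob capability")
        # Capability negotiation must come first; OOB is enabled here.
        _send(sock, {"execute": "qmp_capabilities",
                     "arguments": {"enable": ["oob"]}})
        _recv_reply(rfile)
        # exec-oob bypasses the (possibly stuck) main loop; OOB commands need an "id".
        _send(sock, {"exec-oob": "yank", "id": cmd_id,
                     "arguments": {"instances": [{"type": "migration"}]}})
        return _recv_reply(rfile, want_id=cmd_id)


def yank_migration(path, timeout=5.0):
    """Connect to a QMP UNIX socket at `path` (hypothetical) and yank migration."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(path)
        return yank_migration_oob(sock)
```

For example, yank_migration("/tmp/qmp-extra.sock") (hypothetical path) should return a reply like {"return": {}, "id": "yank0"} if the yank succeeds. The timeout keeps the script itself from hanging if the monitor does not answer.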

Regards,
Lukas Straub

> 
> The libvirt migration job is stuck, as the following backtrace shows:
> the migration is waiting for the "Finish" RPC on the destination side
> to return.
> 
> ...
> 
> IMHO, the key reason for the issue is that QEMU fails to run the main loop
> and therefore fails to respond to QMP, which is not what we would expect.
> 
> Giving libvirt a window of time to issue a QMP command and kill the VM is
> the ideal solution for this issue, since it provides an automatic method.
> 
> I have not dug into the yank feature; perhaps it is helpful, but only
> manually?
> 
> After all, I think these two options are not mutually exclusive.
> 
> 
> >
> > Best regards,
> > Lukas Straub
> >  
> 
> Thanks,
> Yong
> 
